thinking model stop only on thinking #686

vvaibhavv11 · 2025-02-25T21:35:56Z

When I load the DeepSeek-R1-Distill-Llama-8B-GGUF model, generation stops after the thinking phase and never produces the actual response.
Code:

async fn response(state: State<'_, Mutex<ModelInfo>>, input: String) -> Result<(), ()> {
    let prompt = format!(
        "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
        input
    );
    let app_state = state.lock().await;
    let state_model = app_state.model.as_ref().unwrap();
    let state_backend = app_state.backend.as_ref().unwrap();
    let state_ctx_params = app_state.ctx_params.as_ref().unwrap();
    let mut ctx = state_model
        .new_context(
            &state_backend,
            state_ctx_params
                .clone()
                .with_n_ctx(NonZero::new(4096))
                .with_n_threads(4),
        )
        .expect("unable to create the llama_context");

    let tokens_list = state_model
        .str_to_token(&prompt, AddBos::Always)
        .unwrap_or_else(|_| panic!("failed to tokenize {prompt}"));
    let mut batch = LlamaBatch::new(512, 1);
    // let n_len = 1024;

    let last_index = tokens_list.len() as i32 - 1;
    for (i, token) in (0_i32..).zip(tokens_list.into_iter()) {
        // llama_decode will output logits only for the last token of the prompt
        let is_last = i == last_index;
        batch.add(token, i, &[0], is_last).unwrap();
    }

    ctx.decode(&mut batch).expect("llama_decode() failed");

    let mut n_cur = batch.n_tokens();

    // The `Decoder`
    let mut decoder = encoding_rs::UTF_8.new_decoder();
    let mut sampler = LlamaSampler::greedy();

    loop {
        // sample the next token
        let token = sampler.sample(&ctx, batch.n_tokens() - 1);
        {

            sampler.accept(token);

            // is it an end of stream?

            let output_bytes = state_model
                .token_to_bytes(token, Special::Tokenize)
                .unwrap();
            // use `Decoder.decode_to_string()` to avoid the intermediate buffer
            let mut output_string = String::with_capacity(32);
            let _decode_result = decoder.decode_to_string(&output_bytes, &mut output_string, false);
            print!("{output_string}");

            batch.clear();
            batch.add(token, n_cur, &[0], true).unwrap();
        }

        if token == state_model.token_eos() {
            eprintln!();
            break;
        }

        n_cur += 1;

        ctx.decode(&mut batch).expect("failed to eval");
    }

    Ok(())
}
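A note on the prompt built at the top of this function: it uses ChatML-style `<|im_start|>` / `<|im_end|>` markers. As far as I know, the DeepSeek-R1 distills document different role markers (`<｜User｜>` and `<｜Assistant｜>`), so a template mismatch is one possible reason the model ends its turn early. This is not confirmed in this thread. Below is a minimal sketch, assuming those markers; `build_prompt` and both constants are hypothetical names, and the marker strings should be checked against the chat template stored in the GGUF metadata.

```rust
/// Assumed role markers for DeepSeek-R1 distills (not confirmed in this thread;
/// verify against the chat template shipped with the model).
const USER_TAG: &str = "<｜User｜>";
const ASSISTANT_TAG: &str = "<｜Assistant｜>";

/// Hypothetical helper: wraps the user input in the given role markers and
/// leaves the assistant turn open so the model continues from there.
fn build_prompt(user_tag: &str, assistant_tag: &str, input: &str) -> String {
    format!("{user_tag}{input}{assistant_tag}")
}

/// Convenience wrapper using the assumed DeepSeek markers above.
fn build_deepseek_prompt(input: &str) -> String {
    build_prompt(USER_TAG, ASSISTANT_TAG, input)
}
```

The result of `build_deepseek_prompt(&input)` would replace the `format!` call at the top of `response`. Passing the markers as parameters keeps the helper usable if the model's template turns out to differ.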

And the question:

give the ts program that give the prime number upto 100

The response is:

Alright, so the user is asking for a TypeScript program that generates prime numbers up to 100. Hmm, okay, I need to figure out how to approach this. Let me break it down.

First, I should recall what prime numbers are. They are numbers greater than 1 that have no divisors other than 1 and themselves. So, the program needs to check each number from 2 up to 100 and determine if it's prime.

I think the Sieve of Eratosthenes would be a good algorithm to use here. It's an efficient way to find all primes up to a certain limit. Let me remember how it works. The basic idea is to create a list of boolean values representing numbers from 2 to 100. Initially, all are considered prime (true). Then, starting from the first prime number (2), we mark all its multiples as not prime. We repeat this process with the next unmarked prime number until we've processed all numbers up to the square root of 100, which is 10.

Wait, but in the Sieve, you only need to go up to the square root of the upper limit because any composite number beyond that would have been marked by a smaller prime. So, for 100, I only need to check up to 10.

Let me outline the steps:

1. Create an array of size 101 (since we include 100) initialized to true, where index 0 and 1 are false because 0 and 1 are not primes.
2. For each number starting from 2 up to the square root of 100 (which is 10):
   a. If the number is prime (true), mark all its multiples as false.
3. After processing, the array will have true values only for prime numbers.

But wait, in the Sieve, you start with 2 and mark multiples. So, for each prime number p, you start marking from p*p, but since we're only going up to 10, maybe it's simpler to just iterate from p to 100, stepping by p each time.

Let me think about how to implement this in TypeScript. I'll need to declare a variable, say 'isPrime', which is an array of booleans. Initialize it with all true values except index 0 and 1.

Then, loop from 2 to 10. For each i, if isPrime[i] is true, then loop from i*i to 100, stepping by i each time, and set those indices to false.

Wait, but in the Sieve, you start from i*i because any smaller multiples would have already been marked by smaller primes. But since we're only going up to 10, maybe it's more efficient to just start from i and go up by i each time. It might not make a big difference here, but for the sake of correctness, I should follow the standard Sieve method.

So, the code structure would be something like:

let isPrime = new Array(101).fill(true);
isPrime[0] = isPrime[1] = false;
for (let i = 2; i <= 10; i++) {
    if (isPrime[i]) {
        for (let j = i * i; j <= 100; j += i) {
            isPrime[j] = false;
        }
    }
}

Then, I can collect all the indices where isPrime is true, which will be the prime numbers up to 100.

But wait, the user asked for a program that gives the primes up to 100. So, the program should output these primes. How should I present them? Maybe as an array or by logging each prime.

I think the simplest way is to loop through the isPrime array and collect the indices that are true, then display them.

Putting it all together, the program would look like this:

- Initialize the isPrime array.
- Apply the Sieve of Eratosthenes.
- Collect and display the primes.

I should also consider edge cases, like when the user runs the program, it should correctly identify all primes up to 100 without any errors.

Let me test this logic mentally. For example, 2 is prime, so mark multiples of 2 starting from 4. Then 3 is prime, mark multiples starting from 9, and so on. This should correctly mark all non-primes up to 100.

I think this approach should work. Now, I'll write the code accordingly.

Can someone help? This all looks like the thinking only, not the main content. How do I write the code so that I get the main content as well?


MarcusDunn · 2025-02-26T00:28:21Z

You break out of the main loop here?

        if token == state_model.token_eos() {
            eprintln!();
            break;
        }
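For context, the loop in the question prints each sampled token first and only afterwards compares it to `token_eos()`, with no upper bound on the number of generated tokens. The sketch below shows the loop shape with the stop check placed before the token is emitted, plus a token cap. It is a stand-in and does not use llama-cpp-2: `generate`, `next_token`, and `is_stop` are hypothetical names, where `next_token` represents the sample-and-decode step and `is_stop` the model's end-of-generation test. I believe the crate offers `is_eog_token` for that test, but treat this as an assumption to verify.

```rust
/// Drives a generation loop over abstract token ids.
/// `next_token` stands in for "sample, then decode"; `is_stop` stands in for
/// the model's end-of-generation check. Both are placeholders for illustration.
fn generate<N, S>(mut next_token: N, is_stop: S, max_new_tokens: usize) -> Vec<u32>
where
    N: FnMut() -> u32,
    S: Fn(u32) -> bool,
{
    let mut out = Vec::new();
    // The cap guards against a model that never emits a stop token.
    while out.len() < max_new_tokens {
        let token = next_token();
        // Check for end of generation before the token is emitted.
        if is_stop(token) {
            break;
        }
        out.push(token);
    }
    out
}
```

With this ordering the stop token itself is never printed, and the loop ends either on the stop token or on the cap, whichever comes first.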

vvaibhavv11 · 2025-02-26T18:46:13Z

But does the model emit token_eos after the thinking? Why not after the real response?
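One way to check where generation actually ended is to collect the full output into a `String` and look for the reasoning delimiter. The sketch below assumes the model wraps its reasoning in `<think>` ... `</think>`, which is the convention I understand DeepSeek-R1 models to follow. The response pasted above shows neither tag, so this is an assumption. `split_reasoning` is a hypothetical helper, not part of llama-cpp-2.

```rust
/// Splits raw model output into (reasoning, answer).
/// Assumes reasoning is terminated by a literal "</think>" tag.
/// If the tag is absent, the reasoning part is `None` and the whole
/// (trimmed) text is returned as the answer.
fn split_reasoning(output: &str) -> (Option<&str>, &str) {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    match output.find(CLOSE) {
        Some(end) => {
            // Everything before the closing tag is reasoning; the opening
            // tag is optional because it may have been part of the prompt.
            let reasoning = output[..end].trim();
            let reasoning = reasoning.strip_prefix(OPEN).unwrap_or(reasoning).trim();
            let answer = output[end + CLOSE.len()..].trim();
            (Some(reasoning), answer)
        }
        None => (None, output.trim()),
    }
}
```

If the reasoning part is `Some(..)` and the answer is empty, the model stopped right after closing its reasoning. If it is `None`, no closing tag was produced at all, which would point toward the prompt format or the stop condition and away from the printing code.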