Redirect llama.cpp logs into tracing #637

Merged 3 commits into main from logging-intercept on Feb 5, 2025

Conversation

@vlovich (Contributor) commented Feb 5, 2025

After this change, if you don't pass --verbose, simple.exe doesn't print anything from llama.cpp; if you do, the logs are formatted through the tracing module. This resolves issue #628.

     Running `target\debug\simple.exe --verbose --prompt Hello hf-model hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF llama-3.2-3b-instruct-q8_0.gguf`
2025-02-05T04:19:14.536931Z  INFO hf_hub: Token file not found "C:\\Users\\vlovich\\.cache\\huggingface\\token"
2025-02-05T04:19:14.542020Z DEBUG load_from_file: llama-cpp-2: registered backend CPU (1 devices) module="ggml::register_backend"
2025-02-05T04:19:14.542195Z DEBUG load_from_file: llama-cpp-2: registered device CPU (AMD Ryzen 9 7845HX with Radeon Graphics        ) module="ggml::register_device"
2025-02-05T04:19:14.631339Z  INFO load_from_file: llama-cpp-2: loaded meta data with 30 key-value pairs and 255 tensors from C:\Users\vlovich\.cache\huggingface\hub\models--hugging-quants--Llama-3.2-3B-Instruct-Q8_0-GGUF\snapshots\7ef7efff7d2c14e5d6161a0c7006e1f2fea6ec79\llama-3.2-3b-instruct-q8_0.gguf (version GGUF V3 (latest)) module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631581Z  INFO load_from_file: llama-cpp-2: Dumping metadata keys/values. Note: KV overrides do not apply in this output. module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631727Z  INFO load_from_file: llama-cpp-2: - kv   0:                       general.architecture str              = llama module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631842Z  INFO load_from_file: llama-cpp-2: - kv   1:                               general.type str              = model module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631923Z  INFO load_from_file: llama-cpp-2: - kv   2:                               general.name str              = Llama 3.2 3B Instruct module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632005Z  INFO load_from_file: llama-cpp-2: - kv   3:                           general.finetune str              = Instruct module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632052Z  INFO load_from_file: llama-cpp-2: - kv   4:                           general.basename str              = Llama-3.2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632097Z  INFO load_from_file: llama-cpp-2: - kv   5:                         general.size_label str              = 3B module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632171Z  INFO load_from_file: llama-cpp-2: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632232Z  INFO load_from_file: llama-cpp-2: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632282Z  INFO load_from_file: llama-cpp-2: - kv   8:                          llama.block_count u32              = 28 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632331Z  INFO load_from_file: llama-cpp-2: - kv   9:                       llama.context_length u32              = 131072 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632365Z  INFO load_from_file: llama-cpp-2: - kv  10:                     llama.embedding_length u32              = 3072 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632399Z  INFO load_from_file: llama-cpp-2: - kv  11:                  llama.feed_forward_length u32              = 8192 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632433Z  INFO load_from_file: llama-cpp-2: - kv  12:                 llama.attention.head_count u32              = 24 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632503Z  INFO load_from_file: llama-cpp-2: - kv  13:              llama.attention.head_count_kv u32              = 8 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632582Z  INFO load_from_file: llama-cpp-2: - kv  14:                       llama.rope.freq_base f32              = 500000.000000 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632628Z  INFO load_from_file: llama-cpp-2: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632666Z  INFO load_from_file: llama-cpp-2: - kv  16:                 llama.attention.key_length u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632702Z  INFO load_from_file: llama-cpp-2: - kv  17:               llama.attention.value_length u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632735Z  INFO load_from_file: llama-cpp-2: - kv  18:                          general.file_type u32              = 7 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632774Z  INFO load_from_file: llama-cpp-2: - kv  19:                           llama.vocab_size u32              = 128256 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632816Z  INFO load_from_file: llama-cpp-2: - kv  20:                 llama.rope.dimension_count u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632856Z  INFO load_from_file: llama-cpp-2: - kv  21:                       tokenizer.ggml.model str              = gpt2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632894Z  INFO load_from_file: llama-cpp-2: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.695145Z  INFO load_from_file: llama-cpp-2: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.710077Z  INFO load_from_file: llama-cpp-2: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.846818Z  INFO load_from_file: llama-cpp-2: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847012Z  INFO load_from_file: llama-cpp-2: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847146Z  INFO load_from_file: llama-cpp-2: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847240Z  INFO load_from_file: llama-cpp-2: - kv  28:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847355Z  INFO load_from_file: llama-cpp-2: - kv  29:               general.quantization_version u32              = 2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847446Z  INFO load_from_file: llama-cpp-2: - type  f32:   58 tensors module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847506Z  INFO load_from_file: llama-cpp-2: - type q8_0:  197 tensors module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847572Z  INFO load_from_file: llama-cpp-2: file format = GGUF V3 (latest) module="llama.cpp::print_info"
2025-02-05T04:19:14.847630Z  INFO load_from_file: llama-cpp-2: file type   = Q8_0 module="llama.cpp::print_info"
2025-02-05T04:19:14.847698Z  INFO load_from_file: llama-cpp-2: file size   = 3.18 GiB (8.50 BPW)  module="llama.cpp::print_info"
2025-02-05T04:19:15.259516Z DEBUG load_from_file: llama-cpp-2: initializing tokenizer for type 2 module="llama.cpp::init_tokenizer"
2025-02-05T04:19:15.311221Z DEBUG load_from_file: llama-cpp-2: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.311473Z DEBUG load_from_file: llama-cpp-2: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.311779Z DEBUG load_from_file: llama-cpp-2: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.312237Z DEBUG load_from_file: llama-cpp-2: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.312319Z DEBUG load_from_file: llama-cpp-2: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG module="llama.cpp::load"
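
For readers unfamiliar with the mechanism, here is a minimal sketch of how llama.cpp output can be routed into tracing. It relies on llama.cpp's C-level log hook (llama_log_set in llama.h); the extern declaration and log-level constants below are hand-written assumptions that mirror the C headers rather than the crate's generated llama-cpp-sys-2 bindings, and the actual implementation in this PR may differ.

    use std::ffi::{c_char, c_int, c_void, CStr};

    // Mirrors llama.h: void llama_log_set(ggml_log_callback cb, void *user_data).
    // In the real crate this binding comes from llama-cpp-sys-2.
    extern "C" {
        fn llama_log_set(
            callback: Option<unsafe extern "C" fn(c_int, *const c_char, *mut c_void)>,
            user_data: *mut c_void,
        );
    }

    // ggml_log_level values as of recent llama.cpp revisions; verify against
    // the vendored ggml.h before relying on them.
    const GGML_LOG_LEVEL_INFO: c_int = 2;
    const GGML_LOG_LEVEL_WARN: c_int = 3;
    const GGML_LOG_LEVEL_ERROR: c_int = 4;

    // Called by llama.cpp for every log line; re-emits the message through
    // `tracing` so the installed subscriber decides what is actually shown.
    unsafe extern "C" fn forward_to_tracing(level: c_int, text: *const c_char, _ud: *mut c_void) {
        if text.is_null() {
            return;
        }
        let msg = unsafe { CStr::from_ptr(text) }.to_string_lossy();
        let msg = msg.trim_end();
        match level {
            GGML_LOG_LEVEL_ERROR => tracing::error!(target: "llama.cpp", "{msg}"),
            GGML_LOG_LEVEL_WARN => tracing::warn!(target: "llama.cpp", "{msg}"),
            GGML_LOG_LEVEL_INFO => tracing::info!(target: "llama.cpp", "{msg}"),
            _ => tracing::debug!(target: "llama.cpp", "{msg}"),
        }
    }

    fn install_log_forwarding() {
        // Register once at startup, before any model is loaded.
        unsafe { llama_log_set(Some(forward_to_tracing), std::ptr::null_mut()) };
    }

Once the callback is installed, whether anything reaches the screen is decided by the tracing subscriber, which is presumably what the example's --verbose flag controls.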

If the user requests different properties for how llama.cpp is built,
trigger a rebuild.
Instead of cp -r / robocopy, build from the source directory. This
mildly speeds up the build, although probably not noticeably on NVMe
drives. The cargo-cmake crate will automatically place output in the
out/ folder for us.

Additionally, walk the source tree to tell cargo that a rebuild is
necessary if anything changes in the source. This ensures that changes
in the llama.cpp code trigger a rebuild, which makes hacking on things a
bit easier.

It looks like this copying logic was copied from sherpa-onnx, given the
comments seem to be copy-pasted, so remove those references.
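
As a rough illustration of the build-from-the-source-directory approach (a sketch only: the real build.rs wires up far more configuration, and the CMake options and link names below are assumptions):

    // build.rs sketch: point the cmake crate at the vendored llama.cpp submodule.
    // The crate configures, builds, and installs entirely under OUT_DIR, so the
    // source tree never needs to be copied first.
    fn main() {
        let llama_src = std::path::PathBuf::from("llama.cpp");

        let dst = cmake::Config::new(&llama_src)
            .define("LLAMA_BUILD_TESTS", "OFF") // illustrative flags only
            .define("LLAMA_BUILD_EXAMPLES", "OFF")
            .build();

        println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
        println!("cargo:rustc-link-lib=static=llama");
    }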

@MarcusDunn (Contributor) left a comment

Super thoughtfully done. Thanks for the PR.

The simple example now needs a --verbose argument for the llama.cpp logs
to be printed to the screen.
@MarcusDunn merged commit 773d2c0 into utilityai:main on Feb 5, 2025
2 of 5 checks passed
@MarcusDunn (Contributor)

I attempted to publish this:

https://github.com/utilityai/llama-cpp-rs/actions/runs/13165562823/job/36744752959

Could you take a look, @vlovich?

@vlovich deleted the logging-intercept branch on February 6, 2025 at 03:22
@vlovich (Contributor, Author) commented Feb 6, 2025

Hmmmm @MarcusDunn I don't see any errors in the build aside from the tarball failing. Maybe try rerunning? Not sure what the issue is.

@MarcusDunn (Contributor)

Caused by:
Source directory was modified by build.rs during cargo publish. Build scripts should not modify anything outside of OUT_DIR.
Added: /home/runner/work/llama-cpp-rs/llama-cpp-rs/target/package/llama-cpp-sys-2-0.1.93/llama.cpp/common/build-info.cpp

Looking over your code, I don't see what could cause this, but considering the last release was fine (and this is the only new PR in the release) I think it must be from here. Any ideas?

@MarcusDunn (Contributor) commented Feb 6, 2025

Could emitting cargo:rerun-if-changed on build-info.cpp cause it to be included in the tar (or the hash)?

    for entry in walkdir::WalkDir::new(&llama_src)
        .into_iter()
        .filter_entry(|e| !is_hidden(e))
    {
        let entry = entry.expect("Failed to obtain entry");
        let rebuild = entry
            .file_name()
            .to_str()
            .map(|f| f.starts_with("CMake"))
            .unwrap_or_default()
            || rebuild_on_children_of
                .iter()
                .any(|src_folder| entry.path().starts_with(src_folder));
        if rebuild {
            println!("cargo:rerun-if-changed={}", entry.path().display());
        }
    }

@vlovich (Contributor, Author) commented Feb 6, 2025

Oh, that must be something to do with my cleaning up cmake to run directly from the submodule as input without copying to the output, but that's supposed to work fine.

Any tips on how I can repro locally?

@vlovich (Contributor, Author) commented Feb 6, 2025

I don't think it's the rerun-if-changed.

@MarcusDunn (Contributor)

cargo publish --dry-run -p llama-cpp-sys-2 should reproduce it (I think it will fail with the same error before credentials are required).

@MarcusDunn (Contributor)

https://github.com/edgenai/llama_cpp-rs/blob/main/crates/llama_cpp_sys/include/build-info.h

They solve the problem like this. I'm not a huge fan of this solution, but I also don't know enough to make a great alternative suggestion.

@vlovich (Contributor, Author) commented Feb 10, 2025

Kk. Sorry I haven't fixed this yet; planning on taking a look at it in a couple of hours. Worst case I'll undo my build cleanup, but hopefully I can figure out how to make things work. Generating files within the src directory is bad form even in cmake builds.
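
One possible shape of a fix, sketched under assumptions and not necessarily what #639 ended up doing: generate the build-info translation unit under OUT_DIR from build.rs so the packaged source tree stays untouched, and point the CMake build at that copy.

    use std::{env, fs, path::PathBuf};

    fn main() {
        let out_dir = PathBuf::from(env::var("OUT_DIR").expect("cargo sets OUT_DIR"));
        let build_info = out_dir.join("build-info.cpp");

        // Placeholder values; the real build-info.cpp records the commit hash,
        // compiler, and target triple. A complete fix also has to tell CMake to
        // compile this file instead of generating one inside the source tree.
        let contents = "int LLAMA_BUILD_NUMBER = 0;\n\
                        char const *LLAMA_COMMIT = \"unknown\";\n\
                        char const *LLAMA_COMPILER = \"unknown\";\n\
                        char const *LLAMA_BUILD_TARGET = \"unknown\";\n";
        fs::write(&build_info, contents).expect("failed to write build-info.cpp");
    }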

@vlovich (Contributor, Author) commented Feb 10, 2025

Ok I think #639 should fix it.
