Redirect llama.cpp logs into tracing #637

Merged 3 commits into main from logging-intercept on Feb 5, 2025

Conversation

@vlovich (Contributor) commented Feb 5, 2025

After this change, if you don't pass --verbose, simple.exe doesn't print anything from llama.cpp; if you do, the logs are formatted through the tracing module. This resolves issue #628.

     Running `target\debug\simple.exe --verbose --prompt Hello hf-model hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF llama-3.2-3b-instruct-q8_0.gguf`
2025-02-05T04:19:14.536931Z  INFO hf_hub: Token file not found "C:\\Users\\vlovich\\.cache\\huggingface\\token"
2025-02-05T04:19:14.542020Z DEBUG load_from_file: llama-cpp-2: registered backend CPU (1 devices) module="ggml::register_backend"
2025-02-05T04:19:14.542195Z DEBUG load_from_file: llama-cpp-2: registered device CPU (AMD Ryzen 9 7845HX with Radeon Graphics        ) module="ggml::register_device"
2025-02-05T04:19:14.631339Z  INFO load_from_file: llama-cpp-2: loaded meta data with 30 key-value pairs and 255 tensors from C:\Users\vlovich\.cache\huggingface\hub\models--hugging-quants--Llama-3.2-3B-Instruct-Q8_0-GGUF\snapshots\7ef7efff7d2c14e5d6161a0c7006e1f2fea6ec79\llama-3.2-3b-instruct-q8_0.gguf (version GGUF V3 (latest)) module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631581Z  INFO load_from_file: llama-cpp-2: Dumping metadata keys/values. Note: KV overrides do not apply in this output. module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631727Z  INFO load_from_file: llama-cpp-2: - kv   0:                       general.architecture str              = llama module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631842Z  INFO load_from_file: llama-cpp-2: - kv   1:                               general.type str              = model module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.631923Z  INFO load_from_file: llama-cpp-2: - kv   2:                               general.name str              = Llama 3.2 3B Instruct module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632005Z  INFO load_from_file: llama-cpp-2: - kv   3:                           general.finetune str              = Instruct module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632052Z  INFO load_from_file: llama-cpp-2: - kv   4:                           general.basename str              = Llama-3.2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632097Z  INFO load_from_file: llama-cpp-2: - kv   5:                         general.size_label str              = 3B module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632171Z  INFO load_from_file: llama-cpp-2: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632232Z  INFO load_from_file: llama-cpp-2: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632282Z  INFO load_from_file: llama-cpp-2: - kv   8:                          llama.block_count u32              = 28 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632331Z  INFO load_from_file: llama-cpp-2: - kv   9:                       llama.context_length u32              = 131072 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632365Z  INFO load_from_file: llama-cpp-2: - kv  10:                     llama.embedding_length u32              = 3072 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632399Z  INFO load_from_file: llama-cpp-2: - kv  11:                  llama.feed_forward_length u32              = 8192 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632433Z  INFO load_from_file: llama-cpp-2: - kv  12:                 llama.attention.head_count u32              = 24 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632503Z  INFO load_from_file: llama-cpp-2: - kv  13:              llama.attention.head_count_kv u32              = 8 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632582Z  INFO load_from_file: llama-cpp-2: - kv  14:                       llama.rope.freq_base f32              = 500000.000000 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632628Z  INFO load_from_file: llama-cpp-2: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632666Z  INFO load_from_file: llama-cpp-2: - kv  16:                 llama.attention.key_length u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632702Z  INFO load_from_file: llama-cpp-2: - kv  17:               llama.attention.value_length u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632735Z  INFO load_from_file: llama-cpp-2: - kv  18:                          general.file_type u32              = 7 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632774Z  INFO load_from_file: llama-cpp-2: - kv  19:                           llama.vocab_size u32              = 128256 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632816Z  INFO load_from_file: llama-cpp-2: - kv  20:                 llama.rope.dimension_count u32              = 128 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632856Z  INFO load_from_file: llama-cpp-2: - kv  21:                       tokenizer.ggml.model str              = gpt2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.632894Z  INFO load_from_file: llama-cpp-2: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.695145Z  INFO load_from_file: llama-cpp-2: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.710077Z  INFO load_from_file: llama-cpp-2: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.846818Z  INFO load_from_file: llama-cpp-2: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847012Z  INFO load_from_file: llama-cpp-2: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847146Z  INFO load_from_file: llama-cpp-2: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847240Z  INFO load_from_file: llama-cpp-2: - kv  28:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ... module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847355Z  INFO load_from_file: llama-cpp-2: - kv  29:               general.quantization_version u32              = 2 module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847446Z  INFO load_from_file: llama-cpp-2: - type  f32:   58 tensors module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847506Z  INFO load_from_file: llama-cpp-2: - type q8_0:  197 tensors module="llama.cpp::llama_model_loader"
2025-02-05T04:19:14.847572Z  INFO load_from_file: llama-cpp-2: file format = GGUF V3 (latest) module="llama.cpp::print_info"
2025-02-05T04:19:14.847630Z  INFO load_from_file: llama-cpp-2: file type   = Q8_0 module="llama.cpp::print_info"
2025-02-05T04:19:14.847698Z  INFO load_from_file: llama-cpp-2: file size   = 3.18 GiB (8.50 BPW)  module="llama.cpp::print_info"
2025-02-05T04:19:15.259516Z DEBUG load_from_file: llama-cpp-2: initializing tokenizer for type 2 module="llama.cpp::init_tokenizer"
2025-02-05T04:19:15.311221Z DEBUG load_from_file: llama-cpp-2: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.311473Z DEBUG load_from_file: llama-cpp-2: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.311779Z DEBUG load_from_file: llama-cpp-2: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.312237Z DEBUG load_from_file: llama-cpp-2: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG module="llama.cpp::load"
2025-02-05T04:19:15.312319Z DEBUG load_from_file: llama-cpp-2: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG module="llama.cpp::load"
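
For readers unfamiliar with the mechanism, here is a minimal sketch of how llama.cpp output can be routed into tracing. It relies on llama.cpp's C-level log hook (llama_log_set in llama.h); the extern declaration and log-level constants below are hand-written assumptions that mirror the C headers rather than the crate's generated llama-cpp-sys-2 bindings, and the actual implementation in this PR may differ.

    use std::ffi::{c_char, c_int, c_void, CStr};

    // Mirrors llama.h: void llama_log_set(ggml_log_callback cb, void *user_data).
    // In the real crate this binding comes from llama-cpp-sys-2.
    extern "C" {
        fn llama_log_set(
            callback: Option<unsafe extern "C" fn(c_int, *const c_char, *mut c_void)>,
            user_data: *mut c_void,
        );
    }

    // ggml_log_level values as of recent llama.cpp revisions; verify against
    // the vendored ggml.h before relying on them.
    const GGML_LOG_LEVEL_INFO: c_int = 2;
    const GGML_LOG_LEVEL_WARN: c_int = 3;
    const GGML_LOG_LEVEL_ERROR: c_int = 4;

    // Called by llama.cpp for every log line; re-emits the message through
    // `tracing` so the installed subscriber decides what is actually shown.
    unsafe extern "C" fn forward_to_tracing(level: c_int, text: *const c_char, _ud: *mut c_void) {
        if text.is_null() {
            return;
        }
        let msg = unsafe { CStr::from_ptr(text) }.to_string_lossy();
        let msg = msg.trim_end();
        match level {
            GGML_LOG_LEVEL_ERROR => tracing::error!(target: "llama.cpp", "{msg}"),
            GGML_LOG_LEVEL_WARN => tracing::warn!(target: "llama.cpp", "{msg}"),
            GGML_LOG_LEVEL_INFO => tracing::info!(target: "llama.cpp", "{msg}"),
            _ => tracing::debug!(target: "llama.cpp", "{msg}"),
        }
    }

    fn install_log_forwarding() {
        // Register once at startup, before any model is loaded.
        unsafe { llama_log_set(Some(forward_to_tracing), std::ptr::null_mut()) };
    }

Once the callback is installed, whether anything reaches the screen is decided by the tracing subscriber, which is presumably what the example's --verbose flag controls.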

If the user requests different properties for how llama.cpp is built,
trigger a rebuild.
Instead of cp -r / robocopy, build from the source directory. This
mildly speeds up the build, although probably not noticeably on NVMe
drives. The cargo-cmake crate will automatically place output in the
out/ folder for us.

Additionally, walk the source tree to tell cargo that a rebuild is
necessary if anything changes in the source. This ensures that changes
in the llama.cpp code trigger a rebuild, which makes hacking on things a
bit easier.

It looks like this copying logic was copied from sherpa-onnx, given the
comments seem to be copy-pasted, so remove those references.
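
As a rough illustration of the build-from-the-source-directory approach (a sketch only: the real build.rs wires up far more configuration, and the CMake options and link names below are assumptions):

    // build.rs sketch: point the cmake crate at the vendored llama.cpp submodule.
    // The crate configures, builds, and installs entirely under OUT_DIR, so the
    // source tree never needs to be copied first.
    fn main() {
        let llama_src = std::path::PathBuf::from("llama.cpp");

        let dst = cmake::Config::new(&llama_src)
            .define("LLAMA_BUILD_TESTS", "OFF") // illustrative flags only
            .define("LLAMA_BUILD_EXAMPLES", "OFF")
            .build();

        println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
        println!("cargo:rustc-link-lib=static=llama");
    }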

@MarcusDunn (Contributor) left a comment

Super thoughtfully done. Thanks for the PR.

The simple example now needs a --verbose argument for the llama.cpp logs
to be printed to the screen.
@MarcusDunn merged commit 773d2c0 into utilityai:main on Feb 5, 2025
2 of 5 checks passed
@MarcusDunn (Contributor)

I attempted to publish this:

https://github.com/utilityai/llama-cpp-rs/actions/runs/13165562823/job/36744752959

Could you take a look, @vlovich?

@vlovich deleted the logging-intercept branch on February 6, 2025 at 03:22
@vlovich (Contributor, Author) commented Feb 6, 2025

Hmmmm @MarcusDunn I don't see any errors in the build aside from the tarball failing. Maybe try rerunning? Not sure what the issue is.

@MarcusDunn (Contributor)

Caused by:
Source directory was modified by build.rs during cargo publish. Build scripts should not modify anything outside of OUT_DIR.
Added: /home/runner/work/llama-cpp-rs/llama-cpp-rs/target/package/llama-cpp-sys-2-0.1.93/llama.cpp/common/build-info.cpp

Looking over your code, I don't see what could cause this, but considering the last release was fine (and this is the only new PR in the release) I think it must be from here. Any ideas?

@MarcusDunn (Contributor) commented Feb 6, 2025

Could emitting cargo:rerun-if-changed on build-info.cpp cause it to be included in the tar (or the hash)?

    for entry in walkdir::WalkDir::new(&llama_src)
        .into_iter()
        .filter_entry(|e| !is_hidden(e))
    {
        let entry = entry.expect("Failed to obtain entry");
        let rebuild = entry
            .file_name()
            .to_str()
            .map(|f| f.starts_with("CMake"))
            .unwrap_or_default()
            || rebuild_on_children_of
                .iter()
                .any(|src_folder| entry.path().starts_with(src_folder));
        if rebuild {
            println!("cargo:rerun-if-changed={}", entry.path().display());
        }
    }

@vlovich (Contributor, Author) commented Feb 6, 2025

Oh, that must be something to do with my cleaning up cmake to run directly from the submodule as input without copying to the output, but that's supposed to work fine.

Any tips on how I can repro locally?

@vlovich (Contributor, Author) commented Feb 6, 2025

I don't think it's the rerun-if-changed.

@MarcusDunn (Contributor)

cargo publish --dry-run -p llama-cpp-sys-2 should reproduce it (I think it will fail with the same error before credentials are required).

@MarcusDunn (Contributor)

https://github.com/edgenai/llama_cpp-rs/blob/main/crates/llama_cpp_sys/include/build-info.h

They solve the problem like this. I'm not a huge fan of this solution, but I also don't know enough to make a great alternative suggestion.

@vlovich (Contributor, Author) commented Feb 10, 2025

Kk. Sorry I haven't fixed this yet; planning on taking a look at it in a couple of hours. Worst case I'll undo my build cleanup, but hopefully I can figure out how to make things work. Generating files within the src directory is bad form even in cmake builds.
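
One possible shape of a fix, sketched under assumptions and not necessarily what #639 ended up doing: generate the build-info translation unit under OUT_DIR from build.rs so the packaged source tree stays untouched, and point the CMake build at that copy.

    use std::{env, fs, path::PathBuf};

    fn main() {
        let out_dir = PathBuf::from(env::var("OUT_DIR").expect("cargo sets OUT_DIR"));
        let build_info = out_dir.join("build-info.cpp");

        // Placeholder values; the real build-info.cpp records the commit hash,
        // compiler, and target triple. A complete fix also has to tell CMake to
        // compile this file instead of generating one inside the source tree.
        let contents = "int LLAMA_BUILD_NUMBER = 0;\n\
                        char const *LLAMA_COMMIT = \"unknown\";\n\
                        char const *LLAMA_COMPILER = \"unknown\";\n\
                        char const *LLAMA_BUILD_TARGET = \"unknown\";\n";
        fs::write(&build_info, contents).expect("failed to write build-info.cpp");
    }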

@vlovich (Contributor, Author) commented Feb 10, 2025

Ok I think #639 should fix it.
