
Introduce llama-run #10291

Open
ericcurtin wants to merge 1 commit into master from simple-chat-smart

Conversation

@ericcurtin (Contributor) commented on Nov 14, 2024

It's like simple-chat but it uses smart pointers to avoid manual
memory cleanup. Fewer memory leaks in the code now. Avoids printing
multiple dots. Splits code into smaller functions. Uses no exception
handling.
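
As a rough sketch of the pattern described above (not the PR's exact code), the handles returned by the llama.cpp C API can be wrapped in std::unique_ptr, with the corresponding free functions as deleters, so cleanup happens automatically when they go out of scope:

#include <memory>

#include "llama.h"

int main() {
    // Backend init/teardown omitted for brevity; this only illustrates ownership.
    llama_model_params mparams = llama_model_default_params();

    // Freed by llama_free_model() when 'model' goes out of scope.
    std::unique_ptr<llama_model, decltype(&llama_free_model)> model(
        llama_load_model_from_file("model.gguf", mparams), &llama_free_model);
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();

    // Freed by llama_free() when 'ctx' goes out of scope.
    std::unique_ptr<llama_context, decltype(&llama_free)> ctx(
        llama_new_context_with_model(model.get(), cparams), &llama_free);
    if (!ctx) {
        return 1;
    }

    // ... use ctx.get() and model.get() with the rest of the API ...
    return 0; // ctx, then model, are released without any explicit cleanup
}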

@ericcurtin force-pushed the simple-chat-smart branch 11 times, most recently from b2a336e to 17f086b, on November 14, 2024 14:00
@ericcurtin (Contributor, Author)

Some of these builds seem hardcoded to C++11, while we use a feature from C++14.

Any reason we aren't using, say, C++17?

Any reasonable platform should be up to date with C++17, I think.

@ericcurtin force-pushed the simple-chat-smart branch 2 times, most recently from bf26504 to 0d016a4, on November 14, 2024 16:27
@ericcurtin (Contributor, Author)

Converted to C++11 only

@ericcurtin force-pushed the simple-chat-smart branch 3 times, most recently from 0af3f55 to 33eb456, on November 14, 2024 17:00
@ericcurtin force-pushed the simple-chat-smart branch 4 times, most recently from d2711eb to ca45737, on November 15, 2024 12:06
@ericcurtin mentioned this pull request on Nov 15, 2024
@ericcurtin (Contributor, Author)

@slaren @ggerganov PTAL, I'm hoping to add other features to this example, such as reading the prompt from the '-p' arg and reading the prompt from stdin.

@slaren (Collaborator) commented on Nov 15, 2024

It would be good to have a more elaborate chat example, but the goal of this example is to show in the simplest way possible how to use the llama.cpp API. I don't think that these changes achieve that; I think that users will have a harder time understanding the llama.cpp API with all the extra boilerplate that is being added here.

If you want to use this as the base of a new example that adds more features, that would be great, but I think we should keep this example as simple as possible.

@ericcurtin (Contributor, Author)

Cool, sounds good @slaren. Mind if I call the new example ramalama-core?

@ericcurtin force-pushed the simple-chat-smart branch 12 times, most recently from f547f18 to 6e87e58, on November 16, 2024 05:45
@ericcurtin (Contributor, Author) commented on Nov 16, 2024

We can do things like:

git diff | llama-ramalama-core -m some-model -ngl 99 -p "Write a git commit message for this change:"

with this example.

@ggerganov (Owner)

@ericcurtin Do you plan to develop this into a more advanced example/application in the future? With the existing functionality, I am not sure it has enough value to justify adding it to the project and maintaining it.

@ericcurtin (Contributor, Author)

@ggerganov I plan to continue adding more features.

@ericcurtin (Contributor, Author)

> I would prefer a more neutral name, but I will defer to @ggerganov about that.
>
> Btw, it would be good to have smart pointer types in llama.h (guarded by an #ifdef __cplusplus) similar to the ones in https://github.com/ggerganov/llama.cpp/blob/master/ggml/include/ggml-cpp.h

Btw, did you mean as part of this PR or in general?

@ggerganov (Owner)

No need for it to be in this PR, just in general. If the example will continue to be developed, then it would make sense to merge it now.

@slaren (Collaborator) commented on Nov 16, 2024

Any name based on the functionality of the example is fine. "ramalama-core" doesn't mean anything to me, or to most users I would expect. The smart pointers can be added in this PR or a separate one, it doesn't matter to me. I mentioned it because, since you are already using them, it may be worth spending the time to make proper ones using structs instead of function pointers (which are very annoying to use, due to having to pass the function pointer at every construction, and probably also less efficient, since the unique_ptr has to store an additional pointer).
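
For reference, a minimal sketch of the struct-based deleters being suggested here (names are assumed, not code from this PR); the deleter becomes a stateless type, so nothing has to be passed at construction and the unique_ptr stores no extra function pointer:

#include <memory>

#include "llama.h"

// Stateless deleters: no function pointer stored, nothing to pass at construction.
struct llama_model_deleter {
    void operator()(llama_model * model) const { llama_free_model(model); }
};

struct llama_context_deleter {
    void operator()(llama_context * ctx) const { llama_free(ctx); }
};

using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;

// Usage: llama_model_ptr model(llama_load_model_from_file("model.gguf", params));
// versus the function-pointer form, which needs &llama_free_model passed every time.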

@ericcurtin (Contributor, Author) commented on Nov 16, 2024

llama-inferencer? I really don't care too much about the name; I just want to agree on one to unblock this PR.

@rhatdan @slp any suggestions on names? I plan on using this as the main program during "ramalama run", but I'm happy for anyone to use it or make changes to it to suit their needs. It's like a drastically simplified version of llama-cli, with one or two additional features: reading the prompt from stdin and from the -p flag.

But it also seems more stable and less error-prone than llama-cli. And the verbose info is all cleaned up so it only prints errors. It was based on llama-simple-chat initially.

@slaren (Collaborator) commented on Nov 16, 2024

Maybe something like llama-chat. I mentioned before that I think it would be good to have an example focused on chat only, that does that very well, and that in time could replace the current llama-cli as the main program of llama.cpp, which at this point is basically unmaintainable and should be retired.

@ericcurtin (Contributor, Author)

SGTM

@ericcurtin (Contributor, Author) commented on Nov 17, 2024

It's also tempting to call this something like run and use a RamaLama/LocalAI/Ollama-style CLI interface to interact with models. Kind of like a daemonless Ollama:

llama-run file://somedir/somefile.gguf

@ericcurtin (Contributor, Author) commented on Nov 17, 2024

We could even possibly add https://, http://, ollama://, hf:// as valid syntaxes to pull models, since they are all just an HTTP pull in the end.

@ericcurtin (Contributor, Author)

That might be implemented as llama-pull, which llama-run can fork/exec (or they could share a common library).

It's like simple-chat but it uses smart pointers to avoid manual
memory cleanup. Fewer memory leaks in the code now. Avoids printing
multiple dots. Splits code into smaller functions. Uses no exception
handling.

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin changed the title from Introduce ramalama-core to Introduce llama-run on Nov 17, 2024
@ericcurtin (Contributor, Author) commented on Nov 17, 2024

I will be afk for 3 weeks, so expect inactivity in this PR. I did the rename in case we want to merge as is rather than letting this go stale.

Although the syntax will completely change to:

llama-run [file://]somedir/somefile.gguf [prompt] [flags]

file:// will be optional, but it will set up the possibility of adding the pullers discussed above.
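
A minimal sketch of how the optional file:// prefix could be handled (the helper name is hypothetical, not part of this PR):

#include <string>

// Strip an optional "file://" prefix so that "file://somedir/somefile.gguf"
// and "somedir/somefile.gguf" resolve to the same local path. Other schemes
// (https://, hf://, ollama://) would be handed off to a puller instead.
static std::string resolve_model_path(const std::string & arg) {
    const std::string prefix = "file://";
    if (arg.compare(0, prefix.size(), prefix) == 0) {
        return arg.substr(prefix.size());
    }
    return arg;
}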

@ericcurtin (Contributor, Author)

This drives the compiler crazy FWIW:

diff --git a/include/llama.h b/include/llama.h
index 5e742642..c3285da3 100644
--- a/include/llama.h
+++ b/include/llama.h
@@ -537,6 +537,13 @@ extern "C" {
                          int32_t   il_start,
                          int32_t   il_end);

+#ifdef __cplusplus
+    // Smart pointers
+    typedef std::unique_ptr<llama_model, decltype(&llama_free_model)> llama_model_ptr;
+    typedef std::unique_ptr<llama_context, decltype(&llama_free)> llama_context_ptr;
+    typedef std::unique_ptr<llama_sampler, decltype(&llama_sampler_free)> llama_sampler_ptr;
+#endif
+
     //
     // KV cache
     //

seems like it only wants to build C code in that file or something.
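
One way around that (a sketch following the ggml-cpp.h approach mentioned earlier, not something merged in this PR) is to keep llama.h as plain C and put the C++ typedefs in a separate C++-only header, so they never end up inside the extern "C" block:

// llama-cpp.h (hypothetical): C++-only companion header to llama.h.
#pragma once

#ifndef __cplusplus
#error "This header is for C++ only"
#endif

#include <memory>

#include "llama.h"

// A struct deleter per handle type, as discussed above.
struct llama_sampler_deleter {
    void operator()(llama_sampler * smpl) const { llama_sampler_free(smpl); }
};

typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;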
