
Introduce llama-run #10291

Open
ericcurtin wants to merge 1 commit into master from simple-chat-smart

Conversation

@ericcurtin (Contributor) commented on Nov 14, 2024

It's like simple-chat but it uses smart pointers to avoid manual
memory cleanup. Fewer memory leaks in the code now. Avoids printing
multiple dots. Splits code into smaller functions. Uses no exception
handling.
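
As a rough sketch of the pattern described above (not the PR's exact code), the handles returned by the llama.cpp C API can be wrapped in std::unique_ptr, with the corresponding free functions as deleters, so cleanup happens automatically when they go out of scope:

#include <memory>

#include "llama.h"

int main() {
    // Backend init/teardown omitted for brevity; this only illustrates ownership.
    llama_model_params mparams = llama_model_default_params();

    // Freed by llama_free_model() when 'model' goes out of scope.
    std::unique_ptr<llama_model, decltype(&llama_free_model)> model(
        llama_load_model_from_file("model.gguf", mparams), &llama_free_model);
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();

    // Freed by llama_free() when 'ctx' goes out of scope.
    std::unique_ptr<llama_context, decltype(&llama_free)> ctx(
        llama_new_context_with_model(model.get(), cparams), &llama_free);
    if (!ctx) {
        return 1;
    }

    // ... use ctx.get() and model.get() with the rest of the API ...
    return 0; // ctx, then model, are released without any explicit cleanup
}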

@ericcurtin force-pushed the simple-chat-smart branch 11 times, most recently from b2a336e to 17f086b, on November 14, 2024 14:00
@ericcurtin (Contributor, Author)

Some of these builds seem hardcoded to C++11, while we use a feature from C++14.

Any reason we aren't using, say, C++17?

Any reasonable platform should be up to date with C++17, I think.

@ericcurtin force-pushed the simple-chat-smart branch 2 times, most recently from bf26504 to 0d016a4, on November 14, 2024 16:27
@ericcurtin (Contributor, Author)

Converted to C++11 only

@ericcurtin force-pushed the simple-chat-smart branch 3 times, most recently from 0af3f55 to 33eb456, on November 14, 2024 17:00
@ericcurtin force-pushed the simple-chat-smart branch 4 times, most recently from d2711eb to ca45737, on November 15, 2024 12:06
@ericcurtin mentioned this pull request on Nov 15, 2024
@ericcurtin (Contributor, Author)

@slaren @ggerganov PTAL, I'm hoping to add other features to this example, such as reading the prompt from the '-p' arg and reading the prompt from stdin.

@slaren (Collaborator) commented on Nov 15, 2024

It would be good to have a more elaborate chat example, but the goal of this example is to show in the simplest way possible how to use the llama.cpp API. I don't think that these changes achieve that; I think that users will have a harder time understanding the llama.cpp API with all the extra boilerplate that is being added here.

If you want to use this as the base of a new example that adds more features, that would be great, but I think we should keep this example as simple as possible.

@ericcurtin (Contributor, Author)

Cool, sounds good @slaren. Mind if I call the new example ramalama-core?

@ericcurtin force-pushed the simple-chat-smart branch 12 times, most recently from f547f18 to 6e87e58, on November 16, 2024 05:45
@ericcurtin (Contributor, Author) commented on Nov 16, 2024

We can do things like:

git diff | llama-ramalama-core -m some-model -ngl 99 -p "Write a git commit message for this change:"

with this example.

@ggerganov (Owner)

@ericcurtin Do you plan to develop this into a more advanced example/application in the future? With the existing functionality, I am not sure it has enough value to justify adding it to the project and maintaining it.

@ericcurtin (Contributor, Author)

@ggerganov I plan to continue adding more features.

@ericcurtin (Contributor, Author)

> I would prefer a more neutral name, but I will defer to @ggerganov about that.
>
> Btw, it would be good to have smart pointer types in llama.h (guarded by an #ifdef __cplusplus) similar to the ones in https://github.com/ggerganov/llama.cpp/blob/master/ggml/include/ggml-cpp.h

Btw, did you mean as part of this PR or in general?

@ggerganov (Owner)

No need for it to be in this PR, just in general. If the example will continue to be developed, then it would make sense to merge it now.

@slaren (Collaborator) commented on Nov 16, 2024

Any name based on the functionality of the example is fine. "ramalama-core" doesn't mean anything to me, or to most users I would expect. The smart pointers can be added in this PR or a separate one, it doesn't matter to me. I mentioned it because, since you are already using them, it may be worth spending the time to make proper ones using structs instead of function pointers (which are very annoying to use, due to having to pass the function pointer at every construction, and probably also less efficient, since the unique_ptr has to store an additional pointer).
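
For reference, a minimal sketch of the struct-based deleters being suggested here (names are assumed, not code from this PR); the deleter becomes a stateless type, so nothing has to be passed at construction and the unique_ptr stores no extra function pointer:

#include <memory>

#include "llama.h"

// Stateless deleters: no function pointer stored, nothing to pass at construction.
struct llama_model_deleter {
    void operator()(llama_model * model) const { llama_free_model(model); }
};

struct llama_context_deleter {
    void operator()(llama_context * ctx) const { llama_free(ctx); }
};

using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;

// Usage: llama_model_ptr model(llama_load_model_from_file("model.gguf", params));
// versus the function-pointer form, which needs &llama_free_model passed every time.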

@ericcurtin (Contributor, Author) commented on Nov 16, 2024

llama-inferencer? I really don't care too much about the name; I just want to agree on one to unblock this PR.

@rhatdan @slp any suggestions on names? I plan on using this as the main program during "ramalama run", but I'm happy for anyone to use it or make changes to it to suit their needs. It's like a drastically simplified version of llama-cli, with one or two additional features: reading the prompt from stdin and from the -p flag.

But it also seems more stable and less error-prone than llama-cli. And the verbose info is all cleaned up so it only prints errors. It was based on llama-simple-chat initially.

@slaren (Collaborator) commented on Nov 16, 2024

Maybe something like llama-chat. I mentioned before that I think it would be good to have an example focused on chat only, that does that very well, and that in time could replace the current llama-cli as the main program of llama.cpp, which at this point is basically unmaintainable and should be retired.

@ericcurtin (Contributor, Author)

SGTM

@ericcurtin (Contributor, Author) commented on Nov 17, 2024

It's also tempting to call this something like run and use a RamaLama/LocalAI/Ollama-style CLI interface to interact with models. Kind of like a daemonless Ollama:

llama-run file://somedir/somefile.gguf

@ericcurtin (Contributor, Author) commented on Nov 17, 2024

We could even possibly add https://, http://, ollama://, hf:// as valid syntaxes to pull models, since they are all just an HTTP pull in the end.

@ericcurtin (Contributor, Author)

That might be implemented as llama-pull, which llama-run can fork/exec (or they could share a common library).

It's like simple-chat but it uses smart pointers to avoid manual
memory cleanup. Fewer memory leaks in the code now. Avoids printing
multiple dots. Splits code into smaller functions. Uses no exception
handling.

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin changed the title from Introduce ramalama-core to Introduce llama-run on Nov 17, 2024
@ericcurtin (Contributor, Author) commented on Nov 17, 2024

I will be afk for 3 weeks, so expect inactivity in this PR. I did the rename in case we want to merge as is rather than letting this go stale.

Although the syntax will completely change to:

llama-run [file://]somedir/somefile.gguf [prompt] [flags]

file:// will be optional, but it will set up the possibility of adding the pullers discussed above.
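
A minimal sketch of how the optional file:// prefix could be handled (the helper name is hypothetical, not part of this PR):

#include <string>

// Strip an optional "file://" prefix so that "file://somedir/somefile.gguf"
// and "somedir/somefile.gguf" resolve to the same local path. Other schemes
// (https://, hf://, ollama://) would be handed off to a puller instead.
static std::string resolve_model_path(const std::string & arg) {
    const std::string prefix = "file://";
    if (arg.compare(0, prefix.size(), prefix) == 0) {
        return arg.substr(prefix.size());
    }
    return arg;
}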

@ericcurtin (Contributor, Author)

This drives the compiler crazy FWIW:

diff --git a/include/llama.h b/include/llama.h
index 5e742642..c3285da3 100644
--- a/include/llama.h
+++ b/include/llama.h
@@ -537,6 +537,13 @@ extern "C" {
                          int32_t   il_start,
                          int32_t   il_end);

+#ifdef __cplusplus
+    // Smart pointers
+    typedef std::unique_ptr<llama_model, decltype(&llama_free_model)> llama_model_ptr;
+    typedef std::unique_ptr<llama_context, decltype(&llama_free)> llama_context_ptr;
+    typedef std::unique_ptr<llama_sampler, decltype(&llama_sampler_free)> llama_sampler_ptr;
+#endif
+
     //
     // KV cache
     //

seems like it only wants to build C code in that file or something.
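
One way around that (a sketch following the ggml-cpp.h approach mentioned earlier, not something merged in this PR) is to keep llama.h as plain C and put the C++ typedefs in a separate C++-only header, so they never end up inside the extern "C" block:

// llama-cpp.h (hypothetical): C++-only companion header to llama.h.
#pragma once

#ifndef __cplusplus
#error "This header is for C++ only"
#endif

#include <memory>

#include "llama.h"

// A struct deleter per handle type, as discussed above.
struct llama_sampler_deleter {
    void operator()(llama_sampler * smpl) const { llama_sampler_free(smpl); }
};

typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;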
