Introduce llama-run #10291
base: master
Conversation
Force-pushed from b2a336e to 17f086b
Some of these builds seem hardcoded to C++11, while we use a feature from C++14. Any reason we aren't using, say, C++17? Any reasonable platform should be up to date with C++17, I think.
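For context, the C++14 feature in question is not named in the thread; std::make_unique is one common example of something C++14 adds that C++11 lacks, so a hypothetical illustration of the kind of mismatch being discussed might look like this:

```cpp
#include <memory>
#include <string>

int main() {
    // C++14: std::make_unique is available.
    auto a = std::make_unique<std::string>("hello");

    // C++11 fallback: construct the unique_ptr directly.
    std::unique_ptr<std::string> b(new std::string("hello"));

    return a->size() == b->size() ? 0 : 1;
}
```

The first line compiles with -std=c++14 or later but fails under a build hardcoded to -std=c++11.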
Force-pushed from bf26504 to 0d016a4
Converted to C++11 only.
Force-pushed from 0af3f55 to 33eb456
Force-pushed from d2711eb to ca45737
@slaren @ggerganov PTAL, I'm hoping to add other features to this example, such as reading the prompt from a '-p' arg and reading the prompt from stdin.
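A minimal sketch of what a '-p' argument with a stdin fallback could look like; the helper name and control flow here are assumptions for illustration, not code from this PR:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: take the prompt from a "-p <prompt>" argument if given,
// otherwise fall back to reading everything available on stdin.
static std::string get_prompt(int argc, const char ** argv) {
    for (int i = 1; i < argc - 1; ++i) {
        if (std::string(argv[i]) == "-p") {
            return argv[i + 1];
        }
    }
    std::ostringstream ss;
    ss << std::cin.rdbuf();
    return ss.str();
}

int main(int argc, const char ** argv) {
    const std::string prompt = get_prompt(argc, argv);
    std::cout << "prompt: " << prompt << "\n";
    return 0;
}
```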
It would be good to have a more elaborate chat example, but the goal of this example is to show in the simplest way possible how to use the llama.cpp API. I don't think these changes achieve that; I think users will have a harder time understanding the llama.cpp API with all the extra boilerplate being added here. If you want to use this as the base of a new example that adds more features, that would be great, but I think we should keep this example as simple as possible.
Force-pushed from ca45737 to 0fbc0ae
Cool, sounds good @slaren. Mind if I call the new example ramalama-core?
Force-pushed from 0fbc0ae to 83988df
Force-pushed from f547f18 to 6e87e58
We can do things like: […] with this example.
@ericcurtin Do you plan to develop this into a more advanced example/application in the future? With the existing functionality, I am not sure it has enough value to justify adding it to the project and maintaining it.
@ggerganov I plan to continue adding more features.
Btw, did you mean as part of this PR or in general?
No need for it to be in this PR, just in general. If the example will continue to be developed, then it would make sense to merge it now.
Any name based on the functionality of the example is fine; "ramalama-core" doesn't mean anything to me, or to most users I would expect. The smart pointers can be added in this PR or a separate one, it doesn't matter to me. I mentioned it because, since you are already using them, it may be worth spending the time to make proper ones using structs instead of function pointers (which are very annoying to use, since they have to be initialized with the pointer every time, and are probably also less efficient, since an additional pointer is stored in the unique_ptr).
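For illustration, the difference being described is roughly the following; the llama.cpp function names (llama_free_model, llama_free, llama_load_model_from_file) are taken from the C API as it existed around this PR and should be treated as assumptions:

```cpp
#include <memory>

#include "llama.h"

// Function-pointer deleter: the free function must be passed at every
// construction and is stored inside the unique_ptr next to the managed object.
using model_fn_ptr = std::unique_ptr<llama_model, decltype(&llama_free_model)>;
// model_fn_ptr model(llama_load_model_from_file(path, params), &llama_free_model);

// Struct deleter: stateless and default-constructible, so nothing extra is
// stored and the deleter never has to be spelled out at the call site.
struct llama_model_deleter {
    void operator()(llama_model * model) const { llama_free_model(model); }
};
using model_ptr = std::unique_ptr<llama_model, llama_model_deleter>;
// model_ptr model(llama_load_model_from_file(path, params));

struct llama_context_deleter {
    void operator()(llama_context * ctx) const { llama_free(ctx); }
};
using context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;
```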
llama-inferencer? I really don't care too much about the name, I just want to agree on one to unblock this PR. @rhatdan @slp, any suggestions on names? I plan on using this as the main program during "ramalama run", but I'm happy for anyone to use it or make changes to it to suit their needs. It's a drastically simplified version of llama-cli with one or two additional features: reading from stdin and reading from the -p flag. It also seems more stable and less error-prone than llama-cli, and the verbose output is cleaned up to only print errors. It was based on llama-simple-chat initially.
Maybe something like […]
SGTM |
It's also tempting to call this something like "run" and use a RamaLama/LocalAI/Ollama-style CLI interface to interact with models, kind of like a daemonless Ollama: […]
We could even add https://, http://, ollama://, and hf:// as valid syntaxes to pull models, since they are all just an HTTP pull in the end.
That might be implemented as a llama-pull tool that llama-run can fork/exec (or they could share a common library).
Force-pushed from 6e87e58 to 1d7f97c
It's like simple-chat but it uses smart pointers to avoid manual memory cleanups. Fewer memory leaks in the code now. Avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
Force-pushed from 1d7f97c to 4defcf2
I will be afk for 3 weeks, so expect inactivity in this PR. I did the rename in case we want to merge as is and not let this go stale. The syntax will change to: llama-run [file://]somedir/somefile.gguf [prompt] [flags]. The file:// prefix will be optional, but it sets up the possibility of adding the pullers discussed above.
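A rough sketch of how an optional scheme prefix could be split off the model argument, leaving room for the pullers discussed above; this is an illustration under those assumptions, not the parsing code in the PR:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split "scheme://rest" into the scheme and the remainder.
// Anything without a scheme is treated as a plain local path, which is what
// makes the "file://" prefix optional.
static std::pair<std::string, std::string> split_scheme(const std::string & arg) {
    const std::string sep = "://";
    const size_t pos = arg.find(sep);
    if (pos == std::string::npos) {
        return {"file", arg};
    }
    return {arg.substr(0, pos), arg.substr(pos + sep.size())};
}

// split_scheme("file://somedir/somefile.gguf") -> {"file", "somedir/somefile.gguf"}
// split_scheme("hf://org/repo")                -> {"hf", "org/repo"}
// split_scheme("somedir/somefile.gguf")        -> {"file", "somedir/somefile.gguf"}
```

A non-"file" scheme could then be handed off to a puller (the llama-pull idea above) before loading the resulting local file.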
This drives the compiler crazy, FWIW: […] It seems like it only wants to build C code in that file or something.