-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database Access RFC #161
base: main
Are you sure you want to change the base?
Database Access RFC #161
Conversation
```md | ||
# schema {name} for {dbname} | ||
column 1, column 2, column 3 | ||
example 1, example 2, example 3 | ||
example 4, example 5, example 6 | ||
## index [{name} columns] {column name 1, column name 2} | ||
## join {column name} = {schema name}.{column name} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this an usage example? I think I do not fully follow the ## index
and ## join
syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's sort of an unholy mixture of SQL and Markdown. Declare an index, optionally name it, if you name it include the columns
keyword and then put the comma separated column names after that.
Similarly for declaring a join column. It's not as flexible as real SQL and this is a negative of this approach because any feature in any database we support may be requested by a user to be added and further bulk up the complexity of this syntax.
Just alternatives right now as we dig into the possibilities here. The solution ideally should: | ||
|
||
1. Declare what database will be used by the application, whether a local sqlite file, a remote Postgres/MySQL/etc server, or perhaps something more esoteric like MongoDB, Redis, etc. | ||
2. Declare the connection is read-only vs read-write, so it *never* tries to mutate a read-only database connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is a complexity we do not need to address now? we might rely on the db user and roles management and the credentials given to connect with marsha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I agree that it would be a later feature that may not be implemented for a while, but I don't think relying on credential permissions is the right approach here: how often were you given a separate set of read-only credentials for a database in a company when you were allowed write access to said database?
Basically there can be times when you want to reduce the "blast radius" but had to do it by being careful instead of relying on the DBA to be responsive :) This is basically telling Marsha to be careful. ;)
|
||
#### Embedded syntax | ||
|
||
Databases would still be declared top-level, and the `# schema` type would also exist, but queries would be in a new `## query` subsection for a function, instead. This would have to be after the description section but before the function examples, so the query would be defined *after* it's "use" in the description, which might be a bit weird, but if defined before the description we would need a `## description` sub-heading to make it clear again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, not sure if adding complexity to the function definitions makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a path to consider, though, since it reduces the number of top-level things to learn.
Co-authored-by: Alejandro Guillen <[email protected]>
|
||
But for something like the `# references` block to make any sense, it also needs to *modify* the behavior of the `# func` block to update it with the documents the function description references, if any. This implies that it may be a better idea for the extensions to decide to either replace or extend the existing behavior of `# something` blocks they have referenced. The current behavior that parses the `# func` blocks into more verbose markdown would then have that extended markdown intercepted by the references extension and modified with the references used by the function. | ||
|
||
Similarly, the current type-insertion behavior for `# func` blocks that specify an explicit type could be turned into an extension interception behavior in that way, simplifying and segregating those different concerns, likely with near-zero latency impact (the function could not be parsed by the LLM until the dependent type is turned into a Python `class`, but that is true today. It would just insert an extra function call in the Markdown parse and re-generation, afaict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, not sure if I follow this, but the classes and the functions are generated in the same call, there's no previous call to generate classes and then another one for functions, is all in one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was reading through it more closely this evening and noticed that, however that doesn't mean it can't be broken into separate paths to do just a text transform, then merged together and then fed to the LLM as separate event handlers so we can add new # block
s independently.
|
||
LLM abilities probably haven't plateaued yet, and I see value in both the Fully-separated syntax and the References syntax -- which is desired may depend on personal preference more than anything, but needing to choose one over the other may limit the future of Marsha in undesirable ways. At least while we're still figuring out what to leave to the LLM, what to guide the LLM with, and what to explicitly require the user to provide, it could be useful to have users specify which manipulations of the Markdown syntax they even want to have with special interpreter directives defined at the beginning of the file, kinda like [Raku](https://www.raku.org/) or [Racket](https://racket-lang.org/). | ||
|
||
We could define a `# using {extension 1} [, {extension 2} ...]` header that *must* be the first if present, specifying what parsing extensions are to be used on the text after it. We could even potentially put *all* functionality behind different extensions to give us the flexibility to make breaking changes to things like functions without breaking any code still using the older function behavior, but that could be a wonky barrier to entry for people who are not programmers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, when I started reading it my thought was that las sentence, but thinking about it, while we get a "final" solution users that are not programmers either are not going to use these "advance modes" so often or they have not discovered marsha yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're actively developing on it during this alpha state I think that's fine, but I do think a default "base" extension that's automatically used and extensions long-term are just development artifacts and/or for esoteric purposes in the future. I think I mention something like that in the following paragraph.
```md | ||
# func get_users(): list of user objects | ||
|
||
This function gets all user records from [the database][1] and returns them, or fails if unable to connect to the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a bit tricky what substitution would need to be done here depending on each case for the final prompt for the LLM to know what to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so I was thinking it would lightly transform the [the database][1]
into [the database](#the-database)
where #the-database
is whatever the title is of the reference after it is turned into a markdown document itself following this convention
The exact transform would depend on the extension and how it deals with the URL (https
should be queried and maybe it drops part of the <body>
into the document if it's an HTML document, while a file path that ends in .sqlite
reads that file with the sqlite3 standard library and provides a summary of the schema).
No description provided.