Database Access RFC #161

dfellis · 2023-08-03T02:45:09Z

No description provided.

aguillenv · 2023-08-03T10:01:30Z

rfcs/002 - Database Access.md

+```md
+# schema {name} for {dbname}
+column 1, column 2, column 3
+example 1, example 2, example 3
+example 4, example 5, example 6
+## index [{name} columns] {column name 1, column name 2}
+## join {column name} = {schema name}.{column name}
+```


Is this a usage example? I don't think I fully follow the `## index` and `## join` syntax.

So it's sort of an unholy mixture of SQL and Markdown. Declare an index and optionally name it; if you name it, include the `columns` keyword, then put the comma-separated column names after that.

Similarly for declaring a join column. It's not as flexible as real SQL, which is a negative of this approach: users may request any feature of any database we support, and each addition would further bulk up the complexity of this syntax.
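To make the two sub-headings concrete, here is a minimal sketch of how they might be parsed. The RFC does not define a formal grammar, so the regular expressions, the `parse_schema_directive` name, and the returned dictionary shape are all assumptions for illustration, not Marsha's implementation.

```python
import re

# Assumed patterns for the quoted `## index` and `## join` lines.
# An index name is optional; when present it is followed by the `columns` keyword.
INDEX_RE = re.compile(r"^## index\s+(?:(?P<name>\w+)\s+columns\s+)?(?P<cols>.+)$")
JOIN_RE = re.compile(r"^## join\s+(?P<col>\w+)\s*=\s*(?P<schema>\w+)\.(?P<ref>\w+)$")


def parse_schema_directive(line: str) -> dict:
    """Parse one `## index` or `## join` line into a plain dict (hypothetical shape)."""
    line = line.strip()
    match = INDEX_RE.match(line)
    if match:
        columns = [c.strip() for c in match.group("cols").split(",")]
        return {"kind": "index", "name": match.group("name"), "columns": columns}
    match = JOIN_RE.match(line)
    if match:
        return {
            "kind": "join",
            "column": match.group("col"),
            "schema": match.group("schema"),
            "ref_column": match.group("ref"),
        }
    raise ValueError(f"not an index or join directive: {line!r}")
```

For example, `parse_schema_directive("## join user_id = users.id")` yields a join on `user_id` pointing at the `id` column of the `users` schema. Every database feature added later would need another pattern like these, which is the complexity concern raised above.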

aguillenv · 2023-08-03T10:04:34Z

rfcs/002 - Database Access.md

+Just alternatives right now as we dig into the possibilities here. The solution ideally should:
+
+1. Declare what database will be used by the application, whether a local sqlite file, a remote Postgres/MySQL/etc server, or perhaps something more esoteric like MongoDB, Redis, etc.
+2. Declare the connection is read-only vs read-write, so it *never* tries to mutate a read-only database connection.


Maybe this is complexity we do not need to address now? We could rely on the database's user and role management and on the credentials Marsha is given to connect.

So, I agree that it would be a later feature that may not be implemented for a while, but I don't think relying on credential permissions is the right approach here: how often were you given a separate set of read-only credentials for a database in a company when you were allowed write access to said database?

Basically there can be times when you want to reduce the "blast radius" but have to do it by being careful instead of relying on the DBA to be responsive :) This is basically telling Marsha to be careful. ;)
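For the local SQLite case, "being careful" can be backed by the connection itself. The sketch below is only an illustration of that idea using the standard `sqlite3` module's URI filenames; the `connect` and `demo` helpers and the `users` table are hypothetical, and nothing here describes what Marsha would actually generate. Other databases would need their own mechanism.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def connect(path: str, read_only: bool) -> sqlite3.Connection:
    """Open a SQLite file, using mode=ro when the declaration says read-only."""
    mode = "ro" if read_only else "rwc"  # rwc: read-write, create if missing
    uri = Path(path).resolve().as_uri() + f"?mode={mode}"
    return sqlite3.connect(uri, uri=True)


def demo() -> str:
    """Create a database read-write, then try to write through a read-only handle."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.sqlite")

        rw = connect(path, read_only=False)
        rw.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        rw.commit()
        rw.close()

        ro = connect(path, read_only=True)
        try:
            ro.execute("INSERT INTO users (name) VALUES ('alice')")
            return "write succeeded"
        except sqlite3.OperationalError:
            # SQLite refuses the write regardless of what the calling code intended.
            return "write rejected"
        finally:
            ro.close()
```

The point of the design is that the same credentials (here, the same file) can be used with a smaller blast radius, without waiting for a separate read-only account.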

aguillenv · 2023-08-03T10:05:24Z

rfcs/002 - Database Access.md

+
+#### Embedded syntax
+
+Databases would still be declared top-level, and the `# schema` type would also exist, but queries would be in a new `## query` subsection for a function, instead. This would have to be after the description section but before the function examples, so the query would be defined *after* its "use" in the description, which might be a bit weird, but if defined before the description we would need a `## description` sub-heading to make it clear again.


yes, not sure if adding complexity to the function definitions makes sense

It is a path to consider, though, since it reduces the number of top-level things to learn.

Co-authored-by: Alejandro Guillen <[email protected]>

aguillenv · 2023-08-04T10:34:01Z

rfcs/002 - Database Access.md

+
+But for something like the `# references` block to make any sense, it also needs to *modify* the behavior of the `# func` block to update it with the documents the function description references, if any. This implies that it may be a better idea for the extensions to decide to either replace or extend the existing behavior of `# something` blocks they have referenced. The current behavior that parses the `# func` blocks into more verbose markdown would then have that extended markdown intercepted by the references extension and modified with the references used by the function.
+
+Similarly, the current type-insertion behavior for `# func` blocks that specify an explicit type could be turned into an extension interception behavior in that way, simplifying and segregating those different concerns, likely with near-zero latency impact (the function could not be parsed by the LLM until the dependent type is turned into a Python `class`, but that is true today). It would just insert an extra function call in the Markdown parse and re-generation, afaict.


Hmm, not sure if I follow this, but the classes and the functions are generated in the same call. There's no previous call to generate classes and then another one for functions; it's all in one.

Yeah, I was reading through it more closely this evening and noticed that. However, that doesn't mean it can't be broken into separate paths, written as separate event handlers that each do just a text transform, whose output is then merged together and fed to the LLM, so we can add new `#` blocks independently.
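A rough sketch of that handler idea, assuming each `#` block is transformed as plain text before a single merged prompt goes to the LLM. The registry, the `on_block` decorator, and both example handlers are hypothetical and only show the shape; they are not Marsha's parser.

```python
from typing import Callable, Dict, List

# A handler receives the text of one block and returns transformed Markdown.
Handler = Callable[[str], str]

# Hypothetical registry: block keyword ("func", "type", ...) -> ordered handlers.
HANDLERS: Dict[str, List[Handler]] = {}


def on_block(keyword: str) -> Callable[[Handler], Handler]:
    """Register a text-transform handler for blocks starting with `# keyword`."""
    def register(fn: Handler) -> Handler:
        HANDLERS.setdefault(keyword, []).append(fn)
        return fn
    return register


def split_blocks(source: str) -> List[str]:
    """Split on top-level `# ` headings (naive: ignores fenced code blocks)."""
    blocks: List[str] = []
    current: List[str] = []
    for line in source.splitlines():
        if line.startswith("# ") and current:
            blocks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current).strip())
    return [b for b in blocks if b]


def block_keyword(block: str) -> str:
    """Return the word after `# `, or an empty string for untyped text."""
    if not block.startswith("# "):
        return ""
    parts = block[2:].split(None, 1)
    return parts[0] if parts else ""


def build_prompt(source: str) -> str:
    """Run every block through its handlers in order, then merge for one LLM call."""
    rendered = []
    for block in split_blocks(source):
        for handler in HANDLERS.get(block_keyword(block), []):
            block = handler(block)
        rendered.append(block)
    return "\n\n".join(rendered)


@on_block("func")
def expand_func(block: str) -> str:
    # Stand-in for the existing expansion into more verbose Markdown.
    return block + "\n\n<!-- expanded by base handler -->"


@on_block("func")
def attach_references(block: str) -> str:
    # Stand-in for a references extension intercepting the expanded block.
    return block + "\n<!-- references attached by references handler -->"
```

Classes and functions would still be generated in one LLM call; only the Markdown preparation is split up, so a new `#` block type means registering one more handler.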

aguillenv · 2023-08-04T10:36:26Z

rfcs/002 - Database Access.md

+
+LLM abilities probably haven't plateaued yet, and I see value in both the Fully-separated syntax and the References syntax -- which is desired may depend on personal preference more than anything, but needing to choose one over the other may limit the future of Marsha in undesirable ways. At least while we're still figuring out what to leave to the LLM, what to guide the LLM with, and what to explicitly require the user to provide, it could be useful to have users specify which manipulations of the Markdown syntax they even want to have with special interpreter directives defined at the beginning of the file, kinda like [Raku](https://www.raku.org/) or [Racket](https://racket-lang.org/).
+
+We could define a `# using {extension 1} [, {extension 2} ...]` header that *must* be the first if present, specifying what parsing extensions are to be used on the text after it. We could even potentially put *all* functionality behind different extensions to give us the flexibility to make breaking changes to things like functions without breaking any code still using the older function behavior, but that could be a wonky barrier to entry for people who are not programmers.


Yes, when I started reading it my thought was that last sentence, but thinking about it: while we work toward a "final" solution, users who are not programmers either are not going to use these "advanced modes" very often, or they have not discovered Marsha yet?

While we're actively developing it during this alpha state I think that's fine, but I do think there should be a default "base" extension that's automatically used, and that long-term, extensions are just development artifacts and/or for esoteric purposes. I think I mention something like that in the following paragraph.
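As a sketch of how the `# using` header and an implicit default could fit together: the `"base"` extension name, the `parse_using` function, and the choice to raise on a misplaced header are assumptions for illustration, not decided behavior.

```python
from typing import List, Tuple

# Hypothetical name for the extension applied when no header is present.
DEFAULT_EXTENSIONS = ["base"]
PREFIX = "# using "


def parse_using(source: str) -> Tuple[List[str], str]:
    """Return (extension names, remaining text) for a Marsha source string."""
    lines = source.splitlines()

    # Leading blank lines do not stop the header from counting as "first".
    first = 0
    while first < len(lines) and not lines[first].strip():
        first += 1

    has_header = first < len(lines) and lines[first].startswith(PREFIX)
    body = lines[first + 1:] if has_header else lines

    # The header must be the first block if present, so reject it anywhere else.
    if any(line.startswith(PREFIX) for line in body):
        raise ValueError("`# using` must be the first block in the file")

    if not has_header:
        return list(DEFAULT_EXTENSIONS), source

    names = [n.strip() for n in lines[first][len(PREFIX):].split(",") if n.strip()]
    return names, "\n".join(body)
```

With this shape, a file without any header behaves as it does today, so non-programmers never need to see the directive.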

aguillenv · 2023-08-04T10:38:09Z

rfcs/002 - Database Access.md

+```md
+# func get_users(): list of user objects
+
+This function gets all user records from [the database][1] and returns them, or fails if unable to connect to the database.


It seems a bit tricky to work out what substitution would need to be done here in each case so that the final prompt tells the LLM what to do.

Oh, so I was thinking it would lightly transform the `[the database][1]` into `[the database](#the-database)`, where `#the-database` is whatever the title is of the reference after it is turned into a markdown document itself following this convention.

The exact transform would depend on the extension and how it deals with the URL (`https` should be queried and maybe it drops part of the `<body>` into the document if it's an HTML document, while a file path that ends in `.sqlite` reads that file with the `sqlite3` standard library and provides a summary of the schema).
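The light link rewrite could look roughly like this. It is a sketch under assumptions: the slug rule (lowercase, punctuation dropped, whitespace turned into hyphens) and the `titles` mapping from reference label to generated document title are invented here, and fetching or summarizing the referenced resource is left out entirely.

```python
import re

# Matches reference-style links such as `[the database][1]`.
REF_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def slugify(title: str) -> str:
    """Turn a heading title into an anchor slug (assumed rule, not a standard)."""
    slug = re.sub(r"[^\w\s-]", "", title.lower()).strip()
    return re.sub(r"\s+", "-", slug)


def inline_references(description: str, titles: dict) -> str:
    """Rewrite `[text][label]` into `[text](#slug)` using each reference's title."""
    def replace(match: re.Match) -> str:
        text, label = match.group(1), match.group(2)
        title = titles.get(label)
        if title is None:
            # Unknown label: leave the link untouched rather than guess.
            return match.group(0)
        return f"[{text}](#{slugify(title)})"

    return REF_LINK.sub(replace, description)
```

Here `titles` would be filled in by whichever extension handled the reference, for example `{"1": "The Database"}` once the `.sqlite` file or web page has been turned into its own Markdown section.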

Database Access RFC

4268bf1

dfellis self-assigned this Aug 3, 2023

dfellis requested review from depombo and aguillenv August 3, 2023 02:45

aguillenv reviewed Aug 3, 2023

dfellis and others added 2 commits August 3, 2023 09:35

Update rfcs/002 - Database Access.md

5852869

Co-authored-by: Alejandro Guillen <[email protected]>

This RFC is going in a very different direction as I think over things

64ffb46

aguillenv reviewed Aug 4, 2023
