support migrations between different file ID managers #49
Could this lead to thrashing? For example, say I have two classes of FIDs (e.g., LFID and AFID) configured on my system, neither of which bothers to configure the db path (plausible if the LFID created the Files table, since its schema is a superset of AFID's). Won't the AFID then trigger a migration when it runs, and vice versa when the LFID starts?
I really think this kind of functionality should be in the user's face and performed by a CLI tool that also does `import` (from a source), `export` (to a destination), and `migration` using both. With a CLI tool, you can also support any database type or storage type and no need for a `Manifest` table. In addition, you could support migration between databases of the same type (e.g., SQLite).
If you're too strict about class names in the manifest, you could also run into issues where subclasses are treated as distinct classes, despite the fact that their schemas (and base functionality) are identical.
Yes, the migration is being done automatically at the present moment.
Wait, but what would `import` and `export` do? These CLI options don't make sense to me. Importing requires somewhere to export to, and vice versa.
I do agree that migrations should be made explicit and only done via a CLI tool. This allows us to convey the backup database path to the end user more easily. However, I'm wary of the old behavior this would lead to:
Do we know of a way to stop the server from starting if the server extension fails? That way, we can force the user to either migrate their database or switch back to their old FIDM.
Right, but without a manifest table, how would we force the server extension to fail at extension load time rather than at run time? We need some reflection in the database itself to tell users to migrate at load time rather than giving them some unreadable SQLite error at run time.
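To make the load-time check concrete, here is a minimal sketch of what a manifest lookup could look like. It is not the PR's implementation: the one-column `Manifest` schema, the `check_manifest` helper, and the `MigrationRequiredError` exception are all hypothetical names chosen for illustration, and the manager is identified by a plain string.

```python
import sqlite3


class MigrationRequiredError(RuntimeError):
    """Hypothetical error: the database was last claimed by a different manager."""


def check_manifest(con: sqlite3.Connection, manager_name: str) -> None:
    # Assumed schema: a single-row Manifest table naming the manager that owns the DB.
    con.execute("CREATE TABLE IF NOT EXISTS Manifest (manager TEXT NOT NULL)")
    row = con.execute("SELECT manager FROM Manifest").fetchone()
    if row is None:
        # Fresh database: record the current manager as the owner.
        con.execute("INSERT INTO Manifest (manager) VALUES (?)", (manager_name,))
        con.commit()
        return
    if row[0] != manager_name:
        # Fail early with a readable message instead of a schema error at run time.
        raise MigrationRequiredError(
            f"File ID database was created by {row[0]!r} but {manager_name!r} "
            "is configured; migrate the database or switch back."
        )


con = sqlite3.connect(":memory:")
check_manifest(con, "AFIDM")  # first call claims the database
check_manifest(con, "AFIDM")  # same manager: passes silently
```

Calling `check_manifest(con, "LFIDM")` on the same connection would raise `MigrationRequiredError`, which is the point at which an extension could refuse to load.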
Correct. Import and Export need sources and destinations and it would be up to the CLI tool to define the types of sources (FIDMs, csv files, jsons, etc.). Migration essentially reverses those, where the source is exported from and the destination is imported to and would minimally support FIDM "source types". I.e., the initial implementation of the tool would take FIDM classes from which instances are instantiated and used. Support for other types could be added as necessary.
Regarding the failure scenario bullets...
Why does the FileID service fail to load when switching to the LFIDM - is it because the schema from the AFIDM is not sufficient and they're both configured to use the same database file (which strikes me as an administration bug)?
Won't every call to the FID service then fail and wouldn't that only go unnoticed if the calling application ignored exceptions? I hope callers are robust enough to recognize exceptions, but I might be missing something here.
Sorry if I'm not understanding you fully, but it almost sounds like you want two additional CLI options in addition to `jupyter server fileid migrate`:
`jupyter server fileid import`
`jupyter server fileid export`
My question is what would these options do differently than `migrate`? It seems importing a file ID database and exporting to a different one is the exact same as migrate.
Yeah, so it'll fail at runtime because the database schemas are different. While this is partially an administration bug, there is a legitimate use case for migrations regardless. Right now, if you want to switch FIDM implementations, you're forced to use a separate DB file, meaning you just lost all the file IDs you previously indexed. Thus you also broke all extensions that depend on file ID.
Right, but there remains the issue of how to communicate to the user that "because you switched file ID managers, the best way to fix these errors is to migrate your database". I would prefer the server to just not start if the file ID database hasn't been migrated rather than have the user try and debug this on their own. That way, we force the user to take one of three paths:
I don't want people to have to scratch their heads and figure out why file ID isn't working after switching implementations.
I'm just saying that since migration is the "pipe between export and import" a CLI tool could offer different sources. What we've been primarily talking about are sources of FIDM instances (since they front the database), but I'm saying that you could also support different output destinations where the CLI can store exported data, and different input sources where the CLI tool can get data for importing.
So I could perform an equivalent migration by exporting from FIDM1 to a file `exported.json` and importing from `exported.json` to FIDM2. That is, the CLI tool manages the "pipe" since it is the caller of `FIDM1.export_rows()` and `FIDM2.import_rows()`. This can then provide users the ability to have more choices for how they want to migrate (or backup) their FID databases.
One application for this may be that the user needs to perform some transformations on the exported rows (e.g., `exported.json`) prior to their import as part of their "migration plan".
We don't necessarily need to expose other sources and split `migrate` into `export` and `import` right away but should have that kind of thing in mind during implementation so that "the pipe" is extensible such that a user could manually intervene between "export" and "import" as part of their migration.
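A rough sketch of that pipe, to show where a user-supplied transformation would sit. Only the method names `export_rows()` and `import_rows()` come from the comment above; their signatures, the dict-shaped rows, the `migrate` helper, and the in-memory `ListManager` stand-in are assumptions made for illustration, not the FIDM API.

```python
from typing import Callable, Dict, Iterable, List, Optional, Protocol

Row = Dict[str, str]  # assumed row shape: column name -> value


class Exporter(Protocol):
    def export_rows(self) -> Iterable[Row]: ...


class Importer(Protocol):
    def import_rows(self, rows: Iterable[Row]) -> None: ...


def migrate(
    source: Exporter,
    dest: Importer,
    transform: Optional[Callable[[Row], Row]] = None,
) -> int:
    """Pipe rows from source to dest, optionally transforming each one."""
    rows = list(source.export_rows())
    if transform is not None:
        rows = [transform(row) for row in rows]
    dest.import_rows(rows)
    return len(rows)


class ListManager:
    """Hypothetical in-memory stand-in for a file ID manager."""

    def __init__(self, rows: Optional[List[Row]] = None) -> None:
        self.rows: List[Row] = list(rows or [])

    def export_rows(self) -> Iterable[Row]:
        return list(self.rows)

    def import_rows(self, rows: Iterable[Row]) -> None:
        self.rows.extend(rows)


fidm1 = ListManager([{"id": "a1", "path": "/old/notes.ipynb"}])
fidm2 = ListManager()
# Example transformation: rewrite a path prefix as part of the migration plan.
moved = migrate(
    fidm1,
    fidm2,
    transform=lambda r: {**r, "path": r["path"].replace("/old/", "/new/")},
)
```

Because the caller owns the pipe, swapping `fidm2` for a file-backed importer (or `fidm1` for a file-backed exporter) would not require changes to `migrate` itself.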
Thinking a bit more on this, the first non-FIDM source type to support would probably want to be `csv`, since you want the ability to read rows from a file and JSON doesn't really lend itself to that format (that I'm aware of). `.csv` would also be immediately usable in spreadsheet applications in which the transformations could be performed, etc.
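For reference, a row-oriented CSV round trip needs very little with the standard library. This is only a sketch: the `id` and `path` column names and the two helper functions are hypothetical, and note that `csv.DictReader` returns every value as a string, so any typed columns would need converting on import.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO


def rows_to_csv(rows: Iterable[Dict[str, str]], fieldnames: List[str], out: TextIO) -> None:
    # Write a header line followed by one line per exported row.
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def rows_from_csv(src: TextIO) -> Iterator[Dict[str, str]]:
    # Yield one dict per line, keyed by the header; all values are strings.
    yield from csv.DictReader(src)


exported = [
    {"id": "a1", "path": "/notes.ipynb"},
    {"id": "b2", "path": "/data/run.csv"},
]
buf = io.StringIO()
rows_to_csv(exported, ["id", "path"], buf)
buf.seek(0)
imported = list(rows_from_csv(buf))
```

With a real file instead of `io.StringIO`, the `csv` documentation recommends opening it with `newline=""`.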
I don't think we need to support import/export to some secondary file format in this PR. Let's limit the scope of this PR to just migrating between SQLite DBs. However, these are great ideas that we should definitely think about implementing later down the road.