refactor metadata generation and add huey for async task processing #790
Open: sgfost wants to merge 9 commits into main from refactor/metadata-generation
sgfost changed the title from "Refactor/metadata generation" to "refactor metadata generation and add huey for async task processing" (Jan 9, 2025)
https://huey.readthedocs.io/
The huey consumer runs as a runit daemon in the server service. The default dev-mode behavior (immediate mode, where tasks run synchronously) is currently disabled by setting immediate=False for testing purposes.
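For reference, a Django settings fragment that forces queued (non-immediate) execution might look roughly like the following. This is only a sketch: the keys follow huey's documented Django settings, but the queue name, Redis host, and worker count are assumptions and are not taken from this PR.

```python
# Settings sketch with hypothetical values (not copied from this PR).
HUEY = {
    "huey_class": "huey.RedisHuey",  # Redis-backed task queue
    "name": "example-tasks",  # assumed queue name
    "immediate": False,  # always enqueue tasks, even when DEBUG is on
    "connection": {"host": "redis", "port": 6379},  # assumed Redis host
    "consumer": {"workers": 2, "worker_type": "thread"},  # assumed sizing
}
```

With `immediate` left at its DEBUG-dependent default, tasks would run inline in dev and the consumer daemon would never be exercised.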
adding these manually was an easily forgotten step that wouldn't be noticed in dev but would fail to build in prod
codemeta_snapshot will be used to keep a codemeta data structure updated along with changes to metadata, which makes it easier to watch for changes and speeds up access.
License text is created from a template for each release and included in the fs package as a LICENSE file.
ref comses/planning#234
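The license-from-template step could be sketched as below, using only the standard library. The template text, the `render_license` helper, and its parameters are all hypothetical; the actual templates and rendering code are not shown in this PR.

```python
from string import Template

# Hypothetical template text; the real license templates are not shown here.
LICENSE_TEMPLATE = Template(
    "Copyright (c) $year $holders\n\n"
    "Released under the $license_name license.\n"
)


def render_license(year: int, holders: list[str], license_name: str) -> str:
    """Fill in the template for a single release."""
    return LICENSE_TEMPLATE.substitute(
        year=year, holders=", ".join(holders), license_name=license_name
    )


text = render_license(2025, ["A. Author", "B. Author"], "MIT")
```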
and replace redis caching of codebase all_contributor lists with querysets (the caching did save a few queries but doesn't seem to have any meaningful performance impact).
Codebases were considering any citable release contributor as an author, while releases considered anyone with a role of "author" to be an author. Now we use a union of the two. Not sure if this is the best way, but regardless, it's easier to change since it all stems from authors() and nonauthors() on the ReleaseContributorQuerySet.
Codebase and release both now have 'nonauthor contributor' accessors, which is useful because this is what things like codemeta/datacite/etc. consider 'contributors'.
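The union rule described above can be illustrated in plain Python. This is a sketch, not the queryset code from this PR: the `ReleaseContributor` dataclass and its `roles` and `include_in_citation` fields are hypothetical stand-ins for the Django model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseContributor:
    # Hypothetical stand-in for the Django model; field names are assumptions.
    name: str
    roles: frozenset[str]
    include_in_citation: bool


def is_author(rc: ReleaseContributor) -> bool:
    # Union of the two older rules: citable OR holds the "author" role.
    return rc.include_in_citation or "author" in rc.roles


def authors(contributors):
    return [rc for rc in contributors if is_author(rc)]


def nonauthors(contributors):
    return [rc for rc in contributors if not is_author(rc)]


contributors = [
    ReleaseContributor("Ada", frozenset({"author"}), False),
    ReleaseContributor("Ben", frozenset({"tester"}), True),
    ReleaseContributor("Cy", frozenset({"tester"}), False),
]
```

Because both accessors are defined through one predicate, changing the authorship rule later only requires changing `is_author`.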
move metadata conversion to a metadata module which provides converters for different formats. codemeta is used as the primary format, from which the others (datacite, cff) can be derived.
The primary codemeta accessor is the codemeta_snapshot json field, which is rebuilt each time a codebase/release is saved.
* add `update_codebase_metadata` command to update the codemeta snapshot for all objects, then update packages on the fs
* add CITATION.cff file to fs package
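To illustrate the "codemeta as primary format" idea, here is a minimal sketch of deriving a CITATION.cff-style mapping from a stored codemeta dict. The `codemeta_to_cff` function and the sample snapshot are hypothetical; this is not the converter API from this PR or from codemeticulous, and it only covers a handful of fields.

```python
def codemeta_to_cff(codemeta: dict) -> dict:
    """Derive a minimal CITATION.cff-style mapping from a CodeMeta dict."""
    cff = {
        "cff-version": "1.2.0",
        "message": "If you use this software, please cite it as below.",
        "title": codemeta["name"],
        "authors": [
            {"given-names": p["givenName"], "family-names": p["familyName"]}
            for p in codemeta.get("author", [])
        ],
    }
    if "version" in codemeta:
        cff["version"] = codemeta["version"]
    return cff


# Made-up snapshot standing in for a codemeta_snapshot JSON value.
snapshot = {
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "Example Model",
    "version": "1.0.0",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Lovelace"}],
}
cff = codemeta_to_cff(snapshot)
```

Since every derived format reads from the same snapshot, a fix to the codemeta generation propagates to the other formats once the snapshot is rebuilt.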
usage of the old datacite metadata generation still needs to be replaced
* visually indicate that the release metadata form is saving, since this takes a little bit longer now
sgfost force-pushed the refactor/metadata-generation branch from 17f1895 to 9cbc597 (January 14, 2025 00:23)
and resolve some edge case bugs with metadata generation.
test_codemeta was primarily checking to make sure that codemeta was conforming to the expected schema, and this is implicit now.
We may still want some test module that uses hypothesis, but it would be even more useful to do this at a higher level, e.g. create a bunch of codebases and releases and see if anything goes wrong downstream.
additions
* `huey` for async task processing
* `library.metadata.CodeMetaConverter` which uses `codemeticulous` and comes with some updates to the transformation
* `library.metadata.CitationFileFormatConverter`
* `library.metadata.DataCiteConverter`
  * `DataCiteSchema`, but does not yet do so. Should happen with comses/planning#286; this can either happen in this PR or later
* `codemeta_snapshot` is used as the primary accessor for codemeta (and anything derived from it)
  * `Codebase` and `CodebaseRelease` that is updated when the object is saved
* `LICENSE` and `CITATION.cff` files are now included in release packages

deploying
Since codemeta is no longer re-generated each time it's accessed, it needs to be seeded with `./manage.py update_codebase_metadata`, which will build and store a codemeta representation for every codebase/release. This shouldn't ever need to be run again unless there are future changes to the codemeta generation that we want applied everywhere.