refactor metadata generation and add huey for async task processing #790

Open
wants to merge 9 commits into
base: main
@sgfost commented on Jan 9, 2025

additions

  • adds huey for async task processing
  • replaces codemeta generation with library.metadata.CodeMetaConverter, which uses codemeticulous and comes with some updates to the transformation
  • adds citation file format generation with library.metadata.CitationFileFormatConverter
  • adds library.metadata.DataCiteConverter
    • intended to replace DataCiteSchema, but does not yet do so; the replacement is tracked in comses/planning#286 and can happen either in this PR or later
  • codemeta_snapshot is used as the primary accessor for codemeta (and anything derived from it)
    • a json field on Codebase and CodebaseRelease that is updated when the object is saved
    • keeps a 'cached' version for quicker access (mostly relevant for the release detail page)
    • allows for an easy way to track when changes happen to metadata in order to trigger events
  • whenever a change in codemeta is detected for a release, the release package will be rebuilt with new metadata files
  • LICENSE and CITATION.cff files are now included in release packages
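The snapshot-and-compare idea behind codemeta_snapshot can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `Release`, `build_codemeta`, and `rebuild_queue` are hypothetical stand-ins for the Django model, the converter, and the huey task queue.

```python
from dataclasses import dataclass, field

# Stand-in for an async task queue such as huey (hypothetical).
rebuild_queue = []


def build_codemeta(release):
    """Build a codemeta-like dict from a release's metadata fields."""
    return {
        "@type": "SoftwareSourceCode",
        "name": release.title,
        "version": release.version,
        "license": release.license,
    }


@dataclass
class Release:
    title: str
    version: str
    license: str
    codemeta_snapshot: dict = field(default_factory=dict)

    def save(self):
        """Refresh the snapshot and queue a package rebuild only if it changed."""
        fresh = build_codemeta(self)
        changed = fresh != self.codemeta_snapshot
        if changed:
            self.codemeta_snapshot = fresh
            rebuild_queue.append((self.title, self.version))
        return changed


release = Release("Example Model", "1.0.0", "MIT")
release.save()                # first save: snapshot stored, rebuild queued
release.save()                # nothing changed: no rebuild
release.license = "GPL-3.0"
release.save()                # metadata changed: rebuild queued again
```

Comparing the freshly built dict against the stored one keeps reads cheap (the snapshot is just a field) and gives a single place to decide whether downstream work such as repackaging is needed.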

deploying

since codemeta is no longer re-generated each time it's accessed, it needs to be seeded with ./manage.py update_codebase_metadata, which builds and stores a codemeta representation for every codebase/release. This shouldn't need to be run again unless future changes to the codemeta generation need to be applied everywhere

@sgfost changed the title from "Refactor/metadata generation" to "refactor metadata generation and add huey for async task processing" on Jan 9, 2025
https://huey.readthedocs.io/

the huey consumer runs as a runit daemon in the server service

the default dev mode behavior of immediate=True (tasks run
synchronously) is currently disabled, i.e. immediate=False is set
explicitly, for testing purposes
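With huey's Django integration, this toggle is the `immediate` key of the `HUEY` settings dict. A minimal settings sketch, assuming a Redis-backed huey instance; the instance name and consumer values are placeholders, not the configuration used in this PR:

```python
# Django settings sketch: values are illustrative assumptions, not this PR's config
DEBUG = True

HUEY = {
    "huey_class": "huey.RedisHuey",  # assumed storage backend
    "name": "example-tasks",         # hypothetical instance name
    # huey.contrib.djhuey defaults "immediate" to settings.DEBUG, so dev mode
    # normally runs tasks synchronously; forcing False routes them through
    # the consumer process even when DEBUG is on.
    "immediate": False,
    "consumer": {"workers": 2, "worker_type": "thread"},
}
```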
adding these manually was an easily forgotten step that wouldn't be
noticed in dev but would fail to build in prod
codemeta_snapshot will be used to keep a codemeta data structure updated
along with changes to metadata, which makes it easier to watch for
changes and speeds up access

license text is created from a template for each release and included in
the fs package as LICENSE file

ref comses/planning#234
and replace redis caching of codebase all_contributor lists with
querysets (did save a few queries but doesn't seem to have any
meaningful performance impact)

codebases were considering any citable release contributor as an author
and releases considered anyone with a role of "author" to be an author.
Now we use a union of the two -- not sure if this is the best way, but
regardless, it's easier to change since it all stems from authors()
and nonauthors() on the ReleaseContributorQuerySet

codebase and release both now have 'nonauthor contributor' accessors,
which is useful because this is what things like codemeta/datacite/etc.
consider 'contributors'
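The author/nonauthor split described above can be illustrated without the ORM. This is a rough sketch, assuming a contributor carries a citable flag and a set of roles; the `ReleaseContributor` fields here are hypothetical and the real logic lives on the ReleaseContributorQuerySet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseContributor:
    name: str
    roles: frozenset  # e.g. frozenset({"author"})
    citable: bool     # hypothetical flag for "citable release contributor"


def is_author(contributor):
    """Union of the two old rules: citable, or holding the 'author' role."""
    return contributor.citable or "author" in contributor.roles


def authors(contributors):
    return [c for c in contributors if is_author(c)]


def nonauthors(contributors):
    return [c for c in contributors if not is_author(c)]


contributors = [
    ReleaseContributor("alice", frozenset({"maintainer"}), citable=True),
    ReleaseContributor("bob", frozenset({"author"}), citable=False),
    ReleaseContributor("carol", frozenset({"tester"}), citable=False),
]

author_names = [c.name for c in authors(contributors)]
nonauthor_names = [c.name for c in nonauthors(contributors)]
```

Because both accessors derive from one predicate, changing the definition of "author" later means editing a single place.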
move metadata conversion to a metadata module which provides converters
for different formats. codemeta is used as the primary format which the
others (datacite, cff) can be derived from

the primary codemeta accessor is the codemeta_snapshot json field, which
is rebuilt each time a codebase/release is saved
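To show what "derived from codemeta" can look like, here is a hand-rolled sketch that maps a codemeta-like dict to a minimal Citation File Format dict. It does not use the codemeticulous API or the converters in this PR; `codemeta_to_cff` and the sample values are hypothetical, and only a few fields are mapped.

```python
def codemeta_to_cff(codemeta):
    """Derive a minimal CFF 1.2.0 dict from a codemeta-like dict."""
    cff = {
        "cff-version": "1.2.0",
        "message": "If you use this software, please cite it using these metadata.",
        "title": codemeta["name"],
        "authors": [
            {
                "given-names": person.get("givenName", ""),
                "family-names": person.get("familyName", ""),
            }
            for person in codemeta.get("author", [])
        ],
    }
    # Optional fields are copied only when present in the source.
    if "version" in codemeta:
        cff["version"] = codemeta["version"]
    if "codeRepository" in codemeta:
        cff["repository-code"] = codemeta["codeRepository"]
    return cff


codemeta = {
    "@type": "SoftwareSourceCode",
    "name": "Example Model",
    "version": "1.0.0",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Lovelace"}],
}
cff = codemeta_to_cff(codemeta)
```

Treating codemeta as the single source keeps the other formats consistent with each other, since each one is a pure function of the same snapshot.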

* add `update_codebase_metadata` command to update the codemeta snapshot
  for all objects, then update packages on the fs
* add CITATION.cff file to fs package
usage of the old datacite metadata generation still needs to be replaced
* visually indicate that the release metadata form is saving, since this
  takes a little bit longer now
@sgfost force-pushed the refactor/metadata-generation branch from 17f1895 to 9cbc597 on January 14, 2025 00:23
and resolve some edge case bugs with metadata generation.

test_codemeta was primarily checking to make sure that codemeta was
conforming to the expected schema, and this is implicit now

we may still want some test module that uses hypothesis, but it would be
even more useful to do this at a higher level e.g. create a bunch of
codebase+releases and see if anything goes wrong downstream