Increase Findability of CDEs #106

Open
jkyu opened this issue Jun 26, 2024 · 2 comments
Assignees
Labels
XA6.4 Increase CDE and metadata specification libraries availability

Comments

jkyu commented Jun 26, 2024

Task description:

Incorporate public CDEs from the NIH and other CDE repositories into the metadata specification tool. Enhance search functionality so that CDEs are easy to identify and retrieve. Ensure the code is hosted in an accessible repository, accompanied by comprehensive documentation, and fully integrated into the tool. If time allows, enhance the Data Hub search engine to support finding CDEs.

jkyu self-assigned this Jun 26, 2024
jkyu added the XA6.4 Increase CDE and metadata specification libraries availability label Jun 26, 2024

jkyu commented Jul 16, 2024

The first item we will address is the formal specification of the CDEs used in RADx projects. We have verified that the Tier-1 CDEs used in RADx do not match any NIH-endorsed CDEs present in the NIH CDE Repository. RADx data elements are currently defined only in data dictionaries and spreadsheets, so they are difficult to find, maintain, and reuse.

To address this, we will develop a software tool that ingests the CDEs used by RADx projects and produces a CDE specification in the CEDAR JSON-LD format. We will need to ingest the contents of the global codebook for Tier-1 data elements as well as data dictionaries for Tier-2 data elements. These CDE definitions will then be stored on CEDAR, which allows us to establish a formal, version-controlled source of truth for CDEs. These CDEs can be uploaded to the NIH CDE Repository at a later date.

Ingestion of the Tier-1 data elements is straightforward. Tier-2 data elements are trickier, since they must be converted from data dictionaries. Data dictionaries that follow the RADx specification can be processed with components of the data dictionary validator and then ingested. Some data dictionaries do not follow the specification and require bespoke code.
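
As a rough illustration of the ingestion step described above, here is a minimal Python sketch that reads data dictionary rows and emits simplified JSON-LD-style records. It is not the actual tool: the column names (`Id`, `Label`, `Datatype`), the function names, and the output fields are all assumptions made for illustration, and the output is not the real CEDAR JSON-LD schema. The check for required columns stands in for the case where a dictionary does not follow the specification and would need bespoke handling.

```python
import csv
import io

# Hypothetical column names; the real RADx data dictionary columns may differ.
REQUIRED_COLUMNS = ("Id", "Label", "Datatype")


def row_to_cde(row, tier):
    """Map one data dictionary row to a simplified JSON-LD-style record.

    The keys below are placeholders, not the CEDAR JSON-LD field schema.
    """
    return {
        "@context": {"schema": "http://schema.org/"},
        "@type": "DataElement",  # placeholder type, not a CEDAR term
        "schema:identifier": row["Id"].strip(),
        "schema:name": row["Label"].strip(),
        "datatype": row["Datatype"].strip().lower(),
        "tier": tier,
    }


def ingest(csv_text, tier):
    """Convert CSV text into a list of CDE records.

    Raises ValueError when required columns are missing, which is where a
    non-conforming dictionary would be routed to bespoke code instead.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"non-conforming dictionary, missing columns: {missing}")
    return [row_to_cde(row, tier) for row in reader]
```

For example, `ingest("Id,Label,Datatype\nnih_age,Age,Integer\n", tier=1)` returns a single record whose identifier is `nih_age`; the sample row is invented for the example.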


jkyu commented Aug 7, 2024

Developed a small package to convert and ingest CDEs into CEDAR from the global codebook (code here).

This task also includes the work to ingest CDEs from the NIH CDE Repository (code here).

Working document detailing the technical work for this task is here.
