Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saved queries from earlier versions of the LeafDB #508

Open
artgoldberg opened this issue Apr 14, 2022 · 1 comment
Open

Saved queries from earlier versions of the LeafDB #508

artgoldberg opened this issue Apr 14, 2022 · 1 comment

Comments

@artgoldberg
Copy link

Dear Leaf folks

Our clinical data warehouse changes daily, so we update the LeafDB frequently. We plan to start doing it daily, hopefully soon. Sometime the update does not change the concepts available in Leaf, and sometimes it does. But to make the update code simple, I have it erase these tables and then reload them:

/* Empty indices */
DELETE FROM LeafDB.app.ConceptForwardIndex;
DELETE FROM LeafDB.app.ConceptInvertedIndex;
DELETE FROM LeafDB.app.ConceptTokenizedIndex;

/* Empty dependency maps */
DELETE FROM LeafDB.rela.QueryConceptDependency;

/* Empty Concepts */
DELETE FROM LeafDB.app.Concept;

/* Empty ConceptSqlSet */
DELETE FROM LeafDB.app.ConceptSqlSet;

/* Empty DemographicQuery */
DELETE FROM LeafDB.app.DemographicQuery;

But I'm now noticing that older Saved Queries stop working. Here's the error that occurs when I open old ones:
image

My guess is that their Concepts refer to entries in ConceptSqlSet that no longer exist or have some other reference that break when we erase and update LeafDB. Does that make sense? (I confess that I'm being lazy now and not investigating these failures in the Leaf log. It's been a long week and it's getting late.)

Perhaps it was not part of the initial design, but It seems to me that it would be a valuable and reasonable functionality to support both

  1. frequent updates to the LeafDB, and
  2. long-lived Saved Queries, subject to the constraint that old Queries that depend on concepts (little c, as in ideas) that are no longer in LeafDB cannot possibly work.

What are your thoughts about what's causing older Saved Queries to fail, and about how to design the frequent updates to the LeafDB and the Saved Queries so that features 1 and 2 are both satisfied?

Regards
Arthur

@ndobb
Copy link
Member

ndobb commented Apr 22, 2022

Hi @artgoldberg,

Sure, this makes sense. Yes, a given Saved Query contains a JSON-based definition of the dependent Concepts it needs, which are key'd by the ConceptId in LeafDB.app.Concept.Id. So deleting a Concept and repopulating the database with a new concept that looks the same but has a new, different ConceptId would indeed break an existing Saved Query.

The intended way to handle Concept updates (and used by us at UW) is to leverage the ExternalId field in LeafDB.app.Concept.ExternalId. So for example, if you have a Concept for Demographics -> Language -> French, you could populate the Concept's ExternalId with something like "demographics:language:french" (or whatever you want).

Assuming you are adding new Concepts by SQL script, you would thus change the final INSERT INTO LeafDB.app.Concept statement to include a WHERE clause like:

INSERT INTO LeafDB.app.Concept
SELECT ...
FROM <new_concepts> AS N

/* Don't add if Concept already exists */
WHERE NOT EXISTS (SELECT 1 FROM LeafDB.app.Concept AS C WHERE N.ExternalId = C.ExternalId)

Thus if you can deterministically generate consistent ExternalIds each script run, you can preserve the existing Concepts (and delete old ones if needed) while still adding new ones as your data changes.

Of course, this takes a bit more work in terms of scripting, but not too much. There are other benefits to this approach too - for example, if you ever want to check the Leaf logs to see which Concepts are being used most frequently over time, having consistent durable ConceptIds is a necessity.

Hope this helps,
-nic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants