In release_summary_with_data show compiled release not record. #5

Closed
wants to merge 1 commit


kindly
Contributor

@kindly kindly commented Mar 19, 2019

Also remove field_count downgrade as fie


@pindec pindec left a comment


Discussed: for more flexibility, we should have a source_data column and a new release_data column. The two will be the same for releases and different for records. This lets analysts write the same queries for releases and records while still having access to the source data.

@jpmckinney
Member

@pindec, would the source_data column behave the same as the current data column (in which case we can just leave the name as-is), and would the release_data column behave like the proposed change (in which case we can add it as a new column)?

@pindec

pindec commented Mar 3, 2020

Yes, including leaving the name as-is.

To clarify (and to record the rationale for the change in more detail, for future reference):
As an analyst, I want to be able to write queries on release data columns that work in the same way, regardless of whether the underlying data is a release or a record, so that I can investigate and compare datasets more easily.

Currently, the data columns in release_summary_with_data differ in structure for releases and records. For records, they contain a jsonb representation of each record, with top-level keys potentially including compiledRelease, releases and versionedRelease. For releases, they contain a jsonb representation of each release, with no top-level key enclosing the release data. So, at present, analysts have to write different queries, with different jsonb paths into the data column depending on release_type: record queries require an additional path step into compiledRelease, etc.
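To illustrate the problem in Python terms, here is a minimal sketch of the branching an analyst currently needs. The sample rows, the `release_type` value `"record"`, and the helpers `get_path` and `tender_status` are hypothetical; in SQL the same branch would be between jsonb paths such as `data->'tender'` and `data->'compiledRelease'->'tender'`.

```python
# Hypothetical sample rows shaped like release_summary_with_data; the
# release_type values and document contents are illustrative only.
RELEASE_ROW = {
    "release_type": "release",
    "data": {"ocid": "ocds-xxx-1", "tender": {"status": "active"}},
}
RECORD_ROW = {
    "release_type": "record",
    "data": {
        "ocid": "ocds-xxx-1",
        "releases": [],
        "compiledRelease": {"ocid": "ocds-xxx-1", "tender": {"status": "active"}},
    },
}


def get_path(doc, keys):
    """Follow keys through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def tender_status(row):
    """Read tender.status from the data column, branching on release_type."""
    if row["release_type"] == "record":
        # Records need an extra path step into compiledRelease.
        return get_path(row["data"], ["compiledRelease", "tender", "status"])
    # Releases hold the release fields at the top level.
    return get_path(row["data"], ["tender", "status"])


print(tender_status(RELEASE_ROW), tender_status(RECORD_ROW))  # active active
# The release-style path silently finds nothing when applied to a record:
print(get_path(RECORD_ROW["data"], ["tender", "status"]))  # None
```

The last line shows why this is error-prone: a query written for releases does not fail on records, it just returns nothing.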

However, we should preserve backwards compatibility for existing queries, so the existing data columns should remain the same, containing the source data (either full release or full record data).

The release_data column (proposed change) contains the same data structure regardless of release_type (i.e. a jsonb representation of the contents of either a release or a compiledRelease, with no top-level key enclosing the data). In future, analysts can therefore run the same jsonb path queries against the new release_data column, regardless of release_type, with the caveat that the full record in the data column may contain additional information, such as a versioned release.
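A minimal Python sketch of how such a release_data value could be derived from the data column, assuming records carry their release data under compiledRelease. The function name `to_release_data`, the sample rows and the `release_type` values are hypothetical; this is an illustration of the idea, not the implementation in this PR. In SQL, a comparable expression might be a `CASE` on release_type that selects `data->'compiledRelease'` for records and `data` otherwise.

```python
from typing import Optional

# Hypothetical sample rows; release_type values and contents are illustrative only.
ROWS = [
    {"release_type": "release",
     "data": {"ocid": "ocds-xxx-1", "tender": {"status": "active"}}},
    {"release_type": "record",
     "data": {"ocid": "ocds-xxx-2", "releases": [],
              "compiledRelease": {"ocid": "ocds-xxx-2",
                                  "tender": {"status": "complete"}}}},
]


def to_release_data(data: dict, release_type: str) -> Optional[dict]:
    """Derive a release-shaped document from the data column."""
    if release_type == "record":
        # A record's release data sits under compiledRelease; None if absent.
        return data.get("compiledRelease")
    # Individual and compiled releases are already release-shaped.
    return data


# The same key path now works regardless of release_type.
statuses = [
    to_release_data(row["data"], row["release_type"])["tender"]["status"]
    for row in ROWS
]
print(statuses)  # ['active', 'complete']
```

Note that for releases the derived value is identical to the source data, so storing it as a separate column duplicates that data.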

@jpmckinney
Member

jpmckinney commented Mar 3, 2020

If we close open-contracting/kingfisher-process#63 (adding support for records to the compile step in Kingfisher Process), then the compiled_release table will be populated when the source is records (it isn't presently). In that case, we don't want to double up by having a release_data column for the compiledRelease from the record table, since it will already appear in the data column of the corresponding row in the compiled_release table. (I also don't like the idea of doubling the disk space used by storing the same data in both data and release_data when the release_type is release or compiled_release.)

In other words, for individual/compiled releases, queries should only be written for release_type of release or compiled_release.

Right now the problem is that if the source is records then there won't be any corresponding rows in compiled_release. I'm proposing that we fix that, instead of adding a workaround.

If that makes sense, we can close this PR and prioritize the above issue.

@jpmckinney
Member

I think this is superseded by #43 and/or #110

@jpmckinney jpmckinney closed this Jul 31, 2020
@jpmckinney jpmckinney deleted the minor-bugfixs-field-counts-release-view branch July 31, 2020 20:33

3 participants