Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for releases published as part of a record (release_type = 'embedded_release') #110

Closed
duncandewhurst opened this issue Apr 2, 2020 · 13 comments
Assignees
Labels
sql Relating to the SQL files in the sql directory

Comments

@duncandewhurst
Copy link
Contributor

My understanding is that, for record collections, currently kingfisher-views only summarises the compiledRelease and doesn't do anything with the releases array.

This means that for publishers that only publish records, we can't do release-level analysis using kingfisher-views.

How much work would it be to also summarise the individual releases provided as part of a record?

We'd need some way of distinguishing these from the compiled release summaries, which I guess could be done by adding a new release_type.

For context, this came up when reviewing Paraguay's draft data, which was only shared as records.

@robredpath
Copy link
Contributor

@kindly
Copy link
Contributor

kindly commented Apr 29, 2020

I do not see that as a problem. Only we do not want to do it for records with linked releases, so as long as we find a way of detecting them we can another release_type for this. Take about a day to do.

@duncandewhurst
Copy link
Contributor Author

Thanks @kindly

In OpenDataServices/cove#895 we decided the best was to differentiate between linked and embedded releases was to check for the id field.

@jpmckinney
Copy link
Member

The specific comment is: OpenDataServices/cove#895 (comment)

OCDS Kit has a somewhat different test: https://ocdskit.readthedocs.io/en/latest/api/util.html#ocdskit.util.is_linked_release The main addition is that it tests the number of fields.

@Bjwebb Bjwebb self-assigned this May 7, 2020
Bjwebb added a commit that referenced this issue May 29, 2020
* Add digit 3 to the end to disambiguate from other calculated release
  ids, including others we might add in the future.
* Reduce the multiplicative factor to 1000000, because the current
  maximum number of releases in a record is 363

#110
@Bjwebb
Copy link
Contributor

Bjwebb commented Jun 4, 2020

@duncandewhurst This has been deployed to the server. Could you check this now works as expected?

@duncandewhurst
Copy link
Contributor Author

Thanks @Bjwebb

I created view_data_collection_1261 to test this. It looks how I would expect apart from a couple of things:

  • In the release_summary_with_data table, the data column for releases with type 'releases_within_record' contains the whole record. I would expect it to contain just the release which is summarised on that row.
  • In the field_counts table all paths (e.g. compiledRelease/..., versionedRelease/..., releases/... etc.) are listed twice, once with release_type set to 'record' and once with it set to 'releases_within_records'. For the latter I would expect to see just the paths under releases/.
  • We tend to refer to releases within records as embedded releases, so 'embedded_release' might be a better value to use in release_type than 'releases_within_records'.

Also, the documentation will need updating (the release_type column description in the database tables reference section and the 'how tables are related' section).

Bjwebb added a commit that referenced this issue Jul 8, 2020
The data column of the release_summary_with_data table should contain
only the release, in the case of embedded releases, not the record it is
embedded in.

#110
@Bjwebb
Copy link
Contributor

Bjwebb commented Jul 8, 2020

Thanks @duncandewhurst. Those are all useful suggestions, and look easy to change.

When I update the data column of release_summary_with_data, I automatically get a change to field_counts:
Screenshot from 2020-07-08 17-38-05
This has just the paths under releases/, but without that prefix. Is this okay? I can add the prefix back in if that's useful, it just might be some amount of faff.

@duncandewhurst
Copy link
Contributor Author

Without the prefix is better! Thanks

@Bjwebb
Copy link
Contributor

Bjwebb commented Jul 16, 2020

@duncandewhurst The latest changes are now on the kingfisher server, if you want to review.

Just noticed that the docs still need updating, so I'll look into that.

@Bjwebb
Copy link
Contributor

Bjwebb commented Jul 20, 2020

I've made some update to the docs: PR, view on Read the Docs (not merged into master/latest yet) (this IS NOW MERGED into master/latest)

@jpmckinney
Copy link
Member

Is #43 related / duplicate?

@jpmckinney jpmckinney added the sql Relating to the SQL files in the sql directory label Jul 31, 2020
@jpmckinney jpmckinney changed the title Add support for releases published as part of a record Add support for releases published as part of a record (release_type = 'embedded_release') Jul 31, 2020
@duncandewhurst
Copy link
Contributor Author

duncandewhurst commented Aug 3, 2020

As @yolile said, #43 relates to summarizing compiled releases published as part of a record, whereas this issue relates to summarizing embedded releases

@duncandewhurst
Copy link
Contributor Author

@Bjwebb the changes and docs look good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql Relating to the SQL files in the sql directory
Projects
None yet
Development

No branches or pull requests

5 participants