Best practice for generating OCD IDs #103

todrobbins · 2017-11-30T16:43:40Z

I've seen UUIDs within California Civic Data datasets (e.g. https://calaccess.californiacivicdata.org/documentation/processed-files/ballot-measures/) and wondered if there are best practices for ID generation. Thanks!

Examples:

ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29
ocd-election/7c01ac66-b870-4c02-b705-18d83fd7c233

gordonje · 2017-12-01T05:42:19Z

@todrobbins I can only really speak to how the OCD ids are implemented, if that's helpful.

OCDIDField is a custom Django field from which the id fields on Election, BallotMeasureContest and other models all inherit. There's an ocd_type kwarg for setting the prefix before the UUID.

The UUID itself is randomly generated via uuid.uuid4() from Python's standard-library uuid module.
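For illustration, the prefix-plus-UUID format can be sketched in plain Python. This is only a rough sketch, not the actual OCDIDField code: `make_ocd_id` and `OCD_ID_PATTERN` are hypothetical names introduced here, and the pattern assumes the type portion uses only lowercase letters and hyphens.

```python
import re
import uuid

# Hypothetical pattern for IDs shaped like the examples in this thread,
# e.g. "ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29".
# Assumption: the type part contains only lowercase letters and hyphens.
OCD_ID_PATTERN = re.compile(
    r"^ocd-[a-z-]+/"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def make_ocd_id(ocd_type: str) -> str:
    """Return a new ID of the form 'ocd-<type>/<random UUID>'.

    Hypothetical helper, not part of python-opencivicdata.
    """
    # uuid.uuid4() produces a random (version 4) UUID; str() gives the
    # usual lowercase, hyphenated 36-character form.
    return f"ocd-{ocd_type}/{uuid.uuid4()}"


contest_id = make_ocd_id("contest")
election_id = make_ocd_id("election")
```

Because the UUID is random, each call returns a different ID, so anything that needs a stable ID has to store it once it is generated.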

python-opencivicdata had all this set up for us before we came along and implemented the election module. The bigger challenge for us was ensuring that our daily ETL process preserves the previously generated ids without inserting duplicate records.
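One way to think about the ID-preservation problem is a lookup keyed on whatever uniquely identifies a record in the source data. The sketch below is not how our ETL actually does it; `assign_ocd_ids`, the `key_fields` natural key and the sample records are all hypothetical, and it assumes the source data has fields that stay stable from one run to the next.

```python
import uuid


def assign_ocd_ids(records, existing_ids, ocd_type, key_fields):
    """Attach an 'id' to each record, reusing IDs from earlier runs.

    records      -- list of dicts from the current ETL run
    existing_ids -- dict mapping natural-key tuples to IDs already issued
    ocd_type     -- prefix type, e.g. "contest"
    key_fields   -- names of the fields that form the natural key (assumed stable)

    Returns the updated mapping so it can be stored for the next run.
    """
    ids = dict(existing_ids)  # copy so the caller's mapping is not mutated
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in ids:
            # Only mint a new random ID for records never seen before.
            ids[key] = f"ocd-{ocd_type}/{uuid.uuid4()}"
        record["id"] = ids[key]
    return ids


# Hypothetical sample data: one record seen in a previous run, one new.
previous = {("2016-11-08", "Measure A"): "ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29"}
todays_records = [
    {"date": "2016-11-08", "name": "Measure A"},
    {"date": "2016-11-08", "name": "Measure B"},
]
updated = assign_ocd_ids(todays_records, previous, "contest", ("date", "name"))
```

Running the same records through again with `updated` as the existing mapping leaves every ID unchanged, which is the property a daily reload needs.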

If you're working on something outside the OCD ecosystem, but still in Django, you might consider just using the UUIDField.

Also, if you're storing your data in PostgreSQL, pgcrypto and uuid-ossp are both useful extensions for generating UUIDs.

Over in django-calaccess-processed-data, we're using pgcrypto's gen_random_uuid() function to create the OCD ids in bulk, for example when inserting hundreds of thousands of filings at once.

Hope that's helpful. If you're looking for more general guidance about assigning ids for data intended for public consumption, I think this is something @fgregg has been researching recently.

jpmckinney added the OCDEP 2: Division Identifiers label May 7, 2024
