Best practice for generating OCD IDs #103

Open
todrobbins opened this issue Nov 30, 2017 · 1 comment

@todrobbins

I've seen UUIDs within California Civic Data datasets (e.g. https://calaccess.californiacivicdata.org/documentation/processed-files/ballot-measures/) and wondered if there are best practices for ID generation. Thanks!

Examples:

  • ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29
  • ocd-election/7c01ac66-b870-4c02-b705-18d83fd7c233
@gordonje
Contributor

gordonje commented Dec 1, 2017

@todrobbins I can only really speak to how the OCD ids are implemented, if that's helpful.

OCDIDField is a custom Django field from which the id fields on Election, BallotMeasureContest and other models all inherit. There's an ocd_type kwarg for setting the prefix before the UUID.

The UUID itself is randomly generated via uuid.uuid4() from Python's standard-library uuid module.
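
To make the id format concrete, here is a minimal sketch that uses only the standard library. `make_ocd_id` and `OCD_ID_PATTERN` are hypothetical names for illustration and are not part of python-opencivicdata; the pattern is an assumption inferred from the example ids earlier in this thread, not an official validation rule.

```python
import re
import uuid


def make_ocd_id(ocd_type: str) -> str:
    """Build an id shaped like 'ocd-<type>/<uuid4>'.

    Hypothetical helper: it only mirrors the shape of the example ids
    in this thread and is not the OCDIDField implementation.
    """
    return f"ocd-{ocd_type}/{uuid.uuid4()}"


# Assumed pattern: a lowercase type prefix followed by a version-4 UUID.
OCD_ID_PATTERN = re.compile(
    r"^ocd-[a-z]+/"
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)

if __name__ == "__main__":
    new_id = make_ocd_id("election")
    print(new_id, bool(OCD_ID_PATTERN.match(new_id)))
```

Because uuid4 values are random, the id carries no meaning of its own; anything a consumer needs to know about the record has to live in other fields.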

python-opencivicdata had all this set up for us before we came along and implemented the election module. The bigger challenge for us was ensuring that our daily ETL process preserves the previously generated ids without inserting duplicate records.
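
One way to think about the id-preservation problem is sketched below. This is not how our ETL is actually written; it assumes each source record has some stable natural key (the `natural_key` field and the `assign_ids` function are hypothetical), and it keeps a lookup from that key to the id generated on an earlier run.

```python
import uuid


def assign_ids(records, existing_ids, ocd_type):
    """Return {natural_key: ocd_id}, reusing ids from earlier runs.

    records: iterable of dicts with a hypothetical 'natural_key' field.
    existing_ids: mapping of natural key to the id assigned previously.
    """
    ids = dict(existing_ids)  # copy so the caller's mapping is untouched
    for record in records:
        key = record["natural_key"]
        if key not in ids:
            # Only records not seen before get a freshly generated id.
            ids[key] = f"ocd-{ocd_type}/{uuid.uuid4()}"
    return ids


# First load sees one record; the next day's load sees it again plus a new one.
first_run = assign_ids([{"natural_key": "prop-1"}], {}, "contest")
second_run = assign_ids(
    [{"natural_key": "prop-1"}, {"natural_key": "prop-2"}],
    first_run,
    "contest",
)
```

With this shape, rerunning the load leaves the id for "prop-1" unchanged and adds exactly one new id for "prop-2", so nothing is duplicated.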

If you're working on something outside the OCD ecosystem, but still in Django, you might consider just using the UUIDField.

Also, if you're storing your data in postgres, pgcrypto and uuid-ossp are both useful extensions.

Over in django-calaccess-processed-data, we're using pgcrypto's gen_random_uuid() function to create the OCD ids in bulk, for example, when creating hundreds of thousands of filings at once.

Hope that's helpful. If you're looking for more general guidance about assigning ids for data intended for public consumption, I think this is something @fgregg has been researching recently.
