Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use geocodio instead of google maps #394
Use geocodio instead of google maps #394
Changes from 5 commits
f6bb956
4724e9e
8d4c579
9a0a22f
baaa91d
6820ac3
e7d7400
c34d8f4
8bed81f
ffc857f
f0eca90
46724d6
802429b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of path seems ripe for configuration with an env variable instead of hard-coding. e.g. I don't have
/app
on my computer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! We've been hard coding these paths because this ETL should only be run in a docker container which has a consistent file structure. I think it'd be wise to use an env var anyway! There are lots of locations in the code that reference
/app/data
so I'll make this change in a separate PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any of these fields we would always expect? Country maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the docs say anything about values that will always be present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever expect there to be default values here? Seems like we want to enforce that we always get a lat and a long, and never just fill a missing value with 0...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Removed the default value for these attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Geocodio always expects to give you an accuracy type + score back, so again these don't need default values. Seems also like
accuracy_type
has a limited set it could be drawn from, as opposed to being a string. We could probably tighten up all of these type definitions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why include a bunch of rows with empty strings here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handling the case when the API can't geocode an address. I updated it to be a list of
None
's.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be cleaner to add this logic to the pydantic classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah,
locality_name
could be a computed property, and thenresults_df
could just be:or the for-loop equivalent since that's a NASTY comprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I find it confusing to have variables that aren't
pd.DataFrame
that end in_df
. This happens in the batch munging code above too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments in this block as before - this hardcoded path seems ripe for env vars, and 2**19 B is not 100 KB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this logic from
dbcp.transform.helpers
because it is specific to the Google Maps API.