Clean overdose death dataset #5
Comments
I am going to start working on this!
Here are the different categories for cause of death... I am assuming we just need to filter by causes that say drug overdose and then look at the attached codes for each overdose category. However, do I need to keep the other categories so that we can predict the drug overdose categories that are missing? One method I am proposing for the missing data: total deaths in a county - the other categories = remaining deaths, then do some sort of manipulation for the remaining categories. This is just one idea... please let me know if I am on the wrong track.
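A minimal sketch of the subtraction idea above, in plain Python so it does not depend on how the dataset is loaded. The category names and the numbers are hypothetical, and the rule of filling a gap only when exactly one category is missing is an assumption, since the "manipulation" for several missing categories has not been decided yet.

```python
def remaining_deaths(total_deaths, known_by_category):
    """Deaths left after subtracting every reported category from the county total."""
    known = sum(v for v in known_by_category.values() if v is not None)
    remaining = total_deaths - known
    if remaining < 0:
        raise ValueError("reported categories exceed the county total")
    return remaining


def fill_single_missing(total_deaths, known_by_category):
    """Fill a missing category only when exactly one is missing; otherwise keep the gaps."""
    missing = [k for k, v in known_by_category.items() if v is None]
    filled = dict(known_by_category)
    if len(missing) == 1:
        filled[missing[0]] = remaining_deaths(total_deaths, known_by_category)
    return filled


# Hypothetical county: 120 total deaths, overdose count missing.
county = {"drug_overdose": None, "other_causes": 105}
print(fill_single_missing(120, county))
```

With two or more missing categories the remainder cannot be split without a further assumption, so this sketch leaves them as `None`.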
Please see the most up-to-date work on this issue here
@lisawym I looked into the questions you asked and converted everything into text files in case you want to look at them. I pushed it to my branch. I am just waiting for total population so that I can filter by larger counties and get mortality rates, which can then be used to help fill in missing data. Find that work here: https://github.com/MIDS-at-Duke/opioid-2023-kml/tree/usVital_missing_data. Let me know if you have questions.
Based on our earlier discussion in Slack and the earlier feedback from our instructor, I think we can handle the missing values by setting a population threshold for the counties and dropping the counties with a smaller population. We can find the threshold for the counties in the below way:
We talked about another approach: get total death data and compute the missing values. I think it is really good, but it's a little bit complicated, and I am not sure we have time to implement it.
I agree that the other approach would work, but getting total death data for more recent years was very challenging; I browsed for about 2-3 hours yesterday. I have already done step 1 (see text file). I am worried that setting the population threshold like that will shrink our data a lot, especially in more rural states like WV, TX, and TN (control states). We might have to set this population threshold by STATE to properly account for states that vary in size and geography. I think this is the way to help fix the missing data, but we may still be left with some missing data between years. The last step would be to use the rates, but this needs both drug deaths and total population. Are you going to begin this merging step or should I? I will submit a pull request for my branch so that someone can begin merging our two datasets together to hopefully fix at least some of the missing data.
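A rough sketch of what a per-state threshold followed by a rate-based fill could look like. Everything specific here is an assumption: the row layout, the field names, the choice of the state median population as the cutoff, and the sample numbers are all made up for illustration, and years are ignored.

```python
from collections import defaultdict
from statistics import median


def filter_by_state_threshold(rows):
    """Keep counties at or above the median population of their own state."""
    pops = defaultdict(list)
    for r in rows:
        pops[r["state"]].append(r["population"])
    cutoff = {state: median(values) for state, values in pops.items()}
    return [r for r in rows if r["population"] >= cutoff[r["state"]]]


def fill_with_state_rate(rows):
    """Estimate missing death counts from the state's observed deaths-per-person rate."""
    deaths = defaultdict(int)
    people = defaultdict(int)
    for r in rows:
        if r["deaths"] is not None:
            deaths[r["state"]] += r["deaths"]
            people[r["state"]] += r["population"]
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are not modified
        if r["deaths"] is None and people[r["state"]] > 0:
            r["deaths"] = deaths[r["state"]] / people[r["state"]] * r["population"]
        filled.append(r)
    return filled


# Hypothetical rows; deaths=None marks a missing value.
rows = [
    {"state": "WV", "county": "A", "population": 50000, "deaths": 20},
    {"state": "WV", "county": "B", "population": 10000, "deaths": None},
    {"state": "WV", "county": "C", "population": 30000, "deaths": 12},
    {"state": "TX", "county": "D", "population": 900000, "deaths": 90},
    {"state": "TX", "county": "E", "population": 200000, "deaths": None},
]
print([r["county"] for r in filter_by_state_threshold(rows)])
print(fill_with_state_rate(rows))
```

The two functions are independent on purpose, so the threshold and the rate fill can be tried separately and compared for how much of the rural data each one keeps.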
Yeah, valid point: states are different in nature, and we might overlook some rural areas with a single threshold. Would you like to merge the population data and see if we can get a better understanding of the death data with it? Hopefully we will be lucky and find some reasonable and easy way to deal with the missing data. I noticed that there is a county code in the death dataset. I think it is related to the FIPS code. I added an extra 0 to the beginning of the codes for states with a single-digit state code, but the death dataset omits the 0. Please let me know if there are any issues merging the population data! In the meantime, I will take a look at the transaction data and see if there is a FIPS code in it. If not, I will try to add a FIPS column based on the county name and see how we can merge population into the transaction data to calculate transactions per capita for the other research question. Thanks! @katelyn-hucker
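A small sketch of lining up the county codes before merging, assuming the death dataset's county code really is the 5-digit county FIPS (2-digit state plus 3-digit county) with the leading zero dropped. The field names `county_code`, `fips`, and `population` and the sample values are hypothetical.

```python
def normalize_fips(code):
    """Return a 5-character county FIPS string, restoring a dropped leading zero."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text.zfill(5)


def merge_on_fips(deaths, population):
    """Left-join population onto death rows using the normalized FIPS key."""
    pop_by_fips = {normalize_fips(p["fips"]): p["population"] for p in population}
    merged = []
    for d in deaths:
        key = normalize_fips(d["county_code"])
        # Counties with no population match get None so they are easy to spot.
        merged.append({**d, "fips": key, "population": pop_by_fips.get(key)})
    return merged


# Hypothetical rows: the death data lost the leading zero, the population data kept it.
deaths = [{"county_code": 1001, "deaths": 12}, {"county_code": 37063, "deaths": 40}]
population = [{"fips": "01001", "population": 55000}]
print(merge_on_fips(deaths, population))
```

Keeping the key as a string on both sides avoids the zero being dropped again when the file is re-read as numbers.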
Good idea on how to split this up. I will try to merge population with US vital stats. I'm going to submit a pull request for my branch first.