Clean overdose death dataset #5

katelyn-hucker · 2023-11-07T19:30:20Z

No description provided.

katelyn-hucker · 2023-11-10T00:12:26Z

I am going to start working on this!

katelyn-hucker · 2023-11-11T19:54:01Z

array(['Drug poisonings (overdose) Unintentional (X40-X44)',
       'All other alcohol-induced causes',
       'All other non-drug and non-alcohol causes',
       'Drug poisonings (overdose) Suicide (X60-X64)',
       'All other drug-induced causes',
       'Drug poisonings (overdose) Undetermined (Y10-Y14)',
       'Alcohol poisonings (overdose) (X45, X65, Y15)'], dtype=object)

These are the categories in the cause-of-death column. I am assuming we just need to filter for the causes labeled as drug overdoses and then look at the codes attached to each overdose category. However, do I need to keep the other categories so that we can predict the drug overdose values that are missing?

One method I am proposing for the missing data: take the total deaths in a county and subtract the other categories to get the remaining deaths, then do some sort of manipulation to allocate them across the remaining categories. This is just one idea. Please let me know if I am on the wrong track.
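A minimal sketch of the filter and the "remaining deaths" idea, assuming the data is a pandas DataFrame. The column names (`County`, `Year`, `Cause`, `Deaths`, `TotalDeaths`) and all numbers are hypothetical toy values, not taken from the real extract. The residual is only an upper bound on all suppressed categories combined; it does not say how to split it between them.

```python
import pandas as pd

# Toy data with hypothetical column names; None marks a suppressed count.
deaths = pd.DataFrame({
    "County": ["A", "A", "A", "B", "B"],
    "Year": [2005, 2005, 2005, 2005, 2005],
    "Cause": [
        "Drug poisonings (overdose) Unintentional (X40-X44)",
        "All other non-drug and non-alcohol causes",
        "Drug poisonings (overdose) Suicide (X60-X64)",
        "Drug poisonings (overdose) Unintentional (X40-X44)",
        "All other non-drug and non-alcohol causes",
    ],
    "Deaths": [12.0, 300.0, None, None, 150.0],
})

# Hypothetical all-cause totals per county/year from a separate source.
totals = pd.DataFrame({
    "County": ["A", "B"],
    "Year": [2005, 2005],
    "TotalDeaths": [330.0, 158.0],
})

# Keep only the drug overdose categories.
is_overdose = deaths["Cause"].str.startswith("Drug poisonings (overdose)")
overdose = deaths[is_overdose]

# Sum the reported (non-missing) deaths per county/year; NaN is skipped.
reported = (
    deaths.groupby(["County", "Year"], as_index=False)["Deaths"]
    .sum()
    .rename(columns={"Deaths": "ReportedDeaths"})
)

# Remaining deaths = total deaths - deaths in categories that were reported.
residual = totals.merge(reported, on=["County", "Year"], how="left")
residual["RemainingDeaths"] = residual["TotalDeaths"] - residual["ReportedDeaths"]
```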

katelyn-hucker · 2023-11-11T20:31:41Z

Please see most up to date work on this issue here

katelyn-hucker · 2023-11-27T00:43:29Z

@lisawym I looked into the questions you asked and converted everything into text files in case you want to look at them. I pushed them to my branch. I am just waiting for the total population data so that I can filter for larger counties and compute mortality rates, which can then be used to help fill in missing data. Find that work here: https://github.com/MIDS-at-Duke/opioid-2023-kml/tree/usVital_missing_data

Let me know if you have questions.

lisawym · 2023-11-27T17:46:13Z

Based on our earlier discussion in Slack and the feedback from our instructor, I think we can handle the missing values by setting a population threshold and dropping the counties with smaller populations. We can find the threshold as follows:

1. Find all the unique counties with missing values in deaths.
2. Find the population of the counties with missing values.
3. Set the population threshold just above the highest population among the counties with missing values. For example, suppose county ABC has NA values in 2005, its population in 2005 is 29,740, and 29,740 is the highest population among all county/year combinations with missing values. Then we can set our threshold to 30,000.
4. Drop all counties with a population below the threshold. As a result, all rows with missing values are dropped, and we are not biased against the counties with missing values, because we are selecting observations based on a predictor's value: all other low-population counties are dropped as well.
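The steps above can be sketched roughly as follows. This is only an illustration on toy data: the column names are hypothetical, rounding the threshold up to the next 10,000 is an assumption based on the 29,740 to 30,000 example, and the filter is applied per county/year row rather than per county.

```python
import math

import pandas as pd

# Toy county/year rows with hypothetical column names; None marks missing deaths.
df = pd.DataFrame({
    "County": ["ABC", "DEF", "GHI", "JKL"],
    "Year": [2005, 2005, 2005, 2005],
    "Population": [29740, 12000, 250000, 31000],
    "Deaths": [None, None, 40.0, 11.0],
})

# Steps 1-2: rows with missing deaths and their populations.
missing = df[df["Deaths"].isna()]

# Step 3: highest population among the missing rows, rounded up (assumed rule).
max_missing_pop = missing["Population"].max()
threshold = math.ceil(max_missing_pop / 10_000) * 10_000

# Step 4: keep only rows at or above the threshold.
kept = df[df["Population"] >= threshold]
```

Because the threshold sits above every population with a missing value, no missing rows survive the filter.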

We talked about another approach: get total death data and compute the missing values. I think it is a really good idea, but it is a little complicated, and I am not sure we have time to implement it.
Also, if you are interested in the mortality rate approach and find a way to implement it, please go with it!
@katelyn-hucker

katelyn-hucker · 2023-11-27T17:53:13Z

I agree that the other approach would work, but getting total death data for more recent years was very challenging. I browsed for about 2-3 hours yesterday.

I have already done step 1 (see text file).

I am worried that setting the population threshold like that will shrink our data a lot, especially in more rural states like WV, TX, and TN (control states). We might have to set the population threshold by state to properly account for states that differ in size and geography. I think this is the way to fix most of the missing data, but we may still be left with some missing values between years. The last step would be to use the rates, but that needs both drug deaths and total population.

Are you going to begin this merging step, or should I? I will submit a pull request for my branch so that someone can begin merging our two datasets, which will hopefully fix some of the missing data.
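A rough sketch of what a per-state threshold could look like, compared with one global threshold. The data and column names are made up for illustration, and using the highest missing-value population in each state as that state's cutoff is just one possible rule.

```python
import pandas as pd

# Toy rows with hypothetical column names; None marks missing deaths.
df = pd.DataFrame({
    "State": ["WV", "WV", "WV", "TX", "TX", "TX"],
    "County": ["w1", "w2", "w3", "t1", "t2", "t3"],
    "Population": [8000, 20000, 55000, 40000, 90000, 900000],
    "Deaths": [None, 10.0, 14.0, None, 12.0, 200.0],
})

missing = df[df["Deaths"].isna()]

# Global rule: one cutoff for every state.
global_threshold = missing["Population"].max()
kept_global = df[df["Population"] > global_threshold]

# Per-state rule: each state's cutoff is its own largest missing-value population.
state_threshold = missing.groupby("State")["Population"].max()
# States without any missing values get a cutoff of 0, so all their rows stay.
cutoff = df["State"].map(state_threshold).fillna(0)
kept_by_state = df[df["Population"] > cutoff]
```

In this toy example the per-state rule keeps one more WV row than the global rule, and both rules still remove every row with missing deaths.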
@lisawym

lisawym · 2023-11-27T18:19:12Z

Yeah, valid point: states are different in nature, and we might overlook some rural areas with a single threshold.

Would you like to merge the population data and see if it gives us a better understanding of the death data? Hopefully we will be lucky and find a reasonable and easy way to deal with the missing data.

I noticed that there is a county code in the death dataset. I think it is related to the FIPS code. I added a leading 0 to the codes of states with a one-digit state code, but the death dataset omits the 0. Please let me know if there are any issues merging the population data!

In the meantime, I will take a look at the transaction data and see if it has a FIPS code. If not, I will try to add a FIPS column based on the county name, and then see how we can merge population into the transaction data to calculate transactions per capita for the other research question.
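One way to line up the two code formats before merging, as a sketch only. It assumes the death dataset stores the county code as an integer in a column called `County Code` and the population data stores a five-character string in a column called `FIPS`; both names and all counts are hypothetical.

```python
import pandas as pd

# Toy data: integer county codes lose the leading zero (1001 instead of "01001").
deaths = pd.DataFrame({
    "County Code": [1001, 37063, 6037],
    "Deaths": [10, 25, 300],
})
population = pd.DataFrame({
    "FIPS": ["01001", "37063", "06037"],
    "Population": [50000, 300000, 10000000],
})

# Left-pad to five characters so both sides use the same key format.
deaths["FIPS"] = deaths["County Code"].astype(str).str.zfill(5)

# indicator=True adds a "_merge" column that flags rows with no population match.
merged = deaths.merge(
    population, on="FIPS", how="left", validate="many_to_one", indicator=True
)
unmatched = merged[merged["_merge"] != "both"]
```

If the county code was read in as a float (for example because the column contains NaN), it would need to be converted to an integer before padding, otherwise the string ends in `.0`.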

Thanks! @katelyn-hucker

katelyn-hucker · 2023-11-27T18:49:53Z

Good idea on how to split this up. I will try to merge population with the US Vital Statistics data. I'm going to submit a pull request for my branch first.

katelyn-hucker self-assigned this Nov 10, 2023

katelyn-hucker pinned this issue Nov 11, 2023

katelyn-hucker assigned lisawym Nov 27, 2023

katelyn-hucker closed this as completed Dec 13, 2023

katelyn-hucker unpinned this issue Dec 13, 2023
