simrunsharma/Practical_Data_Science

Practical Data Science

This course was developed because students are often taught advanced math and statistics but don't know how to apply these concepts with data science tools. This course gives students like me the ability to manipulate and analyze real (messy, error-ridden) data using bread-and-butter Python data tools.

Specific skills:

- Manipulate and analyze data in any format, including cleaning, merging, and summarizing all standard tabular formats and levels of cleanliness, as well as large datasets and GIS data.
- Identify and resolve data issues using defensive programming practices.
- Set up and manage a data science programming environment on your own computer, including installing Python, managing packages with pip and conda, setting PATH variables, and working with VS Code.
- Collaborate effectively with colleagues using Git and GitHub.
- Plan and execute a full data science project, from planning data manipulations through analysis and presentation of findings.
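As a rough illustration of what "defensive programming" can look like in practice, here is a minimal sketch using only the Python standard library. The function name `clean_ages`, the `age` field, and the 0 to 120 plausibility range are all hypothetical choices made for this example, not part of the course materials. The idea is simply to state your assumptions about the data in code so that violations fail loudly instead of passing silently into later analysis.

```python
def clean_ages(records):
    """Return copies of the records with 'age' parsed to int.

    Hypothetical helper: fails loudly when a value breaks our assumptions.
    """
    cleaned = []
    for row in records:
        raw = str(row["age"]).strip()
        # Defensive check: stop here rather than carry a bad value forward.
        assert raw.isdigit(), f"Unexpected age value: {row['age']!r}"
        age = int(raw)
        # Assumed plausibility range for this example only.
        assert 0 <= age <= 120, f"Implausible age: {age}"
        cleaned.append({**row, "age": age})
    # Cleaning should never add or drop rows.
    assert len(cleaned) == len(records)
    return cleaned


# Made-up records with stray whitespace, as messy data often has.
patients = [{"id": 1, "age": " 34"}, {"id": 2, "age": "71 "}]
result = clean_ages(patients)
```

A value such as `"unknown"` in the `age` field would raise an `AssertionError` with a message naming the offending value, which is usually easier to debug than a wrong summary statistic discovered later.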

Data Science Branches

There are, broadly speaking, two branches of what is often referred to as Data Science, which I will term Software Development Data Science and Data Analysis Data Science.

Software Development Data Science

In Software Development Data Science, programmers write programs that are bundled into software and distributed widely, or that run in the cloud for millions of people. For example, software development data scientists wrote the recommendation engines that let Netflix tell you what movies you might enjoy and let Facebook suggest people who might be your friends. As a result, they generally write generalizable code that is designed to run on data with a known structure.

Data Analysis Data Science

In Data Analysis Data Science, the data scientist is generally employed to answer a single, specific question. For example, a Data Analysis Data Scientist may be hired to figure out how to reduce antibiotic-resistant infections in a hospital, or to identify which campaign promises are most likely to convince voters to support a politician. As a result, Data Analysis Data Scientists generally write code that is only meant to be used for their specific project. Moreover, Data Analysis Data Scientists don't generally have the luxury of working with data with a known structure. Where a Netflix Data Scientist may get clean, well-organized data from a company database, a Data Analysis Data Scientist may have to work with data that has come from many different sources and that no one has cleaned or organized (e.g., notes from nurses, or voting data from different states compiled by hand by minimum-wage government employees).

To be clear, these branches are not completely distinct. Most data scientists do things that fall into both categories (for example, even a Software Developer will likely do some ad hoc analyses before developing a fully deployable tool). But these two types of data science do emphasize different skills. Software Development Data Scientists, for example, are well served by traditional computer science curricula, and need a much deeper understanding of concepts like object-oriented programming and software deployment. By contrast, Data Analysis Data Scientists need to be comfortable working with data in different formats, and to understand how to clean and fit together datasets that were never built to be integrated.
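To make "fitting together datasets that were never built to be integrated" a little more concrete, here is a small sketch using only the Python standard library. Everything in it is assumed for illustration: the helper names `normalize_key` and `merge_one_to_one`, the county names, and the numbers are made up. It shows two habits that matter with messy sources: normalizing the join key before matching, and keeping track of rows that fail to match rather than dropping them silently.

```python
def normalize_key(name):
    """Lowercase and collapse whitespace so 'Durham ' and 'durham' match."""
    return " ".join(name.strip().lower().split())


def merge_one_to_one(left, right, key):
    """Merge two lists of dicts on a normalized key.

    Returns (merged_rows, unmatched_left_rows).
    """
    right_by_key = {}
    for row in right:
        k = normalize_key(row[key])
        # A one-to-one merge assumes each key appears once on the right.
        assert k not in right_by_key, f"Duplicate key in right data: {k!r}"
        right_by_key[k] = row

    merged, unmatched = [], []
    for row in left:
        k = normalize_key(row[key])
        if k in right_by_key:
            # Combine both rows and store the normalized key.
            merged.append({**right_by_key[k], **row, key: k})
        else:
            unmatched.append(row)
    return merged, unmatched


# Made-up data from two sources with inconsistent county spellings.
turnout = [
    {"county": "Durham ", "votes": 1200},
    {"county": "WAKE", "votes": 3400},
    {"county": "Orange", "votes": 800},
]
population = [
    {"county": "durham", "pop": 5000},
    {"county": "Wake", "pop": 9000},
]

merged, unmatched = merge_one_to_one(turnout, population, "county")
```

Inspecting `unmatched` after every merge is a cheap way to notice that a source used a different spelling, code, or level of aggregation than expected.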

The focus of this course will be on the skills of Data Analysis Data Science: cleaning and merging data, data exploration, and designing projects to answer very specific questions. If you're interested in policy analysis, health-sector analysis, or applied empirical research, this course is for you. If you're interested in developing programs you can deploy in an iPhone app to improve recommendations, some of the material will be useful to you (the Python data science stack, working at the command line, Git and GitHub), but the emphasis won't quite be what you're looking for.
