This Learning Portal is an effort of the DataRookies community to help those who are starting to learn Data Science find the right resources for the right level. The portal is meant to be a collection of articles, tutorials, and other learning materials that can serve as a guide for your learning journey.

This is an open community project, so feel free to contribute your own commits or pull requests to this repo with updated links or better tutorials to help those who are just starting out!

Tutorials

In an effort to make learning Data Science less intimidating, we broke down the references into levels so that you can quickly go to the level most relevant for you.

The levels are also not strict learning paths. Feel free to jump between them or cherry-pick parts to match the learning style that works best for you.

Level 0: Starting from Scratch

Here you can find links to overviews of Data Science and related fields. You won't find many technical references here, but these links will help you build the context surrounding Data Science so you can better appreciate the domain.

Articles
YouTube Videos
Real World Case Scenarios

Level 1: Coding Basics

Here you can find links to programming tutorials for the languages you will use in your Data Science practice. They tend to focus on the fundamentals of each language (such as syntax) and help you get comfortable with code.

Python

Python is a general-purpose programming language that has become a powerful Data Science tool thanks to its readability and simplicity.
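To give a feel for that readability, here is a small example that uses only Python's standard library. The list of scores is made-up sample data, chosen just for illustration.

```python
# Summarize a list of (made-up) exam scores using only the standard library.
from statistics import mean, median

scores = [72, 85, 90, 66, 85]

average = mean(scores)    # arithmetic mean of the scores
middle = median(scores)   # middle value once the scores are sorted
passed = [s for s in scores if s >= 75]  # list comprehension: keep scores of 75 or more

print(f"Average: {average}")
print(f"Median: {middle}")
print(f"Passed: {len(passed)} of {len(scores)}")
```

Even without knowing Python, you can probably guess what each line does, which is a big part of why the language is popular with beginners.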

R

R is a programming language with roots in scientific research. Its strong foundation in statistics and computation makes it another powerful Data Science tool.

Python vs R

This is a frequently asked question. Many people use both, some have strong preferences, and we don't want to say one is better than the other. These resources should help you decide which one to focus on or learn first!

SQL

SQL (Structured Query Language, often pronounced "sequel") is the most widely used language for querying databases. If you want to get data from your company's database, you will most likely need to use SQL.
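As a minimal sketch of what a SQL query looks like in practice, the example below uses Python's built-in `sqlite3` module so it runs without a database server. The `sales` table, its columns, and its rows are hypothetical; a company database would typically need a different driver and connection details, but the SQL itself would look much the same.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 50.0)],
)

# A typical SQL query: total sales per region, largest total first.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, total in rows:
    print(region, total)

conn.close()
```

`SELECT`, `FROM`, `GROUP BY`, and `ORDER BY` are among the first SQL clauses most tutorials cover, so they are a good starting point.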

Unix Commands (to help you with the command line/prompt)

Level 2: Tools for Data Science

Here you can find links to tutorials that apply the programming languages you learned earlier to Data Science. You will find some package-specific tutorials and references here, alongside some Business Intelligence (BI) tools.

Python

numpy

pandas

matplotlib

scikit-learn

R

Many of the most commonly used Data Science packages in R belong to the Tidyverse. You can find the project's website link here for an overview and links to resources, but below we break down a few packages that are commonly used.

dplyr

ggplot2

Tableau
Text Editors and IDEs

Level 3: Managing Data Workflows

Data Science and its related fields tend to be messy and iterative. To address that, others have developed tools to tame the mess and keep your code from breaking. Here are just a few examples of tools you can use to organize and maintain your code.

Git

Git is a version control system (and, as far as we know, the most widely used one) that helps you maintain your code and save checkpoints. When paired with GitHub or other cloud-based repository management tools, your code will be available to you whenever you need it!

Anaconda

Python requires you to install packages and dependencies in order to run cool Data Science tasks. However, packages often need other packages, packages conflict with other packages, things go missing, things randomly break... it can be a chore. Anaconda helps clean all of that up, and we highly encourage everyone to use Anaconda to manage their Python environments to avoid unnecessary debugging. It can be a chore to learn at first, but we promise it's worth it!
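Once you have several environments, a common source of confusion is running code in a different environment than you think. A quick sanity check is to ask Python itself, as in the sketch below. It relies only on the standard library; the `CONDA_DEFAULT_ENV` variable is set by conda when an environment is activated, and the function name `describe_environment` is just a hypothetical helper for illustration.

```python
import os
import sys


def describe_environment():
    """Report which Python interpreter and environment are currently in use."""
    return {
        # Full path of the Python executable running this code
        "executable": sys.executable,
        # Root folder of the active environment (differs between environments)
        "prefix": sys.prefix,
        # Name of the activated conda environment, or None if none is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Version number only, e.g. "3.11.4"
        "python_version": sys.version.split()[0],
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `conda_env` or `prefix` is not what you expected, activate the right environment before installing packages or running your analysis.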

Cookiecutter

In the process of analyzing and working with data, you will start to feel the need to organize your files. Keeping your files organized and setting up a project well will stop you from getting lost in a mountain of messy files when you need to find a specific function you wrote weeks back. This is where Cookiecutter comes in: with just a few lines in your terminal, it will create all the folders you need to effectively tackle your data science project!
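Cookiecutter itself generates a project from a template (you typically run `cookiecutter` followed by a template's path or URL). To show the underlying idea without installing anything, here is a plain `pathlib` sketch that does not use Cookiecutter at all. The folder layout and the `scaffold_project` helper are hypothetical, loosely modeled on common data-science project templates.

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout, loosely modeled on common data-science templates.
LAYOUT = [
    "data/raw",
    "data/processed",
    "notebooks",
    "src",
    "reports/figures",
]


def scaffold_project(root, name):
    """Create a project folder with the LAYOUT subfolders and a README stub."""
    project = Path(root) / name
    for folder in LAYOUT:
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        (project / folder).mkdir(parents=True, exist_ok=True)
    (project / "README.md").write_text(f"# {name}\n")
    return project


# Build the skeleton in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold_project(tmp, "my_analysis")
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
    print("\n".join(created))
```

A real template tool adds much more on top of this (prompts for the project name, templated files, shared conventions), which is why we recommend Cookiecutter over rolling your own.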

Docker

As you do your analysis or build tools with your Data Science skills, you will eventually need to share your work or run the process on a more powerful computer. But transferring all your files won't guarantee that it works: the other computer might lack packages, dependencies, and the like. Docker helps solve this problem by letting you build containers that make it easy to run code on other computers and save you the trouble of fixing dependencies when they don't work.

Bonus 1: Complete Bootcamp Classes

Here you can find links to end-to-end Data Science classes. While these may be more comprehensive, easier to follow, and have lots of resources, they also tend to be relatively expensive.

Coursera
Udemy
YouTube Guided

Bonus 2: Competitions and Challenges

Here you can find links to websites where you can participate in Data Science challenges.

This list was taken from this Medium article written by Opetunde Adepoju.

Bonus 3: Mailing Lists and Podcasts

Here you can find links to Data Science related Mailing Lists and Podcasts.

Mailing List
Podcasts

Bonus 4: Free Resources

Here you can find links to Free Books and other resources for Data Science.

Levels to come in the near future with your help:

None of us are specialists at the moment, so if you are one, we'd love for you to contribute! Feel free to reach out to us here or on our Facebook Page, or go ahead and create a pull request if you want to add resources here directly!

Specialization 1: Machine Learning

Specialization 2: Data Engineering

Specialization 3: Data Visualization

Disclaimer

This project is definitely a work in progress, and we would love to see your contributions. Our limited experience limits the resources shown here, but with your help we can make this the hub for anyone who wants to learn Data Science related skills and tools!