Kaggle-collegescorecard

This project originated from the Kaggle competition located here: Kaggle College Scorecard

What is the problem you want to solve?

Quoted from the Kaggle description: "While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky."

If university attendance is viewed as a financial investment, then it makes sense to properly understand the expected risk and return of that investment. However, its return is among the more difficult to anticipate, so I hope to find relationships in this dataset that predict return on investment.

My guiding question is: "What features of a university education correlate with a better return on investment?"

Who is your client and why do they care about this problem? In other words, what will your client DO or DECIDE based on your analysis that they wouldn’t have otherwise?

The client for this problem is future or returning university students. The former consists of newly graduated high school students and adults attending university for the first time. The latter can be classified as adults returning to restart and finish a degree, or to further their education.

The client would be able to better rank the effectiveness of their university choices based on return on investment and could make a more accurate decision about what university best suits them.

What data are you going to use for this? How will you acquire this data?

At this time, I will use the data provided by the Kaggle competition. As the Kaggle webpage describes it, "...the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset."

Further data sets may be used if the need arises.

In brief, outline your approach to solving this problem (knowing that this might change later).

My approach will start with becoming familiar with the data and making sure that it is ready for analysis. Some simple graphs will help with this, and perhaps some clustering algorithms once I understand how to implement them.

After that, I'll start to ask questions of the dataset and develop simple hypotheses to test. As I come to understand the data better, those hypotheses will grow in complexity. I hope to end with some null hypothesis tests of features that may predict a higher likelihood of a good return on investment.
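
As a rough illustration of the kind of null hypothesis test described above, here is a minimal sketch of a two-sided permutation test for a difference in group means. It is written in Python using only the standard library, whereas the project's own code is in R. The function name `permutation_test`, the group names, and the numbers are all hypothetical and are not drawn from the College Scorecard data; this is one possible approach under those assumptions, not the method the project ends up using.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of group_a minus mean of group_b)
    and an estimated p-value under the null hypothesis that group labels
    are unrelated to the outcome.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up return-on-investment ratios for illustration only (hypothetical).
high_completion = [1.9, 2.1, 2.4, 2.0, 2.6, 2.2]
low_completion = [1.2, 1.5, 1.1, 1.6, 1.3, 1.4]

observed, p_value = permutation_test(high_completion, low_completion)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value here would suggest that the difference between the two made-up groups is unlikely under random labeling. With real data, the choice of outcome measure and grouping feature would need to come from the exploration step first.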

What are your deliverables? Typically, this would include code, along with a paper and/or a slide deck.

The deliverable will be an R Markdown document detailing any discoveries, along with the code that led to those discoveries.

eistre91/Kaggle-collegescorecard

Folders and files

Latest commit

History

Repository files navigation

Kaggle-collegescorecard

What is the problem you want to solve?

Who is your client and why do they care about this problem? In other words, what will your client DO or DECIDE based on your analysis that they wouldn’t have otherwise?

What data are you going to use for this? How will you acquire this data?

In brief, outline your approach to solving this problem (knowing that this might change later).

What are your deliverables? Typically, this would include code, along with a paper and/or a slide deck.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages