Multi-stage programming model with K-means, EM, LinearReg, and LogisticReg #19

Merged 47 commits on Apr 29, 2015

kijungs commented Apr 5, 2015

Related issues: #14, #16, #17

Other algorithmic improvements:

  • Users can specify the number of clusters
  • Initial centroids are assigned randomly
  • The amount of data broadcast and gathered is reduced
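
For illustration, here is a minimal Java sketch of the first two points: a user-supplied cluster count `k` and randomly chosen initial centroids. The PR text only says the centroids are "assigned randomly", so sampling `k` input points (Forgy-style initialization) is an assumption, and `RandomCentroidInit` and `pickInitialCentroids` are hypothetical names, not classes from this pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not a class from this pull request. */
final class RandomCentroidInit {

  /**
   * Picks k input points at distinct positions, uniformly at random, as initial centroids.
   * Sampling from the data is an assumption; the PR only says "assigned randomly".
   */
  static List<double[]> pickInitialCentroids(final List<double[]> points, final int k, final Random random) {
    if (k < 1 || k > points.size()) {
      throw new IllegalArgumentException("k must be between 1 and " + points.size());
    }
    // Shuffle a copy so the caller's list keeps its order.
    final List<double[]> shuffled = new ArrayList<>(points);
    Collections.shuffle(shuffled, random);
    final List<double[]> centroids = new ArrayList<>(k);
    for (final double[] point : shuffled.subList(0, k)) {
      centroids.add(point.clone()); // copy so later centroid updates do not mutate the input
    }
    return centroids;
  }

  public static void main(final String[] args) {
    // Made-up three-dimensional points, only to exercise the helper.
    final List<double[]> points = new ArrayList<>();
    points.add(new double[] {1.0, 3.0, 5.0});
    points.add(new double[] {6.0, 4.0, 2.0});
    points.add(new double[] {3.0, 5.0, 7.0});
    points.add(new double[] {1.2, 2.8, 5.1});
    final int k = 3; // user-specified number of clusters
    for (final double[] centroid : pickInitialCentroids(points, k, new Random(42L))) {
      System.out.println(Arrays.toString(centroid));
    }
  }
}
```

Passing the `Random` instance in explicitly keeps runs reproducible, which helps when comparing results across correctness tests.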

Results of the correctness tests:

  1. K-means and EM
     • Data: sample_cluster
     • Actual number of clusters: 3
     • Actual means of clusters: {1,3,5}, {6,4,2}, {3,5,7}
     • Actual (diagonal) covariances of clusters: {1,2,3}, {3,2,1}, {2,2,2}
     • Means obtained by K-means: {1.09,2.55,4.70}, {6.33,3.90,2.00}, {2.89,5.23,7.17}
     • Means obtained by EM: {1.028, 2.565, 4.839}, {6.145, 3.864, 2.000}, {2.863, 5.088, 7.015}
     • Covariances obtained by EM: {0.809, 1.683, 2.951}, {3.443, 1.566, 0.972}, {2.472, 1.874, 1.869}
  2. Logistic Regression
     • Data: sample_classification
     • Actual model: 0.5 - 3x1 + 2x2 - 1.5x3
     • Actual accuracy: 90%
     • Model obtained by Logistic Regression: 0.0007 - 0.006x1 + 0.0036x2 - 0.0027x3 (only the ratio between the coefficients matters)
     • Accuracy obtained by Logistic Regression: 87%
  3. Linear Regression
     • Data: sample_regression
     • Actual model: 0.5 - 3x1 + 2x2 - 1.5x3 + Gaussian random error with standard deviation 1.0
     • Model obtained by Linear Regression: 0.5 - 2.6x1 + 1.7x2 - 1.2x3
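
The logistic regression result notes that only the ratio of the coefficients matters. A small Java sketch of why: the predicted class depends only on the sign of the linear score, so multiplying every weight by the same positive constant leaves all predictions unchanged. `DecisionBoundaryScale`, `predict`, and `normalize` are hypothetical names introduced for illustration, and the 0.5 probability threshold is an assumption about how the accuracy was measured.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not a class from this pull request. */
final class DecisionBoundaryScale {

  /** Predicts label 1 or 0 for features x; w[0] is the intercept, w[i + 1] multiplies x[i]. */
  static int predict(final double[] w, final double[] x) {
    double score = w[0];
    for (int i = 0; i < x.length; i++) {
      score += w[i + 1] * x[i];
    }
    return score >= 0 ? 1 : 0; // sigmoid(score) >= 0.5 exactly when score >= 0
  }

  /** Divides all weights by |w[pivot]| so two models can be compared up to a positive scale. */
  static double[] normalize(final double[] w, final int pivot) {
    final double scale = Math.abs(w[pivot]);
    final double[] out = new double[w.length];
    for (int i = 0; i < w.length; i++) {
      out[i] = w[i] / scale;
    }
    return out;
  }

  public static void main(final String[] args) {
    final double[] actual = {0.5, -3.0, 2.0, -1.5};             // model from the test description
    final double[] learned = {0.0007, -0.006, 0.0036, -0.0027}; // model reported in this PR
    // Normalize both by the x1 coefficient to compare them on the same scale.
    System.out.println(Arrays.toString(normalize(actual, 1)));
    System.out.println(Arrays.toString(normalize(learned, 1)));
  }
}
```

Normalized by the x1 coefficient, the actual model is roughly (0.17, -1, 0.67, -0.5) and the learned model roughly (0.12, -1, 0.6, -0.45), which makes the two directly comparable despite the very different raw magnitudes.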

Jason and others added 30 commits November 19, 2014 16:52
import org.apache.reef.tang.annotations.Name;

/**
* Information of a stage, which corresponds to a BSP algorithm

Is an init stage a BSP algorithm? This sounds strange.

Later, when we have an asynchronous execution engine, we will need to separate the BSP-specific part from the general part.

bgchun commented Apr 25, 2015

@kijungshin @jsjason The code looks nicer than before.

I left a non-trivial number of comments.
There are several issues to be addressed, but it's not desirable to address all of them in this pull request.
I suggest the following. In this pull request, add proper license headers, years, and comments, and address the coding style comments (e.g., line spacing).
Also address the code comments that are easy to fix.
For the remaining comments, let's create issues and keep track of them.

Let's address the ones you can do quickly and merge this gigantic pull request ASAP.

kijungs commented Apr 27, 2015

@bgchun @jsjason I will address trivial comments and create issues for non-trivial comments.

kijungs commented Apr 28, 2015

I removed unnecessary empty lines from every source file, following the Sun Java coding conventions.
I also added a license header to every source file.

kijungs added 2 commits April 28, 2015 16:55
1) Tasks log more detailed information
2) Reduce the dependency between the group communication service and the data loading service
3) Reduce the number of fields using Optional

bgchun commented Apr 29, 2015

@jsjason @kijungshin If you agree that minor comments are addressed and major comments are registered as issues, please merge this pull request. This has been long overdue.

jsjason commented Apr 29, 2015

@bgchun We agree that this is taking much longer than expected. Although @kijungshin has addressed most of the comments, there are still a few more he'd like to address. We decided to create a separate issue for the remaining comments, since almost all of them are about missing explanations, class renaming, etc. We'll get the job done shortly.

jsjason commented Apr 29, 2015

All comments have been either addressed or marked as a separate issue. I'll merge this.
