
Imagenet Pipeline #120

Merged
merged 11 commits into from
May 19, 2015

Conversation

shivaram
Contributor

Closes #11

This PR adds the SIFT + LCS + FV ImageNet pipeline. It includes changes to several components that help us avoid doing multiple SIFT passes over the data (e.g., VectorSplitter and the Sampling node).

The pipeline still looks a bit complex due to the sample re-use across PCA and GMM -- let me know if you can think of ways to simplify it.

// In-place deterministic shuffle (Fisher-Yates)
def shuffleArray[T](arr: Array[T]): Array[T] = {
  // Shuffle each row in the same fashion: a fixed seed gives a fixed permutation
  val rnd = new java.util.Random(42)
  for (i <- arr.length - 1 to 1 by -1) {
    val j = rnd.nextInt(i + 1)
    val tmp = arr(i); arr(i) = arr(j); arr(j) = tmp
  }
  arr
}
Contributor

We should probably take the seed as a parameter. Also, is Breeze's shuffle not good enough for what you're trying to do?

Contributor Author

Added a seed param. This differs from Breeze's shuffle in that I am shuffling an Array[DenseVector[Double]], which is the output of calling collect on an RDD[DenseVector[Double]].
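For illustration, a minimal plain-Scala sketch (not from the PR; `seededShuffle`, `features`, and `labels` are hypothetical names, with Array[Double] standing in for Breeze's DenseVector[Double]) of why a seeded shuffle matters here: re-running the same seeded Fisher-Yates over two parallel arrays keeps their rows aligned.

```scala
// Hypothetical sketch: a seeded Fisher-Yates shuffle keeps parallel arrays aligned.
def seededShuffle[T](arr: Array[T], seed: Int): Array[T] = {
  val rnd = new java.util.Random(seed)
  for (i <- arr.length - 1 to 1 by -1) {
    val j = rnd.nextInt(i + 1)
    val tmp = arr(i); arr(i) = arr(j); arr(j) = tmp
  }
  arr
}

val features = Array(Array(1.0), Array(2.0), Array(3.0), Array(4.0))
val labels   = Array("a", "b", "c", "d")
seededShuffle(features, seed = 42)
seededShuffle(labels, seed = 42)
// Rows and labels stay paired because both shuffles used the same seed.
```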

@shivaram
Contributor Author

I've addressed the comments and also added the SignedHellinger step after SIFT (before PCA). Note that I had to add a new BatchedHellingerMapper, and that this uses DenseMatrix[Float] (we really need the Numeric Transformer stuff urgently).
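For context, the signed Hellinger (signed square-root) mapping applies x -> sign(x) * sqrt(|x|) element-wise. A minimal sketch over a flat Array[Float] (the helper name `signedHellinger` is illustrative; the PR's BatchedHellingerMapper operates on DenseMatrix[Float] batches instead):

```scala
// Signed Hellinger mapping, element-wise: x -> sign(x) * sqrt(|x|).
// Sketch only; the PR applies this to batched DenseMatrix[Float] data.
def signedHellinger(xs: Array[Float]): Array[Float] =
  xs.map(x => (math.signum(x) * math.sqrt(math.abs(x))).toFloat)
```

For example, `signedHellinger(Array(4.0f, -9.0f))` yields `Array(2.0f, -3.0f)`: magnitudes are square-rooted while signs are preserved.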

@@ -159,8 +159,15 @@ class BlockLeastSquaresEstimator(blockSize: Int, numIter: Int, lambda: Double =
override def fit(
trainingFeatures: RDD[DenseVector[Double]],
trainingLabels: RDD[DenseVector[Double]]): BlockLinearMapper = {
val vectorSplitter = new VectorSplitter(blockSize)
Contributor

Is it a problem to have a single version of these that takes None, or does it break the Estimator API?

Contributor Author

Yeah, I tried it and it breaks the API.
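For context, a block solver like BlockLeastSquaresEstimator splits each feature vector into consecutive fixed-size blocks so the least-squares system can be solved one block at a time. A hypothetical plain-Scala sketch of the splitting step (`splitVector` is an illustrative stand-in; the real VectorSplitter operates on RDD[DenseVector[Double]]):

```scala
// Hypothetical sketch of block splitting: break a feature vector into
// consecutive blocks of `blockSize` elements (the last block may be shorter).
def splitVector(vec: Array[Double], blockSize: Int): Seq[Array[Double]] =
  vec.grouped(blockSize).toSeq
```

Splitting a 5-element vector with blockSize = 2 gives three blocks of sizes 2, 2, and 1.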

@etrain
Contributor

etrain commented May 19, 2015

Awesome stuff @shivaram! I had a few minor things - one about refactoring to reuse some code in the SIFT/LCS pipeline and one about not reimplementing shuffle. If you want to merge this as-is and save those for a future PR, I'm good with this!

@shivaram
Contributor Author

Alright, merging this to hit milestone 0.1!

shivaram added a commit that referenced this pull request May 19, 2015
@shivaram shivaram merged commit e221536 into amplab:master May 19, 2015