Integrate Block Operators more neatly with the DAG #214

tomerk · 2016-01-26T18:41:40Z

Currently, so that they chain easily with the rest of the code, block operators (such as block solves and block transformers) take a single complete RDD and manually split it into multiple blocks in a way that is hidden from the DAG.

If we add some DAG rewriting rules to detect this and integrate block operators better with the DAG, we should be able to take advantage of optimizations like auto-caching more effectively, and we can allow the block operators to operate on blocks lazily.

etrain · 2016-01-26T20:33:20Z

One thing that makes the block solves tricky is that the blocks are not independent. That is, we pass a Seq[RDD[T]] because the solution to the second block depends on the solution to the first block. It is not clear to me how to capture this in the DAG.
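
To make the dependency concrete, here is a minimal sketch of one pass of block coordinate descent for least squares. It is a toy under stated assumptions, not the project's solver: plain Scala vectors stand in for RDDs, each block is a single feature column, and the names `BlockSolveSketch`, `Column`, `dot`, and `solveBlocks` are hypothetical.

```scala
object BlockSolveSketch {
  // Assumption: one feature column per block, held in memory instead of an RDD.
  type Column = Vector[Double]

  def dot(a: Column, b: Column): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // One pass over the blocks. The fold threads the residual through the
  // blocks in order, so block k is fit against whatever blocks 0..k-1 left
  // unexplained. This is the cross-block dependency.
  def solveBlocks(blocks: Seq[Column], labels: Column): (Vector[Double], Column) =
    blocks.foldLeft((Vector.empty[Double], labels)) {
      case ((weights, residual), block) =>
        // Scalar least-squares fit of this block against the current residual.
        val w = dot(block, residual) / dot(block, block)
        val newResidual = residual.zip(block).map { case (r, x) => r - w * x }
        (weights :+ w, newResidual)
    }
}
```

With blocks `Vector(1.0, 0.0)` and `Vector(1.0, 1.0)` and labels `Vector(2.0, 3.0)`, this returns weights `Vector(2.0, 1.5)`. Swapping the two blocks gives `Vector(2.5, -0.5)`, so the result depends on the order in which the blocks are visited, and any DAG representation has to preserve that ordering.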

tomerk · 2016-01-26T20:37:48Z

I think it should be able to work the same way the GatherTransformer works: a TransformerNode that takes multiple RDDs together as input.
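
Roughly, the idea could be sketched as follows. This is only an illustration under assumptions: the types `Source`, `BlockSlice`, and `MultiInput` are hypothetical stand-ins, not the project's actual graph classes, and in-memory vectors replace RDDs.

```scala
object MultiInputDagSketch {
  // Hypothetical, simplified graph types for illustration only.
  sealed trait Node { def deps: Seq[Node] }

  final case class Source(data: Vector[Double]) extends Node {
    val deps = Seq.empty[Node]
  }

  // One block of the input, expressed as its own node so the graph can see it.
  final case class BlockSlice(input: Node, from: Int, until: Int) extends Node {
    val deps = Seq(input)
  }

  // A node that consumes several inputs together, in order.
  final case class MultiInput(
      inputs: Seq[Node],
      f: Seq[Vector[Double]] => Vector[Double]) extends Node {
    val deps = inputs
  }

  // On-demand evaluation: a block is only computed when something asks for it.
  def eval(node: Node): Vector[Double] = node match {
    case Source(data)                => data
    case BlockSlice(in, from, until) => eval(in).slice(from, until)
    case MultiInput(ins, f)          => f(ins.map(eval))
  }

  // Every distinct node reachable from `node`, i.e. what a rewriting rule
  // or caching pass would be able to inspect.
  def reachable(node: Node): Seq[Node] =
    (node +: node.deps.flatMap(reachable)).distinct
}
```

Because each block is a separate node, a caching or rewriting pass can see that both blocks read the same source. For a four-element source split into two `BlockSlice` nodes and combined by one `MultiInput`, `reachable` reports four distinct nodes, whereas a single operator that splits its input internally would expose only two.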
