Control flow #249

ezyang · 2017-09-27T21:14:35Z

Here is our current thinking on control flow.

The primary use cases for supporting control flow:

Eliminating Python interpreter overhead from executing loops. This is blocked on porting more autograd ops from Python to C++: right now we are swamped by the cost of executing traced operators by calling back into Python, so it is hard to characterize how much we could win by eliminating top-level RNN loops and similar.
Batching. Right now people batch their code by hand, and it would be great if the JIT could batch it automatically. A DyNet-style dynamic batcher won't work for us, because it takes too long to run the Python code that generates the initial trace (which would then be batched). We need a way of generating traces quickly, and you need control flow for that. And if you want to write a fancy whole-function vectorizer, you need actual control flow.

Non use-cases for supporting control flow:

Persistent kernels. Our understanding is that supporting this would require a lot of fiddly writing of SASS to actually convince CUDA to keep our data in memory, so we are not going to target it.

Frontend implementation possibilities:

Parse the Python AST and reinterpret it into the JIT IR, so we execute it. Obviously, you are not going to support all of Python, so you are going to make lots of assumptions (e.g., all builtins haven't been overwritten, function calls are not considered, fancy looping constructs are not supported). In effect, you are writing a Python-like DSL, reusing Python's parser as the frontend. The primary reason we are deprioritizing this in the short term is as follows: we want to fall back on regular Python when our DSL doesn't support a given construct, but even detecting that such a situation has occurred requires a bit of engineering work. It is almost better to not claim to be Python, and force end users to learn the new rules of the game, because then they won't try something and expect it to work.
TensorFlow Fold/etc. style "higher order" combinators, which capture control flow patterns explicitly. Annoying for users to write, but it might be easier to get off the ground than something that looks more like Python looping. A big problem with this style is that you often need to add a lot of knobs to the interface to make sure that it is actually expressive enough for all of the loop dependencies you're interested in.
A skeevy non-tracing frontend that goes into Python IR. This is the most reasonable implementation strategy if you want to write a paper about whole-function vectorization.
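
To give a feel for the "detecting an unsupported construct" problem in the first option, here is a minimal sketch using Python's standard `ast` module. The whitelist `SUPPORTED_NODES` and the helper `unsupported_constructs` are hypothetical names invented for this illustration, not part of any JIT. Note that a syntactic check like this only catches constructs the DSL has never heard of; it cannot detect the semantic assumptions mentioned above (for example, a builtin that has been rebound), which is where silently doing the wrong thing creeps in.

```python
import ast

# Hypothetical whitelist of AST node types that an imagined Python-like DSL
# understands. Function calls are deliberately left out, mirroring the
# "function calls are not considered" assumption.
SUPPORTED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg,
    ast.Return, ast.Assign, ast.AugAssign, ast.Expr,
    ast.For, ast.While, ast.If,
    ast.Name, ast.Constant, ast.BinOp, ast.Compare, ast.UnaryOp,
    ast.Load, ast.Store,
    ast.operator, ast.cmpop, ast.unaryop,
)


def unsupported_constructs(source):
    """Return sorted (lineno, node type name) pairs for nodes outside the whitelist.

    Nodes that carry no line number (e.g. `withitem`) are reported with lineno 0.
    """
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, SUPPORTED_NODES):
            problems.append((getattr(node, "lineno", 0), type(node).__name__))
    return sorted(problems)


# A simple counted loop: every node is on the whitelist.
SIMPLE_LOOP = """
def count(n):
    total = 0
    i = 0
    while i < n:
        total = total + i
        i += 1
    return total
"""

# A context manager, a call, and a comprehension: all outside the whitelist.
FANCY_CODE = """
def f(xs):
    with open("log") as fh:
        return [x for x in xs]
"""
```

A frontend along these lines would fall back to regular Python whenever `unsupported_constructs(...)` is non-empty. The hard part the text points at is everything this check does not see.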

Backend implementation considerations:

Are you going to add a full, general-purpose non-Tensor language into the IR to support looping? If not, you somehow need to figure out how to make your way with Tensors (and Scalars) only. It is difficult to encode "feed this Python list of tokens into the RNN until you're done."
Difficult to say what form of AST the vectorizer is best suited for working with, without actually building the vectorizer first.
Absolutely do not want to reimplement the Python interpreter; it's complicated, and there is no reason to believe our version will be faster.
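
As a rough illustration of the "feed this list of tokens into the RNN until you're done" question, here is a sketch of what a combinator-shaped loop primitive could look like. Everything here is hypothetical: `scan` and `toy_cell` are names made up for this example, and plain floats stand in for Tensors so the block runs with the standard library alone.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # carried loop state (e.g. the hidden state)
X = TypeVar("X")  # per-step input (e.g. one token)
Y = TypeVar("Y")  # per-step output


def scan(step: Callable[[S, X], Tuple[S, Y]], init: S, xs: Iterable[X]) -> Tuple[S, List[Y]]:
    """Hypothetical combinator: thread a state through `step` over `xs`, collecting outputs."""
    state = init
    outputs = []
    for x in xs:
        state, y = step(state, x)
        outputs.append(y)
    return state, outputs


def toy_cell(hidden: float, token: float) -> Tuple[float, float]:
    """Stand-in for an RNN cell: the new hidden state is also the step output."""
    new_hidden = 0.5 * hidden + token
    return new_hidden, new_hidden


final_hidden, outputs = scan(toy_cell, 0.0, [1.0, 2.0, 3.0])
```

The loop structure lives entirely inside `scan`, so a backend would only need to understand that one primitive rather than a general-purpose language. The cost is that any dependency beyond a single carried state (say, a step that needs the previous two outputs) has to be packed into the state by hand, which is the kind of interface knob the combinator option tends to accumulate.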

Summary:

Control flow as a way to sidestep the Python interpreter is blocked on removing other Python overheads from JIT trace execution.
Parsing the AST and supporting a "Python subset" is hard because you can't easily tell whether you actually understand some Python code (it is easy to silently do the wrong thing). A complete break from Python that still has loops (rather than clunky tf.fold) would be easier to implement, but harder for users to understand.
