Workflow

Authors: Tóth Tamás, Bergmann Gábor, Semeráth Oszkár

Workflow implementation using Generics in Java

During this laboratory session, the students implement the running example as a configuration of a simple workflow engine that enables the definition of action nodes, and data flow connections between the outputs and inputs of such nodes. To make the workflow engine type safe, the students have to utilize the programming facilities of Java Generics.

Implementation task

The solution that students submit after the session has to meet the following criteria.

The solution has to be based on a simple workflow engine implemented by the students: it must enable the instantiation of (abstract) action nodes that execute predefined tasks, and the creation of connectors that pass data from the output of one node to the input of another node.
A single instance of the workflow engine must support the execution of several instances of a workflow (started with different input values and parameters):
- each of these process instances must go through the same steps of computation
- the engine must tolerate process instances whose executions overlap in time, or even overtake each other
The workflow engine must provide abstract action nodes and connectors, where action nodes are responsible for
- receiving input through one or more input connectors (see e.g. scalar product)
  - it is sufficient to support up to 3 input connectors; it is not necessary to implement abstract workflow nodes that perform computations with 4 or more inputs
  - however, workflow nodes with multiple input connectors must not assume that all inputs have the same type; they must allow for each input connector to have a different type
    
    For instance, though not part of the running example, the workflow engine must be able to represent a workflow node that takes a tokenized document as one input and a single token as another, and returns a tokenized document that is filtered to only those blocks that contain the given token at least once.
- correlating input tokens that belong to the same process instance (e.g. scalar products computed from the same pair of documents) even if they arrive out of order, using e.g. a process instance identifier
  
  For instance, if there are two process instances (1 and 2) with two inputs (a1 and b1 for process 1, and a2 and b2 for process 2), take care not to mix input a1 with b2, as there is no guarantee that b1 arrives before b2.
- executing the business logic, which is provided to the workflow node implementation as a pure function, either in the form of a function object or an abstract method,
- delivering the outputs produced by the business logic to given output connectors that propagate them further to downstream action nodes. Ensure the type safety of the connectors using Java Generics, so that two nodes can be connected only if the second node accepts the output type of the first node. The workflow engine must allow the output of a node to be forwarded to multiple downstream nodes, which might not even perform the same kind of computation.
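The correlation requirement above can be sketched as follows. This is only one possible shape, not a reference solution: the class name `Join2`, its method names, and the use of a `long` process instance identifier are assumptions made for illustration, and inputs are assumed to be non-null. The two inputs have independent type parameters, and a value is held back until its partner from the same process instance arrives.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical two-input node: pairs inputs by process instance id, then runs the logic.
final class Join2<A, B, R> {
    private final BiFunction<? super A, ? super B, ? extends R> logic;
    private final BiConsumer<Long, ? super R> output;
    // Inputs that are still waiting for their partner, keyed by process instance id.
    private final Map<Long, A> pendingFirst = new HashMap<>();
    private final Map<Long, B> pendingSecond = new HashMap<>();

    Join2(BiFunction<? super A, ? super B, ? extends R> logic,
          BiConsumer<Long, ? super R> output) {
        this.logic = logic;
        this.output = output;
    }

    // Called by the connector feeding the first input (values assumed non-null).
    synchronized void acceptFirst(long processId, A value) {
        B partner = pendingSecond.remove(processId);
        if (partner == null) {
            pendingFirst.put(processId, value);
        } else {
            fire(processId, value, partner);
        }
    }

    // Called by the connector feeding the second input (values assumed non-null).
    synchronized void acceptSecond(long processId, B value) {
        A partner = pendingFirst.remove(processId);
        if (partner == null) {
            pendingSecond.put(processId, value);
        } else {
            fire(processId, partner, value);
        }
    }

    private void fire(long processId, A first, B second) {
        output.accept(processId, logic.apply(first, second));
    }
}

final class CorrelationDemo {
    // Feeds a1, b2, a2, b1 in that order and returns the results per process instance.
    static Map<Long, String> run() {
        Map<Long, String> results = new LinkedHashMap<>();
        Join2<String, Integer, String> join =
                new Join2<String, Integer, String>((a, b) -> a + "#" + b, results::put);
        join.acceptFirst(1, "a1");
        join.acceptSecond(2, 20);   // b2 overtakes b1
        join.acceptFirst(2, "a2");  // process 2 completes first
        join.acceptSecond(1, 10);   // b1 arrives last, still paired with a1
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The business logic passed to `Join2` never sees the process instance identifier; pairing is handled entirely by the node.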
Students may choose to provide more than one abstract action node class, depending on the number of inputs / outputs, or whether the action node denotes the initial or final node of the entire process.
The implementation of the workflow engine shall be independent of any particular workflow; the running example must be merely one possible configuration. Such configurations are determined by specifying which action node instance consumes the output of which other action node, and by providing the business logic of each action node. To demonstrate that the same workflow engine can be configured for different purposes, students must utilize it to execute a different workflow as well (even reusing some of the business logic code), which is described below in the section "Alternative workflow task".
The business logic methods that parametrize these action nodes must be pure functions that only take business data as input and output. Message delivery and correlation must be services provided by the workflow engine, and separated from business logic. In other words, business logic methods must themselves be independent of:
- the identity of any ancestor / descendant action nodes that produce / consume the inputs/outputs of the business logic method
- connectors and the way inputs are received or outputs are delivered by connectors (which will change in subsequent lab sessions)
- process instance identifiers and overlapping process instances
- in multi-input workflow nodes, the order in which the incoming messages carrying the input arguments actually arrive from the input connectors
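As an illustration of business logic that satisfies these independence criteria, here is a minimal sketch of a scalar product written as a pure function. The class name `BusinessLogic` and the representation of a document vector as `Map<String, Integer>` are assumptions made for this example; the actual data types of the running example may differ.

```java
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical holder for business logic; it knows nothing about nodes, connectors or ids.
final class BusinessLogic {
    private BusinessLogic() {}

    // Pure: the result depends only on the two arguments, and neither argument is modified.
    static long scalarProduct(Map<String, Integer> a, Map<String, Integer> b) {
        long sum = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                sum += (long) entry.getValue() * other;
            }
        }
        return sum;
    }

    // The same logic as a function object that could be handed to a two-input node.
    static final BiFunction<Map<String, Integer>, Map<String, Integer>, Long> SCALAR_PRODUCT =
            BusinessLogic::scalarProduct;
}
```

Because the function takes only business data, it can be unit tested without starting the engine, and it stays unchanged if the connector implementation is replaced later.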
The type safety of the solution has to be enforced by the compiler: the solution should not contain type casts that can be avoided by the use of generics.
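One way to let the compiler reject ill-typed connections is a typed output port with a bounded wildcard on the consumer side. The sketch below is an assumption-laden illustration (`OutputPort`, `connect` and `emit` are hypothetical names, and process instance identifiers are left out for brevity); it also shows one output being forwarded to two unrelated consumers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical typed output of a node; T is the type of the values it produces.
final class OutputPort<T> {
    private final List<Consumer<? super T>> targets = new ArrayList<>();

    // Any consumer of T, or of a supertype of T, may be attached.
    void connect(Consumer<? super T> target) {
        targets.add(target);
    }

    // Forwards the value to every connected downstream consumer.
    void emit(T value) {
        for (Consumer<? super T> target : targets) {
            target.accept(value);
        }
    }
}

final class TypeSafetyDemo {
    static List<Object> run() {
        List<Object> log = new ArrayList<>();
        OutputPort<Integer> counts = new OutputPort<>();

        Consumer<Number> numberSink = n -> log.add("number " + n);
        Consumer<Object> anySink = log::add;

        counts.connect(numberSink); // accepted: Number is a supertype of Integer
        counts.connect(anySink);    // accepted: second, unrelated downstream consumer

        // Consumer<String> textSink = s -> log.add(s);
        // counts.connect(textSink); // rejected at compile time: String does not accept Integer

        counts.emit(3);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

No cast appears anywhere in the sketch; the mismatch in the commented-out lines is caught by the compiler rather than at run time.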
In the current lab exercise, the solution has to be multi-threaded: the behavior of each action node has to be executed on a separate thread. The abstract action node and connectors must be implemented accordingly.
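A thread-per-node design could look roughly like the sketch below. It assumes a single-input node, uses `java.util.concurrent.BlockingQueue` as a stand-in for a connector, and treats thread interruption as the shutdown signal; the names `Message` and `ThreadedNode` are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical envelope: business data plus the id of the process instance it belongs to.
final class Message<T> {
    final long processId;
    final T payload;

    Message(long processId, T payload) {
        this.processId = processId;
        this.payload = payload;
    }
}

// Hypothetical single-input node whose behavior runs on its own thread.
final class ThreadedNode<I, O> implements Runnable {
    private final BlockingQueue<Message<I>> inbox = new LinkedBlockingQueue<>();
    private final Function<? super I, ? extends O> logic;
    private final BlockingQueue<Message<O>> outbox;

    ThreadedNode(Function<? super I, ? extends O> logic, BlockingQueue<Message<O>> outbox) {
        this.logic = logic;
        this.outbox = outbox;
    }

    // Called from other threads; the unbounded queue never blocks the sender.
    void send(Message<I> message) {
        inbox.add(message);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Message<I> in = inbox.take(); // blocks until input is available
                outbox.put(new Message<>(in.processId, logic.apply(in.payload)));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interruption is the shutdown request
        }
    }
}

final class ThreadedDemo {
    static String run() {
        BlockingQueue<Message<Integer>> results = new LinkedBlockingQueue<>();
        ThreadedNode<String, Integer> lengthNode =
                new ThreadedNode<String, Integer>(String::length, results);
        Thread worker = new Thread(lengthNode, "length-node");
        worker.start();
        try {
            lengthNode.send(new Message<>(1, "workflow"));
            lengthNode.send(new Message<>(2, "java"));
            Message<Integer> first = results.take();
            Message<Integer> second = results.take();
            worker.interrupt();
            worker.join();
            return first.processId + ":" + first.payload + ","
                    + second.processId + ":" + second.payload;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The business logic (`String::length` here) remains unaware of threads and queues; only the node wrapper deals with them.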

Alternative workflow task (diversity degree list)

Take a single document as input, tokenize it, and emit its diversity degree list. The latter is defined as a list of non-negative integers, one for each block of the document (in order), each counting the number of unique tokens within that block (so that the same token occurring twice within a block counts only once). For example, with character-level tokenization, the document "This ball here is better" would be turned into the list [4, 3, 3, 2, 4]; in "better", for instance, the repeated letters t and e are each counted only once.
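The definition can be checked with a small sketch of the two pure functions involved. It assumes, as the example suggests, that blocks are whitespace-separated words and tokens are single characters; the class and method names are hypothetical, and a real solution would reuse the tokenizer of the running example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

final class Diversity {
    private Diversity() {}

    // Assumed tokenization: one block per whitespace-separated word, one token per character.
    static List<List<Character>> tokenizeByCharacter(String document) {
        List<List<Character>> blocks = new ArrayList<>();
        for (String word : document.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty document yields no blocks
            }
            List<Character> tokens = new ArrayList<>();
            for (char c : word.toCharArray()) {
                tokens.add(c);
            }
            blocks.add(tokens);
        }
        return blocks;
    }

    // For each block, in order, counts the distinct tokens it contains.
    static List<Integer> diversityDegrees(List<? extends Collection<?>> blocks) {
        List<Integer> degrees = new ArrayList<>();
        for (Collection<?> block : blocks) {
            degrees.add(new HashSet<Object>(block).size());
        }
        return degrees;
    }

    public static void main(String[] args) {
        System.out.println(diversityDegrees(tokenizeByCharacter("This ball here is better")));
        // prints [4, 3, 3, 2, 4]
    }
}
```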

Requirements

Students shall develop a reusable workflow engine that meets the criteria outlined above.

The accompanying documentation must give a brief overview of

the architecture of the workflow engine,
the key concepts of the provided API for specifying workflows,
the most important Java classes of the API, and their type parameters.

It shall demonstrate (by giving an overview and linking to code) the application of the API for succinctly implementing both

the original computation task (document similarity estimation),
as well as the alternative task (diversity degree list).

Additionally, it shall present

how the type safety of nodes and channels is enforced,
how synchronization is implemented in nodes with multiple inputs.

Self-evaluation questions

Explain the following concepts: type erasure / reification / PECS / covariance / invariance / contravariance / use-site variance / declaration-site variance / heap pollution.
Give three code snippets that each explain a restriction on generic types in Java.
Give an example and a counter-example of a 'pure function'.