Workflow
Authors: Tóth Tamás, Bergmann Gábor, Semeráth Oszkár
During this laboratory session, students implement the running example as a configuration of a simple workflow engine that supports defining action nodes and data-flow connections between the outputs and inputs of such nodes. To make the workflow engine type safe, students have to use the programming facilities of Java Generics.
The solution that students submit after the session has to meet the following criteria.
- The solution has to be based on a simple workflow engine implemented by the students: it must enable the instantiation of (abstract) action nodes that execute predefined tasks, and the creation of connectors that pass data from the output of one node to the input of another node.
- A single instance of the workflow engine must support the execution of several instances of a workflow (started with different input values and parameters):
  - each of these process instances must go through the same steps of computation
  - the engine must tolerate the execution of multiple process instances overlapping in time, or instances even overtaking each other
- The workflow engine must provide abstract action nodes and connectors, where action nodes are responsible for
  - receiving input through one or more input connectors (see e.g. scalar product)
    - it is sufficient to support up to 3 input connectors; abstract workflow nodes that perform computations with 4 or more inputs need not be implemented
    - however, workflow nodes with multiple input connectors must not assume that all inputs have the same type; they must allow each input connector to have a different type

      For instance, although it is not part of the running example, the workflow engine must be able to represent a workflow node that takes a tokenized document as one input and a single token as another, and returns the tokenized document filtered to only those blocks that contain the given token at least once.
  - correlating input tokens that belong to the same process instance (e.g. for scalar products computed from the same pair of documents), even if they arrive out of order, using e.g. a process instance identifier

    For instance, if there are two process instances (1 and 2) with two inputs each (a1 and b1 for process 1, a2 and b2 for process 2), take care not to combine input a1 with b2, as there is no guarantee that b1 arrives before b2.
  - executing the business logic, which is provided to the workflow node implementation as a pure function, either in the form of a function object or as an abstract method
  - delivering the outputs produced by the business logic to the given output connectors, which propagate them further to downstream action nodes. Ensure the type safety of the connectors using Java Generics, so that two nodes can be connected only if the second node accepts the output of the first node. The workflow engine must allow the output of a node to be forwarded to multiple downstream nodes, which might not even perform the same kind of computation.
- Students may choose to provide more than one abstract action node class, depending on the number of inputs / outputs, or whether the action node denotes the initial or final node of the entire process.
- The implementation of the workflow engine shall be independent of any particular workflow; the running example must be just one possible configuration. Such configurations are determined by specifying which action node instance consumes the output of which other action node, and by providing the business logic of each action node. To demonstrate that the same workflow engine can be configured for different purposes, students must also use it to execute a different workflow (even reusing some of the business logic code), which is described below in the section "Alternative workflow task".
- The business logic methods that parametrize these action nodes must be pure functions that take only business data as input and produce only business data as output. Message delivery and correlation must be services provided by the workflow engine, separated from the business logic. In other words, the business logic methods themselves must be independent of:
  - the identity of any ancestor / descendant action nodes that produce / consume the inputs / outputs of the business logic method
  - connectors and the way inputs are received or outputs are delivered by connectors (which will change in subsequent lab sessions)
  - process instance identifiers and overlapping process instances
  - in multi-input workflow nodes, the order in which the incoming messages carrying the input arguments actually arrive from the input connectors
- The type safety of the solution has to be enforced by the compiler: the solution should not contain type casts that can be avoided by the use of generics.
- In the current lab exercise, the solution has to be multi-threaded: the behavior of each action node has to be executed on a separate thread. The abstract action node and connectors must be implemented accordingly.
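The connector and type-safety requirements above can be illustrated with a minimal, single-threaded Java 17 sketch. All names (`Token`, `InPort`, `OutPort`, `MapNode`, `CollectNode`) are hypothetical and not prescribed by the lab. `OutPort.connectTo` takes an `InPort<? super T>`, so the compiler accepts a connection only if the downstream node consumes the upstream output type (or a supertype of it), and one output can fan out to several downstream nodes without any casts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical token: a payload tagged with the id of its process instance.
record Token<T>(int instanceId, T value) {}

// Hypothetical input port of a node, accepting tokens whose payload is a T (or a subtype).
interface InPort<T> {
    void accept(Token<? extends T> token);
}

// Hypothetical output port that fans every token out to all connected input ports.
final class OutPort<T> {
    private final List<InPort<? super T>> targets = new ArrayList<>();

    // Compiles only if the target consumes T or a supertype of T ("consumer super").
    void connectTo(InPort<? super T> target) {
        targets.add(target);
    }

    void emit(Token<T> token) {
        for (InPort<? super T> target : targets) {
            target.accept(token);
        }
    }
}

// Hypothetical single-input node; the business logic is a pure Function from I to O.
final class MapNode<I, O> implements InPort<I> {
    final OutPort<O> out = new OutPort<>();
    private final Function<? super I, ? extends O> logic;

    MapNode(Function<? super I, ? extends O> logic) {
        this.logic = logic;
    }

    @Override
    public void accept(Token<? extends I> token) {
        // The node, not the business logic, carries the instance id over to the output.
        O result = logic.apply(token.value());
        out.emit(new Token<>(token.instanceId(), result));
    }
}

// Hypothetical sink node that simply collects the payloads it receives.
final class CollectNode<T> implements InPort<T> {
    final List<T> received = new ArrayList<>();

    @Override
    public void accept(Token<? extends T> token) {
        received.add(token.value());
    }
}
```

For a `MapNode<String, Integer>`, connecting a `CollectNode<Number>` and a `CollectNode<Object>` to its `out` port both compile, whereas `connectTo(new CollectNode<String>())` is rejected at compile time.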
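For the correlation requirement, one possible approach is to park each partial input in a map keyed by the process instance id until its partner arrives. The Java sketch below assumes hypothetical names (`JoinNode`, `acceptA`, `acceptB`, `Filters`) and non-null payloads. The two inputs deliberately have different types, and the document-filtering node described in the requirements serves as the business logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical two-input node: the inputs may have different types A and B.
// Partial inputs are parked per process instance id until their partner arrives.
final class JoinNode<A, B, R> {
    private final BiFunction<? super A, ? super B, ? extends R> logic; // pure business logic
    private final BiConsumer<Integer, ? super R> downstream;           // receives (instanceId, result)
    private final Map<Integer, A> pendingA = new HashMap<>();
    private final Map<Integer, B> pendingB = new HashMap<>();

    JoinNode(BiFunction<? super A, ? super B, ? extends R> logic,
             BiConsumer<Integer, ? super R> downstream) {
        this.logic = logic;
        this.downstream = downstream;
    }

    // synchronized, because the two inputs may be fed by different threads.
    // Assumes non-null payloads, since a null lookup result means "partner not yet arrived".
    synchronized void acceptA(int instanceId, A a) {
        B b = pendingB.remove(instanceId);
        if (b == null) {
            pendingA.put(instanceId, a);
        } else {
            fire(instanceId, a, b);
        }
    }

    synchronized void acceptB(int instanceId, B b) {
        A a = pendingA.remove(instanceId);
        if (a == null) {
            pendingB.put(instanceId, b);
        } else {
            fire(instanceId, a, b);
        }
    }

    private void fire(int instanceId, A a, B b) {
        R result = logic.apply(a, b);
        downstream.accept(instanceId, result);
    }
}

// Pure business logic: keep only the blocks of a tokenized document that contain the token.
final class Filters {
    static List<List<String>> blocksContaining(List<List<String>> document, String token) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> block : document) {
            if (block.contains(token)) {
                kept.add(block);
            }
        }
        return kept;
    }
}
```

Feeding `acceptB(2, "x")`, `acceptA(1, doc1)`, `acceptA(2, doc2)` and `acceptB(1, "a")` in this order still pairs `doc1` with `"a"` and `doc2` with `"x"`. In this sketch the business logic runs while the node's lock is held; an engine might instead hand the result to an output connector's queue. A three-input node could follow the same pattern with a third map and a custom three-argument functional interface, since `java.util.function` has no `TriFunction`.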
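For the multi-threading requirement, a node can own a worker thread that repeatedly takes a message from its input connector, applies the pure business logic, and puts the result on the downstream connector. The sketch below models connectors as `java.util.concurrent.BlockingQueue`s; `Msg` and `ThreadedNode` are hypothetical names, and a complete engine would also need a shutdown protocol.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical message type: a payload tagged with its process instance id.
record Msg<T>(int instanceId, T value) {}

// Hypothetical single-input node that runs its business logic on a dedicated thread.
// Its inbox is the incoming connector; its outbox is the downstream node's inbox.
final class ThreadedNode<I, O> {
    final BlockingQueue<Msg<I>> inbox = new LinkedBlockingQueue<>();
    private final Function<? super I, ? extends O> logic;
    private final BlockingQueue<? super Msg<O>> outbox;

    ThreadedNode(Function<? super I, ? extends O> logic, BlockingQueue<? super Msg<O>> outbox) {
        this.logic = logic;
        this.outbox = outbox;
    }

    void start() {
        Thread worker = new Thread(this::run);
        worker.setDaemon(true); // do not keep the JVM alive after the main thread ends
        worker.start();
    }

    private void run() {
        try {
            while (true) {
                Msg<I> msg = inbox.take();                  // blocks until a message arrives
                O result = logic.apply(msg.value());        // pure business logic
                Msg<O> reply = new Msg<>(msg.instanceId(), result);
                outbox.put(reply);                          // hand the result to the next node
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();             // stop when the worker is interrupted
        }
    }
}
```

Because the process instance id travels in the message and is copied to the result by the node, the business logic function never sees it.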
Alternative workflow task
Take a single document as input, tokenize it, and emit its degree-of-diversity list. This is defined as a list of positive integers, one for each block of the document (in order), each counting the number of unique tokens within the block (so the same token occurring twice within a block counts only once). For example, with character-level tokenization, the document `This ball here is better` would be turned into the list `[4, 3, 3, 2, 4]`; e.g. in the case of `better`, both repeated letters `t` and `e` are counted only once.
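The diversity computation can be expressed as two pure functions, which a workflow could wire into a tokenizer node and a diversity node. The Java sketch below assumes, as the example suggests, that blocks are whitespace-separated words; the names `Diversity`, `tokenize` and `diversity` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Diversity {
    // Assumption for this sketch: blocks are whitespace-separated words,
    // and character-level tokenization turns each block into its characters.
    static List<List<Character>> tokenize(String document) {
        List<List<Character>> blocks = new ArrayList<>();
        for (String word : document.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty document yields no blocks
            }
            List<Character> tokens = new ArrayList<>();
            for (char c : word.toCharArray()) {
                tokens.add(c);
            }
            blocks.add(tokens);
        }
        return blocks;
    }

    // Pure business logic: the number of distinct tokens in each block, in block order.
    static List<Integer> diversity(List<? extends Collection<?>> blocks) {
        List<Integer> result = new ArrayList<>();
        for (Collection<?> block : blocks) {
            Set<Object> unique = new HashSet<>(block);
            result.add(unique.size());
        }
        return result;
    }
}
```

Applied to `This ball here is better`, `diversity(tokenize(...))` yields `[4, 3, 3, 2, 4]`, matching the example. Since `diversity` accepts any list of collections, it does not depend on the token type produced by the tokenizer.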
Students shall develop a reusable workflow engine that meets the criteria outlined above.
The accompanying documentation must give a brief overview of
- the architecture of the workflow engine,
- the key concepts of the provided API for specifying workflows,
- the most important Java classes of the API, and their type parameters.
It shall demonstrate (by giving an overview and linking to code) the application of the API for succinctly implementing both
- the original computation task (document similarity estimation),
- as well as the alternative task (degree-of-diversity list).
Additionally, it shall present
- how the type safety of nodes and channels is implemented,
- how the synchronization in nodes with multiple inputs is implemented.
- Explain the following concepts: type erasure / reification / PECS / covariance / invariance / contravariance / use-site variance / declaration-site variance / heap pollution.
- Give three code snippets, each illustrating a restriction on generic types in Java.
- Give an example and a counter-example of a 'pure function'.
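For the variance-related concepts, a short Java sketch may help (class and method names are hypothetical): `copyAll` follows the PECS rule through use-site variance with wildcards, while `arrayStoreFails` shows that Java arrays are covariant and reified, in contrast to invariant, erased generic types.

```java
import java.util.List;

final class Variance {
    // PECS: `src` produces T values (extends), `dst` consumes T values (super).
    static <T> void copyAll(List<? extends T> src, List<? super T> dst) {
        for (T item : src) {
            dst.add(item);
        }
    }

    // Arrays are covariant and reified: the assignment compiles, but the
    // element type is checked at run time, so the store throws.
    static boolean arrayStoreFails() {
        Object[] objects = new Integer[1];
        try {
            objects[0] = "not an integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Generic types are invariant instead: the analogous line
    //     List<Object> objects = new ArrayList<Integer>();
    // is rejected by the compiler, so the mistake cannot reach run time.
}
```

With `copyAll`, a `List<Integer>` can be copied into a `List<Number>` or a `List<Object>`, which an invariant signature such as `copyAll(List<T> src, List<T> dst)` would reject.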
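As an illustration tied to the scalar product of the running example, the following hypothetical Java methods contrast a pure function with an impure one. The impure variant reads mutable static state, which is exactly what the business-logic requirements rule out.

```java
import java.util.List;

final class Purity {
    // Pure: the result depends only on the arguments, and nothing else is read or modified.
    static int dot(List<Integer> a, List<Integer> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int sum = 0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    // Mutable global state that the impure variant depends on.
    static int weight = 1;

    // Not pure: the result also depends on the mutable static field `weight`,
    // so the same arguments can produce different results over time.
    static int weightedDot(List<Integer> a, List<Integer> b) {
        return weight * dot(a, b);
    }
}
```

Calling `weightedDot` twice with the same vectors returns different values if `weight` changes in between, whereas `dot` always returns the same value for the same inputs.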