To run the application use ./sbt run
.
To run tests use ./sbt test
.
The configuration file at src/main/resources/application.conf
contains the definition of the readers
along with other settings that can be adjusted.
In the abscence of specific requirements regarding frameworks used, I decided to go with Akka streams. Since asynchronous producers are involved, streaming seems to fit the task naturally.
The idea is to adapt CharacterReader
interface to become an Akka Source
and build a stream
to perform the given task.
I only added short comments to indicate classes purpose, otherwise I let signatures speak for themselves.
To finish in a reasonable amount of time, some functionality has been intentionally omitted:
- there is no logging and no error reporting
- support for special characters is limited
- ActorSystem is slow to shutdown after hitting ctrl-c
- the reported results are eventual in nature due to Akka operators' internal buffers, but it is close enough
I followed TDD process closely. The only exception is App
which I consider to be outside of the solution
boundary. It would take significant effort to test it, and it only contains straightforward wiring code.
The code is formatted according to the official style guide with rare exceptions where it makes sense.
CharacterReaderInputStream
is used to ingest elements from CharacterReader
into an Akka stream.
Since Akka already contains InputStream
to Source
adapter and CharacterReader
resembles
InputStream
closely, I decided to make CharacterReader
to InputStream
adapter.
The whole conversion scheme looks like this:
CharacterReader
-> CharacterReaderInputStream extends InputStream
-> InputStreamSource extends Source
WordCount
contains stream building blocks that form the solution.
Two .source
methods allow to get an Akka source given one or more CharacterReader
. Resulting source emits
whole words. It is essential to combine characters into words before merging readers to avoid mingling
characters from different readers.
Setting chunkSize
to 1 is important. By default InputStreamSource
tries to read 8K elements
before emitting anything.
Contains the heart of the solution. .scan
operator is used to fold over the stream and aggregate
words into a word-to-count Map
.
.conflate
operator is used to prevent slow downstream of backpressuring the upstream. It does it
by dropping old word-to-count Map
s and keeping just the latest one.
.extrapolate
operator repeats the latest word-to-count Map
indefenitely to make it always
available for consuming.
Lastly, the flow is zipped with a tick source to only emit when a tick is received.
.sink
formats the word-to-count Map
into a String
and prints it. The formatting part is
extracted outside of the .flow
to ease testing.