Core Language Modules

May 3, 2016

(Language-side modules)

This document describes the function and implementation of the four primary language-side modules of the NLU system: the Core UI-Agent, the ECG Analyzer, the Core ECG Grammar, and the Core Specializer. As mentioned in the system overview, each of these "core" modules can be extended for a given domain to fulfill new tasks or requirements of an application. We expect the language-side modules to need relatively few changes; the primary deltas will likely be new n-tuple templates for the Specializer and new vocabulary and ontology items for the ECG Grammar.

Core UI-Agent

The Core UI-Agent mediates input/output interactions with the user. It receives text or speech as input, produces an n-tuple as output, and sends that n-tuple to a Problem Solver. Like the Core Problem Solver, the UI-Agent subclasses the Core-Agent module: it declares its own channel name and subscribes to the "ProblemSolver" channel. This is how a line of communication is established between the language and action sides of the system.
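How the Core-Agent actually moves messages between agents is not covered in this document. The following is a minimal sketch of the channel pattern only, assuming a simple in-process message bus; `MessageBus`, `CoreAgent`, `publish`, and `transport_send` are hypothetical names, not the Core-Agent API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class MessageBus:
    """Hypothetical in-process stand-in for the Core-Agent transport."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: Any) -> None:
        # Deliver the message to every callback registered on the channel.
        for callback in list(self._subscribers[channel]):
            callback(message)


class CoreAgent:
    """Hypothetical base class: every agent declares a channel name."""

    def __init__(self, channel: str, bus: MessageBus) -> None:
        self.channel = channel
        self.bus = bus

    def transport_send(self, message: Any) -> None:
        # Publish on this agent's own channel; subscribers pick it up there.
        self.bus.publish(self.channel, message)


class UIAgent(CoreAgent):
    def __init__(self, bus: MessageBus) -> None:
        super().__init__("UIAgent", bus)
        self.received: List[Any] = []
        # Listen for replies from the Problem Solver.
        self.bus.subscribe("ProblemSolver", self.callback)

    def callback(self, ntuple: Any) -> None:
        self.received.append(ntuple)


bus = MessageBus()
ui = UIAgent(bus)
bus.publish("ProblemSolver", {"text": "Done."})
print(ui.received)  # [{'text': 'Done.'}]
```

In this sketch, a Problem Solver agent would subscribe to the "UIAgent" channel to receive the n-tuples that the UI-Agent publishes.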

Internally, the process of producing an n-tuple from text input consists of several steps:

  1. The text is input to the ECG Analyzer. In the current implementation, the Analyzer runs via Jython on a local server, and the UI-Agent connects to it through a proxy Analyzer.
  2. The ECG Analyzer uses an ECG Grammar to parse the text input, and outputs a syntactic and semantic analysis (SemSpec).
  3. The UI-Agent then feeds the SemSpec to the Core Specializer, which uses n-tuple templates to determine which information from the SemSpec is relevant for a given application domain and packages that information as an n-tuple.

Relevant Methods

output_stream(self, tag, message)

Importantly, the Core UI-Agent interacts with the user through the output_stream method. By default, this method prints MESSAGE to the terminal, prefaced by TAG (e.g. "UI-AGENT"), but system integrators can subclass the UI-Agent to define a new mode of user interaction, such as speech or a GUI.
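A minimal sketch of such a subclass, assuming only the output_stream(tag, message) signature given above; `BaseUIAgent` and `RecordingUIAgent` are hypothetical names, and the default "TAG: MESSAGE" print format is an assumption. A speech or GUI front end would override the same method.

```python
from typing import List


class BaseUIAgent:
    """Hypothetical minimal base with a default output_stream."""

    def output_stream(self, tag: str, message: str) -> None:
        # Default behaviour: print to the terminal, prefaced by the tag.
        # The exact prefix format here is an assumption.
        print("{}: {}".format(tag, message))


class RecordingUIAgent(BaseUIAgent):
    """Example subclass that records output instead of printing it."""

    def __init__(self) -> None:
        super().__init__()
        self.transcript: List[str] = []

    def output_stream(self, tag: str, message: str) -> None:
        self.transcript.append("[{}] {}".format(tag, message))


agent = RecordingUIAgent()
agent.output_stream("UI-AGENT", "Ready.")
print(agent.transcript)  # ['[UI-AGENT] Ready.']
```

Because every other method reports through output_stream, overriding this one method is enough to redirect all user-facing output.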


The UI-Agent also runs a loop in which it continually prompts the user for input. This includes "control" input, such as 'd' (debug mode, in which n-tuples are printed using the output_stream method) and 'q' (which quits the system), as well as ordinary language input, such as "John ran into the room" or "Robot1, push the blue box 3 inches north!"

For language input, the UI-Agent calls process_input; if a valid JSON n-tuple is produced, it sends that n-tuple to the Problem Solver.
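A minimal sketch of this loop, assuming that 'd' toggles debug mode and that the input source, process_input, the send step, and output_stream are passed in as plain callables; `run_prompt_loop` and its parameters are hypothetical, not the UI-Agent's API.

```python
from typing import Callable, Iterable, List, Optional


def run_prompt_loop(
    lines: Iterable[str],
    process_input: Callable[[str], Optional[str]],
    send: Callable[[str], None],
    output_stream: Callable[[str, str], None],
) -> None:
    """Sketch of the UI-Agent input loop (all names hypothetical)."""
    debug = False
    for raw in lines:
        msg = raw.strip()
        if msg == "q":
            # Control input: quit the system.
            break
        if msg == "d":
            # Control input: toggle debug mode (toggling is an assumption).
            debug = not debug
            continue
        ntuple_json = process_input(msg)
        if ntuple_json is None:
            continue  # no valid n-tuple was produced; prompt again
        if debug:
            output_stream("UI-AGENT", ntuple_json)
        send(ntuple_json)


sent: List[str] = []
run_prompt_loop(
    ["d", "John ran into the room", "q"],
    process_input=lambda msg: '{"text": "%s"}' % msg,
    send=sent.append,
    output_stream=lambda tag, msg: print(tag + ": " + msg),
)
print(sent)  # ['{"text": "John ran into the room"}']
```

Interactively, the input source could be `iter(lambda: input("> "), None)`, which keeps calling `input` until 'q' ends the loop.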

process_input(self, msg)

This is the main control method of the UI-Agent. It takes a string as input and feeds it to the ECG Analyzer, passes the resulting SemSpec (or SemSpecs) to the Specializer to produce an n-tuple, and returns a JSON-encoded version of that n-tuple.
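As a sketch of that control flow, assuming the Analyzer yields candidate SemSpecs in order and that the Specializer signals failure by raising an exception, process_input can try each SemSpec until one specializes. `SpecializationError`, `analyze`, and `specialize` are hypothetical stand-ins for the real Analyzer and Specializer calls.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class SpecializationError(Exception):
    """Hypothetical error raised when no n-tuple template matches."""


def process_input(
    msg: str,
    analyze: Callable[[str], List[Dict[str, Any]]],
    specialize: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Optional[str]:
    """Sketch: text -> SemSpec(s) -> n-tuple -> JSON string (or None)."""
    for semspec in analyze(msg):
        try:
            ntuple = specialize(semspec)
        except SpecializationError:
            continue  # this SemSpec did not specialize; try the next one
        return json.dumps(ntuple)
    return None  # no SemSpec produced a valid n-tuple
```

Returning None rather than raising lets the caller's loop simply re-prompt the user when nothing usable comes out of the pipeline.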

process_clarification(self, tag, msg, ntuple)

This method is called when the Problem Solver requests clarification on an under-specified input, such as "the red box" when there are multiple red boxes. In this case, the UI-Agent prompts the user for more input, using the message sent back from the Problem Solver, e.g.:

> which red box?

The user's response is then parsed, specialized, and integrated back into the original n-tuple, which is then sent back to the Problem Solver.
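This document does not specify how the response is folded in. One minimal sketch, assuming n-tuples are nested dictionaries and that the specialized answer contains only the clarified fields, is a recursive merge; `merge_clarification` and the n-tuple keys below are hypothetical.

```python
import copy
from typing import Any, Dict


def merge_clarification(
    original: Dict[str, Any], clarification: Dict[str, Any]
) -> Dict[str, Any]:
    """Recursively fold clarified fields into a copy of the original n-tuple.

    Nested dictionaries are merged key by key; any other value from the
    clarification replaces the original value. The original is not mutated.
    """
    merged = copy.deepcopy(original)
    for key, value in clarification.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_clarification(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


original = {"action": "push", "object": {"type": "box", "color": "red"}}
# Hypothetical n-tuple fragment produced from the answer "the big one".
clarification = {"object": {"size": "big"}}
print(merge_clarification(original, clarification))
# {'action': 'push', 'object': {'type': 'box', 'color': 'red', 'size': 'big'}}
```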

callback(self, ntuple)

All classes that inherit from the Core-Agent have a callback method. When an Agent subscribes to a particular channel, it specifies that this callback method should be called when messages are received on that channel. Multiple callbacks can be defined for different channels.

The Core UI-Agent defines only one callback method, which is used when it subscribes to the "ProblemSolver" channel. For most n-tuples from the Problem Solver, the UI-Agent simply prints the encoded message to the screen using output_stream, for example:

>> FED1_ProblemSolver: Sorry, I don't know what you mean by 'the purple box'.

In the case of a request for clarification, the process_clarification method is called.
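A sketch of this dispatch, assuming the Problem Solver's reply is a dictionary whose "type", "tag", "message", and "ntuple" keys are hypothetical; `UIAgentSketch` is likewise illustrative, and its process_clarification only records the request instead of prompting the user.

```python
from typing import Any, Dict, List


class UIAgentSketch:
    """Hypothetical callback dispatch for replies from the Problem Solver."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def output_stream(self, tag: str, message: str) -> None:
        self.log.append("{}: {}".format(tag, message))

    def process_clarification(
        self, tag: str, msg: str, ntuple: Dict[str, Any]
    ) -> None:
        # A real agent would prompt the user and re-send the updated n-tuple.
        self.log.append("clarify -> " + msg)

    def callback(self, reply: Dict[str, Any]) -> None:
        tag = reply.get("tag", "ProblemSolver")
        message = reply.get("message", "")
        if reply.get("type") == "clarification":
            self.process_clarification(tag, message, reply.get("ntuple", {}))
        else:
            self.output_stream(tag, message)
```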

ECG Analyzer

The ECG Analyzer (Bryant 2008) uses an ECG Grammar to produce a syntactic and semantic analysis of an utterance, called a Semantic Specification (SemSpec). Below is the SemSpec for the sentence "John saw the box". The nested boxes marked with a red "S" denote schemas and their roles (the meaning pole), and the boxes marked with a green "C" denote constructional spans and their features.

(Sample SemSpec for "John saw the box")

In terms of the underlying theory, the SemSpec contains the parameters needed to produce action or simulated action. Details of the parsing procedure are discussed in Bryant's dissertation; for our purposes, the most important point is that the Analyzer does not change across applications: the same "core" Analyzer can be used for any purpose.

Since the ECG Analyzer is written in Java, our system runs it via Jython on a local server and accesses it through a customized Analyzer class.
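The transport between the Python side and the Jython-hosted Analyzer is not specified here. As a minimal sketch, assuming the server exposes `parse` and `issubtype` over XML-RPC, a client-side proxy can be built with Python's standard `xmlrpc` modules. `AnalyzerProxy` and `start_fake_analyzer_server` are hypothetical, and the fake server returns empty results rather than real SemSpecs.

```python
import threading
from typing import Any, Dict
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class AnalyzerProxy:
    """Client-side proxy forwarding calls to a (hypothetical) XML-RPC endpoint."""

    def __init__(self, url: str) -> None:
        self._server = ServerProxy(url, allow_none=True)

    def parse(self, sentence: str) -> Dict[str, Any]:
        return self._server.parse(sentence)

    def issubtype(self, typesystem: str, child: str, parent: str) -> bool:
        return self._server.issubtype(typesystem, child, parent)


def start_fake_analyzer_server() -> SimpleXMLRPCServer:
    """Fake server for local testing only; it is NOT the ECG Analyzer."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_function(
        lambda sentence: {"parse": [], "costs": [], "spans": []}, "parse"
    )
    server.register_function(
        lambda typesystem, child, parent: child == parent, "issubtype"
    )
    # Serve requests in a background thread so the caller can keep going.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_fake_analyzer_server()
    port = server.server_address[1]  # port 0 above lets the OS pick one
    analyzer = AnalyzerProxy("http://127.0.0.1:{}".format(port))
    print(analyzer.parse("John saw the box"))
    server.shutdown()
    server.server_close()
```

Keeping the proxy as a thin wrapper means the rest of the UI-Agent never needs to know that the Analyzer lives in a separate Java process.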

Relevant Methods

parse(self, sentence)

This returns a dictionary containing:

  1. 'parse': a list of candidate parses, each a simplified version of the SemSpec feature structure
  2. 'costs': the unification cost for each of the above parses
  3. 'spans': the constructional spans, which can be used for coreference resolution
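Because 'parse' and 'costs' line up entry by entry, a caller can rank the candidate analyses. A small sketch, assuming the two lists are parallel and that a lower unification cost marks a preferred parse; `ranked_parses` is a hypothetical helper and the sample dictionary is invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def ranked_parses(result: Dict[str, Any]) -> List[Tuple[float, Any]]:
    """Pair each parse with its cost, cheapest first (lower = preferred, assumed)."""
    pairs = list(zip(result["costs"], result["parse"]))
    # Sort on cost only, so the parses themselves never need to be comparable.
    pairs.sort(key=lambda pair: pair[0])
    return pairs


# Hypothetical parse() output; real entries are SemSpec feature structures.
result = {
    "parse": ["analysis-A", "analysis-B"],
    "costs": [2.5, 1.0],
    "spans": [],
}
print(ranked_parses(result)[0])  # (1.0, 'analysis-B')
```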

issubtype(self, typesystem, child, parent)

This returns True if CHILD is a subtype of PARENT in the given TYPESYSTEM, and False otherwise:

>>> analyzer.issubtype("SCHEMA", "MotionPath", "Process")
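The real check runs against the type system loaded from the ECG Grammar. To illustrate what the method computes, here is a sketch over a hypothetical toy hierarchy that walks parent links breadth-first. The parent links below are invented, and treating a type as a subtype of itself is an assumption.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy hierarchy: type -> list of direct parents.
TYPESYSTEMS: Dict[str, Dict[str, List[str]]] = {
    "SCHEMA": {
        "MotionPath": ["Motion"],
        "Motion": ["Process"],
        "Process": [],
    }
}


def issubtype(typesystem: str, child: str, parent: str) -> bool:
    """Return True if PARENT is reachable from CHILD via parent links."""
    parents = TYPESYSTEMS[typesystem]
    queue = deque([child])
    seen = {child}
    while queue:
        current = queue.popleft()
        if current == parent:
            return True
        for p in parents.get(current, []):
            if p not in seen:  # guard against revisiting shared ancestors
                seen.add(p)
                queue.append(p)
    return False


print(issubtype("SCHEMA", "MotionPath", "Process"))  # True
print(issubtype("SCHEMA", "Process", "MotionPath"))  # False
```

A list of parents per type allows for multiple inheritance; the `seen` set keeps the walk finite when several paths reach the same ancestor.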

Core ECG Grammar

The Core ECG Grammar is a collection of constructions, schemas, and ontology items, which, once domain vocabulary is added, can be used to produce semantic analyses of text. The grammar is partitioned into "packages" according to particular grammatical and semantic categories. These packages can then be imported as needed for a given domain.

The grammars are found in the ecg-grammars repository, and much more information about ECG can be found in the corresponding wiki.

Core Specializer

The Core Specializer receives a SemSpec as input and produces an n-tuple as output. The Specializer has its own, considerably more detailed documentation.
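The Specializer's real templates are documented separately. As a rough illustration of the idea only, the sketch below treats a template as a hypothetical mapping from n-tuple slots to key paths in a dictionary-shaped SemSpec and copies over just the relevant values; the template, slot names, and SemSpec shape are all invented.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical n-tuple template: each output slot names a path into the SemSpec.
MOTION_TEMPLATE: Dict[str, Any] = {
    "predicate_type": "command",
    "slots": {
        "action": ("process", "type"),
        "protagonist": ("process", "mover"),
        "direction": ("process", "direction"),
    },
}


def lookup(semspec: Dict[str, Any], path: Sequence[str]) -> Optional[Any]:
    """Follow a key path through nested dicts; return None if it is missing."""
    node: Any = semspec
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def fill_template(template: Dict[str, Any], semspec: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only the template's slots out of the SemSpec into a new n-tuple."""
    ntuple: Dict[str, Any] = {"predicate_type": template["predicate_type"]}
    for slot, path in template["slots"].items():
        ntuple[slot] = lookup(semspec, path)
    return ntuple


# Invented, simplified SemSpec for "Robot1, move north!"; the "discourse"
# entry is not referenced by the template, so it is dropped.
semspec = {
    "process": {"type": "move", "mover": "Robot1", "direction": "north"},
    "discourse": {"speech_act": "imperative"},
}
print(fill_template(MOTION_TEMPLATE, semspec))
# {'predicate_type': 'command', 'action': 'move', 'protagonist': 'Robot1', 'direction': 'north'}
```

Under this design, adding a new domain means adding templates rather than changing the filling code, which matches the expectation that new n-tuple templates are the main Specializer delta.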