Core Language Modules

(Language-side modules)

This documentation is intended to describe the function and implementation of the four primary language-side modules of the NLU system. As mentioned in the system overview, each of these "core" modules can be extended for a given domain, to fulfill any new tasks or requirements for an application. Hopefully, the language-side modules will not require too many changes; we foresee the primary deltas being new n-tuple templates for the Specializer, as well as new vocabulary and ontology items for the ECG Grammar.

Core UI-Agent

The Core UI-Agent (file) modulates input/output interactions with the user. The UI-Agent receives text/speech as input, and produces an n-tuple as output, which it sends to a Problem Solver. Like the Core Problem Solver, the UI-Agent subclasses the Core-Agent module (file), declaring a channel name and subscribing to the "ProblemSolver" channel. This is how a line of communication is established between the language and action sides of the system.
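As a rough sketch of that relationship (class and method names such as CoreAgent, subscribe, and UserAgent are illustrative placeholders rather than the module's actual identifiers):

class CoreAgent(object):
    def __init__(self, name):
        self.name = name              # this agent's own channel name
        self.subscriptions = {}       # channel -> callback

    def subscribe(self, channel, callback):
        self.subscriptions[channel] = callback

class UserAgent(CoreAgent):
    def __init__(self):
        CoreAgent.__init__(self, "UserAgent")
        # Listen on the Problem Solver's channel so its replies reach callback()
        self.subscribe("ProblemSolver", self.callback)

    def callback(self, ntuple):
        print("{}: {}".format(self.name, ntuple))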

Internally, the process of producing an n-tuple from text input consists of several steps:

  1. First, the text is input to the ECG Analyzer. In the current implementation, this runs via Jython on a local server, and the UI-Agent connects to it through a proxy Analyzer.
  2. The ECG Analyzer uses an ECG Grammar to parse the text input, and outputs a syntactic and semantic analysis (SemSpec).
  3. The UI-Agent then feeds the SemSpec to the Core Specializer, which uses n-tuple templates to determine which information from the SemSpec is relevant for a given application domain.

Relevant Methods

output_stream(self, tag, message)

Importantly, the Core UI-Agent interacts with the user with the output_stream method. By default, this simply prints the MESSAGE to the Terminal/Bash window, prefaced by the TAG (e.g. "UI-AGENT"), but system integrators could easily subclass the UI-Agent and define a new mode of user interaction, such as speech or even a GUI.
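A hedged sketch of such a subclass might look like the following (SpeechUIAgent and speak are made up for illustration):

class UIAgent(object):
    def output_stream(self, tag, message):
        # Default behavior: print the MESSAGE, prefaced by the TAG
        print("{}: {}".format(tag, message))

class SpeechUIAgent(UIAgent):
    def output_stream(self, tag, message):
        # Hypothetical override: speak() stands in for whatever
        # text-to-speech or GUI call an integrator chooses
        self.speak(message)

    def speak(self, message):
        pass  # e.g., hand the message to a TTS engine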

prompt(self)

This begins a loop wherein the UI-Agent continually prompts the user for input. This includes "control" input, like 'd' (for debug mode, which prints out n-tuples using the output_stream method), and 'q' (which quits the system), but also normal language input, like "John ran into the room", or "Robot1, push the blue box 3 inches north!"

In the case of language input, the UI-Agent calls process_input, and if a valid JSON n-tuple is produced, it sends the n-tuple to the Problem Solver.
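In simplified form, the loop might be sketched as follows (the debug flag and the send call are illustrative, not the exact mechanism used by the module):

def prompt(self):
    # Simplified sketch of the prompt loop; the real method handles more
    # control commands and error cases.
    while True:
        text = input("> ").strip()        # raw_input in Python 2
        if text == 'q':                   # quit the system
            break
        elif text == 'd':                 # toggle debug output of n-tuples
            self.debug = not self.debug
        elif text:
            ntuple = self.process_input(text)
            if ntuple is not None:        # only forward valid JSON n-tuples
                self.send("ProblemSolver", ntuple)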

process_input(self, msg)

This is the main control method of the UI-Agent. It takes a String as input, which it feeds to the ECG Analyzer. The UI-Agent then passes the resulting SemSpec (or SemSpecs) to the Specializer, to produce an n-tuple, and returns a JSON-encoded version of the n-tuple.
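A minimal sketch of that pipeline, assuming analyzer and specializer attributes that hold the proxy Analyzer and Core Specializer described on this page:

import json

def process_input(self, msg):
    # Sketch only: attribute names are assumptions for illustration.
    full_parse = self.analyzer.parse(msg)          # text -> candidate SemSpecs
    if not full_parse['parse']:
        return None
    ntuple = self.specializer.specialize(full_parse['parse'][0])
    return json.dumps(ntuple)                      # JSON-encoded n-tuple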

process_clarification(self, tag, msg, ntuple)

This method is called when the Problem Solver requests clarification on an under-specified input, such as "the red box" when there are multiple red boxes. In this case, the UI-Agent prompts the user for more input, using the message sent back from the Problem Solver, e.g.:

> which red box?

The user's response is then parsed, specialized, and integrated back into the original n-tuple, which is then sent back to the Problem Solver.
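A rough sketch of that exchange (the 'objectDescriptor' key and the merge step are illustrative; the real method integrates the answer more carefully):

def process_clarification(self, tag, msg, ntuple):
    self.output_stream(tag, msg)                         # e.g. "which red box?"
    answer = input("> ")                                 # prompt the user again
    semspec = self.analyzer.parse(answer)['parse'][0]
    ntuple['objectDescriptor'] = self.specializer.specialize_fragment(semspec)
    self.send("ProblemSolver", ntuple)                   # resend the repaired n-tuple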

callback(self, ntuple)

All classes that inherit from the Core-Agent have a callback method. When an Agent subscribes to a particular channel, it specifies that this callback method should be called when messages are received on that channel. Multiple callbacks can be defined for different channels.

The Core UI-Agent defines only one callback method, which is used when it subscribes to the Problem-Solver channel. In the case of most n-tuples from the Problem Solver, the UI-Agent simply prints the encoded message to the screen using output_stream, such as:

>> FED1_ProblemSolver: Sorry, I don't know what you mean by 'the purple box'.

In the case of a request for clarification, the process_clarification method is called.
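A sketch of that dispatch might look like this (the 'type', 'tag', and 'message' fields are assumed names used only to illustrate the two cases):

def callback(self, ntuple):
    if ntuple.get('type') == 'clarification':
        self.process_clarification(ntuple['tag'], ntuple['message'],
                                   ntuple['ntuple'])
    else:
        self.output_stream(ntuple.get('tag', 'ProblemSolver'),
                           ntuple.get('message'))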

ECG Analyzer

The ECG Analyzer (Bryant 2008) uses an ECG Grammar to produce a syntactic and semantic analysis of an utterance, called a Semantic Specification (SemSpec). Below is the SemSpec for the sentence "John saw the box"; note that the nested boxes with a red "S" denote schemas and their roles (the meaning pole), and the boxes with a green "C" denote constructional spans and their features.

(Sample SemSpec for "John saw the box")

In terms of the underlying theory, this SemSpec contains the parameters to produce action or simulated action. Details of the parsing procedure are discussed in Bryant's dissertation, but for our purposes, the most important aspect is that the Analyzer does not change across applications - the same "core" Analyzer can be used for any purpose.

Since the ECG Analyzer is written in Java, our system runs this via Jython on a local server, using a customized Analyzer class (file).

Relevant methods

parse(self, sentence)

This returns a dictionary containing:

  1. 'parse': a list of potential parses, which are simplified versions of the SemSpec feature-structure
  2. 'costs': the unification cost for each of the above parses
  3. 'spans': the constructional spans, which can be used for coreference resolution
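For illustration, the return value might be used along these lines (assuming, as is typical, that a lower unification cost indicates a better parse):

result = analyzer.parse("John ran into the room")
costs = result['costs']
best = result['parse'][costs.index(min(costs))]   # assuming lower cost = better parse
spans = result['spans']                           # constructional spans for coreference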

issubtype(self, typesystem, child, parent)

This returns a boolean indicating whether CHILD is a subtype of PARENT in the given TYPESYSTEM:

>>> analyzer.issubtype("SCHEMA", "MotionPath", "Process")
True

Core ECG Grammar

The Core ECG Grammar is a collection of constructions, schemas, and ontology items, which, once domain vocabulary is added, can be used to produce semantic analyses of text. The grammar is partitioned into "packages" according to particular grammatical and semantic categories. These packages can then be imported as needed for a given domain.

The grammars are found in the ecg-grammars repository and much more information about ECG can be found in the corresponding wiki.

Core Specializer

As described here (Khayrallah, Trott, & Feldman 2015), the Specializer receives a SemSpec as input and produces an n-tuple as output. An n-tuple contains task-specific semantic information, is focused around action specifications (e.g., move, push, etc.) and their parameters, and functions as a shared communication language between all agents in our Natural Language Understanding system. In terms of implementation, n-tuples are JSON structures mapping shared keys to values.

Below is an n-tuple for the sentence "John saw the box."; note the similarities to the SemSpec above.

(N-tuple for the sentence "John saw the box.")
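In case the image does not render, below is a rough, hand-written approximation of such an n-tuple; the key names and nesting shown here are illustrative only, not the exact output of the Specializer:

{'mood': 'Declarative',
 'eventProcess': {
     'template': 'Perception',        # "saw" treated as a perception-type process (assumed)
     'protagonist': {'objectDescriptor': {'type': 'person',
                                          'referent': 'John'}},
     'content': {'objectDescriptor': {'type': 'box',
                                      'givenness': 'uniquelyIdentifiable',
                                      'number': 'singular'}}}}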

The Core Specializer uses n-tuple templates to determine which aspects of the SemSpec to extract. More information about the actual design of n-tuple templates can be found on the page describing the Core Communication Modules. This section is dedicated to describing the process by which the Core Specializer produces an n-tuple, and the methods used to do this.

The Core Specializer file can be found here, and contains additional documentation on the methods.

Below is a description of the most important methods in the CoreSpecializer, as well as a walkthrough of how an n-tuple is produced for the sentence "John saw the box."

Relevant methods

specialize(self, fs)

This is the Specializer's main entry point. It takes a SemSpec, or "feature structure" (fs), as input, and outputs an n-tuple. Of course, a considerable amount of processing goes on between the call to specialize and the output of an n-tuple.

First, the Core Specializer checks whether the SemSpec is an utterance with discourse information; if it's not (e.g., a sentence fragment like "the red box"), the Specializer calls specialize_fragment (see below), and produces a fragmented n-tuple.

Otherwise, the Specializer identifies the "mood" of the utterance using Discourse information (e.g., "Declarative", "Imperative", etc.), identifies the corresponding mood template, then routes the "content" of the utterance to the specialize_event method.
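A condensed sketch of that dispatch; is_utterance, get_mood, get_content, and the 'eventDescriptor' key are stand-ins for however the real code inspects the SemSpec and names its slots:

def specialize(self, fs):
    if not self.is_utterance(fs):               # e.g. a fragment like "the red box"
        return self.specialize_fragment(fs)
    mood = self.get_mood(fs)                    # "Declarative", "Imperative", ...
    ntuple = dict(self.mood_templates[mood])    # start from the mood template
    ntuple['eventDescriptor'] = self.specialize_event(self.get_content(fs))
    return ntuple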

specialize_event(self, content)

This takes in an EventDescriptor as input, and produces an n-tuple describing that event. Again, this consists of multiple component steps, but at the highest level, the method identifies the corresponding template for the type of EventDescriptor using the event templates. In most cases, this is a normal EventDescriptor, but in the case of conditional statements, it is a "ConditionalED".

Then, for each key/value pairing in the event template, the Core Specializer calls fill_value to fill in the template.
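In sketch form (event_templates and get_type are illustrative names for the template lookup described above):

def specialize_event(self, content):
    # event_templates maps EventDescriptor types (e.g. "EventDescriptor",
    # "ConditionalED") to declarative templates.
    template = self.event_templates[self.get_type(content)]
    event = {}
    for key, value in template.items():
        event[key] = self.fill_value(key, value, content)   # fill each template slot
    return event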

fill_value(self, key, value, input_schema)

This is one of the most important methods in the Core Specializer, since it defines the procedures by which the declarative templates guide the Specializer's actions. The method takes as input a key name, the template value, and the schema from which to extract the information. A series of conditions is then evaluated: the template value determines how the final output for this key should be represented in the n-tuple, while the key corresponds to the same-named role in the schema.

For example, if the key is "eventProcess", and the value is the dictionary...

{'parameters': 'eventProcess'}

...the CoreSpecializer knows to call the fill_parameters method (see below) on the contents of the eventProcess role.

If the key is "protagonist", and the value is the dictionary...

{'descriptor': 'objectDescriptor'}

...the CoreSpecializer knows to call the get_objectDescriptor method (see below) on the contents of the protagonist role.
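Putting the two cases together, the dispatch might be sketched as follows (getattr() stands in for however roles are actually read off the SemSpec schema; only the two cases discussed above are shown):

def fill_value(self, key, value, input_schema):
    role = getattr(input_schema, key, None)          # the same-named role in the schema
    if isinstance(value, dict):
        if 'parameters' in value:
            return self.fill_parameters(role)        # e.g. the eventProcess role
        if 'descriptor' in value:
            return self.get_objectDescriptor(role)   # e.g. the protagonist role
    return value                                     # literal template values pass through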

fill_parameters(self, eventProcess)

This method identifies the corresponding parameter template for the given eventProcess, and fills in its values with information from that schema.

get_objectDescriptor(self, item, resolving=False)

This method identifies the corresponding descriptor template (in this case, the objectDescriptor template). In our system, objectDescriptors are general descriptions of referents in the SemSpec (an "RD", or "Referent Descriptor"). The Core Specializer has no world model, so it can't actually determine a real-world referent, but it can package the information in a simple, accessible way, so that the Problem Solver can determine the real-world referent (or, in some cases, request clarification).

Besides simply filling in the values from the RD (ontological-category, givenness, gender, etc.), this method performs two key functions:

  1. Pointer inversion: it crawls the SemSpec and finds modifiers, such as adjectives or prepositional-phrases, that point to a given RD, and then incorporates this information into an objectDescriptor.
  2. Referent resolution: in the case of a pronoun or one-anaphora, the Core Specializer searches through its stack of previous referents, and attempts to unify the current objectDescriptor with a previous referent.

For example, "the red box" might output an objectDescriptor that resembles the following:

{'color': 'red',
 'type': 'box',
 'givenness': 'uniquelyIdentifiable',
 'number': 'singular'}
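As a toy illustration of the referent-resolution step (the function name and the unification test are simplifications of what the Core Specializer actually does):

def resolve_referent(descriptor, previous_referents):
    # Try to unify the current descriptor with the most recent compatible
    # referent on the stack; return the merged descriptor if one is found.
    for previous in reversed(previous_referents):
        if all(previous.get(k) == v for k, v in descriptor.items() if k in previous):
            merged = dict(previous)
            merged.update(descriptor)                # newer values take precedence
            return merged
    return descriptor                                # leave unresolved for the Solver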

specialize_fragment(self, fs)

This specializes the SemSpec for a sentence fragment, such as "the red one", or another non-discourse utterance. The Core Specializer has procedures built in for the majority of the potential meanings in the core grammar. However, system integrators might want to subclass the Core-Specializer and extend this method to cover domain-specific meanings as well.
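For example, a domain Specializer for a robot application might be subclassed along these lines (the class name, import path, is_type check, and "LocationDescriptor" meaning are all hypothetical):

from core_specializer import CoreSpecializer   # module path is illustrative

class RobotSpecializer(CoreSpecializer):
    def specialize_fragment(self, fs):
        # Hypothetical domain-specific meaning, e.g. a bare location phrase
        if self.is_type(fs, "LocationDescriptor"):
            return self.get_locationDescriptor(fs)
        # Otherwise fall back to the core behavior
        return CoreSpecializer.specialize_fragment(self, fs)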