In this chapter, we review the body of work most related to this dissertation, and to
the larger goal of extracting knowledge from text.
We cover related work on extracting structured knowledge bases from text
(knowledge base population) and on open information extraction,
as two methods for extracting structured and semi-structured knowledge from
text.
We then review work on common-sense reasoning -- one of the main goals of this dissertation.
Lastly, we review work on textual entailment, which relates to the natural logic
proof system underlying this dissertation, and conclude by reviewing work on question answering.
%
% KBP + RE
%
\Section{related-kbp}{Knowledge Base Population}
Knowledge base population (KBP) is the task of taking a large body of unstructured text,
and extracting from it a structured knowledge base.
Importantly, the knowledge base has a fixed schema of relations (e.g.,
\ww{born in}, \ww{spouse of}), usually with associated type signatures.
These knowledge bases can then be used as a repository of knowledge for
downstream applications -- albeit restricted to the schema of the knowledge base itself.
In fact, many downstream NLP applications do query large knowledge bases.
Prominent examples include
question answering systems
\cite{key:2001voorhees-trec},
and semantic parsers
\cite{key:1996zelle-semantics,key:2007zettlemoyer-semantics,key:2013kwiatkowski-semantics,key:2014berant-semantics}.
Knowledge base population is in many ways in line with the spirit of this dissertation:
in both cases, the goal is to extract true facts about the world.
However, in the case of knowledge base population, these facts are structured bits of information -- e.g.,
subject/relation/object triples -- rather than the open-domain, schemaless information that we
extract in the systems described in this dissertation.
% Talk about relation extraction
Prior work in this area can be categorized into a number of approaches.
The most common of these are \textit{supervised} relation extractors
\cite{key:2004doddington-ace,key:2005zhou-ace,key:2007surdeanu-ace},
\textit{distantly supervised} relation extractors
\cite{key:1999craven-distsup,key:2007wu-distsup,key:2009mintz-distsup,key:2011sun-kbp},
and rule based systems
\cite{key:1997soderland-kbp,key:2010grishman-kbp,key:2010chen-kbp}.
% this is a classification problem
Relation extraction can be naturally cast as a supervised classification problem.
A corpus of annotated relation mentions is collected,
and each of these mentions $x$ is annotated
with the relation $y$, if any, it expresses. The classifier's outputs
are then aggregated to decide the relations holding between each pair of entities.
% costly to annotate
However, annotating supervised training data is generally
expensive to perform at large scale.
Although resources such as Freebase or the TAC KBP knowledge base
have on the order of millions of training tuples over
entities, it is not feasible to manually annotate the
corresponding mentions in the text.
This has led to the rise of \textit{distantly supervised} methods, which
make use of this indirect supervision, but do not
necessitate mention-level supervision.
\Fig{img/relation-extraction-setup}{0.5}{re-intro}{
A standard setup for relation extraction for knowledge base population.
}{
The distantly supervised relation extraction setup.
For a pair of entities, we collect sentences which
mention both entities.
These sentences are then used to predict one or more relations between
those entities.
For instance, the sentences containing both \textit{Barack Obama}
and \textit{Hawaii} should support the \rel{state of birth} and
\rel{state of residence} relations.
}
Traditional distant supervision works with the assumption that
for every triple $(e_1, y, e_2)$ in a knowledge base between
a subject $e_1$, a relation $y$, and an object $e_2$, every sentence
containing mentions of $e_1$ and $e_2$ expresses the relation $y$;
exceptions to this assumption are then treated as noise in the training data.
% Example
For instance, taking \reffig{re-intro}, we would create a relation mention $x$ for each
of the three sentences containing \ent{Barack Obama} and \ent{Hawaii}
labeled with \rel{state of birth}, and likewise with
the relation $y=$\rel{state of residence}, creating 6 training examples overall.
Similarly, both sentences involving \textit{Barack Obama} and
\textit{president} would be marked as expressing the \rel{title}
relation.
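To make this labeling scheme concrete, the following sketch (a minimal
illustration with assumed input formats, not the implementation of any of the
cited systems) generates mention-level training examples from a knowledge base
of triples:
\begin{verbatim}
# kb: set of (subject, relation, object) triples, e.g.,
#     ("Barack Obama", "state of birth", "Hawaii").
# sentences: iterable of (text, entities) pairs, where
#     entities is the set of entity mentions in the sentence.
def distant_supervision_examples(kb, sentences):
    examples = []
    for text, entities in sentences:
        for e1, relation, e2 in kb:
            # Naive core assumption: every sentence mentioning
            # both entities expresses every known relation
            # between them.
            if e1 in entities and e2 in entities:
                examples.append((text, e1, e2, relation))
    return examples
\end{verbatim}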
% Downsides
While this allows us to leverage a large database effectively, it
nonetheless makes a number of na\"{\i}ve assumptions.
First -- explicit in the formulation of the approach -- it assumes that every
mention expresses some relation,
and furthermore expresses the known relation(s).
For instance, the sentence \w{Obama visited Hawaii} would be erroneously
treated as a positive example of the \rel{born in} relation.
Second, it implicitly assumes that our knowledge base is complete:
entity mentions with no known relation are treated as negative examples.
% How these downsides are addressed
The first of these assumptions is addressed by
multi-instance multi-label (MIML) learning, which introduces an intermediate latent variable
for each sentence-level prediction; these latent predictions must then, in aggregate,
predict the correct knowledge base triples \cite{key:2012surdeanu-mimlre}.
\newcite{key:2013min-incomplete} address the second assumption by extending the
MIML model with additional latent variables modeling the uncertainty over our presumed negatives,
while \newcite{key:2013xu-filling}
allow feedback from a coarse relation extractor to augment labels from the knowledge base.
These latter two approaches are compatible with, but not implemented in,
the work described in this dissertation.
% -- KBC
Lastly, there are approaches to inferring new facts in a knowledge base that do not
make use of text at all, but rather aim to fill in missing facts in the knowledge
base based on patterns found in known facts.
For example, in the simplest case, if we know that a person usually lives in the state
of his employer, we can with high confidence predict a missing state of residence if
we know a person's employer, and the employer's state of headquarters.
% Richard
A natural technique for this is to embed entities and relations into an
appropriate vector space, and predict which relations should hold
between entities given the structure of this space.
For example,
\newcite{key:2011bordes-completion} and
\newcite{key:2012jenatton-completion} present knowledge base completion models based on vector-space
approaches; these are subsequently extended by \newcite{key:2013chen-completion} to
use Neural Tensor Networks, predicting unseen relation triples in
WordNet and Freebase.
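Concretely, the Neural Tensor Network scores a candidate triple $(e_1, R, e_2)$
by combining a relation-specific bilinear term with a standard feed-forward term;
schematically (a simplified rendering of the formulation in \newcite{key:2013chen-completion}):
\begin{equation*}
g(e_1, R, e_2) = u_R^T \tanh\left( e_1^T W_R^{[1:k]} e_2 + V_R [e_1; e_2] + b_R \right)
\end{equation*}
where $e_1$ and $e_2$ are learned entity vectors, $W_R^{[1:k]}$ is a tensor whose
$k$ slices each contribute one entry of a $k$-dimensional vector, and $V_R$, $u_R$,
and $b_R$ are parameters specific to the relation $R$; triples scoring above a
threshold are predicted to hold.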
% Universal schemas
\newcite{key:2012yao-schemas} and \newcite{key:2013riedel-schemas}
present a related line of work, inferring new relations between
Freebase entities via inference over both Freebase and
OpenIE relations (see \refsec{related-openie}).
This work appeals to low-rank matrix factorization methods:
in the simplest case, entity pairs form the rows of a matrix, and possible relations
form the columns.
The entries of the matrix are 1 if a given relation holds between the given entity pair,
and 0 otherwise.
This matrix is then factored into a low-rank approximation, in effect trying to compress
the information in the knowledge base into a more concise representation.
Since this is an approximate factorization, some entries in the matrix which were previously
0's now have a non-zero value -- these are taken to be new facts in the knowledge base.
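As a minimal sketch of this idea -- using a plain truncated SVD in place of the
learned factorization models of the cited work, and a toy matrix in place of a
real knowledge base -- candidate new facts can be read off the low-rank
reconstruction:
\begin{verbatim}
import numpy as np

# Rows: entity pairs; columns: relations.
# M[i, j] = 1 if relation j is known to hold for entity pair i.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# Rank-2 approximation via truncated SVD.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Entries that were 0 but are reconstructed with a high
# score are candidate new facts.
candidates = np.argwhere((M == 0) & (M_hat > 0.5))
\end{verbatim}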
%
% OPEN IE
%
\Section{related-openie}{Open Information Extraction}
% Talk about OpenIE
One approach to broad-domain knowledge extraction is \textit{open information extraction}
(Open IE; \newcite{key:2007yates-textrunner}).
Traditional relation extraction settings usually specify a domain of relations they are
interested in (e.g., place of birth, spouse, etc.), and usually place restrictions on the
types of arguments extracted (e.g., only people are born in places).
Open IE systems generalize this setting to the case where both the relation and the arguments
are represented as plain text, and therefore can be entirely open-domain.
For example, in the sentence \ww{the president spoke to the Senate on Monday}, we might extract
the following triples:
\begin{displayquote}
(\ww{president}; \ww{spoke to}; \ww{Senate}) \\
(\ww{president}; \ww{spoke on}; \ww{Monday})
\end{displayquote}
This representation is a step closer to being able to store open-domain information
and common sense facts, and can be thought of as a sort of compromise between
the more structured knowledge base population work, and this dissertation's stance
of representing facts entirely as unstructured text.
One line of work in this area consists of the early UW OpenIE systems: for example,
TextRunner \cite{key:2007yates-textrunner} and
ReVerb \cite{key:2011fader-reverb}.
In both of these cases, an emphasis is placed on \textit{speed} --
token-based surface patterns are extracted that would correspond to
open domain triples.
ReVerb, for instance, heuristically extracts maximally expanded relation phrases
centered around a main verb, and attempts to find the best arguments
for each relation subject to certain constraints.
A confidence function is then trained over a set of features on the extractions, so that
the system can provide calibrated confidence values.
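As a rough illustration of ReVerb's extraction style -- a simplification of its
syntactic constraint, which additionally enforces a lexical constraint and
maximal expansion -- candidate relation phrases can be matched with a pattern
over coarse part-of-speech classes:
\begin{verbatim}
import re

# Simplified ReVerb-style pattern:  V | V P | V W* P
# V: verb; W: noun/adjective/adverb/pronoun/determiner;
# P: preposition, particle, or infinitive marker.
RELATION = re.compile(r"V+(W*P)?")

def is_relation_phrase(pos_tags):
    """pos_tags: Penn Treebank tags of a candidate phrase."""
    def coarse(tag):
        if tag.startswith("VB"):
            return "V"
        if tag in ("IN", "RP", "TO"):
            return "P"
        if tag[:2] in ("NN", "JJ", "RB", "PR", "DT"):
            return "W"
        return "X"
    pattern = "".join(coarse(t) for t in pos_tags)
    return RELATION.fullmatch(pattern) is not None
\end{verbatim}
For example, the tags for \ww{spoke to} (\texttt{VBD IN}) map to \texttt{VP} and
match, while a bare noun phrase does not.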
With the introduction of fast dependency parsers,
\newcite{key:2010wu-openie} extract triples from
learned dependency patterns (in one mode), or from POS-tagged surface features (for additional
speed).
Building upon this, Ollie \cite{key:2012mausam-ollie} also learns patterns from dependency
parses, but with the additional contributions of (1) allowing for extractions which are
mediated by nouns or adjectives, not just verbs; and (2) considering context more carefully
when extracting these triples.
Exemplar \cite{key:2013mesquita-exemplar} adapts the open IE framework to
$n$-ary relationships similar to semantic role labeling, but without the
associated expensive machinery of a full-blown semantic role labeling (SRL) system.
Like Exemplar, OpenIE 4 predicts $n$-ary relations (as well as nominal relations) by including
a lightweight SRL system.
In another line of work, the Never-Ending Language Learning project (NELL) \cite{key:2010carlson-nell}
iteratively learns new facts from the internet,
starting from a seed set of examples.
In the case of NELL, the ontology is open-domain but fixed, and the goal becomes to learn
all entries in the domain.
For example, NELL learns an extended hypernymy tree (\ww{Post University} is a University),
but also more general relations (\ww{has acquired}, \ww{publication writes about}, etc.).
Open IE has been shown to be useful in a number of downstream NLP tasks.
One line of work makes use of these open-domain triples to perform
the fixed-schema relation extraction we discussed in \refsec{related-kbp}.
For example, \newcite{key:2013soderland-kbp} construct a mapping from open-domain
to structured triples in only 3 hours of human effort, and achieve results
which are competitive with (and higher precision than) custom-trained
extractors.
\newcite{key:2010soderland-adapting} use ReVerb extractions to
enrich a domain-specific ontology.
Another line of work makes use of these triples in the context of knowledge
base completion, as alluded to earlier \cite{key:2012yao-schemas,key:2013riedel-schemas}.
In this case, we are learning a sort of soft mapping from Open IE triples to
structured triples implicitly in the matrix factorization.
In a related vein, OpenIE has been used directly for learning entailment patterns.
For example, \newcite{key:2010-schoenmackers-horn} and \newcite{key:2011berant-entailment}
learn valid open-domain entailment patterns from OpenIE extractions.
\newcite{key:2011berant-entailment}, for instance, presents a method for learning
that the relation \ww{be epidemic in} entails \ww{common in}, which in turn
entails \ww{occur in}.
Another prominent use-case for open domain triples is
question answering and search \cite{key:2014fader-openqa,key:2011etzioni-nature}.
In \newcite{key:2014fader-openqa}, a set of question paraphrases is mined from a large
question answering corpus; these paraphrases are then applied to a new question until
it matches a known Open IE relation.
In each of these cases, the concise extractions provided by Open IE allow
for efficient symbolic methods for entailment, such as Markov logic
networks or matrix factorization.
%
% COMMON SENSE
%
\Section{related-commonsense}{Common Sense Reasoning}
% -- GOFAI
% (intro)
The goal of tackling common-sense reasoning is by no means novel in
itself.
Among others, work by \newcite{key:1980reiter-logic} and \newcite{key:1980mccarthy-circumscription}
attempts to reason about the truth of a consequent in the absence of strict logical entailment.
Reiter introduces a notion of \textit{default reasoning}, where facts and entailments can be
taken as true unless there is direct evidence to the contrary.
For example, \ww{birds fly} ($\textrm{bird}(x) \Rightarrow \textrm{flies}(x)$) is a statement which
is clearly false in the strictest sense -- penguins and ostriches don't fly -- but nonetheless
a statement that we would consider true.
Reiter argues that common-sense statements of this sort should be considered true by default in
a formal reasoning engine, unless there is explicit evidence against them.
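In Reiter's notation, this example is written as a default rule, read as: if $x$
is a bird, and it is consistent to assume that $x$ flies, then conclude that $x$
flies:
\begin{equation*}
\frac{\textrm{bird}(x) ~:~ \textrm{flies}(x)}{\textrm{flies}(x)}
\end{equation*}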
McCarthy makes a related argument regarding circumscription: that in the absence of evidence to the contrary,
we should assume that all prerequisites for an action are met.
For instance, if someone claims to have a rowboat, we should assume that they have oars, the oars
fit into the rowlocks, the boat is without holes, etc.; unless explicitly told otherwise.
In another line of work, \newcite{key:1989pearl-probabilistic} presents a framework for
assigning confidences to inferences which can be reasonably assumed.
More recently, work by \newcite{key:2002schubert-commonsense} and
\newcite{key:2009durme-commonsense} approaches common-sense reasoning
with \textit{episodic logic}.
Episodic logic is a first-order logic which focuses primarily on time-bounded
situations (events and states), rather than the time-insensitive propositions
of conventional first-order logic.
For example, a sentence \ww{John kicked Pluto} could be interpreted as:
\begin{equation*}
\exists e1 : [ e1 ~ \textrm{before Now1} ] \left[ [ \textrm{John kick Pluto} ] **~ e1 \right]
\end{equation*}
That is, there is an event $e1$ that happened before now (the reference time), and the
sentence \ww{John kick Pluto} characterizes or fully describes $e1$.
The operator $**$ has an analogous operator $*$ for cases where the statement (or formula)
is \textit{true} in the event, but does not fully characterize it.
For example:
\begin{equation*}
\exists e1 : [ e1 ~ \textrm{before Now1} ] \left[ [ \textrm{John has a leg} ] *~ e1 \right]
\end{equation*}
Efforts like MIT's ConceptNet \cite{key:2011tandon-conceptnet} and Cyc \cite{key:1995lenat-cyc}
aim to create a comprehensive, unified knowledge base of common-sense facts.
For instance, ConceptNet has facts like (\ww{learn}; \ww{motivated by goal}; \ww{knowledge}).
These facts are collected from a number of sources, both automatic (e.g., Open IE) and hand-curated
(e.g., WordNet).
Cyc catalogs, in a structured way, facts like: for every chordate, there exists a female animal
which is described by the predicate \textit{mother}.
YAGO \cite{key:2007suchanek-yago} is a popular knowledge base of roughly 5M facts covering
both Freebase-style factoids and some common-sense topics.
The Component Library \cite{key:2001barker-component} is a framework for aggregating these sorts of
broad-domain knowledge bases, including relations between entities and events (e.g.,
\ww{happens before}).
This is, of course, not an exhaustive list.
All of these knowledge bases are based on significant hand-curation.
The systems described in this dissertation differ from this earlier work in a few key respects:
First, the facts inferred by the system are not hand coded, but rather latent in a large
unstructured body of text.
Second, we do not appeal to any formal logical language beyond the syntax of the text we are processing.
Third, we operate at scale: our goal is to store and reason about hundreds of millions of facts (or more),
rather than capturing a deep understanding of any particular premise.
%
% RTE
%
\Section{related-rte}{Textual Entailment}
% -- RTE
This dissertation is in many ways related to work on
recognizing textual entailment (RTE) -- e.g.,
\newcite{key:2010-schoenmackers-horn}, \newcite{key:2011berant-entailment}.
Textual entailment is the task of determining whether a given premise sentence
entails a given hypothesis.
That is, whether, without additional context, a human would infer that the hypothesis
is true if the premise is true.
For instance:
\begin{displayquote}
\ww{I drove up to San Francisco yesterday} \\
\ww{I was in a car yesterday}
\end{displayquote}
Although the definition of entailment is always a bit fuzzy -- what if I drove a train
up to SF, or perhaps a boat? -- nonetheless a reasonable person would assume that if
you drove somewhere you were in a car.
This sort of reasoning is similar to the goal of this dissertation: given premises, infer
valid hypotheses to claim as true.
However, in RTE the premise set tends to be very small (1 or 2 premises), and the domain
tends to have less of a focus on common-sense or broad domain facts.
Work by \newcite{key:2013lewis-entailment}
approaches entailment by constructing a CCG parse of the query,
while mapping questions which are paraphrases of each other to the
same logical form using distributional relation clustering.
\newcite{key:2008hickl-rte} approaches the problem by segmenting the premise and
the hypothesis into a set of \textit{discourse commitments} that a reader would
accept upon reading the sentence.
This can then be used to determine whether the commitments in the premise match those in
the hypothesis.
For example, the following premise, taken from the RTE-3 challenge:
\begin{displayquote}
``The Extra Girl'' (1923) is a story of a small-town girl, Sue Graham (played by Mabel Normand) who comes to
Hollywood to be in the pictures.
This Mabel Normand vehicle, produced by Mack Sennett, followed earlier
films about the film industry and also paved the way for later films about Hollywood, such as King Vidor's
``Show People'' (1928).
\end{displayquote}
would yield the commitments:
\begin{displayquote}
T1. ``The Extra Girl'' [took place in] 1923. \\
T2. ``The Extra Girl'' is a story of a small-town girl. \\
T3. ``The Extra Girl'' is a story of Sue Graham. \\
T4. Sue Graham is a small-town girl. \\
T5. Sue Graham [was] played by Mabel Normand. \\
$\dots$ \\
T10. ``The Extra Girl'' [was] produced by Mack Sennett. \\
T11. Mack Sennett is a producer.
\end{displayquote}
\noindent This could then be used to easily justify the hypothesis:
\ww{``The Extra Girl'' was produced by Sennett}.
% Natural Logic
This dissertation focuses considerably on natural logic (more background on natural logic is given in \refchp{natlog}).
The primary application of natural logic in prior work has been
for textual entailment and related fields.
For example, work by MacCartney
\cite{key:2007maccartney-natlog,key:2008maccartney-natlog,key:2009maccartney-natlog}
has applied natural logic to the RTE challenges described above.
This work approaches the entailment task by inducing an alignment between the premise and the
hypothesis, classifying each aligned edit into one of the natural logic relations, and then inferring
from this alignment and the semantics of natural logic the correct entailment
relation between the sentences.
This line of work has been extended since in, e.g., \newcite{key:2012watanabe-natlog}.
Although this dissertation adopts the same formalism as the earlier textual entailment work,
it differs substantially in that the task is less about classifying whether a premise entails
a consequent, and more about finding a supporting premise in a very large collection of
candidates.
More recently, work by \newcite{key:2013bowman-natlog}
(and \newcite{key:2015bowman-natlog}) has explored using vector spaces
as a meaning representation for natural language semantics.
Entailment was chosen as a representative task to evaluate natural language
understanding, and since natural logic is inherently lexical, it was chosen
as the formalism against which to test these models.
Results from these papers show that, indeed, appropriately composing
vector-space representations of words can yield systems which capture a
notion of entailment with high accuracy.
Since the discontinuation of the RTE challenges, there have been a number of
new efforts in creating datasets for textual entailment against which
systems can be trained.
For example, \newcite{key:2014marelli-sick} produce a dataset of entailment
pairs from image captions which is an order of magnitude larger than the RTE
datasets.
Subsequently, \newcite{key:2015bowman-natlog} created a very large corpus
of human-annotated entailment pairs, again from image captions.
This dataset has since been used as the de facto large-scale dataset on which
to evaluate entailment models, and most prominently neural sequence models for
entailment.
For example, \newcite{key:2015rocktaschel-snli} construct a recurrent neural network
with attention for entailment; \newcite{key:2016long-snli} and others
follow up on that work with a new attention mechanism.
%
% QA
%
\Section{related-qa}{Question Answering}
Many parts of this dissertation are cast as question answering tasks.
Although the methods are different, the goal is the same: given a body of text,
be able to answer meaningful questions about the text -- in our case, simply
whether a new candidate fact is true or false.
Related work on question answering can be segmented into two broad categories:
Early work focused on question answering directly over the text of a
source corpus.
For example, taking as input a Wikipedia article, and using that text to answer
questions about the article's subject.
Later work, in contrast, focuses on using structured knowledge bases for question
answering with an emphasis on moving in the direction of
complicated compositional queries.
% Direct over text Q/A
A representative task in the first line of work -- making direct use of text
for question answering -- is the TREC Q/A competitions
\cite{key:2001voorhees-trec,key:2006voorhees-trec,key:2007dang-trec,key:2008voorhees-trec}.
An early example of such a system is Textract \cite{key:1999srihari-trec}, based around
a pipeline of information extraction components.
The system would take a question (e.g., \ww{who is the president of the United States?})
and convert it into a target named entity type (i.e., PERSON), and a set of keywords
(i.e., \ww{president}, \ww{United States}).
The keywords are then fed into a query expansion module to expand the domain of candidate
premises returned, and the resulting sentences are scanned for an entity of the right
named entity type based on some heuristics.
This can be cast as a simplistic version of the two-stage Q/A approach of running information
retrieval (IR), and then classifying the results into whether they are an answer to the
question.
Subsequent work has expanded on this sort of IR+classify approach to, e.g., use language
modeling to assist in reranking results \cite{key:2006chen-trec}, incorporate
more sophisticated methods for learning to rank \cite{key:2007cao-trec},
and incorporate additional knowledge sources such as Wikipedia \cite{key:2004ahn-trec}.
% -- Logic-based QA
There has also been work on incorporating more structured logical
reasoning for question answering.
The COGEX system \cite{key:2003moldovan-trec} incorporates a theorem
prover into a QA system, boosting overall performance.
Similarly, Watson \cite{key:2010ferrucci-watson} incorporates
logical reasoning components.
In COGEX, an input sentence is first split into a shallow logical form -- fundamentally
similar to the Open IE approaches described above -- using a set of 10 rules to cover
most common predicate cases.
This shallow logical form is then expanded via the WordNet ontology, and then added
to the set of axioms known to the theorem prover (alongside \textit{a priori} knowledge).
Empirical results show that this system could answer 206 of 500 TREC questions, 98 of which
(20\%) were not answered by the shallow Q/A module.
In a new spin on logical methods for question answering,
\newcite{key:2015hixon-aristo} propose
a dialog system to augment a knowledge graph used for answering science exam
questions.
This is in a sense an oracle measure, where a human is consulted while answering
the question to substantially improve accuracy.
Furthermore, they show that their additional extractions help
answer questions other than the one the dialog was collected for;
that is, human-in-the-loop Q/A improves accuracy even on unseen questions.
The other main line of work uses structured knowledge bases for question answering.
Much of this work is in the tradition of semantic parsing
\cite{key:2005kate-semantics,key:2005zettlemoyer-semantics,key:2011liang-semantics}.
An input query is parsed into a structured logical representation; this representation
can then be run against a knowledge base to retrieve the answer to the query.
For example, \newcite{key:2005zettlemoyer-semantics} and \newcite{key:2011liang-semantics}
parse complex, compositional geographic
queries into a logical form that can be evaluated against a database of geographic facts.
A complex sentence like:
\begin{displayquote}
\ww{What states border the state that borders the most states?}
\end{displayquote}
would be parsed into a logical form (effectively, a structured query) like:
\begin{equation*}
\lambda x . state(x) \land borders(x, \argmax( \lambda y . state(y), \lambda y . count (\lambda z . state(z) \land borders(y, z))))
\end{equation*}
\newcite{key:2005zettlemoyer-semantics} approach the problem by learning a synchronous
combinatory categorial grammar (CCG) \cite{key:2004bos-ccg}
parse over lexical items and logical form fragments,
given a fixed grammar.
Until convergence, the algorithm performs the following two steps:
\begin{itemize}
\item Expand the set of lexical mappings
known to the algorithm (e.g., \ww{state} means $\lambda x . state(x)$).
This is done by performing the following on each example in the training set:
(1) Generate all possible lexical mappings, given the logical form and surface
form of the example.
(2) Parse the example, given the current learned parameters.
(3) Add to the set of lexical mappings all lexical mappings in the parsed
logical form.
\item Learn a new parameter vector for the parser, given the lexicon so far and the
training set.
\end{itemize}
Work by \newcite{key:2011liang-semantics} builds on this approach by removing the need for
an annotated logical form, and instead running a two-step EM-like learning algorithm
(on a fixed context-free grammar rather than a CCG grammar):
First, a set of possible parses for each example is induced given the current parameter vector.
The parses which evaluate to the correct \textit{answer} are treated as correct, and are re-weighted
in proportion to their original scores.
This is then used as a training signal to learn a new parameter vector.
This same type of approach can also be applied to broader, more practical types of questions.
For example, \newcite{key:2013berant-sempre} apply a similar approach to answering
questions about entities in Freebase.
In this case, utterances are factoid-style Q/A questions similar to the TREC questions;
e.g., \ww{what college did Obama go to?}.
This is then parsed into a logical form, and executed against the Freebase knowledge graph
to produce the desired answer: Columbia and Harvard.
Subsequent work has also applied semantic parsing to even more broad-domain representations.
For example, work by \newcite{key:2009artzi-amr} uses semantic parsing methods to parse into
the Abstract Meaning Representation \cite{key:2013banarescu-amr} -- a
broad-domain structured meaning representation.
By the mid-2010s, neural methods gained popularity as a means of bypassing a meaning representation
altogether in favor of end-to-end learning of semantics in a vector space.
For example, \newcite{key:2014bordes-qa} apply neural methods
to embed knowledge base fragments and natural language questions in a shared vector space,
and answer questions by scoring candidate answers in that space.
\newcite{key:2015hermann-qa} tackle a reading comprehension task -- similar in spirit
to the KBP task described earlier -- using a neural network to ``read'' the supporting
paragraph and answer questions about it.
In this case, a recurrent neural network is trained to consume a paragraph of text, and then
consume a natural language question.
While consuming the question, the model is able to use soft attention to look at specific tokens
in the supporting paragraph, which can help it in its decision.
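A minimal sketch of this attention computation -- illustrative only, with a
simple dot-product score standing in for the scoring network of
\newcite{key:2015hermann-qa} -- is:
\begin{verbatim}
import numpy as np

# passage_states: (num_tokens, dim) RNN outputs, one per token.
# question_state: (dim,) vector summarizing the question.
def attend(passage_states, question_state):
    # Score each passage token against the question.
    scores = passage_states @ question_state
    # Softmax over passage tokens.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Attention-weighted summary of the passage.
    return weights @ passage_states
\end{verbatim}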
Although these methods show early promise, at the time of writing it remains to be seen
whether a vector space can capture the level of semantics necessary to perform
open-domain reasoning at the level that we as humans perform it.
However, this line of work is exciting, in no small part because it offers an alternative
meaning representation and reasoning technique to that presented in this dissertation.
We have reviewed the main lines of work supporting this dissertation.
Some of these -- e.g., knowledge base population, Open IE, and question answering -- are
alternative approaches to extracting knowledge from text.
Others -- e.g., textual entailment and common-sense reasoning -- are tools and sources of insight
which will be useful in our effort.
We will now review natural logic -- the central formalism behind much of the work described here --
and subsequently describe the systems that have been built to tackle common sense reasoning and
question answering.