Patch extractor #186

AndreG-P · 2020-10-29T04:00:38Z

Updates and patches the extractor to use MOI rather than single identifiers.

Since MathMLTools is published on Maven Central, it is easier to patch minor problems directly within a submodule than to wait for a new release. Note that this submodule is *not* on the master branch but on mathosphere-fix.

By default, the <math> tags (like all other XML tags inside <text>) are escaped when a wiki dump is downloaded, so the dump contains &lt;math&gt; rather than <math>. The AstVisitor is configured to find <math> (and <chem>) tags, but the escaped &lt;math&gt; is not an XML tag, so the visitor was unable to find them.
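
One way to deal with the escaped tags is to unescape the predefined XML entities before the wikitext reaches the visitor. The following is only a minimal sketch of that idea, not the code from this PR; the class `WikiDumpUnescaper` and its method are hypothetical names.

```java
/** Hypothetical helper for illustration; not part of mathosphere or Sweble. */
final class WikiDumpUnescaper {
    private WikiDumpUnescaper() {}

    /**
     * Replaces the five predefined XML entities with their characters.
     * "&amp;" is handled last so that "&amp;lt;" becomes the literal text
     * "&lt;" and is not unescaped a second time.
     */
    static String unescape(String escaped) {
        return escaped
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }
}
```

With this, `WikiDumpUnescaper.unescape("&lt;math&gt;x^2&lt;/math&gt;")` yields `<math>x^2</math>`, which a visitor looking for `<math>` tags can then match.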

Sweble fails to detect 'Test' as the name of the reference in '<ref name=Test>'. Instead, it interprets both name and Test as attribute keys with null values. I implemented a workaround to handle this situation.
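
The workaround itself is not shown here. A minimal sketch of how such a recovery could look, assuming the parsed attributes are available as an insertion-ordered map from key to (possibly null) value, is given below. `RefNameWorkaround` and `referenceName` are hypothetical names and do not use the Sweble API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; the real workaround in this PR may differ. */
final class RefNameWorkaround {
    private RefNameWorkaround() {}

    /**
     * @param attributes attribute keys in document order, mapped to their
     *                   values (null if the parser found no value); the map
     *                   must preserve insertion order, e.g. a LinkedHashMap
     * @return the reference name, if one can be recovered
     */
    static Optional<String> referenceName(Map<String, String> attributes) {
        String direct = attributes.get("name");
        if (direct != null) {
            return Optional.of(direct); // parsed correctly, nothing to repair
        }
        boolean seenName = false;
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (seenName) {
                // The key right after a value-less "name" is assumed to be
                // the unquoted name, but only if it has no value itself.
                return entry.getValue() == null
                        ? Optional.of(entry.getKey())
                        : Optional.empty();
            }
            seenName = entry.getKey().equals("name");
        }
        return Optional.empty();
    }
}
```

For the attributes produced from `<ref name=Test>` (keys `name` and `Test`, both with null values), the sketch returns `Test`.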

…tence Previously, we manipulated the words (merging tokens) and only afterwards called CoreNLP to build a semantic graph from them. This made it rather difficult to ask for the correct distances between tokens in the semantic tree. The new solution builds the graph directly from the cleaned text (including MATH and LINK placeholders). After merging, a merged word gets the index of the first noun in the merged list of words, or the index of the first word in the list if no noun is present. Hence, 'jacobi polynomial' has the index of jacobi and not of polynomial, and the semantic-graph distance is now calculated to jacobi.
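
The index rule for merged words can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `MergedWordIndex` and `Token` are hypothetical types, and nouns are recognized by Penn Treebank tags starting with `NN`.

```java
import java.util.List;

/** Hypothetical sketch of the index rule for merged words. */
final class MergedWordIndex {
    private MergedWordIndex() {}

    /** A token with its position in the sentence and its part-of-speech tag. */
    static final class Token {
        final int index;
        final String word;
        final String posTag;

        Token(int index, String word, String posTag) {
            this.index = index;
            this.word = word;
            this.posTag = posTag;
        }
    }

    /**
     * Returns the index that represents a merged word in the semantic graph:
     * the index of the first noun, or of the first token if there is no noun.
     */
    static int representativeIndex(List<Token> merged) {
        if (merged.isEmpty()) {
            throw new IllegalArgumentException("merged token list must not be empty");
        }
        for (Token token : merged) {
            if (token.posTag.startsWith("NN")) { // NN, NNS, NNP, NNPS
                return token.index;
            }
        }
        return merged.get(0).index;
    }
}
```

For 'jacobi polynomial' with both tokens tagged as nouns, the sketch returns the index of jacobi, so graph distances are measured to that node.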

First, a single node in the MOI dependency graph can now be annotated on its own rather than always annotating the entire document (that's useful for specific applications). Second, the PosTagger creates only one CoreNLP annotator instance per language and model path. So if you do not change the language or the model path during a run, only one CoreNLP instance will be running. Third, we load the dependency parser directly when instantiating the NLP annotator. This allows us to build the dependency graph during the annotation process rather than afterwards. It also means we no longer need to load a separate dependency parser from the filesystem (CoreNLP takes care of it now), which reduces load time from 40s to 2s.
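
The "one instance per language and model path" behaviour can be sketched with a small cache. This is an assumption about the general technique, not the PosTagger code from this PR: `AnnotatorCache` is a hypothetical class, and the factory stands in for whatever builds the actual CoreNLP pipeline.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical sketch: caches one annotator per (language, model path). */
final class AnnotatorCache<A> {
    // The key is the immutable pair [language, modelPath].
    private final Map<List<String>, A> instances = new ConcurrentHashMap<>();
    private final BiFunction<String, String, A> factory;

    /** @param factory builds an annotator for a language and a model path */
    AnnotatorCache(BiFunction<String, String, A> factory) {
        this.factory = factory;
    }

    /** Returns the cached annotator, creating it on first request only. */
    A get(String language, String modelPath) {
        return instances.computeIfAbsent(
                List.of(language, modelPath),
                key -> factory.apply(key.get(0), key.get(1)));
    }

    /** Number of annotators created so far. */
    int size() {
        return instances.size();
    }
}
```

Repeated calls such as `cache.get("en", path)` with the same arguments return the same object, so the expensive model loading happens once per language and model path.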

AndreG-P added 19 commits March 11, 2020 16:39

Add MathMLTools as submodule

e4b6245

Update CoreNLP to 3.9.2

7915fa5

Fix noun phrase sequences

150a459

Delete empty mlp folder

85bf663

WiP changes (bug fixes and extensions of sweble wikitext parser)

b567ef3

A lot of progress on sweble parser

36b52e9

Update knowledge structure of sentences and math expressions

6b79b89

First try to switch from identifier to MOI

a664039

Bug fixes in MOI approach

75e9d27

From identifier to MOI

1b9da84

Find unicode math within normal text and treat it as math

aaf80f1

Redevelop the WikiTextParser

c9d497a

Updating mathosphere to handle MOI

587b7c2

Update mlp according to updates of LaCASt

7d7653c

core depends on evaluation, so the module order must be fixed

607f69a

Fixing java 11 issues

dc96f56

Fixing buggy logging in maven-assembly-plugin by massively updating it from 2.2.0 (why ever used in travis) to 3.3.0

4b6ef37

Delete LaCASt dependency and bring MLP tests back to CI... since 2015... jesus

0fa87f6

AndreG-P self-assigned this Oct 29, 2020

AndreG-P added 10 commits October 30, 2020 01:48

Try fixing numerous issues when it comes to flink

1f904a6

Alright, at least fixing the flink-kryo-serialization problems in mathosphere-eval

57ef56a

Adjusting mathosphere to new version of lacast

d00a208

Making MLP ready by updating MOI-graph structure and access to be used from LaCASt

337210a

Updates according to needs in LaCASt

b35e977

Minor change to improve performance of merge definiens

98b358c

Use merge-definiens also in the new datastructure of dependency trees not only in the list of relations (those are two different lists)

e6f4724

Allow pure wikitext inputs (without page or text-tags)

4eeabca

Workaround for bug in sweble with references

e53849e

Several bug fixes and improvements in sweble wikitext parser

7b3a3b7

AndreG-P added 19 commits December 6, 2020 01:32

Improve noun-merging (include possessive endings, determiner, and preposition)

fa91b5f

Update a safe way to compare positions

578b06f

Remove link-replacements in text since they cause incorrect semantic graph generation from CoreNLP (obviously)

cf225c9

Fix bug that creates multiple MathTags for a single formula and overwrites the previous tags which results in differences between dependency graph and library of mathtags.

fe5fdb0

Fix bug in NP merger

c7d79aa

Fixing score calculation for candidates

3c0f2d2

Update a helper function for MathTag and slightly update interfaces of dependency graphs

17bead6

Fix calculation and math-tag use persistent hash for formulae

896abea

Fixing issues in sweble producing invalid tex for math tags

8de0115

Catch nullpointer in case the indexed word does not exist

a1fbcb3

Synchronized methods

8e080e5

Add LaCASt's TeX pre-processor to MathTags

c2e776e

Add minor test case to see if ''alpha''=2 is correctly detected

b7f47ba

Merge dashed-nouns

808d45c

Fix formula in definiens text bug

5952d88

Try fixing #194

2217ead

Fixing more errors in old test cases

9b1b8d9

AndreG-P mentioned this pull request Jan 4, 2023

Fix gold #164

Open
