Add additional linguistic information to saved queries #74

melvinwevers · 2015-10-01T06:47:53Z

It would be useful to provide the user with some additional linguistic information.

The number of words (tokens) in a query (corpus)
The option to count the instances of one particular keyword within the corpus (this would perhaps require an additional search window)

This enables the user to calculate (or have the computer calculate) the normalized frequency of a word within the sub-collection (or the entire collection).

jgonggrijp · 2015-10-20T08:26:15Z

For my understanding: are you trying to solve the same problem as @PimHuijnen in #69?
If not, what is the difference?

melvinwevers · 2015-10-20T08:36:03Z

I think @PimHuijnen wants a different way to generate wordclouds.

I would like to have some linguistic information on the saved queries. So just a number that says how many words there are within a query.

Still, I think both require the same calculation, namely how many words there are in a query.

Second part of this reply extracted to #79 by @jgonggrijp

jgonggrijp · 2015-10-20T08:56:28Z

When you say "how many words there are within a query", do you mean

the total length (raw word count) of all matches combined, or
the total number of unique words (repetitions not counted) across all the matches, or
the total number of occurrences of the search terms in the matches, or
something else?

And when you say "normalized frequency of a word within a collection", I presume that you divide one number by another. What would be the numerator and what would be the denominator?
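The three readings of "how many words" listed above can give quite different numbers. A minimal sketch of the difference, assuming a naive regex tokenizer and two made-up example matches (the function names and sample texts are hypothetical, not part of the application):

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def count_words(documents, search_terms):
    """Return the three candidate counts for a list of matched documents."""
    tokens = [token for doc in documents for token in tokenize(doc)]
    terms = {term.lower() for term in search_terms}
    return {
        "raw": len(tokens),                                # total length of all matches
        "unique": len(set(tokens)),                        # repetitions not counted
        "term_hits": sum(1 for t in tokens if t in terms), # occurrences of search terms
    }


# Hypothetical matches for the query "Vietnam AND Soviet"
docs = ["Vietnam and the Soviet Union", "The Soviet position on Vietnam"]
counts = count_words(docs, ["Vietnam", "Soviet"])
print(counts)
```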

melvinwevers · 2015-10-20T10:37:29Z

The total number of words (raw word count) found within the documents belonging to a saved query.

I would like to know the relative occurrence of a word within a collection based on a query.

So, if I query Vietnam AND Soviet and this yields 300 documents, I would like to know how many words there are in these 300 documents, let's say 3000.

Then I would like to know how often America appears in this subset of 300 documents, let's say 15 times.

Then this frequency would be: 15/3000

This allows me to compare the relative frequency of words within particular corpora.

jgonggrijp · 2015-10-20T10:53:45Z

Ok, clear!

mhkuu · 2015-10-20T11:12:20Z

ElasticSearch can provide the number of words (i.e. tokens) per document, I think: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count (see this issue as well).

Total number of occurrences of a term can be found in the word cloud, or am I missing something?
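As a rough sketch of the `token_count` suggestion: the field could be added as a sub-field in the mapping, and a `sum` aggregation over it would then give the total word count for the documents matching a query. The field name `text`, the sub-field name `length`, and the helper function are assumptions for illustration; the request bodies are only built as Python dictionaries here and are not sent to a server.

```python
import json

# Mapping fragment: index the number of tokens of "text" in "text.length".
# "text" and "length" are hypothetical names, not taken from the project.
mapping = {
    "properties": {
        "text": {
            "type": "string",  # the type is called "text" from Elasticsearch 5.0 on
            "fields": {
                "length": {"type": "token_count", "analyzer": "standard"}
            },
        }
    }
}


def total_words_request(query):
    """Build a search body that sums the token counts of all matching documents."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": query,
        "aggs": {"total_words": {"sum": {"field": "text.length"}}},
    }


# Hypothetical saved query corresponding to "Vietnam AND Soviet"
saved_query = {
    "query_string": {"query": "Vietnam AND Soviet", "default_field": "text"}
}
body = total_words_request(saved_query)
print(json.dumps(body, indent=2))
```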

melvinwevers · 2015-10-20T12:04:13Z

It gives the occurrences per term within the sub-collection/saved query. But only if the word also appears in the word-cloud. You cannot query for a particular word.

mhkuu · 2015-11-19T07:40:37Z

Note to self: the link above is broken; in ElasticSearch 2.0 this is the correct URL: https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html.

In the 1.7 branch this is the correct URL: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping-core-types.html#token_count

mhkuu added the enhancement label Oct 1, 2015

jgonggrijp mentioned this issue Oct 20, 2015

Word-in-context information #79

Closed

mhkuu added this to the Milestone 6 milestone Oct 22, 2015

mhkuu added the priority-3 label Oct 22, 2015

mhkuu modified the milestones: Milestone 6, Milestone 7 Dec 3, 2015

mhkuu removed the priority-3 label Dec 3, 2015
