This repository has been archived by the owner on May 3, 2022. It is now read-only.

OpenGreekAndLatin/OpenHermes


Open_Philology_Synonyms

This repository contains work on the automatic creation of thesauri.

## Download Module

Required module: wget

## Corps.dictionaries

Holds the dictionary entities, which download the dictionaries and convert them into data for the DataModel.
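A minimal sketch of what such a dictionary entity could look like, assuming each dictionary is a tab-separated text file with one lemma and its comma-separated English translations per line. The class `DictionaryEntity`, its methods and that file format are hypothetical, not the repository's code; the sketch also uses the standard library's `urllib.request.urlretrieve` in place of the wget module so that it has no third-party dependencies.

```python
import urllib.request
from dataclasses import dataclass, field


@dataclass
class DictionaryEntity:
    """Hypothetical stand-in for one dictionary held in Corps.dictionaries."""

    name: str
    url: str
    # Assumed data-model shape: lemma -> list of English translations.
    entries: dict = field(default_factory=dict)

    def download(self, target_path: str) -> str:
        # Copy the resource at self.url to target_path and return the local path
        # (file:// URLs also work, which is convenient for local testing).
        path, _headers = urllib.request.urlretrieve(self.url, target_path)
        return path

    def convert(self, path: str) -> dict:
        # Assumed input format: "lemma<TAB>translation1, translation2" per line.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                lemma, _, translations = line.partition("\t")
                self.entries[lemma] = [
                    t.strip() for t in translations.split(",") if t.strip()
                ]
        return self.entries
```

Keeping download and conversion as two separate steps means a dictionary can be re-converted from a local copy without fetching it again.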

## Stop words

You said stopwords? Because some lexicons and dictionaries give us only a small amount of data, we need to keep noise low. For this reason, we remove a few stopwords, taken mainly from discoverysearchengine.

However, we cannot use an aggressive stopword list: we must not drop common words such as "good", because "good" can be the only translation of a word such as "bonus". We therefore use a list of simple stopwords, following Text Analytics 101.

Examples of minimal stop word lists that you can use:

  • Determiners - Determiners tend to mark nouns: a determiner is usually followed by a noun (examples: the, a, an, another)
  • Coordinating conjunctions - Coordinating conjunctions connect words, phrases, and clauses (examples: for, and, nor, but, or, yet, so)
  • Prepositions - Prepositions express temporal or spatial relations (examples: in, under, towards, before)
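As a minimal sketch of how such a minimal list could be applied, the Python below builds a stopword set from only the example words of the three categories above and filters an English definition. The function name `content_words` and the simple regex tokenizer are assumptions for illustration, not the repository's implementation.

```python
import re

# Minimal stopword list: only the example words from the three categories above.
DETERMINERS = frozenset({"the", "a", "an", "another"})
COORDINATING_CONJUNCTIONS = frozenset({"for", "and", "nor", "but", "or", "yet", "so"})
PREPOSITIONS = frozenset({"in", "under", "towards", "before"})
MINIMAL_STOPWORDS = DETERMINERS | COORDINATING_CONJUNCTIONS | PREPOSITIONS


def content_words(definition: str, stopwords: frozenset = MINIMAL_STOPWORDS) -> list:
    """Lower-case a definition, split it into words and drop the stopwords."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    return [token for token in tokens if token not in stopwords]


print(content_words("a good and honest man"))  # ['good', 'honest', 'man']
```

Adding "good" to the list, as an aggressive list might, would make `content_words("good", MINIMAL_STOPWORDS | {"good"})` return an empty list, which is exactly the "bonus" problem described above.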

## Examples

  • Search for the lemma trauma (τραῦμα) in the Greek dictionaries: `python3 __main__.py --corpus=Greek --search=N,trau\=ma`
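The lemma in the example appears to be written in Beta Code, an ASCII transliteration of Greek in which `=` marks a circumflex. Note that `\` is itself the Beta Code grave accent; unquoted in a POSIX shell, `trau\=ma` reaches the program as `trau=ma`. The hypothetical helper below (not part of this repository) sketches a Beta Code to Unicode conversion; it covers only the basic letters and common diacritics and ignores capitals (`*`) and punctuation.

```python
import re
import unicodedata

# Basic Beta Code letters (case-insensitive) mapped to lowercase Greek.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritics follow their letter and map to Unicode combining marks.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_greek(beta: str) -> str:
    """Convert simple Beta Code (no '*' capitals) to NFC-normalized Greek."""
    chars = [LETTERS.get(ch) or DIACRITICS.get(ch) or ch for ch in beta.lower()]
    text = "".join(chars)
    # A sigma not followed by another Greek letter becomes final sigma.
    text = re.sub(r"σ(?![α-ω])", "ς", text)
    # NFC composes base letters with their combining marks, e.g. υ + U+0342 -> ῦ.
    return unicodedata.normalize("NFC", text)


print(beta_to_greek("trau=ma"))  # τραῦμα
```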