Skip to content

Latest commit

 

History

History
23 lines (19 loc) · 812 Bytes

keysearch_by_year.md

File metadata and controls

23 lines (19 loc) · 812 Bytes

Read data from ES, and count number of occurrences of keywords or keysentences (by page) and group by year

  • Read data previously stored in HDFS
  • Query module: defoe.es.queries.keysearch_by_year
  • Configuration file:
    • preprocessing treatment to select (preprocess)
    • File with keywords or keysentece (data)
    • Examples:
      • preprocess: normalize
      • data: sc_cities.txt
  • Result format:
<YEAR>:
- [<WORD|SENTENCE>, <NUM_WORDS|NUM_SENTENCES>]
- ...
<YEAR>:
...

Note-1: You will need to have the data previously stored in ES using defoe.nls.queries.write_pages_df_es.

Important: We recommend to read also the documentation for writing and reading data from/to PSQL.