AREkit-based application for a granular view of sentiments between entities in mass-media texts written in Russian


c0demon/ARElight

 
 

ARElight 0.22.0

👉 DEMO 👈

Supported Languages: Russian

ARElight is an application for a granular view of sentiments between named entities mentioned in mass-media texts written in Russian.

This project is powered by the AREkit framework.

For Named Entity Recognition in text sentences, we adopt DeepPavlov (BertOntoNotes model).

Dependencies

  • arekit == 0.22.0
  • gensim == 3.2.0
  • deeppavlov == 0.11.0
  • rusenttokenize
  • brat-v1.3 [github]
  • CUDA
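
The pinned versions above can be checked against the active environment before running any script. The following is a minimal sketch and not part of ARElight: `PINNED` copies only the three pinned packages from the list, `find_mismatches` is a hypothetical helper name, and the code assumes each dependency is installed as a distribution under the same name as in the list.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # older interpreters, e.g. Python 3.6
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version

# Pins copied from the dependency list above.
PINNED = {"arekit": "0.22.0", "gensim": "3.2.0", "deeppavlov": "0.11.0"}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means all three pins are satisfied. The unpinned entries (rusenttokenize, BRAT, CUDA) are not covered by this check.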

Installation

Docker version (Quick)

Supported Languages: Russian

Other Requirements: nvidia-docker

docker import nicolay-r-arelight-0.1.1.tar 
docker run --name arelight -itd --gpus all nicolay-r/arelight:0.1.1
docker attach arelight
service apache2 start

Supported Languages: Russian

Full

  • ARElight:
# Install the required dependencies
pip install -r dependencies.txt
# Download the required resources
python3.6 download.py
  • BRAT: download and install the library, then run the standalone server as follows:
./install.sh -u
python standalone.py

Usage: see the examples folder.

Inference

Supported Languages: Russian

Infer sentiment attitudes from one or more mass-media documents.

Using the BERT fine-tuned model version:

python3.6 infer_texts_bert.py --from-files data/texts-inosmi-rus/e1.txt \
    --labels-count 3 \
    --terms-per-context 50 \
    --tokens-per-context 128 \
    --text-b-type nli_m \
    -o output/brat_inference_output
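
When the same inference has to be launched from Python, for example over a batch of documents, the command line above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_bert_inference_cmd` is a hypothetical helper that is not part of ARElight, the flag values are copied from the example above, and it is assumed (from the flag name only) that `--from-files` accepts several paths.

```python
import sys


def build_bert_inference_cmd(input_files, output_prefix, python=sys.executable):
    """Assemble the argument list for infer_texts_bert.py from the example above."""
    cmd = [python, "infer_texts_bert.py", "--from-files"]
    cmd += list(input_files)  # assumption: the flag accepts several paths
    cmd += [
        "--labels-count", "3",
        "--terms-per-context", "50",
        "--tokens-per-context", "128",
        "--text-b-type", "nli_m",
        "-o", output_prefix,
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_bert_inference_cmd(
        ["data/texts-inosmi-rus/e1.txt"], "output/brat_inference_output"
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting issues with file names.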

Supported Languages: Russian

Using the pretrained PCNN model (including frames annotation):

python3.6 infer_texts_nn.py --from-files data/texts-inosmi-rus/e1.txt \
    --model-name pcnn \
    --model-state-dir models/ \
    --terms-per-context 50 \
    --stemmer mystem \
    --entities-parser bert-ontonotes \
    --frames ruattitudes-20 \
    --labels-count 3 \
    --bags-per-minibatch 2 \
    --model-input-type ctx \
    --entity-fmt hidden-simple-eng \
    --emb-filepath data/news_mystem_skipgram_1000_20_2015.bin.gz \
    --synonyms-filepath data/synonyms.txt \
    -o output/brat_inference_output

Serialization

Supported Languages: Any

For the BERT model:

python3.6 serialize_texts_bert.py --from-files data/texts-inosmi-rus/e1.txt \
    --entities-parser bert-ontonotes \
    --terms-per-context 50

Supported Languages: Russian by default (depends on embedding)

For the other neural networks (including embedding and other features):

python3.6 serialize_texts_nn.py --from-files data/texts-inosmi-rus/e1.txt \
    --entities-parser bert-ontonotes \
    --stemmer mystem \
    --terms-per-context 50 \
    --emb-filepath data/news_mystem_skipgram_1000_20_2015.bin.gz \
    --synonyms-filepath data/synonyms.txt \
    --frames ruattitudes-20 
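
The command above depends on several files under `data/`. Which of them `download.py` fetches is not stated here, so a quick pre-flight check can save a failed run. This is a minimal sketch: `missing_resources` is a hypothetical helper that is not part of ARElight, and the paths are copied from the command above and assumed to be relative to the repository root.

```python
import os


def missing_resources(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Paths taken from the serialization command above.
REQUIRED = [
    "data/texts-inosmi-rus/e1.txt",
    "data/news_mystem_skipgram_1000_20_2015.bin.gz",
    "data/synonyms.txt",
]

if __name__ == "__main__":
    for path in missing_resources(REQUIRED):
        print("missing:", path)
```

Run it from the repository root. No output means every listed file is present.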

Other Examples

  • Serialize RuSentRel collection for BERT [code]
  • Serialize RuSentRel collection for Neural Networks [code]
  • Finetune BERT on samples [code]
  • Finetune Neural Networks on RuSentRel [code]

Papers

Powered by
