👉 DEMO 👈
Supported Languages: Russian
ARElight is an application for a granular view of sentiment relations between named entities mentioned in mass-media texts written in Russian.
This project is powered by the AREkit framework.
For Named Entity Recognition in text sentences, we adopt DeepPavlov (BertOntoNotes model).
- arekit == 0.22.0
- gensim == 3.2.0
- deeppavlov == 0.11.0
- rusenttokenize
- brat-v1.3 [github]
- CUDA
Supported Languages: Russian
Other Requirements: NVidia-docker
- Download nicolay-r-arelight-0.1.1.tar
- Import the container and start the Apache hosting:
docker import nicolay-r-arelight-0.1.1.tar nicolay-r/arelight:0.1.1
docker run --name arelight -itd --gpus all nicolay-r/arelight:0.1.1
docker attach arelight
service apache2 start
- Proceed with the BERT demo: http://172.17.0.2/examples/demo/wui_bert.py
Supported Languages: Russian
- PCNN example, fine-tuned on RuSentRel: http://172.17.0.2/examples/demo/wui_nn.py
Supported Languages: Russian
- ARElight:
# Install the required dependencies
pip install -r dependencies.txt
# Download the required resources
python3.6 download.py
- BRAT: download and install the library, then run the standalone server as follows:
./install.sh -u
python standalone.py
Usage: proceed with the examples folder.
Supported Languages: Russian
Infer sentiment attitudes from mass-media documents.
Using the fine-tuned BERT model:
python3.6 infer_texts_bert.py --from-files data/texts-inosmi-rus/e1.txt \
--labels-count 3 \
--terms-per-context 50 \
--tokens-per-context 128 \
--text-b-type nli_m \
-o output/brat_inference_output
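The `--text-b-type nli_m` option selects an NLI-style sentence-pair input format for BERT, where `text_a` is the context and `text_b` is a question about the attitude between the two entities. A minimal sketch of what such a pair might look like; the prompt template and function name here are illustrative assumptions, not ARElight's actual code:

```python
def make_nli_sample(text, subject, obj):
    """Build an illustrative BERT sentence-pair sample: text_a is the
    context, text_b is an NLI-style question about the attitude
    between the two entities (the template is an assumption)."""
    return {
        "text_a": text,
        "text_b": f"What is the attitude of {subject} towards {obj}?",
    }

sample = make_nli_sample(
    "The USA imposed new sanctions against Russia.", "USA", "Russia")
print(sample["text_b"])
```

The three-class setup (`--labels-count 3`) then lets the model answer with positive, negative, or neutral for each such pair.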
Supported Languages: Russian
Using the pretrained PCNN model (including frames annotation):
python3.6 infer_texts_nn.py --from-files data/texts-inosmi-rus/e1.txt \
--model-name pcnn \
--model-state-dir models/ \
--terms-per-context 50 \
--stemmer mystem \
--entities-parser bert-ontonotes \
--frames ruattitudes-20 \
--labels-count 3 \
--bags-per-minibatch 2 \
--model-input-type ctx \
--entity-fmt hidden-simple-eng \
--emb-filepath data/news_mystem_skipgram_1000_20_2015.bin.gz \
--synonyms-filepath data/synonyms.txt \
-o output/brat_inference_output
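Both inference scripts write their results into a directory in BRAT standoff format, which the hosted BRAT instance visualizes: entity mentions appear as `T` lines and extracted attitudes as `R` relation lines. A minimal reader sketch (the standoff syntax follows the BRAT documentation; the label names shown are hypothetical):

```python
def read_brat_ann(lines):
    """Parse BRAT standoff annotation lines into entities and relations.
    T lines: "T1<TAB>TYPE START END<TAB>surface text"
    R lines: "R1<TAB>LABEL Arg1:T1 Arg2:T2"."""
    entities, relations = {}, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("T"):
            eid, type_span, surface = line.split("\t")
            etype, start, end = type_span.split()
            entities[eid] = {"type": etype, "start": int(start),
                             "end": int(end), "text": surface}
        elif line.startswith("R"):
            rid, body = line.split("\t")
            label, arg1, arg2 = body.split()
            relations.append({"label": label,
                              "source": arg1.split(":")[1],
                              "target": arg2.split(":")[1]})
    return entities, relations

ann = ["T1\tORG 0 3\tUSA",
       "T2\tGPE 35 41\tRussia",
       "R1\tneg Arg1:T1 Arg2:T2"]
entities, relations = read_brat_ann(ann)
print(relations[0])  # attitude from T1 to T2
```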
Supported Languages: Any
For the BERT model:
python3.6 serialize_texts_bert.py --from-files data/texts-inosmi-rus/e1.txt \
--entities-parser bert-ontonotes \
--terms-per-context 50
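The `--terms-per-context` parameter bounds the size of the term window from which a sample is drawn, so only entity pairs that fit within that many terms of each other yield a sample. A simplified sketch of that filtering; the exact windowing logic inside AREkit may differ:

```python
def pairs_within_context(entity_positions, terms_per_context=50):
    """Keep only entity pairs whose term-index distance fits into a
    single context window (a simplified assumption about the rule)."""
    pairs = []
    for i, p1 in enumerate(entity_positions):
        for p2 in entity_positions[i + 1:]:
            if abs(p2 - p1) < terms_per_context:
                pairs.append((p1, p2))
    return pairs

# Entities at term indices 3, 20 and 90: the distant pairs are dropped.
print(pairs_within_context([3, 20, 90], terms_per_context=50))  # → [(3, 20)]
```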
Supported Languages: Russian by default (depends on embedding)
For the other neural networks (including embedding and other features):
python3.6 serialize_texts_nn.py --from-files data/texts-inosmi-rus/e1.txt \
--entities-parser bert-ontonotes \
--stemmer mystem \
--terms-per-context 50 \
--emb-filepath data/news_mystem_skipgram_1000_20_2015.bin.gz \
--synonyms-filepath data/synonyms.txt \
--frames ruattitudes-20
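The neural-network pipelines rely on a synonyms collection (`data/synonyms.txt`) so that variant names of the same entity are treated as one object. Assuming a plain-text layout with one comma-separated group of synonyms per line (an assumption about the file format, mirroring the RuSentRel collection), a minimal lookup sketch:

```python
def load_synonym_groups(lines):
    """Map each (lower-cased) entity name to a group id, assuming one
    comma-separated synonym group per line (format is an assumption)."""
    name_to_group = {}
    for group_id, line in enumerate(lines):
        for name in line.strip().split(","):
            name_to_group[name.strip().lower()] = group_id
    return name_to_group

groups = load_synonym_groups(["USA, United States, America",
                              "Russia, Russian Federation"])
print(groups["united states"] == groups["usa"])  # → True
```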
- Serialize RuSentRel collection for BERT [code]
- Serialize RuSentRel collection for Neural Networks [code]
- Fine-tune BERT on samples [code]
- Fine-tune Neural Networks on RuSentRel [code]
- AREkit [github]