This repository contains the supporting code for the FinCausal 2020 submission titled Span-based Causality Extraction for Financial Documents. The model extracts cause and effect spans from financial documents, for example:
Text | Cause | Effect |
---|---|---|
Boussard Gavaudan Investment Management LLP bought a new position in shares of GENFIT S A/ADR in the second quarter worth about $199,000. Morgan Stanley increased its stake in shares of GENFIT S A/ADR by 24.4% in the second quarter.Morgan Stanley now owns 10,700 shares of the company’s stock worth $211,000 after purchasing an additional 2,100 shares during the period | Morgan Stanley increased its stake in shares of GENFIT S A/ADR by 24.4% in the second quarter | Morgan Stanley now owns 10,700 shares of the company’s stock worth $211,000 after purchasing an additional 2,100 shares during the period. |
Zhao found himself 60 million yuan indebted after losing 9,000 BTC in a single day (February 10, 2014) | losing 9,000 BTC in a single day (February 10, 2014) | Zhao found himself 60 million yuan indebted |
(sample from the task description: Data Processing and Metrics for FinCausal Shared Task, 2020, Mariko et al.)
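Because cause and effect are both sub-strings of the input text, each can be represented as a pair of character offsets. The snippet below is a minimal sketch of that idea using the second sample above; the `Span` type and the `locate_span` helper are hypothetical names introduced for illustration and are not taken from this repository.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def locate_span(text: str, span_text: str) -> Span:
    """Return the offsets of the first occurrence of span_text in text.

    Hypothetical helper: assumes the span appears verbatim in the text.
    """
    start = text.find(span_text)
    if start == -1:
        raise ValueError("span not found in text")
    return Span(start, start + len(span_text))


text = ("Zhao found himself 60 million yuan indebted after losing 9,000 BTC "
        "in a single day (February 10, 2014)")
cause_text = "losing 9,000 BTC in a single day (February 10, 2014)"
effect_text = "Zhao found himself 60 million yuan indebted"

cause = locate_span(text, cause_text)
effect = locate_span(text, effect_text)

# Slicing the text with the offsets recovers the original spans.
print(text[cause.start:cause.end])
print(text[effect.start:effect.end])
```

Offsets rather than raw strings make it unambiguous which occurrence is meant when the same phrase appears more than once in a passage.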
The system ranked 2nd on the official evaluation board, and reached the following performance in post-evaluation:
Metric | score |
---|---|
weighted-averaged F1 | 95.01% |
Exact matches | 83.34% |
weighted-averaged Precision | 95.01% |
weighted-averaged Recall | 95.01% |
Metric | score |
---|---|
weighted-averaged F1 | 94.66% |
Exact matches | 73.66% |
weighted-averaged Precision | 94.66% |
weighted-averaged Recall | 94.66% |
The system is based on a RoBERTa span-extraction model (similar to a Question Answering architecture). A full description of the system is available in the related system description. If you find this system useful, please cite us:
@inproceedings{
Becquin-fincausal-2020,
title ={{GBe at FinCausal 2020, Task 2: Span-based Causality Extraction for Financial Documents}},
author = {Becquin, Guillaume},
booktitle ={{The 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (FNP-FNS 2020)}},
year = {2020},
address = {Barcelona, Spain}
}
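To give an intuition for the span-extraction approach mentioned above, the sketch below shows how a Question Answering style model typically turns per-token start and end scores into a single span: it picks the pair of positions with the highest combined score, subject to the end not preceding the start. This is a generic illustration and may differ from the decoding used in this repository; the function name `best_span`, the `max_len` limit, the toy tokens and the score values are all assumptions made up for the example.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices (end inclusive) with the highest
    start_score + end_score, with end >= start and at most max_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for i, start_score in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start_score + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Toy scores standing in for the outputs of a span-extraction head.
tokens = ["Zhao", "lost", "9,000", "BTC", "and", "became", "indebted"]
start_scores = [0.1, 2.5, 0.3, 0.2, 0.0, 0.1, 0.0]
end_scores = [0.0, 0.2, 0.4, 3.1, 0.1, 0.3, 0.5]

i, j = best_span(start_scores, end_scores)
cause_tokens = tokens[i:j + 1]
print(cause_tokens)
```

In this task two spans are needed per passage (cause and effect), so a real system would produce a separate pair of start and end scores for each.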
- Install the requirements provided in `requirements.py` (it is advised to use a virtual environment)
- Generate the train / development data split by running `./utils/split_dataset.py`
- Run `main.py --train`
- Run `main.py --eval`
- Run `main.py --test`
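For readers unfamiliar with the train / development split step listed above, the following is a minimal sketch of a reproducible random split. It is not the content of `./utils/split_dataset.py`; the function name `split_rows`, the 10% development fraction and the seed value are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], dev_fraction: float = 0.1,
               seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows with a fixed seed and return (train, dev) lists."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Integers stand in for annotated examples.
train, dev = split_rows(range(100))
print(len(train), len(dev))
```

Fixing the seed means the same examples land in the development set on every run, which keeps evaluation scores comparable between training runs.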