Dependencies

boost-1.60.0
eigen, obtained with: hg clone https://bitbucket.org/eigen/eigen (the cmake step below reads the checkout location from $EIGEN_ROOT)

How to use?

# setup repository #
cd
mkdir git ; cd git/
git clone git@github.com:clab/language-universal-parser.git
cd language-universal-parser
git submodule init
git submodule update
cd dynet
git pull origin master
cd ../

# build the parser (with latest version of dynet) #
cd ~/git/language-universal-parser/dynet
git pull origin master
cd .. ; mkdir build-gpu ; cd build-gpu
cmake -DEIGEN3_INCLUDE_DIR=$EIGEN_ROOT ..  # -DBACKEND=cuda is not supported just yet
make -j 10

# train the parser on small data #
~/git/language-universal-parser/build-gpu/parser/lstm-parse --train -P --training_data $TRAIN_ARCSTD --dev_data $DEV_ARCSTD --pretrained_dim 50 --pretrained $PRETRAINED_EMBEDDINGS --brown_clusters $PRETRAINED_CLUSTERS --epochs 1
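
The training command above reads its inputs from four environment variables. As a minimal sketch (the helper name require_files and the paths in the commented usage are hypothetical, not part of the repository), the following check verifies that every input file exists before launching a long run:

```shell
# Return non-zero and report on stderr if any listed file is missing.
# require_files is an illustrative helper, not something the parser ships.
require_files() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done
}

# Example usage with placeholder paths (adjust to your own data):
#   TRAIN_ARCSTD="$HOME/data/train.arcstd"
#   DEV_ARCSTD="$HOME/data/dev.arcstd"
#   PRETRAINED_EMBEDDINGS="$HOME/data/embeddings.txt"
#   PRETRAINED_CLUSTERS="$HOME/data/brown_clusters.txt"
#   require_files "$TRAIN_ARCSTD" "$DEV_ARCSTD" \
#                 "$PRETRAINED_EMBEDDINGS" "$PRETRAINED_CLUSTERS" \
#     && echo "inputs look fine, start training"
```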

How to generate arc-standard transitions?

The parser expects projective treebanks with arc-standard transitions as input. To convert nonprojective treebanks in CoNLL 2006 format into arc-standard oracle files for the corresponding pseudo-projective treebanks, run the following two commands:

java -jar maltparser-1.8.1.jar -c pproj -m proj -i $split_lc -o $split_projective -pp baseline
java -jar ParseOracleArcStd.jar -t -1 -l 1 -c treebank.conll -i treebank.conll > treebank.arcstd
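
The two commands above can be chained so that the pseudo-projective output of the first step becomes the input of the second. The sketch below assumes that this is the intended pipeline; the function name make_arcstd and the ORACLE_JAR variable are hypothetical and must be set to the oracle jar in your checkout:

```shell
# Sketch: projectivize one treebank split, then derive its arc-standard oracle.
# make_arcstd and ORACLE_JAR are illustrative names, not part of the repository.
make_arcstd() {
  input_conll="$1"       # nonprojective treebank in CoNLL 2006 format
  projective_conll="$2"  # pseudo-projective treebank written by MaltParser
  oracle_out="$3"        # arc-standard transitions file for the parser
  oracle_jar="${ORACLE_JAR:?set ORACLE_JAR to the oracle jar in your checkout}"

  # Step 1: pseudo-projective transformation with the flags shown above.
  java -jar maltparser-1.8.1.jar -c pproj -m proj \
       -i "$input_conll" -o "$projective_conll" -pp baseline || return 1

  # Step 2: generate the oracle from the projectivized treebank.
  java -jar "$oracle_jar" -t -1 -l 1 \
       -c "$projective_conll" -i "$projective_conll" > "$oracle_out"
}

# Example usage with placeholder file names:
#   ORACLE_JAR=path/to/oracle.jar
#   make_arcstd train.conll train.proj.conll train.arcstd
```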

We recommend that you lowercase word tokens/types in all input files (e.g., pretrained embeddings, Brown clusters, train/dev/test treebanks) before calling the parser.
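
For treebanks, one way to follow this recommendation is to lowercase only the word-form column. The sketch below assumes tab-separated CoNLL 2006 input with the word form in the second field; the function name is hypothetical, and awk's tolower handles non-ASCII text only in some implementations and locales (for example GNU awk in a UTF-8 locale). Embedding and cluster files would need an analogous step adapted to their own layout.

```shell
# Lowercase the FORM column (field 2) of a tab-separated CoNLL 2006 file.
# Blank sentence-separator lines are passed through unchanged.
# Reads the named files, or standard input when no file is given.
lowercase_conll_forms() {
  awk 'BEGIN { FS = OFS = "\t" } NF > 1 { $2 = tolower($2) } { print }' "$@"
}

# Example usage with placeholder file names:
#   lowercase_conll_forms train.conll > train.lc.conll
```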

Language typology embeddings

To enable language typology embeddings, pass the command-line argument --typological_properties typology_file. Sample typology files are provided in the subdirectory typological_properties/. If you enable typology embeddings, prefix each word in the input files with its two-letter language code and a colon (e.g., en:book instead of book). The prefix must match the first field in the typology file.
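
The prefixing step can be sketched as follows for a treebank, assuming tab-separated CoNLL 2006 input with the word form in the second field. The function name is hypothetical, and whether to prefix before or after oracle generation is left open by the description above; the other input files would need the same prefix applied to their word column.

```shell
# Prepend "<language code>:" to the FORM column (field 2) of a CoNLL 2006 file.
# First argument: two-letter language code; remaining arguments: input files
# (standard input is read when no file is given).
prefix_conll_forms() {
  prefix_lang="$1"
  shift
  awk -v lang="$prefix_lang" \
      'BEGIN { FS = OFS = "\t" } NF > 1 { $2 = lang ":" $2 } { print }' "$@"
}

# Example usage with placeholder file names:
#   prefix_conll_forms en train.lc.conll > train.en.conll
```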

What to cite?

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith. Many Languages, One Parser. TACL 2016 (to appear).
