MorphoChain

Author: Karthik Narasimhan ([email protected])

Unsupervised Discovery of Morphological Chains (TACL 2015)

A model for unsupervised morphological analysis that integrates orthographic and semantic views of words.
The model consistently outperforms three state-of-the-art baselines on morphological segmentation for Arabic, English and Turkish.

Download

You can clone the repository and use the production2 branch (default) for the latest code.

Dependencies (before Compiling)

This project uses the L-BFGS-B algorithm for optimization (the jar files for the library are included in lib/). However, we recommend downloading and installing the lbfgsb_wrapper for Java yourself, since installing on Mac OS X may require additional steps. After installing, move the files lbfgsb_wrapper-.jar and liblbfgsb_wrapper.so (or liblbfgsb_wrapper.dylib on OS X) into the lib/ directory.
External library: commons-lang3-3.3.2.jar (included in lib/)
Install the JUnit framework by following the instructions at http://junit.org/ or by using Maven.
Replace the path for jdk.home.1.7 in the build.properties file with your local install.
(optional) Change path.variable.maven_repository in build.properties to your local maven repository if you wish to use your Maven installs.
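
Before compiling, it can help to confirm that lib/ holds the files listed above. The following is a minimal sketch and not part of the project: the function name check_libs is hypothetical, and the glob lbfgsb_wrapper-*.jar is an assumption standing in for whatever version suffix the wrapper install produces.

```shell
# Hypothetical helper: report which expected files are missing from a lib directory.
# Usage: check_libs lib/
check_libs() {
    dir=${1:-lib}
    missing=0

    # The wrapper jar's version suffix is not fixed here, so match it with a glob (assumption).
    found_jar=0
    for f in "$dir"/lbfgsb_wrapper-*.jar; do
        if [ -e "$f" ]; then
            found_jar=1
        fi
    done
    if [ "$found_jar" -eq 0 ]; then
        echo "missing: lbfgsb_wrapper-*.jar"
        missing=1
    fi

    # The native library is a .so on Linux and a .dylib on OS X; either one is enough.
    if [ ! -e "$dir/liblbfgsb_wrapper.so" ] && [ ! -e "$dir/liblbfgsb_wrapper.dylib" ]; then
        echo "missing: liblbfgsb_wrapper.so or liblbfgsb_wrapper.dylib"
        missing=1
    fi

    if [ ! -e "$dir/commons-lang3-3.3.2.jar" ]; then
        echo "missing: commons-lang3-3.3.2.jar"
        missing=1
    fi

    # Exit status 0 means everything was found, 1 means something is missing.
    return "$missing"
}
```

Running check_libs lib/ from the project root prints one line per missing file and returns a non-zero status if anything is absent, so it can be chained in front of the compile step.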

Compile

Run 'ant all' in the terminal to compile (requires Ant version > 1.6). You can also import the entire directory directly into IntelliJ or Eclipse and compile using the GUI.

Sample Usage

Here is an example of how to run the code from the root directory of the project. The output will contain the predicted segmentations for all the words in the test file. If you do not have gold segmentations to test against, you can supply a file in which each word serves as its own segmentation (i.e. each line has the word itself after the colon, in place of a gold segmentation; see FORMATS.txt for details).

# -ea enables assertions; java.library.path points the JVM at the native library in lib/
PARAMS_FILE=params.properties
OUT_FILE=output.txt
java -ea -Djava.library.path=lib/ -classpath "./lib/*:./out/production/Morphology" Main "$PARAMS_FILE" > "$OUT_FILE"
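
When no gold segmentations are available, the input file described above can be generated from a plain word list. The following is a minimal sketch, assuming one word per line in the input and a colon between the word and its segmentation. The function name and file names are hypothetical, and FORMATS.txt remains the authority on the exact layout.

```shell
# Hypothetical helper: turn a word list (one word per line, read from stdin)
# into lines where each word stands in as its own segmentation.
# Assumes a colon separates the word from its segmentation; check FORMATS.txt.
words_to_placeholder_gold() {
    # Skip blank lines, then print each word twice, joined by a colon.
    awk 'NF > 0 { print $1 ":" $1 }'
}

# Example (words.txt and test_words.txt are placeholder names):
# words_to_placeholder_gold < words.txt > test_words.txt
```

Any accuracy figures computed against such a file are meaningless, since the "gold" side is only a placeholder; the predicted segmentations in the output are the part to keep.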

Configuration

Most parameters in the model can be changed in the file params.properties.

Word Vectors

A good tool for producing your own vectors from a raw corpus is word2vec. You can also use any pre-existing vectors, as long as they follow the format specified in FORMATS.txt.
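
As a starting point for training vectors, the sketch below builds a call to the original word2vec command-line tool. The flags shown (-train, -output, -size, -window, -min-count, -cbow, -binary) belong to that tool, but the values, the file names, the function name, and the WORD2VEC and DRY_RUN variables are illustrative assumptions, not settings from the paper. Text output (-binary 0) is chosen because it is easy to inspect; whether it matches the expected layout should be checked against FORMATS.txt.

```shell
# Hypothetical wrapper around the word2vec binary. Set WORD2VEC to its location.
# With DRY_RUN=1 the command is printed instead of executed.
train_vectors() {
    corpus=$1
    output=$2
    w2v=${WORD2VEC:-./word2vec}

    # Illustrative settings: 200-dimensional skip-gram vectors, written as text.
    # "set --" replaces this function's positional parameters with the command line.
    set -- "$w2v" -train "$corpus" -output "$output" \
        -size 200 -window 5 -min-count 5 -cbow 0 -binary 0

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (paths are placeholders):
# WORD2VEC=/path/to/word2vec train_vectors corpus.txt vectors.txt
```

Running once with DRY_RUN=1 shows the exact command before committing to a long training run.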

Contact

Please use the issue tracker or email me if you have any questions/suggestions.
