3. Classification using Graphlab

Previously: Crawling and Preparing Training Set

Once the training data is ready, run the following commands in order:

# Build the classifier
python classification/graphlab/graphlab_train.py --training_dir classification/data/latest/
# Build the dataset
python classification/graphlab/graphlab_train.py --dataset_parsed_dir ~/brazil/all_files_parsed # ~/brazil/2005_parsed/ 
# Classify the dataset
python classification/graphlab/graphlab_classify.py --dataset_dir graphlab/my_dataset --classified_dir graphlab/result_dataset
# Print the results
python classification/graphlab/graphlab_classify.py --classified_dir result_dataset --print
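The train, build and classify steps above can also be chained from a small wrapper so that a failure in one step stops the rest. The following is only a sketch: the function names `build_pipeline` and `run_pipeline` are hypothetical, and the script paths and flags are copied from the commands above, not checked against the scripts themselves.

```python
import os
import subprocess

TRAIN = "classification/graphlab/graphlab_train.py"
CLASSIFY = "classification/graphlab/graphlab_classify.py"


def build_pipeline(training_dir, parsed_dir, dataset_dir, classified_dir):
    """Return the three commands from this page as argument lists, in order.

    Paths go through os.path.expanduser because '~' is not expanded
    when a command is run without a shell.
    """
    expand = os.path.expanduser
    return [
        # 1. Build the classifier
        ["python", TRAIN, "--training_dir", expand(training_dir)],
        # 2. Build the dataset
        ["python", TRAIN, "--dataset_parsed_dir", expand(parsed_dir)],
        # 3. Classify the dataset
        ["python", CLASSIFY,
         "--dataset_dir", expand(dataset_dir),
         "--classified_dir", expand(classified_dir)],
    ]


def run_pipeline(commands):
    """Run each command in turn; check=True raises on the first non-zero exit."""
    for cmd in commands:
        print("Running: " + " ".join(cmd))
        subprocess.run(cmd, check=True)
```

To use it, call `run_pipeline(build_pipeline("classification/data/latest/", "~/brazil/all_files_parsed", "graphlab/my_dataset", "graphlab/result_dataset"))` from the repository root. The wrapper itself needs Python 3.5 or later for `subprocess.run`, while the `python` on the PATH is whatever interpreter the GraphLab scripts expect.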

Classification by event type:

python classification/graphlab/classify_by_event_type.py --classified_dir graphlab/result_dataset
python classification/graphlab/classify_by_event_type.py --print

Embeddings

Word Embeddings (Portuguese)

  pip install nlpnet
  wget http://nilc.icmc.usp.br/nlpnet/data/embeddings-pt.tgz
  tar xzf embeddings-pt.tgz
  
  python classification/embeddingstotxt.py --type plain --embeddings ~/brazil/w2e-embeddings/types-features.npy -v  ~/brazil/w2e-embeddings/vocabulary2.txt -o /tmp/
  mv /tmp/models.txt ~/brazil/portuguese-nlp/word2vec_model.txt
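The source of `embeddingstotxt.py` is not shown here, so the following is only a sketch of the kind of conversion the step above appears to perform. It assumes the target is the common word2vec text format: a header line with the word count and the dimension, then one line per word holding the word and its vector components. That assumption is suggested by the output name `word2vec_model.txt`, and the function name `write_word2vec_text` is hypothetical.

```python
def write_word2vec_text(words, vectors, out):
    """Write embeddings to `out` in word2vec text format.

    words   -- sequence of vocabulary entries, one per vector
    vectors -- sequence of equal-length sequences of numbers
    out     -- any object with a write() method (e.g. an open text file)
    """
    words = list(words)
    vectors = [list(v) for v in vectors]
    if len(words) != len(vectors):
        raise ValueError("vocabulary and embedding matrix differ in length")

    dim = len(vectors[0]) if vectors else 0
    # Header line: "<number of words> <vector dimension>"
    out.write("%d %d\n" % (len(words), dim))

    for word, vec in zip(words, vectors):
        if len(vec) != dim:
            raise ValueError("inconsistent vector length for %r" % word)
        # One line per word: the word, then its components, space separated
        out.write(word + " " + " ".join("%.6f" % x for x in vec) + "\n")
```

With NumPy installed, the matrix could be loaded with `numpy.load("types-features.npy")` and the vocabulary read as one word per line from the vocabulary file, assuming that is how the two files are laid out. Both can then be passed to the function together with a file opened for writing with UTF-8 encoding.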

For more information, see "Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese".

Previous: 2. Crawling and Preparing Training Set

Next: 4. Reports

Back to Main Page