Contributing to NLTK

Hi! Thanks for your interest in contributing to NLTK. :-) In this document we'll try to summarize everything that you need to know to do a good job.

Code and Issues

We use GitHub to host our code repositories and issues. The NLTK organization on GitHub has several repositories, so that we can better manage issues and development. The most important are:

  • nltk/nltk, the main repository with code related to the library;
  • nltk/nltk_data, the repository with corpora, taggers and other useful data that are not shipped with the library by default and can be downloaded with nltk.downloader;
  • nltk/nltk.github.com, the NLTK website, with information about the library, documentation, a link for downloading the NLTK Book, etc.;
  • nltk/nltk_book, source code for the NLTK Book.

Git and our Branching model

Git

We use Git as our version control system, so the best way to contribute is to learn how to use it and put your changes in a Git repository. There is plenty of documentation about Git -- you can start with the Pro Git book.

Forks + GitHub Pull requests

We use the famous gitflow to manage our branches.

Summary of our git branching model:

  • Fork the desired repository on GitHub to your account;
  • Clone your forked repository locally (git clone git@github.com:your-username/repository-name.git);
  • Create a new branch off of develop with a descriptive name (for example: feature/portuguese-sentiment-analysis, hotfix/bug-on-downloader). You can do this by switching to the develop branch (git checkout develop) and then creating a new branch (git checkout -b name-of-the-new-branch);
  • Make many small commits on that branch locally (git add files-changed, git commit -m "Add some change");
  • Add your name to the AUTHORS.markdown file as a contributor;
  • Push to your fork on GitHub (using the same name as your local branch: git push origin branch-name);
  • Create a pull request using the GitHub Web interface (asking us to pull the changes from your new branch and add them to our develop branch);
  • Wait for comments.

Tips

  • Write helpful commit messages.
  • Anything in the master branch should be deployable (no failing tests).
  • Never use git add .: it can add unwanted files;
  • Avoid using git commit -a unless you know what you're doing;
  • Check every change with git diff before adding it to the index (staging area) and with git diff --cached before committing;
  • If you have push access to the main repository, please do not commit directly to master: your access should be used only to accept pull requests; if you want to add a new feature, you should use the same process as other developers so your code will be reviewed.
  • See RELEASE-HOWTO.txt for everything you need to do before creating a new NLTK release.

Code Guidelines

  • Follow PEP 8;
  • Write tests for your new features (please see the "Tests" section below);
  • Always remember that commented code is dead code;
  • Name identifiers (variables, classes, functions, module names) with readable names (x is always wrong);
  • When manipulating strings, use Python's new-style formatting ('{} = {}'.format(a, b) instead of '%s = %s' % (a, b));
  • All #TODO comments should be turned into issues (use our GitHub issue system);
  • Run all tests before pushing (just execute tox) so you will know if your changes broke something;
  • Try to write code that works on both Python 2 and Python 3, so that supporting both versions is not a pain for us.
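
As a small illustration of the naming and string-formatting guidelines above, here is a minimal sketch. The function `format_assignment` and its arguments are hypothetical and are not part of NLTK; they only show a readable name and new-style formatting.

```python
def format_assignment(name, value):
    """Return 'name = value' as a string (hypothetical helper, not NLTK API)."""
    # Preferred: new-style formatting with str.format ...
    return '{} = {}'.format(name, value)
    # ... rather than the old style: '%s = %s' % (name, value)


print(format_assignment('tagger', 'unigram'))
```

The auto-numbered `{}` fields are accepted by Python 2.7 as well as Python 3, so this style fits the guideline of supporting both versions.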

See also our developer's guide.

Tests

You should write tests for every feature you add or bug you fix in the code. Having automated tests for every line of our code lets us make big changes without worry: there will always be tests to verify whether the changes introduced bugs or removed features. Without tests we are blind, and every change comes with some fear of breaking something.

For a better design of your code, we recommend using a technique called test-driven development, where you write your tests before writing the actual code that implements the desired feature.
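
A minimal sketch of that workflow, using the standard library's `unittest` module, might look like the following. The helper `count_tokens` is hypothetical and not part of NLTK; in a test-driven workflow the test class would be written first, would fail, and the function body would then be filled in until the tests pass.

```python
import unittest


def count_tokens(text):
    """Hypothetical helper: count whitespace-separated tokens in text."""
    return len(text.split())


class CountTokensTest(unittest.TestCase):
    # Written before count_tokens existed; these tests describe the
    # behaviour we want from the function.
    def test_empty_string_has_no_tokens(self):
        self.assertEqual(count_tokens(''), 0)

    def test_counts_whitespace_separated_tokens(self):
        self.assertEqual(count_tokens('Happy hacking with NLTK'), 4)


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountTokensTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == '__main__':
    run_tests()
```

This is only an illustration of the technique, not a description of how the NLTK test suite itself is organized.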

Continuous Integration

The continuous integration test suite that previously ran at Shining Panda is down because the service has shut down (their "Clap de Fin"). Moving to another CI service has been investigated, but no official build is running right now. This may change in the near future.

nltk/test/runtests.py is a good starting point for running tests locally, but note that the suite is currently failing.

Discussion

We have three mailing lists on Google Groups:

  • nltk, for announcements only;
  • nltk-users, for general discussion and user questions;
  • nltk-dev, for people interested in NLTK development.

Please feel free to contact us through the nltk-dev mailing list if you have any questions or suggestions. Every contribution is very welcome!

Happy hacking! (;