Support part of speech tagging for words #45

Open
rhdunn opened this issue Apr 11, 2013 · 0 comments

rhdunn commented Apr 11, 2013

Part of speech tagging is the process of associating a word with the part of speech it is categorised as (noun, verb, adverb, etc.).

The tagging algorithm should be:

  1. If the word is in a partofspeech.dict dictionary, it has the part of speech from that dictionary.
  2. If the word matches a suffix in suffixes.dict, it has the part of speech from that dictionary.
  3. If the word has not been matched, it is tagged as a noun.

This algorithm handles false positives in suffixes.dict: adding the affected word to partofspeech.dict overrides the suffix match, because that dictionary is checked first.
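The three steps above can be sketched roughly as follows. This is only an illustration: the in-memory `PART_OF_SPEECH` and `SUFFIXES` tables are hypothetical stand-ins for partofspeech.dict and suffixes.dict (whose file formats are not defined here), and trying the longest suffix first is an assumption rather than part of the proposal.

```python
# Hypothetical stand-ins for partofspeech.dict and suffixes.dict.
PART_OF_SPEECH = {
    "read": "verb",
    "family": "noun",  # false positive for the "ly" suffix rule below
}
SUFFIXES = {
    "ly": "adverb",
    "ness": "noun",
    "ize": "verb",
}


def tag(word):
    """Return a part of speech for word using the three-step fallback."""
    key = word.lower()
    # Step 1: an exact dictionary entry always wins.
    if key in PART_OF_SPEECH:
        return PART_OF_SPEECH[key]
    # Step 2: try suffixes, longest first (an assumption of this sketch).
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if key.endswith(suffix) and len(key) > len(suffix):
            return SUFFIXES[suffix]
    # Step 3: unmatched words are tagged as nouns.
    return "noun"
```

Here `tag("slowly")` falls through to the suffix table, while `tag("family")` is caught by the dictionary entry before the "ly" rule can mis-tag it.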

The parts of speech used should be described in a SKOS vocabulary (data/partsofspeech.rdf) and form a consistent taxonomy.

It should be possible to check the tagging against a manually tagged corpus -- preferably a freely available/usable one.
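One possible shape for such a check is sketched below. The `tagging_accuracy` helper, the toy tagger, and the four-word `GOLD` sample are all hypothetical; a real check would read a hand-tagged corpus and map its tag set onto the one used here.

```python
def tagging_accuracy(tagger, gold):
    """Return (accuracy, mismatches) for a tagger over (word, tag) pairs."""
    mismatches = []
    correct = 0
    for word, expected in gold:
        actual = tagger(word)
        if actual == expected:
            correct += 1
        else:
            mismatches.append((word, expected, actual))
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, mismatches


# Hypothetical tagger and hand-tagged sample, for illustration only.
def toy_tagger(word):
    return "adverb" if word.endswith("ly") else "noun"


GOLD = [("quickly", "adverb"), ("family", "noun"), ("table", "noun"), ("run", "verb")]
accuracy, mismatches = tagging_accuracy(toy_tagger, GOLD)
```

The list of mismatches is arguably more useful than the score itself, since each entry is a candidate for a new partofspeech.dict override or a new suffix rule.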

It should be possible to associate a word with more than one tag (e.g. read and lead). This should feed into a disambiguation step that looks at the structure of the sentence.

The part of speech tag will then be used to differentiate word pronunciations.
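As a rough illustration of how multiple tags could feed a disambiguation step and then select a pronunciation, consider the sketch below. Everything in it is an assumption made for the example: the tables, the tag names, and especially the two context rules, which are deliberately simplistic and stand in for real sentence analysis.

```python
# Hypothetical data: a word may carry several candidate tags, and the
# pronunciation is keyed on (word, tag).
CANDIDATE_TAGS = {
    "lead": ["noun", "verb"],
    "the": ["determiner"],
    "to": ["particle"],
}
PRONUNCIATIONS = {
    ("lead", "noun"): "lɛd",
    ("lead", "verb"): "liːd",
}


def candidates(word):
    """Return every candidate tag, defaulting to noun for unknown words."""
    return CANDIDATE_TAGS.get(word.lower(), ["noun"])


def disambiguate(words):
    """Pick one tag per word using a toy rule based on the previous tag."""
    chosen = []
    for word in words:
        options = candidates(word)
        previous = chosen[-1] if chosen else None
        if len(options) == 1:
            chosen.append(options[0])
        elif previous == "determiner" and "noun" in options:
            chosen.append("noun")
        elif previous == "particle" and "verb" in options:
            chosen.append("verb")
        else:
            chosen.append(options[0])  # no rule applies: keep the first candidate
    return chosen


def pronounce(words):
    """Return the tag-specific pronunciation, or None if none is listed."""
    tags = disambiguate(words)
    return [PRONUNCIATIONS.get((w.lower(), t)) for w, t in zip(words, tags)]
```

With this data, `pronounce(["the", "lead"])` selects the noun entry and `pronounce(["to", "lead"])` selects the verb entry; words without a tag-specific entry come back as `None` and would fall through to the normal pronunciation rules.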

The part of speech tagger should be an independent step, as should the sentence/grammar analysis step. As such, both should be optional.

The tagger should support light analysis to disambiguate words, as eSpeak does, but cases like eSpeak's "verb follows" hint should be handled via part of speech rules (e.g. by marking the word as an adverb).
