Support contexts for dictionary entries #55

Open
rhdunn opened this issue Oct 25, 2013 · 0 comments
The context of a dictionary entry can be used to disambiguate words with the same spelling, but different pronunciations:

| context | description/usage |
|---------|-------------------|
| after-noun | the word occurs after a noun |
| before-noun | the word occurs before a noun |
| date | writing a date (e.g. 31st Jan) |
| feminine | the person/thing is known to be female |
| masculine | the person/thing is known to be male (e.g. Dutch male name) |
| noun | the word is a noun |
| number | the word is a number |
| spelling | the word is a letter and is used when spelling out words |
| stressed | the word is emphasised |
| unstressed | the word is not emphasised |
| verb | the word is a verb |
| verb-past | the word is the past form of a verb |

For example:

jan /dZ'an/ # [feminine], i.e. female name
jan /j'an/ [masculine] # e.g. Dutch male name
jan january [date]

i /'aI/
i 1 [number] # roman numeral

a /'eI/ [spelling] [stressed]
a /@/ [unstressed]

st street [after-noun] # e.g. Bridge St.
st saint [before-noun] # e.g. St. Helen

lead /l'i:d/ [verb]
lead /l'Ed/ [noun] [verb-past]
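
As a rough illustration of how entries like these might be consumed, here is a minimal Python sketch. It is not part of any existing implementation: the names `parse_entry` and `select` are hypothetical, and it assumes that `#` only ever starts a comment, that several contexts on one entry mean "any of these", and that an entry without contexts is the default.

```python
import re

# Matches a bracketed context tag such as [verb-past].
CONTEXT_RE = re.compile(r"\[([a-z-]+)\]")

def parse_entry(line):
    """Split 'word pronunciation [ctx] ... # comment' into its parts."""
    body = line.split("#", 1)[0].strip()  # assumption: '#' always starts a comment
    if not body:
        return None
    contexts = frozenset(CONTEXT_RE.findall(body))
    word, pronunciation = CONTEXT_RE.sub("", body).split(None, 1)
    return word, pronunciation.strip(), contexts

def select(entries, word, active):
    """Pick the pronunciation of `word` for the set of active contexts."""
    candidates = [e for e in entries if e[0] == word]
    # Assumption: an entry matches if any one of its contexts is active.
    for _, pronunciation, contexts in candidates:
        if contexts & active:
            return pronunciation
    # Otherwise fall back to the entry that has no contexts (the default).
    for _, pronunciation, contexts in candidates:
        if not contexts:
            return pronunciation
    return None  # no matching context and no default entry

EXAMPLE = """\
jan /dZ'an/ # [feminine], i.e. female name
jan /j'an/ [masculine] # e.g. Dutch male name
jan january [date]
lead /l'i:d/ [verb]
lead /l'Ed/ [noun] [verb-past]
"""
entries = [e for e in map(parse_entry, EXAMPLE.splitlines()) if e]

print(select(entries, "jan", {"masculine"}))
print(select(entries, "lead", {"verb-past"}))
```

Note that `lead` has no context-free entry in the examples, so this sketch returns `None` for it when no context is known.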

The same contexts can also be used to tag the part of speech of words that do not have ambiguous pronunciations, which helps to disambiguate the words around them.

In order to avoid duplicating context entries for different pronunciations and to keep the dictionary format stable, a special context dictionary will have the format:

word word@variant [context] ... [context] # optional comment

Here, variant is a natural number that refers to the given pronunciation context/form. The dictionary will have the following entries:

word word@default
word@1 ...
...
word@n ...

where default refers to the default pronunciation for the word (i.e. when no identifying context can be deduced). This is usually the most common form, or the most easily identified.
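
The two-level lookup described above could be sketched as follows. This is only an illustration under stated assumptions: the sample data for `lead` (variant numbers, and variant 1 as the default) is invented for the example, the names `parse_context_line` and `resolve` are hypothetical, and the main dictionary is modelled as a plain Python mapping.

```python
import re

# Matches a bracketed context tag such as [verb-past].
CONTEXT_RE = re.compile(r"\[([a-z-]+)\]")

def parse_context_line(line):
    """Parse 'word word@variant [context] ... # optional comment'."""
    body = line.split("#", 1)[0]
    fields = body.split()
    word, target = fields[0], fields[1]
    return word, target, frozenset(CONTEXT_RE.findall(body))

def resolve(word, active, context_dict, main_dict):
    """Map a word plus its active contexts to a pronunciation."""
    target = None
    for entry_word, entry_target, contexts in context_dict:
        # Assumption: an entry matches if any one of its contexts is active.
        if entry_word == word and contexts & active:
            target = entry_target
            break
    if target is None:
        # No context matched: follow the 'word word@default' entry.
        target = main_dict.get(word, word)
    return main_dict.get(target)

# Hypothetical context dictionary for the 'lead' example.
context_dict = [
    parse_context_line("lead lead@1 [verb]"),
    parse_context_line("lead lead@2 [noun] [verb-past] # the metal, or 'led'"),
]

# Hypothetical main dictionary: default pointer plus one entry per variant.
main_dict = {
    "lead": "lead@1",
    "lead@1": "/l'i:d/",
    "lead@2": "/l'Ed/",
}

print(resolve("lead", {"noun"}, context_dict, main_dict))
print(resolve("lead", set(), context_dict, main_dict))
```

Keeping the contexts in a separate file means the main dictionary only ever maps a key to a pronunciation or to another key, so its format does not change when new contexts are introduced.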
