discussion TEI

Discussion on (FreeDict) TEI

Notes:

- Points 1-9 are taken directly from a mailing list (ML) thread, where they are numbered A1-A9.
- Points 10-X also stem from discussions around that thread.

1) TEI Lex-0.

Questions
- What about the TEI Lex-0 standard?
- Should it be followed?
Examples
- a) <gram type="gender"/> instead of <gen/>.
- b) <usg> with @type (and possibly @norm)
Potential advantages
- good, fixed list of usg types (see this comparison table)
  - The useful @types textType and attitude have no equivalents in the TEI Guidelines' suggested values.
    - textType examples: bibl., poet., admin., journalese
    - attitude examples: derog., euph.
- Requirement to fully annotate with @xml:id and @xml:lang
Further questions:
- Should textType and attitude just be borrowed from TEI Lex-0?
- Where to annotate with @xml:id and @xml:lang?
Answers
- The FreeDict conversion style sheets do not support TEI Lex-0. (FreeDict TEI is partly incompatible with TEI Lex-0.)
- "It all boils down to somebody reading the document, defining our specific requirements and potentially modification and implementing it." / @shumenda
- The TEI Lex-0 guidelines may be used in addition wherever they do not contradict the FreeDict or TEI guidelines.
- TEI Lex-0 is meant to encode retrodigitized dictionaries including presentational information, while FreeDict TEI is not concerned with presentation.
- Consider switching someday to another (related) standard: ISO LMF-4
  - No public information yet.
  - ISO standard is not available for free
  - There is a skeletal example document
See also: this thread on the mailing list.
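
A minimal sketch of the two example differences a) and b), side by side. The element content ("fem", "poet.") is illustrative only and is not prescribed by either set of guidelines:

```xml
<!-- a) gender in TEI Guidelines / FreeDict style -->
<gramGrp><gen>fem</gen></gramGrp>

<!-- a) gender in TEI Lex-0 style -->
<gramGrp><gram type="gender">fem</gram></gramGrp>

<!-- b) usage label typed with a TEI Lex-0 value -->
<usg type="textType">poet.</usg>
```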

2) Verb & Transitivity annotation.

Status quo
- A HowTo suggests using v, vt, vi, vti, i.e., merging all such information into a single token.
- An example uses "vtr", which, in contrast to the former, would also adhere to TEI Lex-0.
Question: How to annotate transitivity information?
Answer: The use of subc is strongly recommended.
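
As a sketch of that recommendation, the part of speech and the transitivity go into separate elements. The values "v" and "trans" are assumed here (they follow the "v/trans" notation used in 23) and may differ from the labels a given dictionary uses:

```xml
<gramGrp>
  <pos>v</pos>        <!-- part of speech only -->
  <subc>trans</subc>  <!-- transitivity as subcategorization -->
</gramGrp>
```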

3) IPA Pronunciation.

Question: How can I enrich my dictionary with pronunciation, as annotated in <pron> tags?
Answer: Unless <pron> is already present, the standard build process (using make) adds phonetic information via the teiaddphonetics script, which internally uses espeak[-ng].
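
For orientation, a hand-annotated pronunciation might look roughly like the fragment below. The headword and its IPA transcription are made up for illustration and are not output of the script:

```xml
<entry>
  <form>
    <orth>house</orth>
    <pron>haʊs</pron> <!-- IPA transcription, without surrounding slashes -->
  </form>
</entry>
```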

4) Normalization of usage annotations

Question: Should usage annotations (the content of <usg> tags) be normalized?
- different languages (e.g. "[Sprw.]" ~ "[prov.]")
- same language (e.g. "[coll.]" ~ "[slang]")
Notes:
- Recommended by TEI Lex-0.
- Using @norm might make this less of an issue.
Sub-questions
- Should they be normalised to a single label?
- Should they be normalised to some standard labels?
  - ISO 12620 (cf. Wikipedia:Registers) (full standard only commercially available)
Answers
- An ontology should be defined.
  - Questions:
    - Similar to / linked to shared/FreeDict_ontology.xml?
      - This seems to only allow linking equivalent annotations in different languages, however not "coll." and "slang" (if these should even be considered equivalent).
    - Where to find documentation on writing such an ontology?
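
One way @norm could be used in the meantime is sketched below: the label stays as it appears in the source, and the attribute carries a shared value. Both the @type value "style" and the normalized value "proverb" are assumptions made for this example, not agreed FreeDict labels:

```xml
<!-- German source label -->
<usg type="style" norm="proverb">Sprw.</usg>

<!-- English source label, normalized to the same value -->
<usg type="style" norm="proverb">prov.</usg>
```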

5) Quantified (or similar) usage annotations

Examples
- "mainly Am."
- "bes. Süddt.", "especially Am."
Question
- How to represent the determiner ("mainly", "bes.", ...)?
Notes
- TEI Lex-0 suggests a separate attribute, but not which (there is a TODO in the doc).
  - None of the <usg> annotations really fit, maybe @subtype?
Answer
- Likely the easiest: <usg type="hint">mainly Am.</usg>

6) Regional / dialect / language annotations.

classes of such annotation
- a) dialect
  - Ex.: "[Br.]", "[Am.]", "[Ös.]", "[Sächs.]"
  - distinction from b) partially unclear (e.g., "Am.")
- b) Region or country
  - Ex.: "[South Africa]", "[Hessen]", "[Berlin]", "[Wien]"
- c) language
  - Ex.: "[French]", "[Lat.]"
Questions
- How to annotate/distinguish the above classes?
Notes
- TEI Lex-0: usg[@type="geographic"]: "marker which identifies the place or region where a lexical unit is mainly used"
  - Matches b), potentially partly a).
Answers
- a), b): usg[@type="geo"]
- c): usg[@type="lang"]
  - See the corresponding section of the TEI Guidelines.
- Alternatively: Craft a new type and document it in the header (usg types may be freely chosen according to the TEI Guidelines).
  - Also consider adopting such a new type in the FreeDict guidelines.
  - Use plain text but name the tag and attribute name explicitly.
- Consider using a list of languages (e.g., this).
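
Applied to the examples of classes a) to c), the answers above would give something like the following sketch (brackets dropped as presentational, see 12):

```xml
<usg type="geo">Am.</usg>     <!-- a) dialect -->
<usg type="geo">Berlin</usg>  <!-- b) region or country -->
<usg type="lang">Lat.</usg>   <!-- c) language -->
```
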
Notes example (both <p> elements and lists are fine):

<notesStmt>
  <note type="status">small</note> <!-- mandatory for our DB -->
  <note xml:lang="de"> <!-- can be freely chosen -->
    <list><item>blah</item></list>
  </note>
</notesStmt>

7) Abbreviations.

Cases
- a) Headwords, which are annotations.
  - rare
- b) Annotated on headwords.
Question: How to represent in TEI?
Notes
- The TEI Guidelines contain an example with both <form type="abbrev"> and <form type="full">, in the same <entry>.
- The TEI Guidelines also offer <abbr> and <expan>, possibly grouped in <choice>.
  - These seem to be rather intended for encoding inside of prose.
Answers
- An entry should only contain a single form tag.
- An entry/form may contain a nested form[@type="abbrev"] element.
- In the case of a standalone abbreviation, the corresponding form element right below entry should be annotated with @type="abbrev".
  - potential issue: Shouldn't the topmost form elements have @type="lemma"?
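
The two answers can be sketched as follows. The headwords are invented for illustration, and whether the outer form should carry @type="lemma" is exactly the open issue noted above, so it is left out here:

```xml
<!-- case b): abbreviation annotated on a headword, as a nested form -->
<entry>
  <form>
    <orth>United Nations</orth>
    <form type="abbrev"><orth>UN</orth></form>
  </form>
</entry>

<!-- case a): standalone abbreviation as headword -->
<entry>
  <form type="abbrev"><orth>UN</orth></form>
</entry>
```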

8) entry/sense/gramGrp vs entry/gramGrp

Answer: Both are fine (also in parallel).
- Consider to put gramGrp inside form, when also in sense.

9) Header

9.1) fileDesc/publicationStmt/license

Question: Currently <availability> is suggested and used exclusively (for licensing information). Why not <license>?
Answer: The style sheets do not permit <license>, the validation would hence fail.
- Consider to change this in a future style sheet update.

9.2) Date of imported source

Q: Where to annotate a date specific to the source from which the final TEI was imported?
A: Annotate within sourceDesc.
- Q: As plain text?

9.3) fileDesc/publicationStmt/pubPlace

HowTo: <ref>https://freedict.org/</ref>
(example) TEI: <ref target="http://freedict.org/">http://freedict.org/</ref>
A: The HowTo is right.

9.6) [imported dictionaries] fileDesc/editionStmt/edition (version)

Question: What to use when the TEI output is both influenced by a source's version and an importer's version?
Answers
- Whatever works or seems logical.
- Options: srcver.importerver | date | srcver | srcver.date

9.7) [imported dictionaries] fileDesc/titleStmt/editor

Q: Set author of importer as editor?
- TEI Guidelines: "[...] acting as editor, compiler, translator, etc."
A: Permitted.

10) Q: Should `xr/ref` have content, or can a `@target` suffice?

A: Content!
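
A sketch of a cross-reference with content. The @type value "see" and the target "#colour" are hypothetical; the target assumes an entry with that xml:id exists in the same file:

```xml
<xr type="see">
  <ref target="#colour">colour</ref> <!-- visible text, not only @target -->
</xr>
```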

11) Grouping of homographs

Q: Is superEntry ok?
- A: No. "It doesn't seem necessary at all and is on its way out, in general." / @bansp
- A: Not handled by the style sheets. Also, hom is ignored.
Q: Should homographs be grouped somehow?
- A: No (unless they constitute several senses of the same word).

12) Presentational information

Examples: "{v}" - the braces, ";", "~" - for references
A: Drop

13) "to" prefix for verbs

A: Drop.

14) Multiple genders

Ex.: "Avis {m,n}" (German)
A: Two <gen> in a single gramGrp.
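
For the example above, this could look like the sketch below. The values "n", "masc" and "neut" are assumed labels and should be checked against the ones the dictionary actually uses:

```xml
<entry>
  <form><orth>Avis</orth></form>
  <gramGrp>
    <pos>n</pos>
    <gen>masc</gen> <!-- {m} -->
    <gen>neut</gen> <!-- {n} -->
  </gramGrp>
</entry>
```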

15) collocates' case

How to encode "{+Gen.}", indicating that an object in the genitive case should follow?
<colloc>[+ Gen.]</colloc> (where "Gen." might be changed to something else)
- Derived from TEI Lex-0
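
In context, using "wegen {+Gen.}" from 17) as the example, this might be placed as follows. The pos value "prep" is an assumption:

```xml
<entry>
  <form><orth>wegen</orth></form>
  <gramGrp>
    <pos>prep</pos>
    <colloc>[+ Gen.]</colloc> <!-- required case of the following object -->
  </gramGrp>
</entry>
```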

16) Encoding of plain text annotations on headwords (and translations)

Examples:
- "bread (baked in an oven)"
- "bread (wheat product)"
Options:
- <note>
- <usg> -- @type="hint"?
  - Usually used for more specific usages, e.g. "Am.", "med.".
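
The two options, sketched for the first example. Neither is settled; the fragment only shows what each would look like next to the headword, with the parentheses dropped as presentational:

```xml
<!-- option 1: <note> -->
<form><orth>bread</orth></form>
<note>baked in an oven</note>

<!-- option 2: <usg type="hint"> -->
<form><orth>bread</orth></form>
<usg type="hint">baked in an oven</usg>
```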

17) Collocates' encoding

Cases:
- a) case information: "wegen {+Gen.}"
  - see 15) above
- b) auxiliary words representing an object
  - b.1) suffixing: "eat sth."
  - b.2) prefixing: "etw. essen"
  - b.3) alternatives: "notify sth./sb."
  - b.4) several: "give sth. to sb."
    - potentially both prefixing and suffixing
- c) specific word(s)
  - c.1) suffixing: "dismounting (of a machine)"
  - c.2) prefixing
  - c.3) combinations
- d) combinations of a), b), c)
Available tags
- <colloc> (occurs in <gramGrp>)
- <usg type="colloc">
- attribute @subtype="left"?
Answers
- For a), see 15) above.
Proposed answers:
- b): <colloc>. This is grammar information.
- c): <usg type="colloc">. This is not grammar information.
- location: @subtype="left" or @subtype="right".
- order: keep both <colloc> and <usg type="colloc"> in the original order.
  - Keeping the order of the union of both is impossible with the given suggestion, but things like "(of a machine)" are supposed to be optional anyways.
- b.3) (alternatives)
  - i) group in <choice> or similar.
  - ii-iv) see below
  - iii) conflicts with several subsequent <colloc>s

<form><!-- ii) -->
  <orth>notify</orth>
  <gramGrp><colloc>sth.</colloc></gramGrp>
  <form type="alternate">
     <orth>notify</orth>
     <gramGrp><colloc>sb.</colloc></gramGrp>
  </form>
</form>
<!-- OR iii) -->
<form>
  <orth>notify</orth>
  <gramGrp>
    <colloc>sth.</colloc>
    <colloc>sb.</colloc>
  </gramGrp>
</form>
<!-- OR iv) -->
<form>
  <orth>notify</orth>
  <gramGrp>
    <colloc>sth./sb.</colloc>
  </gramGrp>
</form>

18) Grouping of annotations

Consider "[formal/Am.]" vs. "[formal] [Am.]".
- The former indicates a disjunction, the latter a conjunction of the two annotations.
- Also possible with grammar annotations.
Q: How to differentiate?
Options:
- a) Don't.
- b) For grammar annotations: Several gramGrps.
- c) Literal retaining of the slash (or similar separator).
  - May prevent setting a common @type (such as in the example above).
- d) Something like <choice> for disjunctions.

19) Q: Which content should grammar elements have?

Options
- Short English forms from shared/FreeDict_ontology.xml
- Anything, but link to that ontology, as done in eng-pol.tei.

20) Alternatives in a headword or translation (</>)

Example: "biological breakdown/degradation"
Q: How to encode?
Options:
- literally
- derive two distinct headwords/translations
  - headwords:
    - link with xr/ref
    - sub-form with @type="alternate" or similar.
  - translations: separate cit elements
- Something else (e.g. something like choice)
  - likely only an option for translations.
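
For translations, the "separate cit elements" option would turn the example into two sibling translations, roughly as in this sketch:

```xml
<sense>
  <cit type="trans"><quote>biological breakdown</quote></cit>
  <cit type="trans"><quote>biological degradation</quote></cit>
</sense>
```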

21) Q: May/should translations have examples?

It's common to have an example for a headword, together with a translation.
- Question is, what about examples particular to the translation.
Likely realisation: <cit type="trans"><quote /><cit type="example" /></cit>
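
Expanded, the likely realisation could look like the sketch below. The German words are invented for illustration:

```xml
<cit type="trans">
  <quote>Brot</quote>
  <!-- example that belongs to the translation, not to the headword -->
  <cit type="example">
    <quote>frisches Brot</quote>
  </cit>
</cit>
```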

22) Q: Are entries without translations permitted?

23) Q: What about several `subc`?

Cases
- a) same main part: "v/trans" + "v/intr"
- b) different main part (awkward): "v/trans" + "pron/rel"
Options
- a.1) One pos followed by several subc.
- a.2) Two pairs of pos, subc.
- a.3) two gramGrp
- b.1) two pairs, like a.2)
- b.2) two gramGrp
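
For case a), options a.1) and a.3) can be compared directly. This is only a sketch of the two shapes, with the assumed values "v", "trans" and "intr":

```xml
<!-- a.1) one pos followed by several subc -->
<gramGrp>
  <pos>v</pos>
  <subc>trans</subc>
  <subc>intr</subc>
</gramGrp>

<!-- a.3) two gramGrp -->
<gramGrp><pos>v</pos><subc>trans</subc></gramGrp>
<gramGrp><pos>v</pos><subc>intr</subc></gramGrp>
```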

24) `form` `@type`: `infl` vs. `inflected`

Status quo
- ML, Wiki, lg1-lg2.tei: infl
- TEI Guidelines, TEI Lex-0: inflected


25) `usg` inside `form[@type="inflected"]`?
