discussion TEI
- Points 1-9 are taken directly from an ML thread, where they are numbered A1-A9.
- Points 10-X also stem from around that thread.
-
Questions
- What about the TEI Lex-0 standard?
- Should it be followed?
-
Examples
- a) <gram type="gender"/> instead of <gen/>.
- b) <usg> with @type (and possibly @norm)
-
Potential advantages
- good, fixed list of usg types (see this comparison table)
  - The useful @types textType and attribute have no equivalents in the TEI Guidelines' suggested values.
    - textType examples: bibl., poet., admin., journalese
    - attribute examples: derog., euph.
- Requirement to fully annotate with @xml:id and @xml:lang
-
Further questions:
- Should textType and attribute just be borrowed from TEI Lex-0?
- Where to annotate with @xml:id and @xml:lang?
-
Answers
- The FreeDict conversion style sheets do not support TEI Lex-0. (FreeDict TEI is in parts incompatible with TEI Lex-0)
- "It all boils down to somebody reading the document, defining our specific requirements and potentially modification and implementing it." / @shumenda
- The TEI Lex-0 guidelines may be used in addition wherever they do not contradict the FreeDict or TEI guidelines.
- TEI Lex-0 is meant to encode retrodigitized dictionaries including presentational information, while FreeDict TEI is not concerned with such.
- Consider switching to another (related) standard someday: ISO LMF-4
- No public information yet.
- ISO standard is not available for free
- There is a skeletal example document
-
See also: this thread on the mailing list.
-
Status quo
-
Questions: How to annotate transitivity information?
-
Answer: The use of subc is strongly recommended.
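A minimal sketch of what this could look like in an entry. The headword, the translation and the values "v" and "trans" are only illustrative assumptions, not prescribed labels:

```xml
<entry>
  <form><orth>eat</orth></form>
  <gramGrp>
    <pos>v</pos>
    <!-- transitivity goes into subc, next to the part of speech -->
    <subc>trans</subc>
  </gramGrp>
  <sense>
    <cit type="trans"><quote>essen</quote></cit>
  </sense>
</entry>
```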
-
Question: How can I enrich my dictionary with pronunciation, as annotated in <pron> tags?
-
Answer: Unless it is already present, the standard build process, using make, adds phonetic information via the teiaddphonetics script (which internally uses espeak[-ng]).
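For reference, a hand-annotated form might look like the sketch below. The IPA string is only an illustration; going by the answer above, an entry that already carries a <pron> like this would be left as it is by the build:

```xml
<form>
  <orth>bread</orth>
  <!-- pronunciation supplied by hand, so nothing needs to be generated -->
  <pron>brɛd</pron>
</form>
```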
-
Question: Should usage annotations (the content of <usg> tags) be normalized?
- different languages (e.g. "[Sprw.]" ~ "[prov.]")
- same language (e.g. "[coll.]" ~ "[slang]")
-
Notes:
- Recommended by TEI Lex-0.
- The use of @norm in <usg> might make this less of an issue.
-
Sub-questions
- Should they be normalised to a single label?
- Should they be normalised to some standard labels?
- ISO 12620 (cf. Wikipedia:Registers) (full standard only commercially available)
-
Answers
- An ontology should be defined.
  - Questions:
    - Similar to / linked to shared/FreeDict_ontology.xml?
      - This seems to only allow linking equivalent annotations in different languages, but not "coll." and "slang" (if these should even be considered equivalent).
    - Where to find documentation on writing such an ontology?
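To illustrate the @norm idea from the notes: the original label stays as element content, and a shared label goes into the attribute. This is only a sketch; the German label "ugs.", the normalized value "coll." and the type value "reg" are assumptions chosen for the example, not agreed FreeDict labels:

```xml
<!-- a German and an English dictionary using different surface labels -->
<usg type="reg" norm="coll.">ugs.</usg>
<usg type="reg" norm="coll.">coll.</usg>
```

Whether "slang" should map to the same normalized value is exactly the open question above.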
-
Examples
- "mainly Am."
- "bes. Süddt.", "especially Am."
-
Question
- How to represent the determiner ("mainly", "bes.", ...)?
-
Notes
- TEI Lex-0 suggests a separate attribute, but not which one (there is a TODO in the doc).
  - None of the <usg> annotations really fit, maybe @subtype?
-
Answer
- Likely the easiest: <usg type="hint">mainly Am.</usg>
-
Classes of such annotations
- a) dialect
  - Ex.: "[Br.]", "[Am.]", "[Ös.]", "[Sächs.]"
  - distinction from b) partially unclear (e.g., "Am.")
- b) Region or country
  - Ex.: "[South Africa]", "[Hessen]", "[Berlin]", "[Wien]"
- c) Ex.: "[French]", "[Lat.]"
-
Questions
- How to annotate/distinguish the above classes?
-
Notes
- TEI Lex-0: usg[@type="geographic"]: "marker which identifies the place or region where a lexical unit is mainly used"
  - Matches b), potentially partly a).
-
Answers
- a), b): usg[@type="geo"]
- c): usg[@type="lang"]
- See the TEI Guidelines' corresponding section.
- Alternatively: Craft a new type and document it in the header (usg types may be freely chosen according to the TEI Guidelines).
  - Also consider adopting such a new type in the FreeDict guidelines.
  - Use plain text but name the tag and attribute name explicitly.
- Consider using a list of languages (e.g., this).
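Applied to the example labels from the classes above, the answers would give something like the following sketch (labels copied from the examples, brackets dropped):

```xml
<!-- a) dialect and b) region or country share one type -->
<usg type="geo">Am.</usg>
<usg type="geo">South Africa</usg>
<!-- c) -->
<usg type="lang">Lat.</usg>
```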
-
Notes example (ps and lists are both fine):
<notesStmt>
<note type="status">small</note> <!-- mandatory for our DB -->
<note xml:lang="de"> <!-- can be freely chosen -->
<list><item>blah</item></list>
</note>
</notesStmt>
-
Cases
- a) Headwords, which are annotations.
  - rare
- b) Annotated on headwords.
-
Question: How to represent in TEI?
-
Notes
-
Answers
- An entry should only contain a single form tag.
- An entry/form may contain a nested form[@type="abbrev"] element.
- In the case of a standalone abbreviation, the corresponding form element right below entry should be annotated with @type="abbrev".
  - potential issue: Shouldn't the topmost form elements have @type="lemma"?
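The two situations could be sketched as follows; the headword "for example" and its abbreviation are only illustrative:

```xml
<!-- abbreviation annotated on a headword: nested form -->
<entry>
  <form>
    <orth>for example</orth>
    <form type="abbrev"><orth>e.g.</orth></form>
  </form>
</entry>

<!-- standalone abbreviation: the single top-level form carries the type -->
<entry>
  <form type="abbrev"><orth>e.g.</orth></form>
</entry>
```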
- Answer: Both are fine (also in parallel).
  - Consider putting gramGrp inside form when there is also one in sense.
-
Question: Currently <availability> is suggested and used exclusively (for licensing information). Why not <license>?
-
Answer: The style sheets do not permit <license>, so validation would fail.
- Consider changing this in a future style sheet update.
-
Q: Where to annotate a date specific to the source from which the final TEI was imported?
-
A: Annotate within sourceDesc.
- Q: As plain text?
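One possibility, as a sketch only: keep the prose and wrap the date in a <date> element so that it stays machine-readable. The wording and the date are placeholders:

```xml
<sourceDesc>
  <p>Imported from the upstream data as published on
    <date when="2020-01-31">2020-01-31</date>.</p>
</sourceDesc>
```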
-
HowTo: <ref>https://freedict.org/</ref>
-
(example) TEI: <ref target="http://freedict.org/">http://freedict.org/</ref>
-
A: The HowTo is right.
-
Question: What to use when the TEI output is influenced by both a source's version and an importer's version?
-
Answers
- Whatever works or seems logical.
- Options: srcver.importerver | date | srcver | srcver.date
- Q: Set author of importer as editor?
- TEI Guidelines: "[...] acting as editor, compiler, translator, etc."
- A: Permitted.
A: Content!
-
Q: Is superEntry ok?
- A: No. "It doesn't seem necessary at all and is on its way out, in general." / @bansp
- A: Not handled by the style sheets. Also, hom is ignored.
-
Q: Should homographs be grouped somehow?
- A: No (unless they constitute several senses of the same word).
- Examples: "{v}" - the braces, ";", "~" - for references
- A: Drop
- A: Drop.
- Ex.: "Avis {m,n}" (German)
- A: Two <gen> in a single gramGrp.
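For the "Avis {m,n}" example this could look like the sketch below; the <pos> value and the gender labels "m" and "n" are taken over from the source notation as an assumption, not as the required label set:

```xml
<entry>
  <form><orth>Avis</orth></form>
  <gramGrp>
    <pos>n</pos>
    <!-- both genders side by side in the same gramGrp -->
    <gen>m</gen>
    <gen>n</gen>
  </gramGrp>
</entry>
```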
- How to encode "{+Gen.}", indicating that an object in the genitive case should follow?
- <colloc>[+ Gen.]</colloc> (where "Gen." might be changed to something else)
  - Derived from TEI Lex-0
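In context, for a headword such as "wegen", this might look like the following sketch. Putting the <colloc> into the entry's gramGrp and the <pos> value are assumptions:

```xml
<entry>
  <form><orth>wegen</orth></form>
  <gramGrp>
    <pos>prep</pos>
    <!-- the governed case, kept as a collocation hint -->
    <colloc>[+ Gen.]</colloc>
  </gramGrp>
</entry>
```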
-
Examples:
- "bread (baked in an oven)"
- "bread (wheat product)"
-
Options:
- <note>
- <usg>, with @type="hint"?
  - Usually used for more specific usages, e.g. "Am.", "med.".
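The two options side by side, as a sketch for the first example; the translation "Brot" and the placement inside sense are assumptions:

```xml
<!-- option <usg type="hint"> -->
<sense>
  <usg type="hint">baked in an oven</usg>
  <cit type="trans"><quote>Brot</quote></cit>
</sense>

<!-- option <note> -->
<sense>
  <cit type="trans"><quote>Brot</quote></cit>
  <note>baked in an oven</note>
</sense>
```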
-
Cases:
- a) case information: "wegen {+Gen.}"
  - see 15) above
- b) auxiliary words representing an object
  - b.1) suffixing: "eat sth."
  - b.2) prefixing: "etw. essen"
  - b.3) alternatives: "notify sth./sb."
  - b.4) several: "give sth. to sb."
    - potentially both prefixing and suffixing
- c) specific word(s)
  - c.1) suffixing: "dismounting (of a machine)"
  - c.2) prefixing
  - c.3) combinations
- d) combinations of a), b), c)
-
Available tags
- <colloc> (occurs in <gramGrp>)
- <usg type="colloc">
- attribute @subtype="left"?
-
Answers
- For a), see Y) above.
-
Proposed answers:
- b): <colloc>. This is grammar information.
- c): <usg type="colloc">. This is not grammar information.
- location: @subtype="left" resp. "right".
- order: keep both <colloc> and <usg type="colloc"> in the original order.
  - Keeping the order of the union of both is impossible with the given suggestion, but things like "(of a machine)" are supposed to be optional anyway.
- b.3) (alternatives)
  - i) group in <choice> or similar.
  - ii-iv) see below
  - iii) conflicts with several subsequent <colloc>s
<form><!-- ii) -->
  <orth>notify</orth>
  <gramGrp><colloc>sth.</colloc></gramGrp>
  <form type="alternate">
    <orth>notify</orth>
    <gramGrp><colloc>sb.</colloc></gramGrp>
  </form>
</form>
<!-- OR iii) -->
<form>
  <orth>notify</orth>
  <gramGrp>
    <colloc>sth.</colloc>
    <colloc>sb.</colloc>
  </gramGrp>
</form>
<!-- OR iv) -->
<form>
  <orth>notify</orth>
  <gramGrp>
    <colloc>sth./sb.</colloc>
  </gramGrp>
</form>
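Proposal c) together with the location attribute, applied to case c.1), might be sketched like this. Placing the <usg> directly inside <form> is an assumption; the proposal itself does not say where it goes:

```xml
<form>
  <orth>dismounting</orth>
  <!-- not grammar information, follows the headword -->
  <usg type="colloc" subtype="right">of a machine</usg>
</form>
```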
-
Consider "[formal/Am.]" vs. "[formal] [Am.]".
- The former indicates a disjunction, the latter a conjunction of the two annotations.
- Also possible with grammar annotations.
-
Q: How to differentiate?
-
Options:
- a) Don't.
- b) For grammar annotations: Several gramGrps.
- c) Retain the slash (or a similar separator) literally.
  - May prevent setting a common @type (such as in the example above).
- d) Something like <choice> for disjunctions.
- Options
  - Short English forms from shared/FreeDict_ontology.xml
  - Anything, but link to that ontology, as done in eng-pol.tei.
-
Example: "biological breakdown/degradation"
-
Q: How to encode?
-
Options:
- literally
- derive two distinct headwords/translations
  - headwords:
    - link with xr/ref
    - sub-form with @type="alternate" or similar.
  - translations: separate cit elements
- Something else (e.g. something like choice)
  - likely only an option for translations.
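For the translation side, the "separate cit elements" option would expand the example roughly as follows (a sketch; the surrounding sense is assumed):

```xml
<sense>
  <!-- "biological breakdown/degradation" split into two translations -->
  <cit type="trans"><quote>biological breakdown</quote></cit>
  <cit type="trans"><quote>biological degradation</quote></cit>
</sense>
```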
-
It's common to have an example for a headword, together with a translation.
- Question is, what about examples particular to the translation.
-
Likely realisation: <cit type="trans"><quote /><cit type="example" /></cit>
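Filled in, the skeleton above might read as in this sketch; the translation and the example sentence are invented for illustration:

```xml
<cit type="trans">
  <quote>Brot</quote>
  <!-- example that belongs to the translation, not to the headword -->
  <cit type="example">
    <quote>Das Brot ist frisch.</quote>
  </cit>
</cit>
```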
-
Cases
- a) same main part: "v/trans" + "v/intr"
- b) different main part (awkward): "v/trans" + "pron/rel"
-
Options
- a.1) One pos followed by several subc.
- a.2) Two pairs of pos, subc.
- a.3) two gramGrp
- b.1) two pairs, like a.2)
- b.2) two gramGrp
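For comparison, options a.1) and a.3) for the "v/trans" + "v/intr" case, as a sketch using the labels from the case description:

```xml
<!-- a.1) one pos followed by several subc -->
<gramGrp>
  <pos>v</pos>
  <subc>trans</subc>
  <subc>intr</subc>
</gramGrp>

<!-- a.3) two gramGrp -->
<gramGrp><pos>v</pos><subc>trans</subc></gramGrp>
<gramGrp><pos>v</pos><subc>intr</subc></gramGrp>
```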
- Status quo
  - ML, Wiki, lg1-lg2.tei: infl
  - TEI Guidelines, TEI Lex-0: inflected