<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_wp3_vsm_fx">
<title>Lemmatizing idiomatic expressions and compound words</title>
<body>
<p><b>Idiomatic expressions and collocations</b></p>
<p>Some words combine with a verb to form an idiomatic expression (collocation). Mark the
collocation by selecting the matching entry for the verb in the lemmatizer; in the examples
below, this is the entry that gives the collocated word in parentheses. Lemmatize the
collocated word itself as usual, with its own entry.</p>
<p>
<table id="table_qz5_lkb_lt">
<tgroup cols="2">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<thead>
<row>
<entry>Example</entry>
<entry>correct lemmatization</entry>
</row>
</thead>
<tbody>
<row>
<entry><i>jri̯ </i>ꜣ<i>h.w</i> – to feel pain</entry>
<entry>1) <i>jri̯ (</i>ꜣ<i>h.w)</i> (WCN: 851959); 2) ꜣ<i>h.w </i>(WCN: 174)</entry>
</row>
<row>
<entry><i>rḏi̯ j</i>ꜣ<i>.w</i> – to praise s.o.</entry>
<entry>1) <i>rḏi̯ (j</i>ꜣ<i>.w)</i> (WCN: 851491); 2) <i>j</i>ꜣ<i>.w</i> (WCN:
20360)</entry>
</row>
<row>
<entry><i>sḏm </i>ꜥ<i>š</i> – to serve</entry>
<entry>1) <i>sḏm (</i>ꜥ<i>š)</i> (WCN: 150630); 2) ꜥ<i>š</i> (WCN: 40900)</entry>
</row>
</tbody>
</tgroup>
</table>
</p>
<p/>
<p><b>Compound words</b></p>
<p>Compound words (compound prepositions, nouns, proper names, titles, etc.) are joined by a
hyphen “-” and treated as a single lemma. For example, do not lemmatize <i>ḥm-nṯr</i>
separately as <i>ḥm</i> and <i>nṯr</i> (see: <xref href="grammar_structure_convention.dita"/>).
The lemma <i>ḥm-nṯr</i> has already been sub-lemmatized in the word list, so you do not need
to do this again in your text.</p>
<p/>
<p><b>Compound words split by a suffix pronoun or personal name</b></p>
<p>Titles, epithets, and other compound words are sometimes split by a suffix pronoun or a
personal name. In this case, mark the separation with "+" on both parts of the compound, and
lemmatize the suffix or personal name normally. Lemmatize the first part of the compound with
the WCN of the whole compound; leave the second part without lemmatization (WCN: -).</p>
<p>
<table frame="all" rowsep="1" colsep="1" id="table_d1q_3ym_fx">
<tgroup cols="2">
<colspec colname="c1" colnum="1" colwidth="1.0*"/>
<colspec colname="c2" colnum="2" colwidth="1.0*"/>
<thead>
<row>
<entry>Example</entry>
<entry>correct lemmatization</entry>
</row>
</thead>
<tbody>
<row>
<entry><i>ḥ</i>ꜣ<i>.tj-ꜥ+ M</i>ꜥ<i>ḥ +n-Nfr-wsj</i></entry>
<entry>1) <i>ḥ</i>ꜣ<i>.tj-</i>ꜥ<i>+</i> (WCN: 857144); 2) <i>M</i>ꜥ<i>ḥ</i> (WCN:
600439); 3) <i>+n-Nfr-wsj </i>(WCN: -)</entry>
</row>
<row>
<entry><i>rn+ =f +nfr</i></entry>
<entry>1) <i>rn+</i> (WCN: 94780 for <i>rn-nfr</i>); 2) <i>=f</i> (WCN: 10050); 3)
<i>+nfr</i> (WCN: -)</entry>
</row>
</tbody>
</tgroup>
</table>
</p>
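<p>The rule for split compounds can also be read as a small procedure. The following Python
sketch is only an illustration and not part of the lemmatizer: the function name
<codeph>lemmatize_split_compound</codeph> and the <codeph>WORD_LIST</codeph> dictionary are
hypothetical, and the code assumes that the compound lemma is obtained by joining the two
"+" parts with a hyphen, as in <i>rn-nfr</i>. The WCNs are the ones quoted in the table
above.</p>
<codeblock outputclass="language-python">
```python
# Hypothetical excerpt of a word list (lemma: WCN), using the WCNs from the table.
WORD_LIST = {
    "rn-nfr": 94780,
    "=f": 10050,
}


def lemmatize_split_compound(tokens, word_list):
    """Assign WCNs to the tokens of a compound that is split by "+".

    tokens: transliterated tokens in text order, e.g. ["rn+", "=f", "+nfr"].
    Returns a list of (token, WCN) pairs; None stands for "WCN: -".
    """
    heads = [t for t in tokens if t.endswith("+")]
    tails = [t for t in tokens if t.startswith("+")]
    if len(heads) != 1 or len(tails) != 1:
        raise ValueError("expected exactly one 'X+' part and one '+Y' part")

    # Assumption: the compound lemma is the two parts joined by a hyphen.
    compound = heads[0][:-1] + "-" + tails[0][1:]

    result = []
    for token in tokens:
        if token.endswith("+"):
            # First part: carries the WCN of the whole compound.
            result.append((token, word_list[compound]))
        elif token.startswith("+"):
            # Second part: left without lemmatization.
            result.append((token, None))
        else:
            # Intervening suffix pronoun or personal name: lemmatized normally.
            result.append((token, word_list[token]))
    return result


pairs = lemmatize_split_compound(["rn+", "=f", "+nfr"], WORD_LIST)
```
</codeblock>
<p>With these assumptions, <codeph>pairs</codeph> pairs <i>rn+</i> with 94780, <i>=f</i> with
10050, and <i>+nfr</i> with no WCN, matching the second row of the table.</p>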
</body>
</topic>