Commas appear within values in LEMMATA #8

Open
whoopsedesy opened this issue Apr 8, 2020 · 0 comments

Various entries in greek-lemmata.txt have the characters - and , in their beta code. For example,

$ awk '$1 ~ /[,-]/ {print $1}' greek-lemmata.txt
a)llanto-pw/lhs
a)mfi/,ei)s-qa/w
a)mfi/,kata/-e(/zomai
a)mfi/,kata/-e)/rdw
a)mfi/,kata/-ka/qhmai
a)mfi/,peri/-ai)wre/w
a)mfi/,peri/-ei(li/ssw
a)mfi/,peri/-ei)=pon
a)mfi/,peri/-pla/zw
a)mfi/,peri/-sfi/ggw
a)mfi/,peri/-skai/rw
a)mfi/,peri/-stei/nw
a)mfi/,peri/-troxa/zw
a)mfi/-a)/gamai
a)mfi/-a)/gnumi
... 18247 more ...

The - characters don't cause a problem, because beta_to_unicode.py strips them. The , characters, however, make their way into the output LEMMATA dictionary, where they affect both keys and values. For example,

'τό,τε': 'τότε',
'τό,τ’': 'τότε',
'τἠστραπῆι': 'ἐστραπῆι,εἰστραπέω',
'τἠστραπῇ': 'ἐστραπῇ,εἰστραπέω',
'ἀμφιπεριπλάζουσαν': 'ἀμφί,περίπλάζω',
'ἀμφιπερισκαίροντες': 'ἀμφί,περίσκαίρω',
'ἀμφιπεριστείνωνται': 'ἀμφί,περίστείνω',
'ἀμφιπερισφίγγουσα': 'ἀμφί,περίσφίγγω',
'ἀμφιπεριτροχάζειν': 'ἀμφί,περίτροχάζω',
'ἀμφιπεριῃώρηνται': 'ἀμφί,περίαἰωρέω',
'ἀνένηκεν': 'ἀνά,ἐνἥκω',
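
To gauge how many entries are affected on the output side, something like the following could be run against the generated dictionary. This is only a sketch: it assumes LEMMATA is a plain dict mapping str to str, `entries_with_commas` is a hypothetical helper name, and the sample reuses two entries from the list above plus one made-up clean entry for contrast.

```python
def entries_with_commas(lemmata):
    """Return (key, value) pairs where the key or the value contains a comma."""
    return [(k, v) for k, v in lemmata.items() if "," in k or "," in v]


# Small stand-in for the real dictionary (assumption: dict of str -> str).
sample = {
    "τό,τε": "τότε",         # comma in the key (from the list above)
    "ἀνένηκεν": "ἀνά,ἐνἥκω",  # comma in the value (from the list above)
    "λόγος": "λόγος",         # made-up clean entry, for contrast only
}

affected = entries_with_commas(sample)
for key, value in affected:
    print(repr(key), "->", repr(value))
```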

I don't know which data source puts them in greek-lemmata.txt, or whether they should be stripped or handled some other way.
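
If stripping turns out to be the right call, one option would be to remove the commas at the same point where the hyphens are removed. The sketch below is not the code in beta_to_unicode.py; `strip_separators` is a hypothetical helper that only illustrates the idea on a raw beta code string.

```python
import re


def strip_separators(beta):
    """Remove '-' and ',' from a beta code string (hypothetical helper)."""
    return re.sub(r"[-,]", "", beta)


# Entry taken from the awk output above.
cleaned = strip_separators("a)mfi/,peri/-pla/zw")
print(cleaned)
```

Note that stripping alone leaves the accent on every component (the `/` after each prefix survives), so the result may still not be a well-formed lemma. That is part of why I'm unsure stripping is the right fix.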
