Decisions on what parts of Abbott-Smith headword to extract and how to extract them #1

Open
jtauber opened this issue Nov 23, 2015 · 3 comments

@jtauber
Member

jtauber commented Nov 23, 2015

Relevant code and data so far is in:

https://github.com/morphgnt/morphological-lexicon/tree/master/projects/merge_abbott_smith

Here's a sample of what I currently have. The third column (which is the one I'm discussing here) is just all the text of the form element with other tags stripped.

Ἀβιληνή|G9|Ἀβιληνή v.s. Ἀβειληνή.
Ἀβιούδ|G10|Ἀβιούδ, ὁ, indecl. (Heb. אֲבִיָּהוּד), 
Ἀβραάμ|G11|Ἀβραάμ (Heb. אַבְרָהָם), ὁ, indecl. (in FlJ, Ἄβραμος, -ου; MM, VGT, s.v.), 
ἄβυσσος|G12|ἄ-βυσσος, -ον
Ἄγαβος|G13|Ἄγαβος, -ου, ὁ
ἀγαθοεργέω|G14|*† *† ἀγαθοεργέω, -ῶ, 
ἀγαθὀποιέω|G15|ἀγαθὀ-ποιέω, -ῶ (= cl. ἀγαθὸν ποιεῖν, εὐεργετεῖν), 
ἀγαθοποιΐα|G16|*† *† ἀγαθοποιία, -ας, ἡ 
ἀγαθοποιός|G17|**† **† ἀγαθοποιός, -όν = cl. ἀγαθουργός, 
ἀγαθός|G18|ἀγαθός, -ή, -όν, 
ἀγαθουργέω|*† *† ἀγαθουργέω, -ῶ, contracted form (rare, v. WH, App., 145) of ἀγαθοερ- (q.v.), 
ἀγαθωσύνη|G19|† † ἀγαθωσύνη (on the termination, v.s. ἁγιότης, and cf. WH, App., 152; MM, VGT, s.v.), -ης, ἡ 
ἀγαλλίασις|G20|† † ἀγαλλίασις, -εως, ἡ
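
For anyone wanting to work with rows like these, here is a minimal Python sketch for reading them. It assumes the pipe-delimited layout shown in the sample (lemma, Strong's-style number, stripped form text) and that a two-field row such as the ἀγαθουργέω one simply has no number. The names `Row` and `parse_row` are hypothetical and are not taken from the repository.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    lemma: str
    number: Optional[str]  # e.g. "G13"; None when the row has no number field
    form_text: str         # the column under discussion


def parse_row(line: str) -> Row:
    """Split one pipe-delimited sample line into its fields.

    Assumption: rows have either three fields (lemma|number|text)
    or two fields (lemma|text), as in the sample above.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) == 3:
        lemma, number, form_text = parts
        return Row(lemma, number, form_text)
    if len(parts) == 2:
        lemma, form_text = parts
        return Row(lemma, None, form_text)
    raise ValueError(f"unexpected number of fields: {line!r}")
```

Treating the two-field case explicitly, rather than unpacking three values blindly, keeps a row without a number from raising or being silently misaligned.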

The questions are:

  1. how much of that text in the third column should we keep?
  2. can we remove the rest programmatically (given the source XML) or is it quicker to just manually clean it up?
@jtauber
Member Author

jtauber commented Nov 23, 2015

Here's the information from above that I think we should keep:

Ἀβιληνή|G9|Ἀβιληνή
Ἀβιούδ|G10|Ἀβιούδ, ὁ, indecl.
Ἀβραάμ|G11|Ἀβραάμ, ὁ, indecl.
ἄβυσσος|G12|ἄ-βυσσος, -ον
Ἄγαβος|G13|Ἄγαβος, -ου, ὁ
ἀγαθοεργέω|G14|ἀγαθοεργέω, -ῶ 
ἀγαθὀποιέω|G15|ἀγαθὀ-ποιέω, -ῶ
ἀγαθοποιΐα|G16|ἀγαθοποιία, -ας, ἡ 
ἀγαθοποιός|G17|ἀγαθοποιός, -όν
ἀγαθός|G18|ἀγαθός, -ή, -όν
ἀγαθουργέω|ἀγαθουργέω, -ῶ
ἀγαθωσύνη|G19|ἀγαθωσύνη
ἀγαλλίασις|G20|ἀγαλλίασις, -εως, ἡ

I think we should drop the Hebrew, the references to other lexicons and texts, the daggers and asterisks.
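
As a rough illustration of how far a programmatic pass might get, here is a minimal Python sketch that drops the daggers and asterisks and any parenthesised note (which covers the Hebrew and the lexicon references in the sample). It is only a sketch under those assumptions: `clean_headword` is a hypothetical name, it works on the already-stripped text rather than the source XML, it does not handle nested parentheses, and it leaves unparenthesised material such as "v.s. Ἀβειληνή." or "= cl. ἀγαθουργός" untouched, so those rows would still need a manual pass.

```python
import re

# Daggers and asterisks that precede some headwords.
MARKERS = re.compile(r"[*†]+")
# A parenthesised note (no nesting), plus any whitespace before it.
PARENS = re.compile(r"\s*\([^()]*\)")


def clean_headword(raw: str) -> str:
    """Hypothetical helper: strip markers and parenthesised notes."""
    text = MARKERS.sub("", raw)
    text = PARENS.sub("", text)
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip(" ,")           # drop leading/trailing spaces and commas


print(clean_headword("*† *† ἀγαθοεργέω, -ῶ, "))  # → ἀγαθοεργέω, -ῶ
```

On the sample rows this reproduces the desired output for entries like Ἀβιούδ and Ἀβραάμ, but it is deliberately conservative about anything outside parentheses.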

I'm not 100% sure about variant spellings, "cl." annotations and various other links to other words.

@jtauber
Member Author

jtauber commented Nov 23, 2015

Eventually much of that information can make its way back, but what I'm really most interested in doing is including the full Abbott-Smith headword in https://github.com/morphgnt/morphological-lexicon/blob/master/lexemes.yaml, like I do for Danker's CL.

@jtauber
Member Author

jtauber commented Dec 1, 2015

I've manually cleaned up the headwords, keeping just the inflectional class / article info and removing pretty much everything else for now.
