Junk parses due to DetNP (DetQuant IndefArt NumPl) in English #437

Open
inariksit opened this issue Aug 23, 2023 · 4 comments

@inariksit
Member

When using the English resource grammar for parsing, there are lots of junk parses due to the interplay of ApposCN and DetNP applied to the plural indefinite article, which is linearised as an empty string.

For my own purposes, I have been using the change in inariksit@55551e4, but only in my local branch.

I haven't made a pull request with this solution, because I don't know if it would break some existing code. But I wonder if something like that would be acceptable to merge into the official RGL? Other languages have the same problem as well: basically any language where some Det linearises into an empty string.

@hleiss
Contributor

hleiss commented Jan 21, 2024

I made a similar change in Ger to get rid of such junk parses. It improves parsing, but the linearization now gives

Lang> l AdjCN (PositA good_A) (ApposCN (UseN book_N) (PPartNP (DetNP (DetQuant IndefArt NumPl)) see_V2))
good book ones seen
gutes Buch einige , gesehen

Do you understand this? Even if PPartNP makes some sense with a reasonable NP, e.g.

Lang> l PPartNP (UsePN john_PN) forget_V2
John forgotten
Johann , vergessen

ApposCN is pretty bad, or as the abstract grammar says:

-- This is certainly overgenerating.

ApposCN : CN -> NP -> CN ;    -- city Paris (, numbers x and y)

It is hard for me to guess what really is intended: is the CN "city" the apposition to the NP "Paris" (or the PN "Paris"?), or the other way round, the NP the apposition to a PN (rather than a CN), as in "Paris, the capital of France"?

@inariksit
Member Author

The intended reading of ApposCN is that the CN ("city") is the head and the NP ("Paris") is the appositive. The construction requires the head to be a CN, so the whole resulting CN can be quantified with an arbitrary Det ("my five cities Paris").

The more common kind of apposition combines two NPs, as in "[my sister] [Alice]" or indeed "[Paris], [the capital of France]". To get this construction, you can use ApposNP : NP -> NP -> NP from the Extend module.

As for "gutes Buch einige , gesehen", it comes together like this:

 PPartNP
    (DetNP (DetQuant IndefArt NumPl)) -- 'einige'
    see_V2

is an NP of the same structure as "Johann , vergessen".

Then we have the CN (UseN book_N), and ApposCN : CN -> NP -> CN puts together "[Buch]:CN [einige, gesehen]:NP", which is still a CN. Then that CN is modified by the adjective good_A.

Note that I don't find that particular thing a problem, because personally my use case is not to use GF to generate random language. My problem is when there is a combination of trees that results in empty strings being parsed into some subtree. I don't find the existence of ApposCN to be a huge problem, because if you want to use the RGL abstract syntax as a base for an application grammar, you can exclude ApposCN and then it won't bother you anymore.

@hleiss
Contributor

hleiss commented Jan 24, 2024

My "Do you understand this?" was a rhetorical question; I just wanted to point out that your solution has some strange effects you might not have been aware of. And I still find apposition to a CN not very convincing; apposition to a PN or NP seems to me a better rule for Lang than ApposCN.

@inariksit
Member Author

Oh I see, sorry for misunderstanding! Personally, I just don't care about strange effects that appear when randomly generating sentences; ApposCN is far from being the only RGL function that produces mostly nonsense. But I do care about hundreds of junk trees when parsing something totally normal.

I agree with you that NP-based apposition would be a better rule for Grammar/Lang. But you know as well as I do the GF community's commitment to backwards compatibility 😛 So the best you can do is to use Extend.ApposNP in whatever custom RGL-based parsing grammar you put together.
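
A minimal sketch of such a custom parsing grammar, using GF's restricted inheritance to leave ApposCN out of Noun and to take only ApposNP from Extend. The module name ParseLang is hypothetical, and the module list is an assumption: it is meant to mirror what Grammar inherits, so check it against the Grammar.gf of the RGL version you use. This has not been tested against any particular RGL release.

```
-- ParseLang.gf (hypothetical name): an RGL-based abstract syntax for parsing.
-- Assumption: the modules listed are the ones Grammar itself inherits.
abstract ParseLang =
  Noun - [ApposCN],      -- everything in Noun except CN-headed apposition
  Verb, Adjective, Adverb, Numeral,
  Sentence, Question, Relative, Conjunction,
  Phrase, Text, Structural, Idiom, Tense,
  Lexicon,
  Extend [ApposNP]       -- from Extend, only ApposNP : NP -> NP -> NP
  ** {
    flags startcat = Phr ;
  } ;
```

The concrete syntax would need the same restrictions on the corresponding concrete modules. Note that this only removes the ApposCN half of the interplay; DetNP (DetQuant IndefArt NumPl) is still in the grammar, so whether ApposNP brings back similar junk parses would have to be checked by parsing.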

2 participants