how hard would it be to extend pyRdfa3 to lxml.etree? #29

Open
doriantaylor opened this issue Apr 11, 2019 · 2 comments

Comments

@doriantaylor

Hey there,

Just tried to feed graph_from_DOM an already-parsed lxml.etree document and tripped over the fact that it only speaks xml.dom.minidom. Since both APIs expose roughly the same information (at least as far as RDFa is concerned), I'd be happy to try making it handle both, unless that turns out to be too much of a snarl or you'd rather it didn't for some reason.

Thoughts?
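
For context, a blunt stopgap that needs no change to pyRdfa3 would be to serialize the etree element and re-parse it with minidom. The sketch below is only an illustration: `etree_to_minidom` is a hypothetical helper name, the sample markup is made up, and the fallback to the standard library's `xml.etree.ElementTree` is there only so the snippet runs where lxml is not installed. It parses the document twice, which is the cost a native lxml code path would avoid.

```python
from xml.dom import minidom

try:
    from lxml import etree  # the case discussed in this issue
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


def etree_to_minidom(element):
    """Hypothetical helper: serialize an etree element, re-parse it with minidom."""
    # tostring() returns bytes in both libraries; parseString() accepts bytes.
    return minidom.parseString(etree.tostring(element))


# Made-up RDFa-flavoured sample, not taken from the thread.
root = etree.fromstring(
    '<div typeof="Person"><span property="name">Alice</span></div>'
)
dom = etree_to_minidom(root)
# `dom` is an xml.dom.minidom.Document, the kind of object the thread says
# graph_from_DOM expects. The call itself is left out because its exact
# signature is not shown here.
```

The round trip loses anything that lives only on the lxml side, such as source line numbers, so it is a workaround rather than an answer to the question.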

@iherman
Contributor

iherman commented Apr 12, 2019

@doriantaylor to be honest, I have no idea. This code is fairly old; when I began its first version (must be way more than 10 years ago…), minidom was the tool for XML and, following the adage "if it ain't broke, don't fix it", I never really changed it. I cannot judge the difficulty.

One potential issue may be (but again it may not be…) whether the minidom tree produced when parsing pure XML content (say, an SVG file) and the one produced via the html5parser expose a compatible interface. I would be surprised if there was a difference, but this must be checked. Obviously, html5parser (which is an external dependency) plays an essential role.

I do not have any objection at all if you try. Mind you, this library is behind the RDFa distiller and parser service at W3C (which sees decent usage), so extra care has to be taken in adopting any change…

@doriantaylor
Author

doriantaylor commented Apr 15, 2019

It looks like html5lib has an option to construct output with lxml.etree; however, my reading of graph_from_DOM is that it sits farther down the pipeline than that. One might be able to get away with a small proxy class that does a partial implementation:

  • detect lxml.etree._Element
  • wrap in something called maybe DOMNodeProxy
  • use as normal

I will take a look at what this entails. Maybe somebody has done it already?
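
The three steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not pyRdfa3 code: `DOMNodeProxy` is the name floated above, while `DOMTextProxy` and `as_dom` are hypothetical helpers, and the handful of minidom members covered here (`nodeType`, `localName`, `namespaceURI`, `hasAttribute`, `getAttribute`, `childNodes`) is a guess at what an RDFa processor would touch. The fallback to `xml.etree.ElementTree` is only there so the snippet runs without lxml installed.

```python
from xml.dom import Node

try:
    from lxml import etree  # the case discussed in this issue
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


class DOMTextProxy:
    """Hypothetical stand-in for a minidom Text node."""
    nodeType = Node.TEXT_NODE

    def __init__(self, data):
        self.data = data
        self.nodeValue = data


class DOMNodeProxy:
    """Hypothetical partial minidom-style view of an etree element."""
    nodeType = Node.ELEMENT_NODE

    def __init__(self, element):
        self._element = element

    def _split_tag(self):
        # etree spells namespaced names as "{uri}local".
        tag = self._element.tag
        if tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
            return uri, local
        return None, tag

    @property
    def namespaceURI(self):
        return self._split_tag()[0]

    @property
    def localName(self):
        return self._split_tag()[1]

    def hasAttribute(self, name):
        # Only un-namespaced attribute names are handled in this sketch.
        return name in self._element.attrib

    def getAttribute(self, name):
        # minidom returns "" for a missing attribute rather than None.
        return self._element.get(name, "")

    @property
    def childNodes(self):
        # etree stores text on .text/.tail; minidom exposes it as child nodes.
        nodes = []
        if self._element.text:
            nodes.append(DOMTextProxy(self._element.text))
        for child in self._element:
            if isinstance(child.tag, str):  # skip comments and PIs
                nodes.append(DOMNodeProxy(child))
            if child.tail:
                nodes.append(DOMTextProxy(child.tail))
        return nodes


def as_dom(node):
    """Step 1 and 2: detect an etree element and wrap it; pass anything else through."""
    return DOMNodeProxy(node) if etree.iselement(node) else node
```

Step 3, "use as normal", would then amount to calling `as_dom(...)` once at the entry point and leaving the rest of the pipeline alone. The main mismatch the proxy has to paper over is text handling: etree hangs text off `.text` and `.tail`, whereas minidom code expects text nodes interleaved in `childNodes`.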
