-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Microdata parser: updated the parser to the latest version of the microdata->rdf note (published in December 2014) #443
Conversation
…rodata->rdf note (published in December 2014)
self.base = 'file://'+name | ||
return open(name, 'rb') | ||
self.base = name | ||
return file(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iherman sooo... the problem is that with both, your original changes on top of the six_2to3 branch (see #444) as well as after rebasing them onto master (as here), there are some tests not passing. I tried out if it's just the two lines that i commented on above in #445 but sadly that didn't do the trick. |
so i guess something in this change (no matter based on which branch) isn't compatible with the tests introduced for #375. I still think that the two lines that i restored in #445 are part of the problem but seems to be something else still. I'd suggest to just continue developing on this branch (after hard resetting your local one to it). If you push to it this Pull Request will update and automatically re-run the tests. In case you want to try yourself you can still make an own Pull Request from your original change based on the six_2to3 branch, but it should be identical to #444. |
Jörn, before I do anything... I wonder whether it is not cleaner if we do a fresh start. Here is what I have in mind. For historical and practical reason, I do maintain the 'original' repo for pyMicrodata: https://github.com/RDFLib/pymicrodata I need this, because I have to maintain a microdata service on the W3C site without installing the full updated RDFLib (you do not want to know the details:-). That was the repo that I originally used, that I then moved to RDFLib when I pushed the microdata (and the RDFa 1.1) parsers into the main branch. In both cases I simply made a copy of the whole package into RDFLib and created a separate interface to the plugin structure of RDFLib. They way I updated the implementation last week was to update the separate version first and test it with all the microdata tests. The changes are on my machine, not on the repo yet; I want to do push it when the new microdata->RDF document is published. To make the update on RDFLib, what I intended to do is to, essentially, copy the content of that original package into RDFLib (under pyMicrodata) and make the update of the interface. Essentially what I did in the past, thus. I did two mistakes, however
So... isn't it simpler if
This requires more work but that if fine. I made the mess, so I believe it is my job to clean this up somehow. It will take some more time, but there is no rush, after all. Note that I cannot really check the python3 version and the issues around that, but I hope this would work out nevertheless. WDYT? Is this a possibility? And thanks for all your help Ivan
Ivan Herman |
sorry for the late reply, got some urgent stuff coming in :-/ First of all: you didn't do anything wrong, i was just surprised that you based something on six_2to3 which is pretty much unstable. The thing is: only master is meant to be stable. So everything else in branches is meant to be "in development" and unstable. This means we can just try out things and even push them online for others to see and collaborate, but the underlying agreement is that none of that is stable until merged into master (or some other stable branch(es) in other projects). Hence, what you and i did here was just that. Let me answer the rest inline:
Are the microdata tests in rdflib as well? Cause they should be... you essentially create a branch, push some changes, making them available online... then you can immediately create the pull request and describe what that branch is for. This has the nice side-effect that travis will notice and run the tests for all supported environments. So if you have more tests, they should be run inside rdflib as well.
Actually that's one of the nice things about git... it can deal with this & support cleaning it up, as it's fairly common in "multiplayer development" (see below).
Hmm, as described above: github will probably keep the commits somewhere as we discussed them, but they won't turn up in master unless someone merges them and so from a "release" point of view we didn't do anything yet. I'm keeping the microdata-to-rdf-second-edition-bak branch around at the moment so we can still compare / see it, but it's just a pointer and when we dealt with this i'll just remove it and it's gone. What i actually already did with this branch (microdata-to-rdf-second-edition) was: i rebased your changes on top of master. What this means is that i actually already did the second point you mentioned with the help of I'd suggest to watch this yourself: do a Now if your local Thing is: this is now as if you developed from master, but it seems it fails some of the tests in rdflib, so i wouldn't merge this back into master, as it breaks the one stable branch we have. I already tried a simple way to fix this in #445 (by just reverting the two changes i commented on before here: joernhees@47e0416 ), but as you can see in #445 the error remains. So i guess what you could do is investigate why the tests fail more thoroughly from the current state of microdata-to-rdf-second-edition aka this page aka #443 ... they have something to do with #375 and #406 / #403, which is why i tried reverting the two lines in question in #445, but seems your change to pyMicrodata did some other stuff that the test for #375 don't like. On all environments by the way, not only on py3.
Actually it wasn't too much work, just some "git magic" which was understandably a bit confusing. I hope this answer clears it up a bit ;) j |
Jörn, I will look at this, although I cannot tell you when that will happen. One answer to a side-issue:
They are not, but... what I referred to are the tests part of the microdata-rdf repository: https://github.com/w3c/microdata-rdf/tree/gh-pages/tests I am not familiar with the test harness approach in RDFLib (never used that) but, in any case, that repo contains the 'official' set of tests that accompany the conversion specification. I do not think it is possible, or even a good idea, to move them to the RDFLib repo. Cheers Ivan
Ivan Herman |
…ct (master) base. The previous attempt went wrong because I started with a wrong branch:-( Jörn rebased it, and I re-did the __init__.py file from scratch.
So... I did what I planned, with your help; I took the init file from the master branch, redid the necessary changes, and pushed back through the microdata-to-rdf-second-edition. The version runs on my machine with all the official microdata-rdf tests. Let us hope that will work out now... Ivan
Ivan Herman |
cool, but the tests still fail wrt. #375 ... see https://github.com/RDFLib/rdflib/blob/master/test/test_issue375.py i checked this locally and found out that the expected string differs from the result string only in the preamble. the test expects: but the new version seems to contain: This is another thing that seems to have been fixed in rdflib but not in the pymicrodata repo... see commits 3182efb , 9321c66 and 45cbc63 . So is the test right or the new pymicrodata version? What this seems to show is that we're actually dealing with 2 versions of pymicrodata which got out of sync a while ago. People fixed some bugs in rdflib, but they weren't incorporated back in the standalone pymicrodata... Maybe if you really need to keep the standalone pymicrodata, we should consider removing it from rdflib core, make it a python package and have rdflib depend on it so that pip would automagically install it when rdflib is installed? That way we could maybe defuse this in the future... Another very weird thing i found when testing this on my machine: even on master this fails for me now! |
Jörn, thanks. But...
I am not even sure where it comes from. First of all, from an RDF point of view, the difference is meaningless. If you look at the relevant part of test_issue375, it has loads of @Prefix statements in the test that are not used in the required RDF code, ie, whether it is cat or dcat is unimportant; actually, neither of that is used. (I do not know how the test harness works, but it should not be textual comparison anyway...) I also do not know where the current 'cat' version comes from. When I run the test on my machine, I get a much smaller set of @Prefixes without that one. What has changed, though, is as follows. In the mdata generation I do add a number of namespaces with prefixes, so that, if there is a serialization, the result would look better. In the previous version I used all the prefixes from the RDFa initial context which is, actually, encoded in the pyRdfa part of the distribution (pyRdfa/initialcontext.py) and that the old mdata code simply reused the list there. However, this is not in line with the current microdata practice which uses only a few namespaces, if at all, beyond the core schema.org. I have therefore removed the generation of those namespaces, and reduced the other namespaces as well. With all that: I have no idea where that prefix setting comes from. There is no prefix setting in the microdata code for dcat; there is one in the pyRdfa part using 'dcat' (though this is not used in the mdata code).
These are all around the same issue, and are all referring to pyRdfa. That one uses dcat, I checked both the master branch and this microdata-to-rdf-second-version one. I must admit I have no idea where this 'cat' comes from (I did not author those tests, and I do not even know how they exactly work).
I believe my code is correct. There seems to be some testing issue.
This is the first time this happened... (and it is clearly my fault). And I would prefer to keep it as it is now. Not everybody use pip, that would create problem for those who are already relying on this. What I hope for is that, soon, I can get things changed on the W3C site and such things would not happen again. (Do you know how often things change? Ie, if there is a change in the RDFLib repo, how much time does it take to get into the latest release?)
I really do not know. The initial context, ie, the usage of 'dcat' as a prefix, is in sync in all the versions I have and, in fact, with the 'official' set http://www.w3.org/2011/rdfa-context/rdfa-1.1 I do add new entries to the pyRdfa code when the official list is extended, and I did that in the past. I must admit I was a little bit careless on one front: I simply updated that on the main branch, because it is easy to see if it is correct or not (I should probably do it via pull requests in future). But the last change occurred quite a while ago (almost a year ago, if I see the CVS log on the w3c page). :-( Thanks! Ivan
Ivan Herman |
We have at least three srparate issues here. git submodules would be another option for keeping pyRdfa in a separate repository without running into synchronisation issues, if we really need to keep them separate. The last time I tried to use the standalone pyRdfa module, however, it was effectively useless without RDFLib. I'm skeptical that keeping it separate serves a real practical purpose. As for test 375, it uses a comparison of the text output because it's trying to ensure that rdfpipe is, in fact, working, and as far as I can tell checking the text output is the only way to do that. The output should be consistent between runs, right? If the expected output changes because we decided to change the default contexts, then the expected output should be changed. That said, it's possible that there is a problem with the test in how it invokes rdfpipe as a subprocess. Perhaps it is picking up on an installed version of RDFLib rather than the libraries in the repo. |
Indeed. Both modules (RDFa and mdata) were originally developed on top of RDFLib; then, at some point in the past, I was convinced (I do not remember who, I must admit) that it would be better to add these to the core distribution, which I did. Note that his was quite some time ago (about two years). Going backwards now would really be an issue. Let us try to do without this.
Between runs: yes. But if something changes in one of the modules, then it should be updated. Which is where the problem comes; I did not contribute those tests, I am not even sure how they exactly work:-(
That I do not know:-( Thanks Ivan
Ivan Herman |
Going through this again, the remaining problem is that rdflib and pymicrodata diverged when some issue was fixed in rdflib but not in pymicrodata. Then you made a large update to pymicrodata and that update now seems to break one of the rdflib tests (which was introduced after the fixes). I think we should definitely do two things:
Wrt. to 2 you already pointed out that the test might not be the best, which i agree to... it is just a syntactic test for semantic data. The benefit is that it notified us of a problem. Part of this problem seems to have been that some default prefixes changed, which could be desired or not. If you say it's ok, it's not much work to fix the test. Another thing (and where we definitely started to talk about too many different things at once) is how to make sure we won't have such a problem again. Let's put that decision into #582. |
Jörn, as I said in my original reply: I really have no time to deal with this issue now. Is it something that you can take care of? Ivan
|
short on time myself, but i'll look into this |
OK, having looked into this for a while yesterday and today, this is where i'm at:
As the changes are backported already it seems, i'll next copy all the new version back into rdflib and make a PR to see how that goes with our tests. Finally, when that's figured out, i'll close this and we'll think about which way to resolve this massive timesink once and for all in #582. |
…587, #443, #444, #445 * microdata-to-rdf-third-edition: some whitespace cleanup updated microdata test for #375 wrt. #587 (microdata-rdf 2014) modified microdata test for #375 wrt. isomorphism and better output updated MicrodataParser to reflect that pyMicrodata no longer has vocab expansion and cache args syncing changes from pyMicrodata rev c760db0e77c13c4e80fdef675f3c65497f8d08bf
See #828 |
re-based microdata-to-rdf-second-edition from six_2to3 branch as described here: b082c48#commitcomment-8975378
pull request to run the tests...