Add a validate_all target #28

Closed
bansp opened this issue Jan 6, 2021 · 3 comments

Comments

@bansp
Member

bansp commented Jan 6, 2021

This is logically part of the functionality suggested in #11, and I needed something like this to proceed with freedict/fd-dictionaries#62.
This ticket partly documents my interim solution, but mostly serves as an anchor for the hopefully not-too-distant commit and as a place to consider extending the functionality, again in the spirit of #11.

First, a non-commit that does the job, in a maddeningly slow manner:

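# Each dictionary is validated in a subshell, so a failed cd cannot
# leave the loop in the wrong directory for the next iteration.
.PHONY: validate_all
validate_all:
	for dict in $(DICTS); do \
		( cd ./$$dict && \
		  xmllint --noout --xinclude --relaxng freedict-P5.rng $$dict.tei ) ; \
	done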

The slowness is sadly due to the overall architecture of this kind of call: each iteration starts a fresh xmllint process, which recompiles the schema before it begins parsing the XML. The --stream flag does not help here, because it looks like even our huge databases are still too small for streaming to be an effective enhancement.
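
A possible way around the repeated schema compilation is to pass all TEI files to a single xmllint call, since xmllint accepts several input files and, as far as I know, loads the --relaxng schema once per invocation. The following is only a rough sketch under assumptions: each dictionary directory is named after its .tei file, one shared copy of freedict-P5.rng is usable from the parent directory, and the names SCHEMA, tei_files and validate_all are made up for illustration.

```shell
#!/bin/sh
# Sketch: validate all dictionaries with one xmllint process.
# SCHEMA is a placeholder; point it at a local copy of the schema.
SCHEMA=${SCHEMA:-freedict-P5.rng}

# Print one TEI path per dictionary, e.g. eng-pol -> eng-pol/eng-pol.tei
tei_files() {
    for dict in "$@"; do
        printf '%s/%s.tei\n' "$dict" "$dict"
    done
}

# Run xmllint once over all dictionaries given as arguments.
validate_all() {
    # Intentionally unquoted: the file list relies on word splitting,
    # so directory names must not contain whitespace.
    xmllint --noout --xinclude --relaxng "$SCHEMA" $(tei_files "$@")
}
```

Run from the directory that holds the dictionaries, a call such as `validate_all eng-pol` (or the expanded list of all dictionaries) would then validate everything in one go. Whether this actually saves a noticeable amount of time would have to be measured.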

I tried to make the above more kosher with respect to our build system architecture by using the variables set in dicts.mk and the target defined there for single databases, namely validation:

validate_all:
	for dict in $(DICTS); do \
		$(MAKE) -C $$dict validation; \
	done

And with the above, the troubles I described in #27 began. So until #27 is handled, I can only use my makeshift solution at the top. Not sure if it's worth committing temporarily, because we all know how permanent temporary solutions can be.

Potential extensions

  • apart from validation against the RNG schema, we might want to simply check well-formedness (in the case of xmllint, it is just a matter of dropping the --relaxng freedict-P5.rng fragment, so the whole thing could be a single parametrized statement)
  • depending on how the databases grow, maybe --stream can be used, for huge databases (otherwise it doesn't buy any time, but rather costs extra)
  • we might also see if we can find a sensible parser different from xmllint that precompiles the schema, to save some time, or if we can keep the parsing library preloaded; of course, that would have to be an optional extra for the user
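
The first two extensions could be folded into one parametrized call along the following lines. This is a minimal sketch, not part of the build system: the function name xmllint_cmd, its MODE and STREAM arguments and the SCHEMA variable are hypothetical, and how well --stream combines with --xinclude on our files would still need testing.

```shell
#!/bin/sh
# Sketch: build one xmllint command line for the variants listed above.
# Usage: xmllint_cmd MODE STREAM FILE
#   MODE   = wellformed | relaxng
#   STREAM = yes | no
xmllint_cmd() {
    mode=$1
    stream=$2
    file=$3
    cmd="xmllint --noout --xinclude"
    # Schema validation is just the well-formedness check plus --relaxng.
    if [ "$mode" = relaxng ]; then
        cmd="$cmd --relaxng ${SCHEMA:-freedict-P5.rng}"
    fi
    # Streaming is opt-in, since it only pays off for very large files.
    if [ "$stream" = yes ]; then
        cmd="$cmd --stream"
    fi
    # Print the command instead of running it.
    printf '%s %s\n' "$cmd" "$file"
}
```

Printing the command keeps the sketch easy to inspect; to actually run it, something like `eval "$(xmllint_cmd relaxng no eng-pol.tei)"` would do, provided the file name contains no shell metacharacters.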
@bansp
Member Author

bansp commented Jan 6, 2021

Just noting that commit e5cdfe0 is a minor addition to dicts.mk: it adds the --xinclude parameter to xmllint to handle eng-pol and maybe, in the future, other large databases too, if we decide to split them up into more manageable chunks. The parameter is harmless if there are no XIncludes in the document.

@humenda humenda closed this as completed in 574a35a Jan 6, 2021
@humenda
Member

humenda commented Jan 6, 2021 via email

@bansp
Member Author

bansp commented Jan 6, 2021

Thank you, Wizard! :-)
