XML output option creates invalid characters #56

Open
solfeggietto opened this issue Jul 2, 2020 · 2 comments

Comments

solfeggietto commented Jul 2, 2020

  1. The character encoding is not declared
    Used: <?xml version="1.0" ?>
    Should be changed to: <?xml version="1.0" encoding="UTF-8"?>

  2. Examples from the Noark 5 validation report with invalid XML characters

With E5 for å:

> <test id="AST13" name="Record status" description="Tests whether non-finalized records exist in the archive. A record is considered non-finalized if a finalized date or a finalizing party is not specified, or if the record status is not 'Arkivert' or 'Utgår'." summary="0" info="1" warn="1301" error="0"></test>

The Norwegian å is saved as a single byte, hex E5. That is its Latin-1 (ISO 8859-1) code rather than ASCII, and on its own it is not valid UTF-8.

I believe this could be saved as the character entity:
&aring; | Latin small letter a with ring above

With &aring; for å:

> <test id="AST13" name="Record status" description="Tests whether non-finalized records exist in the archive. A record is considered non-finalized if a finalized date or a finalizing party is not specified, or if the record status is not 'Arkivert' or 'Utg&aring;r'." summary="0" info="1" warn="1301" error="0"></test>

It would be better, though, to put a UTF-8 declaration at the top and save the file as UTF-8:
å | LATIN SMALL LETTER A WITH RING ABOVE (U+00E5) | UTF-8 bytes c3 a5
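
To illustrate why the first report fails to parse, here is a minimal Python sketch. Python is used only for illustration and is not necessarily the project's language. The `<test>` element is a shortened, hypothetical stand-in for the report line. The sketch assumes the file was written in Latin-1, where å is the single byte E5, and relies on the XML 1.0 rule that a document without a declared encoding (and without a byte order mark) is read as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened report line; encoding as Latin-1 turns å into the byte E5.
body = '<test id="AST13" description="Utgår"></test>'.encode("latin-1")


def parses(document: bytes) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# No encoding declared: the parser assumes UTF-8 and rejects the lone E5 byte.
no_encoding = b'<?xml version="1.0" ?>' + body

# Declaring the encoding the bytes are really in makes the same bytes valid.
latin1_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

# Declaring UTF-8 and actually saving as UTF-8 (å becomes c3 a5) also works.
utf8_saved = (b'<?xml version="1.0" encoding="UTF-8"?>'
              + body.decode("latin-1").encode("utf-8"))

print(parses(no_encoding))        # False
print(parses(latin1_declared))    # True
print(parses(utf8_saved))         # True
print("å".encode("utf-8").hex())  # c3a5
```

The declaration and the bytes have to agree: adding `encoding="UTF-8"` without re-saving the file as UTF-8 would leave the E5 byte just as invalid.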

Correction:
A valid ASCII-only replacement for E5 (å) is the numeric character reference:
&#229;

&aring; is rejected by my XML check in Notepad++, at least...
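
A likely explanation for the Notepad++ rejection: `&aring;` is an HTML named entity, while XML itself predefines only `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`. Without a DTD that declares it, `&aring;` is an undefined entity, whereas a numeric reference such as `&#229;` needs no declaration. The Python sketch below shows the difference; it is illustrative only, and the element is a shortened, hypothetical version of the report line.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def description_of(xml_text: str):
    """Return the description attribute, or None if the XML is rejected."""
    try:
        return ET.fromstring(xml_text).get("description")
    except ET.ParseError:
        return None


# Named HTML entity: not declared in plain XML, so the parser rejects it.
named = description_of('<test description="Utg&aring;r"></test>')

# Numeric character reference: always allowed, resolves to U+00E5.
numeric = description_of('<test description="Utg&#229;r"></test>')

# The numeric form can be derived from the HTML entity name.
reference = "&#%d;" % name2codepoint["aring"]

print(named)      # None
print(numeric)    # Utgår
print(reference)  # &#229;
```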

solfeggietto commented Jul 2, 2020

Suggested solution:

  1. Put the encoding declaration on the first line of the XML: <?xml version="1.0" encoding="UTF-8"?>
  2. Run a standard character-encoding check and replace illegal characters with their valid equivalents.
    The "å" in the word "Utgår" will then be saved correctly in the XML output file.
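
Both steps of the suggestion can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration under my own assumptions, not the tool's actual code, and the element is a shortened, hypothetical stand-in for the report. Writing with `encoding="UTF-8"` and `xml_declaration=True` emits the declaration and stores å as the bytes c3 a5. Serialising as `us-ascii` instead replaces å with `&#229;`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened report element containing the problematic character.
element = ET.Element("test", {"id": "AST13", "description": "Utgår"})

# Step 1: write real UTF-8 bytes with a matching encoding declaration.
buffer = io.BytesIO()
ET.ElementTree(element).write(buffer, encoding="UTF-8", xml_declaration=True)
utf8_output = buffer.getvalue()

# Step 2 (alternative): ASCII-only output, where non-ASCII characters
# are replaced by numeric character references such as &#229;.
ascii_output = ET.tostring(element, encoding="us-ascii")

# Reading the UTF-8 output back returns the original text.
round_trip = ET.fromstring(utf8_output).get("description")

print(utf8_output)
print(ascii_output)
print(round_trip)
```

Either output is readable by a conforming XML parser; the UTF-8 variant keeps the report readable in a text editor, while the ASCII variant survives tools that mishandle encodings.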
