-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Round-tripping string values that look like backward compatible arrays #5
Comments
I think your observation is correct, due to our unfortunate need for backward compatibility. And I can't think of any way to fix that right now, at least for strings that look like a single item of another primitive type (for things that look like arrays we could make the writer backslash escape the whitespace). You could backslash escape one character in all strings, but that still leaves you unable to represent the string 'n', because '\n' is special, which is ugly.
I was going to suggest using single quotes somehow, but then I realized that those are already supported by the fortran reader, so that sort of defines the backward compatibility requirements. Did we remember to support those in the new standard?
|
Obviously we could do something like add a new label for an explicit string (e.g. key=S_"1 2 3"), but writers would either need to use it always, even though it's ugly, or know enough about the standard to decide when it's necessary.
BTW, did we actually make a conscious decision not to support single quotes in the new standard? FWIW the Fortran reader (C, really) currently supports it, if we think that defines our backward compatibility requirements.
|
Single quote string would solve the issue here, writer could always use single quote when writing strings that needs to be quoted. |
I'm also happy to support single quoted strings. Need to check if they work in the current implementation, and update the spec. |
Are we OK with defining single quotes as strings only, even though the old Fortran (extxyz.c, really) implementation treats |
Yes, I think that's fine - most writers will have used |
OK, let's add single quotes as a string-only container, and then writers can always use that for strings unambiguously. |
This goes against the “human readability” a little bit. I would read ‘1 2 3’ as equivalent to “1 2 3”, and it would be confusing if it isn’t. From the original email I like key={"1 3 4 5”} to guarantee that it is read back as a string. then the writer doesn’t need a parser.
…-- Gábor
Gábor Csányi
Professor of Molecular Modelling
Engineering Laboratory, University of Cambridge
Pembroke College Cambridge
Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/
On 6 Aug 2021, at 21:02, bernstei ***@***.***> wrote:
OK, let's add single quotes as a string-only container, and then writers can always use that for strings unambiguously.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications on the go with GitHub Mobile for iOS or Android.
|
I guess. I was hoping to basically deprecate the confusing aspects of the
original syntax, though (arrays in quotes and single element arrays being
scalars). This sort of enshrines it (at least the second) instead. I
suppose it may still be better than single quotes.
By the way, I think in the old reader any number of nested quotes were
equivalent. We’ve already chosen to break that much backward compatibility,
But the old reader may well interpret the {“1 2 3“} as an array of ints
anyway.
|
I realized I was not very clear in my last email, so here is a separate issue, hopefully clearer.
I'm wondering about the possibility to round-trip string values, i.e. store some data in extended XYZ and then be able to read it again.
In particular, there is some ambiguity here between string values and backward compatible arrays. If the user asks to store the string
1 3 4 5
(with spaces inside) with thestring
key in an extended XYZ comment line, simply writing it as akey="1 3 4 5"
is not enough, since that will be read later as an array of 4 integers. It is possible to use a single element{}
array instead, i.e. write this askey={"1 3 4 5"}
, which will then be read as intended by the parser. The example from my email was about a string containing a single number or boolean value (True/...) which can not be written directly askey="3"
, but the same applies for string that would look like arrays.This means that any extended XYZ writer must check all string values to ensure they would parse back as string, and otherwise use the
{}
array trick to make sure the type of the value is preserved in the extended XYZ comment line.Do you agree with my reading of the spec and is this the intended behaviour?
I fear it will make writing to extended XYZ a bit more complex, since now every software that needs to write to this format also needs to implement a (partial) parser for it; but I also understand this is allowed for backward compatibility reasons.
The text was updated successfully, but these errors were encountered: