Gather a real test suite #7

Open
lassik opened this issue May 12, 2021 · 4 comments

@lassik
Collaborator

lassik commented May 12, 2021

We should probably collect a few POSE files for use as test inputs (and verify that writing them produces the same encoding in an agreed-upon normal form).

@wallymathieu
Member

One way would be to have a POSE format that represents the AST, so that we could have a simpler multi-language test suite. Say, for:

(symbol "value")

you would get the output:

((symbol "symbol") (string "value"))

?
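
A minimal Python sketch of that mapping, to make the proposal concrete. Everything here is an assumption for illustration: the `Symbol` class, the `to_ast` and `write` helpers, and the backslash escaping in strings are hypothetical and not taken from any existing POSE library. Only symbols, strings, and lists are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical in-memory representation of a POSE symbol."""
    name: str


def to_ast(exp):
    """Map a parsed expression to a tagged (type text) form, recursively."""
    if isinstance(exp, Symbol):
        return [Symbol("symbol"), exp.name]
    if isinstance(exp, str):
        return [Symbol("string"), exp]
    if isinstance(exp, list):
        return [to_ast(item) for item in exp]
    raise TypeError(f"unsupported expression: {exp!r}")


def write(exp):
    """Write an expression as text (assumes backslash escapes in strings)."""
    if isinstance(exp, Symbol):
        return exp.name
    if isinstance(exp, str):
        escaped = exp.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(exp, list):
        return "(" + " ".join(write(item) for item in exp) + ")"
    raise TypeError(f"unsupported expression: {exp!r}")


# The expression (symbol "value") as it might look after being read in.
parsed = [Symbol("symbol"), "value"]
print(write(to_ast(parsed)))  # ((symbol "symbol") (string "value"))
```

Note that `to_ast` is itself a mapping from expression to expression, so each library would need its own copy of it plus a working writer before the test suite could run.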

@lassik
Collaborator Author

lassik commented May 15, 2021

That would necessitate adding extra code to the POSE writer in each library to write out the AST representation (or to construct a meta-level representation of a POSE expression that has been read in, i.e. a mapping from Exp to Exp).

The draw of S-expressions is that they can be their own AST; the mapping from S-exp to AST is 1:1. It'll be easier to write test data that covers all the data types we support. I expect most bugs to be in edge cases about what characters are allowed to be part of symbols, what counts as whitespace, etc; the big picture (which datums are contained in a file, and how they are nested in each other) is reasonably easy to get right.

@lassik
Collaborator Author

lassik commented May 15, 2021

Unicode handling is another place where bugs easily lurk. BOM (byte order mark), UTF-8 vs UTF-16, normalization forms (NFC vs NFD), non-ASCII whitespace, etc. And some string types (e.g. in Go) are byte strings internally, whereas others (.NET and JVM) are UTF-16, and still others (Gambit Scheme) are UTF-32.
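
A short Python sketch of these pitfalls, using only the standard library. It does not reflect any decision about what POSE should accept; the sample strings and the `code_units` helper are made up for illustration.

```python
import codecs
import unicodedata

# NFC vs NFD: the same visible text, different code points.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True


def code_units(text: str) -> dict:
    """Count the code units of text in each encoding (hypothetical helper)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")) // 2,
        "utf-32": len(text.encode("utf-32-le")) // 4,
    }


# One character outside the Basic Multilingual Plane has a different
# "length" depending on the internal string representation.
print(code_units("\U0001F600"))  # {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}

# BOM: a plain UTF-8 decode keeps U+FEFF as the first character.
data = codecs.BOM_UTF8 + b"(a)"
print(repr(data.decode("utf-8")))      # '\ufeff(a)'
print(repr(data.decode("utf-8-sig")))  # '(a)'

# Non-ASCII whitespace: NO-BREAK SPACE counts as whitespace for
# str.isspace() but is not one of the ASCII whitespace characters.
print("\u00a0".isspace())      # True
print("\u00a0" in " \t\r\n")   # False
```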

@lassik
Collaborator Author

lassik commented May 15, 2021

The most esoteric test cases should probably be written as hex dumps... The raw bytes would easily get mangled by text editors and other tools that try to clean up the file.
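
One way such a test case could be stored and loaded, sketched in Python. The hex dump layout (hex byte pairs with `#` comments) and the `from_hex_dump` helper are assumptions for illustration, not an agreed-upon format.

```python
def from_hex_dump(dump: str) -> bytes:
    """Turn a commented hex dump into raw bytes.

    Assumed format: hex byte pairs separated by whitespace, with
    everything after '#' on a line treated as a comment.
    """
    digits = []
    for line in dump.splitlines():
        code = line.split("#", 1)[0]      # drop the trailing comment
        digits.append("".join(code.split()))  # drop all whitespace
    return bytes.fromhex("".join(digits))


# Hypothetical test input that a text editor could easily mangle.
CASE = """
ef bb bf        # UTF-8 byte order mark
28 61 c2 a0 62  # "(a", NO-BREAK SPACE, "b"
29 0d 0a        # ")" followed by CRLF
"""

raw = from_hex_dump(CASE)
print(raw)  # b'\xef\xbb\xbf(a\xc2\xa0b)\r\n'
```

Because the file on disk contains only ASCII hex digits and comments, tools that strip a BOM, convert line endings, or normalize Unicode leave the actual test bytes untouched.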
