Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle non-UTF-8 CLI args #7

Merged
merged 2 commits into from
Dec 21, 2024
Merged

Handle non-UTF-8 CLI args #7

merged 2 commits into from
Dec 21, 2024

Conversation

smores56
Copy link
Owner

Properly handle the real encoding of command-line arguments which is:

  • U8 strings on Unix systems, not guaranteed UTF-8
  • U16 strings on Windows, not guaranteed UTF-16

In addition, provide Param.bytes/Opt.bytes and Param.arg/Opt.arg to allow engagement with the proper encoding by Weaver users.

@@ -67,6 +68,9 @@ format_arg_extract_err = \err ->
InvalidValue reason ->
"The value provided to $(option_display_name option) was not a valid $(option_type_name option): $(reason)"

InvalidUnicode ->
"The value provided to $(option_display_name option) was not valid UTF-8."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err is InvalidUnicode which tactfully sidesteps the u8/u16 thing, but the message here says utf-8, should we "invalid unicode"?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably call this InvalidUtf8, actually. The idea is that an Arg is either a U8 or a U16 list, but Str can only be valid UTF-8. We try to convert either the U8 list or the U16 list to UTF-8, and if that fails this error is raised.

@@ -75,6 +79,9 @@ format_arg_extract_err = \err ->
InvalidValue reason ->
"The value provided to the '$(param |> .name)' parameter was not a valid $(param |> .type |> full_type_name): $(reason)."

InvalidUnicode ->
"The value provided to the '$(param |> .name)' parameter was not valid UTF-8."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

package/Arg.roc Show resolved Hide resolved
Comment on lines +52 to +54
## Convert an [Arg] to a list of bytes.
to_bytes : Arg -> List U8
to_bytes = \@Arg arg ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Your pickup on my PR was about these not being bytes for Windows. Not sure if that is relevant here, just making a note in case.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new Opt.bytes function that converts Arg to bytes if someone wants the underlying bytes from an Arg.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for Windows with its U16 codepoints instead of bytes, I convert to bytes which is a valid thing to do, it's just sidestepping the OS awareness

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly people should either use Opt.arg for actual handling or Opt.str for decoding to a proper string.

@smores56 smores56 merged commit 99dbd5d into main Dec 21, 2024
1 check passed
@smores56 smores56 deleted the os-aware-args branch December 21, 2024 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants