Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump nom to v5 #53

Merged
merged 44 commits into from
Feb 5, 2020
Merged

Bump nom to v5 #53

merged 44 commits into from
Feb 5, 2020

Conversation

tomharmon
Copy link
Contributor

@tomharmon tomharmon commented Jan 2, 2020

This PR bumps nom to the latest version, v 5.0.1, as of the time of this PR. There are no functional changes in this PR.

NOTE: the reason for such a big diff is because nom v5 macros were rewritten to use the streaming version of all parsers. Since nom-sql assumes complete input, the entire library had to be rewritten to use the new nom functions instead of the macros

tomharmon and others added 30 commits December 31, 2019 11:19
- updates nom dependency in `Cargo.toml` to v5.0.1
- find and replace `CompleteByteSlice` with `&[u8]`
- update `arithmetic.rs` file to no longer have compile errors (except
  for internal dependencies from `common.rs`)
- Fixes all compile errors except one, which this commit also adds a
  TODO since that function is a bit more than I want to do today
- As part of upgrading to nom v5, rewrote the macros of `set.rs` to use
  the new complete parsing functions from nom
- Because nom v5 macros were rewritten to use the stream version of the
  parser, use the complete version of the nom parsers
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`parser.rs` to use the complete nom functions
- Since nom v5 macros were rewritten to use the streaming version,
  rewrote `select.rs` to use the complete version of the nom v5
functions since we want to deal with complete input only
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`order.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`join.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`insert.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`delete.rs` and `drop.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`create.rs` and `create_table_options.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`condition.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`case.rs` and `compound_select.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`arithmetic.rs` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`common.rs` to use the complete nom function
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`statement_terminator` to use the complete nom functions
- Since in nom v5 macros were rewritten to use the streaming version of
  the parser, and we want to do with complete inputs, rewrote the
`common` to use the complete nom functions
@tomharmon tomharmon requested a review from fintelia January 21, 2020 17:35
@spazm
Copy link
Contributor

spazm commented Jan 25, 2020

Rad!
When I saw that this was using nom4, i got myself all pumped up to do this transition having just gone through it in another project. Even better, you've already done it!

What would you like in a review?

@tomharmon
Copy link
Contributor Author

@spazm I didn't have anything in mind since the changes seemed pretty straightforward to me (just converting a calls to a bunch of macros to calls of a bunch of functions). I was able to do the conversion without changing the design or approach, or editing any tests. All tests pass so I'm pretty sure everything works exactly as before. When I opened the PR I thought the white space handling was going to be a little more lax but I'm pretty sure it's the same now.

But yeah no major areas where I had to do anything drastically different. If anything I added a few private helper functions here and there to break things up, but that's about it.

@ms705
Copy link
Owner

ms705 commented Jan 25, 2020

Thank you @thomasharmon808 and @spazm ! I'll take a look over this in the next few days.

@spazm If you'd like to review, that'd be great :-) I'd pay attention to any possible differences in behavior between the old and new code w.r.t. cases that the tests do not necessarily catch.

@fintelia fintelia requested review from ms705 and removed request for fintelia January 25, 2020 19:12
@spazm
Copy link
Contributor

spazm commented Jan 30, 2020

Working through my review now. So far it is a very clear (almost-mechanical -- well done!) translation from nom macro syntax to the corresponding function syntax.

@ms705 et al: I think you'll find that nom5 functions are more pleasant to work with than the old macro system. nom5 still supports the macro forms but macros are now only in streaming context without the ability to force a complete context.

@spazm
Copy link
Contributor

spazm commented Jan 30, 2020

Discussion point:

When converting a multi-parser do_parse!(...) to v5 syntax, we can write it as a tuple(...) and capture the results into a tuple, as @thomasharmon808 has done here.

In my other project, we translated these do_parse!(...) multiple parsers as a series of individual parser commands, where the remaining from each is used as the input to the next (specifically by shadowing the input var). I can't remember if we came up with it or found it in the nom docs? I have no idea if there are performance considerations between the two.

Stylistically, I find it easier to match the saved results to variables when they are closer, and it feels closer to the capture syntax in do_parse!() do_parse!(parse1 >> save: parse2 >> parse3). It is a tad more verbose, since each line is suffixed by the parser invocation (i)?

Any opinions or chatter before we switch the code base over from macro based parsers?

Here's an example of switching the arithmetic_cast_helper(). Each parser result shadows the previous i value.

 fn arithmetic_cast_helper(i: &[u8]) -> IResult<&[u8], (ArithmeticBase, Option<SqlType>)> {
-    let (remaining_input, (_, _, _, _, a_base, _, _, _, _sign, sql_type, _, _)) = tuple((
-        tag_no_case("cast"),
-        multispace0,
-        tag("("),
-        multispace0,
-        // TODO(malte): should be arbitrary expr
-        arithmetic_base,
-        multispace1,
-        tag_no_case("as"),
-        multispace1,
-        opt(terminated(tag_no_case("signed"), multispace1)),
-        type_identifier,
-        multispace0,
-        tag(")"),
-    ))(i)?;
+    let (i, _) = tag_no_case("cast")(i)?;
+    let (i, _) = multispace0(i)?;
+    let (i, _) = tag("(")(i)?;
+    let (i, _) = multispace0(i)?;
+    // TODO(malte): should be arbitrary expr
+    let (i, a_base) = arithmetic_base(i)?;
+    let (i, _) = multispace1(i)?;
+    let (i, _) = tag_no_case("as")(i)?;
+    let (i, _) = multispace1(i)?;
+    let (i, _sign) = opt(terminated(tag_no_case("signed"), multispace1))(i)?;
+    let (i, sql_type) = type_identifier(i)?;
+    let (i, _) = multispace0(i)?;
+    let (i, _) = tag(")")(i)?;
 
-    Ok((remaining_input, (a_base, Some(sql_type))))
+    Ok((i, (a_base, Some(sql_type))))
 }

Edit 1: I probably found it in this nom-tutorial , alternative-final-parser section.
Edit 2: fixed confusing use of 'parsers' by specifying 'macro based parsers'

@tomharmon
Copy link
Contributor Author

tomharmon commented Jan 30, 2020

I have no idea if there are performance considerations between the two.

I am not a very experienced Rustacean but my assumption would actually be that the alternative method to tuples is slower because you have to unwrap the result of each parser. I'm pretty sure (though not 100%) that unwrapping this many times would cause a noticeable overhead. We should test this.

In terms of style I prefer the tuple one but obviously I wrote the code so 😅 .

Any opinions or chatter before we switch the code base over from parsers?

Can you clarify what you mean? The tuple combinator (and all the other nom parser/combinator functions) is still a parser, is it not?

Also just a note to everyone else not reviewing on why I used the tuple combinator relatively often: nom is missing a few combinator functions that would've been helpful but that the tuple combinator can accomplish (eg, take the result of the last parser in a list of parsers. You would have to do preceded(parse1, preceded(parse2, preceded(etc...))) Another example is take the results of the ith - jth parsers in a list of parsers.

Also, the tuple combinator was helpful when there some kind of stuff we have to do with the parse result in order to return what we really wanted (eg match on an optional then construct an object or do some more conversions etc).

Thank you for reviewing by the way @spazm

@spazm
Copy link
Contributor

spazm commented Jan 31, 2020

Any opinions or chatter before we switch the code base over from parsers?

Can you clarify what you mean? The tuple combinator (and all the other nom parser/combinator functions) is still a parser, is it not?

What I mean is I have trouble typing all of the correct words in my sentences. Which is odd since I type way too many words. 😄 I'll edit to the corrected version:

Any opinions or chatter before we switch the code base over from macro based parsers?

I was really confused when I couldn't find this pattern in the nom docs and examples, thinking I must have seen it in the migration_to_nom5 doc. Then last night I was trying to verify my (turns out, incorrect) understanding of nom::sequence::delimited() and noticed it when reading the source. It is used in most of nom::sequence::*, which are the only fixed-size combinators in nom.

To muddy the waters, the examples use tuple(...)

pub fn delimited<I, O1, O2, O3, E: ParseError<I>, F, G, H>(first: F, sep: G, second: H) -> impl Fn(I) -> IResult<I, O2, E>
where
  F: Fn(I) -> IResult<I, O1, E>,
  G: Fn(I) -> IResult<I, O2, E>,
  H: Fn(I) -> IResult<I, O3, E>,
{
  move |input: I| {
    let (input, _) = first(input)?;
    let (input, o2) = sep(input)?;
    second(input).map(|(i, _)| (i, o2))
  }
}

@tomharmon
Copy link
Contributor Author

tomharmon commented Jan 31, 2020

Interesting, I didn't know delimited did that internally. Good point. I'll try to test it regardless and see if there's any performance between the two. We can also see if @ms705 has any preference

@ms705
Copy link
Owner

ms705 commented Jan 31, 2020

I prefer the individual parsers, for reasons similar to @spazm's: capture variables are closer to the parser that sets them. That said, if it complicates things compared to the tuple approach, I'm fine with the tuple approach too.

Performance has not been a huge concern for nom-sql historically, but I'm open to using whichever method is faster.

Copy link
Contributor

@spazm spazm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! 👍

Finished reading through all the changes and testing some edge cases. Looks like it took 4 days -- feels like it too :)

You used examples of nom function that I hadn't used, mostly preceding and terminated (and corrected my understanding of delimited), it was neat to look them up and learn how they worked. thx!

I only have a few comments left, Several of which are about pre-existing TODOs or bugs that could be migrated into their own issues.

src/order.rs Show resolved Hide resolved
src/compound_select.rs Outdated Show resolved Hide resolved
src/compound_select.rs Show resolved Hide resolved
src/create_table_options.rs Show resolved Hide resolved
src/create_table_options.rs Show resolved Hide resolved
src/select.rs Outdated Show resolved Hide resolved
src/select.rs Show resolved Hide resolved
src/select.rs Show resolved Hide resolved
src/keywords.rs Show resolved Hide resolved
src/keywords.rs Show resolved Hide resolved
spazm added 3 commits February 2, 2020 20:57
* match is a single expression.  It doesn't need to be wrapped in
  {} braces for the closure
* Type signature for higher order combinators gets verbose and messy quickly.
* The generic nature of create_option_equals_pair requires that ws_sep_equals
  also be generic over I.
* I would have been happy to make create_option_equals_pair work over &u[8]
  directly, rather than a parametized I.
@spazm
Copy link
Contributor

spazm commented Feb 3, 2020

@thomasharmon808:
I pushed a branch with changes for 3 of these items to thomasharmon808/nom-sql#5. Feel free to merge/cherry-pick any/none/all of them.

@tomharmon
Copy link
Contributor Author

tomharmon commented Feb 3, 2020

Thank you @spazm ! I will try to resolve these when I have time over the next week. I will also merge in your branch asap.

@tomharmon
Copy link
Contributor Author

@spazm got excited that someone reviewed my first ever open source PR, so I finished up looking over your suggestions.

I really liked your PR on my branch, so I merged that all in. Looks like it actually ended up resolving 4 of the comments you had, so thanks! I resolved all those threads.

I added issues for most of your other comments, since they were about existing behaviors. After replying with a link to the issues, I also resolved those threads.

So I guess that leaves us with the tuple stuff, @spazm is it a hard requirement to change them or should we just write not write them like that in the future?

Sorry don't mean to evade changing them cause I actually think it is a good idea that may be useful for debugging (though I still suspect it might impact performance slightly, though I haven't tested it yet). I just don't have a lot of time right now (No one does 😢, which makes me appreciate the review that much more @spazm )

@spazm
Copy link
Contributor

spazm commented Feb 3, 2020

So I guess that leaves us with the tuple stuff, @spazm is it a hard requirement to change them or should we just write not write them like that in the future?

I enjoyed having the discussion. I don't think changing is a requirement at all.

Thank you so much for making and linking the new issues. excellent follow-through.

I added one tiny response about adding a comment for "SET".

@spazm
Copy link
Contributor

spazm commented Feb 3, 2020

I give this PR a hearty LGTM (looks good to me). :LGTM: 👍

back to you @ms705

@tomharmon
Copy link
Contributor Author

@spazm I enjoyed the discussion as well. I pushed a commit adding the comment and will wait to hear from malte

@ms705
Copy link
Owner

ms705 commented Feb 5, 2020

LGTM! Thanks for all your efforts. I'll merge this now 🎉

@ms705 ms705 merged commit 6d1e461 into ms705:master Feb 5, 2020
@ms705
Copy link
Owner

ms705 commented Feb 5, 2020

Published to crates.io as v0.0.11.

@tomharmon tomharmon deleted the upgrade-nom branch February 5, 2020 20:51
@spazm spazm mentioned this pull request Feb 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants