Bump nom to v5 #53

tomharmon · 2020-01-02T08:48:32Z

This PR bumps nom to the latest version, v 5.0.1, as of the time of this PR. There are no functional changes in this PR.

NOTE: the reason for such a big diff is because nom v5 macros were rewritten to use the streaming version of all parsers. Since nom-sql assumes complete input, the entire library had to be rewritten to use the new nom functions instead of the macros

- updates nom dependency in `Cargo.toml` to v5.0.1 - find and replace `CompleteByteSlice` with `&[u8]` - update `arithmetic.rs` file to no longer have compile errors (except for internal dependencies from `common.rs`)

- Fixes all compile errors except one, which this commit also adds a TODO since that function is a bit more than I want to do today

- As part of upgrading to nom v5, rewrote the macros of `set.rs` to use the new complete parsing functions from nom

- Because nom v5 macros were rewritten to use the stream version of the parser, use the complete version of the nom parsers

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `parser.rs` to use the complete nom functions

- Since nom v5 macros were rewritten to use the streaming version, rewrote `select.rs` to use the complete version of the nom v5 functions since we want to deal with complete input only

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `order.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `join.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `insert.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `delete.rs` and `drop.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `create.rs` and `create_table_options.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `condition.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `case.rs` and `compound_select.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `arithmetic.rs` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `common.rs` to use the complete nom function

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `statement_terminator` to use the complete nom functions

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `common` to use the complete nom functions

spazm · 2020-01-25T03:29:48Z

Rad!
When I saw that this was using nom4, i got myself all pumped up to do this transition having just gone through it in another project. Even better, you've already done it!

What would you like in a review?

tomharmon · 2020-01-25T12:54:53Z

@spazm I didn't have anything in mind since the changes seemed pretty straightforward to me (just converting a calls to a bunch of macros to calls of a bunch of functions). I was able to do the conversion without changing the design or approach, or editing any tests. All tests pass so I'm pretty sure everything works exactly as before. When I opened the PR I thought the white space handling was going to be a little more lax but I'm pretty sure it's the same now.

But yeah no major areas where I had to do anything drastically different. If anything I added a few private helper functions here and there to break things up, but that's about it.

ms705 · 2020-01-25T14:41:01Z

Thank you @thomasharmon808 and @spazm ! I'll take a look over this in the next few days.

@spazm If you'd like to review, that'd be great :-) I'd pay attention to any possible differences in behavior between the old and new code w.r.t. cases that the tests do not necessarily catch.

spazm · 2020-01-30T05:11:10Z

Working through my review now. So far it is a very clear (almost-mechanical -- well done!) translation from nom macro syntax to the corresponding function syntax.

@ms705 et al: I think you'll find that nom5 functions are more pleasant to work with than the old macro system. nom5 still supports the macro forms but macros are now only in streaming context without the ability to force a complete context.

spazm · 2020-01-30T05:11:26Z

Discussion point:

When converting a multi-parser do_parse!(...) to v5 syntax, we can write it as a tuple(...) and capture the results into a tuple, as @thomasharmon808 has done here.

In my other project, we translated these do_parse!(...) multiple parsers as a series of individual parser commands, where the remaining from each is used as the input to the next (specifically by shadowing the input var). I can't remember if we came up with it or found it in the nom docs? I have no idea if there are performance considerations between the two.

Stylistically, I find it easier to match the saved results to variables when they are closer, and it feels closer to the capture syntax in do_parse!() do_parse!(parse1 >> save: parse2 >> parse3). It is a tad more verbose, since each line is suffixed by the parser invocation (i)?

Any opinions or chatter before we switch the code base over from macro based parsers?

Here's an example of switching the arithmetic_cast_helper(). Each parser result shadows the previous i value.

 fn arithmetic_cast_helper(i: &[u8]) -> IResult<&[u8], (ArithmeticBase, Option<SqlType>)> {
-    let (remaining_input, (_, _, _, _, a_base, _, _, _, _sign, sql_type, _, _)) = tuple((
-        tag_no_case("cast"),
-        multispace0,
-        tag("("),
-        multispace0,
-        // TODO(malte): should be arbitrary expr
-        arithmetic_base,
-        multispace1,
-        tag_no_case("as"),
-        multispace1,
-        opt(terminated(tag_no_case("signed"), multispace1)),
-        type_identifier,
-        multispace0,
-        tag(")"),
-    ))(i)?;
+    let (i, _) = tag_no_case("cast")(i)?;
+    let (i, _) = multispace0(i)?;
+    let (i, _) = tag("(")(i)?;
+    let (i, _) = multispace0(i)?;
+    // TODO(malte): should be arbitrary expr
+    let (i, a_base) = arithmetic_base(i)?;
+    let (i, _) = multispace1(i)?;
+    let (i, _) = tag_no_case("as")(i)?;
+    let (i, _) = multispace1(i)?;
+    let (i, _sign) = opt(terminated(tag_no_case("signed"), multispace1))(i)?;
+    let (i, sql_type) = type_identifier(i)?;
+    let (i, _) = multispace0(i)?;
+    let (i, _) = tag(")")(i)?;
 
-    Ok((remaining_input, (a_base, Some(sql_type))))
+    Ok((i, (a_base, Some(sql_type))))
 }

Edit 1: I probably found it in this nom-tutorial , alternative-final-parser section.
Edit 2: fixed confusing use of 'parsers' by specifying 'macro based parsers'

tomharmon · 2020-01-30T13:49:13Z

I have no idea if there are performance considerations between the two.

I am not a very experienced Rustacean but my assumption would actually be that the alternative method to tuples is slower because you have to unwrap the result of each parser. I'm pretty sure (though not 100%) that unwrapping this many times would cause a noticeable overhead. We should test this.

In terms of style I prefer the tuple one but obviously I wrote the code so 😅 .

Any opinions or chatter before we switch the code base over from parsers?

Can you clarify what you mean? The tuple combinator (and all the other nom parser/combinator functions) is still a parser, is it not?

Also just a note to everyone else not reviewing on why I used the tuple combinator relatively often: nom is missing a few combinator functions that would've been helpful but that the tuple combinator can accomplish (eg, take the result of the last parser in a list of parsers. You would have to do preceded(parse1, preceded(parse2, preceded(etc...))) Another example is take the results of the ith - jth parsers in a list of parsers.

Also, the tuple combinator was helpful when there some kind of stuff we have to do with the parse result in order to return what we really wanted (eg match on an optional then construct an object or do some more conversions etc).

Thank you for reviewing by the way @spazm

spazm · 2020-01-31T07:23:59Z

Any opinions or chatter before we switch the code base over from parsers?

Can you clarify what you mean? The tuple combinator (and all the other nom parser/combinator functions) is still a parser, is it not?

What I mean is I have trouble typing all of the correct words in my sentences. Which is odd since I type way too many words. 😄 I'll edit to the corrected version:

Any opinions or chatter before we switch the code base over from macro based parsers?

I was really confused when I couldn't find this pattern in the nom docs and examples, thinking I must have seen it in the migration_to_nom5 doc. Then last night I was trying to verify my (turns out, incorrect) understanding of nom::sequence::delimited() and noticed it when reading the source. It is used in most of nom::sequence::*, which are the only fixed-size combinators in nom.

To muddy the waters, the examples use tuple(...)

pub fn delimited<I, O1, O2, O3, E: ParseError<I>, F, G, H>(first: F, sep: G, second: H) -> impl Fn(I) -> IResult<I, O2, E>
where
  F: Fn(I) -> IResult<I, O1, E>,
  G: Fn(I) -> IResult<I, O2, E>,
  H: Fn(I) -> IResult<I, O3, E>,
{
  move |input: I| {
    let (input, _) = first(input)?;
    let (input, o2) = sep(input)?;
    second(input).map(|(i, _)| (i, o2))
  }
}

tomharmon · 2020-01-31T14:23:15Z

Interesting, I didn't know delimited did that internally. Good point. I'll try to test it regardless and see if there's any performance between the two. We can also see if @ms705 has any preference

ms705 · 2020-01-31T16:16:20Z

I prefer the individual parsers, for reasons similar to @spazm's: capture variables are closer to the parser that sets them. That said, if it complicates things compared to the tuple approach, I'm fine with the tuple approach too.

Performance has not been a huge concern for nom-sql historically, but I'm open to using whichever method is faster.

spazm

Excellent work! 👍

Finished reading through all the changes and testing some edge cases. Looks like it took 4 days -- feels like it too :)

You used examples of nom function that I hadn't used, mostly preceding and terminated (and corrected my understanding of delimited), it was neat to look them up and learn how they worked. thx!

I only have a few comments left, Several of which are about pre-existing TODOs or bugs that could be migrated into their own issues.

src/order.rs

src/compound_select.rs

src/create_table_options.rs

src/select.rs

src/keywords.rs

* match is a single expression. It doesn't need to be wrapped in {} braces for the closure

* Type signature for higher order combinators gets verbose and messy quickly. * The generic nature of create_option_equals_pair requires that ws_sep_equals also be generic over I. * I would have been happy to make create_option_equals_pair work over &u[8] directly, rather than a parametized I.

spazm · 2020-02-03T09:16:26Z

@thomasharmon808:
I pushed a branch with changes for 3 of these items to thomasharmon808/nom-sql#5. Feel free to merge/cherry-pick any/none/all of them.

tomharmon · 2020-02-03T13:54:28Z

Thank you @spazm ! I will try to resolve these when I have time over the next week. I will also merge in your branch asap.

Changes for my comments on PR #53

tomharmon · 2020-02-03T15:41:12Z

@spazm got excited that someone reviewed my first ever open source PR, so I finished up looking over your suggestions.

I really liked your PR on my branch, so I merged that all in. Looks like it actually ended up resolving 4 of the comments you had, so thanks! I resolved all those threads.

I added issues for most of your other comments, since they were about existing behaviors. After replying with a link to the issues, I also resolved those threads.

So I guess that leaves us with the tuple stuff, @spazm is it a hard requirement to change them or should we just write not write them like that in the future?

Sorry don't mean to evade changing them cause I actually think it is a good idea that may be useful for debugging (though I still suspect it might impact performance slightly, though I haven't tested it yet). I just don't have a lot of time right now (No one does 😢, which makes me appreciate the review that much more @spazm )

spazm · 2020-02-03T18:46:12Z

So I guess that leaves us with the tuple stuff, @spazm is it a hard requirement to change them or should we just write not write them like that in the future?

I enjoyed having the discussion. I don't think changing is a requirement at all.

Thank you so much for making and linking the new issues. excellent follow-through.

I added one tiny response about adding a comment for "SET".

spazm · 2020-02-03T18:47:42Z

I give this PR a hearty LGTM (looks good to me). :LGTM: 👍

back to you @ms705

tomharmon · 2020-02-03T18:53:03Z

@spazm I enjoyed the discussion as well. I pushed a commit adding the comment and will wait to hear from malte

ms705 · 2020-02-05T18:33:15Z

LGTM! Thanks for all your efforts. I'll merge this now 🎉

ms705 · 2020-02-05T18:38:28Z

Published to crates.io as v0.0.11.

tomharmon and others added 30 commits December 31, 2019 11:19

Bump nom to v5.0.1

bfa5462

- updates nom dependency in `Cargo.toml` to v5.0.1 - find and replace `CompleteByteSlice` with `&[u8]` - update `arithmetic.rs` file to no longer have compile errors (except for internal dependencies from `common.rs`)

Partial updates to common.rs

c226dbe

Add TODO for last compile error in common.rs

72bb1dd

- Fixes all compile errors except one, which this commit also adds a TODO since that function is a bit more than I want to do today

Update compond_select.rs

d2b8b77

Fixed compile time errors in insert.rs

29bee21

Fixed compile time errors in keyword, order and select

9869b49

Update condition.rs

99519e3

Fixed compile time errors in set and update

5808b44

Update create.rs and create_table_options.rs

170533d

Fixed compile time errors in parser and common

a820898

Fixed compile time errors in more files

5cdeeaa

Replace opt_multispace with multispace0

bfacbd6

Replace set.rs macros w functions, fix compile

106fa01

- As part of upgrading to nom v5, rewrote the macros of `set.rs` to use the new complete parsing functions from nom

Rewrite update.rs to use nom v5 functions

b293a3f

- Because nom v5 macros were rewritten to use the stream version of the parser, use the complete version of the nom parsers

Update parser.rs to use nom v5 functions

d5dec81

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `parser.rs` to use the complete nom functions

Update select.rs to use nom v5 functions

8a667dd

- Since nom v5 macros were rewritten to use the streaming version, rewrote `select.rs` to use the complete version of the nom v5 functions since we want to deal with complete input only

Update order.rs to use nom v5 functions

1d29161

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `order.rs` to use the complete nom functions

Update join.rs to use nom v5 functions

32a7d1b

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `join.rs` to use the complete nom functions

Update insert.rs to use nom v5 functions

045f514

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `insert.rs` to use the complete nom functions

Update delete and drop to use nom v5 functions

86ac840

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `delete.rs` and `drop.rs` to use the complete nom functions

Use nom v5 functions for create

a4cbae3

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `create.rs` and `create_table_options.rs` to use the complete nom functions

Update condition.rs to use nom v5 functions

dafae2b

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `condition.rs` to use the complete nom functions

Update case, compound_select to nom v5

4c831bf

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `case.rs` and `compound_select.rs` to use the complete nom functions

Use shorthand for constructor args

bde5ce5

Formatting

823a971

Update arithmetic.rs to use nom v5 functions

03dbbcf

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `arithmetic.rs` to use the complete nom functions

Partial work on updating common to nom v5

9b68403

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `common.rs` to use the complete nom function

Update statement_terminator to nom v5

a6557f4

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `statement_terminator` to use the complete nom functions

Partial update to common to nom v5

8079641

- Since in nom v5 macros were rewritten to use the streaming version of the parser, and we want to do with complete inputs, rewrote the `common` to use the complete nom functions

More work on upgrading common to nom v5

5f32f9c

tomharmon requested a review from fintelia January 21, 2020 17:35

fintelia requested review from ms705 and removed request for fintelia January 25, 2020 19:12

spazm approved these changes Feb 3, 2020

View reviewed changes

spazm added 3 commits February 2, 2020 20:57

Remove unnecessary level of bracing

bef5b7a

* match is a single expression. It doesn't need to be wrapped in {} braces for the closure

Require trailing spaces after "natural" in join clause

1b38260

tomharmon added 2 commits February 3, 2020 09:26

Merge pull request #5 from spazm/upgrade-nom

ec3b281

Changes for my comments on PR #53

add back TODO comment for table options

40b1072

Add comment explaining max tuple size for alt

9fafd10

ms705 merged commit 6d1e461 into ms705:master Feb 5, 2020

tomharmon deleted the upgrade-nom branch February 5, 2020 20:51

spazm mentioned this pull request Feb 10, 2020

Upgrade to nom v5 #52

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bump nom to v5 #53

Bump nom to v5 #53

tomharmon commented Jan 2, 2020 •

edited

Loading

spazm commented Jan 25, 2020

tomharmon commented Jan 25, 2020

ms705 commented Jan 25, 2020

spazm commented Jan 30, 2020

spazm commented Jan 30, 2020 •

edited

Loading

tomharmon commented Jan 30, 2020 •

edited

Loading

spazm commented Jan 31, 2020

tomharmon commented Jan 31, 2020 •

edited

Loading

ms705 commented Jan 31, 2020 •

edited

Loading

spazm left a comment

spazm commented Feb 3, 2020

tomharmon commented Feb 3, 2020 •

edited

Loading

tomharmon commented Feb 3, 2020

spazm commented Feb 3, 2020

spazm commented Feb 3, 2020

tomharmon commented Feb 3, 2020

ms705 commented Feb 5, 2020

ms705 commented Feb 5, 2020

Bump nom to v5 #53

Bump nom to v5 #53

Conversation

tomharmon commented Jan 2, 2020 • edited Loading

spazm commented Jan 25, 2020

tomharmon commented Jan 25, 2020

ms705 commented Jan 25, 2020

spazm commented Jan 30, 2020

spazm commented Jan 30, 2020 • edited Loading

tomharmon commented Jan 30, 2020 • edited Loading

spazm commented Jan 31, 2020

tomharmon commented Jan 31, 2020 • edited Loading

ms705 commented Jan 31, 2020 • edited Loading

spazm left a comment

Choose a reason for hiding this comment

spazm commented Feb 3, 2020

tomharmon commented Feb 3, 2020 • edited Loading

tomharmon commented Feb 3, 2020

spazm commented Feb 3, 2020

spazm commented Feb 3, 2020

tomharmon commented Feb 3, 2020

ms705 commented Feb 5, 2020

ms705 commented Feb 5, 2020

tomharmon commented Jan 2, 2020 •

edited

Loading

spazm commented Jan 30, 2020 •

edited

Loading

tomharmon commented Jan 30, 2020 •

edited

Loading

tomharmon commented Jan 31, 2020 •

edited

Loading

ms705 commented Jan 31, 2020 •

edited

Loading

tomharmon commented Feb 3, 2020 •

edited

Loading