Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strengthen the follow-set rule for macros #131025

Open
traviscross opened this issue Sep 29, 2024 · 2 comments
Open

Strengthen the follow-set rule for macros #131025

traviscross opened this issue Sep 29, 2024 · 2 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-lang-nominated Nominated for discussion during a lang team meeting. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@traviscross
Copy link
Contributor

Over in:

@compiler-errors describes this general problem:

The breakage specifically represents an inherent limitation to the "macro follow-set" formulation which is supposed to make us more resilient against breakages due to extensions to the grammar like this.

Given two macro matcher arms:

  • ($ty:ty) => ...
  • (($tt:tt)*) => ...

And given tokens like:

  • & pin mut [...more tokens may follow...]

On nightly today, &pin gets parsed as a type. However, we run out of matchers but still have tokens left (the mut token is next), so we fall through to the next arm. Since it's written like ($tt:tt)*, everything is allowed, and we match the second arm successfully...

I think that's weird, because if this second arm were written like $ty:ty mut, that would be illegal, since mut is not in the follow-set of the :ty matcher. Thus, we can use :tt matchers to observe whether the compiler actually parses things not in our grammar that should otherwise be protected against, which seems pretty gross.

And @Noratrieb proposes a general solution:

I believe a solution to this would be the following new logic:

  • after the end of a macro matcher arm has been reached
  • and there are still input tokens remaining
  • and if the last part of the matcher is a metavar
  • ensure that the first remaining token is in the follow set of this metavar
  • if it is, move on to the next arm
  • if it is not, emit an error

What this semantically does is strengthen the "commit to fully matching metavars or error" behavior such that it extends past the end. I don't know how many macros rely on this, but it seems like emitting an FCW (instead of error) on such macro invocations would find all these cases and ensure that the follow-set logic is actually robust past the end. But imo this shouldn't block this PR (which should probably just ship as-is) and can be done separately.

This issue is to track the proposal for this FCW.

cc @Noratrieb @compiler-errors @eholk @rust-lang/lang

@traviscross traviscross added I-lang-nominated Nominated for discussion during a lang team meeting. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Sep 29, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 29, 2024
@traviscross traviscross added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 29, 2024
@compiler-errors
Copy link
Member

I think that this FCW could be implemented, but also I think it's a bit of a shame we can't detect this at the macro definition site like we can for a follow-set error.

That would require having to like... construct like NDFAs for each matcher arm and compare them to detect when $tt:tt conflicts in the way I mentioned above, so I don't think it's possible or at least remotely implementable.

@traviscross
Copy link
Contributor Author

In the meeting today, we didn't quite have time to discuss this, but @nikomatsakis noted:

I don't think this proposal is sufficient but I am interested in pursuing a real fix to this for a future edition.

Example:

macro_rules! test {
    (if $x:ty { }) => {};
    (if $x:expr { }) => {};
}

This basically says to pick one arm if something is a type, another if it's an expression. Extending the type grammar to cover new cases could change which arm you go down to.

I think the most general fix is to say: when you would start parsing a fragment, first skip ahead to find the extent of it (i.e., until you see an entry from the follow-set). Then parse it as the fragment. If the parsing fails or there are unconsumed tokens, report a hard error.

I suspect it would break a lot in practice and we would need an opt-in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-lang-nominated Nominated for discussion during a lang team meeting. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants