Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x2cpg] Unification of Lambda Naming #3831

Merged
merged 4 commits into from
Nov 24, 2023

Conversation

DavidBakerEffendi
Copy link
Collaborator

  • Added io.joern.x2cpg.AstCreatorBase.nextClosureName to generate names for closures/lambdas/anonymous functions.
  • Chose the naming scheme following Kotlin/Python using <lambda>0, <lambda>1, etc. due it low likelihood of collisions with real source code method naming schemes, and it doesn't include special regex characters.
  • Replaced naming conventions with this unified one in:
    • c2cpg
    • kotlin2cpg
    • javasrc2cpg
    • jssrc2cpg
    • php2cpg
    • pysrc2cpg

The result is that all lambdas in the CPG now share the same naming scheme.

Resolves #3792

* Added `io.joern.x2cpg.AstCreatorBase.nextClosureName` to generate names for closures/lambdas/anonymous functions.
* Chose the naming scheme following Kotlin/Python using `<lambda>0`, `<lambda>1`, etc. due it low likelihood of collisions with real source code method naming schemes, and it doesn't include special regex characters.
* Replaced naming conventions with this unified one in:
  - c2cpg
  - kotlin2cpg
  - javasrc2cpg
  - jssrc2cpg
  - php2cpg
  - pysrc2cpg

The result is that all lambdas in the CPG now share the same naming scheme.

Resolves #3792
@DavidBakerEffendi DavidBakerEffendi added frontends Relates to the shared x2cpg class consistency Concerns abstracting and adding tooling to unify duplicated or inconsistent code labels Nov 16, 2023
Copy link
Contributor

@max-leuthaeuser max-leuthaeuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait.. I guess with this approach the names are not stable across multiple runs/files in parallel.
Did you check this? Maybe I am wrong here.

@johannescoetzee
Copy link
Contributor

johannescoetzee commented Nov 16, 2023

Wait.. I guess with this approach the names are not stable across multiple runs/files in parallel. Did you check this? Maybe I am wrong here.

That's a good point. It looks like each AstCreator instance uses a separate keypool which should deal with cross-file consistency (this is similar to what javasrc2cpg was doing before without issues that I've noticed), but it's worth confirming that it is stable.

@max-leuthaeuser max-leuthaeuser deleted the dave/x2cpg/closure-naming branch November 16, 2023 17:06
@johannescoetzee johannescoetzee restored the dave/x2cpg/closure-naming branch November 16, 2023 17:07
@johannescoetzee
Copy link
Contributor

Apologies for the close. I fumbled and accidentally hit "Close with comment" instead of just comment

@DavidBakerEffendi
Copy link
Collaborator Author

DavidBakerEffendi commented Nov 16, 2023

Yeah, it's bound to AstCreator so it'll restart at 0 for each file and as long as there is no concurrency within that task-level then it should be fine. A similar pattern was followed in Kotlin and Java with the key-pools.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@max-leuthaeuser
Copy link
Contributor

max-leuthaeuser commented Nov 16, 2023

@maltek FYI: when this goes live (i.e., jssrc2cpg-internal is also updated) the sptests expectations for js2cpg and jssrc2cpg will differ a bit more from each other.
Also, I am not sure if we have some special handling for the lambda names affected here within CS JS passes.

@maltek
Copy link
Contributor

maltek commented Nov 16, 2023

Not just sptests, that's fine. (Nobody cares about the method names in those...)

This will have some customer impact for us since stable method full names are a requirement for tracking findings across different scans. @bbrehm you're the fingerprint guy - will this only affect findings where the source or sink is a lambda, or also any findings where it's anywhere on the path? (I think most of those frontends aren't considered GA yet, so we might still be able to get away with such a change. But it requires a discussion.)


I don't particularly like that this is per-file now everywhere. At least jssrc had a per-method counter for this, which means that a code change of a lambda only affected stability of the fullnames within that method - the fullnames in the rest of that file staying as they were before. Ideally, we'd move all frontends in that direction instead of the opposite one.

@DavidBakerEffendi
Copy link
Collaborator Author

Alternatively, we could add a modifier like CLOSURE to unify this "special" type of method, similar to #3826? The main idea of this issue and PR is to really be able to easily separate this from other methods without considering every unique frontend's naming scheme.

Either solution works for me, the modifier direction may be less intrusive.

@DavidBakerEffendi
Copy link
Collaborator Author

This is not a pressing issue, it's more of a larger project towards cleaning up. Lambda and module-defining methods seem to follow different patterns in each frontend. For the CPG to be more of a uniform abstraction, I'd say that these structures should either share the same naming conventions or have a defining property.

@maltek
Copy link
Contributor

maltek commented Nov 17, 2023

for the use-case of finding all lambdas, a modifier feels cleaner to me anyway than a regex search 👍 (Though I would bikeshed the naming a bit - if it's about unnamed functions, I would go for something like ANONYMOUS or LAMBDA. After all, named functions can also be closures / close over variables from an outer scope.)

@maltek
Copy link
Contributor

maltek commented Nov 17, 2023

It still would be nice if we could unify these names, just to have things cleaner... with fullnames that's just breaking a deep assumption on the qwiet side :(

@DavidBakerEffendi
Copy link
Collaborator Author

@maltek Let me keep this PR open for a while and prioritize the modifier then. That way, we can migrate to checking the modifier to retrieve these first.

@DavidBakerEffendi
Copy link
Collaborator Author

ShiftLeftSecurity/codepropertygraph#1746

@DavidBakerEffendi
Copy link
Collaborator Author

Related PR #3842

@DavidBakerEffendi
Copy link
Collaborator Author

I'd like to merge this at some point tomorrow to conclude the unification of lambdas in the CPG. Then this week can conclude the uniform representation of lambdas in the CPG.

@DavidBakerEffendi DavidBakerEffendi merged commit 7f8ed1f into master Nov 24, 2023
@DavidBakerEffendi DavidBakerEffendi deleted the dave/x2cpg/closure-naming branch November 24, 2023 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistency Concerns abstracting and adding tooling to unify duplicated or inconsistent code frontends Relates to the shared x2cpg class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent Closure Naming & Identification
4 participants