-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Java-CPG language module #1659
base: develop
Are you sure you want to change the base?
Conversation
Content: - A new CPG language frontend for JPlag - An interface to transform submissions into CPGs - An interface to transform CPGs into token lists - A Graph Transformation Engine (to be extended) . interfaces representing node and graph patterns, matches of these patterns, transformations . an isomorphism detector . a transformation algorithm - Some graph transformations (to be extended)
- implemented multi-root graph patterns - implemented searching for "all matches at once" - GraphOperations should leave EOG intact - implemented new kinds of edges and properties for graph patterns - tokenization works well - implemented DFG sort pass -- this requires specialized treatment for all kinds of language features. Surely the considered feature set is incomplete.
It was designed for local use.
Add many comments and put a file in the 'passes' package. When JavaDoc cannot find a Java file in there, it quits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of pointers concerning your code style. I only added the comments for one occurrence each. It would be great if you could fix those things to make the style more consistent with the rest of the code.
languages/java-cpg/src/main/java/de/jplag/java_cpg/JavaCpgLanguage.java
Outdated
Show resolved
Hide resolved
* A ATransformationPass is an abstract transformation pass. All transformation passes function the same way, but need | ||
* to be separate classes to work with the CPG pipeline. | ||
*/ | ||
abstract class ATransformationPass(ctx: TranslationContext) : TranslationResultPass(ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not use Kotlin, as using more languages complicates the process of compiling JPlag and make the code harder to understand for new contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of the CPG library requires that Kotlin is used, any subclass of Pass written in Java leads to an exception, unfortunately. I could ask the authors to change that, if that is an absolute requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the exception? Do you remember this?
languages/java-cpg/src/main/java/de/jplag/java_cpg/token/IVisitorExitor.java
Outdated
Show resolved
Hide resolved
languages/java-cpg/src/main/java/de/jplag/java_cpg/transformation/Role.java
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
Please also add the language to the report viewer here: https://github.com/jplag/JPlag/blob/develop/report-viewer/src/model/Language.ts Also set a language for highlighting here: https://github.com/jplag/JPlag/blob/develop/report-viewer/src/utils/CodeHighlighter.ts |
This comment was marked as outdated.
This comment was marked as outdated.
Quality Gate passed for 'JPlag Plagiarism Detector'Issues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not forget to add a language module readme. Especially to explain the differences to the other Java modules. It should probably both include information for users and for developers due to the complexity of the PR.
On another note: We should just include Kotlin in the spotless config: https://github.com/diffplug/spotless/tree/main/plugin-maven#kotlin. We can add the config directly, but I would wait to apply the formatting after the reviews are done and the feedback addressed.
Finally, some questions for our next meeting:
This is a LOT of code, I initially hoped more functionality would be provided by the API itself. In your opinion, what is the main cause for the size of the PR? And why does the CPG API require so much boilerplate code?
clearTransformations(); | ||
setReorderingEnabled(false); | ||
} | ||
// TokenizationPass sets tokenList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field tokenList
is set to null
a few lines before. I thought I'd include that comment to clarify that the tokenList
is not supposed to still be null
when it is returned in line 49, but it is set by the TokenizationPass
via the method CpgAdapter.setTokenList(List<Token>)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, this was not clear to me, thanks.
} catch (InterruptedException e) { | ||
throw e; | ||
} catch (ExecutionException | ConfigurationException e) { | ||
throw new ParsingException(List.copyOf(files).getFirst(), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the copy when only one element is passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because any one element can not be gotten from the Set<File>
files
. I agree that this is not very elegant.
Maybe replace it with files.stream().findAny().orElse(null)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use files.iterator().next();
instead.
/** | ||
* This is a wrapper for a graph edge (with a 1:1 relation). | ||
* @param <T> The type of the source node | ||
* @param <R> The type of the related node | ||
*/ | ||
public class CpgEdge<T extends Node, R extends Node> extends AEdge<T, R> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear to me: Why are we wrapping the AEdge which is just a base class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you refer to IEdge vs. AEdge:
IEdge existed before AEdge, and I did not realize that it could actually just fully replace it. I'll remove the IEdge interface, then.
If you refer to the phrase "wrapper for a graph edge", then let me explain it in a bit more detail.
To summarize: The Edge classes make up for the fact that in the CPG library, edges are represented inconsistently.
Either:
- the edge target node is a field of the source (e.g., an IfStatement node has the fields condition, thenStatement, elseStatement); the AST edge itself is not modeled
--- or, if it is a 1:n relation ---- - that relation is represented as a list of nodes (e.g. a
FunctionType
has the fieldparameters
) - that relation is represented as a list of edges (e.g. a
Block
node has the fieldstatementEdges
)
The edge class of the CPG library is called PropertyEdge
.
For the transformations, I need to get and the target(s) of edges and eventually set them to new values. My Edge
classes save the appropriate getter and setter functions for all three cases.
In the class de.jplag.java_cpg.transformation.matching.edges.Edges
, you can see all the kinds of edges I use, and how I save the appropriate getter and setter functions from the node classes in them.
import de.jplag.java_cpg.transformation.matching.pattern.NodePattern; | ||
|
||
/** | ||
* This is a wrapper for a graph edge (1:n relation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate here?
/** | ||
* This serves as an interface to wrap any kind of {@link PropertyEdge}. | ||
* @param <T> the source node type | ||
* @param <R> the related node type | ||
*/ | ||
public interface IEdge<T extends Node, R extends Node> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I am kind of lost in the edge hierarchy.
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, very large class, with many nested classes. In the long term, maintaining this language module will be very tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move them out, that's not a problem.
import de.jplag.java_cpg.transformation.matching.pattern.Match; | ||
import de.jplag.java_cpg.transformation.matching.pattern.NodePattern; | ||
|
||
public abstract sealed class Relation<T extends Node, R extends Node, V> permits OneToNRelation, RelatedNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JDoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a lock at the things i marked, I've not checked tests
*/ | ||
@MetaInfServices(de.jplag.Language.class) | ||
public class JavaCpgLanguage implements Language { | ||
private static final int DEFAULT_MINIMUM_TOKEN_MATCH = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could take the dafault mtm of the java module instead of setting it to 9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is possible, but it would require a module dependency on the Java frontend.
Do you think that would be justified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that would not be justified.
node: Statement, | ||
essentialNodes: MutableList<Statement>, | ||
): Boolean { | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this is necessary? Or can it be formatted so it doesn't span 10 lines?
languages/java-cpg/src/main/java/de/jplag/java_cpg/passes/JTokenizationPass.java
Show resolved
Hide resolved
...ages/java-cpg/src/main/java/de/jplag/java_cpg/transformation/operations/InsertOperation.java
Show resolved
Hide resolved
...ages/java-cpg/src/main/java/de/jplag/java_cpg/transformation/operations/RemoveOperation.java
Show resolved
Hide resolved
* @param astRoot the root of the sub-AST | ||
* @return the EOG {@link SubgraphWalker.Border} of the AST | ||
*/ | ||
public static SubgraphWalker.Border getEogBorders(Node astRoot) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does EOG stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the Evaluation Order Graph, a concept introduced by the authors of the CPG library, where the term is also often used. The EOG is a type of Control Flow Graph, but instead of blocks, it operates on individual Nodes.
For example, in a binary operation expression, the left operand is evaluated first (except if it is an assignment), then the right operand is evaluated, and then the operation itself. An EOG of a binary operation expression would contain a directed edge from the left operand node to the right operand node and from the right operand to the operation node to reflect that. In control statements, the EOG may split.
In the CPG library, the EOG is the basis for complex name analysis, data flow analysis, and other types of analyses.
I use the EOG as the basis for CPG Linearization.
This Draft PR contains the new "Java Code Property Graph" module.
It uses a CPG library and graph transformations to normalise substructures of the code prior to tokenization, which should lead to a higher resilience against refactoring attacks.
It also serves to fix SonarCloud errors before a proper PR is opened.