This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Error generating CPG #51

Open
anonymitycoder2 opened this issue Dec 3, 2023 · 17 comments

Comments

@anonymitycoder2

Why does the progress stop at 20% when I generate a CPG?
[Image: WeChat screenshot 20231203211021]

@prabhu
Contributor

prabhu commented Dec 4, 2023

@anonymitycoder2 the joern project has some known bugs and performance issues with C/C++. Could you try our forks, atom and chen, instead?

https://github.com/AppThreat/atom (comparable to joern cli v2+)

# Download from releases https://github.com/AppThreat/atom/releases - java 21 is better
atom.sh -o app.atom -l c .
# Create an atom with data flows
atom.sh -o app.atom -l c . --with-data-deps

To query the atom you can use chen (fork of joern with enhancements)

https://github.com/AppThreat/chen

importAtom("/path/to/atom file")

Let me know how it goes.
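
For scripted runs, the two atom invocations above could be wrapped in a small Python helper. This is only a sketch: the names `build_atom_command` and `run_atom` are hypothetical, the helper uses only the flags quoted in this comment (`-o`, `-l`, `--with-data-deps`), and it assumes `atom.sh` is on the PATH.

```python
import subprocess
from typing import List


def build_atom_command(src_dir: str, output: str = "app.atom",
                       language: str = "c",
                       with_data_deps: bool = False) -> List[str]:
    """Assemble the atom.sh argument list using only flags quoted in this thread."""
    cmd = ["atom.sh", "-o", output, "-l", language, src_dir]
    if with_data_deps:
        # Ask atom to include data flows, as in the second command above.
        cmd.append("--with-data-deps")
    return cmd


def run_atom(src_dir: str, **kwargs) -> int:
    """Run atom.sh and return its exit code (assumes atom.sh is on the PATH)."""
    return subprocess.run(build_atom_command(src_dir, **kwargs)).returncode


if __name__ == "__main__":
    # Only print the command here; call run_atom(".") to actually execute it.
    print(" ".join(build_atom_command(".", with_data_deps=True)))
```

Keeping command construction separate from execution makes it easy to check the arguments before launching a long-running analysis.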

@anonymitycoder2
Author

Thank you for your reply.
I seem to have generated some data using cpggen:
cpggen -i dataset -o cpggen-out
[image]
How can I use this data to generate the various graph structures and slices for each C file under the dataset directory, just as joern did?
The file I generated using the relevant command was empty.
If you can help me solve this problem, I would be immensely grateful.

@prabhu
Contributor

prabhu commented Dec 10, 2023

atom has dedicated commands for slice generation.

https://github.com/AppThreat/atom/tree/main#create-usages-slice-for-a-java-project

The repotests in atom have invocations for other languages.

@anonymitycoder2
Author

Thank you for your reply! I generated some slice data using the atom you provided. It seems that all C files in the dataset directory are sliced into a single JSON file.
[image]
Is this the expected result of slicing? Can atom generate the corresponding graph structure and slice data for each C file in the dataset directory, just as joern did? At the moment, all C files appear to be sliced into one JSON file.
Looking forward to your reply.

@prabhu
Contributor

prabhu commented Dec 11, 2023

@anonymitycoder2, yes, slices are a single file for all source code. I didn't know joern was generating one file per source for slicing. If you mean export of CPG and DDG, we're happy to add that command.
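
Until a per-file export exists, a single slices file could be regrouped by source file in a post-processing step. The sketch below is an illustration under stated assumptions: the key names `objectSlices` and `fileName` are guesses about the slice JSON layout and should be checked against a real slices file, and `split_slices_by_file` is a hypothetical helper, not part of atom.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def split_slices_by_file(slices: Dict[str, Any],
                         list_key: str = "objectSlices",
                         file_key: str = "fileName") -> Dict[str, List[Dict[str, Any]]]:
    """Group slice entries by source file; both key names are assumptions."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for entry in slices.get(list_key, []):
        # Entries without a file name are collected under a placeholder key.
        grouped[entry.get(file_key, "<unknown>")].append(entry)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data, not real atom output.
    sample = {"objectSlices": [
        {"fileName": "a.c", "fullName": "main"},
        {"fileName": "b.c", "fullName": "helper"},
        {"fileName": "a.c", "fullName": "parse"},
    ]}
    for name, entries in split_slices_by_file(sample).items():
        print(name, json.dumps(entries))
```

Each group could then be written to its own JSON file named after the source file.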

@anonymitycoder2
Author

It would be great if you could do this. Thank you for your patient reply. It really helped me a lot.

@prabhu
Contributor

prabhu commented Dec 13, 2023

@anonymitycoder2 could you kindly review the below PR, which adds export to graphml?

AppThreat/atom#101

atom -o app.atom -l java --export-atom --export-dir <export dir> <path to application>
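
To sanity-check an exported file, the node and edge counts can be read with Python's standard library. This is a minimal sketch that assumes the export uses the standard GraphML namespace; `summarize_graphml` is a hypothetical helper and the sample document is hand-written, not atom output.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Standard GraphML namespace; assumed to be what the exported files declare.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(xml_text: str) -> Dict[str, int]:
    """Count the <node> and <edge> elements in a GraphML document."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return {"nodes": len(nodes), "edges": len(edges)}


if __name__ == "__main__":
    # Hand-written sample for illustration only.
    sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
      <graph id="G" edgedefault="directed">
        <node id="n0"/>
        <node id="n1"/>
        <edge source="n0" target="n1"/>
      </graph>
    </graphml>"""
    print(summarize_graphml(sample))
```

For files on disk, the text can be obtained with `pathlib.Path(path).read_text()` before calling the helper.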

@anonymitycoder2
Author

Thank you for your reply. I ran the program according to the PR you provided, but the graphml file was not exported successfully.
Some errors occurred. Are they caused by the versions of Python and the JDK?
[image]
Some other errors were also reported:
[image]
[image]
Looking forward to your reply.

@prabhu
Contributor

prabhu commented Dec 14, 2023

@anonymitycoder2, interesting! Please share the full exception trace since I want to know which line is looking for Python, which must remain an optional dependency for non-ml users.

@prabhu
Contributor

prabhu commented Dec 14, 2023

@anonymitycoder2 Could you retest with the latest from that branch?

@anonymitycoder2
Author

Thanks for your reply. After testing with the latest version of the branch, I generated some graphml files.
[image]
However, the number and names of the generated graphml files do not correspond to the Java files in the dataset.

Can atom generate a graphml file named after each Java file? It would be even better if atom could let the user specify the type of graph to output, such as AST, CPG, or PDG.

@prabhu
Contributor

prabhu commented Dec 15, 2023

@anonymitycoder2, thank you for trying the branch. I have pushed an update to atom to:

  • include the filename
  • add support for dot format
  • exclude DDG by default (it can be included with --with-data-deps)

Regarding support for all individual representations, we do not have any enterprise users with such a request, so it is not a priority yet. We also aim to keep atom lightweight for easy CI/CD use cases. Hope this helps.

@anonymitycoder2
Author

Thanks for your help; it really helped me a lot. If there is a chance in the future, I will introduce atom in my paper to help promote it. Thanks again from the bottom of my heart.

@prabhu
Contributor

prabhu commented Dec 15, 2023

@anonymitycoder2, you used the magic word paper. Let me find a way to do this without affecting the size.

@prabhu
Contributor

prabhu commented Dec 21, 2023

@anonymitycoder2 atom 1.8.0 was released with three individual representations exported automatically in dot format: AST, CDG, and CFG. Four files are created per method in total, with the fourth comparable to a CPG since it includes all representations, including DDG and PDG. I hope this helps.

https://github.com/AppThreat/atom/releases/tag/v1.8.0

@anonymitycoder2
Author

Thank you so much! This is really helpful for my work. I used the latest version of atom to generate a comprehensive dot file and dot files of ast, cfg, dfg. However, this comprehensive dot file differs from the cpg14 graph exported by joern: atom seems to include additional information about edges and nodes. The cpg14 code graph structure exported by joern is very popular in the field of code representation learning, and it is what I want to generate. Thank you again for your help. atom is excellent, and I will recommend it to my friends who are engaged in related research.
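
One way to get closer to a cpg14-style graph could be to post-process the comprehensive dot file and keep only the edge kinds of interest. The sketch below is an illustration, not atom functionality: it assumes one edge per line and that each edge label starts with the representation name (as in labels like "AST: " or "CFG: "), which should be verified against real atom output. The helper `filter_dot_edges` and the sample graph are hypothetical.

```python
import re
from typing import Iterable

# An edge line starts with: <id> -> <id>, with optional quotes around the ids.
EDGE_LINE = re.compile(r'^\s*"?\w+"?\s*->\s*"?\w+"?')
LABEL = re.compile(r'\blabel\s*=\s*"([^"]*)"')


def filter_dot_edges(dot_text: str, keep_prefixes: Iterable[str]) -> str:
    """Keep all non-edge lines, plus edges whose label starts with a given prefix."""
    prefixes = tuple(keep_prefixes)
    kept = []
    for line in dot_text.splitlines():
        if not EDGE_LINE.match(line):
            kept.append(line)  # node declarations, braces, graph attributes
            continue
        match = LABEL.search(line)
        if match and match.group(1).startswith(prefixes):
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Hand-written sample for illustration only.
    sample = '\n'.join([
        'digraph "main" {',
        '"1" [label = "CALL, p->x"]',
        '"2" [label = "IDENTIFIER, p"]',
        '"1" -> "2" [label = "AST: "]',
        '"1" -> "2" [label = "CFG: "]',
        '"2" -> "1" [label = "DDG: p"]',
        '}',
    ])
    print(filter_dot_edges(sample, ["AST", "CFG"]))
```

Matching edges only at the start of a line keeps node labels that contain `->` (common in C code) from being mistaken for edges.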

@prabhu
Contributor

prabhu commented Dec 22, 2023

Thanks, @anonymitycoder2, for your kind words! As you figured, we call atom version 2 since we need that additional information to perform type inference and package URL inference.

Below are a couple of screenshots that show these inferences in action. Not only do we know the type, but we even know the precise dependency they must have come from for a few languages.

[Two screenshots showing type and package URL inference]

I am looking forward to the new generation of research unlocked by atom and chen.
