Document or improve scripts to make perf-java-flames work with docker #50

Open
alicegoldfuss opened this issue Jan 24, 2017 · 30 comments

@alicegoldfuss

alicegoldfuss commented Jan 24, 2017

I'm trying to create a Java process FlameGraph with perf-java-flames. It seems to run successfully, but I can't find the resulting svg file.

$ ./perf-java-flames 161991 -F 99 -a -g -- sleep 30
Recording events for 15 seconds (adapt by setting PERF_RECORD_SECONDS)
Warning:
PID/TID switch overriding SYSTEM
$

CentOS 7
3.10.0-327.36.3.el7.x86_64
cmake version 2.8.12.2

Up-to-date versions of perf-map-agent and the FlameGraph repo.

@alicegoldfuss
Author

Looks like it's only creating the .data file and not .stacks or .collapsed

perf-map-agent]# ls -la /tmp/ | grep 161991
-rw-------  1 root  root    1919352 Jan 24 19:40 perf-161991.data

@jrudolph
Member

jrudolph commented Jan 24, 2017

Can you try without -- sleep 30? perf-java-record-stack already does -- sleep $PERF_RECORD_SECONDS.
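
A minimal sketch of the suggested invocation, assuming the recording length is controlled only through `PERF_RECORD_SECONDS` (as the script's own message indicates) and that the script is started from the perf-map-agent checkout. The `flames_cmd` helper is hypothetical; it only prints the command so it can be inspected before running it.

```shell
# Hypothetical helper: print a perf-java-flames invocation for a PID and a
# recording length, without the trailing "-- sleep N".
flames_cmd() {
  pid=$1
  seconds=${2:-15}   # 15 is the default reported by the script's message
  printf 'PERF_RECORD_SECONDS=%s ./bin/perf-java-flames %s\n' "$seconds" "$pid"
}

flames_cmd 161991 30
```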

@jrudolph
Member

I see that this is not particularly well documented...

You could also try enabling set -x in one or several of the scripts to see where it exits. (Shell scripting is not my expertise so any help is appreciated.)
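
One way to get the effect of `set -x` without editing each script is sketched below. It assumes the scripts run under bash: bash enables every option listed in `SHELLOPTS` if that variable is in its environment at startup, and child bash processes inherit it. The two throwaway scripts here are stand-ins, not the real perf-map-agent scripts.

```shell
# Two throwaway scripts stand in for perf-java-flames and a script it calls.
demo_dir=$(mktemp -d)
printf '%s\n' 'echo inner-ran' > "$demo_dir/inner.sh"
printf '%s\n' 'bash "$(dirname "$0")/inner.sh"' > "$demo_dir/outer.sh"

# SHELLOPTS is read-only inside bash, so it is passed in through env(1).
# Every bash started underneath inherits it and turns on xtrace.
trace=$(env SHELLOPTS=xtrace bash "$demo_dir/outer.sh" 2>&1 >/dev/null)
printf '%s\n' "$trace"

rm -rf "$demo_dir"
```

Applied to this issue, that would look like `env SHELLOPTS=xtrace ./bin/perf-java-flames 161991 2> trace.log`. Commands started through `sudo` may not inherit the variable, since sudo usually resets the environment.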

@alicegoldfuss
Author

I'm getting the same results without sleep 30 and even a vanilla run like

$ ./bin/perf-java-flames 161991

Where is the resulting svg file supposed to turn up?

@jrudolph
Member

In the same directory as the script is run but it should show a line with the name.

@alicegoldfuss
Author

Yeah it's definitely not showing up there. I'm going to try to create a Java FlameGraph with the manual steps.

@nitsanw
Member

nitsanw commented Jan 25, 2017

To eliminate some suspects:

  1. Does the perf-java-top script work for you?
  2. Which version of Java are you using?
  3. Is the Java process using the -XX:+PreserveFramePointer option? This requires an OpenJDK/Oracle JDK release of 8u60 or later.
  4. Can you generate normal perf flame-graphs on your setup?
    Even without perf-map-agent generating the map file, you should at least be able to see the native portion of the JVM process, so getting nothing at all suggests some issue in the perf interaction, or perhaps some incompatibility with the scripts, though I've used them on CentOS 7 before and they "Just Worked"...
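
For question 3, a rough check is sketched below. It assumes a Linux `/proc` and that the flag was passed on the JVM command line; flags supplied some other way (for example through an environment variable) would not show up, so `jinfo -flag PreserveFramePointer <pid>` is the more reliable check where the JDK tools can attach. The helper name is hypothetical.

```shell
# Hypothetical helper: succeed if FLAG appears as a whole argument in a
# NUL-separated command line file such as /proc/<pid>/cmdline.
jvm_cmdline_has_flag() {
  cmdline_file=$1
  flag=$2
  # Turn NUL separators into newlines, then match the flag as an exact line.
  tr '\0' '\n' < "$cmdline_file" | grep -qxF -- "$flag"
}
```

Usage would be along the lines of `jvm_cmdline_has_flag /proc/161991/cmdline -XX:+PreserveFramePointer && echo "flag is set"`.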

@alicegoldfuss
Author

  1. It fails with bash: sudo: java: command not found, even when run as root with java in the PATH
  2. Java is 1.8.0_102-b14
  3. Yes it's using this option.
  4. Yes I can create perf FlameGraphs using perf and the FlameGraph repo, just not with any of your tools.

@nitsanw
Member

nitsanw commented Jan 25, 2017

Thanks! I managed to reproduce this issue locally and have a fix. I'll send a PR in a second, but it's very minor so if you can't wait you can go ahead and fix locally by applying the following:

diff --git a/bin/create-java-perf-map.sh b/bin/create-java-perf-map.sh
index 52ee75d..b297067 100755
--- a/bin/create-java-perf-map.sh
+++ b/bin/create-java-perf-map.sh
@@ -24,5 +24,5 @@ fi
 [ -d "$JAVA_HOME" ] || (echo "JAVA_HOME directory at '$JAVA_HOME' does not exist." && false)
 
 sudo rm $PERF_MAP_FILE -f
-(cd $PERF_MAP_DIR/out && sudo -u \#$TARGET_UID java -cp $ATTACH_JAR_PATH:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce $PID "$OPTIONS")
+(cd $PERF_MAP_DIR/out && sudo -u \#$TARGET_UID $JAVA_HOME/bin/java -cp $ATTACH_JAR_PATH:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce $PID "$OPTIONS")
 sudo chown root:root $PERF_MAP_FILE

This just carries JAVA_HOME through to the sudo command: the path is expanded by the calling shell, so sudo no longer has to find java on its own PATH.
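
The idea behind the patch can be sketched as a small lookup function. This is an illustration rather than the code in the PR: `resolve_java` is a hypothetical name, and the PATH fallback is an assumption added for cases where `JAVA_HOME` is unset.

```shell
# Hypothetical helper: prefer the java binary under JAVA_HOME and fall back
# to whatever java is on PATH. Prints an absolute path when JAVA_HOME is used.
resolve_java() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    printf '%s\n' "$JAVA_HOME/bin/java"
  else
    command -v java
  fi
}
```

Because the result is computed before sudo runs, it does not depend on the PATH that sudo gives the command, which on many systems is reset to a fixed list.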

@nitsanw
Member

nitsanw commented Jan 25, 2017

See PR #51

@nitsanw
Member

nitsanw commented Jan 25, 2017

Please let me know if the fix helps with your ultimate goal, which is to get flame graphs.
Also note that if you are aiming to collect machine-wide stats for many Java processes, @brendangregg has the jmaps script, which creates a map file for every Java process and can be used as part of producing a machine-wide profile:
https://github.com/brendangregg/FlameGraph/blob/3da963a74a686e2caea489ba637f6afdb6d6658a/jmaps

@alicegoldfuss
Author

I suspect this fix will still fail for me, due to a bug in Java that requires me to dump the symbols as the user of the running Java process. But I will let you know!

Also thanks for the jmaps link, but there's only one Java process on this machine.

@alicegoldfuss
Author

alicegoldfuss commented Jan 25, 2017

I added the fix but it still fails, even when running as root. New error though:

# ./bin/perf-java-top 161991
sudo: unable to execute /home/alice/jdk1.8.0_102/bin/java: Permission denied
# ls -la /home/alice/jdk1.8.0_102/bin/java
-rwxr-xr-x 1 alice alice 7734 Jan 24 00:33 /home/alice/jdk1.8.0_102/bin/java

I get the same error when running as alice. And yes I can run java directly by calling that path.

@nitsanw
Member

nitsanw commented Jan 26, 2017

OK... Not seen that one before. For what it's worth, my java executable has the exact same permissions. Running the script as root works for me if I set up the JAVA_HOME environment variable.

We can work out an alternative, I think. The permissions game in the scripts is around 2 files, the map file and the perf.data file:

  • The perf data collection requires sudo/root. This can be mitigated by fixing up profiling security on the particular box, e.g. by executing the following:
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

This will allow users to collect perf data for running processes.

  • The map collection requires attaching to a JVM, which in turn requires switching to the JVM process user. If you are already that user you can skip this.
    Under the assumption that the user collecting the map file is the user running the JVM, you can skip the sudo requirement on this script as well, and need not worry about fixing up the map file owner/permissions.
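
Before changing anything, the current values of the two settings mentioned above can be inspected. The sketch below only reads them and reports whether they match the values from the `echo ... | sudo tee` commands; the function name and the optional directory argument (there so the check can be pointed at a test directory) are hypothetical.

```shell
# Hypothetical helper: print the two kernel settings and succeed only if they
# already match the relaxed values above (perf_event_paranoid=-1, kptr_restrict=0).
perf_settings_relaxed() {
  sysctl_dir=${1:-/proc/sys/kernel}
  paranoid=$(cat "$sysctl_dir/perf_event_paranoid") || return 2
  kptr=$(cat "$sysctl_dir/kptr_restrict") || return 2
  echo "perf_event_paranoid=$paranoid kptr_restrict=$kptr"
  [ "$paranoid" = "-1" ] && [ "$kptr" = "0" ]
}
```

Calling `perf_settings_relaxed` with no argument reads the live values under `/proc/sys/kernel`.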

Using the above, I've set up boxes where users are allowed to perf profile their own Java processes with slightly modified scripts, essentially removing the sudo prefix everywhere and the file ownership manipulation. I've not attempted to merge these efforts back.
Reflecting on this, perhaps the issue you are seeing is because the user whose process you are trying to profile does not have permission to run your Java installation? Maybe pointing JAVA_HOME at an installation available to all users will solve the issue?

@brendangregg

Some security enforcement preventing alice from executing things? like seccomp?

FWIW, my jmaps tool also works around the perf issue of needing the /tmp/perf*map files as owned by root.

@alicegoldfuss
Author

Apologies for going dark. I've been digging into this issue with the manual commands.

The issue comes down to containers and namespacing. The Java process I'm trying to profile is running inside a container. The process is owned by a user inside the container, but only has a UID exposed to the host. Even spoofing a user with that UID on the host doesn't work when trying to dump symbols. And I can't do the profiling inside the container, because perf isn't installed and the version of Ubuntu running inside the container is too new for the underlying host kernel to have a supported perf package.

My planned workaround (which I haven't verified, but I believe will work) is:

  1. Drop the symbols from inside the container.
  2. Get the resulting perf-pid.map onto the underlying host via a mounted volume.
  3. Change the perf-pid.map filename to match the Java process PID as seen by the host (and chown to root).
  4. Run the perf and FlameGraph scripts on the host, using the renamed perf-pid.map file.

I think this will give me what I want.
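
Steps 2 and 3 of the plan above can be sketched as follows. This is an unverified illustration under the assumptions stated in the plan: the map file has already been written to a volume visible on the host, and perf looks for `/tmp/perf-<pid>.map` using the host-side PID. The function name and its arguments are hypothetical.

```shell
# Hypothetical helper for steps 2-3: copy a map file produced inside the
# container to the file name perf expects for the host-side PID.
install_container_map() {
  src_map=$1            # map file on the shared volume, named after the container PID
  host_pid=$2           # PID of the Java process as seen from the host
  dest_dir=${3:-/tmp}   # directory where perf looks for perf-<pid>.map
  dest="$dest_dir/perf-$host_pid.map"
  cp -- "$src_map" "$dest" || return 1
  # chown to root as described in step 3; this only works when run as root.
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$dest"
  fi
  printf '%s\n' "$dest"
}
```

With Docker, the host-side PID of a container's main process can usually be read with `docker inspect --format '{{.State.Pid}}' <container>`; whether that is the Java process itself depends on how the container was started.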

@nitsanw
Member

nitsanw commented Jan 27, 2017

@alicegoldfuss Thanks for sharing your use case in more detail. I bow to your Linux Fu powers. It sounds like you are on your way to cracking it; when you do, please share the details. It would perhaps help to add this to the wiki.
I have not looked into it much, but this project by @chbatey aims to demo a solution to what seems like a similar challenge:
https://github.com/chbatey/docker-jvm-flamegraphs

@nitsanw
Member

nitsanw commented Jan 27, 2017

And the relevant blog post to go with the repo:
http://batey.info/docker-jvm-flamegraphs.html

@alicegoldfuss
Author

Ah, looks like this person has come to the same conclusion as me! That's comforting :)

@nitsanw
Member

nitsanw commented Jan 27, 2017

@alicegoldfuss if nothing else I at least hope I've introduced you to the right person :-)

@brendangregg

Ah, right, containers and perf. I've been meaning to post a blog post too -- we've all probably been working on the same problem. :)

Christopher's post is good, but he needs to let the JVM warm up a bit more -- too many "Interpreter" frames -- they haven't hit CompileThreshold yet.

@jrudolph
Member

Ah, this is about containers. I also tried to get it working, but only half-heartedly. I would also be interested in getting this to work. Thanks for having the discussion here and for the extra links, @alicegoldfuss, @nitsanw, and @brendangregg.

@alicegoldfuss
Author

My workaround worked!

I'm going to dance to something and then document what I did.

[Screenshot, 2017-01-27 11:13 AM]

@alicegoldfuss
Author

Turned it into a blog post. Thanks everyone for your help: http://blog.alicegoldfuss.com/making-flamegraphs-with-containerized-java/

jrudolph added a commit that referenced this issue Feb 26, 2017
Fix #50 by carrying through observed JAVA_HOME to the sudo
jrudolph reopened this Feb 26, 2017
@jrudolph
Member

Thanks a lot, @alicegoldfuss for documenting your findings!

jrudolph changed the title from "perf-java-flames fails silently?" to "Document or improve scripts to make perf-java-flames work with docker" on Feb 26, 2017
@bobrik

bobrik commented May 16, 2018

I added transparent support for containers in jmaps: brendangregg/FlameGraph#171.

@bobrik

bobrik commented May 16, 2018

To add to the "Why?" section of the blog post by @alicegoldfuss, the reason seems to be this:

    // "/tmp" is used as a global well-known location for the files
    // .java_pid<pid>. and .attach_pid<pid>. It is important that this
    // location is the same for all processes, otherwise the tools
    // will not be able to find all Hotspot processes.
    // Any changes to this needs to be synchronized with HotSpot.
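
To make the consequence of that comment concrete: a containerized JVM and a tool on the host each resolve `/tmp` in their own mount namespace, and they may also disagree on the PID. The sketch below works out where the JVM's attach file would sit as seen from the host. It assumes the JVM names the file after its PID inside the container, that the host kernel exposes an `NSpid` line in `/proc/<pid>/status` (newer kernels only, so probably not the 3.10 kernel from this issue), and that `/proc/<pid>/root` is readable. Both helper names are hypothetical.

```shell
# Hypothetical helper: print the innermost-namespace PID from a
# /proc/<pid>/status file (the last field of its NSpid line).
innermost_ns_pid() {
  awk '/^NSpid:/ { print $NF }' "$1"
}

# Hypothetical helper: path, as seen from the host, of the .java_pid file
# that a containerized JVM would create in its own /tmp.
attach_socket_path() {
  host_pid=$1
  ns_pid=$2
  printf '/proc/%s/root/tmp/.java_pid%s\n' "$host_pid" "$ns_pid"
}
```

For a JVM with host PID 161991 that is PID 1 inside its container, `attach_socket_path 161991 1` prints `/proc/161991/root/tmp/.java_pid1`, which is not the `/tmp/.java_pid161991` a host-side tool would look for.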

@goldshtn

goldshtn commented May 16, 2018 via email

@bobrik

bobrik commented May 18, 2018

@goldshtn I replied to your comment in the PR. Let's have PR related comments there.

@jrudolph
Member

I managed to run the attach script from the host namespace. I haven't properly integrated that, as it needs some hacking of internals from tools.jar because, as you say above, the attach mechanism relies on well-known paths shared between the attaching and the target JVM. Right now it will only work if the target process has PID 1 in the container.

See jvm-profiling-tools/perf-map-agent/compare/jr/attach-to-container-from-host
