# Performance Analysis Approaches {#sec:sec_PerfApproaches}

When you're working on a high-level optimization, e.g., integrating a better algorithm into an application, it is usually easy to tell whether the performance improves, since the gain is clearly visible in the benchmarking results. Big speedups, like 2x or 3x, are relatively easy to deal with from a performance analysis perspective. When you eliminate an extensive computation from a program, you expect to see a visible difference in the running time.

But there are also situations when you see a small change in the execution time, say 5%, and have no clue where it's coming from. Timing or throughput measurements alone do not provide any explanation for why performance goes up or down. In this case, we need more insights into how a program executes. That is the situation when we need to do performance analysis to understand the underlying nature of the slowdown or speedup that we observe.

Performance analysis is akin to detective work. To solve a performance mystery, you need to gather all the data you can and try to form a hypothesis. Once a hypothesis is made, you design an experiment that will either prove or disprove it. This can go back and forth a few times before you find a clue. And just like a good detective, you try to collect as many pieces of evidence as possible to confirm or refute your hypothesis. Once you have enough clues, you can construct a compelling explanation for the behavior you're observing.

When you just start working on a performance issue, you probably only have measurements, e.g., before and after a code change. Based on those measurements, you conclude that the program became slower by `X` percent. If you know that the slowdown occurred right after a certain commit, that may already give you enough information to fix the problem. But if you don't have good reference points, then the set of possible reasons for the slowdown is endless, and you need to gather more data. One of the most popular approaches for collecting such data is to profile an application and look at the hotspots, as the short sketch below illustrates. This chapter introduces this and several other approaches for gathering data that have proven to be useful in performance engineering.

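As a quick illustration, here is one common way to find hotspots with Linux Perf. This is only a minimal sketch, assuming a Linux machine with `perf` installed; `./a.out` is a placeholder for your application binary.

```bash
# Sample the program's execution; the profile is saved to perf.data.
$ perf record ./a.out
# Browse the recorded profile: functions are sorted by the share of
# samples attributed to them, i.e., the hotspots come first.
$ perf report
```
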
The next question is: "What performance data is available and how do we collect it?" Both the hardware and software layers of the stack have facilities to track performance events and record them while a program is running. In this context, by hardware we mean the CPU, which executes the program, and by software we mean the OS, libraries, the application itself, and other tools used for the analysis. Typically, the software stack provides high-level metrics like time, the number of context switches, and page faults, while the CPU monitors cache misses, branch mispredictions, and other CPU-related events. Depending on the problem you are trying to solve, some metrics are more useful than others. So it doesn't mean that hardware metrics will always give us a more precise overview of the program execution. Some metrics, such as the number of context switches, cannot be provided by the CPU at all. Performance analysis tools, like Linux Perf, can consume data from both the OS and the CPU.

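To illustrate, a single `perf stat` run can count OS-provided software events alongside CPU-provided hardware events. A minimal sketch, using standard Linux Perf event aliases and `./a.out` as a placeholder binary:

```bash
# Software events (context-switches, page-faults) are counted by the
# OS kernel; hardware events (cache-misses, branch-misses) are read
# from the CPU's performance monitoring counters.
$ perf stat -e context-switches,page-faults,cache-misses,branch-misses ./a.out
```

In the resulting report, the first two counts come from the kernel, while the last two are read from hardware counters, which shows how a single tool combines both data sources.
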
As you have probably guessed, there are hundreds of data sources that a performance engineer may use. Since this book is about low-level CPU performance, we will focus on collecting hardware-level information. We will introduce some of the most popular performance analysis techniques: code instrumentation, tracing, characterization, sampling, and the Roofline model. We also discuss static performance analysis techniques and compiler optimization reports that do not involve running the actual application.