-
Notifications
You must be signed in to change notification settings - Fork 3.9k
/
nfsslower_example.txt
158 lines (136 loc) · 7.68 KB
/
nfsslower_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
Demonstrations of nfsslower, the Linux eBPF/bcc version.
nfsslower show NFS reads, writes, opens and getattrs, slower than a
threshold. For example:
./nfsslower.py
Tracing NFS operations that are slower than 10 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
11:25:16 dd 21295 W 1048576 15360 14.84 1.test
11:25:16 dd 21295 W 1048576 16384 12.73 1.test
11:25:16 dd 21295 W 1048576 17408 24.27 1.test
11:25:16 dd 21295 W 1048576 18432 22.93 1.test
11:25:16 dd 21295 W 1048576 19456 14.65 1.test
11:25:16 dd 21295 W 1048576 20480 12.58 1.test
11:25:16 dd 21297 W 1048576 6144 10.50 1.test.w
11:25:16 dd 21297 W 1048576 7168 16.65 1.test.w
11:25:16 dd 21297 W 1048576 8192 13.01 1.test.w
11:25:16 dd 21297 W 1048576 9216 14.06 1.test.w
This shows NFS writes from dd each 1MB in size to 2 different files. The
writes all had latency higher than 10ms.
This "latency" is measured from when the operation was issued from the VFS
interface to the file system, to when it completed. This spans everything:
RPC latency, network latency, file system CPU cycles, file system locks, run
queue latency, etc. This is a better measure of the latency suffered by
applications reading from a NFS share and can better expose problems
experienced by NFS clients.
Note that this only traces the common NFS operations (read,write,open and
getattr). I chose to include getattr as a significant percentage of NFS
traffic end up being getattr calls and are a good indicator of problems
with an NFS server.
The threshold can be provided as an argument. E.g. I/O slower than 1 ms:
./nfsslower.py 1
Tracing NFS operations that are slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
11:40:16 cp 21583 R 131072 0 4.35 1.test
11:40:16 cp 21583 R 131072 256 1.87 1.test
11:40:16 cp 21583 R 131072 384 2.99 1.test
11:40:16 cp 21583 R 131072 512 4.19 1.test
11:40:16 cp 21583 R 131072 640 4.25 1.test
11:40:16 cp 21583 R 131072 768 4.65 1.test
11:40:16 cp 21583 R 131072 1280 1.08 1.test
11:40:16 cp 21583 R 131072 1408 3.29 1.test
11:40:16 cp 21583 R 131072 1792 3.12 1.test
11:40:16 cp 21583 R 131072 3712 3.55 1.test
11:40:16 cp 21583 R 131072 3840 1.12 1.test
11:40:16 cp 21583 R 131072 4096 3.23 1.test
11:40:16 cp 21583 R 131072 4224 2.73 1.test
11:40:16 cp 21583 R 131072 4352 2.73 1.test
11:40:16 cp 21583 R 131072 4480 6.09 1.test
11:40:16 cp 21583 R 131072 5120 4.40 1.test
[...]
This shows all NFS_READS that were more than 1ms. Depending on your
latency to your fileserver, you might need to tweak this value to
remove
A threshold of 0 will trace all operations. Warning: the output will be
verbose, as it will include all file system cache hits.
./nfsslower.py 0
Tracing NFS operations
11:56:50 dd 21852 W 1048576 0 0.42 1.test
11:56:50 dd 21852 W 1048576 1024 0.46 1.test
11:56:50 dd 21852 W 1048576 2048 0.36 1.test
11:56:50 cp 21854 G 0 0 0.35 1.test
11:56:50 cp 21854 O 0 0 0.33 1.test
11:56:50 cp 21854 G 0 0 0.00 1.test
11:56:50 cp 21854 R 131072 0 0.07 1.test
11:56:50 cp 21854 R 131072 128 0.02 1.test
11:56:50 cp 21854 R 131072 256 0.02 1.test
11:56:50 cp 21854 R 131072 384 0.02 1.test
11:56:50 cp 21854 R 131072 512 0.02 1.test
11:56:50 cp 21854 R 131072 640 0.02 1.test
11:56:50 cp 21854 R 131072 768 0.02 1.test
11:56:50 cp 21854 R 131072 896 0.02 1.test
11:56:50 cp 21854 R 131072 1024 0.02 1.test
11:56:50 cp 21854 R 131072 1152 0.02 1.test
11:56:50 cp 21854 R 131072 1280 0.02 1.test
11:56:50 cp 21854 R 131072 1408 0.02 1.test
11:56:50 cp 21854 R 131072 1536 0.02 1.test
11:56:50 cp 21854 R 131072 1664 0.02 1.test
11:56:50 cp 21854 R 131072 1792 0.02 1.test
11:56:50 cp 21854 R 131072 1920 0.02 1.test
11:56:50 cp 21854 R 131072 2048 0.02 1.test
11:56:50 cp 21854 R 131072 2176 0.04 1.test
11:56:50 cp 21854 R 131072 2304 0.02 1.test
11:56:50 cp 21854 R 131072 2432 0.03 1.test
11:56:50 cp 21854 R 131072 2560 0.03 1.test
11:56:50 cp 21854 R 131072 2688 0.02 1.test
11:56:50 cp 21854 R 131072 2816 0.03 1.test
11:56:50 cp 21854 R 131072 2944 0.02 1.test
11:56:50 cp 21854 R 0 3072 0.00 1.test
11:56:50 ls 21855 G 0 0 0.00 1.test
11:56:50 ls 21856 G 0 0 0.36 music
11:56:50 ls 21856 G 0 0 0.00 music
11:56:50 ls 21856 G 0 0 0.00 test
11:56:50 ls 21856 G 0 0 0.00 ff
11:56:50 ls 21856 G 0 0 0.00 34.log
11:56:50 ls 21856 G 0 0 0.00 vmlinuz-linux
11:56:50 ls 21856 G 0 0 0.00 2.test
11:56:50 ls 21856 G 0 0 0.00 rt.log
11:56:50 ls 21856 G 0 0 0.00 1.lod
11:56:50 ls 21856 G 0 0 0.00 COPYRIGHT.txt
11:56:50 ls 21856 G 0 0 0.00 gg
11:56:50 ls 21856 G 0 0 0.00 qw.log
11:56:50 ls 21856 G 0 0 0.00 README.md
11:56:50 ls 21856 G 0 0 0.00 1.log
The output now includes open operations ("O"), and reads ("R") wand getattrs ("G").
A cp operation
A -j option will print just the fields (parsable output, csv):
./nfsslower.py -j 0
ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
87054476520,dd,22754,W,1048576,0,425,1.test
87054482916,dd,22754,W,1048576,1048576,320,1.test
87054488179,dd,22754,W,1048576,2097152,389,1.test
87054511340,cp,22756,G,0,0,371,1.test
87054511685,cp,22756,O,0,0,306,1.test
87054511700,cp,22756,G,0,0,2,1.test
87054512325,cp,22756,R,131072,0,56,1.test
87054512432,cp,22756,R,131072,131072,22,1.test
87054512520,cp,22756,R,131072,262144,32,1.test
87054512600,cp,22756,R,131072,393216,21,1.test
87054512678,cp,22756,R,131072,524288,21,1.test
87054512803,cp,22756,R,131072,655360,56,1.test
This may be useful for visualizing with another tool, for example, for
producing a scatter plot of ENDTIME vs LATENCY, to look for time-based
patterns.
USAGE message:
usage: nfsslower.py [-h] [-j] [-p PID] [min_ms]
Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4}
positional arguments:
min_ms Minimum IO duration to trace in ms (default=10ms)
optional arguments:
-h, --help show this help message and exit
-j, --csv just print fields: comma-separated values
-p PID, --pid PID Trace this pid only
./nfsslower # trace operations slower than 10ms
./nfsslower 1 # trace operations slower than 1ms
./nfsslower -j 1 # ... 1 ms, parsable output (csv)
./nfsslower 0 # trace all nfs operations
./nfsslower -p 121 # trace pid 121 only