/*
File : $File: //depot/stillwater-sc/perfmonitor/perfmonitor.go $
Authors : E. Theodore L. Omtzigt
Date : 5 May 2016
Source Control Information:
Version : $Revision: #1 $
Latest : $Date: 2016/05/05 $
Location : $Id: //depot/stillwater-sc/perfmonitor/perfmonitor.go#1 $
Organization:
Stillwater Supercomputing, Inc.
P.O Box 720
South Freeport, ME 04078-0720
Copyright (c) 2006-2016 E. Theodore L. Omtzigt. All rights reserved.
Licence : Stillwater license as defined in this directory
The PerfMonitor is an infrastructure that tracks Operational Analysis attributes
of resources, and time series sequences of transactions. The goal of the PerfMonitor
is to deliver performance metrics for quality assurance and regression testing.
The transaction monitoring must be capable of tracing the update sequences of a
system of recurrence equations. Domain flow algorithms can be viewed as injecting data
elements into domains of computation that evolve intermediate values until they are
ejected back into memory, or into another domain. Tracing these evolutions is important
for understanding the dynamics of a domain flow algorithm, validating its function,
and locating functional bugs.
The general structure of a Domain Flow Algorithm is:
    input ((i,j,k) | l <= i,j <= u, k = c) {
        // injection of a memory data structure into a Domain of Computation
        a(i,j,k) = A(i,j)
        b(k,j,i) = B(i,j)
    }
    compute ((i,j,k) | l <= i,j,k <= u) {
        // evolution of a computation inside a Domain of Computation
        a(i,j,k) = f(a(i,j,k-1))
        b(i,j,k) = g(b(i-1,j,k))
    }
    output ((i,j,k) | l <= i,j <= u, k = d) {
        // ejection of a memory data structure from a Domain of Computation
        Aprime(i,j) = a(i,j,k)
    }
The transaction tracing would tag an input data element from the input() domain, for example A(1,1),
and then trace the evolution of the recurrence a(i,j,k) = f(a(i,j,k-1)) through the domain.
The transaction traces generated would look like this:
A(1,1) (t0,R_a,v1) (t1,R_b,v2) (t2,R_c,v3) (t3,R_d,v4) ... etc.
A(1,2) (t0,R_w,v1) (t1,R_x,v2) (t2,R_y,v3) (t3,R_z,v4) ... etc.
A(2,1) (t0,R_a,v1) (t1,R_b,v2) (t2,R_c,v3) (t3,R_d,v4) ... etc.
t0, t1, t2, etc. are time stamps in nanoseconds.
R_a, R_b, R_c, etc. are resource identifiers.
v1, v2, v3, etc. are the intermediate values of the recurrence equation.
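The trace format above can be sketched as a small data structure. This is only an
illustration: the names Event, Trace, and Record are hypothetical and are not part
of this package, and values are assumed to be float64.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is one (timestamp, resource, value) step of a trace.
// Hypothetical type, not part of the perfmonitor package.
type Event struct {
	TimeNs   uint64  // time stamp in nanoseconds
	Resource string  // resource identifier, e.g. "R_a"
	Value    float64 // intermediate value of the recurrence
}

// Trace is the ordered list of events for one tagged input element.
type Trace struct {
	Origin string // tagged input element, e.g. "A(1,1)"
	Events []Event
}

// Record appends one event to the trace.
func (t *Trace) Record(timeNs uint64, resource string, value float64) {
	t.Events = append(t.Events, Event{timeNs, resource, value})
}

// String renders the trace as: origin (t,R,v) (t,R,v) ...
func (t *Trace) String() string {
	var b strings.Builder
	b.WriteString(t.Origin)
	for _, e := range t.Events {
		fmt.Fprintf(&b, " (%d,%s,%g)", e.TimeNs, e.Resource, e.Value)
	}
	return b.String()
}

func main() {
	tr := &Trace{Origin: "A(1,1)"}
	tr.Record(0, "R_a", 1.0)
	tr.Record(1, "R_b", 2.0)
	fmt.Println(tr) // A(1,1) (0,R_a,1) (1,R_b,2)
}
```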
We would like to be able to leverage the InfluxDB time series database for data
storage, management, and queries, so we are using the same line protocol as that
database. This would allow us to replay the events captured in the PerfMonitor
and send them to an InfluxDB instance. We have an in-memory PerfMonitor in the
simulator because of simulator performance requirements. InfluxDB replays would only be
required for inspection and debug. Most of the time, the low level operational
analysis attributes will be enough to validate results, and support regression testing.
An InfluxDB point, written here in JSON form, is structured as follows:
{
    "database": "foo",
    "retentionPolicy": "bar",
    "points": [
        {
            "name": "measurement",
            "tags": {
                "host": "server01",
                "region": "us-east1",
                "tag1": "value1",
                "tag2": "value2",
                "tag3": "value3",
                "tag4": "value4",
                "tag5": "value5",
                "tag6": "value6",
                "tag7": "value7",
                "tag8": "value8"
            },
            "time": 14244733039069373,
            "precision": "n",
            "fields": {
                "value": 4541770385657154000
            }
        }
    ]
}
A measurement name, a set of tags, a timestamp plus precision identifier, and a
set of field values together define a point in the time series. InfluxDB
indexes on the measurement and the tags, but not on the fields.
We could map our computational events onto this model like so:
A point:
{
    "name": "A(1,1)",
    "tags": {
        "pe": "[1][1]",  // the resource tag
        "lp": "(1,1,1)", // lattice point of the computational event
        "re": "a"        // recurrence id
    },
    "time": 1,           // clock ticks in terms of nsec
    "precision": "n",
    "fields": {
        "value": 1.0f
    }
},
{
    "name": "A(1,1)",
    "tags": {
        "pe": "[1][2]",  // the resource tag
        "lp": "(1,1,2)", // lattice point of the computational event
        "re": "a"        // recurrence id
    },
    "time": 2,           // clock ticks in terms of nsec
    "precision": "n",
    "fields": {
        "value": 2.0f
    }
},
{
    "name": "A(1,1)",
    "tags": {
        "pe": "[1][3]",  // the resource tag
        "lp": "(1,1,3)", // lattice point of the computational event
        "re": "a"        // recurrence id
    },
    "time": 3,           // clock ticks in terms of nsec
    "precision": "n",
    "fields": {
        "value": 3.0f
    }
},
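
In InfluxDB's text line protocol, a point is written on one line as
measurement,tag=value,... field=value timestamp. The sketch below shows one way the
computational events above might be rendered in that form. It is an illustration
only: the Point type and its LineProtocol method are hypothetical, a single float
field named "value" is assumed, and only commas, spaces, and equals signs are
escaped (measurement names and tag values such as "A(1,1)" and "(1,1,1)" contain
commas, so they need escaping).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Point is a hypothetical in-memory form of one computational event.
type Point struct {
	Name   string            // measurement, e.g. "A(1,1)"
	Tags   map[string]string // indexed metadata: pe, lp, re
	Value  float64           // intermediate value of the recurrence
	TimeNs uint64            // time stamp in nanoseconds
}

// escape puts a backslash in front of every character of s found in special.
func escape(s, special string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(special, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// LineProtocol renders p as: measurement,tag=value,... value=<float> <timestamp>
func (p Point) LineProtocol() string {
	keys := make([]string, 0, len(p.Tags))
	for k := range p.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted tag keys give deterministic output

	var b strings.Builder
	b.WriteString(escape(p.Name, ", "))
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", escape(k, ",= "), escape(p.Tags[k], ",= "))
	}
	fmt.Fprintf(&b, " value=%g %d", p.Value, p.TimeNs)
	return b.String()
}

func main() {
	p := Point{
		Name:   "A(1,1)",
		Tags:   map[string]string{"pe": "[1][1]", "lp": "(1,1,1)", "re": "a"},
		Value:  1.0,
		TimeNs: 1,
	}
	fmt.Println(p.LineProtocol())
}
```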
*/
package perfmonitor

import (
	"fmt"
	"sync"

	"github.com/golang/glog"
)
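
// ResourceTag is the unique identifier of a monitored resource.
type ResourceTag uint64

// ResourceAddress is the three-dimensional index of a resource.
type ResourceAddress [3]int

// PerfMonitor collects job flow observations per resource.
// JobFlow is not defined in this file.
type PerfMonitor struct {
	Name         string
	Observations map[ResourceTag]*JobFlow
	Mutex        *sync.Mutex // guards Observations in Arrival and Completion
}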

// NewPerfMonitor creates a new, empty PerfMonitor with the given name.
func NewPerfMonitor(name string) *PerfMonitor {
	return &PerfMonitor{
		Name:         name,
		Observations: make(map[ResourceTag]*JobFlow),
		Mutex:        &sync.Mutex{},
	}
}
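
/////////////////////////////////////////////////////////////////
// Selectors

// String formats the tag as upper-case hexadecimal with a 0X prefix.
func (t *ResourceTag) String() string {
	return fmt.Sprintf("%#10X", *t)
}

// GenerateUniqueIdentifier packs the three indices of a resource address into
// the tag, 16 bits apart, stores the result in t, and returns it. Each index is
// assumed to be non-negative and to fit in 16 bits; anything else spills into
// the neighboring fields and the identifier is no longer unique.
func (t *ResourceTag) GenerateUniqueIdentifier(index ResourceAddress) ResourceTag {
	*t = ResourceTag(uint64(index[0]) | uint64(index[1])<<16 | uint64(index[2])<<32)
	return *t
}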
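
// Print writes every observation to standard output.
// It does not take the mutex, so it should not run concurrently with
// Arrival or Completion.
func (oa *PerfMonitor) Print() {
	for key, value := range oa.Observations {
		fmt.Printf("id[%#x] = %v\n", key, value)
	}
}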

/////////////////////////////////////////////////////////////////
// Modifiers

// Arrival counts one arrival for the resource identified by tag and allocates
// the JobFlow on first use. The timeStamp argument is currently unused.
func (oa *PerfMonitor) Arrival(tag ResourceTag, timeStamp uint64) {
	oa.Mutex.Lock()
	defer oa.Mutex.Unlock()

	jf, ok := oa.Observations[tag]
	if !ok {
		jf = new(JobFlow)
		oa.Observations[tag] = jf
	}
	jf.Arrivals++
}

// Completion counts one completion for the resource identified by tag. It logs
// an error when no Arrival has been recorded for that tag. The timeStamp
// argument is currently unused.
func (oa *PerfMonitor) Completion(tag ResourceTag, timeStamp uint64) {
	oa.Mutex.Lock()
	defer oa.Mutex.Unlock()

	jf, ok := oa.Observations[tag]
	if !ok {
		// This is bad: Arrival should already have allocated a JobFlow.
		glog.Errorf("Completion for tag %d was called before Arrival", tag)
		return
	}
	jf.Completions++
}