\chapter{Instruction Trace Encoder Output Packets} \label{packets}
The bulk of this section describes the payload of packets output from the Instruction Trace Encoder.
The infrastructure used to transport these packets is outside the scope of this document, and
as such the manner in which packets are encapsulated for transport is not specified.
However, the following information must be provided to the encapsulator:
\begin{itemize}
\item The packet type;
\item The packet length, in bytes;
\item The packet payload.
\end{itemize}
Two example transport schemes are the Siemens Messaging Infrastructure, and the Arm Trace Bus.
Figure~\ref{fig:packet-format} shows the encapsulation used for the Siemens infrastructure:
\begin{itemize}
\item The header byte contains a 5-bit field specifying the payload length in bytes, a 2-bit
field indicating the ``flow'' (destination routing indicator), and a bit to indicate whether
an optional 16-bit timestamp is present;
\item The index field indicates the source of the packet. The number of bits is system dependent,
and the initial value emitted by the trace encoder is zero (it gets adjusted as it propagates
through the infrastructure);
\item An optional 2-byte timestamp;
\item The packet payload.
\end{itemize}
\begin{figure}[h]
\begin{center}
\includegraphics[height=1cm, width=9cm]{newPacket.jpg}
\caption{Example encapsulated packet format}
\label{fig:packet-format}
\end{center}
\end{figure}
Alternatively, for ATB, the source of the packet is indicated by the \textbf{ATID} bus field, and there is
no equivalent of ``flow'', so an example encapsulation might be:
\begin{itemize}
\item A 5-bit field specifying the payload length in bytes;
\item A bit to indicate whether an optional 16-bit timestamp is present;
\item An optional 2-byte timestamp;
\item The packet payload.
\end{itemize}
It may be desirable for packets to start aligned to an ATB word, in which case the \textbf{ATBYTES} bus field
in the last beat of a packet can be used to indicate the number of valid bytes.
The remainder of this section describes the contents of the payload
portion which should be independent of the infrastructure. In each table, the fields are listed in
transmission order: first field in the table is transmitted first, and multi-bit fields are
transmitted LSB first.
This packet payload format is used to output encoded instruction
trace. Three different formats are used according to the needs of the
encoding algorithm. The following tables show the format of the
payload - i.e. excluding any encapsulation.
In order to achieve best performance, actual packet lengths may be adjusted using `sign-based compression'.
At the very minimum this should be applied to the address field of format 1 and 2 packets, but ideally will
be applied to the whole packet, regardless of format. This technique eliminates identical bits from the most
significant end of the packet, and adjusts the length of the packet accordingly. A decoder receiving this
shortened packet can reconstruct the original full-length packet by sign-extending from the most significant
received bit.
Where the payload length given in the following tables, or the length after applying sign-based compression, is not a
whole number of bytes, the payload must be sign-extended to the next byte boundary.
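The compression and padding rules above can be illustrated with a minimal Python sketch. It is not part of the specification. It assumes the whole payload is held as a non-negative integer whose bit 0 is the first bit transmitted, and that the first byte on the wire carries the least significant bits; the names \texttt{compress} and \texttt{decompress} are hypothetical.

```python
def compress(payload: int, width: int) -> bytes:
    """Sign-compress a `width`-bit payload and pad it to whole bytes."""
    assert width > 0 and 0 <= payload < (1 << width)
    # Reinterpret the payload as a two's-complement number, so that a run of
    # identical most significant bits becomes redundant sign bits.
    signed = payload - (1 << width) if payload >> (width - 1) else payload
    # Fewest bits that still sign-extend back to the same value.
    magnitude = ~signed if signed < 0 else signed
    needed_bits = magnitude.bit_length() + 1
    nbytes = (needed_bits + 7) // 8
    # to_bytes(signed=True) sign-extends up to the byte boundary.
    return signed.to_bytes(nbytes, "little", signed=True)


def decompress(data: bytes, width: int) -> int:
    """Rebuild the full-width payload by sign-extending the received bytes."""
    signed = int.from_bytes(data, "little", signed=True)
    return signed & ((1 << width) - 1)
```

In this sketch a 40-bit payload whose upper bits are all zeros or all ones shrinks to a single byte, and \texttt{decompress} recovers the original value as long as the decoder knows the uncompressed width.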
Whilst offering maximum encoding efficiency, variable length packets can present some challenges,
specifically in terms of identifying where the boundaries between packets occur either when packed
packets are written to memory, or when packets are streamed off-chip via a communications channel. Two
potential solutions to this are as follows:
\begin{itemize}
\item If the maximum packet payload length is 2\textsuperscript{N}-1 (for example, if N is 5, then the maximum length is
31 bytes), and the minimum packet payload length is 1, then a sequence of at least 2\textsuperscript{N} zero
bytes cannot occur within a packet payload, and therefore the first non-zero byte seen after a sequence of
at least 2\textsuperscript{N} zero bytes must be the first byte of a packet. This approach can be used for
alignment in either memory or a data stream;
\item An alternative approach suitable for packets written to memory is to divide memory into blocks of M bytes
(e.g. 1kbyte blocks), and write packets to memory such that the first byte in every block is always the first
byte of a packet. This means packets cannot span block boundaries, and so zero bytes must be used to pad between
the end of the last message in a block and the block boundary.
\end{itemize}
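Both alignment schemes above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reference implementation: packets are treated as opaque byte strings, any encapsulation header is ignored, and the function names \texttt{find\_packet\_start} and \texttt{pack\_into\_blocks} are hypothetical.

```python
def find_packet_start(stream: bytes, n: int):
    """Index of the first non-zero byte after at least 2**n zero bytes.

    Returns None if no such position exists in `stream`.
    """
    threshold = 1 << n
    zeros = 0
    for index, byte in enumerate(stream):
        if byte == 0:
            zeros += 1
        elif zeros >= threshold:
            return index  # first byte of a packet
        else:
            zeros = 0     # zero run was too short, keep searching
    return None


def pack_into_blocks(packets, block_size: int) -> bytes:
    """Write packets so that none of them spans a block boundary."""
    out = bytearray()
    for packet in packets:
        assert 0 < len(packet) <= block_size
        room = block_size - (len(out) % block_size)
        if len(packet) > room:
            out += bytes(room)  # zero padding up to the block boundary
        out += packet
    return bytes(out)
```

With N = 5, a decoder using the first function needs to see 32 consecutive zero bytes before it can lock on to the start of the next packet.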
\section{Format 3 packets} \label{sec:format3}
Format 3 packets are used for synchronization, traps, reporting context and supporting information.
There are 4 sub-formats.
Throughout this document, the term "synchronization packet" is used. This refers specifically to format 3,
subformat 0 and subformat 1 packets.
\section{Format 3 subformat 0 - Synchronisation} \label{sec:format30}
This packet contains all the information the decoder needs to fully identify an instruction. It is sent for
the first traced instruction (unless that instruction also happens to be the first in a trap handler),
and when resynchronization has been scheduled by expiry of the resynchronisation timer.
\begin{table}[htp]
\centering
\caption{Packet format 3, subformat 0}
\label{tab:te_inst3-0}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 11 (sync): synchronisation\\
\hline
\textbf{subformat} & 2 & 00 (start): Start of tracing, or resync \\
\hline
\textbf{branch} & 1 & Set to 0 if the address points to a branch instruction, and the branch was taken.
Set to 1 if the instruction is not a branch or if the branch is not taken. \\
\hline
\textbf{privilege} & \textit {privilege\_width\_p} &
The privilege level of the reported instruction\\
\hline
\textbf{time} & \textit {time\_width\_p} or 0 if \textit {notime\_p} is 1 &
The time value.\\
\hline
\textbf{context} & \textit {context\_width\_p},
or 0 if \textit {nocontext\_p} is 1 &
The instruction context. \\
\hline
\textbf{address} & \textit {iaddress\_width\_p - iaddress\_lsb\_p} &
Full instruction address. Address alignment is determined by \textit {iaddress\_lsb\_p}. The address must be left shifted in order to recreate the original byte address. \\
\hline
\end{tabulary}
\end{table}
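As an illustration of the field ordering in Table~\ref{tab:te_inst3-0}, the following Python sketch packs a format 3, subformat 0 payload with the first field in the least significant bits, and recovers the byte address by left shifting. The parameter values are assumptions chosen only for the example (no time or context fields, 32-bit addresses, \textit{iaddress\_lsb\_p} of 1), and the helper names are hypothetical.

```python
# Hypothetical parameter values, chosen only for illustration.
PRIVILEGE_WIDTH_P = 2
TIME_WIDTH_P = 0       # as if notime_p were 1
CONTEXT_WIDTH_P = 0    # as if nocontext_p were 1
IADDRESS_WIDTH_P = 32
IADDRESS_LSB_P = 1


def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first field at bit 0."""
    payload, position = 0, 0
    for value, width in fields:
        assert 0 <= value < (1 << width)
        payload |= value << position
        position += width
    return payload, position


def unpack_fields(payload, widths):
    """Split a payload back into field values, first field at bit 0."""
    values = []
    for width in widths:
        values.append(payload & ((1 << width) - 1))
        payload >>= width
    return values


def encode_sync_start(byte_address, privilege, branch):
    """Build an uncompressed format 3, subformat 0 payload."""
    fields = [
        (0b11, 2),                       # format: sync
        (0b00, 2),                       # subformat: start
        (branch, 1),
        (privilege, PRIVILEGE_WIDTH_P),
        (0, TIME_WIDTH_P),               # omitted: width is 0
        (0, CONTEXT_WIDTH_P),            # omitted: width is 0
        (byte_address >> IADDRESS_LSB_P, IADDRESS_WIDTH_P - IADDRESS_LSB_P),
    ]
    return pack_fields(fields)
```

For example, \texttt{encode\_sync\_start(0x80000000, 3, 1)} yields a 38-bit payload. Unpacking it with widths 2, 2, 1, 2, 0, 0 and 31, then shifting the last value left by \textit{iaddress\_lsb\_p}, gives back the byte address.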
\subsection{Format 3 \textbf{branch} field}
This bit indicates the taken/not taken status in the case where the reported address points to a branch instruction.
Overall efficiency would be slightly improved if this bit were removed, and the branch status were instead
``carried over'' and reported in the next \textit{te\_inst} packet. This was considered, but there are several
pathological cases where this approach fails. Consider for example the situation where the first traced instruction
is a branch, and this is then followed immediately by an exception. This results in format 3 packets being generated
on two consecutive instructions. The second packet does not contain a branch
map, so there is no way to report the branch status of the 1st branch, apart from by inserting a format 1 packet in
between. There are two issues with this:
\begin{itemize}
\item It would require the generation of 2 packets on the same cycle, which adds significant additional complexity
to the encoder;
\item It would complicate the algorithm shown in figure~\ref{fig:algo}.
\end{itemize}
\FloatBarrier
\section{Format 3 subformat 1 - Trap} \label{sec:format31}
This packet also contains all the information the decoder needs to fully identify an instruction.
It is sent following an exception or interrupt, and includes the cause,
the `trap value' (for exceptions), and the address of the trap handler, or
of the exception itself - see section \ref{sec:thaddr}.
If the implicit exception mode is enabled (see section~\ref{sec:implicit-exception}), the trap handler
address is omitted if \textbf{thaddr} is 1.
\begin{table}[htp]
\centering
\caption{Packet format 3, subformat 1}
\label{tab:te_inst3-1}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 11 (sync): synchronisation\\
\hline
\textbf{subformat} & 2 & 01 (trap): Exception or interrupt cause and trap handler address.\\
\hline
\textbf{branch} & 1 & Set to 0 if the address points to a branch instruction, and the branch was taken.
Set to 1 if the instruction is not a branch or if the branch is not taken. \\
\hline
\textbf{privilege} & \textit {privilege\_width\_p} &
The privilege level of the reported instruction.\\
\hline
\textbf{time} & \textit {time\_width\_p} or 0 if \textit {notime\_p} is 1 &
The time value. \\
\hline
\textbf{context} & \textit {context\_width\_p}, or 0 if \textit {nocontext\_p} is 1 & The instruction context \\
\hline
\textbf{ecause} & \textit {ecause\_width\_p} & Exception or interrupt cause. \\
\hline
\textbf{interrupt} & 1 & Interrupt. \\
\hline
\textbf{thaddr} & 1 &
When set to 1, \textbf{address} points to the trap handler address.
When set to 0, \textbf{address} points to the EPC for an exception at the target of an updiscon,
and is undefined for other exceptions and interrupts.\\
\hline
\textbf{address} & \textit {iaddress\_width\_p - iaddress\_lsb\_p} &
Full instruction address. Address alignment is determined by \textit {iaddress\_lsb\_p}.
The address must be left shifted in order to recreate the original byte address. \\
\hline
\textbf{tval} & \textit {iaddress\_width\_p} &
Value from appropriate \textbf{utval/stval/vstval/mtval} CSR.
This field is omitted for interrupts.\\
\hline
\end{tabulary}
\end{table}
\subsection{Format 3 \textbf{thaddr} and \textbf{address} fields} \label{sec:thaddr}
If an exception occurs at the target of an uninferable PC discontinuity, the value of
the EPC cannot be inferred from the program binary, and so \textbf{address} contains the EPC and
\textbf{thaddr} is set to 0. In this case, the trap handler address will be reported
via a subsequent format 3, subformat 1 packet.
Usually when an exception or interrupt occurs, the cause is reported along
with the 1st address of the trap handler, when that instruction retires. In this case,
\textbf{thaddr} is 1. However, if a second interrupt or exception occurs immediately, details of
this must still be reported, even though the 1st instruction of the handler hasn't retired. In this
situation, \textbf{thaddr} is 0, and \textbf{address} is undefined (unless it contains the EPC as
outlined in the previous paragraph).
(The reason for not reporting the EPC for all exceptions when \textbf{thaddr} is 0 is
that it may be at either the address of the next instruction or current instruction depending on the
exception cause, which can be inferred by the decoder without adding complexity to the encoder.)
\subsection{Format 3 \textbf{tval} field}
This field reports the ``trap value'' from the appropriate \textbf{utval/stval/vstval/mtval}
CSR, the meaning of which is dependent on the nature of the exception. It is omitted
from the packet for interrupts.
\FloatBarrier
\section{Format 3 subformat 2 - Context} \label{sec:format32}
This packet contains only the context and/or the timestamp, and is output when the context value
changes and can be reported imprecisely (see Table~\ref{tab:context-type}).
\begin{table}[htp]
\centering
\caption{Packet format 3, subformat 2}
\label{tab:te_inst3-2}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 11 (sync): synchronisation\\
\hline
\textbf{subformat} & 2 & 10 (context): Context change \\
\hline
\textbf{privilege} & \textit {privilege\_width\_p} &
The privilege level of the new context.\\
\hline
\textbf{time} & \textit {time\_width\_p} or 0 if \textit {notime\_p} is 1 &
The time value \\
\hline
\textbf{context} & \textit {context\_width\_p}, or 0 if \textit {nocontext\_p} is 1 & The instruction context. \\
\hline
\end{tabulary}
\end{table}
\section{Format 3 subformat 3 - Support} \label{sec:format33}
This packet provides supporting information to aid the decoder. It is issued when
\begin{itemize}
\item Trace is enabled or disabled;
\item The operating mode changes;
\item One or more trace packets cannot be sent (for example, due to back-pressure from the packet transport infrastructure).
\end{itemize}
The \textbf{ioptions} and \textbf{doptions} fields are placeholders that must each be replaced by an implementation-specific set of individual bits - one for each of the
optional modes supported by the encoder.
\begin{table}[htp]
\centering
\caption{Packet format 3, subformat 3}
\label{tab:te_inst3-3}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 11 (sync): synchronisation\\
\hline
\textbf{subformat} & 2 & 11 (support): Supporting information for the decoder \\
\hline
\textbf{ienable} & 1 & Indicates if the instruction trace encoder is enabled\\
\hline
\textbf{encoder\_mode} & N & Identifies trace algorithm\newline
Details and number of bits implementation dependent. Currently Branch trace is the only mode defined, indicated by the value 0.\\
\hline
\textbf{qual\_status} & 2 & Indicates qualification status\newline
00 (no\_change): No change to filter qualification \newline
01 (ended\_rep): Qualification ended, preceding \textbf{te\_inst} sent explicitly to indicate last qualified instruction\newline
10 (trace\_lost): One or more instruction trace packets lost.\newline
11 (ended\_ntr): Qualification ended, preceding \textbf{te\_inst} would have been sent anyway due to an updiscon, even if it wasn't the last qualified instruction\\
\hline
\textbf{ioptions} & N & Values of all instruction trace run-time configuration bits\newline
Number of bits and definitions implementation dependent. Examples might be\newline
- 'sequentially inferred jumps' Don't report the targets of sequentially inferable jumps\newline
- 'implicit return' Don't report function return addresses \newline
- 'implicit exception' Exclude address from format 3, sub-format 1 \textit{te\_inst} packets if trap vector can be determined from \textit{ecause}\newline
- 'branch prediction' Branch predictor enabled\newline
- 'jump target cache' Jump target cache enabled\newline
- 'full address' Always output full addresses (SW debug option)\\
\hline
\textbf{denable} & 1 & Indicates if the data trace is enabled (if supported)\\
\hline
\textbf{dloss} & 1 & One or more data trace packets lost (if supported)\\
\hline
\textbf{doptions} & M & Values of all data trace run-time configuration bits\newline
Number of bits and definitions implementation dependent. Examples might be\newline
- 'no data' Exclude data (just report addresses)\newline
- 'no addr' Exclude address (just report data)\\
\hline
\end{tabulary}
\end{table}
\subsection{Format 3 subformat 3 \textbf{qual\_status} field} \label{sec:qual-status}
When tracing ends, the encoder reports the address of the last traced instruction, and follows this with a format 3,
subformat 3 (supporting information) packet. Two codes are provided for indicating that tracing has ended:
\textbf{ended\_rep} and \textbf{ended\_ntr}. This relates to exactly the same ambiguous case described in detail in
section~\ref{sec:updiscon}, and in principle, the mechanism described in that section can be used to disambiguate when the last traced
instruction is at looplabel. However, that mechanism relies on knowing when creating the format 1/2 packet, that
a format 3 packet will be generated from the next instruction. This is possible because the encoding algorithm uses
a 3-stage pipe with access to the previous, current and next instructions. Decoding that the next instruction
is a privilege change or exception is straightforward, but determining whether the next instruction meets the filtering
criteria is much more involved, and this information won't typically be available, at least not without adding an
additional pipeline stage, which is expensive. This means a different mechanism is required, and that is provided
by having two codes to indicate that tracing has ended:
\begin{itemize}
\item \textbf{ended\_rep} indicates that the preceding packet would not have been issued if tracing hadn't ended,
which means that tracing stopped after executing looplabel in the 1st loop iteration;
\item \textbf{ended\_ntr} indicates that the preceding packet would have been issued anyway because of an uninferable
PC discontinuity, which means that tracing stopped after executing looplabel in the 2nd loop iteration.
\end{itemize}
If the encoder implementation does have early access to the filtering results, and the designer chooses to use the
\textbf{updiscon} bit when the last qualified instruction is also the instruction following an uninferable PC discontinuity,
loss of qualification should always be indicated using \textbf{ended\_rep}.
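From the decoder's side, the two end-of-tracing codes can be handled as in the following minimal Python sketch. The numeric values are those given in Table~\ref{tab:te_inst3-3}; the class and function names are hypothetical, and the sketch only covers the looplabel scenario discussed above.

```python
from enum import IntEnum


class QualStatus(IntEnum):
    NO_CHANGE = 0b00
    ENDED_REP = 0b01
    TRACE_LOST = 0b10
    ENDED_NTR = 0b11


def final_occurrence(status: QualStatus):
    """Which encounter of the reported address is the last traced one.

    Returns 1 or 2 for the two 'tracing ended' codes, and None when the
    support packet does not signal the end of qualification.
    """
    if status is QualStatus.ENDED_REP:
        return 1  # packet was sent only because tracing ended
    if status is QualStatus.ENDED_NTR:
        return 2  # packet would have been sent anyway (updiscon)
    return None
```

A decoder could call \texttt{final\_occurrence(QualStatus(raw\_bits))} on the 2-bit field and stop reconstructing the execution path at the indicated encounter of the reported address.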
\FloatBarrier
\section{Format 2 packets} \label{sec:format2}
This packet contains only an instruction address, and is used when the address of an instruction must be reported,
and there is no unreported branch information. The address is in differential format unless full address mode is
enabled (see section~\ref{sec:full-address}).
\begin{table}[!h]
\centering
\caption{Packet format 2}
\label{tab:te_inst2}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 10 (addr-only): differential address and no branch information\\
\hline
\textbf{address} & \textit {iaddress\_width\_p - iaddress\_lsb\_p} &
Differential instruction address.\\
\hline
\textbf{notify} & 1 &
If the value of this bit is different from the MSB of \textbf{address}, it indicates that this
packet is reporting an instruction that is not the target of an uninferable discontinuity
because a notification was requested via \textbf{trigger[2]} (see section~\ref{sec:trigger}). \\
\hline
\textbf{updiscon} & 1 &
If the value of this bit is different from \textbf{notify}, it indicates that this
packet is reporting the instruction following an uninferable discontinuity and is also the
instruction before an exception, privilege change or resync
(i.e. it will be followed immediately by a format 3 \textit{te\_inst}).\\
\hline
\textbf{irreport} & 1 &
If the value of this bit is different from \textbf{updiscon}, it indicates that this packet is
reporting an instruction that is either: \newline
following a return because its address differs from the predicted return address at the top of
the implicit\_return return address stack, or \newline
the last retired before an exception, interrupt, privilege change or resync because it is necessary to report
the current address stack depth or nested call count. \\
\hline
\textbf{irdepth} & \textit {return\_stack\_size\_p + (return\_stack\_size\_p > 0 ? 1 : 0) + call\_counter\_size\_p} &
If the value of \textbf{irreport} is different from \textbf{updiscon}, this field
indicates the number of entries on the return address stack (i.e. the entry number of the return that
failed) or nested call count. If \textbf{irreport} is the same value as \textbf{updiscon},
all bits in this field will also be the same value as \textbf{updiscon}. \\
\hline
\end{tabulary}
\end{table}
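The relative encoding of \textbf{notify}, \textbf{updiscon}, \textbf{irreport} and \textbf{irdepth} in Table~\ref{tab:te_inst2} can be sketched in Python as follows. Each bit is stored as the XOR of its logical flag with the preceding bit, so that when no flag is raised every bit equals the MSB of \textbf{address} and can be removed by sign-based compression. The function names and argument order are hypothetical.

```python
def irdepth_width(return_stack_size_p: int, call_counter_size_p: int) -> int:
    """Width of irdepth, following the expression in the table."""
    return (return_stack_size_p
            + (1 if return_stack_size_p > 0 else 0)
            + call_counter_size_p)


def encode_tail(address_msb, notify, updiscon, irreport, depth, depth_width):
    """Return the raw (notify, updiscon, irreport, irdepth) field values."""
    notify_bit = address_msb ^ int(notify)
    updiscon_bit = notify_bit ^ int(updiscon)
    irreport_bit = updiscon_bit ^ int(irreport)
    if irreport:
        irdepth = depth
    else:
        # All irdepth bits copy updiscon so that they compress away.
        irdepth = ((1 << depth_width) - 1) if updiscon_bit else 0
    return notify_bit, updiscon_bit, irreport_bit, irdepth


def decode_tail(address_msb, notify_bit, updiscon_bit, irreport_bit, irdepth):
    """Recover the logical flags; depth is None unless irreport is raised."""
    notify = notify_bit != address_msb
    updiscon = updiscon_bit != notify_bit
    irreport = irreport_bit != updiscon_bit
    return notify, updiscon, irreport, (irdepth if irreport else None)
```

With an address MSB of 1 and no flags raised, \texttt{encode\_tail} produces all-ones fields, which extend the run of identical bits at the most significant end of the packet.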
\subsection{Format 2 \textbf{notify} field} \label{sec:notify}
This bit is encoded so that most of the time it will take the same value as the MSB of the \textbf{address} field,
and will therefore compress away, having no impact on the encoding efficiency. It is required in order to cover
the case where an address is reported as a result of a notification request, signalled by setting the
\textbf{trigger[2]} input to 1.
\subsection{Format 2 \textbf{notify} and \textbf{updiscon} fields} \label{sec:updiscon}
These bits are encoded so that most of the time they will compress away, having no impact on efficiency, by taking on
the same value as the preceding bit in the packet (\textbf{notify} is normally the same value as the MSB of the
\textbf{address} field, and \textbf{updiscon} is normally the same value as \textbf{notify}). They are required in
order to cover a pathological case where otherwise the decoding software would not be able to reconstruct the program
execution unambiguously. Consider the following code fragment:
looplabel~~-~4: \textbf{\textit{opcode A}} \newline
looplabel~~~~~: \textbf{\textit{opcode B}} \newline
looplabel~+~4: \textbf{\textit{opcode C}} \newline
~~: \newline
looplabel~+~N: \textbf{\textit{JALR}} \# Jump to looplabel\newline
This is a loop with an indirect jump back to the next iteration. This is an uninferable discontinuity, and will be
reported via a format 1 or 2 packet. Note however that the initial entry into the loop is fall-through from the
instruction at looplabel - 4, and will not be reported explicitly. This means that when reconstructing the execution
path of the program, the looplabel address is encountered twice. At first glance, it appears that the decoder can determine
when it reaches the loop label for the 1st time that this is not the end of execution, because the preceding
instruction was not one that can cause an uninferable discontinuity. It can therefore continue reconstructing the
execution path until it reaches the \textbf{\textit{JALR}}, from where it can deduce that \textbf{\textit{opcode B}} at
looplabel is the final retired instruction. However, there are circumstances where this approach
does not work. For example, consider the case where there is an exception at looplabel + 4. In this case, the decoder
cannot tell whether this occurred during the 1st or 2nd loop iterations, without additional information from the
encoder. This is the purpose of the \textbf{updiscon} field. In more detail,
there are four scenarios to consider:
\begin{enumerate}
\item Code executes through to the end of the 1st loop iteration, and the encoder reports looplabel using format 1/2 following
the \textbf{\textit{JALR}}, then carries on executing the 2nd pass of the loop. In this case \textbf{updiscon} == \textbf{notify}.
The next packet will be a format 1/2;
\item Code executes through to the end of the 1st loop iteration and jumps back to looplabel, but there is then an exception,
privilege change or resync in the second iteration at looplabel + 4. In this case, the encoder reports looplabel using
format 1/2 following the \textbf{\textit{JALR}}, with \textbf{updiscon} == !\textbf{notify}, and the next packet is a
format 3;
\item An exception occurs immediately after the 1st execution of looplabel. In this case, the encoder reports looplabel using
format 0/1/2 with \textbf{updiscon} == \textbf{notify}, and the next packet is a format 3;
\item The hart requests the encoder to notify retirement of the instruction at looplabel. In this case, the encoder reports the 1st
execution of looplabel with \textbf{notify} == !\textbf{address[MSB]}, and subsequent executions with \textbf{notify} ==
\textbf{address[MSB]} (because they would have been reported anyway as a result of the \textbf{\textit{JALR}}).
\end{enumerate}
Looking at this from the perspective of the decoder, the decoder receives a format 1/2 reporting the address of the 1st instruction in the
loop (looplabel). It follows the execution path from the last reported address, until it reaches looplabel. Because looplabel is not
preceded by an uninferable discontinuity, it must take the value of \textbf{notify} and \textbf{updiscon} into consideration, and may need
to wait for the next packet in order to determine whether it has reached the final retired instruction:
\begin{itemize}
\item If \textbf{updiscon} == !\textbf{notify}, this indicates case 2. The decoder must continue until it encounters
looplabel a 2nd time;
\item If \textbf{updiscon} == \textbf{notify}, the decoder cannot yet distinguish cases 1 and 3, and must wait for the
next packet.
\begin{itemize}
\item If the next packet is a format 3, this is case 3. The decoder has already reached the correct instruction;
\item If the next packet is a format 1/2, this is case 1. The decoder must continue until it encounters
looplabel a 2nd time.
\end{itemize}
\item If \textbf{notify} == !\textbf{address[MSB]}, this indicates case 4, 1st iteration. The decoder has reached the
correct instruction.
\end{itemize}
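The decision procedure described by the list above can be summarised in a small Python sketch. It applies only to the situation discussed here, where the reported address is first reached without a preceding uninferable discontinuity. The \texttt{Action} names and the \texttt{resolve} function are hypothetical. The two tests on \textbf{notify} and \textbf{updiscon} are treated as mutually exclusive, which follows from the field definitions, so their order is not significant.

```python
from enum import Enum


class Action(Enum):
    STOP_AT_FIRST = "stop at the first encounter of the reported address"
    CONTINUE_TO_SECOND = "continue until the second encounter"
    WAIT = "wait for the next packet"


def resolve(address_msb, notify_bit, updiscon_bit, next_format=None):
    """Decide where the decoder stops for a format 1/2 packet.

    `next_format` is the format of the following packet, or None if it
    has not been received yet.
    """
    if notify_bit != address_msb:
        return Action.STOP_AT_FIRST        # case 4, 1st iteration
    if updiscon_bit != notify_bit:
        return Action.CONTINUE_TO_SECOND   # case 2
    if next_format is None:
        return Action.WAIT                 # cannot yet tell case 1 from 3
    if next_format == 3:
        return Action.STOP_AT_FIRST        # case 3
    if next_format in (1, 2):
        return Action.CONTINUE_TO_SECOND   # case 1
    raise ValueError("format not covered by this sketch")
```

A decoder would call \texttt{resolve} once when the format 1/2 packet arrives and, if the result is \texttt{WAIT}, call it again when the format of the next packet is known.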
This example uses an exception at looplabel + 4, but anything that could cause a format 3 for looplabel + 4 would result in
the same behavior: a privilege change, or the expiry of the resync timer. It could also occur if looplabel was the last
traced instruction (because tracing was disabled for some reason). See section~\ref{sec:qual-status} for further discussion
of this point.
\textbf{Note:} Correct decoder behavior could have been achieved by implementing the \textbf{notify} bit only, setting it
to the inverse of \textbf{address[MSB]} whenever an address is reported and it is not the instruction following an
uninferable discontinuity. However, this would have been much less efficient, as this would have required \textbf{notify}
to be different from \textbf{address[MSB]} the majority of the time when outputting a format 1/2 before an exception,
interrupt or resync (as the probability of this instruction being the target of an uninferable jump is low). Using 2
separate bits results in superior compression.
\subsection{Format 2 \textbf{irreport} and \textbf{irdepth}} \label{sec:irxx}
These bits are encoded so that most of the time they will take the same value as the \textbf{updiscon} field,
and will therefore compress away, having no impact on the encoding efficiency. If implicit\_return mode is enabled, the
encoder keeps track of the number of traced nested calls, either as a simple count (\textit{call\_counter\_size\_p}
non-zero) or a stack of predicted return addresses (\textit{return\_stack\_size\_p} non-zero).
Where a stack of predicted return addresses is implemented, the predicted return addresses are compared with the actual
return addresses, and a \textit{te\_inst} packet will be generated with \textbf{irreport} set to the opposite value to
\textbf{updiscon} if a misprediction occurs.
In some cases it is also necessary to report the current stack depth or call count if the packet is reporting the last
instruction before an exception, interrupt, privilege change or resync. There are two cases of concern:
\begin{itemize}
\item If the reported address is the instruction following a return, and it is not mis-predicted, the encoder must
report the current stack depth or call count if it is non-zero. Without this, the decoder would attempt to follow
the execution path until it encountered the reported address from the outermost nested call;
\item If the reported address is not the instruction following a return, the encoder must report the current stack
depth or call count unless:
\begin{itemize}
\item There have been no returns since the last call (in which case the decoder will correctly stop in the
innermost call), or
\item There has been at least one branch since the last return (in which case the decoder will correctly
stop in the call where there are no unprocessed branches).
\end{itemize}
Without this, the decoder would follow the execution path until it encountered the reported address, and in most cases
this would be the correct point. However, this cannot be guaranteed for recursive functions, as the reported address
will occur multiple times in the execution path.
\end{itemize}
\FloatBarrier
\section{Format 1 packets} \label{sec:format1}
This packet includes branch information, and is used when either the branch information must be reported
(for example because the branch map is full), or when the address of an instruction must be reported, and there has
been at least one branch since the previous packet. If included, the address is in differential format unless full
address mode is enabled (see section~\ref{sec:full-address}).
\begin{table}[htp]
\centering
\caption{Packet format 1 - address, branch map}
\label{tab:te_inst1-addr-map}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 01 (diff-delta): includes branch information and may include differential address\\
\hline
\textbf{branches} & 5 & Number of valid bits in \textbf{branch\_map}. The number of bits of \textbf{branch\_map} is determined as follows: \newline
0: (cannot occur for this format) \newline
1: 1 bit \newline
2-3: 3 bits \newline
4-7: 7 bits \newline
8-15: 15 bits \newline
16-31: 31 bits \newline
For example if branches = 12, \textbf{branch\_map} is 15 bits long, and the 12 LSBs are valid. \\
\hline
\textbf{branch\_map} & Determined by \newline
\textbf{branches} field. &
An array of bits indicating whether branches are taken or not.\newline
Bit 0 represents the oldest branch instruction executed. For each bit: \newline
0: branch taken \newline
1: branch not taken \\
\hline
\textbf{address} & \textit {iaddress\_width\_p - iaddress\_lsb\_p} &
Differential instruction address.\\
\hline
\textbf{notify} & 1 &
If the value of this bit is different from the MSB of \textbf{address}, it indicates that this
packet is reporting an instruction that is not the target of an uninferable discontinuity
because a notification was requested via \textbf{trigger[2]} (see section~\ref{sec:trigger}). \\
\hline
\textbf{updiscon} & 1 &
If the value of this bit is different from \textbf{notify}, it indicates that this
packet is reporting the instruction following an uninferable discontinuity and is also the
instruction before an exception, privilege change or resync
(i.e. it will be followed immediately by a format 3 \textit{te\_inst}).\\
\hline
\textbf{irreport} & 1 &
If the value of this bit is different from \textbf{updiscon}, it indicates that this packet is
reporting an instruction that is either: \newline
following a return because its address differs from the predicted return address at the top of
the implicit\_return return address stack, or \newline
the last retired before an exception, interrupt, privilege change or resync because it is necessary to report
the current address stack depth or nested call count. \\
\hline
\textbf{irdepth} & \textit {return\_stack\_size\_p + (return\_stack\_size\_p > 0 ? 1 : 0) + call\_counter\_size\_p} &
If the value of \textbf{irreport} is different from \textbf{updiscon}, this field
indicates the number of entries on the return address stack (i.e. the entry number of the return that
failed) or nested call count. If \textbf{irreport} is the same value as \textbf{updiscon},
all bits in this field will also be the same value as \textbf{updiscon}. \\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Packet format 1 - no address, branch map}
\label{tab:te_inst1-noaddr-map}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 01 (diff-delta): includes branch information and may include differential address\\
\hline
\textbf{branches} & 5 & Number of valid bits in \textbf{branch\_map}. The length of \textbf{branch\_map} is determined as follows: \newline
0: 31 bits, no \textbf{address} in packet \newline
1-31: (cannot occur for this format) \\
\hline
\textbf{branch\_map} & 31 &
An array of bits indicating whether branches are taken or not.\newline
Bit 0 represents the oldest branch instruction executed. For each bit: \newline
0: branch taken \newline
1: branch not taken \\
\hline
\end{tabulary}
\end{table}
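As an illustrative sketch (not part of the packet definition), the uncompressed width of a format 1 packet that includes an address can be estimated by summing the field widths in table~\ref{tab:te_inst1-addr-map}. The symbols are introduced here only for this example: $\ell$ is the length of \textbf{branch\_map} selected by \textbf{branches}, $a$ is \textit{iaddress\_width\_p - iaddress\_lsb\_p}, and $d$ is the width of \textbf{irdepth}.
\[
W = 2 + 5 + \ell + a + 1 + 1 + 1 + d
\]
The parameter values below are assumptions chosen purely for illustration: \textit{iaddress\_width\_p} = 32, \textit{iaddress\_lsb\_p} = 1, \textit{return\_stack\_size\_p} = 3 and \textit{call\_counter\_size\_p} = 0. These give $a = 31$ and $d = 3 + 1 + 0 = 4$. With \textbf{branches} = 12 (so $\ell = 15$):
\[
W = 2 + 5 + 15 + 31 + 1 + 1 + 1 + 4 = 60 \mbox{ bits}
\]
This should be read as an upper bound for these assumed parameters. Because \textbf{notify}, \textbf{updiscon}, \textbf{irreport} and \textbf{irdepth} are each defined relative to the preceding bit, the upper bits of a packet will frequently all take the same value, so the transmitted packet is likely to be shorter wherever such repeated bits can be removed.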
\subsection{Format 1 \textbf{updiscon} field}
See section~\ref{sec:updiscon}.
\subsection{Format 1 \textbf{branch\_map} field}
When the branch map becomes full it must be reported, but in most cases there is no need to report an address.
This is indicated by setting \textbf{branches} to 0. The exception to this is when the instruction immediately prior to
the final branch causes an uninferable discontinuity, in which case \textbf{branches} is set to 31.
The choice of sizes (1, 3, 7, 15, 31) is designed to minimize efficiency loss. On average there will be some `wasted' bits
because the number of branches to report is less than the selected size of the \textbf{branch\_map} field.
Using a tapered set of sizes means that the number of wasted bits will on average be less for shorter packets.
If the number of branches between updiscons is randomly distributed then the probability of generating packets with large
branch counts will be lower, in which case increased waste for longer packets will have less overall impact.
Furthermore, the rate at which packets are generated can be higher for lower branch counts, and so reducing
waste for this case will improve overall bandwidth at times where it is most important.
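The size selection can also be summarised in closed form. As a sketch, let $b$ be the value of \textbf{branches} with $1 \le b \le 31$, and let $\ell(b)$ be the resulting length of \textbf{branch\_map}; both symbols are introduced here for illustration only, and the special case $b = 0$ is excluded.
\[
\ell(b) = 2^{\lceil \log_2 (b+1) \rceil} - 1, \qquad \mathrm{waste}(b) = \ell(b) - b
\]
For example, $b = 12$ gives $\ell = 2^{4} - 1 = 15$ with 3 wasted bits, whereas $b = 16$ gives $\ell = 2^{5} - 1 = 31$ with 15 wasted bits. The worst-case waste for the sizes 1, 3, 7, 15 and 31 is 0, 1, 3, 7 and 15 bits respectively (at $b$ = 1, 2, 4, 8 and 16), which is consistent with the observation that shorter packets waste fewer bits.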
\subsection{Format 1 \textbf{irreport} and \textbf{irdepth} fields}
See section~\ref{sec:irxx}.
\FloatBarrier
\section{Format 0 packets} \label{sec:format0}
This format is intended for optional efficiency extensions. Currently two extensions are defined, for reporting counts of
correctly predicted branches, and for reporting the jump target cache index.
If branch prediction is supported and is enabled, then there is a choice of whether to output a
full branch map (via format 1), or a count of correctly predicted branches.
The count format is used if the number of correctly predicted branches is at least 31. If there are 31 unreported
branches (i.e. the branch map is full), but not all of them were predicted correctly, then the branch map will be output.
A branch count will be output under the following conditions:
\begin{itemize}
\item A branch is mis-predicted. The count value will be the number of correctly predicted branches,
minus 31. No address information is provided - it is implicitly that of the branch which failed
prediction;
\item An updiscon, interrupt or exception requires the encoder to output an address. In this case
the encoder will output the branch count (number of correctly predicted branches, minus 31);
\item The branch count reaches its maximum value. Strictly speaking an address isn't required for this case,
but is included to avoid having to distinguish the packet format from the case above. It will occur so rarely
that the bandwidth impact can be ignored.
\end{itemize}
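The relationship between the reported count and the number of branches can be written out explicitly. In this sketch, $N$ is the number of correctly predicted branches covered by the packet and $c$ is the value placed in \textbf{branch\_count}; both symbols are introduced here for illustration only.
\[
c = N - 31, \qquad N = c + 31, \qquad N \ge 31
\]
For example, if 100 branches were predicted correctly, then $c = 69$, and a decoder recovers $N = 69 + 31 = 100$. Since \textbf{branch\_count} is a 32-bit field, its maximum value 0xffff\_ffff corresponds to $N = 2^{32} + 30$.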
If a jump target cache is supported and enabled, and the address to report following an updiscon is
in the cache, then the encoder can output the cache index using format 0, subformat 1.
However, the encoder may still choose to output the differential address using format 1 or 2 if the
resulting packet is shorter. This may occur if the differential address is zero, or very small.
\begin{table}[htp]
\centering
\caption{Packet format 0, subformat 0 - no address, branch count}
\label{tab:te_inst0-0-noaddr-count}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 00 (opt-ext): formats for optional efficiency extensions\\
\hline
\textbf{subformat} & See section~\ref{sec:f0s} & 0 (correctly predicted branches)\\
\hline
\textbf{branch\_count} & 32 & Count of the number of correctly predicted branches, minus 31. \\
\hline
\textbf{branch\_fmt} & 2 & 00 (no-addr): Packet does not contain an \textbf{address}, and the branch following the
last correct prediction failed. \newline
01-11: (cannot occur for this format) \\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Packet format 0, subformat 0 - address, branch count}
\label{tab:te_inst0-0-addr-count}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 00 (opt-ext): formats for optional efficiency extensions\\
\hline
\textbf{subformat} & See section~\ref{sec:f0s} & 0 (correctly predicted branches)\\
\hline
\textbf{branch\_count} & 32 & Count of the number of correctly predicted branches, minus 31. \\
\hline
\textbf{branch\_fmt} & 2 & 10 (addr): Packet contains an \textbf{address}. If this points to
a branch instruction, then the branch was predicted correctly. \newline
11 (addr-fail): Packet contains an \textbf{address} that points to a branch which failed the prediction. \newline
00,01: (cannot occur for this format) \\
\hline
\textbf{address} & \textit {iaddress\_width\_p - iaddress\_lsb\_p} &
Differential instruction address.\\
\hline
\textbf{notify} & 1 &
If the value of this bit is different from the MSB of \textbf{address}, it indicates that this
packet is reporting an instruction that is not the target of an uninferable discontinuity
because a notification was requested via \textbf{trigger[2]} (see section~\ref{sec:trigger}). \\
\hline
\textbf{updiscon} & 1 &
If the value of this bit is different from \textbf{notify}, it indicates that this
packet is reporting the instruction following an uninferable discontinuity and is also the
instruction before an exception, privilege change or resync
(i.e. it will be followed immediately by a format 3 \textit{te\_inst}).\\
\hline
\textbf{irreport} & 1 &
If the value of this bit is different from \textbf{updiscon}, it indicates that this packet is
reporting an instruction that is either: \newline
following a return because its address differs from the predicted return address at the top of
the implicit\_return return address stack, or \newline
the last retired before an exception, interrupt, privilege change or resync because it is necessary to report
the current address stack depth or nested call count. \\
\hline
\textbf{irdepth} & \textit {return\_stack\_size\_p + (return\_stack\_size\_p > 0 ? 1 : 0) + call\_counter\_size\_p} &
If the value of \textbf{irreport} is different from \textbf{updiscon}, this field
indicates the number of entries on the return address stack (i.e. the entry number of the return that
failed) or nested call count. If \textbf{irreport} is the same value as \textbf{updiscon},
all bits in this field will also be the same value as \textbf{updiscon}. \\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Packet format 0, subformat 1 - jump target index, branch map}
\label{tab:te_inst0-1-cache-map}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{90mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 00 (opt-ext): formats for optional efficiency extensions\\
\hline
\textbf{subformat} & See section~\ref{sec:f0s} & 1 (jump target cache)\\
\hline
\textbf{index} & \textit{cache\_size\_p} &
Jump target cache index of entry containing target address.\\
\hline
\textbf{branches} & 5 & Number of valid bits in \textbf{branch\_map}. The length of \textbf{branch\_map} is determined as follows: \newline
0: (cannot occur for this format) \newline
1: 1 bit \newline
2-3: 3 bits \newline
4-7: 7 bits \newline
8-15: 15 bits \newline
16-31: 31 bits \newline
For example if branches = 12, \textbf{branch\_map} is 15 bits long, and the 12 LSBs are valid. \\
\hline
\textbf{branch\_map} & Determined by \newline
\textbf{branches} field. &
An array of bits indicating whether branches are taken or not.\newline
Bit 0 represents the oldest branch instruction executed. For each bit: \newline
0: branch taken \newline
1: branch not taken \\
\hline
\textbf{irreport} & 1 &
If the value of this bit is different from \textbf{branch\_map[MSB]}, it indicates that this packet is
reporting an instruction that is either: \newline
following a return because its address differs from the predicted return address at the top of
the implicit\_return return address stack, or \newline
the last retired before an exception, interrupt, privilege change or resync because it is necessary to report
the current address stack depth or nested call count. \\
\hline
\textbf{irdepth} & \textit {return\_stack\_size\_p + (return\_stack\_size\_p > 0 ? 1 : 0) + call\_counter\_size\_p} &
If the value of \textbf{irreport} is different from \textbf{branch\_map[MSB]}, this field
indicates the number of entries on the return address stack (i.e. the entry number of the return that
failed) or nested call count. If \textbf{irreport} is the same value as \textbf{branch\_map[MSB]},
all bits in this field will also be the same value as \textbf{branch\_map[MSB]}. \\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Packet format 0, subformat 1 - jump target index, no branch map}
\label{tab:te_inst0-1-cache-nomap}
\begin{tabulary}{\textwidth}{|l|p{35mm}|p{80mm}|}
\hline
{\bf Field name} & {\bf Bits} & {\bf Description} \\
\hline
\textbf{format} & 2 & 00 (opt-ext): formats for optional efficiency extensions\\
\hline
\textbf{subformat} & See section~\ref{sec:f0s} & 1 (jump target cache)\\
\hline
\textbf{index} & \textit{cache\_size\_p} &
Jump target cache index of entry containing target address.\\
\hline
\textbf{branches} & 5 & Number of valid bits in \textbf{branch\_map}. The length of \textbf{branch\_map} is determined as follows: \newline
0: no \textbf{branch\_map} in packet \newline
1-31: (cannot occur for this format) \\
\hline
\textbf{irreport} & 1 &
If the value of this bit is different from \textbf{branches[MSB]}, it indicates that this packet is
reporting an instruction that is either: \newline
following a return because its address differs from the predicted return address at the top of
the implicit\_return return address stack, or \newline
the last retired before an exception, interrupt, privilege change or resync because it is necessary to report
the current address stack depth or nested call count. \\
\hline
\textbf{irdepth} & \textit {return\_stack\_size\_p + (return\_stack\_size\_p > 0 ? 1 : 0) + call\_counter\_size\_p} &
If the value of \textbf{irreport} is different from \textbf{branches[MSB]}, this field
indicates the number of entries on the return address stack (i.e. the entry number of the return that
failed) or nested call count. If \textbf{irreport} is the same value as \textbf{branches[MSB]},
all bits in this field will also be the same value as \textbf{branches[MSB]}. \\
\hline
\end{tabulary}
\end{table}
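As a rough sketch of how the two subformat 1 layouts compare in size, the field widths in tables~\ref{tab:te_inst0-1-cache-map} and~\ref{tab:te_inst0-1-cache-nomap} can be summed. The symbols are introduced here only for illustration: $s$ is the width of \textbf{subformat}, $c$ is \textit{cache\_size\_p}, $d$ is the width of \textbf{irdepth}, and $\ell$ is the length of \textbf{branch\_map}.
\[
W_{\mathrm{nomap}} = 2 + s + c + 5 + 1 + d, \qquad W_{\mathrm{map}} = W_{\mathrm{nomap}} + \ell
\]
Assuming, purely as an example, $s = 1$, $c = 7$ and $d = 4$, this gives $W_{\mathrm{nomap}} = 2 + 1 + 7 + 5 + 1 + 4 = 20$ bits, and $W_{\mathrm{map}} = 20 + 15 = 35$ bits when \textbf{branches} = 12. These are uncompressed widths under the assumed parameters. A differential address that is zero or very small may still yield a shorter format 1 or format 2 packet, which is why the encoder is allowed to choose between them.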
\subsection{Format 0 subformat field} \label{sec:f0s}
The width of this field depends on the number of optional formats supported. Currently, two optional formats are
defined (correctly predicted branches and jump target cache). The width is specified by the
\textit{f0s\_width} discovery field (see section~\ref{sec:disco}). If multiple optional formats are supported, the field
width must be non-zero. However, if only one optional format is supported, the field can be
omitted, and the value of the field inferred from the \textbf{options} field in the support packet (see section~\ref{sec:format33}).
This provision allows additional formats to be added in future without reducing the efficiency of the existing formats.
\subsection{Format 0 \textbf{branch\_fmt} field}
This is encoded so that when no address is required it will be zero, allowing the upper bits of the \textbf{branch\_count}
field to be compressed away.
When a branch count is reported without an address, it is because a branch has failed the prediction. However, when an address is
reported along with a branch count, it is because the packet was initiated by an uninferable discontinuity or an exception, or
because a branch has been encountered when \textbf{branch\_count} is 0xffff\_ffff. In the last case, the
reported address will always be that of a branch; in the other cases it may be. If it is a branch,
it is necessary to be explicit about whether the prediction was met. If it was met, then the reported address is
that of the last correctly predicted branch.
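The encodings given in the subformat 0 tables can be collected into a single summary. This is only a restatement of those tables for convenience; the value 01 is not assigned by either table.
\[
\mbox{\textbf{branch\_fmt}} = \left\{
\begin{array}{ll}
00 & \mbox{no \textbf{address}; the branch after the last correct prediction failed} \\
10 & \mbox{\textbf{address} present; if it points to a branch, that branch was predicted correctly} \\
11 & \mbox{\textbf{address} present and points to a branch that failed the prediction}
\end{array}
\right.
\]
As a worked example of the compression benefit, and assuming the fields are packed in table order with later fields in the more significant positions, a packet with \textbf{branch\_count} = 5 and \textbf{branch\_fmt} = 00 has only zeros above bit 2 of \textbf{branch\_count}, so the whole upper part of the packet is a run of identical bits that can be compressed away.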
\subsection{Format 0 \textbf{irreport} and \textbf{irdepth} fields}
These bits are encoded so that most of the time they will take the same value as the immediately preceding bit
(\textbf{updiscon}, \textbf{branch\_map[MSB]} or \textbf{branches[MSB]} depending on the specific packet format).
Their purpose and behavior are as described in section~\ref{sec:irxx}.
For the jump target cache (subformat 1), they are included to allow return addresses that fail the implicit return
prediction but which reside in the jump target cache to be reported using this format. An implementation
could omit these if all implicit return failures are reported using format 1.