forked from clMathLibraries/clBLAS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGELOG
245 lines (211 loc) · 10.1 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################
clBLAS Readme
Version: 1.10
Release Date: April 2013
ChangeLog:
____________
Current Version:
New:
* New Level 1 routines added (an 'x' implies all 4 precisions)
xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT,
CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG,
SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2,
SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM
* Samples have been added for the new functions
* This release tested using the 9.012 runtime driver and the 2.8 APPSDK
Fixed:
* Failures in *trsm functions with clMAGMA tests
Known Issues:
* Failures & hangs in ztrmm, *trsv, *tpsv functions on Southern Island GPU devices
* Failures in zgemm functions on Northern Island GPU devices
* Failures & hangs are expected to be fixed in the upcoming AMD graphics driver versions.
It is strongly recommended that users keep their graphics driver versions up to date.
____________
Version 1.8.291:
Fixed:
* Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
ctrsv, csymm, cher2, ztrmm on Southern Island GPU devices.
* Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk,
dsyr2k, zsyr2k on Trinity platforms.
Known Issues:
* Failures in *trsm functions with clMAGMA tests
____________
Version 1.8.269 (Beta, clMAGMA support):
New:
* No new routines
* This release tested using the 8.961 runtime driver and the 2.6 APPSDK
Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If
this happens, abort execution of the tune program; it is not required
for correct operation of the BLAS routines (as of 8.872).
* clBLAS can return invalid results on CPU devices (as
of 8.961). The CPU device is primarily a test/debug device, and GPU
devices are unaffected.
* clBLAS can return invalid results for double precision functions (dsyr,
dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
8.961).
* clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
ctrsv, csymm, cher2, ztrmm) on Southern Island GPU devices (as of 8.961).
____________
Version 1.7 (Beta, clMAGMA support):
New:
* New Level 3 routines added (an 'x' implies all 4 precisions)
CHER2K, ZHER2K
* New Level 2 routines added (an 'x' implies all 4 precisions)
xTPMV, xTPSV, SSPVM, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR,
SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV,
xTBMV, xTBSV
* Samples have been added for the new functions, but are not fully tested
* This release tested using the 8.951 runtime driver and the 2.6 APPSDK
* Note that documentation is incomplete for the new functions
Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If
this happens, abort execution of the tune program; it is not required
for correct operation of the BLAS routines (as of 8.872).
* clBLAS can return invalid results on CPU devices that support AVX (as
of 8.951). CPU devices that support up to SSE3 are unaffected. The
CPU device is primarily a test/debug device, and GPU devices are
unaffected.
* clBLAS can return invalid results for double precision functions (dsyr,
dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
8.951).
* clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk,
ssyr2k, ztrmm) on Southern Island GPU devices (as of 8.951).
____________
Version 1.6:
New:
* New Level 3 routines added (an 'x' implies all 4 precisions)
CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM
* New Level 2 routines added (an 'x' implies all 4 precisions)
CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU,
CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2
* For all the original functions prior to 1.6, a new API has been introduced
with an *Ex suffix. These extended API's add new parameters that allow
users to specify an offset to a matrix argument. This allows efficient
sub-matrix indexing within a clBLAS routine without requiring expensive
sub-matrix copy operations.
* Samples have been added for the new functions
* Preview: Support for AMD Radeon™ HD7000 series GPUs
* This release tested using the 8.92 runtime driver and the 2.6 APP SDK
Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
happens, abort execution of the tune program; it is not required for
correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
8.872). The CPU device is primarily a test/debug device, and GPU
devices are unaffected.
____________
Version 1.4:
New:
* New Level 3 routines added
SSYRK, DSYRK, SSYR2K, DSYR2K
* New Level 2 routines added
SGEMV, DGEMV, SSYMV, DSYMV
* The image support functions (clblasAddScratchImage,
clblasRemoveScratchImage) have been deprecated. Images are no
longer required for the highest performance.
* InstallShield is now used for APPML libraries. The default install
location has changed from c:\amd\clBLAS to
C:\Program Files (x86)\AMD\clBLAS. It is recommended that previous
versions of clBLAS are uninstalled first.
* Samples have been added for the new functions
* This release tested using the 8.872 runtime driver and the 2.5 APP SDK
Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
happens, abort execution of the tune program; it is not required for
correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
8.872). The CPU device is primarily a test/debug device, and GPU
devices are unaffected.
____________
Version 1.2:
* The library now supports both 32- and 64-bit Windows and Linux operating
systems.
* xTRSM routines are available in 1.2.
* clBLAS routines return clBLASStatus error codes, instead of native
OpenCL error codes
Fixed:
* xTRMM routines were not properly handling implicit unit diagonal
elements and implicit off-diagonal zero values specified by the BLAS
parameters SIDE, UPLO and DIAG.
* Possible crash with CPU device on 32-bit systems.
* clblasDgemm routine return an invalid event as its last argument.
* clBLAS routines return clblasStatus error codes, instead of
native OpenCL error codes.
Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
happens, abort execution of the tune program; it is not required for
correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
8.872). The CPU device is primarily a test/debug device, and GPU
devices are unaffected.
____________________
Version 1.0:
* Initial release
Known Issues:
* Available only on Linux64.
* xTRMM routines were not properly handling implicit unit diagonal elements
and implicit off-diagonal zero values specified by the BLAS parameters
SIDE, UPLO and DIAG
* clblasDgemm returned an invalid event as its last argument
_____________
Building the Samples:
To install the Linux versions of clBLAS, uncompress the initial download, then
execute the install script.
For example:
tar -xf clBLAS-${version}-Linux.tar.gz
- This installs three files into the local directory, one being an
executable bash script.
sudo mkdir /opt/clBLAS-${version}
- This pre-creates the install directory with proper permissions
in /opt if it is to be installed there. (This is the default.)
./install-clBLAS-${version}.sh
- This prints an EULA and uncompresses files into the chosen install
directory.
cd ${installDir}/bin64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir}
- Be sure to export library dependencies to resolve all external
linkages to the client program; you can create a bash script to
help automate this procedure.
./example_sgemm
- Run a simple client; one example is provided for each supported
main BLAS function family.
The sample program does not ship with native build files; instead, a CMake
file is shipped, and the user generates a native build file for their system.
For example:
cd ${installDir}
mkdir samplesBin/
- This creates a sister directory to the samples directory that
houses the native makefiles and the generated files from the
build.
cd samplesBin/
ccmake ../samples/
- ccmake is a curses-based cmake program; it takes a parameter
that specifies the location of the source code to compile.
- Hit 'c' to configure for the platform; ensure that the
dependencies to external libraries are satisfied, including
paths to 'ATI Stream SDK'.
- After dependencies are satisfied, hit 'c' again to finalize
configuration. Then, hit 'g' to generate a makefile and
exit ccmake.
make help
- Look at the options available for make.
make
- Build the sample client program.
./example_sgemm
- Run a simple client; one example is provided for each supported main
BLAS function family.