Update blis to 1.0 #6

Open
wants to merge 749 commits into
base: master
1fc23d2
Safelist 'master', 'dev', 'amd' branches.
fgvanzee Sep 21, 2021
c52c431
Merge branch 'dev'
fgvanzee Sep 26, 2021
89aaf00
Updates to FAQ.md, Sandboxes.md, and README.md.
fgvanzee Sep 28, 2021
3442d40
More minor fixes to FAQ.md and Sandboxes.md.
fgvanzee Sep 28, 2021
b36fb0f
Fixed newly broken link to CREDITS in FAQ.md.
fgvanzee Sep 28, 2021
5013a6c
More edits and fixes to docs/FAQ.md.
fgvanzee Sep 29, 2021
ae0eeea
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
devinamatthews Sep 29, 2021
13dbd5b
Apply patch from @xrq-phys.
devinamatthews Oct 2, 2021
0a45bc0
Merge pull request #552 from flame/armsve_beta_0
devinamatthews Oct 2, 2021
abc6483
Armv8 Fix 6x8 Row-Maj Ukr
xrq-phys Oct 3, 2021
f5c03e9
Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
xrq-phys Oct 3, 2021
91408d1
Use @path-based install name on MacOS and use relocatable RPATH entri…
devinamatthews Oct 4, 2021
d0a0b4b
Arm micro-architecture dispatch (#344)
loveshack Oct 4, 2021
c4a3168
Fix $ORIGIN usage on linux.
devinamatthews Oct 4, 2021
64a421f
Add an option to control whether or not to use @rpath.
devinamatthews Oct 4, 2021
80c5366
Move unused ARM SVE kernels to "old" directory.
devinamatthews Oct 4, 2021
53377fc
Merge pull request #554 from flame/armsve-cleanup
devinamatthews Oct 4, 2021
6d3036e
Merge pull request #545 from hominhquan/clean_error
devinamatthews Oct 4, 2021
9905f44
Merge pull request #553 from flame/rpath-fix
devinamatthews Oct 4, 2021
079fbd4
Merge branch 'master' into arm64-hi-bw
devinamatthews Oct 4, 2021
40baf83
Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
xrq-phys Oct 5, 2021
4bfadf9
Firestorm Block Size Fixes
xrq-phys Oct 5, 2021
353a0d8
Update .appveyor.yml
devinamatthews Oct 5, 2021
c302499
Fix data race in testsuite.
devinamatthews Oct 5, 2021
34919de
Make error checking level a thread-local variable.
devinamatthews Oct 2, 2021
b9da6d5
Armv8 GEMMSUP Edge Cases Require Signed Ints
xrq-phys Oct 6, 2021
a024715
Firestorm CPUID Dispatcher
xrq-phys Oct 6, 2021
14b1358
Add test for Apple M1 (firestorm)
devinamatthews Oct 6, 2021
2920dde
Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
xrq-phys Oct 6, 2021
d7a3372
Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
xrq-phys Oct 6, 2021
a4066f2
Register firestorm into arm64 Metaconfig
xrq-phys Oct 6, 2021
1e32003
Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
xrq-phys Oct 6, 2021
2604f40
Config ArmSVE Unregister 12xk. Move 12xk to Old
xrq-phys Oct 6, 2021
70b52ca
Enable testing 1m in `make check`.
devinamatthews Oct 7, 2021
f44149f
Armv8 Trash New Bulk Kernels
xrq-phys Oct 7, 2021
2329d99
Update Travis CI badge
devinamatthews Oct 7, 2021
4277fec
Merge pull request #533 from xrq-phys/arm64-hi-bw
devinamatthews Oct 7, 2021
49b9d79
Arm SVE Add ZGEMM 2Vx8 Unindexed
xrq-phys Sep 13, 2021
e13abde
Arm SVE Add ZGEMM 2Vx7 Unindexed
xrq-phys Sep 14, 2021
c19db2f
Arm SVE Add ZGEMM 2Vx10 Unindexed
xrq-phys Sep 15, 2021
3f68e83
Arm SVE ZGEMM Support Gather Load / Scatt. St.
xrq-phys Sep 15, 2021
b677e0d
Arm SVE Add SGEMM 2Vx10 Unindexed
xrq-phys Sep 15, 2021
e4cabb9
Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
xrq-phys Sep 15, 2021
f7c6c2b
A64FX Config Use ZGEMM/CGEMM
xrq-phys Sep 15, 2021
9e1e781
Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
xrq-phys Sep 19, 2021
66a018e
Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
xrq-phys Sep 19, 2021
f76ea90
Arm SVE: Update Perf. Graph
xrq-phys Sep 21, 2021
4b648e4
Arm SVE Config armsve Use ZGEMM/CGEMM
xrq-phys Sep 22, 2021
1749dfa
Arm SVE C/ZGEMM Support *beta==0
xrq-phys Oct 8, 2021
82b6128
SH Kernel Unused Eigher
xrq-phys Oct 8, 2021
ccf1628
Arm SVE C/ZGEMM Fix FMOV 0 Mistake
xrq-phys Oct 8, 2021
408906f
Merge pull request #542 from xrq-phys/armsve-zgemm
devinamatthews Oct 9, 2021
32a6d93
Merge pull request #543 from xrq-phys/armsve-packm-fix
devinamatthews Oct 9, 2021
327481a
Fix insufficient pool-growing logic in bli_pool.c. (#559)
hominhquan Oct 12, 2021
81e1034
Alloc at least 1 elem in pool_t block_ptrs. (#560)
hominhquan Oct 13, 2021
e9da642
Allow use of 1m with mixing of row/col-pref ukrs.
fgvanzee Oct 13, 2021
514fd10
Fixed substitution bug in configure.
fgvanzee Oct 14, 2021
290ff4b
Disable SDE testing of old AMD microarchitectures.
fgvanzee Oct 14, 2021
e8caf20
Updated do_sde.sh to get SDE from GitHub.
fgvanzee Oct 18, 2021
f065a80
Removed support for 3m, 4m induced methods.
fgvanzee Oct 28, 2021
cfa3db3
Fixed bug in mixed-dt gemm introduced in e9da642.
fgvanzee Nov 3, 2021
28b0982
Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
devinamatthews Nov 10, 2021
7bc8ab4
Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
Meghana-vankadari Nov 11, 2021
7bde468
Added support for addons.
fgvanzee Nov 13, 2021
78cd1b0
Added 'Example Code' section to README.md.
fgvanzee Nov 16, 2021
cbc88fe
Marked some markdown shell code blocks as 'bash'.
fgvanzee Nov 16, 2021
74c0c62
Reverted cbc88fe.
fgvanzee Nov 16, 2021
26e4b6b
Added support for AMD's Zen3 microarchitecture.
dzambare Nov 17, 2021
9be97c1
Support all four dts in test/test_her[2][k].c (#578)
madanm3 Nov 17, 2021
b727645
Merge branch 'dev'
fgvanzee Nov 19, 2021
a4bc03b
Brief mention/link to Addons.md in README.md.
fgvanzee Nov 19, 2021
12c66a4
Minor updates to README.md, docs/Addons.md.
fgvanzee Nov 19, 2021
e229e04
Added recu-sed.sh script to 'build' directory.
fgvanzee Dec 1, 2021
cf7d616
Enable user-customized packm ukernel/variant. (#549)
devinamatthews Dec 2, 2021
961d9d5
Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
kvaragan Dec 7, 2021
54fa28b
Move edge cases to gemm ukr; more user-custom mods. (#583)
devinamatthews Dec 24, 2021
08174a2
Evict <arm_sve.h> Requirement for SVE GEMM
xrq-phys Jan 1, 2022
466b68a
Add unique tag to branch labels for Apple ARM64.
devinamatthews Jan 2, 2022
864bfab
CREDITS file update.
fgvanzee Jan 4, 2022
3f2440b
Added m, n dims to gemmd/gemmlike ukernel calls.
fgvanzee Jan 6, 2022
268ce1f
Relax alignment constraints
devinamatthews Jan 10, 2022
81f93be
Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused)
devinamatthews Jan 10, 2022
0ab20c0
the Apple local label thing is required by Clang in general
jeffhammond Jan 13, 2022
0be9282
Updated zen3 macro constant names.
fgvanzee Jan 26, 2022
35195bb
Add armclang detection to configure.
devinamatthews Jan 31, 2022
b5df181
Armv8a, ArmSVE: Simplify Gen-C
xrq-phys Feb 2, 2022
9cc897f
Fix SVE Compil.
xrq-phys Feb 3, 2022
72089bb
ArmSVE Use Predicate in M-Direction
xrq-phys Feb 5, 2022
2f3872e
ArmSVE Adopts Label Wrapper
xrq-phys Feb 7, 2022
2674291
Update CC_VENDOR logic
devinamatthews Feb 13, 2022
5a4d3f5
Use -flat_namespace option to link on macOS
devinamatthews Feb 13, 2022
2506159
Don't use `-Wl,-flat-namespace`.
devinamatthews Feb 14, 2022
ee9ff98
Move edge cases to gemmtrsm ukrs; doc updates.
fgvanzee Feb 15, 2022
c9700f3
Renamed SIMD-related macro constants for clarity.
fgvanzee Feb 15, 2022
4d83523
Add armsve to arm64 Metaconfig (#614)
xrq-phys Feb 22, 2022
d514658
ArmSVE Ensure Non-zero Block Size (#615)
xrq-phys Feb 22, 2022
84732bf
Revamp how tools are handled/checked by configure.
fgvanzee Feb 28, 2022
71851a0
Fixed level-3 performance bug in haswell ukernels.
fgvanzee Mar 8, 2022
cad1041
POWER10: edge cases in microkernel (#620)
ivan23kor Mar 10, 2022
7c07b47
Avoid gemmsup barriers when not packing A or B. (#622)
fgvanzee Mar 11, 2022
f1dbb0e
Trival whitespace change; commit log addendum.
fgvanzee Mar 11, 2022
d681000
Update Multithreading.md
devinamatthews Mar 14, 2022
0db2bd5
Added BLAS/CBLAS APIs for gemm3m. (#590)
BhaskarNallani Mar 24, 2022
1ec020b
AMD kernel updates; frame-specific AMD updates. (#597)
dzambare Mar 29, 2022
cf06364
Fixed typo in BLAS gemm3m call to _check().
fgvanzee Mar 29, 2022
bee7678
CREDITS file update.
fgvanzee Mar 31, 2022
99bb900
ReleaseNotes.md update in advance of next version.
fgvanzee Apr 1, 2022
14c86f6
Version file update (0.9.0)
fgvanzee Apr 1, 2022
88cab83
CHANGELOG update (0.9.0)
fgvanzee Apr 1, 2022
69fa915
Fixed broken "tagged releases" link in README.md.
fgvanzee Apr 1, 2022
b3e674d
README.md update to link to releases page.
fgvanzee Apr 4, 2022
ae10d94
Simplify and rewrite reference packm kernels. (#610)
devinamatthews Apr 7, 2022
9fea633
Partial addition of 'const' to all interfaces above the (micro)kernel…
devinamatthews Apr 13, 2022
6431c9e
Added missing 'const' to zen bli_gemm_small.c.
fgvanzee Apr 14, 2022
1c73340
Fix version check for znver3, which needs gcc >= 10.3 (#628)
jedbrown Apr 28, 2022
64a9b06
Fixed misspelling of 'xpbys' in gemm macrokernel.
fgvanzee May 10, 2022
4603324
Init/finalize via bli_pthread_switch_t API (#634).
fgvanzee May 19, 2022
5677289
Added SMU citation to README.md intro.
fgvanzee Jun 1, 2022
d93df02
Removed unused dt arg in bli_gks_query_ind_cntx().
fgvanzee Jun 15, 2022
d429b6b
Support clang targetting MinGW (#639)
isuruf Jun 28, 2022
667f201
Fixed type bug in bli_cntx_set_ukr_prefs().
fgvanzee Jul 7, 2022
7cba7ce
Minor cleanups, comment updates to bli_gks.c.
fgvanzee Jul 8, 2022
ffde54c
Minor changes to .gitignore and LICENSE files. (#642)
jdiamondGitHub Jul 11, 2022
98d4678
Change complex_return='intel' for ifx. (#637)
bartoldeman Jul 11, 2022
9b1beec
Use BLIS_ENABLE_COMPLEX_RETURN_INTEL in blastest files (#636)
bartoldeman Jul 12, 2022
cc260fd
Allow uniform max problem sizes in test/3/runme.sh.
fgvanzee Jul 13, 2022
17b0caa
Fixed out-of-bounds read in haswell gemmsup kernels.
fgvanzee Jul 14, 2022
af3a41e
Add autodetection for POWER7, POWER9 & POWER10 (#647)
Flamefire Jul 21, 2022
6826c1c
Add `#line` directives to flattened `blis.h`. (#643)
devinamatthews Jul 25, 2022
4dde947
Fixed out-of-bounds bug in sup s6x16m haswell kernel.
fgvanzee Jul 26, 2022
56de31b
Disable modification of KC in the gemmsup kernels. (#648)
devinamatthews Jul 27, 2022
5b29893
Removed buggy cruft from power10 subconfig.
fgvanzee Jul 28, 2022
a48e29d
CREDITS file update.
fgvanzee Jul 28, 2022
bbaf29a
Very minor variable updates to common.mk.
fgvanzee Aug 4, 2022
775148b
Updated ARMv8a kernels to fix 2 prefetching issues. (#649)
jdiamondGitHub Aug 5, 2022
9e5594a
Temporarily disabled #line directives from 6826c1c.
fgvanzee Aug 11, 2022
dfa5413
Arm64 dgemmsup with extended MR&NR (#655)
xrq-phys Aug 30, 2022
a87eae2
Added '-q' quiet mode option to testsuite. (#657)
fgvanzee Sep 6, 2022
4afe0cf
Defined invscalv, invscalm, invscald operations. (#661)
fgvanzee Sep 8, 2022
6e5431e
Fix line number issue in flattened blis.h. (#660)
devinamatthews Sep 10, 2022
cb74202
Fixed incorrect sizeof(type) in edge case macros. (#662)
fgvanzee Sep 13, 2022
fd885cf
Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
fgvanzee Sep 13, 2022
05a811e
Initialize rntm_t nt/ways fields with 1 (not -1). (#663)
fgvanzee Sep 14, 2022
63177dc
Fixed gemmlike sandbox bug introduced in 7c07b47.
fgvanzee Sep 15, 2022
e86076b
Test the 'gemmlike' sandbox via AppVeyor. (#664)
fgvanzee Sep 15, 2022
fb91337
Fixed a harmless pc_nt bug in 05a811e.
fgvanzee Sep 16, 2022
89df7b8
De-templatized _sup_var1n2m.c; unified _sup_packm_a/b(). (#659)
devinamatthews Sep 18, 2022
a1a5a9b
Implemented support for fat multithreading. (#665)
fgvanzee Sep 21, 2022
036a4f9
Refactored some rntm_t management code. (#666)
fgvanzee Sep 22, 2022
ee81efc
Parameterized test/3 drivers via command line args. (#667)
fgvanzee Sep 23, 2022
b861c71
Add consistent NaN/Inf handling in sumsqv. (#668)
devinamatthews Sep 23, 2022
42d0e66
Add AddressSanitizer (-fsanitize=address) option. (#669)
devinamatthews Sep 29, 2022
63470b4
Fix some bugs in bli_pool.c (#670)
devinamatthews Sep 29, 2022
76a23bd
Reinstate sanity check in bli_pool_finalize. (#671)
devinamatthews Oct 3, 2022
9453e0f
CREDITS file update.
fgvanzee Oct 4, 2022
23f5b8d
Shuffled checked properties in bli_l3_check.c. (#676)
fgvanzee Oct 18, 2022
88105db
Added Discord documentation (#677)
fgvanzee Oct 21, 2022
2dd692b
Fix auto-detection of firestorm (Apple M1).
devinamatthews Oct 26, 2022
c803b03
Add check to disable armsve on Apple M1.
devinamatthews Oct 26, 2022
aeb5f0c
Omnibus PR - Oct 2022 (#678)
devinamatthews Oct 27, 2022
29f79f0
Fixed performance bug caused by redundant packing. (#680)
devinamatthews Oct 31, 2022
5eea6ad
Add mention of Wilkinson Prize to README.md. (#683)
fgvanzee Nov 2, 2022
edcc2f9
Support --nosup, --sup configure options. (#684)
fgvanzee Nov 3, 2022
872898d
Fixed trmm[3]/trsm performance bug in cf7d616. (#685)
fgvanzee Nov 3, 2022
6774bf0
Fix typo in configure --help text. (#686)
leekillough Nov 3, 2022
8d813f7
Some decluttering of the top-level directory.
fgvanzee Nov 4, 2022
713d078
Delete mpi_test garbage. (#689)
fgvanzee Nov 4, 2022
dc6e5f3
Enhance emacs formatting of C files to remove trailing whitespace and…
leekillough Nov 3, 2022
e1ea25d
Fixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)
fgvanzee Nov 11, 2022
2b05948
blis support for hpx (#682)
ct-clmsn Nov 13, 2022
f0337b7
Trival whitespace/comment tweaks.
fgvanzee Nov 14, 2022
db10dd8
Fixed _gemm_small() prototype; disabled gemm_small.
fgvanzee Nov 30, 2022
4833ba2
Fixed perf of mt sup with packing, and mt gemmlike. (#696)
fgvanzee Dec 13, 2022
3accacf
Skip 1m optimization when forcing hemm_l/symm_l. (#697)
fgvanzee Dec 16, 2022
7d23dc2
Fix a race condition which manifested as incorrect results (rarely). …
devinamatthews Dec 26, 2022
538150c
Applied race condition fix to sup thread decorator.
fgvanzee Dec 26, 2022
f956b79
Switch to l3 sup decorator in gemmlike sandbox. (#704)
fgvanzee Jan 1, 2023
b6735ca
Refactor structure awareness in packm_blk_var1.c. (#707)
devinamatthews Jan 6, 2023
2e1ba9d
Tile-level partitioning in jr/ir loops (ex-trsm). (#695)
fgvanzee Jan 11, 2023
d220f9c
Fix k = 0 edge case in power10 microkernels (#706)
nisanthmp Jan 11, 2023
cdb22b8
Disable power10 kernels other than sgemm, dgemm. (#705)
nisanthmp Jan 11, 2023
38d88d5
Define new global scalar (obj_t) constants. (#703)
devinamatthews Jan 11, 2023
b895ec9
Fixing type-mismatch errors in power10 sandbox (#701)
nisanthmp Jan 11, 2023
9a366b1
Implement cntx_t pointer caching in gks. (#709)
fgvanzee Jan 12, 2023
16d2e9e
Defined lt, lte, gt, gte + misc. other updates. (#712)
fgvanzee Jan 14, 2023
5793a77
Fixed mis-mapped instruction for VEXTRACTF64X2. (#713)
HarshDave12 Jan 17, 2023
c334ec2
Merge tlb- and slab/rr-specific gemm macrokernels. (#711)
devinamatthews Jan 18, 2023
ecbcf40
Use here-document for 'configure --help' output. (#714)
leekillough Jan 19, 2023
dc5d00a
Typecast printf() args to avoid compiler warnings. (#716)
leekillough Jan 27, 2023
e730c68
Define `BLIS_VERSION_STRING` in `blis.h`. (#720)
fgvanzee Feb 6, 2023
e3d352f
Added runtime selection of 'power' config family. (#718)
nisanthmp Feb 8, 2023
b1d3fc7
Redirect grep stderr to /dev/null. (#723)
fgvanzee Feb 10, 2023
0b421ef
Added an 'arm64' entry to `.travis.yml`. (#726)
fgvanzee Feb 18, 2023
059f151
Updated hpx namespace for make_count_shape. (#725)
ct-clmsn Feb 18, 2023
0ba6e9e
Refined emacs handling of indentation. (#717)
leekillough Feb 18, 2023
4e18cd3
Restored ArmSVE general storage case. (#708)
xrq-phys Feb 18, 2023
93c63d1
Use 'const' pointers in kernel APIs. (#722)
fgvanzee Feb 20, 2023
fab18dc
Use 'void*' datatypes in kernel APIs. (#727)
fgvanzee Feb 22, 2023
60f3634
Fixed bugs in scal2v ref kernel when alpha == 1. (#728)
fgvanzee Feb 23, 2023
72c37eb
Updated configure to pass all shellcheck checks. (#729)
leekillough Mar 23, 2023
5f84130
Omit -fPIC if shared library build is disabled. (#732)
fgvanzee Mar 25, 2023
04090df
Fixed compile errors with `BLIS_DISABLE_BLAS_DEFS`. (#730)
fgvanzee Mar 27, 2023
9d778e0
Move -fPIC insertion to subconfigs' make_defs.mk. (#738)
fgvanzee Mar 29, 2023
17cd260
Added mm_algorithm pptx files (bp and pb).
fgvanzee Mar 30, 2023
38fc523
Added mm_algorithm pdf files (bp and pb).
fgvanzee Mar 30, 2023
3f1432a
Add output.testsuite to .gitignore (#736)
leekillough Apr 3, 2023
aea8e1d
Optionally disable thread-local storage. (#735)
fgvanzee Apr 3, 2023
259f684
CREDITS file update.
fgvanzee Apr 7, 2023
593d017
CREDITS file update.
fgvanzee Apr 8, 2023
6b38c5a
Add RISC-V target (#693)
angsch Apr 11, 2023
8215b02
Apply #738 to make_defs.mk of RISC-V subconfigs. (#740)
leekillough Apr 12, 2023
6fd9aab
Fix bug in detecting Fortran compiler vendor (#745)
devinamatthews May 5, 2023
ef9d3e6
Added missing #include <io.h> for Windows. (#747)
h-vetinari May 6, 2023
0873c0f
Consolidate INSERT_ macro sets via variadic macros. (#744)
devinamatthews May 7, 2023
138de3b
add nvhpc compiler support (#719)
ajaypanyala May 7, 2023
89b7863
Fix 1m enablement for herk/her2k/syrk/syr2k. (#743)
devinamatthews May 8, 2023
d639554
Pad thrcomm_t fields to avoid false sharing.
fgvanzee Jun 7, 2023
6b894c3
Rewrote/fixed broken tree barrier implementation.
fgvanzee Jun 12, 2023
a0b04e3
Rewrote regen-symbols.sh (gen-libblis-symbols.sh). (#751)
fgvanzee Jun 26, 2023
c91b41d
Auto-detect the RISC-V ABI of the compiler and use -mabi= during RISC…
leekillough Jul 26, 2023
22ad8c1
Small fixes to support hpx in the testsuite (#759)
ct-clmsn Jul 27, 2023
2db31e0
Exclude -lrt on Android with Bionic libraries. (#755)
leekillough Jul 27, 2023
915daaa
Fix typos in docs + example code comments. (#753)
jip Jul 27, 2023
dbc7981
CREDITS file update.
fgvanzee Jul 28, 2023
3cf17b4
Small fixes/improvements to docs/Multithreading.md. (#764)
fgvanzee Aug 7, 2023
634e532
Set thrcomm timpl_t id inside init functions. (#766)
fgvanzee Aug 10, 2023
fa6a9b2
Fixed error when using common.mk from testsuite. (#768)
fgvanzee Aug 19, 2023
6dcf766
Revamped bli_init() to use TLS where feasible. (#767)
fgvanzee Aug 27, 2023
c6546c1
Fixed broken link in Multithreading.md. (#774)
jmather-sesi Sep 20, 2023
a4a6329
Fixes to HPX runtime code path. (#773)
srinivasyadav18 Sep 26, 2023
6f41220
Added 'altra', 'altramax' subconfigs. (#775)
fgvanzee Sep 26, 2023
37ca4fd
Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)
fgvanzee Sep 28, 2023
c2099ed
Fixed brokenness when sba is disabled. (#777)
fgvanzee Oct 2, 2023
1e264a4
Update zen3 subconfig to support NVHPC compilers. (#779)
abagusetty Oct 2, 2023
8fff1e3
Fixed bug in sup threshold registration. (#782)
fgvanzee Oct 12, 2023
7a87e57
Fixed HPX barrier synchronization (#783)
srinivasyadav18 Oct 14, 2023
05388dd
Added 'sifive_x280' subconfig, kernel set. (#737)
Aaron-Hutchinson Nov 3, 2023
f7ce54a
CREDITS file update.
fgvanzee Nov 3, 2023
8109d18
Install helper headers to INCDIR prefix. (#787)
fgvanzee Dec 12, 2023
4e68494
Fixed random segfault in test/3 drivers. (#788)
fgvanzee Dec 12, 2023
968c9be
Include bli_config.h before bli_system.h in cblas.h. (#789)
fgvanzee Dec 12, 2023
5eff5f9
Allow test/3 drivers to use default ind_t method. (#804)
fgvanzee Apr 30, 2024
7d48631
Use "-i auto" by default in test/3 drivers.
fgvanzee Apr 30, 2024
49af224
ReleaseNotes.md update.
fgvanzee May 6, 2024
06d9880
Add no_skx builds
honnibal Mar 13, 2019
0d089ae
Use config_registry from v0.9.1
honnibal Dec 12, 2024
0b72c75
Import bli_arch changes from v0.9.1 branch
danieldk May 6, 2022
870c58f
Cherry-pick arch changes from v0.9.1 branch
danieldk Apr 4, 2022
bcfd034
Merge branch 'master' into blis-1.0
honnibal Dec 12, 2024
1bdac24
Fix syntax
honnibal Dec 12, 2024
12 changes: 11 additions & 1 deletion .appveyor.yml
@@ -1,3 +1,5 @@
skip_branch_with_pr: true

environment:
matrix:
- LIB_TYPE: shared
@@ -21,6 +23,12 @@ environment:
CC: clang
THREADING: openmp

- LIB_TYPE: static
CONFIG: auto
CC: clang
THREADING: openmp
SANDBOX: yes

install:
- set "PATH=C:\msys64\mingw64\bin;C:\msys64\bin;%PATH%"
- if [%CC%]==[clang] set "PATH=C:\Program Files\LLVM\bin;%PATH%"
@@ -32,17 +40,19 @@ build_script:
- if [%LIB_TYPE%]==[shared] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --enable-shared --disable-static"
- if [%LIB_TYPE%]==[static] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --disable-shared --enable-static"
- if not [%CBLAS%]==[no] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --enable-cblas"
- if [%SANDBOX%]==[yes] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% -s gemmlike"
- set RANLIB=echo
- set LIBPTHREAD=
- set "PATH=%PATH%;C:\blis\lib"
- set "CFLAGS=-Wno-macro-redefined"
- bash -lc "cd /c/projects/blis && ./configure %CONFIGURE_OPTS% --enable-threading=%THREADING% --enable-arg-max-hack --prefix=/c/blis %CONFIG%"
- bash -lc "cd /c/projects/blis && mingw32-make -j4 V=1"
- bash -lc "cd /c/projects/blis && mingw32-make install"
- ps: Compress-Archive -Path C:\blis -DestinationPath C:\blis.zip
- 7z a C:\blis.zip C:\blis
- ps: Push-AppveyorArtifact C:\blis.zip

test_script:
# "make checkblas" does not work with shared linking Windows due to inability to override xerbla_
- if [%LIB_TYPE%]==[shared] set "TEST_TARGET=checkblis-fast"
- if [%LIB_TYPE%]==[static] set "TEST_TARGET=check"
- bash -lc "cd /c/projects/blis && mingw32-make %TEST_TARGET% -j4 V=1"
32 changes: 32 additions & 0 deletions .dir-locals.el
@@ -0,0 +1,32 @@
;; Emacs formatting for the BLIS layout requirements.

(
;; Recognize *.mk files as Makefile fragments
(auto-mode-alist . (("\\.mk\\'" . makefile-mode)) )

;; Makefiles require tabs and are almost always width 8
(makefile-mode . (
(indent-tabs-mode . t)
(tab-width . 8)
)
)

;; C code formatting roughly according to docs/CodingConventions.md
(c-mode . (
(c-file-style . "bsd")
(c-basic-offset . 4)
(comment-start . "// ")
(comment-end . "")
(parens-require-spaces . nil)
)
)

;; Default formatting for all source files not overriden above
(prog-mode . (
(indent-tabs-mode . nil)
(tab-width . 4)
(require-final-newline . t)
(eval add-hook `before-save-hook `delete-trailing-whitespace)
)
)
)
10 changes: 10 additions & 0 deletions .gitignore
@@ -31,6 +31,7 @@

config.mk
bli_config.h
bli_addon.h

# -- monolithic headers --

@@ -44,6 +45,15 @@ include/*/*.h

# BLIS testsuite output file
output.testsuite
output.testsuite.*

# BLAS test output files
out.*

# GTAGS database
GPATH
GRTAGS
GTAGS

# Mac DS.store files
.DS_Store
137 changes: 93 additions & 44 deletions .travis.yml
@@ -1,83 +1,132 @@
language: c
sudo: required
dist: trusty
env:
global:
secure: "Ty3PM1xGhXwxfJG6YyY9bUZyXzw98ekHxQEqU9VnrMXTZb28IxfocPCXHjL34r9HTGosO5Pmierhal1Cs3ZKE5ZAJqJhCfck+kwlH21Uay5CNYglDtSmy2qxtbbDG4AxpEZ1UKlIZr1pNh/x+pRemSmnMEnQp/E7QJqdkhm4+aMX2bWKyLPtrdL+B9QXLVT2nT6/Fw3i05aBhpcFJpSPfvYX2KoCZYdJOSKcKci4T8nAfP/c0olkz+jAkBZxZFgO9Ptrt/lvHtVPrkh5o29GvHg2i/4vucbsMltoxlV31/2eYpdr17Ngtt41MMVn2fHV4lVhLmENc04nlm084fBtg73T6b8hNy5JlcA44xI/UrPJsQAJ+0A0ds9BbBQKPxOmaF/O8WGXhwiwdKT6DGS9lj05f3S+yZfeNE3pQhLEcvwXLO5SW3VvKXMj0t/lZyG+XCkvFjD7KEPQV4g+BZc2zzD9TwDx3ydn8Uzd6zZlq1erQUzCnODP24wuwfrNP8nqxFYG0VtI8oZW62IC9U2hcnAF5QNXXW3yDYD65k3BHbigfI28gu9iO9G8RxOglR27J7Whdqkqw3AMRaqyHt2tdbz7tM2dLZ0EatT5m8esjC+LP4EshW9C59jP2U9vJ/94YEgOfwiqk8+e6fL/7dJvOumbwu1RclRI9DS88PPYb3Q="
dist: focal
branches:
only:
- master
- dev
- amd
matrix:
include:
# full testsuite (all tests except for mixed datatype)
# full testsuite (all tests + mixed datatype (gemm_nn only) + salt + SDE + OOT)
- os: linux
compiler: gcc
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto"
# mixed-datatype testsuite (gemm_nn only)
- os: linux
compiler: gcc
env: OOT=0 TEST=MD SDE=0 THR="none" CONF="auto"
# salt testsuite (fast set of operations+parameters)
- os: linux
compiler: gcc
env: OOT=0 TEST=SALT SDE=0 THR="none" CONF="auto"
# test x86_64 ukrs with SDE
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=1 THR="none" CONF="x86_64"
env: OOT=1 TEST=ALL SDE=1 THR="none" CONF="x86_64" \
PACKAGES="gcc-9 binutils"
# openmp build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="openmp" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="openmp" CONF="auto" \
PACKAGES="gcc-9 binutils"
# pthreads build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="pthreads" CONF="auto"
# out-of-tree build
- os: linux
compiler: gcc
env: OOT=1 TEST=0 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="pthreads" CONF="auto" \
PACKAGES="gcc-9 binutils"
# clang build
- os: linux
compiler: clang
env: OOT=0 TEST=0 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# There seems to be some difficulty installing two Clang toolchains of
# different versions.
# Use the TravisCI default.
# PACKAGES="clang-8 binutils"
# macOS with system compiler (clang)
- os: osx
compiler: clang
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# cortexa15 build and fast testsuite (qemu)
- os: linux
compiler: arm-linux-gnueabihf-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa15" \
PACKAGES="gcc-arm-linux-gnueabihf qemu-system-arm qemu-user" \
CC=arm-linux-gnueabihf-gcc CXX=arm-linux-gnueabihf-g++ \
PACKAGES="gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf libc6-dev-armhf-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-arm -cpu cortex-a15 -L /usr/arm-linux-gnueabihf/"
# cortexa57 build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa57" \
PACKAGES="gcc-aarch64-linux-gnu qemu-system-arm qemu-user" \
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ \
PACKAGES="gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# Apple M1 (firestorm) build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="firestorm" \
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ \
PACKAGES="gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# armsve build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc-10
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="armsve" \
CC=aarch64-linux-gnu-gcc-10 CXX=aarch64-linux-gnu-g++-10 \
PACKAGES="gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -cpu max,sve=true,sve512=true -L /usr/aarch64-linux-gnu/"
# arm64 build and fast testsuite (qemu)
# NOTE: This entry omits the -cpu flag so that while both NEON and SVE kernels
# are compiled, only NEON kernels will be tested. (h/t to RuQing Xu)
- os: linux
compiler: aarch64-linux-gnu-gcc-10
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="arm64" \
CC=aarch64-linux-gnu-gcc-10 CXX=aarch64-linux-gnu-g++-10 \
PACKAGES="gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# The RISC-V targets require the qemu version available in jammy or newer.
# When CI is upgraded, the packages should be activated and do_script.sh
# cleaned up.
# PACKAGES="qemu-user qemu-user-binfmt"
- os: linux
compiler: riscv64-unknown-linux-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv64iv" \
CC=riscv64-unknown-linux-gnu-gcc \
LDFLAGS=-static
- os: linux
compiler: riscv32-unknown-linux-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv32iv" \
CC=riscv32-unknown-linux-gnu-gcc \
LDFLAGS=-static
- os: linux
compiler: clang
env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="sifive_x280" \
CC=clang \
LDFLAGS=-static
install:
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo rm -f /usr/bin/as; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo ln -s /usr/lib/binutils-2.26/bin/as /usr/bin/as; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo rm -f /usr/bin/ld; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo ln -s /usr/lib/binutils-2.26/bin/ld /usr/bin/ld; fi
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-6"; fi
- if [ -n "$PACKAGES" ]; then sudo apt-get install -y $PACKAGES; fi
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- gcc-6
- binutils-2.26
- clang
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
- if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
script:
- export DIST_PATH=.
- pwd
- if [ $OOT -eq 1 ]; then export DIST_PATH=`pwd`; mkdir ../oot; cd ../oot; chmod -R a-w $DIST_PATH; fi
- pwd
- $DIST_PATH/configure -t $THR CC=$CC $CONF
- if [ "$CONF" = "rv64iv" ]; then
$DIST_PATH/travis/do_riscv.sh "$CONF";
export CC=$DIST_PATH/../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc;
export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv64-unknown-linux-gnu-g++;
export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
fi
- if [ "$CONF" = "rv32iv" ]; then
$DIST_PATH/travis/do_riscv.sh "$CONF";
export CC=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-gcc;
export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
fi
- if [ "$CONF" = "sifive_x280" ]; then
$DIST_PATH/travis/do_riscv.sh "$CONF";
export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
fi
- $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
- pwd
- ls -l
- $CC --version
- $CC -v
- make -j 2
- make install
- if [ "$BLD" = "" ]; then $DIST_PATH/travis/cxx/cxx-test.sh $DIST_PATH $(ls -1 include); fi
# Qemu SVE is failing sgemmt in some cases. Skip as this issue is not observed
# on real chip (A64fx).
- if [ "$CONF" = "armsve" ]; then sed -i 's/.*\<gemmt\>.*/0/' $DIST_PATH/testsuite/input.operations.fast; fi
- if [ "$TEST" != "0" ]; then travis_wait 30 $DIST_PATH/travis/do_testsuite.sh; fi
- if [ $SDE -eq 1 ] && [ "$TRAVIS_PULL_REQUEST" = "false" ] ; then travis_wait 30 $DIST_PATH/travis/do_sde.sh; fi
- if [ "$SDE" = "1" ]; then travis_wait 30 $DIST_PATH/travis/do_sde.sh; fi