NAME
Inline::CUDA - Inline NVIDIA's CUDA code and GPU processing from within
any Perl script.
VERSION
Version 0.15
SYNOPSIS
WARNING: see section "INSTALLATION" for how to install this package.
WARNING: prior to installation, please install
https://github.com/hadjiprocopis/perl-nvidia2-ml
Inline::CUDA is a module that allows you to write Perl subroutines in C
or C++ with CUDA extensions.
As with Inline::C, Inline::CUDA is not meant to be used directly but
rather like this:
#Firstly, specify some configuration options:
use Inline CUDA => Config =>
# optionally specify some options,
# you don't have to
# if they are already stored in a configuration file
# which is consulted before running any CUDA program
# host_compiler_bindir => '/usr/local/gcc82/bin',
# cc => '/usr/local/gcc82/bin/gcc82',
# cxx => '/usr/local/gcc82/bin/g++82',
# ld => '/usr/local/gcc82/bin/gcc82',
# nvcc => '/usr/local/cuda/bin/nvcc',
# nvld => '/usr/local/cuda/bin/nvcc',
# pass options to nvcc:
# this is how to deal with unknown compiler flags passed on to nvcc: pass them all to gcc
# only supported in nvcc versions 11+
# nvccflags => '--forward-unknown-to-host-compiler',
# do not check compiler version, use whatever the user wants
# nvccflags => '--allow-unsupported-compiler',
# this will use CC or CXX depending on the language specified here
# you can use C++ in your CUDA code, and there are tests in t/*
# which check if c or c++ and show how to do this:
host_code_language => 'c', # or 'c++' or 'cpp', case insensitive, see also cxx =>
# optional extra Include and Lib dirs
#inc => '-I...',
#libs => '-L... -l...',
# for debugging
BUILD_NOISY => 1,
# code will be left in ./_Inline/build/ after successful build
clean_after_build => 0,
warnings => 10,
;
# and then, suck in code from __DATA__ and run it at runtime
# notice that Inline->use(CUDA => <<'EOCUDA') would run at compile time
my $codestr;
{ local $/ = undef; $codestr = <DATA> }
Inline->bind( CUDA => $codestr );
if( do_add() ){ die "error running do_add()..." }
1;
__DATA__
/* this is C code with CUDA extensions */
#include <stdio.h>
#define N 1000
/* This is the CUDA Kernel which nvcc compiles: */
__global__
void add(int *a, int *b) {
int i = blockIdx.x;
if (i<N) b[i] = a[i]+b[i];
}
/* this function can be called from Perl.
It returns 0 on success or 1 on failure.
This simple code does not support passing parameters in,
which is covered elsewhere.
*/
int do_add() {
cudaError_t err;
// Create int arrays on the CPU.
// ('h' stands for "host".)
int ha[N], hb[N];
// Create corresponding int arrays on the GPU.
// ('d' stands for "device".)
int *da, *db;
if( (err=cudaMalloc((void **)&da, N*sizeof(int))) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMalloc() has failed for %zu bytes for da: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
if( (err=cudaMalloc((void **)&db, N*sizeof(int))) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMalloc() has failed for %zu bytes for db: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
// Initialise the input data on the CPU.
// (the kernel reads both a and b, so hb must be initialised too)
for (int i = 0; i<N; ++i){ ha[i] = i; hb[i] = 0; }
// Copy input data to the arrays on the GPU.
if( (err=cudaMemcpy(da, ha, N*sizeof(int), cudaMemcpyHostToDevice)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyHostToDevice) has failed for %zu bytes for ha->da: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
if( (err=cudaMemcpy(db, hb, N*sizeof(int), cudaMemcpyHostToDevice)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyHostToDevice) has failed for %zu bytes for hb->db: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
// Launch GPU code with N threads, one per array element.
add<<<N, 1>>>(da, db);
if( (err=cudaGetLastError()) != cudaSuccess ){
fprintf(stderr, "do_add(): error, failed to launch the kernel into the device: %s\n",
cudaGetErrorString(err)
);
return 1;
}
// Copy output array from GPU back to CPU.
if( (err=cudaMemcpy(hb, db, N*sizeof(int), cudaMemcpyDeviceToHost)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyDeviceToHost) has failed for %zu bytes for db->ha: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
//for (int i = 0; i<N; ++i) printf("%d\n", hb[i]); // print results
// Free up the arrays on the GPU.
if( (err=cudaFree(da)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaFree() has failed for da: %s\n",
cudaGetErrorString(err)
);
return 1;
}
if( (err=cudaFree(db)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaFree() has failed for db: %s\n",
cudaGetErrorString(err)
);
return 1;
}
return 0;
}
The statement use Inline CUDA => ...; is executed at compile time.
Often this is not desirable because you may want to read code from a
file, modify code at runtime or even auto-generate the inlined code at
runtime. In these situations Inline provides bind().
Here is how to inline code read at runtime from a file called
my_cruncher.cu, whose contents are exactly the same as the __DATA__
section in the previous example,
use Inline;
use File::Slurp;
my $data = read_file('my_cruncher.cu');
Inline->bind(CUDA => $data);
Using Inline->use(CUDA => "DATA") seems to have a problem when the
__DATA__ section contains identifiers enclosed in double underscores,
e.g. __global__ (a CUDA reserved keyword). One workaround is to declare
#define CUDA_GLOBAL __global__ and then replace all occurrences of
__global__ with CUDA_GLOBAL.
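Here is a minimal sketch of that workaround (the kernel itself is
hypothetical; only the #define matters):
/* avoid a literal __global__ inside the DATA section */
#define CUDA_GLOBAL __global__
/* CUDA_GLOBAL expands to __global__ at compile time */
CUDA_GLOBAL void scale(float *v, int n) {
int i = blockIdx.x;
if (i < n) v[i] *= 2.0f;
}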
Sometimes it is more convenient to configure Inline::CUDA not in a use
statement (as above) but in a require statement. The latter is executed
during the runtime of your script, as opposed to the former, which
loads the file at compile time. This has certain benefits, as you can
enclose it in a conditional, an eval or a try/catch block. This is how
(thank you to this PerlMonks post
<https://perlmonks.org/?node_id=11159977>):
require Inline;
# configuration:
Inline->import(
CUDA => Config =>
ccflagsex => '...'
);
# compile your code:
Inline->import(
CUDA => $my_code
);
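Since require runs at runtime, the whole configure-and-compile step can
be guarded. A minimal sketch, assuming $my_code holds valid CUDA code
and do_add_cpu() is a hypothetical pure-Perl fallback:
my $has_cuda = eval {
require Inline;
Inline->import( CUDA => Config => BUILD_NOISY => 1 );
Inline->import( CUDA => $my_code );
1;
};
if( ! $has_cuda ){
warn "GPU path unavailable, falling back to CPU: $@";
*do_add = \&do_add_cpu; # hypothetical fallback
}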
CUDA
The somewhat old news, at least since 2007, is that a Graphics
Processing Unit (GPU) has found uses beyond its traditional role in
calculating and displaying graphics to our computer monitor. This stems
from the fact that a GPU is a highly parallel computing machinery.
Similar to the operating system sending data and instructions to that
GPU frame-after-frame from the time it is booted in order to display
windows, widgets, transparent menus, spinning animations, video games
and visual effects, a developer can now send data and instructions to
the GPU for doing any sort of arithmetic calculation in a highly
parallel manner. A case in point is matrix multiplication, where
thousands of GPU computing elements process the matrices' elements in
parallel. True parallelism, that is, as opposed to a CPU's emulated
parallelism, limited by its number of cores (2, 4, 8 for cheap
desktops). It goes without saying that GPU processing is very powerful
and opens up a new world of number-crunching possibilities without the
need for expensive super-computer capabilities.
NVIDIA's CUDA is "a parallel computing platform and programming model
that makes using a GPU for general purpose computing simple and
elegant" (from NVIDIA's site
<https://blogs.nvidia.com/blog/2012/09/10/what-is-cuda-2/>). In short,
we use CUDA to dispatch number-crunching code to a Graphics Processing
Unit (GPU) and then get the results back.
NVIDIA's CUDA comprises a few keywords which can be inserted into C,
C++, Fortran, etc. code. In effect, developers still write programs in
their preferred language (C, C++ etc.) and whenever they need to access
the GPU they use the CUDA extensions. For more information check CUDA
Programming Guide
<https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html> .
A CUDA program is, therefore, a C or C++ program with a few CUDA
keywords added. Generally, compiling such a program is done by a CUDA
compiler, namely nvcc (the nvidia cuda compiler) which, simplistically put,
splits the code into two parts, the CUDA part and the C part. The C part
is delegated to a C compiler, like gcc, and the CUDA part is handled by
nvcc. Finally nvcc links these components into an ordinary standalone
executable. For more information read CUDA Toolkit Documentation
<https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#cuda-source>
Notice that in NVIDIA jargon, a "device" is (one of) the GPU(s) and the
"host" is the CPU and the OS.
CAVEATS
In practice there are huge caveats, though conquering them can be
surprisingly easy with some CLI magic. This is fine on Linux or even
OSX, but for the poor M$-windows victims the same process can be
painfully tortuous, possibly ending in a mental breakdown. As I don't
belong to that category, I will not be able to help you with very
specific requests regarding the so-called OS.
And on to the caveats.
Does your GPU support CUDA?
First of all, not all GPUs support CUDA. But new NVIDIA ones usually
do, at a price of around 100 euros or less.
Different CUDA SDKs exist for different hardware
Secondly, different GPUs have different "compute capability", requiring
different versions of the CUDA SDK, which provides nvcc and friends.
For example, my GeForce GTX 650 has a compute capability of 3.0 and
that requires an SDK version of 10.2. That's the last SDK to support
a 3.x capability GPU. Currently, the SDK has reached version 11.4 and
supports compute capabilities of 3.5 to 8.6. See the Wikipedia article
on CUDA <https://en.wikipedia.org/wiki/CUDA#GPUs_supported> for what
GPUs are supported and by what CUDA SDK version.
CUDA compiler requires specific compiler version
Thirdly and most importantly, nvcc has specific and strict requirements
regarding the version of the "host compiler", for example, gcc/g++,
clang, cl.exe. See which compilers are supported at
Linux
<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>,
mac-OSX
<https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html>,
Windows
<https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html>
For example, my GPU's compute capability (3.0) requires CUDA SDK
version 10.2, which requires a gcc version less than or equal to 8.
Find out which compilers your CUDA SDK supports in ax3l's gist
<https://gist.github.com/ax3l/9489132>
There is a hack to stop nvcc checking the compiler version and make it
use whatever compiler the user specifies. Simply pass
--allow-unsupported-compiler to nvcc and hope for the best. According
to CUDA Toolkit Documentation
<https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html>,
this flag has no effect on MacOS.
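For example, invoking nvcc directly from the CLI with the check
disabled might look like this (paths are illustrative assumptions):
nvcc --allow-unsupported-compiler --compiler-bindir /usr/bin/gcc simple.cu -o simple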
xt/30-check-basic-with-system-compiler.t shows how to tell Inline::CUDA
to use the system compiler and also how to tell nvcc not to check the
compiler version. This test can fail on particular OSes/versions; it
seems to have worked for my particular setup. With this option you are
at least safe from getting into trouble because of "Perl and XS objects
with mismatched compiler versions".
GPU Programming: memory transfer overheads
Additionally, general GPU programming, in practice, has quite some
caveats of its own that the potential GPU programmer must be aware of.
To start with, there are some quite large overheads associated with
sending data to the GPU and receiving it back. The memory generally
accessible to any program running on the CPU (e.g. the C-part of the
CUDA code) is not available to the GPU in the simple and elegant manner
C programmers take for granted when they are handed a memory pointer
and read the space it points to. And vice versa. Memory in the C-part
of the code must be cudaMemcpy()'ed (the equivalent of memcpy() for
host-to-device and device-to-host data transfers) to the GPU. And the
results calculated on the GPU remain there until they are transferred
back to the host using another cudaMemcpy() call.
Add to this the cost of copying the value of each item of a Perl array
into a C array, which cudaMemcpy() understands and expects, and you get
quite a significant overhead and a lot of paper-pushing just to get the
same block of data onto the GPU. And the same applies in the reverse
direction.
Here is a rough sketch of what memory transfers are required for
calling an Inline::CUDA function from Perl and doing GPU processing:
my @array = (1..5); # memory allocated for the Perl array
my $result; # will be set to an arrayref of results by the C code
inline_cuda_function(\@array, \$result);
...
// now inside an Inline::CUDA code block
int inline_cuda_function(SV *in, SV *out){
int *h_in, *h_out, *d_in, *d_out;
// allocate memory for copying the Perl array (in) to C
h_in = malloc(5*sizeof(int));
// allocate memory for holding the results on the host
h_out = malloc(5*sizeof(int));
// allocate memory on the GPU for this same data
cudaMalloc((void **)&d_in, 5*sizeof(int));
// allocate memory on the GPU for the result
cudaMalloc((void **)&d_out, 5*sizeof(int));
// transfer Perl data into the host's C-array
AV *anAV = (AV *)SvRV(in);
for(int i=0;i<5;i++){
SV *anSV = *av_fetch(anAV, i, FALSE);
h_in[i] = SvIV(anSV);
}
// and now transfer the host's C-array onto the GPU
cudaMemcpy(d_in, h_in, 5*sizeof(int), cudaMemcpyHostToDevice);
// launch the kernel and do the processing on the GPU
...
// extract results from the GPU onto host memory
cudaMemcpy(h_out, d_out, 5*sizeof(int), cudaMemcpyDeviceToHost);
// and now from host memory (the C array) onto Perl:
// we have been passed a scalarref, we create a new array
// and place it in the scalar's RV slot
anAV = newAV();
av_extend(anAV, 5-1); // pre-extend the Perl array to fit the result
// sv_setrv() is a macro created by LeoNerd,
// see section CONTRIBUTIONS BY OTHERS below;
// it makes the passed scalar (out) a reference to the new array
sv_setrv(SvRV(out), (SV *)anAV);
for(int i=0;i<5;i++){
av_store(anAV, i, newSVnv(h_out[i]));
}
free(h_in); free(h_out);
cudaFree(d_in); cudaFree(d_out);
return 0; // success
}
There are some benchmarks in xt/benchmarks/*.b which compare the
performance of a small (size ~10x10), medium (size ~100x100) and large
(size ~1000x1000) data scenario for doing matrix multiplication (run
them with make benchmark). On my computer at least, the pure-C,
CPU-hosted version outperforms the GPU for the small and medium
scenarios, exactly because of these overheads. But the GPU is a clear
winner for the large data scenario.
See for example this particular benchmark:
xt/benchmarks/30-matrix-multiply.b
Perl and XS objects with mismatched compiler versions
Finally, there is an issue with compiling XS code, which is essentially
what Inline::CUDA does, with a compiler different from the one the
current Perl was built with. This is the case when a special host
compiler had to be installed because of the CUDA SDK version. If
that's true then you are essentially loading XS code compiled with
gcc82 (as per the example in section "INSTALLATION") into a perl
executable which was compiled with the system compiler, for example gcc11.
If that is really an issue then it will be insurmountable and the only
solution will be to perlbrew <https://perlbrew.pl/> a new Perl built
with the special host compiler, e.g. gcc82.
The manual on installing Perl
<https://metacpan.org/dist/perl/view/INSTALL#C-compiler> states that
specifying the compiler is as simple as:
sh Configure -Dcc=/usr/local/gcc82/bin/gcc82
If you want to compile and install a new Perl using perlbrew
<https://perlbrew.pl/> then this will do it (thank you to this
PerlMonks post <https://perlmonks.org/?node_id=11159958>):
PERLBREW_CONFIGURE_FLAGS='-d -Dcc=/usr/local/gcc82/bin/gcc' perlbrew install 5.38.2 --as 5.38.2.gcc82
The -d flag avoids being asked trivial questions about the compilation
options, using sane defaults instead. The --as 5.38.2.gcc82 tells
perlbrew <https://perlbrew.pl/> to rename the newly installed perl in
case there is already one with the same name.
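Once built, standard perlbrew commands let you verify and switch to the
new perl:
perlbrew list
perlbrew switch 5.38.2.gcc82
perl -V:cc   # should now report the special compiler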
INSTALLATION
Installation of Inline::CUDA is a nightmare because it relies on
external dependencies. It needs NVIDIA's CUDA SDK, which provides nvcc
(the nvidia cuda compiler), and that in turn requires specific host
compiler versions. This means it is very likely that you will also need
to install on your system an older compiler compatible with your nvcc
version. Even if your GPU supports the latest CUDA SDK version (at 11.4
as of July 2021), the maximum gcc version allowed with that is 10.2.1.
Currently, gcc is at version 11.2 and upgrades monthly.
Installing a "private" compiler, in Linux, can be easy or hard
depending whether the package manager allows it. Mine does not. See
"how-to-install-compiler" for instructions on how to do that on Linux
and label the new compiler with its own name so that one can have
system compiler and older compiler living in parallel and not
disturbing each other.
That said, there is a workaround: pass the --allow-unsupported-compiler
flag to nvcc. This can be achieved via use Inline => Config => ..., as
below:
use Inline => Config =>
nvccflags => '--allow-unsupported-compiler',
... # other config options
;
... # Inline::CUDA etc.
The long and proper way of installing Inline::CUDA is described below.
So, if all goes Murphy, you will have to install nvcc and an additional
host compiler such as gcc. The latter is not the most pleasant of
experiences on Linux. I don't know what the situation is with Windows.
I can only imagine the horror.
Here is a rough sketch of what one should do.
Find the NVIDIA GPU name+version you have installed on your hardware
kit, for example GeForce GTX 650. This can be easy or hard.
If you already have the executable nvidia-smi installed, run it;
otherwise install it (e.g. on Fedora, from the CLI, do dnf provides
nvidia-smi and make sure you have the repo rpmfusion-nonfree enabled,
somehow).
Alternatively, install nvidia2::ml (see the WARNING in the SYNOPSIS)
and run the script I provide with Inline::CUDA at
scripts/nvidia-ml-test.pl
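If nvidia-smi is available, the GPU model can be queried directly,
e.g.:
nvidia-smi --query-gpu=name --format=csv,noheader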
With the NVIDIA GPU name+version available search this Wikipedia
article <https://en.wikipedia.org/wiki/CUDA#GPUs_supported> for the
"compute capability" of the GPU. For example this is 3.0 for GeForce
GTX 650.
Use the "compute capability" of the GPU in order to find the CUDA SDK
version you must install in the same Wikipedia article
<https://en.wikipedia.org/wiki/CUDA#GPUs_supported> . For example, for
the GPU GeForce GTX 650, one should download and install CUDA SDK 10.2.
Download, but not yet install, the specific version of the CUDA SDK
from the CUDA Toolkit Archive
<https://developer.nvidia.com/cuda-toolkit-archive>
If you are lucky, your system's C compiler will be compatible with the
CUDA SDK version you downloaded and installing the above archive will
be successful. It is worth giving it a try, i.e. try to install it and
see if it complains about an incompatible host compiler version. If it
doesn't, then you are good to go.
If installing the above archive yields errors about incompatible host
compiler then you must install a supported host compiler at a private
path (so as not to interfere with your actual system compiler) and
provide that path during installation (see below) of the CUDA SDK and
also during installation of Inline::CUDA (see below).
Find the maximum host compiler version supported by the CUDA SDK you
downloaded. For example, CUDA SDK 10.2 in Linux is documented at
https://docs.nvidia.com/cuda/archive/10.2/cuda-installation-guide-linux/.
It states that the maximum gcc version is 8.2.1 for RHEL 8.1. I suspect
that it is the compiler's major version, e.g. 8, that matters. I can
confirm that gcc 8.4.0 works fine for Linux, Fedora 34, kernel 5.12,
perl v5.32, GeForce GTX 650.
Once you decide on the compiler version, download it and install it to
a private path so as not to interfere with the system compiler. Note
that path for later use.
Here is how to do the above on Linux for gcc. Download a specific gcc
version from:
ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/ (other mirrors
exist here https://gcc.gnu.org/mirrors.html). Compile the compiler and
make sure you give it a prefix and a suffix. You must also download
packages https://ftp.gnu.org/gnu/mpfr, https://ftp.gnu.org/gnu/mpc/ and
https://ftp.gnu.org/gnu/gmp/, choosing versions compatible with the gcc
version you have already downloaded. The crucial line in the
configuration stage of compiling gcc is:
configure --prefix=/usr/local/gcc82 --program-suffix=82 --enable-languages=c,c++ --disable-multilib --disable-libstdcxx-pch
Here is a gist from
https://stackoverflow.com/questions/58859081/how-to-install-an-older-version-of-gcc-on-fedora:
tar xvf gcc-8.2.0.tar.xz
cd gcc-8.2.0/
tar xvf mpfr-4.0.2.tar.xz && mv -v mpfr-4.0.2 mpfr
tar xvf gmp-6.1.2.tar.xz && mv -v gmp-6.1.2 gmp
tar xvf mpc-1.1.0.tar.gz && mv -v mpc-1.1.0 mpc
cd ../
mkdir build-gcc820
cd build-gcc820/
../gcc-8.2.0/configure --prefix=/usr/local/gcc82 --program-suffix=82 --enable-languages=c,c++,fortran --disable-multilib --disable-libstdcxx-pch
make && make install
From now on, I will be using /usr/local/gcc82/bin/gcc82 and
/usr/local/gcc82/bin/g++82 as my host compilers.
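A quick sanity check that the private compiler is in place (just
standard --version calls):
/usr/local/gcc82/bin/gcc82 --version
/usr/local/gcc82/bin/g++82 --version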
Now you have the special compiler at /usr/local/gcc82, under the names
/usr/local/gcc82/bin/gcc82 and /usr/local/gcc82/bin/g++82. We need to
install the CUDA SDK and tell it to skip checking host compiler
compatibility (I don't think there is a way to point it to the correct
compiler to use). In Linux, this is done like: sh
cuda_10.2.89_440.33.01_linux.run --override. After a successful
installation you should be able to see /usr/local/cuda/bin/nvcc.
Optionally add it to your PATH: export
PATH="${PATH}:/usr/local/cuda/bin"
In general, compiling CUDA code, for example this one
<https://gist.github.com/dpiponi/1502434>, is as simple as:
nvcc --compiler-bindir /usr/local/gcc82/bin/gcc82 simple.cu && ./a.out
Notice the CUDA program extension .cu; it is important for keeping nvcc
happy. Also note that if your CUDA SDK does not require installing an
older version of a compiler but instead is happy with your system
compiler, then you can omit the --compiler-bindir
/usr/local/gcc82/bin/gcc82 part.
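If you have no test program at hand, this minimal simple.cu (a hedged
sketch, not the gist's code) is enough to exercise the toolchain:
#include <stdio.h>
__global__ void noop() { }
int main(void) {
noop<<<1, 1>>>(); // launch a do-nothing kernel on the GPU
if (cudaDeviceSynchronize() != cudaSuccess) {
fprintf(stderr, "kernel launch failed\n");
return 1;
}
printf("CUDA toolchain and GPU look OK\n");
return 0;
}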
If you did compile the simple cuda program and managed to run it, then
you are ready to install Inline::CUDA. If your system compiler is
acceptable to the CUDA SDK, then it is as simple as running
perl Makefile.PL
make
make install
But if you need to declare a special host compiler (re:
/usr/local/gcc82/bin/gcc82) because your system compiler is not
accepted by the CUDA SDK, then you need to specify that to the
installation process via one of the following two methods:
The first method is more permanent but assumes that you can
(re-)install the module. During installation, specify the following
environment variables (assuming a bash-based terminal) and this should
do it:
CC=/usr/local/gcc82/bin/gcc82 \
CXX=/usr/local/gcc82/bin/g++82 \
LD=/usr/local/gcc82/bin/g++82 \
perl Makefile.PL
make
make install
The second method assumes you can edit Inline::CUDA's configuration
file, located at a place like:
/usr/local/share/perl5/5.32/auto/share/dist/Inline-CUDA/Inline-CUDA.conf
(different systems will have a slightly different path), and modify
the entries for 'cc', 'cxx' and 'ld'.
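As a rough sketch only (check the exact syntax against your installed
Inline-CUDA.conf; the key names are those mentioned above), the
relevant entries would look something like:
cc=/usr/local/gcc82/bin/gcc82
cxx=/usr/local/gcc82/bin/g++82
ld=/usr/local/gcc82/bin/g++82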
Whatever the host compiler was, the configuration will be saved in a
file called Inline-CUDA.conf. This file will be saved in a share-dir
relative to your current Perl installation path. As an example, mine is
at
/usr/local/share/perl5/5.32/auto/share/dist/Inline-CUDA/Inline-CUDA.conf
This configuration file is consulted every time you use Inline::CUDA,
so it knows where the special host compiler resides.
Finally, make test will run a suite of test scripts and, if all goes
well, all will succeed. Additionally, make benchmark will run a matrix
multiplication benchmark which will reveal whether you can indeed get
any benefit from GPGPU on your specific hardware for this specific
problem. Feel free to extend the benchmarks for your use-case.
At this stage I would urge people installing the code to also run make
author-test and report back errors.
DEMO
The folder demos/ in the base dir of the current distribution contains
self-contained Inline::CUDA demo(s). One of them produces the
Mandelbrot Fractal on the GPU using CUDA code copied from marioroy
<https://perlmonks.org/?node=marioroy>'s excellent work at
https://github.com/marioroy/mandelbrot-python; see also the PerlMonks
post at https://perlmonks.org/?node_id=11139880. The demo is not
complete; it just plugs marioroy's CUDA code into Inline::CUDA.
From the base dir of the current distribution run:
make demo
CAVEATS
In your CUDA code do not implement main()! Place your CUDA code in your
own functions which you call from Perl. If you get segmentation faults,
check the above first.
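A minimal sketch of the right shape (the function name is
hypothetical):
/* wrong: defining main() here can cause segmentation faults */
/* right: a plain function which Perl calls */
int run_my_kernel() {
/* cudaMalloc(), cudaMemcpy(), launch kernel, copy results back ... */
return 0; /* 0 on success, as in the SYNOPSIS example */
}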
CONTRIBUTIONS BY OTHERS
This is a module which stands on the shoulders of Giants.
Literally!
To start with, CUDA and the nvidia cuda compiler are two NVIDIA
projects which offer general programming on the GPU to the masses,
opening a new world of computational capabilities as an alternative to
the traditional CPU model. A big thank you to NVIDIA.
Then there is Perl's Inline module created by Ingy döt Net
<https://metacpan.org/author/INGY>. This module makes it easy to inline
code in many computer languages and call it from within a Perl script,
passing Perl data structures in and obtaining results back.
This module is the key to opening many doors for Perl scripts.
A big thank you to Ingy döt Net <https://metacpan.org/author/INGY>.
Then there is Perl's Inline::C module created/co-created/maintained by
Ingy döt Net <https://metacpan.org/author/INGY>, Sisyphus
<https://metacpan.org/author/SISYPHUS> and Tina Müller
<https://metacpan.org/author/TINITA>.
The current Inline::CUDA module relies heavily on Inline::C. Because
the underlying CUDA language is C, instead of copying what Inline::C
does and modifying the section where the Makefile is written, I decided
to inject all of Inline::C's subs into Inline::CUDA except some which
require special treatment, like writing the Makefile, while also
allowing some special Config keywords. The sub injection happens every
time the module is called, and that definitely adds a tiny overhead
which, in my opinion, is compensated by the huge advantage of not
copy-pasting code from Inline::C into Inline::CUDA and then
incorporating my changes every time Inline::C updates. A big thank
you to Ingy döt Net <https://metacpan.org/author/INGY> (again!),
Sisyphus <https://metacpan.org/author/SISYPHUS> and Tina Müller
<https://metacpan.org/author/TINITA>.
For writing test cases and benchmarks I had to descend into C and
become acquainted with perlguts, e.g. what is an SV
<https://perldoc.perl.org/perlguts#Working-with-SVs>. In this process I
had to ask for the wisdom of PerlMonks.org and #perl
<https://web.libera.chat/#perl>. A particular question was how to pass
an arrayref, a scalar or a scalarref into a C function and store the
results of the computation in there, in a call-by-reference manner. Fortunately
LeoNerd at #perl
<https://kiwiirc.com/nextclient/#irc://irc.perl.org/#perl> created the
following sv_setrv() macro which saved the day. Big thank you LeoNerd.
/************************************************************/
/* MONKEYPATCH by LeoNerd to set an arrayref into a scalarref
As posted on https://kiwiirc.com/nextclient/#irc://irc.perl.org/#perl
at 10:50 23/07/2021
A BIG THANK YOU LeoNerd
*/
#define HAVE_PERL_VERSION(R, V, S) \
(PERL_REVISION > (R) || (PERL_REVISION == (R) && (PERL_VERSION > (V) || (PERL_VERSION == (V) && (PERL_SUBVERSION >= (S))))))
#define sv_setrv(s, r) S_sv_setrv(aTHX_ s, r)
static void S_sv_setrv(pTHX_ SV *sv, SV *rv)
{
sv_setiv(sv, (IV)rv);
#if !HAVE_PERL_VERSION(5, 24, 0)
SvIOK_off(sv);
#endif
SvROK_on(sv);
}
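A hedged usage sketch ('out' is assumed to be a scalarref passed in
from Perl, as in the memory-transfer sketch earlier):
/* build a fresh Perl array and make the scalar behind 'out' a ref to it */
AV *results = newAV();
av_push(results, newSVnv(42.0));
sv_setrv(SvRV(out), (SV *)results);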
I copied numerical recipes (as C code, CUDA kernels, etc.) from the
repository of Zhengchun Liu <https://github.com/lzhengchun>. This code
resides in 'C/inlinecuda' of the current distribution and offers
shortcuts to GPU-based matrix multiplication, for example.
The idea of this project came to me when kcott
<https://www.perlmonks.org/?node=kcott> asked a question at
https://www.perlmonks.org/?node_id=11134476, to which I responded with
the preliminary idea <https://www.perlmonks.org/?node_id=11134582> for
what is now Inline::CUDA. A big thank you to kcott.
I got helpful comments, advice and the odd smiley from LeoNerd, mst,
Bojte, shlomif at #perl
<https://kiwiirc.com/nextclient/#irc://irc.perl.org/#perl>, thank you.
I got helpful comments and advice in this PerlMonks.org post
<https://perlmonks.org/?node_id=11135324> from syphilis
<https://perlmonks.org/?node=syphilis> and perlfan
<https://perlmonks.org/?node=perlfan>, although the problem was cracked
by LeoNerd at #perl
<https://kiwiirc.com/nextclient/#irc://irc.perl.org/#perl>.
I also got helpful comments and advice from Ed J
<https://metacpan.org/author/ETJ> when I filed a bug over at
ExtUtils::MakeMaker (see
https://rt.cpan.org/Ticket/Display.html?id=138022 and
https://rt.cpan.org/Ticket/Display.html?id=137912).
AUTHOR
Andreas Hadjiprocopis, <bliako at cpan dot org>, <andreashad2 at gmail
dot com>, https://perlmonks.org/?node=bliako
DEDICATIONS
!Almaz!
BUGS
Please report any bugs or feature requests to bug-inline-cuda at
rt.cpan.org, or through the web interface at
https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Inline-CUDA. I will be
notified, and then you'll automatically be notified of progress on your
bug as I make changes.
NOTE: this project is not yet on CPAN so report bugs by email to the
author. I am not very comfortable with github so cloning and merging
and pushing and pulling are beyond me.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Inline::CUDA
You can also look for information at:
* RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=Inline-CUDA
* PerlMonks.org : a great forum to find Perl support and wisdom
The main site is https://perlmonks.org where you can post questions.
The author's page is https://perlmonks.org/?node=bliako
* CPAN Ratings
https://cpanratings.perl.org/d/Inline-CUDA
* Search CPAN
https://metacpan.org/release/Inline-CUDA
ACKNOWLEDGEMENTS
This module stands on the shoulders of giants, namely the authors of
Inline and Inline::C. I wish to thank them here and pass most credit to
them. I will keep 1%.
A big thank you to NVIDIA for providing tools and support for doing
numerical programming on their GPU.
All mentioned above provided keys to many doors, all free and open
source. Thank you!
LICENSE AND COPYRIGHT
This software is Copyright (c) 2021 by Andreas Hadjiprocopis.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)