Inline::CUDA - Inline NVIDIA's CUDA code and GPU processing from within any Perl script.
Version 0.15
WARNING: see section "INSTALLATION" for how to install this package.
WARNING: prior to installation, please install https://github.com/hadjiprocopis/perl-nvidia2-ml
Inline::CUDA is a module that allows you to write Perl subroutines in C or C++ with CUDA extensions.
Similarly to Inline::C, Inline::CUDA is not meant to be used directly but rather in this way:
#Firstly, specify some configuration options:
use Inline CUDA => Config =>
# optionally specify some options,
# you don't have to
# if they are already stored in a configuration file
# which is consulted before running any CUDA program
# host_compiler_bindir => '/usr/local/gcc82/bin',
# cc => '/usr/local/gcc82/bin/gcc82',
# cxx => '/usr/local/gcc82/bin/g++82',
# ld => '/usr/local/gcc82/bin/gcc82',
# nvcc => '/usr/local/cuda/bin/nvcc',
# nvld => '/usr/local/cuda/bin/nvcc',
# pass options to nvcc:
# this is how to deal with unknown compiler flags passed on to nvcc: pass them all to gcc
# only supported in nvcc versions 11+
# nvccflags => '--forward-unknown-to-host-compiler',
# do not check compiler version, use whatever the user wants
# nvccflags => '--allow-unsupported-compiler',
# this will use CC or CXX depending on the language specified here
# you can use C++ in your CUDA code, and there are tests in t/*
# which check if c or c++ and show how to do this:
host_code_language => 'c', # or 'c++' or 'cpp', case insensitive, see also cxx =>
# optional extra Include and Lib dirs
#inc => '-I...',
#libs => '-L... -l...',
# for debugging
BUILD_NOISY => 1,
# code will be left in ./_Inline/build/ after successful build
clean_after_build => 0,
warnings => 10,
;
# and then, suck in code from __DATA__ and run it at runtime
# notice that Inline->use(CUDA => <<'EOCUDA') is run at compiletime
my $codestr;
{ local $/ = undef; $codestr = <DATA> }
Inline->bind( CUDA => $codestr );
if( do_add() ){ die "error running do_add()..." }
1;
__DATA__
/* this is C code with CUDA extensions */
#include <stdio.h>
#define N 1000
/* This is the CUDA Kernel which nvcc compiles: */
__global__
void add(int *a, int *b) {
int i = blockIdx.x;
if (i<N) b[i] = a[i]+b[i];
}
/* this function can be called from Perl.
It returns 0 on success or 1 on failure.
This simple code does not support passing parameters in,
which is covered elsewhere.
*/
int do_add() {
cudaError_t err;
// Create int arrays on the CPU.
// ('h' stands for "host".)
int ha[N], hb[N];
// Create corresponding int arrays on the GPU.
// ('d' stands for "device".)
int *da, *db;
if( (err=cudaMalloc((void **)&da, N*sizeof(int))) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMalloc() has failed for %zu bytes for da: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
if( (err=cudaMalloc((void **)&db, N*sizeof(int))) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMalloc() has failed for %zu bytes for db: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
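// Initialise the input data on the CPU.
for (int i = 0; i<N; ++i){ ha[i] = i; hb[i] = 2*i; }
// Copy input data to arrays on GPU.
if( (err=cudaMemcpy(da, ha, N*sizeof(int), cudaMemcpyHostToDevice)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyHostToDevice) has failed for %zu bytes for ha->da: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
// The kernel also reads b[], so db must be initialised too.
if( (err=cudaMemcpy(db, hb, N*sizeof(int), cudaMemcpyHostToDevice)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyHostToDevice) has failed for %zu bytes for hb->db: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}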
// Launch GPU code with N blocks of one thread each, one block per array element.
add<<<N, 1>>>(da, db);
if( (err=cudaGetLastError()) != cudaSuccess ){
fprintf(stderr, "do_add(): error, failed to launch the kernel into the device: %s\n",
cudaGetErrorString(err)
);
return 1;
}
// Copy output array from GPU back to CPU.
if( (err=cudaMemcpy(hb, db, N*sizeof(int), cudaMemcpyDeviceToHost)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaMemcpy(cudaMemcpyDeviceToHost) has failed for %zu bytes for db->hb: %s\n",
N*sizeof(int), cudaGetErrorString(err)
);
return 1;
}
//for (int i = 0; i<N; ++i) printf("%d\n", hb[i]); // print results
// Free up the arrays on the GPU.
if( (err=cudaFree(da)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaFree() has failed for da: %s\n",
cudaGetErrorString(err)
);
return 1;
}
if( (err=cudaFree(db)) != cudaSuccess ){
fprintf(stderr, "do_add(): error, call to cudaFree() has failed for db: %s\n",
cudaGetErrorString(err)
);
return 1;
}
return 0;
}
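The launch add<<<N, 1>>> in the example above uses N blocks of one thread each. A more common layout puts several threads in each block, in which case the number of blocks has to be rounded up and each thread derives its element index from its block and thread numbers. Here is a minimal host-side sketch of that arithmetic in plain C. The names blocks_for() and global_index() are hypothetical helpers for illustration, they are not part of Inline::CUDA or of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many blocks are needed to cover n elements
   when each block has threads_per_block threads (ceiling division). */
size_t blocks_for(size_t n, size_t threads_per_block)
{
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Hypothetical helper: the element index a thread would compute inside
   a kernel as blockIdx.x * blockDim.x + threadIdx.x */
size_t global_index(size_t block_idx, size_t block_dim, size_t thread_idx)
{
    return block_idx * block_dim + thread_idx;
}
```

With N = 1000 and 256 threads per block, blocks_for(1000, 256) gives 4 blocks, i.e. 1024 threads in total, which is why a kernel keeps a guard such as if (i<N).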
The statement use Inline CUDA => ...; is executed at compile-time. Often this is not desirable because you may want to read code from a file, modify code at runtime or even auto-generate the inlined code at runtime. In these situations Inline provides bind().
Here is how to inline code read at runtime from a file called my_cruncher.cu, whose contents are exactly the same as the __DATA__ section in the previous example:
use Inline;
use File::Slurp;
my $data = read_file('my_cruncher.cu');
Inline->bind(CUDA => $data);
Using Inline->use(CUDA => "DATA") seems to have a problem when the __DATA__ section contains identifiers enclosed in double underscores, e.g. __global__ (a CUDA reserved keyword). One workaround is to declare #define CUDA_GLOBAL __global__ and then replace all __global__ with CUDA_GLOBAL.
Sometimes it is more convenient to configure Inline::CUDA not in a use statement (as above) but in a require statement. The latter is executed at runtime, whereas the former loads the module at compile time. This has certain benefits, as you can enclose it in a conditional, an eval or a try/catch block. This is how (thank you [email protected]):
require Inline;
# configuration:
Inline->import(
CUDA => Config =>
ccflagsex => '...'
);
# compile your code:
Inline->import(
CUDA => $my_code
);
The somewhat old news, at least since 2007, is that the Graphics Processing Unit (GPU) has found uses beyond its traditional role of calculating and displaying graphics on our computer monitor. This stems from the fact that a GPU is highly parallel computing machinery. Just as the operating system sends data and instructions to the GPU, frame after frame from the time it is booted, in order to display windows, widgets, transparent menus, spinning animations, video games and visual effects, a developer can now send data and instructions to the GPU for doing any sort of arithmetic calculation in a highly parallel manner. A case in point is matrix multiplication, where thousands of GPU computing elements process the matrices' elements in parallel. True parallelism, that is, as opposed to the CPU's parallelism, which is emulated or limited by the number of cores (2, 4 or 8 for cheap desktops). It goes without saying that GPU processing is very powerful and opens up a new world of number-crunching possibilities without the need for expensive super-computer capabilities.
NVIDIA's CUDA is "a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant" (from NVIDIA's site). In short, we use CUDA to dispatch number-crunching code to a Graphics Processing Unit (GPU) and then get the results back.
NVIDIA's CUDA comprises a few keywords which can be inserted in C, C++, Fortran, etc. code. In effect, developers still write programs in their preferred language (C, C++ etc.) and whenever they need to access the GPU they use the CUDA extensions. For more information check the CUDA Programming Guide.
A CUDA program is, therefore, a C or C++ program with a few CUDA keywords added. Generally, compiling such a program is done by a CUDA compiler, namely nvcc (nvidia cuda compiler) which, simplistically put, splits the code in two parts, the CUDA part and the C part. The C part is delegated to a C compiler, like gcc, and the CUDA part is handled by nvcc. Finally nvcc links these components into an ordinary standalone executable. For more information read the CUDA Toolkit Documentation.
Notice that in NVIDIA jargon, a "device" is (one of) the GPU(s) and "host" is the CPU and the OS.
In practice there are huge caveats, though conquering them can be surprisingly easy with some CLI magic. This is fine in Linux or even OSX but for poor M$-windows victims the same process can be painfully tortuous, possibly ending in a mental breakdown. As I don't belong to that category I will not be able to help you with very specific requests regarding the so-called OS.
And on to the caveats.
First of all, not all GPUs support CUDA. But new NVIDIA ones usually do, at a price of around 100 euros or less.
Secondly, different GPUs have different "compute capability", requiring different versions of the CUDA SDK, which provides nvcc and friends. For example, my GeForce GTX 650 has a compute capability of 3.0 and that requires SDK version 10.2. That's the last SDK to support a 3.0 capability GPU. Currently, the SDK has reached version 11.4 and supports compute capabilities of 3.5 to 8.6. See the Wikipedia article on CUDA for which GPUs are supported and by which CUDA SDK version.
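The compatibility rule quoted above (SDK 11.4 covers compute capabilities 3.5 to 8.6, so a 3.0 GPU falls outside it) can be sketched as a small range check in C. This is only an illustration of the rule as stated in this document. The function names are hypothetical and the range is hard-coded from the text, not queried from the CUDA toolkit.

```c
#include <assert.h>

/* Hypothetical helper: encode a compute capability "major.minor"
   as a single integer, e.g. 3.0 -> 30, 8.6 -> 86. */
int cc_encode(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor <= 9);
    return major * 10 + minor;
}

/* Hypothetical helper: returns 1 if the given compute capability lies
   in the range this document quotes for CUDA SDK 11.4 (3.5 to 8.6),
   0 otherwise. */
int sdk_11_4_supports(int major, int minor)
{
    int cc = cc_encode(major, minor);
    return cc >= cc_encode(3, 5) && cc <= cc_encode(8, 6);
}
```

For the GeForce GTX 650 (compute capability 3.0) the check returns 0, which is why that card needs the older SDK 10.2.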
Thirdly and most importantly, nvcc has specific and strict requirements regarding the version of the "host compiler", for example gcc/g++, clang, cl.exe
. See which compilers are supported at
For example, my GPU's compute capability (3.0) requires CUDA SDK version 10.2, which requires a gcc version less than or equal to 8. Find out which compiler your CUDA SDK supports in ax3l's gist.
There is a hack to stop nvcc from checking the compiler version and make it use whatever compiler the user specifies. Simply pass --allow-unsupported-compiler to nvcc and hope for the best. According to the CUDA Toolkit Documentation, this flag has no effect on macOS.
xt/30-check-basic-with-system-compiler.t shows how to tell Inline::CUDA to use the system compiler and also tell nvcc not to check the compiler version. This test can fail on particular OS versions. It seems to have worked for my particular setting. With this option you are at least safe from getting into trouble because of "Perl and XS objects with mismatched compiler versions".
Additionally, general GPU programming has, in practice, quite a few caveats of its own that the potential GPU programmer must be aware of. To start with, there are some quite large overheads associated with sending data to the GPU and receiving it back. This is because the memory generally accessible to any program running on the CPU (e.g. the C-part of the CUDA code) is not available to the GPU in the simple and elegant manner C programmers take for granted when they are handed a memory pointer and read the memory space it points to. And vice versa. Memory in the C-part of the code must be cudaMemcpy()'ed to the GPU (cudaMemcpy() is the equivalent of memcpy() for host-to-device and device-to-host data transfers). And the results calculated on the GPU remain there until they are transferred back to the host using another cudaMemcpy() call.
Add to this the overhead of copying the value of each item of a Perl array into a C array, which is what cudaMemcpy() understands and expects, and you get quite a significant overhead and a lot of paper-pushing for finally getting the same block of data onto the GPU. And the same applies in doing the reverse.
Here is a rough sketch of what memory transfers are required for calling an Inline::CUDA function from Perl and doing GPU processing:
my @array = (1..5); # memory allocated for Perl array
my $result;         # will receive an arrayref holding the results
inline_cuda_function(\@array, \$result);
...
// now inside a Inline::CUDA code block
int inline_cuda_function(SV *in, SV *out){
// allocate memory for copying Perl array (in) to C
int *h_in = (int *)malloc(5*sizeof(int));
// allocate memory for holding the results on host
int *h_out = (int *)malloc(5*sizeof(int));
// allocate memory on the GPU for this same data
int *d_in, *d_out;
cudaMalloc((void **)&d_in, 5*sizeof(int));
// allocate memory on the GPU for the result
cudaMalloc((void **)&d_out, 5*sizeof(int));
// transfer Perl data onto host's C-array
AV *anAV = (AV *)SvRV(in);
for(int i=0;i<5;i++){
SV *anSV = *av_fetch(anAV, i, FALSE);
h_in[i] = SvIV(anSV);
}
// and now transfer host's C-array onto the GPU
cudaMemcpy(d_in, h_in, 5*sizeof(int), cudaMemcpyHostToDevice);
// launch the kernel and do the processing onto the GPU
...
// extract results from the GPU onto host memory
cudaMemcpy(h_out, d_out, 5*sizeof(int), cudaMemcpyDeviceToHost);
// and now from host memory (the C array) onto Perl
// we have been passed a reference to a scalar, we create a new array
// and place a reference to it into that scalar
AV *av = newAV();
av_extend(av, 4); // pre-extend the Perl array up to index 4 to fit the result
// sv_setrv() is a macro created by LeoNerd, listed later in this document
// it places the new array we created onto the passed scalar (out)
sv_setrv(SvRV(out), (SV *)av);
for(int i=0;i<5;i++){
av_store(av, i, newSViv(h_out[i]));
}
free(h_in); free(h_out);
cudaFree(d_in); cudaFree(d_out);
return 0; // success
}
There are some benchmarks in xt/benchmarks/*.b which compare the performance of a small (size ~10x10), medium (size ~100x100) and large (size ~1000x1000) data scenario for doing matrix multiplication (run them with make benchmark). On my computer, at least, the pure-C, CPU-hosted version outperforms the GPU for the small and medium scenarios exactly because of these overheads. But the GPU is a clear winner for the large data scenario.
See for example this particular benchmark: xt/benchmarks/30-matrix-multiply.b
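To make the "pure-C, CPU-hosted" side of that comparison concrete, here is a minimal sketch of a naive matrix multiplication in plain C. It is an assumption about what such a baseline typically looks like, not the code of the benchmarks in this distribution. The function name matmul_cpu() and the row-major layout are chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU baseline: C = A * B with row-major storage.
   A is n x m, B is m x p and C is n x p. */
void matmul_cpu(const float *A, const float *B, float *C,
                size_t n, size_t m, size_t p)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < p; j++) {
            float sum = 0.0f;
            /* dot product of row i of A with column j of B */
            for (size_t k = 0; k < m; k++)
                sum += A[i * m + k] * B[k * p + j];
            C[i * p + j] = sum;
        }
    }
}
```

This version needs no memory transfers at all, which is why it can win for small matrices, while its n*m*p multiply-adds run one after the other and lose to the GPU once the matrices become large.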
Finally, there is an issue with compiling XS code, which is essentially what Inline::CUDA does, with a compiler different from the one the current Perl was built with. This is the case when a special host compiler had to be installed because of the CUDA SDK version. If that's true then you are essentially loading XS code compiled with gcc82 (as per the example in section "INSTALLATION") into a perl executable which was compiled with the system compiler, for example gcc11. If that is really an issue then it will be insurmountable and the only solution will be to perlbrew a new Perl built with the special host compiler, e.g. gcc82.
The manual on installing Perl states that specifying the compiler is as simple as sh Configure -Dcc=/usr/local/gcc82/bin/gcc82
If you want to compile and install a new Perl using perlbrew then this will do it (thank you [email protected]):
PERLBREW_CONFIGURE_FLAGS='-d -Dcc=/usr/local/gcc82/bin/gcc82' perlbrew install 5.38.2 --as 5.38.2.gcc82
The -d is for not being asked trivial questions about the compilation options and for using sane defaults. The --as 5.38.2.gcc82 tells perlbrew to give the newly installed perl a different name in case there is already one with the same name.
Installation of Inline::CUDA is a nightmare because it depends on external dependencies. It needs NVIDIA's CUDA SDK, which provides nvcc (the nvidia cuda compiler), which in turn requires specific host compiler versions. This means that it is very likely that you will also need to install on your system an older compiler compatible with your nvcc version. Even if your GPU supports the latest CUDA SDK version (at 11.4 as of July 2021), the maximum gcc version allowed with that is 10.21. Currently, gcc is at version 11.2 and upgrades monthly.
Installing a "private" compiler, in Linux, can be easy or hard depending on whether the package manager allows it. Mine does not. See "how-to-install-compiler" for instructions on how to do that on Linux and label the new compiler with its own name so that the system compiler and the older compiler can live in parallel without disturbing each other.
That said, there is a workaround: pass the --allow-unsupported-compiler flag to nvcc. This can be achieved via use Inline CUDA => Config => ..., as below:
use Inline CUDA => Config =>
nvccflags => '--allow-unsupported-compiler',
... # other config options
;
... # Inline::CUDA etc.
The long and proper way of installing Inline::CUDA is described below.
So, if all goes Murphy, you will have to install nvcc and an additional host compiler, gcc. The latter is not the most pleasant of experiences in Linux. I don't know what the situation is with Windows. I can only imagine the horror.
Here is a rough sketch of what one should do.
- Find the name and version of the NVIDIA GPU installed in your machine, for example GeForce GTX 650. This can be easy or hard. Either:
    - run nvidia-smi, if you already have the executable installed or want to install it (e.g. on the Fedora CLI do dnf provides nvidia-smi and make sure you have the repo rpmfusion-nonfree enabled, somehow), or
    - install nvidia::ml and run the script I provide with Inline::CUDA at scripts/nvidia-ml-test.pl
- With the NVIDIA GPU name and version available, search this Wikipedia article for the "compute capability" of the GPU. For example, this is 3.0 for GeForce GTX 650.
- Use the "compute capability" of the GPU to find, in the same Wikipedia article, the CUDA SDK version you must install. For example, for the GPU GeForce GTX 650, one should download and install CUDA SDK 10.2.
- Download, but do not yet install, the specific version of the CUDA SDK from the CUDA Toolkit Archive.
- If you are lucky, your system's C compiler will be compatible with the CUDA SDK version you downloaded and installing the above archive will be successful. It is worth giving it a try, i.e. try to install and see if it complains about an incompatible host compiler version. If it doesn't then you are good to go.
- If installing the above archive yields errors about an incompatible host compiler then you must install a supported host compiler at a private path (so as not to interfere with your actual system compiler) and provide that path during installation of the CUDA SDK (see below) and also during installation of Inline::CUDA (see below).
- Find the maximum host compiler version supported by the CUDA SDK you downloaded. For example, CUDA SDK 10.2 in Linux is documented at https://docs.nvidia.com/cuda/archive/10.2/cuda-installation-guide-linux/. It states that the maximum gcc version is 8.2.1 for RHEL 8.1. I suspect that it is the compiler's major version, e.g. 8, that matters. I can confirm that gcc 8.4.0 works fine for Linux, Fedora 34, kernel 5.12, perl v5.32, GeForce GTX 650.
- Once you decide on the compiler version, download it and install it to a private path so as not to interfere with the system compiler. Note that path for later use.
- I have instructions on how to do the above, in Linux, for gcc. Download the specific gcc version from ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/ (other mirrors exist here https://gcc.gnu.org/mirrors.html). Compile the compiler and make sure you give it a prefix and a suffix. You must also download packages https://ftp.gnu.org/gnu/mpfr, https://ftp.gnu.org/gnu/mpc/ and https://ftp.gnu.org/gnu/gmp/, choosing versions compatible with the gcc version you have already downloaded. The crucial line in the configuration stage of compiling gcc is
    configure --prefix=/usr/local/gcc82 --program-suffix=82 --enable-languages=c,c++ --disable-multilib --disable-libstdcxx-pch
  Here is a gist from https://stackoverflow.com/questions/58859081/how-to-install-an-older-version-of-gcc-on-fedora:
    tar xvf gcc-8.2.0.tar.xz
    cd gcc-8.2.0/
    tar xvf mpfr-4.0.2.tar.xz && mv -v mpfr-4.0.2 mpfr
    tar xvf gmp-6.1.2.tar.xz && mv -v gmp-6.1.2 gmp
    tar xvf mpc-1.1.0.tar.gz && mv -v mpc-1.1.0 mpc
    cd ../
    mkdir build-gcc820
    cd build-gcc820/
    ../gcc-8.2.0/configure --prefix=/usr/local/gcc82 --program-suffix=82 --enable-languages=c,c++,fortran --disable-multilib --disable-libstdcxx-pch
    make && make install
  From now on, I will be using /usr/local/gcc82/bin/gcc82 and /usr/local/gcc82/bin/g++82 as my host compilers.
- Now you have our special compiler at /usr/local/gcc82 under the name /usr/local/gcc82/bin/gcc82 and also /usr/local/gcc82/bin/g++82. We need to install the CUDA SDK and tell it to skip checking host compiler compatibility (I don't think there is a way to point it to the correct compiler to use). In Linux, this is like
    sh cuda_10.2.89_440.33.01_linux.run --override
  After a successful installation you should be able to see /usr/local/cuda/bin/nvcc. Optionally add this to your PATH:
    export PATH="${PATH}:/usr/local/cuda/bin"
- In general, compiling CUDA code, for example this one, is as simple as:
    nvcc --compiler-bindir /usr/local/gcc82/bin/gcc82 simple.cu && ./a.out
  Notice the cuda program extension .cu. It is important to keep nvcc happy. Also note that if your CUDA SDK does not require installing an older version of a compiler but is instead happy with your system compiler, then you can omit this: --compiler-bindir /usr/local/gcc82/bin/gcc82
- If you did compile the simple cuda program and managed to run it, then you are ready to install Inline::CUDA. If your system compiler is accepted by the CUDA SDK, then it is as simple as running
    perl Makefile.PL
    make
    make install
  But if you need to declare a special host compiler (re: /usr/local/gcc82/bin/gcc82) because your system compiler is not accepted by the CUDA SDK then you need to specify that to the installation process via one of the following two methods:
    - The first method is more permanent but assumes that you can (re-)install the module. During installation, specify the following environment variables. Assuming a bash-based terminal, this should do it:
        CC=/usr/local/gcc82/bin/gcc82 \
        CXX=/usr/local/gcc82/bin/g++82 \
        LD=/usr/local/gcc82/bin/g++82 \
        perl Makefile.PL
        make
        make install
    - The second method assumes you can edit Inline::CUDA's configuration file, located at a place like /usr/local/share/perl5/5.32/auto/share/dist/Inline-CUDA/Inline-CUDA.conf (different systems will have a slightly different path), and modify the entries for 'cc', 'cxx' and 'ld'.
- Whatever the host compiler was, the configuration will be saved in a file called Inline-CUDA.conf. This file will be saved in a share-dir relative to your current Perl installation path. As an example mine is at /usr/local/share/perl5/5.32/auto/share/dist/Inline-CUDA/Inline-CUDA.conf. This configuration file will be consulted every time you use Inline::CUDA and will know where the special host compiler resides.
- Finally, make test will run a suite of test scripts and if all goes well all will succeed. Additionally, make benchmark will run a matrix multiplication benchmark which will reveal if you can indeed get any benefits using GPGPU on your specific hardware for this specific problem. Feel free to extend benchmarks for your use-case.
- At this stage I would urge people installing the code to also run make author-test and report back errors.
The folder demos/ in the base dir of the current distribution contains self-contained Inline::CUDA demo(s). One of them produces the Mandelbrot Fractal on the GPU using Cuda code copied from marioroy's excellent work at https://github.com/marioroy/mandelbrot-python, see also the PerlMonks post at https://perlmonks.org/?node_id=11139880. The demo is not complete, it just plugs marioroy's Cuda code into Inline::CUDA.
From the base dir of the current distribution run:
make demo
In your CUDA code do not implement main()! Place your CUDA code in your own functions which you call from Perl. If you get segmentation faults check the above first.
This is a module which stands on the shoulders of Giants.
Literally!
To start with, CUDA and the nvidia cuda compiler are two NVIDIA projects which offer general programming on the GPU to the masses, opening a new world of computational capabilities as an alternative to the traditional CPU model.
A big thank you to NVIDIA.
Then there is Perl's Inline module created by Ingy döt Net. This module makes it easy to inline a lot of computer languages and call them within a Perl script, passing Perl data structures and obtaining results back.
This module is the key to opening many doors for Perl scripts.
A big thank you to Ingy döt Net.
Then there is Perl's Inline::C module created/co-created/maintained by Ingy döt Net, Sisyphus and Tina Müller.
The current Inline::CUDA module relies heavily on Inline::C. Because the underlying CUDA language is C, I decided that, instead of copying what Inline::C does and modifying the section where the Makefile is written, I would inject all of Inline::C's subs into Inline::CUDA, except for some sections which require special treatment, like writing the Makefile and allowing some special Config keywords. The sub injection happens every time the module is called, and that definitely adds a tiny overhead which, in my opinion, is compensated by the huge advantage of not copy-pasting code from Inline::C into Inline::CUDA and then incorporating my changes every time Inline::C updates.
A big thank you to Ingy döt Net (again!),
Sisyphus and Tina Müller.
For writing test cases and benchmarks I had to descend into C and become acquainted with perlguts, e.g. what an SV is. In this process I had to ask for the wisdom of PerlMonks.org and #perl. A particular question was how to pass an arrayref, a scalar or a scalarref into a C function and store the results of the computation in there, in a call-by-reference manner. Fortunately LeoNerd at #perl created the following sv_setrv() macro which saved the day. Big thank you LeoNerd.
/************************************************************/
/* MONKEYPATCH by LeoNerd to set an arrayref into a scalarref
As posted on https://kiwiirc.com/nextclient/#irc://irc.perl.org/#perl
at 10:50 23/07/2021
A BIG THANK YOU LeoNerd
*/
#define HAVE_PERL_VERSION(R, V, S) \
(PERL_REVISION > (R) || (PERL_REVISION == (R) && (PERL_VERSION > (V) || (PERL_VERSION == (V) && (PERL_SUBVERSION >= (S))))))
#define sv_setrv(s, r) S_sv_setrv(aTHX_ s, r)
static void S_sv_setrv(pTHX_ SV *sv, SV *rv)
{
sv_setiv(sv, (IV)rv);
#if !HAVE_PERL_VERSION(5, 24, 0)
SvIOK_off(sv);
#endif
SvROK_on(sv);
}
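The HAVE_PERL_VERSION(R, V, S) test in the macro above is an ordered comparison of the three version components: revision first, then version, then subversion. A stand-alone sketch of the same logic as a plain C function, usable without the Perl headers, might look as follows. The name version_at_least() is hypothetical and the function takes the running version as explicit arguments instead of reading PERL_REVISION, PERL_VERSION and PERL_SUBVERSION.

```c
#include <assert.h>

/* Hypothetical stand-alone equivalent of HAVE_PERL_VERSION(R, V, S):
   returns 1 if version rev.ver.sub is at least R.V.S, otherwise 0. */
int version_at_least(int rev, int ver, int sub, int R, int V, int S)
{
    assert(rev >= 0 && ver >= 0 && sub >= 0);
    if (rev != R) return rev > R;  /* revision decides if it differs  */
    if (ver != V) return ver > V;  /* then the version                */
    return sub >= S;               /* finally the subversion          */
}
```

For example, a perl 5.22.4 is not at least 5.24.0, so the SvIOK_off(sv) branch of the macro above would be compiled in for it.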
I copied numerical recipes (as C code, Cuda kernels, etc.) from the repository of Zhengchun Liu. This code resides in 'C/inlinecuda' of the current distribution and offers shortcuts to GPU-based matrix multiplication, for example.
The idea for this project came to me when kcott asked a question at https://www.perlmonks.org/?node_id=11134476, to which I responded with the preliminary idea for what is now Inline::CUDA. A big thank you to kcott.
I got helpful comments, advice and the odd smiley from LeoNerd, mst, Bojte and shlomif at #perl, thank you.
I got helpful comments and advice in this PerlMonks.org post from syphilis and perlfan, although the problem was cracked by LeoNerd #perl.
I also got helpful comments and advice from Ed J when I filed a bug over at ExtUtils::MakeMaker (see https://rt.cpan.org/Ticket/Display.html?id=138022 and https://rt.cpan.org/Ticket/Display.html?id=137912).
Andreas Hadjiprocopis, <bliako at cpan dot org>, <andreashad2 at gmail dot com>, https://perlmonks.org/?node=bliako
!Almaz!
Please report any bugs or feature requests to bug-inline-cuda at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Inline-CUDA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
NOTE: this project is not yet on CPAN so report bugs by email to the author. I am not very comfortable with github so cloning and merging and pushing and pulling are beyond me.
You can find documentation for this module with the perldoc command.
perldoc Inline::CUDA
You can also look for information at:
-
RT: CPAN's request tracker (report bugs here)
-
PerlMonks.org : a great forum to find Perl support and wisdom
The main site is https://perlmonks.org, where you can post questions. The author's page is https://perlmonks.org/?node=bliako
-
CPAN Ratings
-
Search CPAN
This module stands on the shoulders of giants, namely the authors of Inline and Inline::C. I wish to thank them here and pass most credit to them. I will keep 1%.
A big thank you to NVIDIA for providing tools and support for doing numerical programming on their GPU.
All mentioned above provided keys to many doors, all free and open source. Thank you!
This software is Copyright (c) 2021 by Andreas Hadjiprocopis.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)