Initial commit.

angavrilov · Sep 27, 2010 · commit 9f28acd

Showing 14 changed files with 3,406 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,25 @@
+(This is the MIT / X Consortium license as taken from
+ http://www.opensource.org/licenses/mit-license.html on or about
+ Monday; July 13, 2009)
+
+Copyright (c) 2010 by Alexander Gavrilov
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
diff --git a/README b/README
@@ -0,0 +1,8 @@
+This module implements SSE intrinsic functions for ECL and SBCL.
+
+NOTE: CURRENTLY THIS SHOULD BE CONSIDERED EXPERIMENTAL, AND
+      SUBJECT TO INCOMPATIBLE CHANGES IN A FUTURE RELEASE.
+
+Since the implementation is closely tied to the internals of
+the compiler, it should normally be obtained exclusively via
+the bundled contrib mechanism of the above implementations.
diff --git a/cl-simd.asd b/cl-simd.asd
@@ -0,0 +1,44 @@
+;;; -*- mode: Lisp; indent-tabs-mode: nil; -*-
+;;;
+;;; Copyright (C) 2010, Alexander Gavrilov ([email protected])
+;;;
+;;; This file defines the cl-simd ASDF system.
+;;;
+;;; Note that a completely independent definition
+;;; is used to build the system as an ECL contrib.
+
+(defsystem :cl-simd
+  :version "1.0"
+  #+sb-building-contrib :pathname
+  #+sb-building-contrib #p"SYS:CONTRIB;CL-SIMD;"
+  :components
+  #+(and sbcl sb-sse-intrinsics)
+  ((:file "sse-package")
+   (:file "sbcl-core" :depends-on ("sse-package"))
+   (:file "sse-intrinsics" :depends-on ("sbcl-core"))
+   (:file "sbcl-functions" :depends-on ("sse-intrinsics"))
+   (:file "sbcl-arrays" :depends-on ("sbcl-functions"))
+   (:file "sse-array-defs" :depends-on ("sbcl-arrays"))
+   (:file "sse-utils" :depends-on ("sse-array-defs")))
+  #+(and ecl sse2)
+  ((:file "sse-package")
+   (:file "ecl-sse-core" :depends-on ("sse-package"))
+   (:file "sse-intrinsics" :depends-on ("ecl-sse-core"))
+   (:file "sse-array-defs" :depends-on ("sse-intrinsics"))
+   (:file "ecl-sse-utils" :depends-on ("sse-intrinsics"))
+   (:file "sse-utils" :depends-on ("ecl-sse-utils")))
+  #-(or (and sbcl sb-sse-intrinsics)
+        (and ecl sse2))
+  ())
+
+#+(or (and sbcl sb-sse-intrinsics)
+      (and ecl sse2))
+(defmethod perform :after ((o load-op) (c (eql (find-system :cl-simd))))
+  (provide :cl-simd))
+
+(defmethod perform ((o test-op) (c (eql (find-system :cl-simd))))
+  #+(or (and sbcl sb-sse-intrinsics)
+        (and ecl sse2))
+  (or (load (compile-file "test-sfmt.lisp"))
+      (error "test-sfmt failed")))
+
diff --git a/cl-simd.texinfo b/cl-simd.texinfo
@@ -0,0 +1,236 @@
+@node cl-simd
+@section cl-simd
+@cindex SSE2 Intrinsics
+@cindex Intrinsics, SSE2
+
+The @code{cl-simd} module provides access to SSE2 instructions
+(which are nowadays supported by any CPU compatible with x86-64)
+in the form of @emph{intrinsic functions}, similar to the way
+adopted by modern C compilers. It also provides some lisp-specific
+functionality, like setf-able intrinsics for accessing lisp arrays.
+
+When this module is loaded, it defines an @code{:sse2} feature,
+which can be subsequently used for conditional compilation of
+code that depends on it. Intrinsic functions are available from
+the @code{sse} package.
+
+This API, with minor technical differences, is supported by both
+ECL and SBCL (x86-64 only).
+
+@menu
+* SSE pack types::
+* SSE array type::
+* Differences from C intrinsics::
+* Simple extensions::
+* Lisp array accessors::
+* Example::
+@end menu
+
+@node SSE pack types
+@subsection SSE pack types
+
+The package defines and/or exports the following types to
+represent 128-bit SSE register contents:
+
+@example
+(sse-pack &optional item)
+
+int-sse-pack = (sse-pack integer)
+float-sse-pack = (sse-pack single-float)
+double-sse-pack = (sse-pack double-float)
+@end example
+
+Declaring variable types using the subtype appropriate
+for your data is likely to lead to more efficient code
+(especially on ECL). However, the compiler implicitly
+casts between any subtypes of sse-pack when needed.
+
+Printed representation of SSE packs can be controlled
+by binding @code{*sse-pack-print-mode*} to one of
+@code{NIL} (default), @code{:int}, @code{:float} or
+@code{:double}. When the variable is set to NIL, the
+implementation makes its best effort to guess from
+the data and context.
+
+@node SSE array type
+@subsection SSE array type
+
+@example
+(sse-array element-type &optional dimensions)
+@end example
+
+This deftype expands to a lisp array type that is
+efficiently supported by AREF-like accessors. It
+should be assumed to be a subtype of @code{SIMPLE-ARRAY}.
+
+Aligned SSE arrays can be created through the use
+of the @code{make-sse-array} function:
+
+@example
+(make-sse-array dimensions &key (element-type '(unsigned-byte 8))
+                initial-element displaced-to displaced-index-offset)
+@end example
+
+It is guaranteed to return an object of type @code{sse-array},
+and in the non-displaced case ensures alignment of the beginning
+of data to the 16-byte boundary.
+
+On ECL this function supports full-featured displacement.
+On SBCL it has to simulate it by sharing the underlying
+data vector, and does not support nonzero index offset.
+
+@node Differences from C intrinsics
+@subsection Differences from C intrinsics
+
+Intel Compiler, GCC and
+@url{http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.80%29.aspx,MSVC}
+all support the same set
+of SSE intrinsics, originally designed by Intel. This
+package generally follows the naming scheme of the C
+version, with the following exceptions:
+
+@itemize
+@item
+Underscores are replaced with dashes, and the @code{_mm_}
+prefix is removed in favor of packages.
+
+@item
+The 'e' from @code{epi} is dropped because MMX is obsolete
+and won't be supported.
+
+@item
+@code{_si128} functions are renamed to @code{-pi} for uniformity
+and brevity. The author has personally found this discrepancy
+in the original C intrinsics naming highly jarring.
+
+@item
+Comparisons are named using graphic characters, e.g. @code{<=-ps}
+for @code{cmpleps}, or @code{/>-ps} for @code{cmpngtps}. In some
+places the set of comparison functions is extended to cover the
+full possible range.
+
+@item
+Conversion functions are renamed to @code{convert-*-to-*} and
+@code{truncate-*-to-*}.
+
+@item
+A few functions are completely renamed: @code{cpu-mxcsr} (setf-able),
+@code{cpu-pause}, @code{cpu-load-fence}, @code{cpu-store-fence},
+@code{cpu-memory-fence}, @code{cpu-clflush}, @code{cpu-prefetch-*}.
+@end itemize
+
+In addition, foreign pointer access intrinsics have an additional
+optional integer offset parameter to allow more efficient coding
+of pointer dereference, and the most common ones have been renamed
+and made SETF-able:
+
+@itemize
+@item
+@code{mem-ref-ss}, @code{mem-ref-ps}, @code{mem-ref-aps}
+
+@item
+@code{mem-ref-sd}, @code{mem-ref-pd}, @code{mem-ref-apd}
+
+@item
+@code{mem-ref-pi}, @code{mem-ref-api}, @code{mem-ref-si64}
+@end itemize
+
+(The @code{-ap*} version requires alignment.)
+
+@node Simple extensions
+@subsection Simple extensions
+
+This module extends the set of basic intrinsics with the following
+simple compound functions:
+
+@itemize
+@item
+@code{neg-ss}, @code{neg-ps}, @code{neg-sd}, @code{neg-pd},
+@code{neg-pi8}, @code{neg-pi16}, @code{neg-pi32}, @code{neg-pi64}:
+
+implement numeric negation of the corresponding data type.
+
+@item
+@code{not-ps}, @code{not-pd}, @code{not-pi}:
+
+implement bitwise logical inversion.
+
+@item
+@code{if-ps}, @code{if-pd}, @code{if-pi}:
+
+perform element-wise combining of two values based on a boolean
+condition vector produced as a combination of comparison function
+results through bitwise logical functions.
+
+The condition value must use all-zero bitmask for false, and
+all-one bitmask for true as a value for each logical vector
+element. The result is undefined if any other bit pattern is used.
+
+N.B.: these are @emph{functions}, so both branches of the
+conditional are always evaluated.
+@end itemize
+
+The module also provides symbol macros that expand into expressions
+producing certain constants in the most efficient way:
+
+@itemize
+@item
+0.0-ps 0.0-pd 0-pi for zero
+
+@item
+true-ps true-pd true-pi for all 1 bitmask
+
+@item
+false-ps false-pd false-pi for all 0 bitmask (same as zero)
+@end itemize
+
+@node Lisp array accessors
+@subsection Lisp array accessors
+
+In order to provide better integration with ordinary lisp code,
+this module implements a set of AREF-like memory accessors:
+
+@itemize
+@item
+@code{(ROW-MAJOR-)?AREF-PREFETCH-(T0|T1|T2|NTA)} for cache prefetch.
+
+@item
+@code{(ROW-MAJOR-)?AREF-CLFLUSH} for cache flush.
+
+@item
+@code{(ROW-MAJOR-)?AREF-[AS]?P[SDI]} for whole-pack read & write.
+@end itemize
+
+(Where A = aligned; S = aligned streamed write.)
+
+These accessors can be used with any non-bit specialized
+array or vector, without restriction on the precise element
+type (although it should be declared at compile time to
+ensure generation of the fastest code).
+
+Additional index bound checking is done to ensure that 16
+bytes of memory are accessible after the specified index.
+
+As an exception, ROW-MAJOR-AREF-PREFETCH-* does not do any
+range checks at all, because the prefetch instructions
+are officially safe to use with bad addresses. The
+AREF-PREFETCH-* and *-CLFLUSH functions do only ordinary
+index checks without the 16-byte extension.
+
+@node Example
+@subsection Example
+
+This code processes several single-float arrays, storing
+either the value of a*b, or c/3.5 into result, depending
+on the sign of mode:
+
+@example
+(loop for i from 0 below 128 by 4
+   do (setf (aref-ps result i)
+            (if-ps (<-ps (aref-ps mode i) 0.0-ps)
+                   (mul-ps (aref-ps a i) (aref-ps b i))
+                   (div-ps (aref-ps c i) (set1-ps 3.5)))))
+@end example
+
+As already noted above, both branches of the if are always
+evaluated.
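
A note on the manual's example, not part of the commit: the sketch below wraps that loop in a function and pairs it with a plain scalar fallback selected through the `:sse2` feature the manual mentions. The function name `blend`, the type declaration, and the requirement that the array length be a multiple of 4 are assumptions made for illustration. The SSE branch also assumes the module has already been loaded (for example with `(require :cl-simd)`), so that the `sse` package exists whenever `:sse2` is present.

```lisp
;; Scalar fallback: plain Common Lisp, used when :sse2 is absent.
#-sse2
(defun blend (result mode a b c)
  "Store a*b where MODE is negative, otherwise c/3.5, element by element."
  (dotimes (i (length result) result)
    (setf (aref result i)
          (if (< (aref mode i) 0.0)
              (* (aref a i) (aref b i))
              (/ (aref c i) 3.5)))))

;; SSE variant: the same computation, four single-floats per iteration.
;; Hypothetical wrapper around the loop from the manual; it assumes
;; (length result) is a multiple of 4 so every pack access is in bounds.
#+sse2
(defun blend (result mode a b c)
  "Store a*b where MODE is negative, otherwise c/3.5, four lanes at a time."
  (declare (type (sse:sse-array single-float (*)) result mode a b c))
  (loop for i from 0 below (length result) by 4
        do (setf (sse:aref-ps result i)
                 (sse:if-ps (sse:<-ps (sse:aref-ps mode i) sse:0.0-ps)
                            (sse:mul-ps (sse:aref-ps a i) (sse:aref-ps b i))
                            (sse:div-ps (sse:aref-ps c i) (sse:set1-ps 3.5)))))
  result)
```

The scalar version evaluates only the branch it needs for each element, whereas `if-ps` computes both products and quotients for every lane and then selects by mask, which is the behaviour the manual warns about.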