Merge pull request #20 from gomlx/io
v0.5.0: Added direct access to PJRT buffers when PJRT is running on CPU; added benchmarks.
janpfeifer authored Dec 19, 2024
2 parents c8c81d9 + 4ef585a commit 3e4e41d
Showing 29 changed files with 1,975 additions and 521 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/go.yaml
@@ -42,7 +42,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: "1.22.x"
go-version: "1.23.x"

- name: Install Gopjrt C library gomlx_xlabuilder and PJRT plugin
shell: bash
14 changes: 13 additions & 1 deletion README.md
@@ -329,7 +329,7 @@ Also, see [this blog post](https://opensource.googleblog.com/2024/03/pjrt-plugin
Because of https://github.com/golang/go/issues/13467 : C API's cannot be exported across packages, even within the same repo.
Even a function as simple as `func Add(a, b C.int) C.int` in one package cannot be called from another.
So we need to wrap everything, and more than that, one cannot create separate sub-packages to handle separate concerns.
THis is also the reason the library `chelper.go` is copied in both `pjrt` and `xlabuilder` packages.
This is also the reason the library `chelper.go` is copied in both `pjrt` and `xlabuilder` packages.
* **Why does PJRT spit out so much logging? Can we disable it?**
This is a great question ... imagine if every library we use decided they also want to clutter our stderr?
I have [an open question in Abseil about it](https://github.com/abseil/abseil-cpp/discussions/1700).
@@ -340,6 +340,18 @@ Also, see [this blog post](https://opensource.googleblog.com/2024/03/pjrt-plugin
before calling `pjrt.GetPlugin`. But it may have unintended consequences, if some other library is depending
on the fd 2 to work, or if a real exceptional situation needs to be reported and is not.

## Environment Variables

These help control or debug how **gopjrt** works:

* `PJRT_PLUGIN_LIBRARY_PATH`: Path to search for PJRT plugins. **gopjrt** also searches in `/usr/local/lib/gomlx/pjrt`,
the standard library paths for the system and `$LD_LIBRARY_PATH`.
* `XLA_DEBUG_OPTIONS`: If set, it is parsed as a `DebugOptions` proto that
is passed during the JIT-compilation (`Client.Compile()`) of a computation graph.
How PJRT uses it is not documented (e.g. I observed a significant slowdown when this is set,
even if set to the default values), but [the proto has some documentation](https://github.com/gomlx/gopjrt/blob/main/protos/xla.proto#L40).
* `GOPJRT_INSTALL_DIR` and `GOPJRT_NOSUDO`: used by the install scripts, see "Installing" section above.

## Links to documentation

* [Google Drive Directory with Design Docs](https://drive.google.com/drive/folders/18M944-QQPk1E34qRyIjkqDRDnpMa3miN): Some links are outdated or redirected, but very valuable information.
5 changes: 3 additions & 2 deletions c/WORKSPACE
@@ -21,11 +21,12 @@ http_archive(
# Notice bazel.sh scrapes the line below for the OpenXLA version, the format
# of the line should remain the same (the hash in between quotes), or bazel.sh
# must be changed accordingly.
OPENXLA_XLA_COMMIT_HASH = "90af2896ab4992ff14a1cd2a75ce02e43f46c090" # From 2024-11-24
# OPENXLA_XLA_COMMIT_HASH = "90af2896ab4992ff14a1cd2a75ce02e43f46c090" # From 2024-11-24
OPENXLA_XLA_COMMIT_HASH = "e2e8952ad0fac8833e9a78f9b3689e803ff8524f" # From 2024-12-11

http_archive(
name = "xla",
sha256 = "a910124d546bc79edb685612edaa3d56153f0e0927f967e8defaf312b833d404", # From 2024-11-24
sha256 = "5ec6919a25952fa790904983481ccb51ebbe20bbc53e15ddbb6d3e0b3aa3dfe1", # From 2024-12-11
strip_prefix = "xla-" + OPENXLA_XLA_COMMIT_HASH,
urls = [
"https://github.com/openxla/xla/archive/{hash}.zip".format(hash = OPENXLA_XLA_COMMIT_HASH),
8 changes: 3 additions & 5 deletions chelper.go
@@ -31,17 +31,15 @@ func cSizeOf[T any]() C.size_t {
// It must be manually freed with cFree() by the user.
func cMalloc[T any]() (ptr *T) {
size := cSizeOf[T]()
cPtr := (*T)(C.malloc(size))
C.memset(unsafe.Pointer(cPtr), 0, size)
cPtr := (*T)(C.calloc(1, size))
return cPtr
}

// cMallocArray allocates space to hold n copies of T in the C heap and initializes it to zero.
// It must be manually freed with C.free() by the user.
func cMallocArray[T any](n int) (ptr *T) {
size := cSizeOf[T]() * C.size_t(n)
cPtr := (*T)(C.malloc(size))
C.memset(unsafe.Pointer(cPtr), 0, size)
size := cSizeOf[T]()
cPtr := (*T)(C.calloc(C.size_t(n), size))
return cPtr
}

4 changes: 2 additions & 2 deletions cmd/run_coverage.sh
@@ -2,6 +2,6 @@

# Run this from the root of gopjrt repository to generate docs/coverage.out with the coverage data.

PACKAGE_COVERAGE="./pjrt ./xlabuilder"
go test -v -cover -coverprofile docs/coverage.out -coverpkg ${PACKAGE_COVERAGE}
PACKAGE_COVERAGE="github.com/gomlx/gopjrt/pjrt,github.com/gomlx/gopjrt/xlabuilder"
go test -cover -coverprofile docs/coverage.out -coverpkg="${PACKAGE_COVERAGE}" ./... -test.count=1 -test.short
go tool cover -func docs/coverage.out -o docs/coverage.out
22 changes: 21 additions & 1 deletion docs/CHANGELOG.md
@@ -1,11 +1,31 @@
# Next
# v0.5.0 - 2024/12/19 - Adding direct access to PJRT buffers for CPU.

* Added `install_linux_amd64_amazonlinux.sh` and pre-built libraries for amazonlinux (built using old glibc support).
* Fixed installation scripts: s/sudo/$_SUDO. Also made them more verbose.
* Removed dependency on `xargs` in installation script for Linux.
* Improved documentation on Nvidia GPU card detection, and error message if not found.
* Updated GitHub action (`go.yaml`) to only change the README.md with the result of the change, if pushing to the
`main` branch.
* Added `pjrt.arena` to avoid costly allocations for CGO calls, and merged some of the CGO calls for general speed-ups.
The following functions had > 50% improvements on their fixed-cost execution time (measured on transfers with
1 value, and minimal programs), **not on the variable part**:
* `Buffer.ToHost()`
* `Client.BufferFromHost()`
* `LoadedExecutable.Execute()`
* Added `BufferToHost` and `BufferFromHost` benchmarks.
* Added support for environment variable `XLA_DEBUG_OPTIONS`: if set, it is parsed as a `DebugOptions` proto that
is passed to the JIT-compilation of a computation graph.
* `LoadedExecutable.Execute()` now waits for the end of the execution (by setting
`PJRT_LoadedExecutable_Execute_Args.device_complete_events`).
The previous behavior led to odd results and was undefined (not documented).
* Package `dtypes`:
* Added tests;
* Added `SizeForDimensions()` to be used for dtypes that use fractions of a byte (like 4 bits).
* Added `Client.NewSharedBuffer` (and the lower level `client.CreateViewOfDeviceBuffer()`) to create buffers with shared
memory with the host, for faster input.
* Added `AlignedAlloc` and `AlignedFree` required by `client.CreateViewOfDeviceBuffer`.
* Added `Buffer.Data` for direct access to a buffer's data. Undocumented in PJRT, and likely only works on CPU.
* Fixed coverage script.
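
To illustrate why sub-byte dtypes need a size computed over the whole shape rather than per element, here is a minimal sketch. The function `sizeInBytes` is hypothetical and is not the signature of `SizeForDimensions()`, which this changelog does not show. It assumes elements are tightly packed and that the total is rounded up to a whole byte.

```go
package main

import "fmt"

// sizeInBytes is a hypothetical helper, not gopjrt's dtypes API.
// It returns the number of bytes needed to store a tensor with the given
// dimensions when each element takes bitsPerElement bits, assuming tight
// packing and rounding the total up to a whole byte.
func sizeInBytes(bitsPerElement int, dimensions ...int) int {
	numElements := 1 // A scalar (no dimensions) holds one element.
	for _, dim := range dimensions {
		numElements *= dim
	}
	totalBits := numElements * bitsPerElement
	return (totalBits + 7) / 8 // Ceiling division by 8.
}

func main() {
	fmt.Println(sizeInBytes(4, 3))     // 3 elements * 4 bits = 12 bits, rounded up to 2 bytes
	fmt.Println(sizeInBytes(4, 2, 2))  // 4 elements * 4 bits = 16 bits = 2 bytes
	fmt.Println(sizeInBytes(32, 2, 3)) // 6 elements * 32 bits = 24 bytes
}
```

Multiplying a per-element byte size by the element count would give 0 or an overestimate for a 4-bit type, which is the case a shape-aware function avoids.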

# v0.4.9 - 2024-11-25
