How to cache partial results in graph #157

iacore · 2023-05-15T19:17:05Z

Here is some example code that computes the same matmul repeatedly (four times in a loop).

const std = @import("std");
const c = @cImport({
    @cInclude("ggml.h");
});

pub fn main() !void {
    // 16 MiB plus slack: each 1024x1024 F32 tensor takes 4 MiB.
    const ctx = c.ggml_init(.{
        .mem_size = 1024 * 1024 * 4 * 4 + 10000,
        .mem_buffer = null,
        .no_alloc = false,
    });
    defer c.ggml_free(ctx);
    const a = c.ggml_new_tensor_2d(ctx, c.GGML_TYPE_F32, 1024, 1024);
    const b = c.ggml_new_tensor_2d(ctx, c.GGML_TYPE_F32, 1024, 1024);
    const r = c.ggml_mul_mat(ctx, a, b);
    var graph = c.ggml_build_forward(r);
    // ggml_build_forward(r) has already added r and its parents to the graph,
    // so this call adds nothing new.
    c.ggml_build_forward_expand(&graph, r);
    // Every iteration recomputes the full product, although a and b never change.
    for (0..4) |_| {
        const t0 = std.time.nanoTimestamp();
        c.ggml_graph_compute(ctx, &graph);
        const t1 = std.time.nanoTimestamp();
        std.log.info("elapsed: {}ns", .{t1 - t0});
    }
}

Output:

❯ zig build -Doptimize=ReleaseSafe run
info: elapsed: 56137027ns
info: elapsed: 87564899ns
info: elapsed: 67225178ns
info: elapsed: 88731115ns

Is there any way this can be automatically cached?

This is only a simple example. In a more complex scenario, caching is not so easy.

Strassen's algorithm needs to preprocess the matrices. In a model, for $A*B$ where $A$ is fixed (model weights) and $B$ depends on the input, the preprocessing steps that depend only on $A$ can be cached. The algorithm is recursive, so a single matrix multiplication may expand into a balanced, tree-shaped compute graph.
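To make the split concrete, here is a minimal sketch of one Strassen level on a 2x2 matrix, written in plain Zig without ggml. The names `Mat2`, `StrassenLhs`, `preprocessA`, and `mulCached` are hypothetical and only illustrate the idea: the seven left-hand operands depend only on $A$, so they are computed once and reused for every $B$.

```zig
const std = @import("std");

/// 2x2 matrix in row-major order: { m11, m12, m21, m22 }.
const Mat2 = [4]f32;

/// Hypothetical cache: the seven left-hand operands of Strassen's products.
/// They depend only on A, so they can be computed once per weight matrix.
const StrassenLhs = [7]f32;

fn preprocessA(a: Mat2) StrassenLhs {
    return StrassenLhs{
        a[0] + a[3], // A11 + A22, used by M1
        a[2] + a[3], // A21 + A22, used by M2
        a[0], //        A11,       used by M3
        a[3], //        A22,       used by M4
        a[0] + a[1], // A11 + A12, used by M5
        a[2] - a[0], // A21 - A11, used by M6
        a[1] - a[3], // A12 - A22, used by M7
    };
}

/// Multiplies the cached A-side operands with a fresh B.
fn mulCached(lhs: StrassenLhs, b: Mat2) Mat2 {
    // Right-hand operands depend on B and must be rebuilt for every input.
    const rhs = [7]f32{
        b[0] + b[3], // B11 + B22
        b[0], //        B11
        b[1] - b[3], // B12 - B22
        b[2] - b[0], // B21 - B11
        b[3], //        B22
        b[0] + b[1], // B11 + B12
        b[2] + b[3], // B21 + B22
    };
    var m: [7]f32 = undefined;
    for (0..7) |i| m[i] = lhs[i] * rhs[i]; // the seven products M1..M7
    return Mat2{
        m[0] + m[3] - m[4] + m[6], // C11 = M1 + M4 - M5 + M7
        m[2] + m[4], //               C12 = M3 + M5
        m[1] + m[3], //               C21 = M2 + M4
        m[0] - m[1] + m[2] + m[5], // C22 = M1 - M2 + M3 + M6
    };
}

pub fn main() void {
    const a = Mat2{ 1, 2, 3, 4 };
    const lhs = preprocessA(a); // computed once, reused below
    const inputs = [_]Mat2{ .{ 5, 6, 7, 8 }, .{ 1, 0, 0, 1 } };
    for (inputs) |b| {
        const result = mulCached(lhs, b);
        std.debug.print("{d} {d} {d} {d}\n", .{ result[0], result[1], result[2], result[3] });
    }
}
```

In the recursive case the entries would be sub-matrices rather than scalars, and each cached left-hand operand would itself be preprocessed the same way, which is where the tree of $A$-only nodes comes from. How to express that split inside a ggml graph is exactly the open question here.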

iacore · 2023-05-15T19:17:41Z

tl;dr: how can part of a compute graph be cached?
