Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example: Porting a hash function for bulk processing and stopping at the result #240

Open
polarathene opened this issue Aug 4, 2020 · 0 comments
Assignees
Labels
Milestone

Comments

@polarathene
Copy link

Description

From the original issue: #52 (comment)

The discussion there resulted in this demo code to run a simple hash function FNV1a-32 from CPU rust code to on the GPU with arrayfire-rust. It presently for simplicity has a small static input to process and no additional logic for iterating through an input larger than available memory(permutation generator) to halt processing once a target result is matched.

@9prady9 has pointed out some minor changes to improve performance and requested more code regarding how performance was being measured/profiled(probably another good example in itself to include). He then also suggested that performance for this type of computation would be better suited with a custom kernel tuned for it than relying on ArrayFire API JIT kernel alone(another useful example that could derive from this one).

This could become a nice example that further examples could expand upon to cover how one approaches certain tasks from CPU to GPU, along with better covering areas of the ArrayFire API that may not be understood as well from API documentation alone(data transfer between GPU and host, control flow, custom kernels, measuring time taken with JIT in mind).

Click for example code
use arrayfire as af;

fn main() {
    // af::set_backend(Backend::CUDA);//Backend::OPENCL;//Backend::DEFAULT);
    af::init();
    af::info();


    /////////////////////////////////
    // Generate string inputs to hash
    /////////////////////////////////
    // Inputs are collapsed into a sequence/stream of bytes, single array dimension
    let test_strings = vec!["hello", "howdy", "hallo"];
    let test_bytes = test_strings.iter()
        .flat_map(|s| s.as_bytes())
        .cloned()
        .collect::<Vec<u8>>();
    // println!("test_strings as byte array: {:?}", test_bytes);
    // [104, 101, 108, 108, 111, 104, 111, 119, 100, 121, 104, 97, 108, 108, 111]


    ///////////////////////////
    // Initialize values for AF
    ///////////////////////////
    // Used to size AF array dimensions
    let input_len = test_strings[0].len() as u64;
    let num_inputs = test_strings.len() as u64;

    // Convert to an ArrayFire 2D array (Column Major)
    // 5x3 col(length of bytes for input) x row(number of inputs)
    let inputs_dims = af::Dim4::new(&[input_len, num_inputs, 1, 1]); // [5, 3, 1, 1]
    let inputs = af::Array::new(&test_bytes, inputs_dims);
    // Each input now has it's bytes in it's own column
    // print(&inputs);
    // [5 3 1 1]
    //    104        104        104 
    //    101        111         97 
    //    108        119        108 
    //    108        100        108 
    //    111        121        111 

    let fnv_dims = af::Dim4::new(&[1, num_inputs, 1, 1]); // [1, 3, 1, 1]


    ////////////////////////////
    // Compute hashes for inputs
    ////////////////////////////
    let hashes = fnv1a32_gpu(&inputs, fnv_dims);
    // print(&hashes);
    // [1 3 1 1]
    // 1335831723 3497709110 4182196071
    // println!("hashes: {:x} | {:x} | {:x}", 1_335_831_723_u32, 3_497_709_110_u32, 4_182_196_071_u32);
    // 4f9f2cab | d07ace36 | f9473f67


    ///////////////////////
    // Find matching hashes
    ///////////////////////
    // Creates an AF array filled with the same target value to match for,
    let target_hashes: af::Array<u32> = af::constant(0x4f9f2cab as u32, fnv_dims);
    let matches = check_for_matches(hashes, target_hashes);
    // println!("matched: {:?}", match_data);
    // matched: [1335831723]
    for matched in matches {
        println!("Matched the hash: {:x}", &matched);
    };
    // Matched the hash: 4f9f2cab
}

// FNV-1a (32-bit) hashing algorithm
const FNV_OFFSET: u32 = 2_166_136_261;
const FNV_PRIME: u32 = 16_777_619;
pub fn fnv1a32_gpu(inputs: &af::Array<u8>, fnv_dims: af::Dim4) -> af::Array<u32> {
    let input_len = inputs.dims().get()[0];

    let mut hashes = af::constant(FNV_OFFSET, fnv_dims);
    let prime = af::constant(FNV_PRIME, fnv_dims);

    // Iterates through all inputs(columns) in parallel a byte each at a time(row)
    for row_index in 0..input_len {
        hashes = (hashes ^ af::row(&inputs, row_index)) * &prime;
    }

    hashes
}

fn filter_matches(array: &af::Array<u32>, bools: &af::Array<bool>) -> af::Array<u32> {
    let indices = &af::locate(bools);
    let mut idxr = af::Indexer::default();
    idxr.set_index(indices, 0, None);
    af::index_gen(array, idxr)
}

fn check_for_matches(hashes: af::Array<u32>, target_hashes: af::Array<u32>) -> Vec<u32> {
    // If we computed the same hash value, then `eq()` will let us know which values matched
    let is_matched: af::Array<bool> = af::eq(&hashes, &target_hashes, false);

    // Only keep results that were matched
    let result = filter_matches(&hashes, &is_matched);

    // Transfer the matches to the host CPU to access
    let length = result.elements() as usize;
    let mut match_data: Vec<u32> = vec![0; length];
    result.host::<u32>(&mut match_data);

    match_data
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Features & Improvements
Development

No branches or pull requests

2 participants