
Kokkos interface #4

Open — wants to merge 17 commits into base: metatensor

Conversation

@frostedoyster frostedoyster commented Oct 24, 2024

This PR implements an experimental kokkos-metatensor interface.

Most of the accelerated code is written in torch rather than Kokkos, since torch wraps specialized CUDA kernels that are, most often, faster than the code generated by Kokkos.
Only the version with neighbor remapping is implemented (for now): the interface itself is relatively fast, so in most cases skipping the remapping would slow down the model far more than the remapping slows down the interface.

The exact consistency among the original interface, the Kokkos::Cuda interface and the Kokkos::OpenMP interface can be verified by running the metatensor and metatensor-kokkos examples.

I also added a README in the examples folder to document the whole build-and-run procedure. I think this would be a very good place to point people to so they can compile and run the code; we can remove it again in the future if needed.

I tried to format with clang-format, but I got way too many changes and I think I'm doing something wrong, so I didn't format in the end.

@frostedoyster frostedoyster marked this pull request as ready for review October 24, 2024 11:57
@Luthaf Luthaf (Member) left a comment


There is a lot of duplicated code with the non-kokkos version, which makes it harder to see what is actually different. I agree with you that we don't want to inherit from the standard pair_style directly, but it might make sense to have a separate base class that handles all the common things, leaving only the kokkos-specific parts (load data from kokkos, create the systems, store data back in kokkos) as custom code.

I can try to give this a go myself!


Regarding the device, I did not find code that checks that the kokkos device matches the torch one. What should we do if they don't match? Currently the code will crash or segfault, since it uses kokkos pointers as device pointers without checking.

cmake/Modules/Packages/ML-METATENSOR.cmake (outdated, resolved)
examples/PACKAGES/metatensor/in.metatensor (outdated, resolved)
examples/PACKAGES/metatensor/in.metatensor (outdated, resolved)
examples/PACKAGES/metatensor/nickel-lj.pt (outdated, resolved)
Comment on lines +31 to +80
template<class DeviceType>
struct MetatensorSystemOptionsKokkos {
// Mapping from LAMMPS types to metatensor types
const int32_t* types_mapping;
const Kokkos::View<int32_t*, Kokkos::LayoutRight, DeviceType> types_mapping_kokkos;
// interaction range of the model, in LAMMPS units
double interaction_range;
// should we run extra checks on the neighbor lists?
bool check_consistency;
};

// data for metatensor neighbors lists
template<class DeviceType>
struct MetatensorNeighborsDataKokkos {
// single neighbors sample containing [i, j, S_a, S_b, S_c]
using sample_t = std::array<int32_t, 5>;

struct SampleHasher {
static void hash_combine(std::size_t& seed, const int32_t& v) {
seed ^= std::hash<int32_t>()(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

size_t operator()(const sample_t& s) const {
size_t hash = 0;
hash_combine(hash, s[0]);
hash_combine(hash, s[1]);
hash_combine(hash, s[2]);
hash_combine(hash, s[3]);
hash_combine(hash, s[4]);
return hash;
}
};

// cutoff for this NL in LAMMPS units
double cutoff;
// options of the NL as requested by the model
metatensor_torch::NeighborListOptions options;

// Below are cached allocations for the LAMMPS -> metatensor NL translation
// TODO: report memory usage for these?

// we keep the set of samples twice: once in `known_samples` to remove
// duplicated pairs, and once in `samples` in a format that can be
// used to create a torch::Tensor.
std::unordered_set<sample_t, SampleHasher> known_samples;
std::vector<sample_t> samples;
// pairs distances vectors
std::vector<std::array<double, 3>> distances_f64;
std::vector<std::array<float, 3>> distances_f32;
};
Member

This looks identical to the non-kokkos one, why is it duplicated here?

@frostedoyster frostedoyster (Author) Oct 30, 2024

True for the second class in this snippet, but not for the first one. I think this is one of the very few parts of the code that can be re-used from the original interface, but I'll leave this re-organization up to you

src/KOKKOS/metatensor_system_kokkos.cpp (outdated, resolved)

NeighListKokkos<DeviceType>* list_kk = static_cast<NeighListKokkos<DeviceType>*>(this->list_);

auto numneigh_kk = list_kk->d_numneigh;
Member

are we sure this is always an int32_t, even when LAMMPS is compiled with different integer sizes? https://docs.lammps.org/Build_settings.html#size-of-lammps-integer-types-and-size-limits

@frostedoyster frostedoyster (Author) Oct 30, 2024

Yikes, you're right. Since we're converting these ints to torch, either we make up a type-conversion system or we error out if someone tries to compile these files while LAMMPS's integers are not int32. I would go for the second option.


// distance mask
auto interatomic_vectors = positions_tensor.index_select(0, neighbors_tensor_full_or_half) - positions_tensor.index_select(0, centers_tensor_full_or_half);
auto distance_mask = torch::sum(interatomic_vectors.pow(2), 1) < cache.cutoff*cache.cutoff;
Member

torch.vector_norm might be a bit faster here

@frostedoyster frostedoyster (Author) Oct 30, 2024

I leave this up to you (I don't think this is a bottleneck for now)

src/KOKKOS/metatensor_system_kokkos.cpp (outdated, resolved)
atom_types_metatensor_kokkos.data(),
{total_n_atoms},
torch::TensorOptions().dtype(torch::kInt32).device(device)
).clone(); // clone because the original memory belongs to Kokkos and will be deallocated
Member

instead of the clone, we could use the trick of moving the data inside a lambda and registering the lambda as a custom Tensor destructor. If the copy is fast/small enough, then it might not be worth it

Author

This array should be slim enough that a copy is irrelevant, but it might be worth it somewhere else (the positions and/or force arrays, which are the largest ones we manipulate between torch and Kokkos). In general, however, my profiling shows that this sort of copy is very far down the list of things we should try to optimize

@Luthaf Luthaf (Member) commented Oct 30, 2024

I also added a readme in the examples folder to document the whole building and running procedure. I think this would be a very good place to point people to so they can compile and run the code. We can then remove it again in the future if needed

The documentation about how to compile the code lives here for now: https://docs.metatensor.org/latest/atomistic/engines/lammps.html#how-to-install-the-code

For the future merge of this back in LAMMPS, the documentation for compilation should be there: https://github.com/metatensor/lammps/blob/metatensor/doc/src/Build_extras.rst#ml-metatensor-package

Comment on lines 342 to 351
// Handle potential mismatch between Kokkos and model devices
if (std::is_same<DeviceType, Kokkos::Cuda>::value) {
if (!mts_data->device.is_cuda()) {
throw std::runtime_error("Kokkos is running on a GPU, but the model is not on a GPU");
}
} else {
if (!mts_data->device.is_cpu()) {
throw std::runtime_error("Kokkos is running on CPU, but the model is not on CPU");
}
}
Author

This is for the device check

Member

Ah, thanks!

Should we also change the default device? I.e., if the user does not give a device, pick the same one kokkos is using instead of the first one in the model capabilities?

Also, this check would fail for alternative kokkos devices (HIP, OpenCL, …), so I would rather check explicitly for the CPU device instead of defaulting to it.

Author

I agree with the second part (fixed in the latest commit).

Regarding the first part, I think users of this interface will be relatively advanced: they should be able to understand the error message and pick the correct device, and this lets us re-use the device selection code from the other interface. That said, if you want to change the device selection code instead of sharing it with the base interface, I have no objections

@frostedoyster frostedoyster (Author)

The review items should be fixed, except for those related to code sharing between the two interfaces. I'd rather leave that up to you, because you know best what kind of code organization would be acceptable in the LAMMPS main repository
