Kokkos interface #4
base: metatensor
Conversation
There is a lot of duplicated code with the non-kokkos version, which makes it harder to understand what's different. I agree with you that we don't want to inherit from the standard pair_style directly, but it might make sense to have a separate base class that handles all the common things, and then leave just the "load data from kokkos / create the systems / store data in kokkos" parts as custom code.
I can try to give this a go myself!
Regarding the device, I did not find code that checks that the kokkos device matches the torch one. What should we do if they don't match? Currently the code will crash/segfault, since it uses kokkos pointers as device pointers without checking.
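One possible shape for the shared base class suggested above, as a minimal sketch: all class and method names here are illustrative stand-ins, not taken from the actual LAMMPS/metatensor code.

```cpp
#include <string>
#include <vector>

// Hypothetical refactoring sketch: the common pair_style logic lives in a
// base class, and the Kokkos variant only overrides the data-movement hooks.
class PairMetatensorBase {
public:
    virtual ~PairMetatensorBase() = default;

    // shared driver: load data -> run the model -> store results
    void compute() {
        load_data();
        run_model();   // common to both variants
        store_data();
    }

    std::vector<std::string> log;

protected:
    // variant-specific hooks, overridden by the Kokkos interface
    virtual void load_data() { log.push_back("load: host arrays"); }
    virtual void store_data() { log.push_back("store: host arrays"); }

    void run_model() { log.push_back("run model"); }
};

class PairMetatensorKokkos : public PairMetatensorBase {
protected:
    // only the Kokkos <-> torch data movement is custom code
    void load_data() override { log.push_back("load: kokkos views"); }
    void store_data() override { log.push_back("store: kokkos views"); }
};
```

With this split, the model evaluation and device selection live in one place, and the Kokkos class stays small.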
```cpp
template<class DeviceType>
struct MetatensorSystemOptionsKokkos {
    // Mapping from LAMMPS types to metatensor types
    const int32_t* types_mapping;
    const Kokkos::View<int32_t*, Kokkos::LayoutRight, DeviceType> types_mapping_kokkos;
    // interaction range of the model, in LAMMPS units
    double interaction_range;
    // should we run extra checks on the neighbor lists?
    bool check_consistency;
};

// data for metatensor neighbors lists
template<class DeviceType>
struct MetatensorNeighborsDataKokkos {
    // single neighbors sample containing [i, j, S_a, S_b, S_c]
    using sample_t = std::array<int32_t, 5>;

    struct SampleHasher {
        static void hash_combine(std::size_t& seed, const int32_t& v) {
            seed ^= std::hash<int32_t>()(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        }

        size_t operator()(const sample_t& s) const {
            size_t hash = 0;
            hash_combine(hash, s[0]);
            hash_combine(hash, s[1]);
            hash_combine(hash, s[2]);
            hash_combine(hash, s[3]);
            hash_combine(hash, s[4]);
            return hash;
        }
    };

    // cutoff for this NL in LAMMPS units
    double cutoff;
    // options of the NL as requested by the model
    metatensor_torch::NeighborListOptions options;

    // Below are cached allocations for the LAMMPS -> metatensor NL translation
    // TODO: report memory usage for these?

    // we keep the set of samples twice: once in `known_samples` to remove
    // duplicated pairs, and once in `samples` in a format that can be
    // used to create a torch::Tensor.
    std::unordered_set<sample_t, SampleHasher> known_samples;
    std::vector<sample_t> samples;
    // pairs distances vectors
    std::vector<std::array<double, 3>> distances_f64;
    std::vector<std::array<float, 3>> distances_f32;
};
```
This looks identical to the non-kokkos one; why is it duplicated here?
True for the second class in this snippet, but not for the first one. I think this is one of the very few parts of the code that can be re-used from the original interface, but I'll leave this re-organization up to you
```cpp
NeighListKokkos<DeviceType>* list_kk = static_cast<NeighListKokkos<DeviceType>*>(this->list_);

auto numneigh_kk = list_kk->d_numneigh;
```
Are we sure this is always an `int32_t`, even when LAMMPS is compiled with different integer sizes? https://docs.lammps.org/Build_settings.html#size-of-lammps-integer-types-and-size-limits
Yikes, you're right. Since we're converting these ints to torch, either we make up a type conversion system or we error out if someone tries to compile these files while LAMMPS's integers are not `int32`. I would go for the second one.
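The second option (erroring out at compile time when LAMMPS integers are not 32-bit) could be as simple as a `static_assert`. A hedged sketch, where `tagint` is a stand-in typedef for the one LAMMPS defines in `lmptype.h` (its size there depends on the `-DLAMMPS_SMALLBIG`/`-DLAMMPS_BIGBIG` build settings):

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for LAMMPS's integer typedef; in the real code this would come
// from lmptype.h, where the build settings change its size.
using tagint = int32_t;

// Fail the build early instead of silently reinterpreting memory when the
// neighbor-list integers are handed to torch as kInt32 tensors.
static_assert(
    std::is_same<tagint, int32_t>::value,
    "the metatensor kokkos interface requires LAMMPS to be compiled with 32-bit integers"
);
```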
```cpp
// distance mask
auto interatomic_vectors = positions_tensor.index_select(0, neighbors_tensor_full_or_half) - positions_tensor.index_select(0, centers_tensor_full_or_half);
auto distance_mask = torch::sum(interatomic_vectors.pow(2), 1) < cache.cutoff*cache.cutoff;
```
`torch.vector_norm` might be a bit faster here
I leave this up to you (I don't think this is a bottleneck for now)
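For reference, the squared-distance comparison in the snippet above gives the same mask as comparing norms against the cutoff, while skipping one square root per pair. The equivalence (for non-negative cutoffs) can be checked with a tiny torch-free sketch:

```cpp
#include <array>
#include <cmath>

// |r|^2 < cutoff^2  is equivalent to  |r| < cutoff  when cutoff >= 0,
// so the mask can be built without calling sqrt for every pair
double squared_norm(const std::array<double, 3>& r) {
    return r[0]*r[0] + r[1]*r[1] + r[2]*r[2];
}

bool within_cutoff(const std::array<double, 3>& r, double cutoff) {
    return squared_norm(r) < cutoff * cutoff;
}
```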
```cpp
    atom_types_metatensor_kokkos.data(),
    {total_n_atoms},
    torch::TensorOptions().dtype(torch::kInt32).device(device)
).clone(); // clone because the original memory belongs to Kokkos and will be deallocated
```
instead of the clone, we could use the trick of moving the data inside a lambda, and registering the lambda as a custom Tensor destructor. If this is fast/small enough, then it might not be worth it
This array should be slim enough that a copy is irrelevant, but perhaps it might be worth it somewhere else (positions and/or force arrays, which are the largest ones we manipulate between torch and Kokkos). In general, however, my profiling shows that this sort of copy is very far down the list of things we should try to optimize.
The documentation about how to compile the code lives here for now: https://docs.metatensor.org/latest/atomistic/engines/lammps.html#how-to-install-the-code
For the future merge of this back in LAMMPS, the documentation for compilation should be there: https://github.com/metatensor/lammps/blob/metatensor/doc/src/Build_extras.rst#ml-metatensor-package
```cpp
// Handle potential mismatch between Kokkos and model devices
if (std::is_same<DeviceType, Kokkos::Cuda>::value) {
    if (!mts_data->device.is_cuda()) {
        throw std::runtime_error("Kokkos is running on a GPU, but the model is not on a GPU");
    }
} else {
    if (!mts_data->device.is_cpu()) {
        throw std::runtime_error("Kokkos is running on CPU, but the model is not on CPU");
    }
}
```
Here for the device check
Ah, thanks!
Should we also change the default device? I.e. if the user is not giving a device, pick the same one kokkos is using instead of the first one in the model capabilities?
Also, this check would fail for alternative kokkos devices (HIP, OpenCL, …), so I would rather check explicitly for a CPU device instead of defaulting to it.
I agree for the second part (fixed in the latest commit).
Regarding the first part, I think users of this interface would be relatively advanced and they would be able to understand the error message and pick the correct device, while allowing us to re-use the device selection code from the other interface. Anyway, if you want to change the device selection code instead of sharing it with the base interface, I have no objections
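An explicit per-backend check, rather than defaulting every non-CUDA Kokkos device to CPU, could look like the sketch below. The Kokkos execution spaces and the torch device are replaced by plain stand-in types so the dispatch logic itself is checkable; in the real code these would be `Kokkos::Cuda`, `Kokkos::OpenMP`, etc. and `mts_data->device.is_cuda()`/`is_cpu()`.

```cpp
#include <stdexcept>
#include <type_traits>

// stand-ins for Kokkos execution spaces and the torch device kind
struct Cuda {};
struct OpenMP {};
struct Serial {};
struct HIP {};
enum class TorchDevice { CPU, CUDA };

template <class DeviceType>
void check_device_match(TorchDevice model_device) {
    if (std::is_same<DeviceType, Cuda>::value) {
        if (model_device != TorchDevice::CUDA) {
            throw std::runtime_error("Kokkos is running on a CUDA GPU, but the model is not");
        }
    } else if (
        std::is_same<DeviceType, OpenMP>::value ||
        std::is_same<DeviceType, Serial>::value
    ) {
        // check explicitly for known CPU backends instead of defaulting to CPU
        if (model_device != TorchDevice::CPU) {
            throw std::runtime_error("Kokkos is running on CPU, but the model is not");
        }
    } else {
        // anything else (HIP, SYCL, ...) is rejected until explicitly supported
        throw std::runtime_error("unsupported Kokkos device for this pair_style");
    }
}
```

This way an unexpected backend fails with a clear error instead of being silently treated as CPU.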
Co-authored-by: Guillaume Fraux <[email protected]>
…istent with non-kokkos example
The review items should be fixed, except for those related to code sharing between the two interfaces. I'd rather leave that up to you because you know best what kind of code organization would be acceptable in the LAMMPS main repository.
This PR implements an experimental kokkos-metatensor interface.
Most of the accelerated code is written in torch as opposed to Kokkos, as torch wraps specialized CUDA kernels that are, most often, faster than the code generated by Kokkos.
Only the version with neighbor remapping is implemented (for now): the interface is relatively fast, so in most cases not remapping would cripple the model far more than the remapping slows down the interface.
The exact consistency among the original interface, the `Kokkos::Cuda` interface, and the `Kokkos::OpenMP` interface can be verified by running the metatensor and metatensor-kokkos examples.
I also added a README in the examples folder to document the whole building and running procedure. I think this would be a very good place to point people to so they can compile and run the code. We can then remove it again in the future if needed.
I tried to format with `clang-format`, but I get way too many changes and I think I'm doing something wrong, so I didn't format in the end.