Commit `4197060`: upd
haykh committed Jul 29, 2024 (1 parent: `e844ba9`)
Showing 7 changed files with 259 additions and 98 deletions.
Binary file added docs/assets/images/howto/nt2-demo-5.png
95 changes: 94 additions & 1 deletion docs/code/problem_generators.md
@@ -416,6 +416,99 @@ Again, as with everything else in the problem generator, the force (rather, the acceleration
Note that, among the functions mentioned throughout this section, you may specify only the ones you actually need and ignore the ones you don't (i.e., there is no need to provide dummy functions that return zero): the code automatically determines at compile time which functions are present.
## Custom field output
The code also allows for custom-defined fields to be written together with other field quantities during the output. To enable that, simply define the name of your field in the input file:
```toml
[output.fields]
...
custom = ["my_field"]
...
```

There can be as many custom fields as needed. Then, in the problem generator, populate the corresponding field by defining the following function:

```c++
void CustomFieldOutput(const std::string& name, // (1)!
                       ndfield_t<M::Dim, 6> buffer, // (2)!
                       std::size_t index, // (3)!
                       const Domain<S, M>& domain) { // (4)!
  if (name == "my_field") {
    // 1D example (can be easily generalized)
    if constexpr (M::Dim == Dim::_1D) {
      const auto& EM    = domain.fields.em;
      const auto metric = domain.mesh.metric;
      Kokkos::parallel_for(
        "MyField",
        domain.mesh.rangeActiveCells(),
        Lambda(index_t i1) {
          const auto      i1_ = COORD(i1);
          coord_t<M::Dim> x_Ph { ZERO };
          // convert coordinate to physical basis:
          metric.template convert<Crd::Cd, Crd::Ph>({ i1_ }, x_Ph);
          // compute whatever needs to be written
          // ... may also depend on the EM fields from the `domain`
          // ... in this example -- output Ex * x^2
          buffer(i1, index) = SQR(x_Ph[0]) *
                              metric.template transform<1, Idx::U, Idx::T>(
                                { i1_ + HALF },
                                EM(i1, em::ex1));
          // here we also convert Ex1(i + 1/2) to Tetrad basis
        });
    }
  } else {
    raise::Error("Custom output not provided", HERE);
  }
}
```
1. the same name that went into the input file
2. buffer array where the field is going to be written into
3. an index of the buffer array where the field is written into
4. reference of the local subdomain
Alternatively, you can precompute the desired quantity in the `CustomPostStep` function and then simply copy it to the output buffer in `CustomFieldOutput`:
```c++
// assuming 2D and that the desired quantity is saved in `cbuff`
template <SimEngine::type S, class M>
struct PGen : public arch::ProblemGenerator<S, M> {
  // ...
  array_t<real_t**> cbuff;
  // ...
  void CustomPostStep(std::size_t step, long double, Domain<S, M>& domain) {
    if (step == 0) {
      // allocate the array at time = 0
      cbuff = array_t<real_t**>("cbuff",
                                domain.mesh.n_all(in::x1),
                                domain.mesh.n_all(in::x2));
    }
    // populate the buffer (can be done at specific timesteps)
    Kokkos::parallel_for(
      "FillCbuff",
      domain.mesh.rangeActiveCells(),
      Lambda(index_t i1, index_t i2) {
        // ...
      });
  }

  void CustomFieldOutput(const std::string& name,
                         ndfield_t<M::Dim, 6> buffer,
                         std::size_t index,
                         const Domain<S, M>&) {
    if (name == "my_field") {
      Kokkos::deep_copy(
        Kokkos::subview(buffer, Kokkos::ALL, Kokkos::ALL, index),
        cbuff);
    } else {
      // ...
    }
  }
};
```

Keep in mind that the custom field output is written as-is, i.e., no additional interpolation or transformation is applied. So make sure the quantity you output is covariant (i.e., does not depend on the resolution or the stretching of coordinates; essentially, always output "physical" covariant/contravariant vectors or transform them to the tetrad basis).

{% include "html/d3js.html" %}
<script src="../atm-boundaries.js"></script>
65 changes: 65 additions & 0 deletions docs/getting-started/compile-run.md
@@ -130,3 +130,68 @@ To run only specific tests, you can use the `-R` flag followed by the regular expression
```shell
ctest --test-dir build/ -R particle
```

## Specific architectures


### HIP/ROCm @ AMD GPUs

Compiling on AMD GPUs is typically not an issue:

1. Make sure you have the ROCm library loaded (e.g., check that `rocminfo` runs);
2. Sometimes the environment variables are not properly set up, so make sure you have the following variables properly defined:

- `CMAKE_PREFIX_PATH=/opt/rocm` (or wherever ROCm is installed),
- `CC=hipcc` & `CXX=hipcc`,
- on rare occasions, you might also have to explicitly pass `-D CMAKE_CXX_COMPILER=hipcc -D CMAKE_C_COMPILER=hipcc` to `cmake` during the configuration stage;

3. Compile the code with the proper Kokkos flags; e.g., for MI250x GPUs you would use `-D Kokkos_ENABLE_HIP=ON` and `-D Kokkos_ARCH_AMD_GFX90A=ON`.
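Put together, a configuration for MI250x might look like this (a sketch; the build directory name and the ROCm install path are assumptions):

```sh
# sketch: configure & build for MI250x (GFX90A); adjust paths to your system
export CMAKE_PREFIX_PATH=/opt/rocm
export CC=hipcc CXX=hipcc
cmake -B build \
  -D CMAKE_C_COMPILER=hipcc \
  -D CMAKE_CXX_COMPILER=hipcc \
  -D Kokkos_ENABLE_HIP=ON \
  -D Kokkos_ARCH_AMD_GFX90A=ON
cmake --build build -j
```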

Running is a bit trickier, and the exact instructions may vary from machine to machine (partly because ROCm is much less streamlined than CUDA, but also because cluster system administrators often pay less attention to AMD GPUs).

* If you are running on a cluster, the first thing to do is inspect the cluster's documentation. There you might find the proper `slurm` command for requesting GPU nodes and binding each GPU to the respective CPUs.

* On personal machines, figuring this out is a bit easier. First, inspect the output of `rocminfo` and `rocm-smi`. From there, you should be able to find the ID of the GPU you want to use. If you see more than one device, that means you also have an AMD CPU or an integrated GPU installed; ignore those. You will need to override a few environment variables:

- `HSA_OVERRIDE_GFX_VERSION`: set to the GFX version that you used to compile the code (if you used the `GFX1100` Kokkos flag, that would be `11.0.0`);
- `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES`: both need to be set to your device ID (usually just a number from `0` up to the number of HIP-capable devices minus one).

For example, the output of `rocminfo | grep -A 5 "Agent "` may look like this:
```
Agent 1
*******
Name: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
Vendor Name: CPU
--
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon™ RX 7700S
Vendor Name: AMD
--
Agent 3
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
```
In this case, the required GPU is `Agent 2`, which supports GFX1100. The output of `rocm-smi` will look something like this:
```
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
==========================================================================================================================
0 1 0x7480, 19047 35.0°C 0.0W N/A, N/A, 0 0Mhz 96Mhz 29.8% auto 100.0W 0% 0%
1 2 0x15bf, 17218 48.0°C 19.111W N/A, N/A, 0 None 1000Mhz 0% auto Unsupported 82% 5%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
```
So the GPU we need has a `Device` ID of `0` (since it is the dedicated GPU, it may automatically turn off when idle to save power on laptops; hence `Power = 0.0W`). Now we can run the code with:
```sh
HSA_OVERRIDE_GFX_VERSION=11.0.0 HIP_VISIBLE_DEVICES=0 ROCR_VISIBLE_DEVICES=0 ./executable ...
```
2 changes: 1 addition & 1 deletion docs/getting-started/docker.md
@@ -50,7 +50,7 @@ We provide the following configurations (tags) as separate images:
Images for these containers are all stored on the [Docker hub](https://hub.docker.com/repository/docker/morninbru/entity/general). If you do not wish to use `docker compose` to download the pre-made ones from the Docker hub, you may also build the images yourself from the corresponding `Dockerfile`-s also provided with the source code. You can do that by going to the `dev/` directory in the root of the source code, and running:

```sh
docker build --no-cache -t myentity:<toolkit> -f Dockerfile.<toolkit> .
```

substituting one of the values for the `<toolkit>` mentioned above. You may then launch a container using the built image by running the following from the code source directory (or any other directory you wish to mount inside the container):
68 changes: 12 additions & 56 deletions docs/getting-started/faq.md
@@ -7,69 +7,25 @@ hide:

Here we collect the most frequent questions that might arise. Please make sure to inspect this section before filing a GitHub issue.

## Code usage

??? faq "I want to have a custom ..."

    Here, "..." can be "boundary/injection conditions," "driving," "injection," "distribution function," or "output." All of that *can* be done via the tools provided by the problem generator; please carefully inspect the [section dedicated to that](../code/problem_generators.md).

## Technical

??? faq "Running in a `docker` container with an AMD card"

    AMD has a very [brief documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html) on the topic. In theory, the `docker` containers that come with the code should work. Just make sure you have the proper groups (`render` and `video`) defined and added to the current user. If it complains about access to `/dev/kfd`, you might have to run docker as root.


??? faq "Compilation errors"

    Before merging with the released stable version, the code is tested with CUDA and HIP GPU compilers, as well as a few versions of CPU compilers (GCC 9...11 and LLVM 13...17). If you are encountering compiler errors on GPUs, the first thing to check is whether the compilers are set up properly (i.e., whether CMake indeed captures the right compilers). Here are a few tips:

    - CUDA @ NVIDIA GPUs: make sure you have a version of `gcc` supported by the version of CUDA you are using; check out [this unofficial compatibility matrix](https://gist.github.com/ax3l/9489132#nvcc). In particular, Intel compilers are not very compatible with CUDA, and it is recommended to use `gcc` instead (you won't gain much by using Intel anyway, since CUDA will be doing the heavy lifting).

    - HIP/ROCm @ AMD GPUs: the ROCm library is a headache, and its documentation even more so. We have a [dedicated section](./compile-run.md#hiprocm-amd-gpus) specifically discussing compilation with HIP; make sure to check it before opening an issue.
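    A quick way to verify which compilers CMake actually captured (assuming the build directory is `build/`):

    ```sh
    grep -E '^CMAKE_(C|CXX)_COMPILER' build/CMakeCache.txt
    ```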
