RAM and CPU utilisation #77

Open
GreatEmerald opened this issue Aug 14, 2017 · 0 comments
While running a whole bunch of processLandsatBatch jobs, I noticed that the CPU and RAM of the machine doing the processing were very underutilised (~30% CPU usage, ~5 GiB RAM used, the machine has 8 threads and 16 GiB RAM), and there was a lot of disk activity, including writing files into the raster tmp directory.

I figured out that this is because the raster package sets a maximum memory consumption level and a chunk size for rasters that do not fit into memory. The default maximum memory consumption in rasterOptions() is 1e+08 (in cells; Landsat VIs are 16 bits per pixel), and the default chunk size is 1e+07 cells. raster considers a single Landsat tile too large to process in memory, since the mask also needs to be loaded and the result stored in memory too, which puts the memory requirement at roughly three such tiles. So the rasters are split into chunks, which are then written to a temporary file.
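For reference, these limits can be inspected and raised before starting a batch run; a minimal sketch, assuming the raster package's rasterOptions() interface (the values are the ones tested below):

```r
## Sketch: inspecting and raising raster's memory limits before a
## batch run. maxmemory and chunksize are counted in cells; the
## values here are the ones tested below -- scale them to your RAM.
library(raster)

rasterOptions()                    # print current settings, incl. the defaults
rasterOptions(maxmemory = 2e+08,   # max cells kept in memory at once
              chunksize = 5e+07)   # cells per chunk when processing on disk
```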

As a test case, I took nine Landsat 8 tiles with five vegetation indices each and ran the processing function on them to see what the effect of these settings was. The results were:

  • Defaults: 704 seconds
  • maxmemory set to 1.3e+08, chunksize set to 3.2e+07: 711 seconds
  • maxmemory set to 2e+08, chunksize set to 5e+07, 5 cores: 385 seconds
  • maxmemory set to 2e+08, chunksize set to 5e+07, 8 cores: 389 seconds
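The fastest configuration above would look roughly like this; a hedged sketch, where the paths are placeholders and the mc.cores argument of processLandsatBatch() is assumed from the bfastSpatial API:

```r
## Hedged sketch of the fastest run above (maxmemory 2e+08,
## chunksize 5e+07, 5 cores). Paths are placeholders; mc.cores is
## assumed to be the parallelism argument of processLandsatBatch().
library(raster)
library(bfastSpatial)

rasterOptions(maxmemory = 2e+08, chunksize = 5e+07)

processLandsatBatch(x = "path/to/landsat/archives",
                    outdir = "path/to/output",
                    vi = "ndvi",    # placeholder; the test used 5 VIs
                    mc.cores = 5)   # one worker per tile, up to 5 at a time
```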

The RAM consumption was higher in each step. Apparently increasing the chunk size doesn't help (raster splits rasters into 4 chunks at the minimum, so this just decreased the chunk count from 6 to 4). But increasing the maximum memory consumption so that the rasters can all fit into memory made the process twice as fast, and it used no temporary files, which is great for conserving disk space. It was twice as fast even when using just 5 out of 8 threads. Increasing the thread count to 8 (which is dangerous, because it means maxmemory then effectively exceeds the total amount of RAM) didn't help at all (possibly due to hyperthreading, since there are 4 physical cores on this machine; or maybe because there were 9 tiles to process, and 5+4 is about as fast as 8+1).

So overall, tuning the memory consumption allows for huge gains in processing speed: it's better to use more RAM and avoid the disk than to use more threads and wait for I/O on every chunk, at least with rotational drives and hyperthreading. This might be worth mentioning in the tutorial.
