While running a whole bunch of `processLandsatBatch` jobs, I noticed that the CPU and RAM of the machine doing the processing were heavily underutilised (~30% CPU usage and ~5 GiB RAM used, on a machine with 8 threads and 16 GiB RAM), and that there was a lot of disk activity, including files being written into the `rastertmp` directory.
I figured that this happens because `raster` sets a maximum memory consumption level, and a chunk size to use when the raster(s) do not fit into memory. The default maximum memory consumption set in `rasterOptions()` is `1e+08` (in cells; Landsat VIs are in 16 bits per pixel), and the default chunk size is `1e+07`. `raster` considers a single Landsat tile too large to process in memory, since the mask also needs to be loaded and the result stored in memory too, which brings the memory requirement to three such tiles. So the rasters are split into chunks, which are then written into a temporary file.
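To see what limits are in effect, something like the following minimal sketch could be used. `rasterOptions()` and `canProcessInMemory()` are existing `raster` functions, but the tile dimensions below are an assumed stand-in for a Landsat scene rather than values taken from the test, and the defaults and units printed may differ between `raster` versions (see `?rasterOptions`).

```r
library(raster)

# Print the options currently in effect, including maxmemory and chunksize.
rasterOptions()

# Hypothetical stand-in for one Landsat tile; these dimensions are an
# assumption for illustration only. No cell values are allocated here.
tile <- raster(nrows = 7800, ncols = 7600)

# Ask raster whether three tile-sized layers (input, mask and result)
# would be processed in memory under the current limit.
canProcessInMemory(tile, n = 3)
```

If the last call returns `FALSE`, `raster` would process such a tile in chunks, which matches the disk activity described above.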
As a test case, I took nine Landsat 8 tiles with five vegetation indices each and ran the processing function on them to see what the effect of these settings was. The results were:
- Defaults: 704 seconds
- `maxmemory` set to `1.3e+08`, `chunksize` set to `3.2e+07`: 711 seconds
- `maxmemory` set to `2e+08`, `chunksize` set to `5e+07`, 5 cores: 385 seconds
- `maxmemory` set to `2e+08`, `chunksize` set to `5e+07`, 8 cores: 389 seconds
RAM consumption rose with each step. Increasing the chunk size alone apparently doesn't help (`raster` splits rasters into at least 4 chunks, so this just decreased the number of chunks from 6 to 4). But raising the maximum memory consumption so that each raster fits entirely into memory made the process nearly twice as fast (385 vs. 704 seconds)! It also used no temporary files, which is great for conserving disk space. The speed-up held even when using just 5 out of 8 threads. Increasing the thread count to 8 (which is dangerous, because that means `maxmemory` is effectively set to more than the total amount of RAM) didn't help (possibly due to hyperthreading, since there are only 4 physical cores on this machine; or maybe because there were 9 tiles to process, and 5+4 is about as fast as 8+1).
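The "5+4 is about as fast as 8+1" guess can be checked with a bit of arithmetic. This is only a sketch, under the simplifying assumption that every tile takes about the same time and that each worker handles one tile at a time; `rounds()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: number of sequential "rounds" needed when
# n_tiles equally expensive tiles are spread over n_workers workers.
rounds <- function(n_tiles, n_workers) {
  ceiling(n_tiles / n_workers)
}

rounds_5 <- rounds(9, 5)  # 5 tiles in the first round, 4 in the second
rounds_8 <- rounds(9, 8)  # 8 tiles in the first round, 1 in the second

# Speed-up of the in-memory run (385 s) over the defaults (704 s).
speedup <- round(704 / 385, 2)

print(c(rounds_5 = rounds_5, rounds_8 = rounds_8, speedup = speedup))
```

Under that assumption both worker counts need the same number of rounds, so the extra three workers would not be expected to shorten the run.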
So overall, tuning the memory consumption allows for huge gains in processing speed, and it's better to use more RAM and avoid the disk than to use more threads and wait for I/O on every chunk. At least with rotational drives and hyperthreading. This might be worth mentioning in the tutorial.
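For the tutorial, the setting that worked best in this test could be shown along these lines. It is a sketch rather than a recommendation: the values are simply the ones from the fastest runs above, they are interpreted in cells by the `raster` version discussed here, and other versions may use different units, so `?rasterOptions` is worth checking first.

```r
library(raster)

# Raise the in-memory limit and chunk size for the current R session,
# before starting the batch processing. Values taken from the test above;
# pick them so that all parallel workers together still fit into RAM.
rasterOptions(maxmemory = 2e+08, chunksize = 5e+07)

# Print the options again to confirm that the new values are in effect.
rasterOptions()
```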