
Improve performance #13

Open
nikola-rados opened this issue Feb 24, 2021 · 2 comments

@nikola-rados
Contributor

While the script is working as intended so far, performance may become a concern for its viability. With this issue we will seek out ways to improve its speed.

@nikola-rados nikola-rados self-assigned this Feb 24, 2021
@nikola-rados
Contributor Author

Examining the snakeviz output for a request of size 571 MB (this is the size reported by `Dataset.nbytes / 2`), we get a pretty clear picture of what is holding back the performance:

(snakeviz profile screenshot)

Note: given the exact same parameters, I've seen this time vary quite a bit, anywhere from the high 200s to the low 400s of seconds.
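For anyone wanting to reproduce a profile like the one above, here is a minimal stdlib-only sketch using `cProfile`; the `process` function is a stand-in for the real workload, not orca code, and the same `.prof` dump can be opened with `snakeviz`:

```python
import cProfile
import io
import pstats

def process():
    # Stand-in for the real workload (e.g. scripts/process.py);
    # any callable can be profiled the same way.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
result = process()
profiler.disable()

# Print the top entries by cumulative time; dumping with
# profiler.dump_stats("profile.prof") feeds `snakeviz profile.prof`.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
stats_text = stream.getvalue()
```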

The `Dataset.to_netcdf()` method takes up essentially the entire runtime of the program. If we follow the call stack to the bottom, we see that the method already uses some threading to handle its execution:

(snakeviz call-stack screenshot)

Despite this, it doesn't seem to do things particularly quickly (at least it feels that way). @cairosanders and I have already tried to incorporate asyncio to load the individual requests simultaneously, but xarray's support for asynchronous tasks is pretty limited. The main bottleneck of `to_netcdf` also still exists, unfortunately.
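Since asyncio support in xarray is limited, one alternative worth discussing is plain threads: the blocking OPeNDAP reads can be dispatched with `concurrent.futures`. A sketch follows; `fetch` is a hypothetical placeholder for the real dataset read (e.g. an `xr.open_dataset` call), not orca's actual code, and the URLs are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for the real blocking OPeNDAP read; echoing the URL
    # keeps the sketch self-contained and runnable.
    return f"dataset-for-{url}"

urls = [
    "https://example.org/thredds/dodsC/file.nc?tasmax[0:1:7500][0:1:91][0:1:206]",
    "https://example.org/thredds/dodsC/file.nc?tasmax[7501:1:15000][0:1:91][0:1:206]",
]

# Each blocking download runs in its own worker thread; map() returns
# results in submission order, so the merge step stays deterministic.
with ThreadPoolExecutor(max_workers=2) as pool:
    datasets = list(pool.map(fetch, urls))
```

This only parallelizes the downloads; as noted above, the `to_netcdf` write itself would remain the dominant cost.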

I don't know what the performance requirements/expectations are for orca, but I get the feeling this may be a little too slow. As such, I was hoping to open up some discussion about how we might go about speeding this up.

@nikola-rados
Contributor Author

To add some more detail: the results above were achieved by running `make performance`, which runs a test case that splits a single request into two. Here is a look at the parameters passed into the script:

```
scripts/process.py -u tasmax_day_BCCAQv2_bcc-csm1-1-m_historical-rcp26_r1i1p1_19500101-21001231_Canada -v tasmax[0:1:15000] -t [0:1:91] -n [0:1:206] -l DEBUG
```
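The bracketed selectors appear to follow OPeNDAP's `[start:stride:stop]` hyperslab syntax. As a small illustration (this is a hypothetical parser sketch, not a function in orca), the three selectors above can be decoded like so:

```python
import re

def parse_hyperslab(text):
    """Parse an OPeNDAP-style selector like '[0:1:15000]' into
    (start, stride, stop) integers."""
    match = re.fullmatch(r"\[(\d+):(\d+):(\d+)\]", text)
    if match is None:
        raise ValueError(f"not a hyperslab selector: {text!r}")
    return tuple(int(group) for group in match.groups())

# The three dimension selectors from the command above.
dims = [parse_hyperslab(s) for s in ("[0:1:15000]", "[0:1:91]", "[0:1:206]")]
```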

The original request is split into these two requests:

```
'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:7500][0:1:91][0:1:206]'
'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[7501:1:15000][0:1:91][0:1:206]'
```

These are split in half on the time variable such that both requests are under the threshold.
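The halving step described above can be sketched as follows; `split_interval` is a hypothetical helper for illustration, not orca's actual function, and mirrors how `tasmax[0:1:15000]` becomes `[0:1:7500]` and `[7501:1:15000]`:

```python
def split_interval(start, stop):
    """Split an inclusive [start, stop] index range into two halves,
    mirroring how the time dimension [0:1:15000] is divided into
    [0:1:7500] and [7501:1:15000]."""
    mid = (start + stop) // 2
    return (start, mid), (mid + 1, stop)

halves = split_interval(0, 15000)
```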

Here is the full set of logs from the run:

```
2021-02-26 13:08:15 INFO: Processing data file request
2021-02-26 13:08:15 DEBUG: Starting db session
2021-02-26 13:08:15 DEBUG: Got filepath: /storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc
2021-02-26 13:08:15 DEBUG: Initial url: https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:15000][0:1:91][0:1:206]
2021-02-26 13:08:15 INFO: Downloading data file(s)
2021-02-26 13:08:16 DEBUG: Splitting, request over threshold: 571358088.0
2021-02-26 13:08:16 DEBUG: URL(s) for downloading: ['https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:7500][0:1:91][0:1:206]', 'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[7501:1:15000][0:1:91][0:1:206]']
2021-02-26 13:08:16 DEBUG: Downloading and merging 2 split files
2021-02-26 13:13:57 DEBUG: File writing complete
2021-02-26 13:13:57 INFO: Complete
```
