From 19fbaff8a755356cd0ebe57f2c9703d28dd26cf6 Mon Sep 17 00:00:00 2001 From: Philippe Miron Date: Thu, 14 Apr 2022 23:17:03 -0400 Subject: [PATCH] latest comments --- ..._data_benchmarking_typical_workflows.ipynb | 2461 ++++++++++------- 1 file changed, 1442 insertions(+), 1019 deletions(-) diff --git a/PM_05_Accelerating_Lagrangian_analyses_of_oceanic_data_benchmarking_typical_workflows.ipynb b/PM_05_Accelerating_Lagrangian_analyses_of_oceanic_data_benchmarking_typical_workflows.ipynb index c3afcb4..0bd97d1 100644 --- a/PM_05_Accelerating_Lagrangian_analyses_of_oceanic_data_benchmarking_typical_workflows.ipynb +++ b/PM_05_Accelerating_Lagrangian_analyses_of_oceanic_data_benchmarking_typical_workflows.ipynb @@ -283,7 +283,7 @@ "2. extracting data within given geographical and/or temporal windows (e.g. Gulf of Mexico),\n", "3. analyses per trajectory (e.g. single statistics, spectral estimation by Fast Fourier Transform).\n", "\n", - "Since the CloudDrift project aims at accelerating the use of Lagrangian data for atmospheric, oceanic, and climate sciences, we hope that the users of this notebook will provide us with feedback on its ease of use and the intuitiveness of the proposed methods in order to guide the on-going development of the clouddrift python package.\n", + "Since the *CloudDrift* project aims at accelerating the use of Lagrangian data for atmospheric, oceanic, and climate sciences, we hope that the users of this notebook will provide us with feedback on its ease of use and the intuitiveness of the proposed methods in order to guide the on-going development of the *clouddrift* python package.\n", "\n", "## Technical contributions\n", "\n", @@ -304,7 +304,7 @@ "\n", "In terms of data file format, we tested both NetCDF and Parquet file formats but did not find significant performance gain from using one or the other. Because NetCDF is a well-known and established file format in Earth sciences, we save the contiguous ragged array as a single NetCDF archive. \n", "\n", - "In terms of python packages, we find that *Pandas* is intuitive with a simple syntax but does not perform efficiently with large dataset. The complete GDP hourly dataset is currently *only* ~15 GB, but as part of *clouddrift* we also want to support larger Lagrangian datasets (>100 GB). On the other hand, *xarray* can interface with *Dask* to efficiently *lazy-load* large dataset but it requires custom adaptation to operate on a ragged array. In contrast, *Awkward Array* provides a novel approach by storing alongside the data an offset index in a manner that is transparent to the user, simplifying the analysis of non-uniform Lagrangian datasets. We find that it is also *fast* and can easily interface with *Numba* to further improve performances.\n", + "In terms of python packages, we find that *Pandas* is intuitive with a simple syntax but does not perform efficiently with large dataset. The complete GDP hourly dataset is currently *only* ~15 GB, but as part of *CloudDrift* we also want to support larger Lagrangian datasets (>100 GB). On the other hand, *xarray* can interface with *Dask* to efficiently *lazy-load* large dataset but it requires custom adaptation to operate on a ragged array. In contrast, *Awkward Array* provides a novel approach by storing alongside the data an offset index in a manner that is transparent to the user, simplifying the analysis of non-uniform Lagrangian datasets. 
We find that it is also *fast* and can easily interface with *Numba* to further improve performances.\n", "\n", "In terms of benchmark speed, each package show similar results for the geographical binning (test 1) and the operation per trajectory (test 3) benchmarks. For the extraction of a given region (test 2), *xarray* was found to be slower than both *Pandas* and *Awkward Array*. We note that speed performance may not the deciding factor for all users and we believe that ease of use and simple intuitive syntax are also important." ] @@ -370,7 +370,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -459,7 +459,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 7, "metadata": { "tags": [] }, @@ -468,33 +468,19 @@ "name": "stdout", "output_type": "stream", "text": [ - "Fetching the 500 requested netCDF files (as a reference ~2min for 500 files).\n" - ] - }, - { - "ename": "KeyboardInterrupt", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", - "File \u001b[0;32m:30\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n", - "File \u001b[0;32m~/opt/miniconda3/envs/earthcube/lib/python3.10/concurrent/futures/_base.py:637\u001b[0m, in \u001b[0;36mExecutor.__exit__\u001b[0;34m(self, exc_type, exc_val, exc_tb)\u001b[0m\n\u001b[1;32m 636\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__exit__\u001b[39m(\u001b[38;5;28mself\u001b[39m, exc_type, exc_val, exc_tb):\n\u001b[0;32m--> 637\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mshutdown\u001b[49m\u001b[43m(\u001b[49m\u001b[43mwait\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[1;32m 638\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mFalse\u001b[39;00m\n", - "File \u001b[0;32m~/opt/miniconda3/envs/earthcube/lib/python3.10/concurrent/futures/thread.py:235\u001b[0m, in \u001b[0;36mThreadPoolExecutor.shutdown\u001b[0;34m(self, wait, cancel_futures)\u001b[0m\n\u001b[1;32m 233\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m wait:\n\u001b[1;32m 234\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m t \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_threads:\n\u001b[0;32m--> 235\u001b[0m \u001b[43mt\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mjoin\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/opt/miniconda3/envs/earthcube/lib/python3.10/threading.py:1089\u001b[0m, in \u001b[0;36mThread.join\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 1086\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcannot join current thread\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1088\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m timeout \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m-> 1089\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_wait_for_tstate_lock\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1090\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 1091\u001b[0m \u001b[38;5;66;03m# the behavior of a negative timeout isn't documented, but\u001b[39;00m\n\u001b[1;32m 1092\u001b[0m 
\u001b[38;5;66;03m# historically .join(timeout=x) for x<0 has acted as if timeout=0\u001b[39;00m\n\u001b[1;32m 1093\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_wait_for_tstate_lock(timeout\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mmax\u001b[39m(timeout, \u001b[38;5;241m0\u001b[39m))\n", - "File \u001b[0;32m~/opt/miniconda3/envs/earthcube/lib/python3.10/threading.py:1109\u001b[0m, in \u001b[0;36mThread._wait_for_tstate_lock\u001b[0;34m(self, block, timeout)\u001b[0m\n\u001b[1;32m 1106\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 1108\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 1109\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[43mlock\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43macquire\u001b[49m\u001b[43m(\u001b[49m\u001b[43mblock\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m)\u001b[49m:\n\u001b[1;32m 1110\u001b[0m lock\u001b[38;5;241m.\u001b[39mrelease()\n\u001b[1;32m 1111\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_stop()\n", - "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + "Fetching the 500 requested netCDF files (as a reference ~2min for 500 files).\n", + "CPU times: user 88.3 ms, sys: 24.4 ms, total: 113 ms\n", + "Wall time: 3.03 s\n" ] } ], "source": [ "%%time\n", "\n", - "# output folder and official GDP ftp:\n", + "# output folder and official GDP https server\n", "# Note: If you are running this notebook on a local computer and have already downloaded the individual NetCDF files \n", "# independently of this notebook, you can move/copy these files to the folder destination shown below, \n", "# or alternatively change the variable 'folder' to your folder with the data\n", - "# Any new \n", "folder = 'data/raw/'\n", "input_url = 'https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/hourly/v2.00/netcdf/'\n", "\n", @@ -503,7 +489,7 @@ "string = urlpath.read().decode('utf-8')\n", "pattern = re.compile('drifter_[0-9]*.nc')\n", "filelist = pattern.findall(string)\n", - "list_id = [int(f.split('_')[-1][:-3]) for f in filelist]\n", + "list_id = np.unique([int(f.split('_')[-1][:-3]) for f in filelist])\n", "\n", "# Here we \"randomly\" select a subset of ID numbers but produce reproducible results\n", "# by actually setting the seed of the random generator\n", @@ -914,27 +900,27 @@ " fill: currentColor;\n", "}\n", "
<xarray.Dataset>\n",
-       "Dimensions:           (traj: 1, obs: 5137)\n",
+       "Dimensions:           (traj: 1, obs: 1095)\n",
        "Dimensions without coordinates: traj, obs\n",
        "Data variables: (12/33)\n",
-       "    ID                (traj) |S10 b'2578'\n",
-       "    rowsize           (traj) int32 5137\n",
+       "    ID                (traj) |S10 b'2592'\n",
+       "    rowsize           (traj) int32 1095\n",
        "    WMO               (traj) float64 4.401e+06\n",
        "    expno             (traj) float64 9.046e+03\n",
-       "    deploy_date       (traj) float32 1.114e+09\n",
-       "    deploy_lat        (traj) float64 47.5\n",
+       "    deploy_date       (traj) float32 9.887e+08\n",
+       "    deploy_lat        (traj) float64 47.43\n",
        "    ...                ...\n",
-       "    err_sst           (traj, obs) float64 0.013 0.011 0.011 ... 0.009 0.021\n",
-       "    err_sst1          (traj, obs) float64 0.033 0.023 0.015 ... 0.023 0.036\n",
-       "    err_sst2          (traj, obs) float64 0.03 0.027 0.021 ... 0.018 0.02 0.021\n",
-       "    flg_sst           (traj, obs) float64 5.0 5.0 5.0 5.0 ... 1.0 1.0 1.0 1.0\n",
-       "    flg_sst1          (traj, obs) float64 5.0 5.0 5.0 5.0 ... 1.0 1.0 1.0 1.0\n",
-       "    flg_sst2          (traj, obs) float64 5.0 5.0 5.0 5.0 ... 1.0 1.0 1.0 1.0\n",
+       "    err_sst           (traj, obs) float64 0.074 0.077 0.086 ... 0.042 -1e+34\n",
+       "    err_sst1          (traj, obs) float64 0.224 0.158 0.113 ... 0.107 -1e+34\n",
+       "    err_sst2          (traj, obs) float64 0.236 0.185 0.142 ... 0.087 0.1 -1e+34\n",
+       "    flg_sst           (traj, obs) float64 5.0 5.0 5.0 5.0 ... 5.0 5.0 5.0 0.0\n",
+       "    flg_sst1          (traj, obs) float64 5.0 5.0 5.0 5.0 ... 5.0 5.0 5.0 0.0\n",
+       "    flg_sst2          (traj, obs) float64 4.0 4.0 4.0 4.0 ... 4.0 4.0 2.0 0.0\n",
        "Attributes: (12/72)\n",
        "    title:                      Global Drifter Program hourly drifting buoy c...\n",
-       "    id:                         Global Drifter Program ID 2578\n",
+       "    id:                         Global Drifter Program ID 2592\n",
        "    location_type:              Argos\n",
-       "    wmo_platform_code:          4400505\n",
+       "    wmo_platform_code:          4400509\n",
        "    ncei_template_version:      NCEI_NetCDF_Trajectory_Template_v2\n",
        "    cdm_data_type:              Trajectory\n",
        "    ...                         ...\n",
@@ -943,19 +929,20 @@
        "    acknowledgement:            Elipot et al. (2016), Elipot et al. (2021) to...\n",
        "    history:                    Version 2.00.  Metadata from dirall.dat and d...\n",
        "    interpolation_method:       \n",
-       "    imei:                       
" + " imei: " ], "text/plain": [ "\n", - "Dimensions: (traj: 1, obs: 5137)\n", + "Dimensions: (traj: 1, obs: 1095)\n", "Dimensions without coordinates: traj, obs\n", "Data variables: (12/33)\n", " ID (traj) |S10 ...\n", @@ -973,9 +960,9 @@ " flg_sst2 (traj, obs) float64 ...\n", "Attributes: (12/72)\n", " title: Global Drifter Program hourly drifting buoy c...\n", - " id: Global Drifter Program ID 2578\n", + " id: Global Drifter Program ID 2592\n", " location_type: Argos\n", - " wmo_platform_code: 4400505\n", + " wmo_platform_code: 4400509\n", " ncei_template_version: NCEI_NetCDF_Trajectory_Template_v2\n", " cdm_data_type: Trajectory\n", " ... ...\n", @@ -1007,7 +994,7 @@ "\n", "In the GDP dataset, the number of observations varies from `len(['obs'])=13` to `len(['obs'])=66417`. As such, it seems inefficient to create bidimensional datastructure `['traj', 'obs']`, commonly used by Lagrangian numerical simulation tools such as [Ocean Parcels](https://oceanenDrift](https://opendrift.github.io/) and [OpenDrift](https://opendrift.github.io/) that tend to generate trajectories of equal or similar lengths.\n", "\n", - "Here, we propose to combine the data from the individual netCDFs files into a [*contiguous ragged array*](https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation) eventually written in a single NetCDF file in order to simplify data distribution, decrease metadata redundancies, and efficiently store a Lagrangian data collection of uneven lengths. The aggregation process (conducted with the `create_ragged_array` function found in the module `preprocess.py`) also converts to variables some of the metadata originally stored as attributes in the individual NetCDFs. The final structure contains 21 `vars['obs']` and 38 `vars['traj']`." + "Here, we propose to combine the data from the individual netCDFs files into a [*contiguous ragged array*](https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation) eventually written in a single NetCDF file in order to simplify data distribution, decrease metadata redundancies, and efficiently store a Lagrangian data collection of uneven lengths. The aggregation process (conducted with the `create_ragged_array` function found in the module `preprocess.py`) also converts to variables some of the metadata originally stored as attributes in the individual NetCDFs. The final structure contains 21 variables with dimension `['obs']` and 38 variables with dimension `['traj']`." ] }, { @@ -1387,21 +1374,21 @@ " fill: currentColor;\n", "}\n", "
<xarray.Dataset>\n",
-       "Dimensions:                (traj: 500, obs: 5124072)\n",
+       "Dimensions:                (traj: 500, obs: 4786301)\n",
        "Coordinates:\n",
-       "    ID                     (traj) int64 2578 19081 21711 ... 68243870 68247270\n",
+       "    ID                     (traj) int64 2592 6428 13566 ... 68246720 68248530\n",
        "    longitude              (obs) float32 ...\n",
        "    latitude               (obs) float32 ...\n",
        "    time                   (obs) datetime64[ns] ...\n",
        "    ids                    (obs) int64 ...\n",
        "Dimensions without coordinates: traj, obs\n",
        "Data variables: (12/54)\n",
-       "    rowsize                (traj) int64 5137 3489 26614 4060 ... 656 658 1771\n",
-       "    location_type          (traj) bool False False False True ... True True True\n",
-       "    WMO                    (traj) int32 4400505 7100582 ... 4402562 5102733\n",
-       "    expno                  (traj) int32 9046 9484 9325 ... 21312 21312 21312\n",
-       "    deploy_date            (traj) datetime64[ns] 2005-04-15 2000-04-21 ... NaT\n",
-       "    deploy_lon             (traj) float32 -48.0 -39.79 176.9 ... -46.03 -155.2\n",
+       "    rowsize                (traj) int64 1095 19132 6631 17906 ... 643 1843 1176\n",
+       "    location_type          (traj) bool False False False ... True True True\n",
+       "    WMO                    (traj) int32 4400509 1600536 ... 4601712 4601740\n",
+       "    expno                  (traj) int32 9046 9435 7325 ... 21312 21312 21312\n",
+       "    deploy_date            (traj) datetime64[ns] 2001-05-01 2001-01-11 ... NaT\n",
+       "    deploy_lon             (traj) float32 -52.17 71.24 -97.16 ... -151.0 -143.4\n",
        "    ...                     ...\n",
        "    err_sst                (obs) float32 ...\n",
        "    err_sst1               (obs) float32 ...\n",
@@ -1413,7 +1400,7 @@
        "    title:             Global Drifter Program hourly drifting buoy collection\n",
        "    history:           Version 2.00.  Metadata from dirall.dat and deplog.dat\n",
        "    Conventions:       CF-1.6\n",
-       "    date_created:      2022-04-14T10:19:12.870209\n",
+       "    date_created:      2022-04-14T23:14:58.694974\n",
        "    publisher_name:    GDP Drifter DAC\n",
        "    publisher_email:   aoml.dftr@noaa.gov\n",
        "    ...                ...\n",
@@ -1422,35 +1409,35 @@
        "    contributor_role:  Data Acquisition Center\n",
        "    institution:       NOAA Atlantic Oceanographic and Meteorological Laboratory\n",
        "    acknowledgement:   Elipot et al. (2022) to be submitted. Elipot et al. (2...\n",
-       "    summary:           Global Drifter Program hourly data
[rendered xarray attribute listing: title, history, Conventions, date_created (2022-04-14T23:14:58.694974), publisher_name (GDP Drifter DAC), publisher_email (aoml.dftr@noaa.gov), publisher_url (https://www.aoml.noaa.gov/phod/gdp), licence (MIT License), processing_level (Level 2 QC by GDP drifter DAC), metadata_link (https://www.aoml.noaa.gov/phod/dac/dirall.html), contributor_name (NOAA Global Drifter Program), contributor_role (Data Acquisition Center), institution (NOAA Atlantic Oceanographic and Meteorological Laboratory), acknowledgement (Elipot et al. 2022, Elipot et al. 2016), summary (Global Drifter Program hourly data)]
  • " ], "text/plain": [ "\n", - "Dimensions: (traj: 500, obs: 5124072)\n", + "Dimensions: (traj: 500, obs: 4786301)\n", "Coordinates:\n", " ID (traj) int64 ...\n", " longitude (obs) float32 ...\n", @@ -1476,7 +1463,7 @@ " title: Global Drifter Program hourly drifting buoy collection\n", " history: Version 2.00. Metadata from dirall.dat and deplog.dat\n", " Conventions: CF-1.6\n", - " date_created: 2022-04-14T10:19:12.870209\n", + " date_created: 2022-04-14T23:14:58.694974\n", " publisher_name: GDP Drifter DAC\n", " publisher_email: aoml.dftr@noaa.gov\n", " ... ...\n", @@ -1866,19 +1853,19 @@ " fill: currentColor;\n", "}\n", "
    <xarray.DataArray 'rowsize' (traj: 500)>\n",
    -       "array([ 5137,  3489, 26614, ...,   656,   658,  1771])\n",
    +       "array([ 1095, 19132,  6631, ...,   643,  1843,  1176])\n",
            "Coordinates:\n",
    -       "    ID       (traj) int64 2578 19081 21711 22192 ... 68243270 68243870 68247270\n",
    +       "    ID       (traj) int64 2592 6428 13566 17927 ... 68244730 68246720 68248530\n",
            "Dimensions without coordinates: traj\n",
            "Attributes:\n",
            "    long_name:  Number of observations per trajectory\n",
    -       "    units:      -
    " + " units: -" ], "text/plain": [ "\n", - "array([ 5137, 3489, 26614, ..., 656, 658, 1771])\n", + "array([ 1095, 19132, 6631, ..., 643, 1843, 1176])\n", "Coordinates:\n", - " ID (traj) int64 2578 19081 21711 22192 ... 68243270 68243870 68247270\n", + " ID (traj) int64 2592 6428 13566 17927 ... 68244730 68246720 68248530\n", "Dimensions without coordinates: traj\n", "Attributes:\n", " long_name: Number of observations per trajectory\n", @@ -2306,13 +2293,13 @@ " fill: currentColor;\n", "}\n", "
    <xarray.Dataset>\n",
    -       "Dimensions:                (traj: 500, obs: 5124072)\n",
    +       "Dimensions:                (traj: 500, obs: 4786301)\n",
            "Coordinates:\n",
            "    ID                     (traj) int64 dask.array<chunksize=(500,), meta=np.ndarray>\n",
    -       "    longitude              (obs) float32 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    latitude               (obs) float32 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    time                   (obs) datetime64[ns] dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    ids                    (obs) int64 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    +       "    longitude              (obs) float32 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    latitude               (obs) float32 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    time                   (obs) datetime64[ns] dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    ids                    (obs) int64 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
            "Dimensions without coordinates: traj, obs\n",
            "Data variables: (12/54)\n",
            "    rowsize                (traj) int64 dask.array<chunksize=(500,), meta=np.ndarray>\n",
    @@ -2322,17 +2309,17 @@
            "    deploy_date            (traj) datetime64[ns] dask.array<chunksize=(500,), meta=np.ndarray>\n",
            "    deploy_lon             (traj) float32 dask.array<chunksize=(500,), meta=np.ndarray>\n",
            "    ...                     ...\n",
    -       "    err_sst                (obs) float32 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    err_sst1               (obs) float32 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    err_sst2               (obs) float32 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    flg_sst                (obs) int8 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    flg_sst1               (obs) int8 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    -       "    flg_sst2               (obs) int8 dask.array<chunksize=(5124072,), meta=np.ndarray>\n",
    +       "    err_sst                (obs) float32 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    err_sst1               (obs) float32 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    err_sst2               (obs) float32 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    flg_sst                (obs) int8 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    flg_sst1               (obs) int8 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
    +       "    flg_sst2               (obs) int8 dask.array<chunksize=(4786301,), meta=np.ndarray>\n",
            "Attributes: (12/15)\n",
            "    title:             Global Drifter Program hourly drifting buoy collection\n",
            "    history:           Version 2.00.  Metadata from dirall.dat and deplog.dat\n",
            "    Conventions:       CF-1.6\n",
    -       "    date_created:      2022-04-14T10:19:12.870209\n",
    +       "    date_created:      2022-04-14T23:14:58.694974\n",
            "    publisher_name:    GDP Drifter DAC\n",
            "    publisher_email:   aoml.dftr@noaa.gov\n",
            "    ...                ...\n",
    @@ -2341,7 +2328,7 @@
            "    contributor_role:  Data Acquisition Center\n",
            "    institution:       NOAA Atlantic Oceanographic and Meteorological Laboratory\n",
            "    acknowledgement:   Elipot et al. (2022) to be submitted. Elipot et al. (2...\n",
    -       "    summary:           Global Drifter Program hourly data