Amazon Open Data

Michaela Wheeler edited this page Feb 12, 2021 · 11 revisions

Background

Sondehub.org aggregates telemetry uploaded by community-run radiosonde receive stations (mostly running radiosonde_auto_rx), which capture weather balloon telemetry. Weather balloons are launched by numerous weather organisations around the world, and the data is used to build upper-atmosphere weather models. The goal of SondeHub is to collect the data in a central location so that organisations can develop their own models and forecasts.

Balloons are typically launched at 1115 and 2315 UTC, though some out-of-schedule launches do occur when data about particular storm fronts is required or when experiments such as ozone monitoring are flown (see the xdata field). The balloons typically reach altitudes of 25,000 m and report GPS location, datetime, temperature and humidity back to ground stations. The location data is used to calculate wind speeds at specific altitudes.
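As a rough illustration of how wind speed can be derived from location data, the sketch below estimates the horizontal speed between two frames of the same sonde using the haversine formula. It is not taken from SondeHub's own tooling. The function names are hypothetical, the Earth is treated as a sphere, and the timestamps are assumed to always carry fractional seconds as in the example frame further down this page.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def parse_time(value):
    # Assumes timestamps shaped like "2021-02-04T00:32:30.157239Z"
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def horizontal_speed(frame_a, frame_b):
    """Approximate horizontal speed in m/s between two frames of one sonde."""
    dist = haversine_m(
        float(frame_a["lat"]), float(frame_a["lon"]),
        float(frame_b["lat"]), float(frame_b["lon"]),
    )
    dt = (parse_time(frame_b["datetime"]) - parse_time(frame_a["datetime"])).total_seconds()
    if dt <= 0:
        raise ValueError("frames must be in increasing time order")
    return dist / dt
```

Pairing each speed with the mean of the two frames' alt values gives a crude wind profile. A real analysis would also need to smooth GPS noise and account for the balloon's own motion.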

Data Access

Data is stored in Amazon S3 and can be accessed using AWS S3 tools. Alternatively, basic access is available through our Python SDK. Sonde data is uploaded as soon as we receive it.

SDK

Install SDK

pip3 install sondehub

Example

import sondehub
frames = sondehub.download(serial="serial", datetime_prefix="2018-10-01")

Amazon

Since the data is stored in Amazon S3, it can be downloaded quickly to an EC2 instance rather than to your local machine. You can use Amazon's SDKs or CLI to do this.

CLI Example

aws s3 cp --recursive s3://sondehub-open-data/serial/{serial}/ /tmp/sonde_data

Data Types and Structure

Frames are uploaded as JSON files to the S3 bucket as they are received by our servers. They are uploaded in Universal Sonde Telemetry Format and are indexed by datetime. No filtering or modification of the frames has occurred at this point, and it is the user's responsibility to check that all required fields are acceptable for their use case.

Some important notes:

  • Data hasn't been normalised in any way
  • The SondehubV1 API is forwarded to SondehubV2, but it provides only a subset of the available fields. These frames can be filtered out by checking for SondehubV1 in the software_name field
  • All data prior to 2021 has been imported from SondehubV1
  • ⚠️ The data provided by most decoders is uncalibrated, as calibration data isn't available in most over-the-air formats. Care must be taken when using this data.
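The software_name check described in the notes above could look something like the following sketch. The helper names are hypothetical, and frames are assumed to be already-decoded JSON objects (Python dicts).

```python
def is_v1_frame(frame):
    """True when the frame was forwarded or imported from SondehubV1."""
    return frame.get("software_name") == "SondehubV1"


def split_by_origin(frames):
    """Return (native_frames, v1_frames) so each group can be handled separately."""
    native, v1 = [], []
    for frame in frames:
        (v1 if is_v1_frame(frame) else native).append(frame)
    return native, v1
```

Keeping the V1 frames in their own list, rather than discarding them, leaves the choice of whether the reduced field set is good enough to the caller.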

Data Types

Universal Sonde Telemetry Format provides a JSON object per frame.

    {
            "subtype": "SondehubV1",
            "temp": "-61.2",
            "manufacturer": "SondehubV1",
            "serial": "S4640152",
            "lat": "44.20318",
            "frame": "6147",
            "datetime": "2021-02-04T00:32:30.157239Z",
            "software_name": "SondehubV1",
            "humidity": "1.9",
            "alt": "22010",
            "vel_h": "6.5",
            "uploader_callsign": "F4ERG",
            "lon": "-2.50013",
            "software_version": "SondehubV1",
            "type": "SondehubV1",
            "time_received": "2021-02-04T00:32:30.157239Z",
            "position": "44.20318,-2.50013"
    }
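In the frame above the numeric values arrive as strings, so a consumer will usually want to convert them before doing any arithmetic. The sketch below is one possible approach. The field lists are taken only from this example frame and are an assumption, since other frames may carry different fields or native JSON numbers. Values that fail to parse become None rather than raising, because the data is not normalised.

```python
import json

# Field lists inferred from the example frame only (assumption)
FLOAT_FIELDS = ("lat", "lon", "alt", "temp", "humidity", "vel_h")
INT_FIELDS = ("frame",)


def _to_number(value, cast):
    """Apply cast, returning None when the value is missing or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def coerce_frame(frame):
    """Return a copy of the frame with known numeric fields converted."""
    out = dict(frame)
    for name in FLOAT_FIELDS:
        if name in out:
            out[name] = _to_number(out[name], float)
    for name in INT_FIELDS:
        if name in out:
            out[name] = _to_number(out[name], int)
    return out


raw = '{"serial": "S4640152", "temp": "-61.2", "alt": "22010", "frame": "6147"}'
frame = coerce_frame(json.loads(raw))
```

Fields that are absent stay absent, so a caller can still tell the difference between a missing field and an unparseable value.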

Data Structure

The S3 bucket sondehub-open-data is partitioned into /date/ and /serial/{serial}/.

/date/

Format: s3://sondehub-open-data/date/${ISODATE}-${SERIAL}-${GUID}.json
Example: s3://sondehub-open-data/date/2021-02-04T01:50:30.017123Z-S2950048-0776298b-b099-46fc-a77d-693823a38580.json

/serial/

Format: s3://sondehub-open-data/serial/${SERIAL}/${ISODATE}-${GUID}.json
Example: s3://sondehub-open-data/serial/DFM-19029884/2021-02-04T01:50:30.100357Z-3c03ad12-f23d-48cf-b902-70b535980e66.json
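Because serials can themselves contain hyphens (DFM-19029884, for example), splitting an object key on "-" is unreliable. The sketch below parses both key layouts with regular expressions, anchoring on the trailing "Z" of the timestamp and on the fixed GUID shape. The function name is hypothetical, the patterns are inferred only from the two examples above, and the key is assumed to be the path after the bucket name, with lowercase hexadecimal GUIDs.

```python
import re

# 8-4-4-4-12 lowercase hex GUID, as seen in the example keys (assumption)
GUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

DATE_KEY = re.compile(
    rf"^date/(?P<datetime>.+?Z)-(?P<serial>.+)-(?P<guid>{GUID_RE})\.json$"
)
SERIAL_KEY = re.compile(
    rf"^serial/(?P<serial>[^/]+)/(?P<datetime>.+?Z)-(?P<guid>{GUID_RE})\.json$"
)


def parse_key(key):
    """Split an object key into datetime, serial and guid; None if unrecognised."""
    for pattern in (DATE_KEY, SERIAL_KEY):
        match = pattern.match(key)
        if match:
            return match.groupdict()
    return None
```

Returning None for unrecognised keys lets a caller skip unexpected objects in a bucket listing instead of failing part-way through.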

XDATA

The xdata field can be decoded based on the vendor's specifications. For more details, check out NOAA's XDATA specification.