Processing large datasets in a streaming manner #9772
-
In some cases, the data that you need to process for an asset is too large to fit into memory. In those cases, it would be useful to process the data chunk by chunk, with each chunk written to external storage before the next chunk is processed.
Replies: 3 comments 1 reply
-
While a bit convoluted, it is possible to use generators for this purpose, provided you use a custom IOManager. You're not able to yield these chunks directly from your asset or op (Dagster only permits yielding structured events such as AssetObservations or Outputs from the body of an op), but you can return a generator, which will then be consumed within the IOManager itself. A quick example of this behavior would look something like the following:
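Here is a minimal sketch of that pattern (the `ChunkedCsvIOManager` class, file paths, and chunk sizes below are illustrative assumptions, not part of the original reply): the asset returns a generator of chunks, and the custom IOManager drains it, writing each chunk to disk before the next one is produced.

```python
import csv
import os

from dagster import Definitions, IOManager, InputContext, OutputContext, asset, io_manager


class ChunkedCsvIOManager(IOManager):
    def __init__(self, base_dir: str = "/tmp/dagster_chunks"):
        self._base_dir = base_dir

    def _path(self, context) -> str:
        # One file per asset, keyed off the last component of the asset key.
        return os.path.join(self._base_dir, f"{context.asset_key.path[-1]}.csv")

    def handle_output(self, context: OutputContext, obj) -> None:
        # `obj` is the generator returned from the asset body; consuming it
        # here means only one chunk is held in memory at a time.
        os.makedirs(self._base_dir, exist_ok=True)
        with open(self._path(context), "w", newline="") as f:
            writer = csv.writer(f)
            for chunk in obj:
                writer.writerows(chunk)

    def load_input(self, context: InputContext):
        # Downstream assets can also consume the data lazily, row by row.
        with open(self._path(context), newline="") as f:
            yield from csv.reader(f)


@io_manager
def chunked_csv_io_manager(init_context):
    return ChunkedCsvIOManager()


@asset
def big_table():
    def chunks():
        # Stand-in for reading a large source in pieces (hypothetical sizes).
        for start in range(0, 1_000_000, 100_000):
            yield [(i, i * 2) for i in range(start, start + 100_000)]

    # Return (don't yield) the generator: Dagster hands the generator object
    # to the IOManager as the output value, which iterates it in handle_output.
    return chunks()


defs = Definitions(
    assets=[big_table],
    resources={"io_manager": chunked_csv_io_manager},
)
```

The key point is that `handle_output` receives the generator object itself, so each chunk can be persisted and released before the next one is generated; the asset body never materializes the full dataset.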
-
This should get more formal support IMO.
-
I am using