
Processing large datasets in a streaming manner #9772

Answered by OwenKephart
OwenKephart asked this question in Q&A

While a bit convoluted, it is possible to use generators for this purpose, provided you use a custom IOManager. You can't yield the chunks directly from your asset or op (Dagster only permits yielding structured events such as AssetObservation or Output from the body of an op), but you can return a generator function, which is then consumed inside the IOManager itself. A quick example of this pattern looks something like the following:

from dagster import IOManager


class StreamingIOManager(IOManager):

    def handle_output(self, context, obj) -> None:
        # obj is a generator function; iterating it here streams
        # each chunk straight to disk without holding them all in memory
        with open("my_output.txt", "wb") as f:
            for chunk in obj():
                f.write(chunk)

    def load_input(self, context):
        # Counterpart sketch: hand downstream ops a generator function
        # so they can stream the data back chunk by chunk as well
        def _chunks():
            with open("my_output.txt", "rb") as f:
                while chunk := f.read(8192):
                    yield chunk
        return _chunks
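To make the data flow concrete, here is a minimal, framework-free sketch of the same pattern: the op body returns a generator function rather than the chunks themselves, and the IOManager-like consumer is the only place the generator is actually iterated. The names `produce_chunks` and `consume` are illustrative stand-ins, not part of the Dagster API.

```python
import os
import tempfile

def produce_chunks():
    """Stands in for an op/asset body: returns a generator FUNCTION,
    so no chunk is materialized until the consumer iterates it."""
    def _chunks():
        for i in range(3):
            # In practice each chunk might be a block read from a large source
            yield f"chunk-{i}\n".encode()
    return _chunks

def consume(obj, path):
    """Stands in for StreamingIOManager.handle_output: call the
    generator function and stream each chunk straight to disk."""
    with open(path, "wb") as f:
        for chunk in obj():
            f.write(chunk)

path = os.path.join(tempfile.mkdtemp(), "my_output.txt")
consume(produce_chunks(), path)
with open(path, "rb") as f:
    print(f.read().decode())
```

The key design point is that the generator function crosses the op/IOManager boundary unevaluated, so memory usage stays bounded by the chunk size rather than the dataset size.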
