Scenario: Python 3.11 GraphQL gateway using Ariadne with lots of nested data
During development I found a significant performance degradation. I raised this issue in GraphQL core (graphql-python/graphql-core#190). After some more research I found that using `gather` on CPU-bound tasks causes significant overhead (graphql-python/graphql-core#190 (comment)). For CPU-bound async tasks it is better to use a sequential await.
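To make the overhead concrete, here is a minimal sketch (my own illustration, not from the issue): `asyncio.gather` wraps every awaitable in a Task and schedules it on the event loop, while a sequential await runs each coroutine inline. The `already_resolved` coroutine is a hypothetical stand-in for work that completes without real I/O:

```python
import asyncio
import time

async def already_resolved(value):
    # Stand-in for a CPU-bound/cached lookup: no real I/O, returns at once.
    return value

async def main():
    n = 100_000

    # gather: every coroutine becomes a scheduled Task.
    start = time.perf_counter()
    await asyncio.gather(*(already_resolved(i) for i in range(n)))
    print(f"gather:     {time.perf_counter() - start:.3f}s")

    # sequential await: each coroutine runs inline, no Task scheduling.
    start = time.perf_counter()
    [await already_resolved(i) for i in range(n)]
    print(f"sequential: {time.perf_counter() - start:.3f}s")

asyncio.run(main())
```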
So I monkey-patched `gather` into a serial await in GraphQL core, but I still had very slow responses. Today I finally dove into this problem again and saw that there was another `gather` in aiodataloader!
As far as I understand, the goal of the dataloader (when used with the cache) is to cause only a few IO-bound lookups and serve all other loads directly from the cache. This means our usage of `gather` is CPU-bound. I monkey-patched the aiodataloader `gather` into a serial await and my requests went from 3s to 500ms.
I am not sure if this is always the case (for example, when not using the cache), but as long as you want the cache you really need a serial await. Maybe I am missing something (please let me know), but I would suggest adding a serial await to `load_many` when the cache is being used.
```python
from importlib import import_module
from typing import Any, Awaitable, Iterable, List

async def serial_gather(*futures: Awaitable[Any]):
    return [await future for future in futures]

aiodataloader = import_module("aiodataloader")

def load_many(self, keys: Iterable[Any]) -> "Future[List[ReturnT]]":
    """Loads multiple keys, returning a list of values

    >>> a, b = await my_loader.load_many(['a', 'b'])

    This is equivalent to the more verbose:

    >>> a, b = await gather(
    >>>     my_loader.load('a'),
    >>>     my_loader.load('b')
    >>> )
    """
    if not isinstance(keys, Iterable):
        raise TypeError(
            "The loader.load_many() function must be called with "
            "Iterable<key> but got: {}.".format(keys)
        )
    # Await the loads one by one instead of scheduling them via gather.
    return serial_gather(*[self.load(key) for key in keys])

aiodataloader.DataLoader.load_many = load_many
```
Doing this would lead to very poor performance when retrieving keys that are not in the cache. It essentially defeats batching, which is the entire point of the DataLoader pattern.
What's the use case that leads to calling `load_many()` on hundreds of thousands of keys where the results are already cached? Are you using aiodataloader as an application-level cache?
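To see why a serial `load_many` defeats batching, here is a small sketch assuming aiodataloader's documented subclassing API (the `UserLoader` class and the keys are made up): under `gather` both keys are collected into a single `batch_load_fn` call, while a serial await flushes a one-key batch per load.

```python
import asyncio
from aiodataloader import DataLoader

class UserLoader(DataLoader):
    async def batch_load_fn(self, keys):
        # Called once per dispatched batch.
        print(f"batch_load_fn called with {keys}")
        return [{"id": k} for k in keys]

async def main():
    # Batched: both loads are enqueued before the loop yields,
    # so the loader dispatches one batch -> [1, 2].
    batched = UserLoader()
    await asyncio.gather(batched.load(1), batched.load(2))

    # Serial: each await lets the loop dispatch immediately,
    # so the loader dispatches two batches -> [3], then [4].
    serial = UserLoader()
    [await serial.load(k) for k in (3, 4)]

asyncio.run(main())
```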
Found the "problem" (which ended up being a user error). If you are running Python in debug mode, the asyncio loop is also set to debug. That makes the loop keep the full stack trace on each context switch, which is quite expensive, so when using `gather` (depending on your workload) the tasks may end up switching a LOT, which completely kills performance. In production this is not a problem because the asyncio loop is not set to debug.
TL;DR: if you want the actual performance, disable debug mode on your asyncio loop.
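For reference, a quick sketch of checking and switching off debug mode, using only standard-library calls:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # Debug mode is enabled by PYTHONASYNCIODEBUG=1, python -X dev,
    # or loop.set_debug(True); it records traceback context on every
    # context switch, which is what made gather look so slow here.
    print(f"debug mode: {loop.get_debug()}")
    loop.set_debug(False)  # disable for realistic timings

# asyncio.run also accepts an explicit debug flag.
asyncio.run(main(), debug=False)
```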