Feature/caching #1922 (Open)

Wants to merge 11 commits into base: v2.6

Conversation

@hmoazam (Collaborator) commented Dec 10, 2024:

A single caching interface with two levels of cache: an in-memory LRU cache and a FanoutCache (on disk).
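A rough sketch of that two-level interface, assuming cachetools for the LRU level and diskcache for the on-disk level (class name, defaults, and the get/put surface here are illustrative, not the exact code in this PR):

from cachetools import LRUCache
from diskcache import FanoutCache

class Cache:
    """Two-level cache: an in-memory LRU cache in front of an on-disk FanoutCache."""

    def __init__(self, directory, mem_size_limit, disk_size_limit):
        self.memory_cache = LRUCache(maxsize=mem_size_limit)  # level 1: in memory
        self.fanout_cache = FanoutCache(
            shards=16, timeout=2, directory=directory, size_limit=disk_size_limit
        )  # level 2: on disk

    def get(self, key):
        value = self.memory_cache.get(key)
        if value is None:
            value = self.fanout_cache.get(key)  # fall back to disk
        return value

    def put(self, key, value):
        # Writes populate both levels so later reads can be served from memory first.
        self.memory_cache[key] = value
        self.fanout_cache.set(key, value)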

@hmoazam requested a review from @okhat on December 10, 2024 at 22:15
dspy/__init__.py Outdated
@@ -64,6 +66,13 @@
configure = settings.configure
context = settings.context

CACHE_DIR = os.environ.get("DSPY_CACHEDIR") or os.path.join(Path.home(), ".dspy_cache")
@okhat (Collaborator) commented Dec 10, 2024:

Note to self: .dspy_cache is already used for LiteLLM, so the paths in this PR should be swapped (it uses .dspy_litellm_cache for the existing cache, but that cache doesn't exist at that new path).

Also, the size limit should be 30 GB.
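A hedged sketch of the layout being asked for here (the new directory and constant names are assumptions, not what the PR ends up using): keep the existing LiteLLM cache at ~/.dspy_cache and give the new cache its own directory with a 30 GB size limit.

import os
from pathlib import Path

# Existing LiteLLM disk cache stays where it already lives.
LITELLM_CACHE_DIR = os.environ.get("DSPY_CACHEDIR") or os.path.join(Path.home(), ".dspy_cache")
# Hypothetical directory for the new cache (illustrative name only).
NEW_CACHE_DIR = os.path.join(Path.home(), ".dspy_fanout_cache")
DISK_CACHE_SIZE_LIMIT = 30 * 1024**3  # 30 GB, per the note above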

num_retries=num_retries,
)
from dspy import settings
@dspy_cache_decorator(settings.cache)
Collaborator:

Hmm, this is a strange pattern; is the inner function redefined every time? Does this mean the decorator... works? I guess it does, but it's probably a bit of overkill.

Collaborator (Author):

It does work, but it's definitely a weird pattern. Not 100% sure whether it has any negative side effects.
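For context, the pattern in question looks roughly like this (function and decorator names are taken from the diff; their bodies and signatures are assumed). The inner function is redefined and re-decorated on every outer call, so the cache decorator is applied at call time rather than import time:

from dspy import settings

def cached_litellm_text_completion(request, num_retries):
    # A fresh wrapper object is created on each call; dspy_cache_decorator and
    # litellm_text_completion are assumed to be in scope from the module in the diff.
    @dspy_cache_decorator(settings.cache)
    def cached_litellm_text_completion_inner(request, num_retries):
        return litellm_text_completion(request, num_retries=num_retries)

    return cached_litellm_text_completion_inner(request, num_retries)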

def cached_litellm_text_completion_inner(request: Dict[str, Any], num_retries: int):
return litellm_text_completion(
request,
cache={"no-cache": False, "no-store": False},
Collaborator:

This passes cache to the function below. But for some reason we removed the cache kwarg below? Why?

Collaborator (Author):

The default (both kwargs True) should be added back to the litellm_text_completion function (I forgot to fix this).

key = self.cache_key(request)
except Exception:
return None
with self.lock:
Collaborator:

Hmm, do we need to lock just to read?

Collaborator:

If we can remove this lock, we may want to do something like if value := self.memory_cache.get(key) to avoid double access.

Collaborator (Author):

I would think reads are thread-safe, but the docs aren't clear.
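A sketch of the suggested lock-free read path, assuming the surrounding get() method this hunk comes from (self.cache_key, self.memory_cache, and self.fanout_cache are names from the diff; the fallthrough to the disk cache is assumed):

def get(self, request):
    try:
        key = self.cache_key(request)
    except Exception:
        return None
    # The walrus operator touches the memory cache only once per lookup,
    # instead of a separate membership check followed by a second access.
    if (value := self.memory_cache.get(key)) is not None:
        return value
    return self.fanout_cache.get(key)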

@okhat (Collaborator) commented Dec 12, 2024:

Recording some thoughts here.

  1. We need to make sure that caching behavior makes sense if the function being cached raises an Exception (a rough sketch of one option follows below).
  2. We need to make sure that the caching behavior makes sense for LiteLLM. Specifically, LiteLLM's default caching is a bit nuanced: as far as I can tell, it does not cache api_* keys (which is important) and it does not cache metadata (like cost).
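On point 1, one hedged sketch of exception-safe behavior (cache.get/cache.put are placeholder methods here, not the final API): store a result only when the wrapped call succeeds, so exceptions are never cached and a later call retries the underlying function.

import functools

def dspy_cache_decorator(cache):  # name reused from the diff above; behavior is assumed
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(request, *args, **kwargs):
            cached = cache.get(request)
            if cached is not None:
                return cached
            result = fn(request, *args, **kwargs)  # exceptions propagate; nothing is stored
            cache.put(request, result)
            return result
        return wrapper
    return decorator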

@okhat changed the base branch from main to v2.6 on December 13, 2024 at 15:38
"""

self.memory_cache = LRUCache(maxsize=mem_size_limit)
self.fanout_cache = FanoutCache(shards=16, timeout=2, directory=directory, size_limit=disk_size_limit)
Collaborator:

@hmoazam As I understand it, part of the motivation for this PR is to enable users to dump the current state of the cache for a particular DSPy session.

Is the FanoutCache introduced for that purpose? If so, since FanoutCache shares the same directory as the litellm.cache, what extra utility is it providing?

Collaborator:

Separately, I think we should define the behavior for dumping / loading a cache.

For example, consider the case where I've already run "Program 1" as a Python script / notebook. Now, I'm running "Program 2" in a separate notebook / Python script invocation. I'm using the default DSPy cache directory for both program executions.

If I dump the cache after I run Program 2, do I also want to include cache contents from Program 1? Or do I only want to include cache contents from Program 2, since that's my "current session"? Should users have a choice?

Once we align on the desired semantics, we can figure out how to make adjustments to the implementation.

cc @okhat

Collaborator:

@dbczumar For disk caching, we will migrate from LiteLLM's caching to FanoutCache. The former has proved limiting when users need many read/write processes and threads in parallel.

For a short period of time, we might keep the LiteLLM cache as well. This means that highly active users that upgrade versions regularly will have an implicit seamless migration of their caches, because LiteLLM's requests will remain cached and will implicitly transfer to the FanoutCache. (This behavior is not required, and in principle there are other ways to more intentionally allow migration, but explicit migration is almost never worthwhile for caches.)

Cache dumping is fairly simple. It saves the current in-memory Python session, unless the user triggers a reset for the in-memory cache. (The current in-memory session draws on saved caches on disk, as usual.)

@dbczumar (Collaborator) commented Jan 8, 2025:

@okhat @hmoazam Apologies for the delay.

I took a look at this further and found that diskcache does not natively handle objects that can't be pickled (e.g. lambdas like LiteLLM's azure_ad_token_provider arg), but LiteLLM does. There's a nontrivial amount of logic that goes into this. It may be easier to add support for FanoutCache to LiteLLM, rather than implement our own diskcache wrapper. Are there other reasons that we need to move away from LiteLLM?
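A quick illustration of that pickling limitation (the request contents here are made up; only azure_ad_token_provider comes from the discussion above):

import pickle

request = {"model": "azure/gpt-4", "azure_ad_token_provider": lambda: "token"}
try:
    pickle.dumps(request)  # fails: functions defined inline can't be pickled
except Exception as exc:
    print(f"cannot pickle request: {exc}")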

Separately, it's not clear to me that dumping the in-memory Python session is what we want. If I understand correctly, the user's objective is to run a workload and store all of the cached values for that workload on disk. That way, another user can set DSPY_CACHEDIR to that disk location and automatically use the cached values. Dumping memory isn't quite the same thing because users might have memory limits set on their cache (we probably want to set a memory limit by default anyway), thereby unintentionally discarding cached values from larger workloads. Further, dumping memory to a separate location than the disk cache (as is done in this PR) complicates the experience of loading the cache (I can't just set DSPY_CACHEDIR to the desired location - I have to worry about loading this pickled memory state as well). To solve both of these problems, I'd propose an API like the following:

# Start recording (dual writing) cache results to the specified destination, which will be
# created if it does not exist
dspy.settings.cache.start_recording(destination="/path/to/cache/directory/that/I/want/to/share")
# Run workload
dspy.settings.cache.stop_recording() # Optional

When recording is enabled, results will be dual written to the main disk cache location (e.g. the one specified by DSPY_CACHEDIR) and the user-specified disk cache location. This way, recipients of the cache who want to reuse its values don't have to reason about loading stuff from disk vs memory - the snapshotted (recorded) cache looks just like any other DSPy disk cache (because it is :D).
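A hypothetical sketch of that dual-write behavior (the class and method names follow the proposal above; everything else is assumed): while recording is active, every write goes to both the primary disk cache and the recording destination, so the recorded directory is itself an ordinary DSPy disk cache.

from diskcache import FanoutCache

class Cache:
    def __init__(self, directory):
        self.fanout_cache = FanoutCache(directory=directory)
        self.recording_cache = None

    def start_recording(self, destination):
        # diskcache creates the destination directory if it does not exist.
        self.recording_cache = FanoutCache(directory=destination)

    def stop_recording(self):
        self.recording_cache = None

    def put(self, key, value):
        self.fanout_cache.set(key, value)
        if self.recording_cache is not None:
            self.recording_cache.set(key, value)  # dual write while recording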

@hmoazam (Author) commented Jan 8, 2025:

Thanks @dbczumar! I've been looking into the failing tests and was still figuring out the part with pickling. I'll let Omar comment on any other reasons for moving away from litellm's cache, but the main motivations were for us to have more options e.g. use fanoutcache, and offer additional methods like saving and loading.

We originally envisioned the user journey for saving/loading as the following:

  1. User executes a program; assuming caching is turned on, calls are cached both in memory and on disk.
  2. User decides they want to port some subset of the cache, so they run a new session of the notebook, executing only the calls they wish to export. Those calls end up in the in-memory cache and can then be dumped out with the save method. This means they do not have to reset the disk cache or create another one.
  3. User calls the load method with the path to the saved cache, which loads it into memory (a rough sketch of this save/load flow follows at the end of this comment).

I do think that we should keep the two levels of caching, and I do see we have a gap with point 3: we should also populate the disk cache with those values explicitly. Your suggestion does help us achieve a similar experience for users, but my main concern is whether we would run into issues simply because of the pickling limitations. I haven't gone through all the relevant litellm code yet, but with your proposed approach we still face that limitation from diskcache itself.
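For reference, the save/load flow sketched in the numbered steps above might look roughly like this (method names and the pickle format are assumptions, not the PR's final API): save dumps the in-memory cache to a file, and load reads it back into memory.

import pickle

def save(self, path):
    with open(path, "wb") as f:
        pickle.dump(dict(self.memory_cache), f)

def load(self, path):
    with open(path, "rb") as f:
        for key, value in pickle.load(f).items():
            self.memory_cache[key] = value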

@dbczumar (Collaborator) commented Jan 8, 2025:

@hmoazam We can also add Fanout to LiteLLM directly. My motivation here is that LiteLLM handles pickling on top of diskcache directly. The problem with building our own wrapper over diskcache is that diskcache does not handle pickling directly - i.e. we won't be able to save many valid LiteLLM request objects (e.g. requests with lambdas) to disk unless we reuse LiteLLM's pickle compatibility logic (which is private / internal) or roll our own. This is a problem even without memory pickling (two separate issues).

Regarding save / load, I think this is what the memory / dumping logic in your PR (and my proposed start_recording() API) is attempting to handle. We can still wrap LiteLLM to achieve this; I don't think save / load is sufficient motivation for rewriting the wrapper around diskcache to achieve this (why not wrap litellm.caching.Cache, which saves us all the pickle + disk headaches?)

> User decides they want to port some subset of the cache, so they run a new session of the notebook, executing only the calls they wish to export, which end up in the in-memory cache, and can then be dumped out with the save method. This means they do not have to reset the disk cache or create another one.

  • By definition porting means creating another cache (let me know if I'm missing something)

  • From a solution standpoint, "dump memory" is an indirect way of achieving portability. It relies on the assumption that all of the data in the new session is cached in memory and that, therefore, dumping memory will create the desired "port" (subset copy) of the cache. However, this may not be the case if the user has set memory limits on the cache (in which case some of the data from the session is only on disk and never gets ported), and it also relies on the user clearing memory before running the session to ensure a "clean port" (which can be inefficient if most of the data is already in memory, and only a small subset needs to be excluded from the port). "start_recording()" is more explicit, doesn't require clearing a memory cache, and doesn't risk data loss from the cache port due to configured memory limits.

