-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Create an acceptance test suite for checkpointers #541
Comments
@benjamincburns thanks for the feedback! this is a great point and we actually do plan to do this for both Python and JS, we'll follow up here once we have more details |
@vbarda are you guys interested in external contributions for this for the JS side? I'd imagine it'd be a riskier contribution for a third party to make given that anyone external to the team would essentially be reverse-engineering the interface specification, not to mention making some assumptions around how you guys would want to structure the test suite, what tooling to use, etc. That said, I'm finding that I need something like this for my own project, so I'm likely to be building something of this ilk soon if you guys don't do it for me. If the thing that I come up with looks like it might be more broadly useful, I'd be happy to submit it as a PR. I'd just expect that I'd need some help from the LangGraph team to massage it into a shape that fits nicely in this repository. Any preferences you can share here on the sorts of assumptions I mention above would also be quite helpful. |
@vbarda I'd be keen but I'm quite green to this side of things and feel very lost in my understanding of the anatomy of these Checkpointer Savers (or whatever they are called <<< the naming of these parts are in and of itself a hard ask to form a mental model around. It really could be explained better or named better too). Currently, @benjamincburns I'm also trying to push through to make a custom Checkpoint Saver for use with SurrealDB, regardless of the missing info (above), but not doing great at all... I'm referencing the Sqlite Saver at present. |
@hgoona if you DM me on the LangChain slack I'll drop my notes doc your way. It's a bit terse (mostly just mapping out types and asking/answering my own questions as bullet points) but it may help. You can find me as "Ben Burns" there with the same pfp. I can't guarantee that I'll reply outside of my usual working hours, though (NZ timezone). I'm also planning on taking the day off tomorrow (although I may still be around as I might hack on a personal project with LangGraph). Edit: I went ahead and uploaded them as a gist. If you want to discuss them, please reach out to me on Slack or leave comments on that gist. |
@benjamincburns Noted! Though, I don't have access to that Slack. Are you on the LangChain Discord server to talk further? |
@benjamincburns In case you're interested, I've posted a draft Saver for SurrealDB here where I've just tried to mimic the Sqlite version: #545 (comment) I've annotated the test a bit better than the original from SqliteSaver so it may be useful if applied to your case. Currently my |
@vbarda I've started an acceptance suite that I intend to contribute. So far it's just a pile of unit tests, but I intend to fill out some integration tests in the It's not ready to PR just yet, but if you or someone else from the team wouldn't mind giving it a quick scan through my branch, I'd be much obliged for any feedback you can offer. I intend to do a clean-up pass or two for readability, informative/actionable error messages on failure, etc. Just wanting to make sure that for now this is something that's roughly aligned with what you guys had in mind.
|
I've repurposed the existing But I do still think that a lighter, more granular suite would be very helpful so will keep this open! Thanks for starting this @benjamincburns! Using this as a baseline might be helpful as well: https://github.com/langchain-ai/langchainjs/tree/main/libs/langchain-standard-tests |
Now that I've submitted the PR for this, I thought it might be useful if I captured some feedback on some things that I observed along the way. I can create tickets for any/all of these things if it's useful, but I thought it'd be best to capture them here first, as I think the context helps underscore the importance. Also I recognize that this is a very new, very beta project in an early stage startup that has gained substantial attention and popularity in its earliest days. I have some experience building in the open myself, and it definitely presents some unique challenges (and joys). If you're lucky enough to have demand for what you're doing it's easy to get the impression that everyone wants you to be at least two years (or more) more mature than you actually are. You guys are doing a seriously impressive job at navigating that, especially when it comes to iterating quickly and engaging with your community. So in case it isn't clear - I'm writing this to support the work that you're doing, not to detract from it. If you've worked on SaaS apps before, you'll know that state persistence can be a subtly risky and complex area of concern, and that poor planning and/or poor engineering discipline in this area can cause massively expensive problems to crop up quietly over time. As a result, devs who play in this space would be ill advised to adopt libraries and frameworks that remove control by making this problem less predictable or harder to reason about. While working through #600 I encountered a number stumbling blocks that are causing LangGraph to miss the mark in that regard:
With all of that said, I'm comfortable enough to continue using LangGraph for my application now that I've gained a deeper (albeit still incomplete) understanding of the checkpointing internals. However I currently have the luxury of working in an environment with very low time pressure. Were that not the case, the effort required for me to reach the necessary level of knowledge would've been far too high to move forward with LangGraph. It's totally understandable for this to be the case at this stage of LangGraph's development, but I think these issues are likely to be a significant off-ramp for would-be adopters as time marches on. |
+1 on this detailed analysis. Due to my uncertainty on the mechanics and purposes within the Checkpoint implementations, at least for the moment, I've decided to depart from using the Checkpoint feature. |
@hgoona we are aware of the issues pointed out above, and plan on working through them. We have just merged @benjamincburns's PR which adds the awesome test suite he wrote for checkpointers, and the plan after that is to work on fixing the issues found during that work. For context, our thought process on designing the checkpointer interface has been to make it a good basis to use in production use cases (eg. all changes to the interface so far have been done in a backwards compatible way, checkpoints from the very first version are still readable with the latest version of the libraries (albeit with less features than current ones of course), access to the database is serialized where needed, and combined into a single operation where possible, etc). Some of the actual implementations of checkpointers in this repo have fallen a bit short of the mark on correctness, and we are working towards fixing that now. |
As @hgoona mentioned in this comment, the requirements for custom checkpointers aren't abundantly obvious from the docs, or even the LangChain-provided checkpointer implementations.
It would be helpful if a black-box acceptance test suite existed for checkpointers, ideally one that could be used both by the LangGraph team, and authors of third party checkpointer implementations.
The suite would need to exercise all of the requirements of the checkpointer contract, such that if an arbitrary checkpointer passes this suite, the author and community would have high confidence that the checkpointer in question fulfills all of LangGraph's requirements for storing and retrieving persisted state.
Ideally this would also exist for stores. I'm happy to open another issue for those if the maintainers would prefer to track these separately.
The text was updated successfully, but these errors were encountered: