Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Create an acceptance test suite for checkpointers #541

Closed
benjamincburns opened this issue Oct 1, 2024 · 11 comments · Fixed by #600
Closed

Feature request: Create an acceptance test suite for checkpointers #541

benjamincburns opened this issue Oct 1, 2024 · 11 comments · Fixed by #600

Comments

@benjamincburns
Copy link
Contributor

As @hgoona mentioned in this comment, the requirements for custom checkpointers aren't abundantly obvious from the docs, or even the LangChain-provided checkpointer implementations.

It would be helpful if a black-box acceptance test suite existed for checkpointers, ideally one that could be used both by the LangGraph team, and authors of third party checkpointer implementations.

The suite would need to exercise all of the requirements of the checkpointer contract, such that if an arbitrary checkpointer passes this suite, the author and community would have high confidence that the checkpointer in question fulfills all of LangGraph's requirements for storing and retrieving persisted state.

Ideally this would also exist for stores. I'm happy to open another issue for those if the maintainers would prefer to track these separately.

@vbarda
Copy link
Collaborator

vbarda commented Oct 1, 2024

@benjamincburns thanks for the feedback! this is a great point and we actually do plan to do this for both Python and JS, we'll follow up here once we have more details

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 2, 2024

@vbarda are you guys interested in external contributions for this for the JS side? I'd imagine it'd be a riskier contribution for a third party to make given that anyone external to the team would essentially be reverse-engineering the interface specification, not to mention making some assumptions around how you guys would want to structure the test suite, what tooling to use, etc.

That said, I'm finding that I need something like this for my own project, so I'm likely to be building something of this ilk soon if you guys don't do it for me. If the thing that I come up with looks like it might be more broadly useful, I'd be happy to submit it as a PR. I'd just expect that I'd need some help from the LangGraph team to massage it into a shape that fits nicely in this repository.

Any preferences you can share here on the sorts of assumptions I mention above would also be quite helpful.

@hgoona
Copy link

hgoona commented Oct 2, 2024

@vbarda I'd be keen but I'm quite green to this side of things and feel very lost in my understanding of the anatomy of these Checkpointer Savers (or whatever they are called <<< the naming of these parts are in and of itself a hard ask to form a mental model around. It really could be explained better or named better too).

Currently, @benjamincburns I'm also trying to push through to make a custom Checkpoint Saver for use with SurrealDB, regardless of the missing info (above), but not doing great at all... I'm referencing the Sqlite Saver at present.
I'd be happy to share notes if that is helpful??

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 3, 2024

@hgoona if you DM me on the LangChain slack I'll drop my notes doc your way. It's a bit terse (mostly just mapping out types and asking/answering my own questions as bullet points) but it may help. You can find me as "Ben Burns" there with the same pfp.

I can't guarantee that I'll reply outside of my usual working hours, though (NZ timezone). I'm also planning on taking the day off tomorrow (although I may still be around as I might hack on a personal project with LangGraph).

Edit: I went ahead and uploaded them as a gist. If you want to discuss them, please reach out to me on Slack or leave comments on that gist.

@hgoona
Copy link

hgoona commented Oct 3, 2024

@benjamincburns Noted! Though, I don't have access to that Slack. Are you on the LangChain Discord server to talk further?

@hgoona
Copy link

hgoona commented Oct 4, 2024

@benjamincburns In case you're interested, I've posted a draft Saver for SurrealDB here where I've just tried to mimic the Sqlite version: #545 (comment)

I've annotated the test a bit better than the original from SqliteSaver so it may be useful if applied to your case. Currently my SdbSaver is failing two tests. Would love any help I can get on it!

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 7, 2024

@vbarda I've started an acceptance suite that I intend to contribute. So far it's just a pile of unit tests, but I intend to fill out some integration tests in the scenarios, as well.

It's not ready to PR just yet, but if you or someone else from the team wouldn't mind giving it a quick scan through my branch, I'd be much obliged for any feedback you can offer. I intend to do a clean-up pass or two for readability, informative/actionable error messages on failure, etc. Just wanting to make sure that for now this is something that's roughly aligned with what you guys had in mind.

@jacoblee93
Copy link
Collaborator

I've repurposed the existing pregel.test.ts suite in the PostgresSaver PR to add support for setting up a checkpointer before each test:

https://github.com/langchain-ai/langgraphjs/blob/main/libs/langgraph/src/tests/pregel.postgres_saver.int.test.ts

But I do still think that a lighter, more granular suite would be very helpful so will keep this open! Thanks for starting this @benjamincburns!

Using this as a baseline might be helpful as well: https://github.com/langchain-ai/langchainjs/tree/main/libs/langchain-standard-tests

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 16, 2024

Now that I've submitted the PR for this, I thought it might be useful if I captured some feedback on some things that I observed along the way. I can create tickets for any/all of these things if it's useful, but I thought it'd be best to capture them here first, as I think the context helps underscore the importance.

Also I recognize that this is a very new, very beta project in an early stage startup that has gained substantial attention and popularity in its earliest days. I have some experience building in the open myself, and it definitely presents some unique challenges (and joys). If you're lucky enough to have demand for what you're doing it's easy to get the impression that everyone wants you to be at least two years (or more) more mature than you actually are. You guys are doing a seriously impressive job at navigating that, especially when it comes to iterating quickly and engaging with your community.

So in case it isn't clear - I'm writing this to support the work that you're doing, not to detract from it.


If you've worked on SaaS apps before, you'll know that state persistence can be a subtly risky and complex area of concern, and that poor planning and/or poor engineering discipline in this area can cause massively expensive problems to crop up quietly over time. As a result, devs who play in this space would be ill advised to adopt libraries and frameworks that remove control by making this problem less predictable or harder to reason about.

While working through #600 I encountered a number stumbling blocks that are causing LangGraph to miss the mark in that regard:

  • Key public interfaces are either entirely undocumented or only minimally documented, causing the reader to rely entirely on the interface semantics when inferring the intended contract of the interface
    • for example, see tsdoc comments on BaseCheckpointSaver, Checkpoint, CheckpointTuple -- CheckpointMetadata is somewhat better than the others, but it's unclear what my responsibilities are with respect to this type as a checkpoint saver author.
    • there's also no public documentation that I can find that guides developers on how to write their own custom checkpoint saver implementation, nor is there any explanation of the differences between similarly named fields on related types (e.g. CheckpointMetadata.writes vs CheckpointTuple.pendingWrites vs pending_sends), or how they should (or shouldn't) impact the behavior of checkpoint savers.
    • the key mechanism that enables checkpoint savers to store state deltas (the newVersions argument on the BaseCheckpointSaver.put method) isn't described or mentioned in the README or any other documentation that I could find
  • Existing tests had minimal coverage of the interface requirements
    • I could be mistaken, but I didn't see any tests that
      • passed newVersions to put (prior to checkpoint-postgres being merged)
      • checked for correct handling of error conditions
      • validated the functionality of the BaseCheckpointSaver.list method
  • Some checkpoint saver interface elements appear to have been built out speculatively and/or haven't yet been fully fleshed out
    • varying degrees of functionality in the list method across existing implementations
    • some types are defined as generics to allow you to change up the version ID type (restricted to number | string) but existing implementations all use number, and it's not clear how (or why) one would go about using string versions.
    • storage of state deltas (again, a key feature of the checkpoint saver interface) is only demonstrated by the PostgresSaver implementation, which only merged last week.
  • The RunnableConfig type is being abused to pass important internal structures that would make more sense to be passed as objects with well-defined types.
    • doing this makes it so that these critical structures aren't discoverable and aren't included in auto-generated reference docs
    • this is causing other areas of your code to silently become typed as any, breaking IDE indexing and search features like "Find All References", which further reduces discoverability of key logic.
      • For example, even though Pregel.checkpointer is typed as BaseCheckpointSaver | false | undefined, methods in Pregel that use it (e.g. here in updateState) alias it to a local checkpointer variable, which is defined as const checkpointer = config.configurable?.[CONFIG_KEY_CHECKPOINTER] ?? this.checkpointer;.
        • config.configurable is of type any, which makes config.configurable?.[CONFIG_KEY_CHECKPOINTER] also be type any, which makes the type of the local checkpointer alias be BaseCheckpointSaver | false | undefined | any, which the TS type checker narrows to any.
        • At first blush you'd think the TypeScript noImplicitAny check would catch this, as you have strict: true set in your compilerOptions in tsconfig.json, but config.configurable is explicitly typed as any, so it propagates without error.
      • A good first step toward fixing this issue is to swap all explicit usages of any for unknown, and only access these values after passing them through some form of type guard, ideally a type predicate. ESLint should have some rules to help with this.

With all of that said, I'm comfortable enough to continue using LangGraph for my application now that I've gained a deeper (albeit still incomplete) understanding of the checkpointing internals. However I currently have the luxury of working in an environment with very low time pressure. Were that not the case, the effort required for me to reach the necessary level of knowledge would've been far too high to move forward with LangGraph. It's totally understandable for this to be the case at this stage of LangGraph's development, but I think these issues are likely to be a significant off-ramp for would-be adopters as time marches on.

@hgoona
Copy link

hgoona commented Oct 16, 2024

+1 on this detailed analysis. Due to my uncertainty on the mechanics and purposes within the Checkpoint implementations, at least for the moment, I've decided to depart from using the Checkpoint feature.

@nfcampos
Copy link
Contributor

@hgoona we are aware of the issues pointed out above, and plan on working through them. We have just merged @benjamincburns's PR which adds the awesome test suite he wrote for checkpointers, and the plan after that is to work on fixing the issues found during that work. For context, our thought process on designing the checkpointer interface has been to make it a good basis to use in production use cases (eg. all changes to the interface so far have been done in a backwards compatible way, checkpoints from the very first version are still readable with the latest version of the libraries (albeit with less features than current ones of course), access to the database is serialized where needed, and combined into a single operation where possible, etc). Some of the actual implementations of checkpointers in this repo have fallen a bit short of the mark on correctness, and we are working towards fixing that now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants