Fix Issue #1885: Extract JSON from Code Blocks and Handle Malformed JSON with json_repair #1978

Open
clarkandrew wants to merge 1 commit into base: main

Description

This PR improves the robustness of JSON handling in memory/main.py by introducing two key enhancements:

  1. Fixes #1885 with Regular Expression Parsing: Extracts JSON from LLM responses wrapped in markdown code blocks (e.g., ```json ... ```). This addresses the issue where models often return JSON in such formats, ensuring accurate parsing and processing.

  2. Integrates json_repair: Replaces json.loads with json_repair.loads to fix minor JSON formatting errors (e.g., missing brackets or braces, missing commas, or stray words around the JSON). This avoids rerunning an entire LLM request because of a small JSON error.

The json-repair dependency has been added to pyproject.toml to support these improvements. These changes enhance the system’s stability and data integrity when handling dynamic JSON data.
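
For reviewers, the code-block extraction in item 1 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper name `extract_json_text`, the pattern, and the sample `facts` payload are all assumptions made for the example.

```python
import json
import re

# Build the fence marker programmatically so this snippet can itself live
# inside a fenced code block.
FENCE = "`" * 3

# Hypothetical pattern: an opening fence, an optional "json" tag, then the
# body captured lazily up to the closing fence.
_FENCE_RE = re.compile(
    FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_json_text(response: str) -> str:
    """Return the body of the first fenced block, or the stripped response."""
    match = _FENCE_RE.search(response)
    return match.group(1) if match else response.strip()


# Example LLM reply that wraps its JSON in a markdown code block.
reply = "Here you go:\n" + FENCE + 'json\n{"facts": ["likes tea"]}\n' + FENCE
data = json.loads(extract_json_text(reply))
```

Responses without a fence pass through unchanged apart from whitespace stripping, so plain JSON replies keep working.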

Why json_repair?

Some LLMs, even with structured output, occasionally produce JSON that isn't fully valid. Common mistakes include missing quotes, misplaced commas, or malformed arrays and objects. Although these errors are typically minor, they can break JSON parsing and force unnecessary retries of entire requests.

I initially searched for a lightweight Python package that could fix such issues reliably but couldn't find one. So I developed json_repair, which addresses:

  • Syntax errors: Fixes missing quotes, misplaced commas, unescaped characters, and other typical JSON mistakes.
  • Malformed arrays/objects: Repairs incomplete arrays or objects by adding necessary elements to ensure structural integrity.

Incorporating this into mem0 improves the system's ability to handle edge cases where malformed JSON would otherwise lead to failed requests and retries.
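
A hedged sketch of how the repair step could be used. Note that this is a variant and not what the PR does: the PR swaps `json.loads` for `json_repair.loads` directly, whereas the sketch below parses strictly first and calls the repair library only on failure. The helper name `loads_lenient` is hypothetical, and the optional import is an assumption so the snippet still runs where `json-repair` is not installed.

```python
import json

try:
    import json_repair  # third-party package: pip install json-repair
except ImportError:  # assumption: treat the dependency as optional in this sketch
    json_repair = None


def loads_lenient(text: str):
    """Parse strictly first; fall back to json_repair only if that fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if json_repair is None:
            # No repair library available: surface the original error.
            raise
        # json_repair.loads mirrors json.loads but tolerates malformed input.
        return json_repair.loads(text)


# Valid JSON never touches the repair path.
parsed = loads_lenient('{"facts": ["likes tea"]}')
```

Trying strict parsing first keeps well-formed responses on the standard-library path, so any repair heuristics apply only to inputs that would otherwise fail.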

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Unit Test

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

…id JSON and extract content from code blocks

In `memory/main.py`, the update introduces `json_repair` for robust JSON handling, ensuring the application can process and correct invalid JSON formats. This is particularly useful in environments where JSON data might not be well-formed. The code now also extracts JSON strings embedded within code blocks using regular expressions, enhancing the ability to process diverse response formats.

Additionally, `json-repair` is added to the dependencies in `pyproject.toml`, ensuring the necessary library is available for handling these JSON parsing improvements. This facilitates more resilient data processing capabilities in the application.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Drew seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@clarkandrew clarkandrew changed the title Impprove JSON handling with regex extraction of code blocks and json_repair Fix Issue #1885: Extract JSON from Code Blocks and Handle Malformed JSON with json_repair Oct 22, 2024

Successfully merging this pull request may close these issues.

Json output for different models varies