[CYPHER] UNWIND + data + type check doesn't execute whole batch. #1948

Open
camomiy opened this issue Feb 7, 2025 · 1 comment
camomiy commented Feb 7, 2025

Hello

ArcadeDB Version:

ArcadeDB Server v24.11.2 (build 055592c73d27d894c26f3faaf7df22e15c28f03d/1733838531445/main)

OS and JDK Version:

Running on Linux 5.15.0-105-generic - OpenJDK 64-Bit Server VM 17.0.13

Expected behavior

We want to execute the following query:

            UNWIND $batch as row
            MATCH (a:CHUNK) WHERE ID(a) = row.source_id
            MATCH (b) WHERE ID(b) = row.target_id
            MERGE (a)-[r:TEST_TEST_TEST_TEST]->(b)
            RETURN a, b, r

The $batch parameter is:

[
    {
        "source_id": "#4:0",
        "target_id": "#217:0",
        "features": {}
    },
    {
        "source_id": "#4:0",
        "target_id": "#52:0",
        "features": {}
    }
]
  • Please note that source_id is the same in both entries.
  • Please note that the only check is that the a node (source_id) is a CHUNK node.
    In other words, since both entries share the same source_id, if the check passes for one entry it should pass for both.
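For anyone trying to reproduce this outside our application, here is a minimal Python sketch of how the query and the $batch parameter could be packaged into a request body. The "language" / "command" / "params" layout is an assumption about the ArcadeDB HTTP command endpoint and should be checked against the server documentation; build_command_payload is a hypothetical helper name, not something from our code base. Nothing is sent over the network here.

```python
import json

# Query text copied from the report above.
QUERY = """
UNWIND $batch as row
MATCH (a:CHUNK) WHERE ID(a) = row.source_id
MATCH (b) WHERE ID(b) = row.target_id
MERGE (a)-[r:TEST_TEST_TEST_TEST]->(b)
RETURN a, b, r
"""

# Parameter data copied from the report above.
BATCH = [
    {"source_id": "#4:0", "target_id": "#217:0", "features": {}},
    {"source_id": "#4:0", "target_id": "#52:0", "features": {}},
]


def build_command_payload(query, batch):
    """Build a JSON-serializable body for a parameterized Cypher command.

    Assumption: the server accepts "language", "command" and "params" keys.
    """
    return {
        "language": "cypher",
        "command": query.strip(),
        "params": {"batch": batch},
    }


payload = build_command_payload(QUERY, BATCH)

# Both entries share the same source node, so the CHUNK check should
# succeed or fail identically for each of them.
same_source = len({item["source_id"] for item in BATCH}) == 1

print(json.dumps(payload, indent=2))
print("same source_id in every entry:", same_source)
```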

When executing the query, the ArcadeDB REST API returns an array of length 1:

[
    {
        "a": {
            "@rid": "#4:0",
            "@type": "CHUNK",
            "subtype": "CHUNK",
            "name": "document OCRized image, chunk 1",
            "text": "TRUNCATED TEXT",
            "index": 0
        },
        "b": {
            "@rid": "#217:0",
            "@type": "PIPELINE_CONFIG",
            "pipelines": [
                "data_ingestion_pipeline",
                "ocr"
            ]
        },
        "r": {
            "@in": "#217:0",
            "@out": "#4:0",
            "@rid": "#297:0",
            "@type": "TEST_TEST_TEST_TEST"
        }
    }
]

This is quite unexpected.
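To make the gap explicit, the following Python sketch compares the batch with the rows that came back and lists the (source_id, target_id) pairs that were dropped. missing_pairs is a hypothetical helper written for this report, and the returned row is trimmed to the @rid fields it needs.

```python
def missing_pairs(batch, rows):
    """Return the (source_id, target_id) pairs of the batch that have no
    matching row in the query result."""
    returned = {(row["a"]["@rid"], row["b"]["@rid"]) for row in rows}
    return [
        (item["source_id"], item["target_id"])
        for item in batch
        if (item["source_id"], item["target_id"]) not in returned
    ]


# Parameter data from the report.
batch = [
    {"source_id": "#4:0", "target_id": "#217:0", "features": {}},
    {"source_id": "#4:0", "target_id": "#52:0", "features": {}},
]

# Trimmed copy of the single row returned by the labeled query.
rows = [
    {
        "a": {"@rid": "#4:0"},
        "b": {"@rid": "#217:0"},
        "r": {"@in": "#217:0", "@out": "#4:0"},
    }
]

print(missing_pairs(batch, rows))  # [('#4:0', '#52:0')]
```

With the data from this report, the second entry (target #52:0) is the one that is never processed.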

After removing the CHUNK label check from the query, i.e. changing MATCH (a:CHUNK) WHERE ID(a) = row.source_id to MATCH (a) WHERE ID(a) = row.source_id, there is no issue: the returned array has length 2 and both relations are created correctly.

New query:

            UNWIND $batch as row
            MATCH (a) WHERE ID(a) = row.source_id
            MATCH (b) WHERE ID(b) = row.target_id
            MERGE (a)-[r:TEST_TEST_TEST_TEST_2222]->(b)
            RETURN a, b, r

New result, with the expected size:

[
    {
        "a": {
            "@rid": "#4:0",
            "@type": "CHUNK",
            "subtype": "CHUNK",
            "name": "document OCRized image, chunk 1",
            "text": "ANOTHER TRUNCATED",
            "index": 0
        },
        "b": {
            "@rid": "#217:0",
            "@type": "PIPELINE_CONFIG",
            "pipelines": [
                "data_ingestion_pipeline",
                "ocr"
            ]
        },
        "r": {
            "@in": "#217:0",
            "@out": "#4:0",
            "@rid": "#305:0",
            "@type": "TEST_TEST_TEST_TEST_2222"
        }
    },
    {
        "a": {
            "@rid": "#4:0",
            "@type": "CHUNK",
            "subtype": "CHUNK",
            "name": "document OCRized image, chunk 1",
            "text": "TRUNCATED TEXT",
            "index": 0
        },
        "b": {
            "@rid": "#52:0",
            "@type": "IMAGE",
            "name": "manchots-17.webp",
            "file_path": "TRUNCATED PATH",
            "id_doc": "#28:0",
            "mime_type": "image/webp",
            "last_modified": "ven. f\\u00e9vr. 07 08:13:07 +00:00 2025",
            "llava_flag": true,
            "clip_flag": true,
            "ocr_flag": false,
            "text": "TRUNCATED TEXT"
        },
        "r": {
            "@in": "#52:0",
            "@out": "#4:0",
            "@rid": "#306:0",
            "@type": "TEST_TEST_TEST_TEST_2222"
        }
    }
]
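Since the unlabeled query no longer enforces the type, one way to confirm after the fact that every source node really is a CHUNK is to inspect the @type field of each returned row. The Python sketch below does only that; rows_failing_type_check is a hypothetical helper, the rows are trimmed to the fields it uses, and it only detects a mismatch (the MERGE has already run by the time the rows come back).

```python
def rows_failing_type_check(rows, expected_type="CHUNK"):
    """Return the rows whose source node "a" is not of the expected type."""
    return [row for row in rows if row["a"].get("@type") != expected_type]


# Trimmed copy of the two rows returned by the unlabeled query.
rows = [
    {
        "a": {"@rid": "#4:0", "@type": "CHUNK"},
        "b": {"@rid": "#217:0", "@type": "PIPELINE_CONFIG"},
    },
    {
        "a": {"@rid": "#4:0", "@type": "CHUNK"},
        "b": {"@rid": "#52:0", "@type": "IMAGE"},
    },
]

failing = rows_failing_type_check(rows)
print(len(rows), "rows returned,", len(failing), "failing the CHUNK check")
```

Both rows pass, which is consistent with the expectation that the labeled query should also have returned two rows.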

Here is a database backup taken right before executing the queries. You will notice debug relations on the chunk RIDs posted above; you can ignore them.

POLAIRE_OCR2-backup-20250207-083229355.zip

@ExtReMLapin
Contributor

Suspiciously similar to #1929
