BTE problem parsing Automat-robokop, leads to TRAPI validation crashes (ARS can't use BTE's response) #879

Open · colleenXu opened this issue Oct 1, 2024 · 10 comments
Labels: On Test (Related changes are deployed to Test server)

colleenXu (Collaborator) commented Oct 1, 2024

Related to #865

Noticed in https://arax.ci.transltr.io/?r=c961069f-36da-4369-a141-3aad9234e5ca. There's a red X-mark for validation, but it shows only the bare TypeError message and nothing else. Because of this, the ARS doesn't ingest BTE's response. (TRAPI "orange" errors don't prevent the ARS from using BTE's response; "red" critical errors do.)

So I downloaded BTE's response (bte-ci-pf2-validationProblem.json.zip) and used a notebook to run TRAPI validation locally. I've confirmed that this is the only critical error in BTE's response: after mutating the response to force all qualifier values to be strings, validation reports no critical or unusual errors. You can see this in the notebook.
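The notebook itself isn't reproduced in this issue. As a rough idea of what "forcing all-string qualifier values" can look like, here is a sketch that assumes the downloaded JSON has been loaded into a dict. The helper name is hypothetical, and keeping only the first list element is an arbitrary choice made just so the validator can run to completion; it is a debugging aid, not a fix.

```python
import copy
from typing import Any


def force_string_qualifiers(response: dict[str, Any]) -> dict[str, Any]:
    """Return a deep copy of a TRAPI response in which every list-valued
    qualifier_value is replaced by its first element (hypothetical helper).

    Only meant to check whether any *other* critical validation errors
    remain; it throws information away and is not a real fix.
    """
    patched = copy.deepcopy(response)
    message = patched.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    for edge in (kg.get("edges") or {}).values():
        for qualifier in edge.get("qualifiers") or []:
            value = qualifier.get("qualifier_value")
            if isinstance(value, list):
                # Arbitrary choice: keep the first element as a plain string.
                qualifier["qualifier_value"] = str(value[0]) if value else ""
    return patched
```

The patched copy can then be passed to the validator in place of the original response.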


The problem is that some KG edge qualifier values are arrays, not strings.
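TRAPI's `Qualifier` schema defines `qualifier_value` as a single string. A small sketch (hypothetical helper, assuming the response is loaded as a dict) that lists every KG edge qualifier breaking that rule:

```python
from typing import Any, Iterator


def non_string_qualifiers(response: dict[str, Any]) -> Iterator[tuple[str, str, Any]]:
    """Yield (edge_id, qualifier_type_id, qualifier_value) for every KG edge
    qualifier whose value is not a plain string."""
    message = response.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    for edge_id, edge in (kg.get("edges") or {}).items():
        for qualifier in edge.get("qualifiers") or []:
            value = qualifier.get("qualifier_value")
            if not isinstance(value, str):
                yield edge_id, qualifier.get("qualifier_type_id", ""), value
```

For example, `for edge_id, qtype, value in non_string_qualifiers(response): print(edge_id, qtype, value)` prints one line per offending qualifier.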

This looks like either a BTE response-parsing issue or a MetaKG issue (in parsing Automat-robokop's meta_knowledge_graph response).
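If the MetaKG is involved, one plausible (unconfirmed) reading is that list-valued MetaKG data ended up on the edges. In TRAPI, a `MetaQualifier` on a MetaEdge can list many `applicable_values`, while an edge's `Qualifier` carries a single string `qualifier_value`; the aspect and direction lists on BTE's edges look like such value lists. The sketch below shows the intended relationship between the two shapes: the MetaKG list is for checking one edge value against, not for copying into the edge. `qualifier_allowed` is illustrative, not BTE code.

```python
from typing import Any


def qualifier_allowed(edge_qualifier: dict[str, Any],
                      meta_qualifiers: list[dict[str, Any]]) -> bool:
    """Check one KG edge Qualifier against the MetaQualifiers of a MetaKG edge.

    The edge value must be a single string contained in the matching
    MetaQualifier's applicable_values.
    """
    value = edge_qualifier.get("qualifier_value")
    if not isinstance(value, str):
        return False  # a list here is the malformed shape seen in this issue
    for meta in meta_qualifiers:
        if meta.get("qualifier_type_id") != edge_qualifier.get("qualifier_type_id"):
            continue
        allowed = meta.get("applicable_values") or []
        # Assumption: a missing/empty applicable_values list means "any value".
        return not allowed or value in allowed
    return False  # qualifier type not advertised in the MetaKG
```

With a MetaQualifier such as `{"qualifier_type_id": "biolink:subject_direction_qualifier", "applicable_values": ["decreased", "increased"]}`, an edge value of `"decreased"` passes, while the whole list `["decreased", "increased"]` does not.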

Validation report right before crash

	* Knowledge Graph Edge Qualifiers Qualifier:
		=> Validation of qualifier in qualifiers threw an unexpected exception

			$ infores:text-mining-provider-targeted -> infores:automat-robokop -> infores:biothings-explorer
				# NCBIGene:6615[biolink:Gene]--biolink:affected_by->NCBIGene:2739[biolink:Gene]
				- qualifier_type_id | qualifier_value | reason: 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 3
      1 # validator.get_messages().keys()
----> 3 validator.dump()

File ~/miniconda3/envs/2023_08_01_TRAPIvalidator/lib/python3.11/site-packages/reasoner_validator/report.py:845, in ValidationReporter.dump(self, title, id_rows, msg_rows, compact_format, file)
    843     print(f"\t\t\t\t- {' | '.join(tags)}: ", file=file)
    844     first_message = False
--> 845 print(f"\t\t\t\t\t{' | '.join(parameters.values())}", file=file)
    846 messages_per_row += 1
    847 if msg_rows and messages_per_row >= msg_rows:

TypeError: sequence item 1: expected str instance, list found
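The crash itself comes from line 845 of `report.py`: `str.join` accepts only strings, and the message parameters for this qualifier (the report header above lists `qualifier_type_id | qualifier_value | reason`) contain the list-valued `qualifier_value` as item 1. A minimal reproduction, using a stand-in `parameters` dict rather than the validator's actual one:

```python
# Stand-in for the validator's message parameters; item 1 is a list, as in
# BTE's KG edges. The real dict inside reasoner_validator may differ.
parameters = {
    "qualifier_type_id": "biolink:subject_direction_qualifier",
    "qualifier_value": ["decreased", "increased"],
}

# Same expression as report.py line 845: str.join() rejects non-str items.
error_message = None
try:
    " | ".join(parameters.values())
except TypeError as err:
    error_message = str(err)
print(error_message)  # sequence item 1: expected str instance, list found

# Hypothetical defensive variant: stringify each value so the report prints.
# This would only make the report readable; BTE should still emit strings.
safe_line = " | ".join(str(value) for value in parameters.values())
print(safe_line)
```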

I found 4 KG edges matching this error message; they look almost identical (?):

                "241537d1a84d7a3a271291c63329c562": {
                    "predicate": "biolink:affected_by",
                    "subject": "NCBIGene:6615",
                    "object": "NCBIGene:2739",
                    "qualifiers": [
                        {
                            "qualifier_type_id": "biolink:qualified_predicate",
                            "qualifier_value": [
                                "biolink::caused_by"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_aspect_qualifier",
                            "qualifier_value": [
                                "degradation",
                                "activity_or_abundance",
                                "stability",
                                "expression",
                                "abundance",
                                "activity"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_direction_qualifier",
                            "qualifier_value": [
                                "decreased",
                                "increased"
                            ]
                        }
                    ],
                    "attributes": [
                        {
                            "original_attribute_name": "agent_type",
                            "value": "text_mining_agent",
                            "attribute_type_id": "biolink:agent_type",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "knowledge_level",
                            "value": "not_provided",
                            "attribute_type_id": "biolink:knowledge_level",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "sentences",
                            "value": "For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA",
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_ids",
                            "value": [
                                "tmkp:2f87dfb6ef4f3c0e03839c93f56d5d1eb03e29bcad9658e68be9eb06182b503c",
                                "tmkp:50bac2d91922a1a3d8f223b8deea7751c6443cf47ef3c7cd40cb007356ffa71f",
                                "tmkp:81bc9b8d85f1154cb047a1e1e87e4fe5fd860f4a293bdb3b83a4fc7f3fcf4f0c"
                            ],
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_confidence_score",
                            "value": 0.99425588,
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "publications",
                            "value": [
                                "PMC:7352620",
                                "PMC:7352620",
                                "PMC:7352620"
                            ],
                            "attribute_type_id": "biolink:publications",
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:text-mining-provider-targeted",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:automat-robokop",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:text-mining-provider-targeted"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-explorer",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:automat-robokop"
                            ]
                        }
                    ]
                },
                "92f8c15b2f0fb0588d793ede637f54b4": {
                    "predicate": "biolink:affected_by",
                    "subject": "NCBIGene:6615",
                    "object": "NCBIGene:2739",
                    "qualifiers": [
                        {
                            "qualifier_type_id": "biolink:qualified_predicate",
                            "qualifier_value": [
                                "biolink::caused_by"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_aspect_qualifier",
                            "qualifier_value": [
                                "transport",
                                "degradation",
                                "secretion",
                                "activity_or_abundance",
                                "molecular_interaction",
                                "stability",
                                "expression",
                                "metabolic_processing",
                                "abundance",
                                "synthesis",
                                "activity"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_direction_qualifier",
                            "qualifier_value": [
                                "decreased",
                                "increased"
                            ]
                        }
                    ],
                    "attributes": [
                        {
                            "original_attribute_name": "agent_type",
                            "value": "text_mining_agent",
                            "attribute_type_id": "biolink:agent_type",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "knowledge_level",
                            "value": "not_provided",
                            "attribute_type_id": "biolink:knowledge_level",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "sentences",
                            "value": "For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA",
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_ids",
                            "value": [
                                "tmkp:2f87dfb6ef4f3c0e03839c93f56d5d1eb03e29bcad9658e68be9eb06182b503c",
                                "tmkp:50bac2d91922a1a3d8f223b8deea7751c6443cf47ef3c7cd40cb007356ffa71f",
                                "tmkp:81bc9b8d85f1154cb047a1e1e87e4fe5fd860f4a293bdb3b83a4fc7f3fcf4f0c"
                            ],
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_confidence_score",
                            "value": 0.99425588,
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "publications",
                            "value": [
                                "PMC:7352620",
                                "PMC:7352620",
                                "PMC:7352620"
                            ],
                            "attribute_type_id": "biolink:publications",
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:text-mining-provider-targeted",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:automat-robokop",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:text-mining-provider-targeted"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-explorer",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:automat-robokop"
                            ]
                        }
                    ]
                },
                "3a5cf5eb8b1c26de585da3377f340266": {
                    "predicate": "biolink:affected_by",
                    "subject": "NCBIGene:6615",
                    "object": "NCBIGene:2739",
                    "qualifiers": [
                        {
                            "qualifier_type_id": "biolink:qualified_predicate",
                            "qualifier_value": [
                                "biolink::caused_by"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_aspect_qualifier",
                            "qualifier_value": [
                                "degradation",
                                "secretion",
                                "uptake",
                                "activity_or_abundance",
                                "molecular_interaction",
                                "stability",
                                "expression",
                                "abundance",
                                "activity"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_direction_qualifier",
                            "qualifier_value": [
                                "decreased",
                                "increased"
                            ]
                        }
                    ],
                    "attributes": [
                        {
                            "original_attribute_name": "agent_type",
                            "value": "text_mining_agent",
                            "attribute_type_id": "biolink:agent_type",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "knowledge_level",
                            "value": "not_provided",
                            "attribute_type_id": "biolink:knowledge_level",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "sentences",
                            "value": "For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA",
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_ids",
                            "value": [
                                "tmkp:2f87dfb6ef4f3c0e03839c93f56d5d1eb03e29bcad9658e68be9eb06182b503c",
                                "tmkp:50bac2d91922a1a3d8f223b8deea7751c6443cf47ef3c7cd40cb007356ffa71f",
                                "tmkp:81bc9b8d85f1154cb047a1e1e87e4fe5fd860f4a293bdb3b83a4fc7f3fcf4f0c"
                            ],
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_confidence_score",
                            "value": 0.99425588,
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "publications",
                            "value": [
                                "PMC:7352620",
                                "PMC:7352620",
                                "PMC:7352620"
                            ],
                            "attribute_type_id": "biolink:publications",
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:text-mining-provider-targeted",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:automat-robokop",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:text-mining-provider-targeted"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-explorer",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:automat-robokop"
                            ]
                        }
                    ]
                },
                "4d589f7402f498079be5c89750f6fe98": {
                    "predicate": "biolink:affected_by",
                    "subject": "NCBIGene:6615",
                    "object": "NCBIGene:2739",
                    "qualifiers": [
                        {
                            "qualifier_type_id": "biolink:qualified_predicate",
                            "qualifier_value": [
                                "biolink::caused_by"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_aspect_qualifier",
                            "qualifier_value": [
                                "transport",
                                "degradation",
                                "secretion",
                                "uptake",
                                "activity_or_abundance",
                                "molecular_interaction",
                                "stability",
                                "expression",
                                "metabolic_processing",
                                "abundance",
                                "synthesis",
                                "activity"
                            ]
                        },
                        {
                            "qualifier_type_id": "biolink:subject_direction_qualifier",
                            "qualifier_value": [
                                "decreased",
                                "increased"
                            ]
                        }
                    ],
                    "attributes": [
                        {
                            "original_attribute_name": "agent_type",
                            "value": "text_mining_agent",
                            "attribute_type_id": "biolink:agent_type",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "knowledge_level",
                            "value": "not_provided",
                            "attribute_type_id": "biolink:knowledge_level",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "sentences",
                            "value": "For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA",
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_ids",
                            "value": [
                                "tmkp:2f87dfb6ef4f3c0e03839c93f56d5d1eb03e29bcad9658e68be9eb06182b503c",
                                "tmkp:50bac2d91922a1a3d8f223b8deea7751c6443cf47ef3c7cd40cb007356ffa71f",
                                "tmkp:81bc9b8d85f1154cb047a1e1e87e4fe5fd860f4a293bdb3b83a4fc7f3fcf4f0c"
                            ],
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_confidence_score",
                            "value": 0.99425588,
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "publications",
                            "value": [
                                "PMC:7352620",
                                "PMC:7352620",
                                "PMC:7352620"
                            ],
                            "attribute_type_id": "biolink:publications",
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:text-mining-provider-targeted",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:automat-robokop",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:text-mining-provider-targeted"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-explorer",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:automat-robokop"
                            ]
                        }
                    ]
                },
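In the excerpt above, the four edges share subject, predicate, object, attributes, and sources; they appear to differ only in which `subject_aspect_qualifier` values they list (6, 11, 9, and 12 values respectively). A sketch for checking that across a whole response (hypothetical helper names; edges are grouped by subject, predicate, and object only):

```python
from collections import defaultdict
from typing import Any


def group_parallel_edges(edges: dict[str, dict[str, Any]]) -> dict[tuple[str, str, str], list[str]]:
    """Group KG edge ids by (subject, predicate, object)."""
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for edge_id, edge in edges.items():
        groups[(edge["subject"], edge["predicate"], edge["object"])].append(edge_id)
    return dict(groups)


def qualifier_differences(edges: dict[str, dict[str, Any]],
                          edge_ids: list[str]) -> dict[str, list[Any]]:
    """Map each qualifier type whose value differs within the group to the
    value seen on each edge (in edge_ids order; None if absent)."""
    per_edge = [
        {q["qualifier_type_id"]: q["qualifier_value"] for q in edges[e].get("qualifiers") or []}
        for e in edge_ids
    ]
    all_types = sorted({t for quals in per_edge for t in quals})
    return {
        t: [quals.get(t) for quals in per_edge]
        for t in all_types
        if any(quals.get(t) != per_edge[0].get(t) for quals in per_edge)
    }
```

Applied to the four edges above, `qualifier_differences` would be expected to report only `biolink:subject_aspect_qualifier`.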

Easier way to test: a quick non-creative-mode query that retrieves these problematic edges

POST this query to a local BTE instance (ARA mode): http://localhost:3000/v1/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "ids": ["NCBIGene:2739"],
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:affected_by"]
                }
            }
        }
    }
}
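To script this, a minimal standard-library sketch that builds the same one-hop query, POSTs it, and pulls out the edges between the two genes in either direction. The function names and the 300-second timeout are assumptions; `post_trapi` works equally for the Automat URL below.

```python
import json
import urllib.request
from typing import Any

# Local BTE (ARA mode) endpoint from this issue; adjust host/port as needed.
BTE_LOCAL_URL = "http://localhost:3000/v1/query"


def one_hop_query(object_curie: str, predicate: str = "biolink:affected_by") -> dict[str, Any]:
    """Build the one-hop Gene -[predicate]-> Gene query shown above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"categories": ["biolink:Gene"]},
                    "n1": {"ids": [object_curie], "categories": ["biolink:Gene"]},
                },
                "edges": {
                    "e01": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


def post_trapi(url: str, body: dict[str, Any], timeout: float = 300.0) -> dict[str, Any]:
    """POST a TRAPI message as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def edges_between(response: dict[str, Any], a: str, b: str) -> dict[str, dict[str, Any]]:
    """KG edges connecting nodes a and b, in either direction."""
    kg = (response.get("message") or {}).get("knowledge_graph") or {}
    return {
        edge_id: edge
        for edge_id, edge in (kg.get("edges") or {}).items()
        if {edge.get("subject"), edge.get("object")} == {a, b}
    }
```

For example, `edges_between(post_trapi(BTE_LOCAL_URL, one_hop_query("NCBIGene:2739")), "NCBIGene:6615", "NCBIGene:2739")` returns the edges between the two genes, which can then be inspected for list-valued qualifiers.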

Querying Automat-robokop directly: no qualifier issues, and only 1 corresponding edge

POST the same query to https://automat.ci.transltr.io/robokopkg/query.

You will get some other edges too, because this KP seems to have query-directionality issues: it also returns edges in the canonical direction (not the direction that was asked for).

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "ids": ["NCBIGene:2739"],
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:affected_by"]
                }
            }
        }
    }
}
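Automat answers with the canonical `biolink:affects` edge (NCBIGene:2739 → NCBIGene:6615), so BTE has to flip it to match the `biolink:affected_by` query edge. Judging by BTE's output, the flip swaps subject and object, inverts the predicate and `qualified_predicate`, and renames `object_*` qualifiers to `subject_*`. A sketch of that flip that keeps each `qualifier_value` a single string; the small inverse map is a hypothetical stand-in, and real code would look inverses up in the Biolink model.

```python
import copy
from typing import Any

# Hypothetical subset of Biolink inverse predicates, hard-coded for the sketch.
INVERSES = {
    "biolink:affects": "biolink:affected_by",
    "biolink:affected_by": "biolink:affects",
    "biolink:causes": "biolink:caused_by",
    "biolink:caused_by": "biolink:causes",
}


def swap_prefix(qualifier_type_id: str) -> str:
    """biolink:object_X <-> biolink:subject_X; other qualifier types unchanged."""
    for old, new in (("biolink:object_", "biolink:subject_"),
                     ("biolink:subject_", "biolink:object_")):
        if qualifier_type_id.startswith(old):
            return new + qualifier_type_id[len(old):]
    return qualifier_type_id


def invert_edge(edge: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a TRAPI edge expressed with the inverse predicate."""
    flipped = copy.deepcopy(edge)
    flipped["subject"], flipped["object"] = edge["object"], edge["subject"]
    flipped["predicate"] = INVERSES[edge["predicate"]]
    qualifiers = []
    for q in edge.get("qualifiers") or []:
        qtype = swap_prefix(q["qualifier_type_id"])
        value = q["qualifier_value"]
        if qtype == "biolink:qualified_predicate":
            value = INVERSES[value]
        # Each inverted qualifier keeps exactly one string value.
        qualifiers.append({"qualifier_type_id": qtype, "qualifier_value": value})
    flipped["qualifiers"] = qualifiers
    return flipped
```

Applied to the Automat edge below, this yields a `biolink:subject_direction_qualifier` of `"decreased"` and a `qualified_predicate` of `biolink:caused_by` (one colon), whereas BTE's edges above carry lists and `biolink::caused_by`.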

I only found 1 edge that matches the 4 BTE edges, and its qualifier set is fine:

                "5:4bbbd080-7349-4312-bad6-e2c56677957a:136870644": {
                    "subject": "NCBIGene:2739",
                    "predicate": "biolink:affects",
                    "object": "NCBIGene:6615",
                    "sources": [
                        {
                            "resource_id": "infores:text-mining-provider-targeted",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:automat-robokop",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:text-mining-provider-targeted"
                            ]
                        }
                    ],
                    "attributes": [
                        {
                            "original_attribute_name": "agent_type",
                            "value": "text_mining_agent",
                            "attribute_type_id": "biolink:agent_type",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "knowledge_level",
                            "value": "not_provided",
                            "attribute_type_id": "biolink:knowledge_level",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "sentences",
                            "value": "For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA|For example, downregulation of EMT driver genes (A375-GLO1-KO versus A375-GLO1-WT) [such as FN1 (3.2-fold), MELTF (2.5-fold), MMP2 (2.8-fold), MMP9 (5.2-fold), MYC (3.9-fold), PTGS2 (7.4-fold), SNAI2 (4.1-fold), TFRC (9.1-fold), VIM (2.7), ZEB2 (3.3-fold)] was reversed by GLO1 re-expression (A375-GLO1-R versus A375-GLO1-KO) [causing upregulation of FN1 (5.5-fold), MELTF (2.5-fold), MMP2 (2.9-fold), MMP9 (4.9-fold), MYC (3.4-fold), PTGS2 (5.8-fold), SNAI1 (2.5-fold), TFRC (13.9-fold),|NA",
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_ids",
                            "value": [
                                "tmkp:2f87dfb6ef4f3c0e03839c93f56d5d1eb03e29bcad9658e68be9eb06182b503c",
                                "tmkp:50bac2d91922a1a3d8f223b8deea7751c6443cf47ef3c7cd40cb007356ffa71f",
                                "tmkp:81bc9b8d85f1154cb047a1e1e87e4fe5fd860f4a293bdb3b83a4fc7f3fcf4f0c"
                            ],
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "tmkp_confidence_score",
                            "value": 0.99425588,
                            "attribute_type_id": "biolink:Attribute",
                            "value_type_id": "EDAM:data_0006"
                        },
                        {
                            "original_attribute_name": "publications",
                            "value": [
                                "PMC:7352620",
                                "PMC:7352620",
                                "PMC:7352620"
                            ],
                            "attribute_type_id": "biolink:publications",
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "qualifiers": [
                        {
                            "qualifier_type_id": "biolink:object_direction_qualifier",
                            "qualifier_value": "decreased"
                        },
                        {
                            "qualifier_type_id": "biolink:qualified_predicate",
                            "qualifier_value": "biolink:causes"
                        },
                        {
                            "qualifier_type_id": "biolink:object_aspect_qualifier",
                            "qualifier_value": "activity_or_abundance"
                        }
                    ]
                },

@colleenXu
Collaborator Author

Probably want to fix ASAP @andrewsu @tokebe @rjawesome @NeuralFlux

@NeuralFlux
Contributor

Going through it now. I will discuss with @rjawesome later.

@NeuralFlux
Contributor

NeuralFlux commented Oct 1, 2024

@colleenXu do we have a query that returns a valid TRAPI response, i.e., one where qualifier_value is a string? I couldn't find any edges in the given test query's response where it is a string (I randomly sampled 10 of them).
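Instead of sampling, every edge in the downloaded response can be checked. The sketch below is a hypothetical TypeScript helper: `qualifierCensus` and the interface shapes are illustrative names, modeling only the TRAPI fields the check reads. It sorts each knowledge-graph edge ID by whether all of its `qualifier_value` entries are strings.

```typescript
// Only the TRAPI fields this check reads are modeled here.
interface Qualifier {
  qualifier_type_id: string;
  // TRAPI expects a single string; typed as unknown so arrays can be detected.
  qualifier_value: unknown;
}

interface Edge {
  qualifiers?: Qualifier[];
}

interface TrapiResponse {
  message: {
    knowledge_graph: {
      edges: Record<string, Edge>;
    };
  };
}

interface QualifierCensus {
  allString: string[];    // every qualifier_value is a string (TRAPI-compliant)
  nonString: string[];    // at least one qualifier_value is not a string (e.g. an array)
  noQualifiers: string[]; // edge has no qualifiers at all
}

// Sort every KG edge ID into exactly one bucket instead of sampling a few edges.
function qualifierCensus(response: TrapiResponse): QualifierCensus {
  const census: QualifierCensus = { allString: [], nonString: [], noQualifiers: [] };
  for (const [edgeId, edge] of Object.entries(response.message.knowledge_graph.edges)) {
    const qualifiers = edge.qualifiers ?? [];
    if (qualifiers.length === 0) {
      census.noQualifiers.push(edgeId);
    } else if (qualifiers.every((q) => typeof q.qualifier_value === "string")) {
      census.allString.push(edgeId);
    } else {
      census.nonString.push(edgeId);
    }
  }
  return census;
}
```

Passing the parsed response JSON (for example, read with Node's `readFileSync` and `JSON.parse`) to `qualifierCensus` would list every offending edge, and also show whether any edge in that response is fully compliant.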

@NeuralFlux NeuralFlux self-assigned this Oct 1, 2024
@RichardBruskiewich

@colleenXu, the reasoner-validator reporting bug you identified is resolved in release 4.2.7.

As you noted, a real validation error was triggered by the use of array values for qualifiers. This is actually a TRAPI compliance error. The patch to reasoner-validator just now ensures that the report succeeds in dumping its results without crashing. I guess this Biothings issue can be closed?

@tokebe
Member

tokebe commented Oct 2, 2024

@RichardBruskiewich This issue is still tracking what BTE is doing wrong to produce the legitimate validation error, so it'll remain open until a fix is made on BTE's side and said fix makes its way up to Prod.

@RichardBruskiewich
Copy link

RichardBruskiewich commented Oct 2, 2024

Ok... the BTE TRAPI non-compliance error should now be properly reported, and it simply relates to the fact that edge qualifiers cannot be arrays of values, only a single string scalar value.

@NeuralFlux
Contributor

NeuralFlux commented Oct 3, 2024

I believe the issue arose because

  1. While initializing a record, we assign the association's qualifiers to the record's qualifiers if the latter is undefined
  2. While reversing a record, we reverse the association's qualifiers and assign them to the record's qualifiers

Since an association's qualifier may have an array value in qualifier_value, it can creep into the record's qualifiers, and eventually into the edges returned by BTE.

For the fix, (1) I initialize the record's qualifiers to an empty set if they're undefined (only for TRAPI-specific records; see the next comment), and (2) I reverse the record's qualifiers and its association's qualifiers separately.
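A minimal sketch contrasting the two steps described above with the fix, assuming qualifiers are stored as plain objects keyed by qualifier type ID. `RecordInput`, `prepareBuggy`, `prepareFixed`, `reverseQualifiers`, and the `fromTrapiKP` flag are hypothetical names, not BTE's actual classes or methods. Reversal here only swaps `subject_`/`object_` in the key and leaves keys such as `biolink:qualified_predicate` unchanged.

```typescript
// Hypothetical shapes; these are not BTE's actual classes.
type QualifierValue = string | string[];
type QualifierMap = Record<string, QualifierValue>;

interface Association {
  // Built from a KP's MetaKG entry, so a qualifier may list several values.
  qualifiers?: QualifierMap;
}

interface RecordInput {
  qualifiers?: QualifierMap; // qualifiers the KP actually returned on the edge
  association: Association;
  fromTrapiKP: boolean;
}

interface PreparedQualifiers {
  record: QualifierMap;
  association: QualifierMap;
}

// Swap subject_/object_ in each qualifier type ID, keeping any "biolink:" prefix.
// Other keys (e.g. biolink:qualified_predicate) are copied unchanged in this sketch.
function reverseQualifiers(qualifiers: QualifierMap): QualifierMap {
  const reversed: QualifierMap = {};
  for (const [key, value] of Object.entries(qualifiers)) {
    const match = /^(biolink:)?(subject|object)_(.+)$/.exec(key);
    const newKey = match
      ? `${match[1] ?? ""}${match[2] === "subject" ? "object" : "subject"}_${match[3]}`
      : key;
    reversed[newKey] = value;
  }
  return reversed;
}

// Buggy flow: association qualifiers (possibly arrays) end up on the record.
function prepareBuggy(input: RecordInput, reverse: boolean): PreparedQualifiers {
  const association: QualifierMap = input.association.qualifiers ?? {};
  // (1) The record falls back to the association's qualifiers when it has none.
  const record: QualifierMap = input.qualifiers ?? association;
  if (reverse) {
    // (2) The reversed association qualifiers are assigned to the record.
    const reversedAssociation = reverseQualifiers(association);
    return { record: reversedAssociation, association: reversedAssociation };
  }
  return { record, association };
}

// Fixed flow: TRAPI records start empty, and each map is reversed on its own.
function prepareFixed(input: RecordInput, reverse: boolean): PreparedQualifiers {
  const association: QualifierMap = input.association.qualifiers ?? {};
  const record: QualifierMap =
    input.qualifiers ?? (input.fromTrapiKP ? {} : { ...association });
  if (reverse) {
    return { record: reverseQualifiers(record), association: reverseQualifiers(association) };
  }
  return { record, association };
}
```

Keeping the two reversals separate is what stops multi-valued association qualifiers from reaching the record, and from there the KG edge BTE returns. Non-TRAPI records still inherit from the association, which is safe only because x-bte operations carry a single string per qualifier type.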

@tokebe
Member

tokebe commented Oct 3, 2024

Noting for the above that this should be a problem specific to TRAPI KPs -- AFAIK non-TRAPI KPs aren't given array qualifier values on individual operations (CC @colleenXu to confirm), because it's expected that record qualifiers will be drawn from the association.

@NeuralFlux
Contributor

Thanks for the note @tokebe, updated my comment.

@colleenXu
Collaborator Author

Yup, this shouldn't show up in non-TRAPI KPs. All x-bte operations are set up to have only one string value per qualifier_type_id.

@colleenXu colleenXu added the On CI (related changes are deployed to CI server) and On CI -> Test labels and removed the On CI label Oct 22, 2024
@tokebe tokebe added the On Test label (related changes are deployed to Test server) and removed the On CI -> Test label Oct 28, 2024