RAG0017: Clean Knowledge Graph Data #35

tenzin3 · 2024-09-16T03:50:23Z

Description

Knowledge graph triples are generated by prompting LLMs. Because of context-length limits and the need for better output quality, the unstructured text is processed in smaller chunks rather than all at once. This produces a large amount of fragmented graph data, so collating the fragments, deduplicating them, and merging similar relations and entities becomes crucial for accuracy and efficiency.

Expected Output

A deduplicated and consolidated knowledge graph with unique entities and relations, ensuring clarity and eliminating redundancy.

Implementation Plan

Combine all graphs into one.
Filter duplicates with string similarity.
Filter remaining duplicates with an embedding similarity check.
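
A minimal sketch of the first two steps, assuming each chunk yields a list of `(head, relation, tail)` tuples. The function names, the 0.9 threshold, and the sample triples are illustrative assumptions, not part of the actual pipeline; `difflib.SequenceMatcher` from the standard library stands in for whichever string-similarity measure is finally chosen.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two normalized names are near-identical strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def merge_graphs(graphs, threshold: float = 0.9):
    """Combine per-chunk triple lists and merge near-duplicate entity names.

    The first spelling seen for an entity becomes its canonical name.
    """
    canonical = []  # canonical entity names seen so far

    def resolve(name: str) -> str:
        for known in canonical:
            if similar(name, known, threshold):
                return known
        canonical.append(name)
        return name

    merged = set()  # a set drops exact duplicate triples
    for graph in graphs:
        for head, relation, tail in graph:
            merged.add((resolve(head), normalize(relation), resolve(tail)))
    return sorted(merged)


# Made-up sample data for two chunks of the same text.
chunk_a = [("Potala Palace", "located_in", "Lhasa")]
chunk_b = [("potala  palace", "Located_In", "Lhasa"), ("Lhasa", "part_of", "Tibet")]
triples = merge_graphs([chunk_a, chunk_b])
print(triples)
```

The linear scan over `canonical` is quadratic in the number of entities, so a real run over a large graph would likely need blocking or an index; this only shows the idea.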

tenzin3 · 2024-09-16T03:51:33Z

Methods to clean the knowledge graph

Perform string similarity checks when collating nodes and relations into one giant knowledge graph.
Convert node names (strings) into embeddings using the fine-tuned embedding model, then run a cosine similarity check to find similar nodes.
Check for overlapping relations and properties.
Keep a human in the loop for the final quality review.
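
The embedding step could look roughly like the sketch below. The fine-tuned model is not available here, so `toy_embed` (character trigram counts) is a hypothetical stand-in used only to keep the example runnable; the 0.6 threshold and the node names are likewise assumptions. Pairs above the threshold are returned as candidates for the human review step rather than merged automatically.

```python
import math
from collections import Counter
from itertools import combinations


def toy_embed(name: str) -> Counter:
    """Stand-in for the fine-tuned embedding model: character trigram counts."""
    padded = f"  {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_pairs(names, embed=toy_embed, threshold=0.6):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    vectors = {name: embed(name) for name in names}
    pairs = []
    for a, b in combinations(names, 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


# Made-up node names for illustration.
nodes = ["Jokhang Temple", "Jokhang Temple, Lhasa", "Mount Kailash"]
pairs = candidate_pairs(nodes)
for a, b, score in pairs:
    print(f"review: {a!r} ~ {b!r} (cosine {score})")
```

Swapping in the real model only requires passing a different `embed` callable, provided `cosine` is adapted to dense vectors.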

tenzin3 · 2024-09-16T06:28:02Z

Graph Schema (uncleaned):

Observation:

The schema has only 15 entity types in total, which I believe is suitably simple.
Some entity types (Structure and Deity) have few nodes associated with them, while others (Location) have a huge number.
The relations need a lot of filtering and cleaning.
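
Observations like these can be reproduced by counting nodes per entity type and triples per relation label. A small sketch, assuming nodes are available as `(name, entity_type)` pairs and relations as `(head, relation, tail)` triples; the records below are invented for illustration and are not the real graph.

```python
from collections import Counter

# Hypothetical node records: (name, entity_type).
nodes = [
    ("Lhasa", "Location"),
    ("Shigatse", "Location"),
    ("Tibet", "Location"),
    ("Potala Palace", "Structure"),
    ("Avalokiteshvara", "Deity"),
]

# Hypothetical triples: (head, relation, tail).
triples = [
    ("Potala Palace", "located_in", "Lhasa"),
    ("Potala Palace", "is_located_in", "Lhasa"),
    ("Lhasa", "located_in", "Tibet"),
]

type_counts = Counter(entity_type for _, entity_type in nodes)
relation_counts = Counter(relation for _, relation, _ in triples)

for entity_type, count in type_counts.most_common():
    print(f"{entity_type}: {count}")

# Rare relation labels are often spelling variants worth reviewing first.
rare_relations = [rel for rel, count in relation_counts.items() if count == 1]
print("rare relations:", rare_relations)
```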

tenzin3 added this to LLM and Rag Sep 12, 2024

tenzin3 self-assigned this Sep 16, 2024

tenzin3 converted this from a draft issue Sep 16, 2024
