MiADE supercharged with LLMs - for detailed extraction of diagnoses.
This project uses LangServe to deploy LangChain chains as REST API endpoints.
This project uses the Replicate API to run models. You will need to set the `REPLICATE_API_TOKEN` environment variable (you usually get some free credits when you sign up). If you would like to use a different service or self-hosted models, you will need to update the `relation_extractor` chain.
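For example (the token value is a placeholder):

```bash
export REPLICATE_API_TOKEN=<your-replicate-api-token>
```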
Update: the previously tested version of the `relation_extractor` chain used Mixtral-8x7B-instruct-v0.1, but this model is no longer available on Replicate. The current default model is microsoft/phi-3-mini-128k-instruct.
Prompts are currently pulled from LangChain Hub, so you also need to set `LANGCHAIN_API_KEY`. You can view the full prompt here.
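For example:

```bash
export LANGCHAIN_API_KEY=<your-api-key>
```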
The model id, prompt, and extra model paths can be configured in `config/config.yaml`.
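A minimal sketch of what that file might look like (the key names here are assumptions for illustration; check the file shipped with the repo for the real schema):

```yaml
# config/config.yaml - illustrative only, key names are assumed
model_id: microsoft/phi-3-mini-128k-instruct   # model run via Replicate
prompt: <langchain-hub-prompt-ref>             # prompt pulled from LangChain Hub
model_paths:
  medcat_model: ./data/models/miade_problems_model_f25ec9423958e8d6.zip
```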
(Optional) If you also want to configure LangSmith to trace and monitor chains, set these environment variables:
```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```
To install dependencies, make sure you have `poetry` installed:

```bash
pip install poetry
```
Then install the project dependencies with poetry:

```bash
cd src
poetry install
```
To spin up a LangServe instance, run (make sure you are in the `src` directory):

```bash
poetry run langchain serve
```
This will start the FastAPI app, with the server running locally at http://localhost:8000
We can see all endpoints at http://localhost:8000/docs.
Access the playground at http://localhost:8000/name-of-package/playground
Access the endpoints from code with:
```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/name-of-package")
```
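A minimal sketch of calling the runnable (the input key here is an assumption; check the chain's actual input schema at the /docs page):

```python
# Hypothetical input - the real schema depends on the chain being served
response = runnable.invoke({"note": "Patient presents with a fracture of the left femur."})
print(response)
```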
Extracts relations between concepts found in a note and outputs them in JSON format. Uses MedCAT for NER (requires a model).
A MedCAT model is required to run this chain. To download an example model trained on MIMIC:

```bash
pip install gdown
gdown 'https://drive.google.com/uc?export=download&id=17s999FIotRenltR6gr_f8ZjdaXc-u1Gx' -O ./data/models/miade_problems_model_f25ec9423958e8d6.zip
```
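To sanity-check the download, the model pack can be loaded directly with MedCAT (a quick sketch; the path matches the download target above):

```python
from medcat.cat import CAT

# Load the downloaded MedCAT model pack and run NER on a sample note
cat = CAT.load_model_pack("./data/models/miade_problems_model_f25ec9423958e8d6.zip")
entities = cat.get_entities("Patient presents with a fracture of the left femur.")
print(entities)
```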
An experimental agent that maps clinical concepts to SNOMED CT codes using LangGraph. The agent:
- Takes a clinical relation triplet (e.g., "fracture of left femur") as input
- Generates appropriate SNOMED CT search terms
- Queries a SNOMED CT terminology server
- Predicts and evaluates morphology and finding site attributes
- Scores candidate terms to find the best SNOMED CT concept match
Example usage can be found in `notebooks/snomed_agent.ipynb`. The agent requires access to a SNOMED CT terminology server and currently uses the Snowstorm SNOMED CT server.
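For illustration, a Snowstorm instance can be queried over REST like this (a sketch: the base URL is a placeholder, and the endpoint follows Snowstorm's concept-search API):

```python
import requests

# Placeholder base URL - point this at your Snowstorm instance
SNOWSTORM_URL = "http://localhost:8080"

# Search active concepts on the MAIN branch by term
resp = requests.get(
    f"{SNOWSTORM_URL}/MAIN/concepts",
    params={"term": "fracture of femur", "activeFilter": "true", "limit": 10},
)
for concept in resp.json()["items"]:
    print(concept["conceptId"], concept["fsn"]["term"])
```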
- Search
  - Takes a relation triplet (e.g., `{"node_1": "fracture", "node_2": "left femur", "edge": "of"}`)
  - The `search` node uses GPT-4 to generate appropriate search variations (e.g., "left femur fracture", "fracture of left femur")
  - These terms are used to query the SNOMED CT terminology server
- Planner
  - The `planner` node analyzes the primary search term
  - Predicts two key SNOMED attributes:
    - Morphology (form/structure of abnormality)
    - Finding Site (anatomical location)
  - These predictions help evaluate candidate matches
- Evaluator
  - The `evaluator` node:
    - Retrieves full concept details for each candidate
    - Compares predicted attributes with actual SNOMED relationships
    - Scores candidates on a 1-5 scale
    - Provides reasoning for each score
  - Candidates scoring ≥4 are added to a shortlist
  - A perfect match (score=5) is selected as the final candidate
- Refiner
  - If no suitable candidate is found, the `refine` node can modify the search terms (see the wiring sketch after this list)
  - The process continues until either:
    - A perfect match is found
    - Maximum revisions are reached
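Put together, the four nodes form a small LangGraph state machine over the `AgentState` model shown below. A minimal wiring sketch, not the repo's actual implementation (node bodies are placeholders, and the stopping check assumes `final_candidate` is unset until a perfect match is found):

```python
from langgraph.graph import END, StateGraph

# Placeholder node implementations - the real ones live in the repo
def search(state: AgentState) -> dict:
    return {}  # would generate search terms and query SNOMED

def planner(state: AgentState) -> dict:
    return {}  # would predict morphology and finding site

def evaluator(state: AgentState) -> dict:
    return {}  # would score candidates and build the shortlist

def refine(state: AgentState) -> dict:
    return {}  # would modify the search terms

def should_continue(state: AgentState) -> str:
    # Stop on a perfect match or once the revision budget is spent
    if state.final_candidate is not None or state.revision_number >= state.max_revisions:
        return "end"
    return "refine"

graph = StateGraph(AgentState)
graph.add_node("search", search)
graph.add_node("planner", planner)
graph.add_node("evaluator", evaluator)
graph.add_node("refine", refine)

graph.set_entry_point("search")
graph.add_edge("search", "planner")
graph.add_edge("planner", "evaluator")
graph.add_conditional_edges("evaluator", should_continue, {"refine": "refine", "end": END})
graph.add_edge("refine", "search")

app = graph.compile()
```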
- `QUERY_PROMPT` - Generates 1-3 search terms from a relation triplet
- `ATTRIBUTE_PROMPT` - Predicts morphology and finding site for a term
- `EVALUATION_PROMPT` - Scores candidates on a 1-5 scale based on attribute matches
```python
from typing import Dict, List

from pydantic import BaseModel

# ScoreCard is defined below
class AgentState(BaseModel):
    relations: str                     # Input relation triplet
    search_terms: List[str]            # Generated search terms
    snomed_candidates: Dict[str, str]  # term -> conceptId mapping
    target_attributes: str             # Predicted morphology & finding site
    evals: List[ScoreCard]             # Evaluation results
    shortlist: List[ScoreCard]         # High-scoring candidates (≥4)
    final_candidate: ScoreCard         # Best matching candidate
    revision_number: int               # Current revision attempt
    max_revisions: int                 # Max revision attempts
```
```python
class ScoreCard(BaseModel):
    candidate_term: str
    snomed_id: str
    score: int      # 1-5 rating
    reasoning: str  # Explanation of score
```
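For illustration, the shortlisting step described above amounts to filtering the evaluation results on their score (a sketch, not the repo's exact code; the concept IDs are placeholders):

```python
# Hypothetical evaluation results - IDs are illustrative placeholders
evals = [
    ScoreCard(candidate_term="Fracture of femur", snomed_id="123456",
              score=4, reasoning="Morphology matches; finding site lacks laterality"),
    ScoreCard(candidate_term="Fracture of left femur", snomed_id="654321",
              score=5, reasoning="Morphology and finding site both match"),
]

# Candidates scoring >=4 go on the shortlist; a score of 5 becomes the final candidate
shortlist = [card for card in evals if card.score >= 4]
final_candidate = next((card for card in shortlist if card.score == 5), None)
```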
- Requires access to a SNOMED CT terminology server
- Performance depends on server response times
- Quality of matches relies on the underlying LLM's medical knowledge
- Best suited for clinical terms with clear morphology and finding sites