`embs` is your one-stop toolkit for handling document ingestion, embedding, and ranking workflows. Whether you're building a retrieval-augmented generation (RAG) system, a chatbot, or a semantic search engine, `embs` makes it easy to integrate document retrieval, embeddings, and ranking with minimal setup.
- **Free External APIs**:
  - Docsifer for document conversion (PDFs, URLs, images, etc.).
  - Lightweight Embeddings API for generating state-of-the-art embeddings, with multi-language support and some of the best models for NLP tasks, all provided free of charge.

  These APIs support top-tier multilingual embedding models like `sentence-transformers` and OpenAI-compatible embeddings, so you can achieve top-quality results with minimal configuration.
- **Perfect for RAG Systems**: Automatically convert and split documents into meaningful chunks, generate embeddings, and rank them, all tailored for retrieval-augmented workflows with generative models such as OpenAI GPT.
- **Integrates with Chatbots**: Preprocess, split, and embed your knowledge base to build conversational systems that respond with accurate, contextually relevant answers.
- **Flexible Splitting**: Use built-in or custom chunking strategies to split large documents into smaller, retrievable sections, improving relevance in retrieval-based workflows.
- **Unified Pipeline**: Streamline everything from document ingestion to semantic ranking with a single API.
- **Lightweight & Extensible**: No heavy dependencies beyond `aiohttp`; it fits easily into your existing infrastructure.
```bash
pip install embs
```
Or in `pyproject.toml` (Poetry):

```toml
[tool.poetry.dependencies]
embs = "^0.1.0"
```
In RAG workflows, retrieved knowledge informs a generative model like GPT to produce accurate and relevant answers. `embs` simplifies this by:
- Converting raw documents (PDFs, URLs) to clean text or markdown.
- Splitting documents into retrievable chunks (e.g., by headers or lines).
- Embedding chunks with powerful multilingual models.
- Ranking the chunks for relevance to the query.
With caching enabled, repeated requests are even faster, ensuring scalability for real-world deployments.
Below is an end-to-end example that retrieves documents, applies the built-in Markdown splitter, generates embeddings, and ranks the chunks by query relevance, showing how `embs` slots into chatbot or RAG pipelines.
```python
import asyncio
from functools import partial

from embs import Embs

async def main():
    # Markdown-based splitter configuration
    split_config = {
        "headers_to_split_on": [("#", "h1"), ("##", "h2"), ("###", "h3")],
        "return_each_line": False,  # Keep chunks as sections, not individual lines
        "strip_headers": True       # Remove header text from the chunks
    }
    md_splitter = partial(Embs.markdown_splitter, config=split_config)

    # Initialize the Embs client
    client = Embs()

    # Step 1: Retrieve documents and split them by Markdown headers
    raw_docs = await client.retrieve_documents_async(
        files=["/path/to/sample.pdf"],
        urls=["https://example.com"],
        splitter=md_splitter  # Apply built-in markdown splitter
    )
    print(f"Total chunks after splitting: {len(raw_docs)}")

    # Step 2: Rank the retrieved documents by relevance to a query
    results = await client.search_documents_async(
        query="Explain quantum computing",
        files=["/path/to/quantum_theory.pdf"],  # Additional files to retrieve and rank
        urls=["https://example.com/quantum.html"],
        splitter=md_splitter  # Apply splitter for additional sources
    )

    # Step 3: Output the top-ranked results
    for item in results[:3]:
        print(f"File: {item['filename']} | Score: {item['probability']:.4f}")
        print(f"Snippet: {item['markdown'][:80]}...\n")

asyncio.run(main())
```
- **Context-Aware Answers**: By splitting large documents into manageable chunks and ranking them for relevance, your chatbot always responds with the most contextually appropriate snippet.
- **Multilingual Embeddings**: The Lightweight Embeddings API supports embeddings for multiple languages, so your chatbot can handle diverse user inputs and knowledge bases.
- **Caching for Scalability**: Repeated retrieval or ranking operations are sped up dramatically with in-memory or disk-based caching, ensuring low-latency responses.
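To make the chatbot pattern concrete, here is a minimal sketch built on `search_documents_async`. The `generate_answer` function is a hypothetical placeholder for whatever LLM client you use; only the retrieval and ranking calls come from `embs`, and the file path is illustrative.

```python
import asyncio
from functools import partial

from embs import Embs

client = Embs()
md_splitter = partial(
    Embs.markdown_splitter,
    config={"headers_to_split_on": [("#", "h1"), ("##", "h2")]},
)

def generate_answer(prompt: str) -> str:
    # Hypothetical stand-in for your LLM client (OpenAI, local model, etc.).
    raise NotImplementedError("plug in your own LLM call here")

async def answer(question: str) -> str:
    # Retrieve, split, and rank knowledge-base chunks for this question.
    chunks = await client.search_documents_async(
        query=question,
        files=["/path/to/knowledge_base.pdf"],
        splitter=md_splitter,
    )
    # Use only the top-ranked chunks as grounding context.
    context = "\n\n".join(c["markdown"] for c in chunks[:3])
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return generate_answer(prompt)

print(asyncio.run(answer("What is quantum entanglement?")))
```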
Below are the primary methods in `embs`. All async methods have a synchronous equivalent.
Convert files and/or URLs into Markdown using Docsifer. Optionally, apply a splitter to break down large documents into chunks.
```python
async def retrieve_documents_async(
    files=None,
    urls=None,
    openai_config=None,
    settings=None,
    concurrency=5,
    options=None,
    splitter=None
) -> List[Dict[str, str]]:
    ...
```
- Params:
  - `files`: List of file paths or file-like objects.
  - `urls`: List of URLs for Docsifer to process.
  - `splitter`: A callable that receives and returns a list of docs, e.g., `Embs.markdown_splitter`.
- Returns: A list of documents (`{"filename": <str>, "markdown": <str>}`).
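Since `splitter` is just a callable that maps a list of docs to a list of docs, you can supply your own chunking strategy. Below is a small sketch of a custom paragraph-level splitter; it assumes only the documented `{"filename": ..., "markdown": ...}` shape.

```python
from typing import Dict, List

def paragraph_splitter(docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Split each document into blank-line-delimited paragraph chunks."""
    chunks: List[Dict[str, str]] = []
    for doc in docs:
        for para in doc["markdown"].split("\n\n"):
            if para.strip():
                chunks.append({"filename": doc["filename"], "markdown": para.strip()})
    return chunks

# Usable anywhere a splitter is accepted, e.g.:
#   docs = await client.retrieve_documents_async(urls=[...], splitter=paragraph_splitter)
```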
Generate embeddings for text or a list of texts using the Lightweight Embeddings API.
```python
async def embed_async(
    text_or_texts: Union[str, List[str]],
    model=None
) -> Dict[str, Any]:
    ...
```
- Params:
  - `text_or_texts`: Single string or list of strings to embed.
  - `model`: Optional; specify the embedding model (defaults to `snowflake-arctic-embed-l-v2.0`).
- Returns: Embedding data as a dictionary.
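A quick usage sketch; the exact keys in the returned dictionary follow the Lightweight Embeddings API response, so the code below just inspects them rather than assuming a shape.

```python
import asyncio

from embs import Embs

async def main():
    client = Embs()
    data = await client.embed_async(["Hello world", "Bonjour le monde"])
    # The response mirrors the Lightweight Embeddings API,
    # so inspect the top-level keys before digging in.
    print(list(data.keys()))

asyncio.run(main())
```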
Rank a list of text candidates by relevance to a query using the Lightweight Embeddings API.
```python
async def rank_async(
    query: str,
    candidates: List[str],
    model=None
) -> List[Dict[str, Any]]:
    ...
```
- Params:
  - `query`: The query string.
  - `candidates`: List of candidate texts.
  - `model`: Optional; specify the ranking model (defaults to `snowflake-arctic-embed-l-v2.0`).
- Returns: A ranked list of `{"text": <candidate>, "probability": <float>, "cosine_similarity": <float>}`.
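For example, ranking a few candidate sentences against a question and printing each candidate with its documented scores:

```python
import asyncio

from embs import Embs

async def main():
    client = Embs()
    ranked = await client.rank_async(
        query="What is the capital of France?",
        candidates=[
            "Paris is the capital of France.",
            "Berlin is the capital of Germany.",
            "The Eiffel Tower is a landmark in Paris.",
        ],
    )
    for item in ranked:
        print(f"{item['probability']:.4f}  {item['cosine_similarity']:.4f}  {item['text']}")

asyncio.run(main())
```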
Retrieve documents (files/URLs), optionally split them, and rank their chunks by relevance to a query.
```python
async def search_documents_async(
    query: str,
    files=None,
    urls=None,
    openai_config=None,
    settings=None,
    concurrency=5,
    options=None,
    model=None,
    splitter=None
) -> List[Dict[str, Any]]:
    ...
```
- Params:
  - `query`: The query to rank against.
  - `files`, `urls`: As in `retrieve_documents_async`.
  - `splitter`: Optional; e.g., use `Embs.markdown_splitter`.
- Returns: A ranked list of chunks with `{"filename": ..., "markdown": ..., "probability": ..., "cosine_similarity": ...}`.
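Because every chunk carries a `probability` score, a common pattern is to keep only chunks above a threshold before building a prompt for a generative model. A brief sketch (the 0.2 cutoff is an arbitrary illustration, and the calls must run inside an async function):

```python
# Inside an async function:
results = await client.search_documents_async(
    query="Explain quantum computing",
    urls=["https://example.com/quantum.html"],
)
# Drop chunks the ranker considers weakly relevant.
relevant = [r for r in results if r["probability"] > 0.2]
context = "\n\n".join(r["markdown"] for r in relevant)
```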
Enable in-memory or disk-based caching to avoid redundant processing:
```python
cache_conf = {
    "enabled": True,
    "type": "memory",        # or "disk"
    "prefix": "myapp",
    "dir": "cache_folder",   # only needed for disk caching
    "max_mem_items": 128,
    "max_ttl_seconds": 86400
}

client = Embs(cache_config=cache_conf)
```
- **Memory Caching**: Quick lookups using an LRU store with TTL expiration.
- **Disk Caching**: Stores JSON files in a specified directory, evicting older files after TTL expiration.
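To see the effect, you can time two identical calls; the second should be served from the cache. A rough sketch (actual timings depend on your documents and network):

```python
import asyncio
import time

from embs import Embs

client = Embs(cache_config={
    "enabled": True,
    "type": "memory",
    "prefix": "demo",
    "max_mem_items": 128,
    "max_ttl_seconds": 3600,
})

async def timed_search() -> float:
    start = time.perf_counter()
    await client.search_documents_async(
        query="What is quantum computing?",
        urls=["https://example.com/quantum.html"],
    )
    return time.perf_counter() - start

async def main():
    first = await timed_search()   # hits Docsifer and the embeddings API
    second = await timed_search()  # should be answered from the in-memory cache
    print(f"first: {first:.2f}s  cached: {second:.2f}s")

asyncio.run(main())
```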
`embs` is rigorously tested using `pytest` and `pytest-asyncio`. To ensure that retrieval, embeddings, ranking, caching, and splitting work as expected, run:
```bash
pytest --asyncio-mode=auto
```
Licensed under the MIT License. See LICENSE for details.
Contributions are welcome! Submit issues, ideas, or pull requests to help improve `embs`.