Awesome Semantic-Search

Looking for more paper additions. PS: raise a PR.

This repository aims to serve as a meta-repository for Semantic Search and Semantic Similarity related tasks.

Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous use cases and applications of semantic search.

Feel free to raise a PR on this repo!

Papers

2010

Priority Range Trees
Information Retrieval and the semantic web 📄

2014

A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval 📄

2015

Skip-Thought Vectors 📄
Practical and Optimal LSH for Angular Distance

2016

Bag of Tricks for Efficient Text Classification 📄
Enriching Word Vectors with Subword Information 📄
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
On Approximately Searching for Similar Word Embeddings
Learning Distributed Representations of Sentences from Unlabelled Data📄
Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement

2017

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data 📄
Semantic Textual Similarity For Hindi📄
Efficient Natural Language Response Suggestion for Smart Reply📃

2018

Universal Sentence Encoder 📄
Learning Semantic Textual Similarity from Conversations 📄
Google AI Blog: Advances in Semantic Textual Similarity 📄
Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech 🔊
Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data 🔊
Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
The Case for Learned Index Structures

2019

LASER: Language Agnostic Sentence Representations 📄
Document Expansion by Query Prediction 📄
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks 📄
Multi-Stage Document Ranking with BERT 📄
Latent Retrieval for Weakly Supervised Open Domain Question Answering
End-to-End Open-Domain Question Answering with BERTserini
BioBERT: a pre-trained biomedical language representation model for biomedical text mining📄
Analyzing and Improving Representations with the Soft Nearest Neighbor Loss📷
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

2020

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned 📄
Passage Re-ranking with BERT 📄
CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization 📄
LaBSE: Language-agnostic BERT Sentence Embedding 📄
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset 📄
DeText: A deep NLP framework for intelligent text understanding 📄
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation 📄
Pretrained Transformers for Text Ranking: BERT and Beyond 📄
REALM: Retrieval-Augmented Language Model Pre-Training
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators 📄
Improving Deep Learning For Airbnb Search
Managing Diversity in Airbnb Search📄
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval📄
Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks📷
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations📄

2021

Hybrid approach for semantic similarity calculation between Tamil words 📄
Augmented SBERT 📄
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models 📄
Compatibility-aware Heterogeneous Visual Search 📷
Learning Personal Style from Few Examples📷
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning📄
A Survey of Transformers📄📷
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking📄
High Quality Related Search Query Suggestions using Deep Reinforcement Learning
Embedding-based Product Retrieval in Taobao Search📄📷
TPRM: A Topic-based Personalized Ranking Model for Web Search📄
mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset📄
Database Reasoning Over Text📄
How Does Adversarial Fine-Tuning Benefit BERT? 📄
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation📄
Primer: Searching for Efficient Transformers for Language Modeling📄
How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings🔊
SimCSE: Simple Contrastive Learning of Sentence Embeddings📄
Compositional Attention: Disentangling Search and Retrieval📄📷
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval 📄
Generative Search Engines: Initial Experiments 📷
Rethinking Search: Making Domain Experts out of Dilettantes
WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

2022

Text and Code Embeddings by Contrastive Pre-Training📄
RELiC: Retrieving Evidence for Literary Claims📄
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations📄
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation🔊
An Analysis of Fusion Functions for Hybrid Retrieval📄
Out-of-distribution Detection with Deep Nearest Neighbors
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition🔊
Analyzing Acoustic Word Embeddings From Pre-Trained Self-Supervised Speech Models 🔊
Rethinking with Retrieval: Faithful Large Language Model Inference📄
Precise Zero-Shot Dense Retrieval without Relevance Labels📄
Transformer Memory as a Differentiable Search Index📄

2023

FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search📄
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors📄
SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval 📄

Articles

Tackling Semantic Search
Semantic search in Azure Cognitive Search
How we used semantic search to make our search 10x smarter
Stanford AI Blog: Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
Building a semantic search engine with dual space word embeddings
Billion-scale semantic similarity search with FAISS+SBERT
Some observations about similarity search thresholds
Near Duplicate Image Search using Locality Sensitive Hashing
Free Course on Vector Similarity Search and Faiss
Comprehensive Guide To Approximate Nearest Neighbors Algorithms
Introducing the hybrid index to enable keyword-aware semantic search
Argilla Semantic Search
Co:here's Multilingual Text Understanding Model
Simplify Search with Multilingual Embedding Models

Libraries and Tools

fastText
Universal Sentence Encoder
SBERT
ELECTRA
LaBSE
LASER
Relevance AI - Vector Platform From Experimentation To Deployment
Haystack
Jina.AI
pinecone
SentEval Toolkit
ranx
BEIR: Benchmarking IR
RELiC: Retrieving Evidence for Literary Claims Dataset
matchzoo-py
deep_text_matching
Which Frame?
lexica.art
emoji semantic search
PySerini
BERTSerini
BERTSimilarity
milvus
NeuroNLP++
weaviate
semantic-search-through-wikipedia-with-weaviate
natural-language-youtube-search
same.energy
ann benchmarks
scaNN
REALM
annoy
pynndescent
nsg
FALCONN
redis HNSW
autofaiss
DPR
rank_BM25
FlashRank
nearPy
vearch
vespa
PyNNDescent
pgANN
Tensorflow Similarity
opensemanticsearch.org
GPT3 Semantic Search
searchy
txtai
HyperTag
vectorai
embeddinghub
AquilaDb
STripNet

Datasets

Semantic Text Similarity Dataset Hub
Facebook AI Image Similarity Challenge
WIT: Wikipedia-based Image Text Dataset
BEIR
MTEB

Milestones

Have a look at the project board for the task list to contribute to any of the open issues.
