Logo made by @createdbytango.
Looking for more paper additions. PS: raise a PR!
This repository aims to serve as a meta-repository for Semantic Search and Semantic Similarity related tasks.
Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous use-cases and applications of semantic search.
Feel free to raise a PR on this repo!
- Bag of Tricks for Efficient Text Classification
- Enriching Word Vectors with Subword Information
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- On Approximately Searching for Similar Word Embeddings
- Learning Distributed Representations of Sentences from Unlabelled Data
- Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- Semantic Textual Similarity For Hindi
- Efficient Natural Language Response Suggestion for Smart Reply
- Universal Sentence Encoder
- Learning Semantic Textual Similarity from Conversations
- Google AI Blog: Advances in Semantic Textual Similarity
- Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
- Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
- The Case for Learned Index Structures
- LASER: Language Agnostic Sentence Representations
- Document Expansion by Query Prediction
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- Multi-Stage Document Ranking with BERT
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- End-to-End Open-Domain Question Answering with BERTserini
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Analyzing and Improving Representations with the Soft Nearest Neighbor Loss 📷
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned
- Passage Re-ranking with BERT
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
- LaBSE: Language-agnostic BERT Sentence Embedding
- Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
- DeText: A deep NLP framework for intelligent text understanding
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
- Pretrained Transformers for Text Ranking: BERT and Beyond
- REALM: Retrieval-Augmented Language Model Pre-Training
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Improving Deep Learning For Airbnb Search
- Managing Diversity in Airbnb Search
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks 📷
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
- Hybrid approach for semantic similarity calculation between Tamil words
- Augmented SBERT
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
- Compatibility-aware Heterogeneous Visual Search 📷
- Learning Personal Style from Few Examples 📷
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
- A Survey of Transformers 📷
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
- High Quality Related Search Query Suggestions using Deep Reinforcement Learning
- Embedding-based Product Retrieval in Taobao Search 📷
- TPRM: A Topic-based Personalized Ranking Model for Web Search
- mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset
- Database Reasoning Over Text
- How Does Adversarial Fine-Tuning Benefit BERT?
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- Primer: Searching for Efficient Transformers for Language Modeling
- How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
- Compositional Attention: Disentangling Search and Retrieval 📷
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
- GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
- Generative Search Engines: Initial Experiments 📷
- Rethinking Search: Making Domain Experts out of Dilettantes
- WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach
- Text and Code Embeddings by Contrastive Pre-Training
- RELIC: Retrieving Evidence for Literary Claims
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
- SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
- An Analysis of Fusion Functions for Hybrid Retrieval
- Out-of-distribution Detection with Deep Nearest Neighbors
- ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
- Analyzing Acoustic Word Embeddings From Pre-Trained Self-Supervised Speech Models
- Rethinking with Retrieval: Faithful Large Language Model Inference
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Transformer Memory as a Differentiable Search Index
- FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search
- “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
- SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval
- Tackling Semantic Search
- Semantic search in Azure Cognitive Search
- How we used semantic search to make our search 10x smarter
- Stanford AI Blog: Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
- Building a semantic search engine with dual space word embeddings
- Billion-scale semantic similarity search with FAISS+SBERT
- Some observations about similarity search thresholds
- Near Duplicate Image Search using Locality Sensitive Hashing
- Free Course on Vector Similarity Search and Faiss
- Comprehensive Guide To Approximate Nearest Neighbors Algorithms
- Introducing the hybrid index to enable keyword-aware semantic search
- Argilla Semantic Search
- Co:here's Multilingual Text Understanding Model
- Simplify Search with Multilingual Embedding Models
- fastText
- Universal Sentence Encoder
- SBERT
- ELECTRA
- LaBSE
- LASER
- Relevance AI - Vector Platform From Experimentation To Deployment
- Haystack
- Jina.AI
- pinecone
- SentEval Toolkit
- ranx
- BEIR: Benchmarking IR
- RELiC: Retrieving Evidence for Literary Claims Dataset
- matchzoo-py
- deep_text_matching
- Which Frame?
- lexica.art
- emoji semantic search
- PySerini
- BERTSerini
- BERTSimilarity
- milvus
- NeuroNLP++
- weaviate
- semantic-search-through-wikipedia-with-weaviate
- natural-language-youtube-search
- same.energy
- ann benchmarks
- scaNN
- REALM
- annoy
- pynndescent
- nsg
- FALCONN
- redis HNSW
- autofaiss
- DPR
- rank_BM25
- FlashRank
- nearPy
- vearch
- vespa
- pgANN
- Tensorflow Similarity
- opensemanticsearch.org
- GPT3 Semantic Search
- searchy
- txtai
- HyperTag
- vectorai
- embeddinghub
- AquilaDb
- STripNet
- Semantic Text Similarity Dataset Hub
- Facebook AI Image Similarity Challenge
- WIT: Wikipedia-based Image Text Dataset
- BEIR
- MTEB
Have a look at the project board for the task list to contribute to any of the open issues.