Logo made by @createdbytango.
Looking for More Paper Additions. PS: Raise a PR
This repository aims to serve as a meta-repository for Semantic Search and Semantic Similarity related tasks.
Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous use cases and applications of semantic search.
Feel free to raise a PR on this repo!
- Bag of Tricks for Efficient Text Classification 📄
- Enriching Word Vectors with Subword Information 📄
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- On Approximately Searching for Similar Word Embeddings
- Learning Distributed Representations of Sentences from Unlabelled Data📄
- Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data 📄
- Semantic Textual Similarity For Hindi📄
- Efficient Natural Language Response Suggestion for Smart Reply📃
- Universal Sentence Encoder 📄
- Learning Semantic Textual Similarity from Conversations 📄
- Google AI Blog: Advances in Semantic Textual Similarity 📄
- Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech 🔊
- Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data 🔊
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
- The Case for Learned Index Structures
- LASER: Language Agnostic Sentence Representations 📄
- Document Expansion by Query Prediction 📄
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks 📄
- Multi-Stage Document Ranking with BERT 📄
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- End-to-End Open-Domain Question Answering with BERTserini
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining📄
- Analyzing and Improving Representations with the Soft Nearest Neighbor Loss📷
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned 📄
- Passage Re-ranking with BERT 📄
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization 📄
- LaBSE: Language-agnostic BERT Sentence Embedding 📄
- Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset 📄
- DeText: A deep NLP framework for intelligent text understanding 📄
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation 📄
- Pretrained Transformers for Text Ranking: BERT and Beyond 📄
- REALM: Retrieval-Augmented Language Model Pre-Training
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators 📄
- Improving Deep Learning For Airbnb Search
- Managing Diversity in Airbnb Search📄
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval📄
- Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks📷
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations📄
- Hybrid approach for semantic similarity calculation between Tamil words 📄
- Augmented SBERT 📄
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models 📄
- Compatibility-aware Heterogeneous Visual Search 📷
- Learning Personal Style from Few Examples📷
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning📄
- A Survey of Transformers📄📷
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking📄
- High Quality Related Search Query Suggestions using Deep Reinforcement Learning
- Embedding-based Product Retrieval in Taobao Search📄📷
- TPRM: A Topic-based Personalized Ranking Model for Web Search📄
- mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset📄
- Database Reasoning Over Text📄
- How Does Adversarial Fine-Tuning Benefit BERT? 📄
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation📄
- Primer: Searching for Efficient Transformers for Language Modeling📄
- How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings🔊
- SimCSE: Simple Contrastive Learning of Sentence Embeddings📄
- Compositional Attention: Disentangling Search and Retrieval📄📷
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
- GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval 📄
- Generative Search Engines: Initial Experiments 📷
- Rethinking Search: Making Domain Experts out of Dilettantes
- WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach
- Text and Code Embeddings by Contrastive Pre-Training📄
- RELiC: Retrieving Evidence for Literary Claims📄
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations📄
- SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation🔊
- An Analysis of Fusion Functions for Hybrid Retrieval📄
- Out-of-distribution Detection with Deep Nearest Neighbors
- ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition🔊
- Analyzing Acoustic Word Embeddings From Pre-Trained Self-Supervised Speech Models 🔊
- Rethinking with Retrieval: Faithful Large Language Model Inference📄
- Precise Zero-Shot Dense Retrieval without Relevance Labels📄
- Transformer Memory as a Differentiable Search Index📄
- FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search📄
- “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors📄
- SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval 📄
- Tackling Semantic Search
- Semantic search in Azure Cognitive Search
- How we used semantic search to make our search 10x smarter
- Stanford AI Blog: Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
- Building a semantic search engine with dual space word embeddings
- Billion-scale semantic similarity search with FAISS+SBERT
- Some observations about similarity search thresholds
- Near Duplicate Image Search using Locality Sensitive Hashing
- Free Course on Vector Similarity Search and Faiss
- Comprehensive Guide To Approximate Nearest Neighbors Algorithms
- Introducing the hybrid index to enable keyword-aware semantic search
- Argilla Semantic Search
- Co:here's Multilingual Text Understanding Model
- Simplify Search with Multilingual Embedding Models
- fastText
- Universal Sentence Encoder
- SBERT
- ELECTRA
- LaBSE
- LASER
- Relevance AI - Vector Platform From Experimentation To Deployment
- Haystack
- Jina.AI
- pinecone
- SentEval Toolkit
- ranx
- BEIR: Benchmarking IR
- RELiC: Retrieving Evidence for Literary Claims Dataset
- matchzoo-py
- deep_text_matching
- Which Frame?
- lexica.art
- emoji semantic search
- PySerini
- BERTSerini
- BERTSimilarity
- milvus
- NeuroNLP++
- weaviate
- semantic-search-through-wikipedia-with-weaviate
- natural-language-youtube-search
- same.energy
- ann benchmarks
- scaNN
- REALM
- annoy
- pynndescent
- nsg
- FALCONN
- redis HNSW
- autofaiss
- DPR
- rank_BM25
- FlashRank
- nearPy
- vearch
- vespa
- PyNNDescent
- pgANN
- Tensorflow Similarity
- opensemanticsearch.org
- GPT3 Semantic Search
- searchy
- txtai
- HyperTag
- vectorai
- embeddinghub
- AquilaDb
- STripNet
- Semantic Text Similarity Dataset Hub
- Facebook AI Image Similarity Challenge
- WIT: Wikipedia-based Image Text Dataset
- BEIR
- MTEB
Have a look at the project board for the task list to contribute to any of the open issues.