Sentence Embeddings
219 papers with code • 0 benchmarks • 11 datasets
Most implemented papers
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
However, BERT requires that both sentences are fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours).
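The figure of about 50 million follows from counting unordered sentence pairs, n(n - 1)/2. A small sketch of that arithmetic is below; the per-pair time is derived here from the quoted ~65 hours and is not a number stated in the excerpt, and `pair_count` is an illustrative helper name.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered pairs among n sentences: n * (n - 1) / 2."""
    return comb(n, 2)


n_sentences = 10_000
pairs = pair_count(n_sentences)
print(pairs)  # 49995000, i.e. about 50 million

# Average cost per pair implied by a ~65 hour total (derived, not quoted).
ms_per_pair = 65 * 3600 / pairs * 1000
print(f"{ms_per_pair:.2f} ms per pair")  # 4.68 ms per pair
```

Because the pair count grows quadratically, doubling the collection to 20,000 sentences roughly quadruples the number of forward passes.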
Universal Sentence Encoder
For both variants of the encoding model, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
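As a rough illustration of what a contrastive objective over sentence embeddings can look like, the sketch below computes a generic in-batch loss on toy vectors: each anchor should be closer to its own positive than to the other positives in the batch. The function names, the toy vectors, and the temperature value are assumptions made for illustration; this is not taken from the SimCSE code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy of picking positives[i] for anchors[i] within the batch.

    The temperature is an assumed hyperparameter for this sketch.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical values, two sentences of dimension 2).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive matches its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong anchor

loss_aligned = contrastive_loss(anchors, aligned)
loss_swapped = contrastive_loss(anchors, swapped)
print(loss_aligned < loss_swapped)  # True
```

The loss is small when matching pairs are the most similar items in the batch and grows when a non-matching pair scores higher.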
Evaluation of sentence embeddings in downstream and linguistic probing tasks
Despite the fast developmental pace of new sentence embedding methods, it is still challenging to find comprehensive evaluations of these different techniques.
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
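One way to read this idea as a training objective: a fixed teacher embeds the original sentence, and a student is penalized whenever its embedding of either the original or the translation drifts from that teacher vector. The sketch below uses plain lists in place of real models; the function names and all vector values are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_src, student_src, student_tgt):
    """Pull the student's embeddings of the original sentence and of its
    translation toward the teacher's embedding of the original."""
    return mse(student_src, teacher_src) + mse(student_tgt, teacher_src)


# Hypothetical 3-dimensional embeddings.
teacher_src = [0.2, -0.5, 0.9]   # teacher(original sentence)
student_src = [0.2, -0.5, 0.9]   # student(original sentence)
student_tgt = [0.2, -0.5, 0.9]   # student(translated sentence)

loss_matched = distillation_loss(teacher_src, student_src, student_tgt)
print(loss_matched)  # 0.0

# If the translation lands elsewhere in the vector space, the loss rises.
student_tgt_off = [0.5, -0.5, 0.9]
loss_off = distillation_loss(teacher_src, student_src, student_tgt_off)
print(loss_off > loss_matched)  # True
```

Minimizing such a loss over many translation pairs is what would push a translated sentence to the same location as its original.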
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Learning sentence embeddings often requires a large amount of labeled data.
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing.
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.