Sentence Embeddings

219 papers with code • 0 benchmarks • 11 datasets

Benchmarks

These leaderboards are used to track progress in Sentence Embeddings.

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find Sentence Embeddings models and implementations

facebookresearch/InferSent • 4 papers • 2,279 stars

facebookresearch/SentEval • 4 papers • 2,049 stars

UKPLab/sentence-transformers • 3 papers • 13,775 stars

facebookresearch/LASER • 3 papers • 3,520 stars

See all 8 libraries.

Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers • IJCNLP 2019

However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
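
The figure of about 50 million follows from counting unordered sentence pairs, n(n-1)/2, since each pair needs its own joint forward pass. The sketch below reproduces that arithmetic and contrasts it with encoding every sentence once and comparing vectors afterwards, which is the general idea behind a siamese setup. The function names are illustrative assumptions and do not come from the paper or any library.

```python
def cross_encoder_passes(n: int) -> int:
    """Unordered sentence pairs; each pair needs one joint forward pass."""
    return n * (n - 1) // 2


def bi_encoder_passes(n: int) -> int:
    """Each sentence is encoded once; pairs are compared by vector similarity."""
    return n


n = 10_000
pair_passes = cross_encoder_passes(n)    # 49,995,000, i.e. about 50 million
single_passes = bi_encoder_passes(n)     # 10,000
print(f"pairwise passes: {pair_passes:,}; single-sentence passes: {single_passes:,}")
```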

Universal Sentence Encoder

facebookresearch/InferSent • 29 Mar 2018

For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE • EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.

Evaluation of sentence embeddings in downstream and linguistic probing tasks

allenai/bilm-tf • 16 Jun 2018

Despite the fast developmental pace of new sentence embedding methods, it is still challenging to find comprehensive evaluations of these different techniques.

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

facebookresearch/LASER • TACL 2019

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

UKPLab/sentence-transformers • EMNLP 2020

The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
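
One way to read this idea as a training objective: a student model is pulled toward a fixed teacher's embedding of the original sentence, both for the original sentence and for its translation. The sketch below assumes mean squared error as the distance and uses random arrays in place of real encoder outputs. The function name and array shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def distillation_loss(teacher_src, student_src, student_tgt):
    """Squared distance of both student outputs from the teacher's source embedding.

    teacher_src: teacher embedding of the original sentences (batch, dim)
    student_src: student embedding of the same original sentences
    student_tgt: student embedding of their translations
    """
    src_term = np.mean((teacher_src - student_src) ** 2)
    tgt_term = np.mean((teacher_src - student_tgt) ** 2)
    return float(src_term + tgt_term)


rng = np.random.default_rng(0)
teacher_src = rng.normal(size=(4, 8))    # stand-in for real teacher outputs

# A student that already matches the teacher on both inputs has zero loss.
perfect = distillation_loss(teacher_src, teacher_src.copy(), teacher_src.copy())

# A student that drifts away from the teacher is penalized.
noisy = distillation_loss(teacher_src, teacher_src + 0.1, teacher_src - 0.2)
print(perfect, noisy)
```

Minimizing a loss of this shape pushes the translation and the original toward the same point in the vector space, which is the property the sentence above describes.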

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

facebookresearch/LASER • ACL 2019

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.

TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

UKPLab/sentence-transformers • 14 Apr 2021

Learning sentence embeddings often requires a large amount of labeled data.

What you can cram into a single vector: Probing sentence embeddings for linguistic properties

facebookresearch/SentEval • 3 May 2018

Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing.

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

facebookresearch/LASER • EACL 2021

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.
