# Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

```bibtex
@article{Chen2021SimplerFS,
  title={Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE},
  author={Junya Chen and Zhe Gan and Xuan Li and Qing Guo and Liqun Chen and Shuyang Gao and Tagyoung Chung and Yi Xu and Belinda Zeng and Wenlian Lu and Fan Li and Lawrence Carin and Chenyang Tao},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.01152}
}
```

InfoNCE-based contrastive representation learners, such as SimCLR [1], have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a simple yet non-trivial contrastive objective named FlatNCE, which fixes…
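For context on the log-K curse mentioned in the abstract, the sketch below shows the standard InfoNCE objective in NumPy (a minimal illustration of the usual formulation, not the paper's FlatNCE code; the function name and setup are illustrative assumptions). The loss is a cross-entropy over a K×K similarity matrix of positive pairs, and the mutual-information estimate it implies, log K minus the loss, can never exceed log K — so a small batch size K caps the bound regardless of how informative the representations are.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for K positive pairs; z1, z2 are (K, d) embeddings."""
    logits = z1 @ z2.T / temperature                 # (K, K) similarity matrix
    # Row-wise log-softmax (numerically stabilized); positives on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
K, d = 8, 16
z1 = rng.standard_normal((K, d))
z1 /= np.linalg.norm(z1, axis=1, keepdims=True)      # L2-normalize rows
z2 = rng.standard_normal((K, d))
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)

loss = info_nce(z1, z2)
# The implied MI lower bound saturates at log K ("log-K curse"):
mi_bound = np.log(K) - loss
```

Since the cross-entropy term is always nonnegative, `mi_bound` is at most log K ≈ 2.08 for K = 8 here, which is why effective InfoNCE training typically demands very large batches.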

#### References

Showing 1–10 of 64 references.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

- Computer Science
- ArXiv
- 2021

This paper describes an unsupervised approach that takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. It shows that contrastive learning theoretically regularizes pretrained embeddings' anisotropic space to be more uniform, and that it better aligns positive pairs when supervised signals are available.

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

- Computer Science, Mathematics
- ICML
- 2019

This framework allows the authors to show provable guarantees on the performance of the learned representations on average classification tasks composed of subsets of the same set of latent classes, and shows that learned representations can reduce (labeled) sample complexity on downstream tasks.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

- Computer Science
- ArXiv
- 2017

This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but that when these are addressed the trained networks exhibit good generalization, enabling training of visual recognition models on internet-scale data with high efficiency.

Learning word embeddings efficiently with noise-contrastive estimation

- Computer Science
- NIPS
- 2013

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best previously reported, using four times less data and more than an order of magnitude less computing time.

Contrastive Representation Learning: A Framework and Review

- Computer Science, Mathematics
- IEEE Access
- 2020

A general contrastive representation learning framework is proposed that simplifies and unifies many different contrastive learning methods, along with a taxonomy for each of its components, in order to summarise contrastive learning and distinguish it from other forms of machine learning.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

- Computer Science, Mathematics
- ICLR
- 2017

This work investigates the cause of this generalization drop in the large-batch regime and presents numerical evidence supporting the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions; as is well known, sharp minima lead to poorer generalization.

Importance Weighted Autoencoders

- Computer Science, Mathematics
- ICLR
- 2016

The importance weighted autoencoder (IWAE) is a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. The paper shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.

Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

- Computer Science, Mathematics
- EMNLP
- 2018

It is shown that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method; the latter is closely related to negative sampling methods, now widely used in NLP.

Representation Learning with Contrastive Predictive Coding

- Computer Science, Mathematics
- ArXiv
- 2018

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text, and reinforcement learning in 3D environments.

Wasserstein Dependency Measure for Representation Learning

- Computer Science, Mathematics
- NeurIPS
- 2019

It is empirically demonstrated that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks. A practical approximation to the theoretically motivated solution, constructed using Lipschitz constraint techniques from the GAN literature, achieves substantially improved results on tasks where incomplete representations are a major challenge.