
Showing 1–21 of 21 results for author: Touvron, H

Searching in archive cs.
  1. arXiv:2405.15613

    cs.LG cs.AI cs.CV

    Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

    Authors: Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

    Abstract: Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the datas…

    Submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2308.12950

    cs.CL

    Code Llama: Open Foundation Models for Code

    Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, et al. (1 additional author not shown)

    Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama…

    Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  3. arXiv:2307.09288

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  4. arXiv:2302.13971

    cs.CL

    LLaMA: Open and Efficient Foundation Language Models

    Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

    Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is co…

    Submitted 27 February, 2023; originally announced February 2023.

  5. arXiv:2212.04884

    cs.CV

    Co-training $2^L$ Submodels for Visual Recognition

    Authors: Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

    Abstract: We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the reg…

    Submitted 9 December, 2022; originally announced December 2022.

  6. arXiv:2204.07118

    cs.CV

    DeiT III: Revenge of the ViT

    Authors: Hugo Touvron, Matthieu Cord, Hervé Jégou

    Abstract: A Vision Transformer (ViT) is a simple neural architecture amenable to serving several computer vision tasks. It has limited built-in architectural priors, in contrast to more recent architectures that incorporate priors either about the input data or of specific tasks. Recent works show that ViTs benefit from self-supervised pre-training, in particular BERT-like pre-training such as BEiT. In this pape…

    Submitted 14 April, 2022; originally announced April 2022.

  7. arXiv:2203.09795

    cs.CV

    Three things everyone should know about Vision Transformers

    Authors: Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

    Abstract: After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and video analysis. We offer three insights based on simple and easy-to-implement variants of vision transformers. (1) The residual layers of vision transformers, wh…

    Submitted 18 March, 2022; originally announced March 2022.

  8. arXiv:2112.13692

    cs.CV

    Augmenting Convolutional networks with attention-based aggregation

    Authors: Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou

    Abstract: We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling by an attention-based aggregation layer akin to a single transformer block, that weights how the patches are involved in the classification decision. We plug this learned aggregation layer with a simplistic patch-based convolutional network parame…

    Submitted 27 December, 2021; originally announced December 2021.

  9. arXiv:2112.10740

    cs.CV

    Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

    Authors: Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jégou, Edouard Grave

    Abstract: Pre-training models on large-scale datasets, like ImageNet, is a standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that only leverages the target task data. We consider datasets, like Stanford Cars, Sketch or COCO, which are…

    Submitted 20 December, 2021; originally announced December 2021.

  10. arXiv:2110.00476

    cs.CV cs.LG

    ResNet strikes back: An improved training procedure in timm

    Authors: Ross Wightman, Hugo Touvron, Hervé Jégou

    Abstract: The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimi…

    Submitted 1 October, 2021; originally announced October 2021.

  11. arXiv:2106.09681

    cs.CV cs.LG

    XCiT: Cross-Covariance Image Transformers

    Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

    Abstract: Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e., words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic comple…

    Submitted 18 June, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2105.03404

    cs.CV

    ResMLP: Feedforward networks for image classification with data-efficient training

    Authors: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

    Abstract: We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using hea…

    Submitted 10 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  13. arXiv:2104.14294

    cs.CV

    Emerging Properties in Self-Supervised Vision Transformers

    Authors: Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

    Abstract: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentatio…

    Submitted 24 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: 21 pages

  14. arXiv:2104.01136

    cs.CV

    LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

    Authors: Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

    Abstract: We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular…

    Submitted 6 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  15. arXiv:2103.17239

    cs.CV

    Going deeper with Image Transformers

    Authors: Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

    Abstract: Transformers have been recently adapted for large-scale image classification, achieving high scores and shaking up the long-standing supremacy of convolutional neural networks. However, the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and opt…

    Submitted 7 April, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

  16. arXiv:2103.10697

    cs.CV cs.LG stat.ML

    ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

    Authors: Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

    Abstract: Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external…

    Submitted 10 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

  17. arXiv:2012.12877

    cs.CV

    Training data-efficient image transformers & distillation through attention

    Authors: Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train them o…

    Submitted 15 January, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  18. arXiv:2011.12982

    cs.CV

    Grafit: Learning fine-grained image representations with coarse labels

    Authors: Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou

    Abstract: This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlyi…

    Submitted 25 November, 2020; originally announced November 2020.

  19. arXiv:2008.05763

    cs.CV eess.IV

    Powers of layers for image-to-image translation

    Authors: Hugo Touvron, Matthijs Douze, Matthieu Cord, Hervé Jégou

    Abstract: We propose a simple architecture to address unpaired image-to-image translation tasks: style or class transfer, denoising, deblurring, deblocking, etc. We start from an image autoencoder architecture with fixed weights. For each task we learn a residual block operating in the latent space, which is iteratively called until the target domain is reached. A specific training schedule is required to a…

    Submitted 13 August, 2020; originally announced August 2020.

  20. arXiv:2003.08237

    cs.CV cs.LG

    Fixing the train-test resolution discrepancy: FixEfficientNet

    Authors: Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

    Abstract: This paper provides an extensive analysis of the performance of the EfficientNet image classifiers with several recent training procedures, in particular one that corrects the discrepancy between train and test images. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained…

    Submitted 18 November, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  21. arXiv:1906.06423

    cs.CV cs.LG

    Fixing the train-test resolution discrepancy

    Authors: Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

    Abstract: Data augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We t…

    Submitted 20 January, 2022; v1 submitted 14 June, 2019; originally announced June 2019.
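The train/test object-size discrepancy described in the "Fixing the train-test resolution discrepancy" abstract can be illustrated with a back-of-the-envelope model. This is a simplified sketch under stated assumptions, not the paper's own analysis: it assumes a square image of side W, training crops whose area fraction u = σ² is uniform on [0.08, 1] (the default `scale` range of torchvision's `RandomResizedCrop`, with aspect-ratio jitter ignored), and a test pipeline that resizes the image to 256 pixels before a 224-pixel center crop. A training crop of side σW resized to K_train pixels magnifies objects by K_train/(σW), while the test resize magnifies them by K_test_resize/W, so on average objects look larger at train time by E[1/σ]·K_train/K_test_resize. All function names below are hypothetical.

```python
import math
import random


def mean_inverse_side_scale(min_area: float = 0.08) -> float:
    """Closed-form E[1/sigma] when the crop-area fraction u = sigma**2 is
    uniform on [min_area, 1] (aspect-ratio jitter ignored; an assumption).

    E[u**-0.5] = 2 * (1 - sqrt(a)) / (1 - a), which simplifies to 2 / (1 + sqrt(a)).
    """
    return 2.0 / (1.0 + math.sqrt(min_area))


def mean_inverse_side_scale_mc(min_area: float = 0.08, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.uniform(min_area, 1.0)  # fraction of the image area kept by the crop
        total += 1.0 / math.sqrt(u)     # 1 / sigma, where sigma is the side scale
    return total / n


def apparent_size_ratio(k_train: int, k_test_resize: int, min_area: float = 0.08) -> float:
    """Mean apparent object size at train time divided by that at test time.

    Train: a crop of side sigma*W is resized to k_train, magnifying by k_train / (sigma*W).
    Test: the whole image is resized to side k_test_resize (then center-cropped),
    magnifying by k_test_resize / W. The image side W cancels out.
    """
    return mean_inverse_side_scale(min_area) * k_train / k_test_resize


def matching_train_resolution(k_test_resize: int, min_area: float = 0.08) -> float:
    """Train resolution at which mean apparent sizes agree at train and test time."""
    return k_test_resize / mean_inverse_side_scale(min_area)


if __name__ == "__main__":
    print(f"E[1/sigma], closed form: {mean_inverse_side_scale():.4f}")
    print(f"E[1/sigma], Monte Carlo: {mean_inverse_side_scale_mc():.4f}")
    # Assumed common ImageNet setup: train crops at 224, test resize 256 + center crop 224.
    print(f"train/test apparent size ratio: {apparent_size_ratio(224, 256):.3f}")
    print(f"train resolution matching a 256 test resize: {matching_train_resolution(256):.1f}")
```

Under this toy model the ratio exceeds 1 for the 224/256 setting (objects look roughly a third larger during training), and equalizing apparent sizes for a fixed test resize calls for a lower train resolution, which agrees in direction with the abstract's claim. The figures the paper itself reports may differ, since its analysis accounts for details this sketch ignores.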