BLISS Speaker Series Episode #17

BLISS is excited to feature Timothée Darcet, PhD student at Meta AI and Inria, who will deliver a 45-minute talk titled "CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling." After the talk, attendees are invited to connect with fellow AI enthusiasts, exchange ideas and questions, and enjoy complimentary drinks. Please note that doors will close promptly at 7:15 PM, so arriving early is highly encouraged.

An RSVP on Meetup is required to guarantee entry. Meetup has recently been promoting its Plus program, but you do not need to purchase it: both the platform and all BLISS events remain completely free.

Abstract:
Masked Image Modeling (MIM) offers a promising approach to self-supervised representation learning, yet existing MIM models continue to trail state-of-the-art methods. In this talk, Timothée Darcet will present CAPI, a novel pure-MIM framework that systematically rethinks target representations, loss functions, and architectures. CAPI relies on the prediction of latent clusterings, using a clustering-based loss that is stable to train and shows promising scaling behavior. With a ViT-L backbone, CAPI achieves 83.8% accuracy on ImageNet and 32.1% mIoU on ADE20K with simple linear probes, substantially outperforming prior MIM approaches and approaching the performance of the current state of the art, DINOv2.

TU Berlin

Straße des 17. Juni 135, 10623 Berlin

Register here