Selected Publications

CWM: An Open-Weights LLM for Research on Code Generation with World Models

CWM Team (Core Team, Infra TL)

Efficient Hardware Scaling and Diminishing Returns in Large-Scale Training of Language Models

Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

C. Zhou, L. Yu, A. Babu, K. Tirumala, M. Yasunaga, L. Shamis, J. Kahn, X. Ma, L. Zettlemoyer, O. Levy

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Chameleon Team (Scaling Lead)

Flashlight: Enabling Innovation in Tools for Machine Learning

J. Kahn, V. Pratap, T. Likhomanenko, Q. Xu, A. Hannun, J. Cai, P. Tomasello, A. Lee, E. Grave, G. Avidov, B. Steiner, V. Liptchinsky, G. Synnaeve, R. Collobert

slimIPL: Language-Model-Free Iterative Pseudo-Labeling

Tatiana Likhomanenko*, Qiantong Xu*, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

Libri-Light: A Benchmark for ASR with Limited or No Supervision

J. Kahn*, M. Rivière*, W. Zheng*, E. Kharitonov*, Q. Xu*, P.E. Mazaré*, J. Karadayi*, V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, E. Dupoux

Self-Training for End-to-End Speech Recognition

Jacob Kahn, Ann Lee, Awni Hannun

Differentiable Weighted Finite-State Transducers

Awni Hannun, Vineel Pratap, Jacob Kahn, Wei-Ning Hsu

End-to-End ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

Gabriel Synnaeve*, Qiantong Xu*, Jacob Kahn*, Edouard Grave*, Tatiana Likhomanenko, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert*

wav2letter++: A Fast Open-source Speech Recognition System

Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert