
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Botev Aleksandar, De Soham, Smith Samuel L, Fernando Anushan, Muraru George-cristian, Haroun Ruba, Berrada Leonard, Pascanu Razvan, Sessa Pier Giuseppe, Dadashi Robert, Hussenot Léonard, Ferret Johan, Girgin Sertan, Bachem Olivier, Andreev Alek, Kenealy Kathleen, Mesnard Thomas, Hardin Cassidy, Bhupatiraju Surya, Pathak Shreya, Sifre Laurent, Rivière Morgane, Kale Mihir Sanjay, Love Juliette, Tafti Pouya, Joulin Armand, Fiedel Noah, Senter Evan, Chen Yutian, Srinivasan Srivatsan, Desjardins Guillaume, Budden David, Doucet Arnaud, Vikram Sharad, Paszke Adam, Gale Trevor, Borgeaud Sebastian, Chen Charlie, Brock Andy, Paterson Antonia, Brennan Jenny, Risdal Meg, Gundluru Raj, Devanathan Nesh, Mooney Paul, Chauhan Nilay, Culliton Phil, Martins Luiz Gustavo, Bandy Elisa, Huntsperger David, Cameron Glenn, Zucker Arthur, Warkentin Tris, Peran Ludovic, Giang Minh, Ghahramani Zoubin, Farabet Clément, Kavukcuoglu Koray, Hassabis Demis, Hadsell Raia, Teh Yee Whye, De Freitas Nando. arXiv 2024

[Paper]    
Attention Mechanism Model Architecture Pretraining Methods Reinforcement Learning Transformer

We introduce RecurrentGemma, a family of open language models that uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language modelling. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two model sizes, containing 2B and 9B parameters, and release pre-trained and instruction-tuned variants of both. Our models achieve comparable performance to similarly sized Gemma baselines despite being trained on fewer tokens.
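The efficiency claim rests on the fixed-size recurrent state: unlike a transformer's key-value cache, which grows linearly with sequence length, a linear recurrence carries a constant-size state from token to token. The JAX sketch below is a minimal toy illustration of that idea using a diagonal (per-channel) linear recurrence; it is not the Griffin/RG-LRU block used in RecurrentGemma, and the function names, shapes, and decay parameterization are illustrative assumptions.

```python
# Toy sketch of a fixed-size recurrent state (not the official RecurrentGemma code).
import jax
import jax.numpy as jnp


def linear_recurrence(x, a):
    """Run h_t = a * h_{t-1} + (1 - a) * x_t over a sequence.

    x: (seq_len, dim) input activations.
    a: (dim,) per-channel decay in (0, 1).
    Returns the per-step outputs and the final state.
    """

    def step(h_prev, x_t):
        h_t = a * h_prev + (1.0 - a) * x_t
        return h_t, h_t  # carry the state forward, emit it as the output

    h0 = jnp.zeros(x.shape[-1])                      # state is one (dim,) vector,
    final_state, outputs = jax.lax.scan(step, h0, x)  # regardless of seq_len
    return outputs, final_state


key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 8))   # 16 tokens, 8 channels
a = jax.nn.sigmoid(jnp.ones(8))       # fixed per-channel decay
outputs, state = linear_recurrence(x, a)
print(outputs.shape, state.shape)     # (16, 8) (8,)
```

Whatever the sequence length, the carried state stays at shape `(dim,)`, which is why inference memory does not grow with context; Griffin pairs recurrences of this general kind with local attention, whose window-limited cache is likewise bounded.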
