Below are some of the talks that I’ve delivered on Cohere’s Discord.

ML Efficiency

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. Talk, Slides
SpQR: A Sparse-Quantized Representation for Near-lossless LLM Weight Compression. Talk, Slides
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. Talk, Slides

NLP

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Talk, Slides
Universal and Transferable Adversarial Attacks on Aligned Language Models. Talk, Slides
Extending Context Window of Large Language Models via Positional Interpolation. Talk, Slides