Talks

Below are some of the talks I’ve delivered on Cohere’s Discord.

ML Efficiency

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale.

NLP

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.

Universal and Transferable Adversarial Attacks on Aligned Language Models.

Extending Context Window of Large Language Models via Positional Interpolation.