Visit my GitHub for more. The following are some selected samples.
Paper Implementations
Switch Transformers.
An efficient PyTorch implementation of the Switch Transformer, with an optional auxiliary loss for each layer and a configurable number of experts and expert capacity.
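To give a feel for what "expert capacity" and the auxiliary loss mean, here is a minimal plain-Python sketch of top-1 (switch) routing. It is not the repository's code: the function names `expert_capacity`, `switch_route`, and `aux_loss` are hypothetical, rounding the capacity up with `ceil` is an assumption, and `alpha=0.01` follows the coefficient reported in the Switch Transformer paper.

```python
import math

def expert_capacity(num_tokens, num_experts, capacity_factor=1.0):
    """Tokens each expert may accept: (tokens / experts) * capacity factor.
    Rounding up is an assumption of this sketch."""
    return math.ceil(num_tokens / num_experts * capacity_factor)

def _top1(probs):
    """Index of the highest router probability for one token."""
    return max(range(len(probs)), key=probs.__getitem__)

def switch_route(router_probs, capacity):
    """Send each token to its top-1 expert until that expert is full.
    Overflowing tokens get None (they would bypass the expert layer)."""
    num_experts = len(router_probs[0])
    load = [0] * num_experts
    assignments = []
    for probs in router_probs:
        expert = _top1(probs)
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append(expert)
        else:
            assignments.append(None)
    return assignments

def aux_loss(router_probs, alpha=0.01):
    """Load-balancing loss: alpha * N * sum_i f_i * P_i, where f_i is the
    fraction of tokens whose top-1 choice is expert i and P_i is the mean
    router probability assigned to expert i."""
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    counts = [0] * num_experts
    for probs in router_probs:
        counts[_top1(probs)] += 1
    f = [c / num_tokens for c in counts]
    p = [sum(probs[i] for probs in router_probs) / num_tokens
         for i in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Four tokens, two experts; each row is one token's router probabilities.
probs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.9, 0.1]]
cap = expert_capacity(num_tokens=4, num_experts=2, capacity_factor=1.0)
routes = switch_route(probs, cap)
loss = aux_loss(probs)
```

In this toy batch three tokens prefer the first expert, so with a capacity factor of 1.0 one of them overflows. Raising the capacity factor trades extra compute for fewer dropped tokens, and the auxiliary loss pushes the router toward a more even split.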