WideNet.
WideNet in PyTorch.
Switch Transformers.
Switch Transformer in PyTorch with an optional auxiliary loss for each layer, a configurable number of experts and expert capacity, and support for auxiliary-loss-free load balancing.
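As a rough illustration of what the per-layer auxiliary loss and expert capacity refer to, here is a minimal sketch of top-1 (switch) routing. It is written in plain Python rather than PyTorch so the arithmetic stays visible, and it is not this repository's code: the names `switch_route`, `expert_capacity`, `capacity_factor`, and `aux_loss_coef` are hypothetical, and rounding the capacity up with `ceil` is an assumption. The loss follows the usual Switch Transformer form, `coef * num_experts * sum_i(f_i * P_i)`, where `f_i` is the fraction of tokens whose top choice is expert `i` and `P_i` is the mean router probability for expert `i`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_capacity(num_tokens, num_experts, capacity_factor=1.0):
    """Maximum tokens each expert may process (rounding up is an assumption)."""
    return math.ceil(num_tokens / num_experts * capacity_factor)


def switch_route(router_logits, capacity_factor=1.0, aux_loss_coef=0.01):
    """Top-1 routing with a capacity limit and a load-balancing auxiliary loss.

    router_logits: one list of per-expert logits for each token.
    Returns (assignments, aux_loss); a dropped token is assigned None.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    capacity = expert_capacity(num_tokens, num_experts, capacity_factor)

    probs = [softmax(row) for row in router_logits]
    counts = [0] * num_experts  # tokens whose top choice is each expert
    load = [0] * num_experts    # tokens actually accepted by each expert
    assignments = []
    for p in probs:
        expert = max(range(num_experts), key=p.__getitem__)  # top-1 choice
        counts[expert] += 1
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append(expert)
        else:
            assignments.append(None)  # over capacity: token is dropped

    # f_i: fraction of tokens routed to expert i (before dropping).
    f = [c / num_tokens for c in counts]
    # P_i: mean router probability assigned to expert i.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    aux_loss = aux_loss_coef * num_experts * sum(fi * pi for fi, pi in zip(f, P))
    return assignments, aux_loss


if __name__ == "__main__":
    # Four tokens, two experts, every token strongly preferring expert 0.
    logits = [[5.0, 0.0]] * 4
    assignments, aux = switch_route(logits, capacity_factor=1.0)
    print(assignments, aux)
```

With a capacity factor of 1.0, four tokens, and two experts, each expert accepts at most two tokens, so the last two tokens in the example are dropped. The auxiliary loss is smallest (equal to `aux_loss_coef`) when routing is uniform and grows as the router concentrates tokens on fewer experts.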