JetBrains, the company behind IntelliJ IDEA and PyCharm, has released Mellum2, a Mixture-of-Experts (MoE) coding model with 12B parameters.
The model is available under the Apache 2.0 license and has been published on Hugging Face, so it can be downloaded and run locally.
A mixture-of-experts model uses a "gating network" (sometimes called a router) that sends each input to a small number of smaller sub-networks, the "experts", instead of running every part of the model. This gives "sparse activation": although the model has many parameters in total, only a subset of them is used for any given input, which reduces the computation needed per input.
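As a rough illustration of gating and sparse activation, here is a minimal pure-Python sketch of a generic top-k MoE layer. It is not Mellum2's architecture: the sizes (`D_MODEL`, `N_EXPERTS`, `TOP_K`), the use of a single linear map per expert, and the choice to apply softmax only over the selected experts' scores (one common variant) are all illustrative assumptions.

```python
import math
import random

# Illustrative sizes for this sketch only; they are not Mellum2's real settings.
D_MODEL = 4      # width of each input vector
N_EXPERTS = 4    # number of expert sub-networks
TOP_K = 2        # experts activated per input


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(v - top) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MoELayer:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Gating network: produces one score per expert for a given input.
        self.gate = random_matrix(N_EXPERTS, D_MODEL, rng)
        # Hypothetical experts: one linear map each (real experts are usually small MLPs).
        self.experts = [random_matrix(D_MODEL, D_MODEL, rng) for _ in range(N_EXPERTS)]

    def forward(self, x):
        scores = matvec(self.gate, x)
        # Keep only the TOP_K highest-scoring experts; the others are never run.
        chosen = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * D_MODEL
        for weight, i in zip(weights, chosen):
            y = matvec(self.experts[i], x)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out, chosen


layer = MoELayer(seed=42)
out, used = layer.forward([0.5, -1.0, 0.25, 2.0])
print("experts used:", used)
active = TOP_K * D_MODEL * D_MODEL
total = N_EXPERTS * D_MODEL * D_MODEL
print(f"expert parameters used: {active} of {total}")  # → expert parameters used: 32 of 64
```

Only `TOP_K` of the `N_EXPERTS` expert matrices are multiplied for a given input, which is where the saving from sparse activation comes from. Production MoE layers apply the same idea per token, with larger experts and batched tensor operations.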
Training an MoE model therefore means training both the gating network, so that it learns which experts to select for each input, and the experts themselves.
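When the gate and the experts are trained together, the gate can drift towards sending most inputs to a few experts, leaving the rest undertrained. The article does not say how Mellum2 handles this; one widely used remedy is an auxiliary load-balancing loss added to the training objective, of the kind introduced with the Switch Transformer. The sketch below computes that loss in plain Python assuming top-1 routing; the function name and the `alpha` default are illustrative choices.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(v - top) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, alpha=0.01):
    """Switch-Transformer-style auxiliary loss, assuming top-1 routing.

    router_logits: one list of expert scores per token.
    Returns alpha * N * sum_i(f_i * P_i), where f_i is the fraction of tokens
    whose top-scoring expert is i and P_i is the mean gate probability for i.
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    counts = [0] * n_experts
    prob_sums = [0.0] * n_experts
    for logits in router_logits:
        probs = softmax(logits)
        # Count the token towards its top-1 expert.
        counts[max(range(n_experts), key=lambda i: probs[i])] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / n_tokens for c in counts]
    p_mean = [s / n_tokens for s in prob_sums]
    return alpha * n_experts * sum(fi * pi for fi, pi in zip(f, p_mean))


# Four tokens, four experts: each token prefers a different expert...
balanced = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# ...versus every token preferring expert 0.
collapsed = [[1.0, 0.0, 0.0, 0.0]] * 4
print(round(load_balancing_loss(balanced, alpha=1.0), 3))   # → 1.0
print(round(load_balancing_loss(collapsed, alpha=1.0), 3))  # → 1.901
```

The loss is designed to be smallest when tokens are spread evenly across experts, so adding it to the objective nudges the gate towards using all of them. In a real training loop the term would be computed on tensors so that gradients flow back into the gating network through the mean probabilities `P_i`.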