Tuesday, 2 June 2026

JetBrains Releases Mellum2

JetBrains (the home of IntelliJ and PyCharm) has released Mellum2, a 12B-parameter Mixture-of-Experts coding model.

The model is available under the Apache 2.0 license. 

The model has been published on Hugging Face and can be run locally.

A mixture-of-experts (MoE) model works via a "gating network" which routes each input to a few of its smaller neural networks, the "experts", rather than running the whole model every time. This leads to "sparse activation", which means that of all the parameters in the model, only a subset is used for any given input.
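To make the routing idea concrete, here is a minimal sketch of a top-k MoE forward pass in plain Python. It is a toy and not how Mellum2 itself is implemented: the number of experts, the vector size, the choice of `top_k=2`, the random weights, and the names `make_expert` and `moe_forward` are all assumptions made for illustration.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(rng, dim):
    # Hypothetical "expert": a single linear layer with random weights.
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return expert

def moe_forward(x, gate_weights, experts, top_k=2):
    # Gating network (assumed to be one linear layer): one score per expert.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Sparse activation: keep only the top_k experts; the rest never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    probs = softmax([scores[i] for i in chosen])
    # Combine the chosen experts' outputs, weighted by the gate probabilities.
    output = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)
        output = [o + p * yi for o, yi in zip(output, y)]
    return output, chosen

rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8  # illustrative sizes only
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

output, chosen = moe_forward([1.0, 0.5, -0.5, 2.0], gate_weights, experts, top_k=2)
print(f"experts used: {chosen} ({len(chosen)} of {NUM_EXPERTS})")
```

Only two of the eight experts do any work for this input, so most of the toy model's parameters sit idle on each call. That is the sparse activation described above.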

Training an MoE model means training both the gating network and the individual "experts".
