Languages have a ton of overlap, so a language model trained on all repos would in principle be stronger than one trained on a small subset of languages.