Honestly, the reinforcement training doesn’t seem all that impressive. It’s clearly just a distillation of GPT-4; it seems like they just figured out a nice optimization. It doesn’t strike me as something the rest of the industry can’t sort out.