> this would be very easy for architectural experiments; one could just copy the architecture and hparams…

is bro just ignoring that GLM-5 is obviously derived from V3(.2) and K2? We can trace the evolution of Chinese LLMs. Yeah I don't know what OAI experiments buy them
some quick thoughts on knowledge transfer between labs
1) artificial analysis has found that glm-5.2 is the third-best model available today on their professional knowledge work benchmark (ahead of gpt-5.5)
2) it's interesting to consider how easy it would be for zhipu to catch up to an american lab if it had knowledge of its experimental results
3) epoch ai estimated that 10% of openai's 2024 r&d compute spend went to final training runs and the remaining 90% went to experiments
4) we don't know the breakdown of this experimental compute, but we can perhaps think about it as some mix of data pipeline experiments and architecture experiments
5) one important question is how transferable each of these is; like, if you had knowledge of the final experimental results, how easily could you copy the final model?
6) it feels like this would be very easy for architectural experiments; one could just copy the architecture and hparams wholesale with little care for the underlying experiments
7) data experiments feel like they could be a bit more challenging though, since you will not have the exact same data that the lab you are copying has
8) so, even though you know the pipeline, you don't actually have the data that the pipeline is built around and so might not be able to replicate the result
9) but this feels like it depends on how important the data itself is versus how important the selection, filtering and generation concepts are
10) evidence on this goes both ways: labs spend a lot of money on data, but anthropic researchers are very synthetic-data and synthetic-environment pilled
11) it's worth noting that this same value logic would apply to researchers being traded between US labs
12) so like, noam shazeer may be worth more to openai than someone who primarily does data experiments, because architecture is more transferable
13) note, obviously, distillation is another way to catch up using someone else's model, and that may solve the data-mix transfer problem
14) and, the ultimate answer is the aligned automated ai researcher, which will never leak your secrets, and which enables you to hold ip your employees can't take
