to my liking, a lot of people come to me with new pre-training architectures that they want to extract more perf from, so i get to see new ideas and suggest kernel-level / co-design tricks. back around the gpt-4 era, people were going crazy about "the next transformer". now i think we are entering a time where new architectures that scale insanely well will show up at random, and they will be easier to co-design with because vibe coding enables arch/hw co-design. the part that amuses me is that it's based on human creativity prompting a very smart model, which is not what some people expected back then. this is such a golden age to live in as a curious young person.