We gave 4 frontier coding agents the same hard environment-gen test:
Preserve the desert landscape. Surgically remove only the ruins. Then build a dense, believable Middle Eastern village from scratch using rustic assets.
Which model did the best job? 🏜️👇
Poll and full prompt in thread.
