Hugh Zhang outlines GPT-2's architectural constraints, noting it is 1,000 times smaller than DeepSeek V4 and severely undertrained · Digg