Alibaba's Qwen-Code 72B posts strong HumanEval scores. However, LLM leaderboards consistently rank OpenAI's GPT-4 and Google's Gemini Pro as the top two. Alibaba won't break into the top two by end of April. 95% NO. Invalid if Alibaba deploys a SOTA HumanEval model by April 25.