Large Model Capability Rankings

Last updated: 2026-06-10

Model Overall Expert Hard-Prompts Coding Math Creative-Writing Instruction-Following Long-Text
claude-opus-4-6-thinking 1 1 1 1 2 1 1 1
claude-opus-4-7-thinking 2 6 3 2 6 2 2 3
claude-opus-4-6 3 2 2 5 4 6 3 2
claude-opus-4-7 4 3 4 3 8 4 5 4
muse-spark 5 33 8 9 27 10 19 25
gemini-3.1-pro-preview 6 8 7 10 5 5 7 7
gemini-3-pro 7 16 9 17 19 3 13 12
claude-opus-4-8-thinking 8 18 5 4 16 7 4 5
gpt-5.5-high 9 4 12 14 9 19 8 14
gpt-5.4-high 10 5 11 15 3 26 10 16
claude-opus-4-8 11 7 6 6 11 17 11 6
gemini-3.5-flash 12 9 22 34 1 9 23 21
gpt-5.2-chat-latest-20260210 13 27 16 18 38 41 28 29
glm-5.1 14 12 13 8 18 13 16 13
grok-4.20-beta1 15 49 29 32 40 12 36 39
gpt-5.5 16 15 19 40 10 21 15 23
qwen3.7-max-preview 17 11 17 11 7 31 18 10
gemini-3-flash 18 24 20 31 20 14 26 24
grok-4.20-beta-0309-reasoning 19 43 23 25 24 25 37 38
claude-opus-4-5-20251101-thinking-32k 20 13 14 7 23 8 6 9
grok-4.20-multi-agent-beta-0309 21 32 30 19 41 27 40 41
gpt-5.5-instant 22 45 24 26 30 15 30 30
claude-sonnet-4-6 23 14 10 12 31 20 9 8
ernie-5.1 24 40 25 20 17 35 24 34
claude-opus-4-5-20251101 25 17 15 13 29 11 12 11
grok-4.1-thinking 26 53 37 44 47 45 56 54
gpt-5.4 27 22 26 27 34 38 22 22
qwen3.5-max-preview 28 21 21 24 26 24 17 20
mimo-v2.5-pro 29 10 18 21 14 34 14 15
kimi-k2.6 30 20 27 29 13 39 27 26
gemini-3-flash (thinking-minimal) 31 55 39 51 35 22 42 40
qwen3.6-max-preview 32 23 32 33 15 37 35 31
grok-4.1 33 68 43 53 66 40 57 52
deepseek-v4-pro-thinking 34 46 35 48 21 30 33 27
glm-5 35 30 36 47 56 18 39 33
deepseek-v4-pro 36 44 40 39 59 33 32 32
dola-seed-2.0-pro 37 39 33 28 42 67 51 51
claude-sonnet-4-5-20250929-thinking-32k 38 19 28 16 37 23 20 17
claude-sonnet-4-5-20250929 39 29 31 22 74 16 21 19
gpt-5.1-high 40 38 42 54 36 43 38 42
gemma-4-31b 41 37 44 45 25 50 31 36
kimi-k2.5-thinking 42 35 46 37 22 49 44 43
minimax-m3 43 28 49 30 12 62 34 45
ernie-5.0-preview-1203 44 72 52 77 104 51 68 78
gpt-5.4-mini-high 45 31 45 42 51 71 49 53
mimo-v2-pro 46 25 41 38 39 46 41 35
claude-opus-4-1-20250805-thinking-16k 47 36 34 23 48 28 25 18
gpt-5.3-chat-latest 48 52 47 46 77 65 53 47
ernie-5.0-0110 49 81 51 52 60 47 64 73
claude-opus-4-1-20250805 50 58 38 35 65 32 29 28
grok-4.3 51 82 61 56 80 42 71 59
gemini-2.5-pro 52 60 62 86 50 29 45 44
gpt-4.5-preview-2025-02-27 53 102 86 95 102 36 48 66
qwen3.6-plus 54 47 48 50 33 64 47 48
qwen3.5-397b-a17b 55 34 50 49 44 55 52 46
chatgpt-4o-latest-20250326 56 101 65 80 110 48 60 72
glm-4.7 57 79 53 60 70 63 59 49
gpt-5.1 58 66 67 74 79 57 61 65
gemma-4-26b-a4b 59 48 55 64 28 69 46 55
gpt-5.2-high 60 42 58 55 32 92 62 79
deepseek-v4-flash-thinking 61 54 64 70 54 61 54 56
gpt-5.2 62 56 57 61 63 86 65 69
qwen3-max-preview 63 51 59 63 57 80 63 64
longcat-flash-chat-2602-exp 64 57 60 41 62 89 80 77
deepseek-v4-flash 65 62 63 65 58 60 58 57
gpt-5-high 66 65 78 82 64 110 89 112
gemini-3.1-flash-lite-preview 67 77 81 100 55 54 87 86
mimo-v2.5 68 41 54 57 52 74 55 50
kimi-k2.5-instant 69 67 56 36 49 87 50 61
o3-2025-04-16 70 85 88 97 45 99 100 116
grok-4-1-fast-reasoning 71 87 84 91 86 53 104 95
kimi-k2-thinking-turbo 72 61 69 59 53 83 73 82
amazon-nova-experimental-chat-26-02-10 73 26 70 58 71 145 72 88
mimo-v2-omni 74 63 68 62 43 98 81 60
gpt-5-chat 75 75 71 89 94 84 74 75
glm-4.6 76 86 83 94 82 68 79 80
mistral-medium-3.5 77 90 82 78 46 91 78 84
deepseek-v3.2-exp-thinking 78 76 77 72 72 78 76 85
deepseek-v3.2 79 74 76 79 69 72 66 71
claude-opus-4-20250514-thinking-16k 80 80 66 43 83 44 43 37
qwen3-max-2025-09-23 81 113 73 71 68 79 82 81
qwen3-235b-a22b-instruct-2507 82 71 72 75 84 105 77 76
deepseek-v3.2-exp 83 105 74 84 87 56 75 68
deepseek-v3.2-thinking 84 69 80 69 76 85 69 67
deepseek-r1-0528 85 111 94 87 124 81 114 118
grok-4-fast-chat 86 107 95 96 78 94 106 98
ernie-5.0-preview-1022 87 92 105 132 118 59 105 100
nvidia-nemotron-3-ultra-550b-a55b-nvfp4 88 64 98 108 95 108 90
kimi-k2-0905-preview 89 116 92 81 89 101 115 123
deepseek-v3.1 90 103 97 112 91 90 97 99
deepseek-v3.1-terminus-thinking 91 79 90 100 96 67 63
kimi-k2-0711-preview 92 120 101 93 134 115 135 132
deepseek-v3.1-thinking 93 99 91 102 92 66 70 62
qwen3.5-122b-a10b 94 83 99 99 81 118 92 106
amazon-nova-experimental-chat-26-01-10 95 50 75 66 101 131 98 96
deepseek-v3.1-terminus 96 109 123 126 58 112 102
qwen3-vl-235b-a22b-instruct 97 73 87 85 99 123 84 93
mistral-large-3 98 112 100 83 115 109 96 107
minimax-m2.7 99 84 89 68 96 126 90 83
hunyuan-hy3-preview 100 78 85 92 75 122 95 89

Data source: LMSYS Chatbot Arena (lmarena.ai) © Open-source research project by LMSYS Org.