
Zhipu founder Tang Jie: native multimodal model will launch within months

2026-05-14 21:09:33　Source: AI前線 (AI Frontline), Beijing

Editor | 四月 (Siyue)

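When will Zhipu's native multimodal model actually arrive?

Recently, someone put that question directly to Tang Jie, Zhipu's founder and a professor at Tsinghua University, on X. At a public event in January this year, Tang had said that getting large models to perceive visual, auditory, tactile, and other modalities in a unified way, in other words building a "native multimodal model", was still a weak spot.

Now his answer is: it will launch within a few months.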

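How important is multimodality to Zhipu?

Zhipu sits at the top table of China's large-model industry; Reuters has called it a "leading player".

Zhipu listed in Hong Kong in January. Since then its share price has climbed from the HK$116.2 offer price to about HK$1,090 (today's latest close), rising more than ninefold in four months, and yesterday its market capitalization topped HK$500 billion. MiniMax, which listed around the same time, is valued at more than HK$260 billion, roughly half of Zhipu's.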
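The growth figures above can be checked with a few lines of Python. This is only a sanity check on the numbers quoted in the paragraph (the HK$1,090 close is the article's approximate figure, and the market caps are the rounded HK$500 billion and HK$260 billion thresholds). It shows that "more than ninefold" means the price is roughly 9.4 times the offer price, a gain of about 840%.

```python
def price_multiple(start: float, end: float) -> float:
    """How many times the starting price the ending price is."""
    return end / start


def percent_gain(start: float, end: float) -> float:
    """Percentage increase from the starting price to the ending price."""
    return (end - start) / start * 100


# Figures as quoted in the article (approximate where the article says "about").
offer_price = 116.2    # HK$, IPO offer price
latest_close = 1090.0  # HK$, approximate latest close
zhipu_cap = 500.0      # HK$ billion, "topped HK$500 billion"
minimax_cap = 260.0    # HK$ billion, "more than HK$260 billion"

print(f"Price multiple: {price_multiple(offer_price, latest_close):.2f}x")  # 9.38x
print(f"Gain: {percent_gain(offer_price, latest_close):.0f}%")              # 838%
print(f"MiniMax / Zhipu market cap: {minimax_cap / zhipu_cap:.2f}")         # 0.52
```

Because both market-cap figures are rounded thresholds, the 0.52 ratio is only indicative, which is consistent with the article's "roughly half".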

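The capital market's willingness to pay such a large narrative premium rests on one expectation: that Zhipu will put the last key piece of the puzzle in place. Since releasing GLM-5, Zhipu has pushed hard on coding and long-running agent tasks, and its open-source ecosystem sits firmly in the global first tier. On multimodality, and native multimodality in particular, however, it still owes the market a clearer answer.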

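Why does that answer matter so much? Look at the current landscape: Kimi K2.5, released at the end of January, already uses a native multimodal architecture; Alibaba's Qwen3.5-Omni launched in March, pretrained end to end on more than 100 million hours of audio and video; and GPT-4o shipped a native multimodal architecture back in May 2024.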

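Multimodal understanding and generation have become the key dimension on which top models are pulling apart.

In his post, Tang laid out the underlying logic: perceiving the environment is the foundation for completing long tasks, so multimodality is not a bolt-on feature but a prerequisite for agents to be deployed for real.

Closing the multimodal gap is therefore not only a requirement for the next chapter of Zhipu's capital-market story but also a necessary step to complete its technical roadmap.

What GLM's flagship model will look like a few months from now is still open, but there is now a first public marker on the timeline.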

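Not Just Zhipu's Founder

More importantly, alongside the "within a few months" promise, Tang also published a long post setting out his core view of AI's next battleground: the direction most likely to see a breakthrough this year is not single-turn Q&A, nor simple code generation, but long-horizon tasks.

This also confirms that the native multimodality Zhipu is adding is by no means just about features like "reading images and generating video". It is meant to serve as the agent's sensors for perceiving its environment, ultimately in service of executing long tasks.

The post is not a generic musing on AGI but a public preview, from Zhipu's founder at a pivotal moment, of the second half of the large-model race.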

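Who else is Tang Jie?

His titles and achievements may carry far more weight than you realize. He is a professor in Tsinghua University's Department of Computer Science and a well-known scholar in artificial intelligence, graph mining, and knowledge graphs; large-scale knowledge graphs he led, such as XLore, are cornerstones of Chinese knowledge engineering.

His influence in industry runs just as deep.

He served as vice president of the Beijing Academy of Artificial Intelligence (BAAI), a role through which he helped shape China's top-level AI industry planning and the early development of large models. More famously, when Yang Zhilin, founder of Moonshot AI, the company behind Kimi, was an undergraduate in Tsinghua's Department of Computer Science, Tang was his advisor.

In other words, Tang is no bystander in China's large-model startup and academic circles but a "source" figure.

So his latest judgment deserves a careful read.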

The Full Post

Below is the full text of Tang Jie's post:

Recent thoughts: The Shift to Long-Horizon Tasks

The most likely breakthrough this year will be in long-horizon tasks. We are moving toward a stage where Large Language Models (LLMs) learn to complete extended, complex missions by interacting with Agent environments. This is perhaps where the true value of LLMs lies.

Take cybersecurity as an example: imagine a model that continuously hunts for software bugs and vulnerabilities. While it sounds like a search process, it's actually the model learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 without fatigue. It could potentially find exploits at a much higher frequency and claim bounties on platforms like HackerOne or Bugcrowd.

It sounds fun, but fundamentally, it's a revolution that displaces the hacker. If even hackers are being "disrupted," one can only imagine the impact on general programmers.

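Tang's bug-hunting example is, at its core, an agent acting in an environment over many steps: act, observe feedback, update what it remembers, repeat. Below is a minimal Python sketch of that loop, not taken from the post. The toy target program, the `SECRET` trigger, and the coverage-style feedback are all hypothetical, and the "policy" is plain mutational search (a simplified coverage-guided fuzzer), which is exactly the kind of search Tang says a model would replace with learned intuition.

```python
import random
import string
from typing import Optional, Tuple

SECRET = "BUG!"  # hypothetical input that triggers the toy bug


def run_target(data: str) -> int:
    """Toy program under test, standing in for real software.

    Each matching character passes one more nested check, so the return value
    acts like code-coverage feedback. A full match triggers the "bug".
    """
    depth = 0
    for got, want in zip(data, SECRET):
        if got != want:
            break
        depth += 1
    if depth == len(SECRET):
        raise RuntimeError("crash")
    return depth


def hunt(max_steps: int, seed: int = 0) -> Tuple[Optional[int], str]:
    """Run a long act -> observe -> update loop until a crash is found."""
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + "!"
    best = "AAAA"                 # memory: most promising input so far
    best_cov = run_target(best)
    for step in range(1, max_steps + 1):
        # Act: mutate one character of the best-known input.
        i = rng.randrange(len(best))
        candidate = best[:i] + rng.choice(alphabet) + best[i + 1:]
        try:
            cov = run_target(candidate)   # observe feedback from the environment
        except RuntimeError:
            return step, candidate        # crashing input found (the "bounty")
        if cov > best_cov:                # update: keep inputs that go deeper
            best, best_cov = candidate, cov
    return None, best                     # step budget exhausted


step, crash_input = hunt(max_steps=20_000)
print(f"crash found at step {step}: {crash_input!r}")
```

Swapping the random mutation for a model that proposes inputs, and the toy program for real software, is the shift Tang describes; this sketch only shows the shape of the loop.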

From One-Person to None-Person Companies

Building on long-horizon capabilities, Autonomous Agent Systems (AAS) will inevitably become the next frontier. Last year, we were discussing the rise of the "One Person Company" (OPC). I didn't expect us to move so quickly toward the "None Person Company" (NPC). It's an ironic twist—we might all end up as NPCs in this new ecosystem.


Engineering the Impossible: Memory and Learning

To realize the vision above, we must solve three technical pillars: Memory, Continual Learning, and Self-Judging. I used to think these would require massive paradigm shifts and years of research. However, the pressure from both the technical and application sides is so intense that we are seeing these capabilities emerge through ingenious engineering "tricks":

Memory: Long context windows (1M+) and RAG have significantly bridged the gap.
Continual Learning: While true continual learning remains difficult, the release cycles are shrinking. Global models are updated monthly; domestic models are catching up. If we reach weekly updates by next year, it will effectively function as continual learning.
Self-Judging: This remains the most elusive, yet models like Opus 4.7 are already demonstrating early self-correction and judgment capabilities.

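Tang credits long context windows and RAG (retrieval-augmented generation) with bridging much of the memory gap. The retrieval half of that idea: store what the agent has learned, and pull back only the most relevant pieces for the current step instead of carrying everything in the context. A minimal Python sketch, assuming word-count vectors and cosine similarity where a real RAG system would use a learned embedding model and a vector index; the stored notes and the query are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag of words (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Return the k stored notes most similar to the query."""
    q = tokenize(query)
    return sorted(memory, key=lambda note: cosine(q, tokenize(note)), reverse=True)[:k]


def build_prompt(query: str, memory: list[str], k: int = 2) -> str:
    """Prepend only the retrieved notes, not the whole history, to the task."""
    context = "\n".join(f"- {note}" for note in retrieve(query, memory, k))
    return f"Relevant notes:\n{context}\n\nTask: {query}"


# Hypothetical notes an agent saved during a long bug-hunting run.
memory = [
    "Step 12: fuzzing the PNG parser found a crash in chunk length handling.",
    "Step 40: the login endpoint rate-limits after five attempts.",
    "Step 73: PNG parser crash reproduced with a zero-length IDAT chunk.",
]
print(build_prompt("What do we know about the PNG parser crash?", memory))
```

Here the two PNG notes are retrieved and the unrelated login note is left out, so the prompt stays short no matter how long the run's history grows.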

The Self-Evolving Endgame

The most difficult—and most promising—path is Self-Evolution. The current wave is incredibly fierce. I suspect that models like Claude may have already achieved a baseline for self-training: writing their own code, cleaning their own data, generating synthetic data, and then training on it. It might "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is what creates the cognitive gap between leaders and followers. Claude's rumored 2-million-chip cluster for next year is likely dedicated to exactly this: autonomous model self-training.

Technical Summary:
1M Context: Necessary baseline.
Memory & Continual Learning: Prerequisites, likely solved first via "tricky" engineering.
Harnessing Environments: The breakthrough point.
Self-Judging: The tipping point.
Full Self-Training: The endgame.

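Tang's self-evolution loop (generate data, judge it, train on it, repeat) has a small classical relative: self-training with confidence filtering, also called pseudo-labeling. The Python sketch below shows only that classical technique on a toy one-dimensional problem; it is not a description of how Claude or any lab trains models, and the nearest-centroid "model", the margin threshold standing in for self-judging, and the data are illustrative assumptions.

```python
def fit(points: list[tuple[float, str]]) -> dict[str, float]:
    """'Train' a nearest-centroid model: one mean position per label."""
    centroids: dict[str, float] = {}
    for label in {lab for _, lab in points}:
        xs = [x for x, lab in points if lab == label]
        centroids[label] = sum(xs) / len(xs)
    return centroids


def predict(model: dict[str, float], x: float) -> tuple[str, float]:
    """Return the nearest label and a confidence margin (gap to the runner-up).
    Assumes the model has at least two labels."""
    dists = sorted((abs(x - c), label) for label, c in model.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    return label, d2 - d1


def self_train(labeled, unlabeled, rounds=3, min_margin=2.0):
    """Each round: the model labels its own data (synthetic labels), a judge keeps
    only confident labels (self-judging), and the model retrains on them."""
    data = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = fit(data)
        confident, uncertain = [], []
        for x in pool:
            label, margin = predict(model, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                uncertain.append(x)
        if not confident:        # the judge rejects everything: stop early
            break
        data.extend(confident)   # train on self-generated, self-judged data
        pool = uncertain
    return fit(data), pool


labeled = [(0.0, "low"), (10.0, "high")]
unlabeled = [1.0, 2.0, 4.5, 5.5, 8.0, 9.0]
model, rejected = self_train(labeled, unlabeled)
print(model, rejected)
```

The margin threshold does most of the work: the two points near the middle never pass the judge, so the model is never trained on its own guesses about them. A weaker judge that accepted them would feed the model its own mistakes, which is one way to read Tang's description of self-judging as the tipping point.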

Redefining AGI and the Industry

If this is the road to AGI, then AGI's definition should be the sum of all human collective intelligence, not just an individual's intelligence. It must possess the creative capacity to produce something as profound as the "Theory of Relativity"—meeting the bar set by Hassabis.

During this transition, every APP will need to be reconstructed as AI-native. In fact, we might move past the concept of APPs entirely. The most significant challenge will be the reconstruction of the operating system itself. In the future, you won't see a traditional desktop; you will see an LLM OS, where applications are "generated on demand." This challenges the 80-year-old Von Neumann architecture and represents a total upheaval of the computer science industry.


The Irreversible Wave

From completing long-horizon tasks to fully autonomous operations, every sector—Security, Finance, Law, E-commerce—will be reshaped. Many friends have reached out lately, asking how to transform their enterprises to keep pace with AI. But few truly realize that this irreversible process has already begun. As this massive technical wave hits, we must be prepared to act, but we must also start thinking seriously about how to regulate it.


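Statement: This article was compiled and translated by AI前線 (AI Frontline) and does not represent the platform's views. Reproduction without permission is prohibited.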



Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.