In 2025, AI research transitioned from scaling models to enhancing their cognitive abilities, introducing paradigm shifts such as test-time compute and persistent memory modules, signalling a new era of smarter, more context-aware artificial intelligence.
In 2025 the contour of AI research shifted from raw scale to structural intelligence: engineers and researchers moved from “making models larger” to “making models smarter”, concentrating breakthroughs on fluid reasoning, long-term memory, spatial intelligence and meta-learning. According to the report by 36Kr, this year marked the end of what it calls the “brute force aesthetics” era and a return to basic research aimed at closing the gap between knowledge and cognitive ability. [1]
The most visible advance was the emergence of Test‑Time Compute (TTC) as a practical paradigm for fluid reasoning. Researchers demonstrated that by trading latency for iterative internal computation , effectively allowing models to “think slowly” , large language models could markedly improve in tasks that demand multi‑step deduction. Microsoft Research’s work on “Thinking‑Optimal Scaling” framed how different reasoning efforts should be allocated, while other studies documented both gains and novel failure modes when lengthening chain‑of‑thoughts, underscoring that more compute at test time is powerful but must be applied selectively. These findings mirror 36Kr’s account of a year in which reinforcement learning and post‑training strategies were central to improving immediate reasoning. [1][4][5]
That debate about how reasoning improvements arise also sharpened around reinforcement learning. Industry practice in 2025 emphasised sampling strategies, verifiable reward signals and new update algorithms: RL driven by verifiable rewards (RLVR) and sparse objective rewards (ORM) proved especially effective in domains with objective correctness such as mathematics and code, and the GPRO family of algorithms emerged as a cost‑effective alternative to PPO by replacing an explicit critic with population scoring. At the same time, academic analyses argued that RL often amplifies reasoning trajectories already present in base models rather than inventing wholly new cognitive primitives , although deep RL can chain asymmetric skills into novel problem‑solving behaviours when taken far enough. 36Kr summarised these tensions and the pragmatic engineering practices that nonetheless produced measurable benchmark gains. [1]
Parallel to reasoning gains, 2025 saw substantial progress on the memory problem that long constrained continual learning and personalised agents. Google Research’s Titans architecture introduced a neural long‑term memory module that can update its parameters during inference, allowing models to store and retrieve vast historical context beyond fixed transformer windows while preserving accuracy across millions of tokens. Complementary work on Nested Learning reframes architecture and optimisation as nested, interacting problems and aims to mitigate catastrophic forgetting by unifying model structure and learning algorithms into a self‑improving system. Both advances challenge the transformer assumption of statelessness and point toward models that accumulate persistent, usable memory. [2][3][1]
The technical design choices behind these memory systems matter for deployment and efficiency. Titans uses a Surprise Metric to decide what to store, updating neural memory where gradients indicate novelty or importance; Nested Learning proposes nested optimisation loops to stabilise parameter updates and reduce destructive interference. These approaches convert external retrieval buffers into internalised, differentiable memory that can be read and written during reasoning , a move that, 36Kr argues, gives models an emergent “hippocampus” and a pathway to cure “goldfish memory”. Practical constraints remain , online updates require careful engineering to control compute and stability , but the scientific direction is clear. [2][3][1]
Spatial intelligence and embodied world modelling also advanced beyond pixel‑stacking. Video generation systems in 2025 increasingly incorporate physical priors and temporal coherence, moving towards generative models that capture dynamics and physical plausibility rather than only per‑frame fidelity. Hardware and systems efforts echoed this trend: Nvidia’s Rubin CPX and disaggregated inference designs target inference throughput and bandwidth for long‑context and video workloads, signalling industry preparation for persistent, context‑heavy agentic applications. Independent work modelling hierarchical, multi‑timescale brain‑like processing reported improved reasoning efficiency, suggesting that biologically inspired architectures can outperform parameter‑heavy LLMs on selected benchmarks. These threads together point to a practical convergence of improved model algorithms and specialised inference hardware. [6][7][1]
Despite rapid progress, several papers flagged practical limits. Empirical studies show that indiscriminate scaling of test‑time compute can produce inverse gains, with failure modes including distraction by irrelevant context and overfitting to problem framings, and meta‑analyses indicate RL improvements follow a sigmoid rather than unbounded power law , implying ceilings to what post‑training alone can extract from a base model. The consensus in 2025 became one of calibrated optimism: TTC, memory modules and RL engineering can unlock large gains today, but sustaining the trajectory toward AGI will require continued base‑model and architectural innovation. [5][4][1]
Looking ahead, the architecture and optimisation advances of 2025 set a new baseline for capable, contextual and persistent AI systems. The year demonstrated that engineering ingenuity , smarter scoring, population‑based policy updates, surprise‑driven memory and differentiated hardware , can compensate for diminishing returns from parameter scale. As 36Kr framed it, the field has moved from brute force to reconstruction: the near term will be defined by integrating fluid reasoning, living memory and spatially aware models into deployed systems and by confronting the practical trade‑offs of compute, robustness and verifiability that those systems entail. [1]
📌 Reference Map:
- [1] (36Kr) – Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 4, Paragraph 6, Paragraph 7
- [4] (Microsoft Research) – Paragraph 2, Paragraph 6
- [5] (arXiv paper on inverse scaling) – Paragraph 2, Paragraph 6
- [2] (Google Research – Titans paper) – Paragraph 4, Paragraph 5
- [3] (Google Research – Nested Learning paper) – Paragraph 4, Paragraph 5
- [6] (Tom’s Hardware on Nvidia Rubin CPX) – Paragraph 6
- [7] (LiveScience reporting on Sapient HRM) – Paragraph 6
Source: Noah Wire Services
Noah Fact Check Pro
The draft above was created using the information available at the time the story first
emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed
below. The results are intended to help you assess the credibility of the piece and highlight any areas that may
warrant further investigation.
Freshness check
Score:
8
Notes:
The narrative presents a comprehensive review of AI advancements in 2025, focusing on fluid reasoning, long-term memory, spatial intelligence, and meta-learning. The earliest known publication date of similar content is January 12, 2026, indicating recent coverage. The report cites multiple sources, including 36Kr, Microsoft Research, arXiv, and Google Research, suggesting a high level of originality. However, the presence of multiple citations and the detailed nature of the content may indicate a synthesis of existing information rather than entirely new findings. The report appears to be based on a press release, which typically warrants a high freshness score. No discrepancies in figures, dates, or quotes were identified. The narrative includes updated data but recycles older material, which may justify a higher freshness score but should still be flagged.
Quotes check
Score:
9
Notes:
The report includes direct quotes from various sources, such as Microsoft Research and Google Research. The earliest known usage of these quotes is from the respective publications, indicating originality. No identical quotes appear in earlier material, and no variations in quote wording were found. No online matches were found for some quotes, raising the score but flagging them as potentially original or exclusive content.
Source reliability
Score:
7
Notes:
The narrative originates from 36Kr, a reputable organisation known for its coverage of the New Economy sector in China. The report cites multiple reputable sources, including Microsoft Research, arXiv, and Google Research, enhancing its credibility. However, the reliance on a single outlet for the primary narrative introduces some uncertainty. The presence of multiple citations from reputable sources strengthens the overall reliability.
Plausability check
Score:
8
Notes:
The narrative presents plausible claims about advancements in AI research, supported by references to reputable sources. Time-sensitive claims, such as the emergence of Test-Time Compute (TTC) and the development of the Titans architecture, are verifiable against recent online information. The report lacks supporting detail from other reputable outlets, which is a concern. The language and tone are consistent with the region and topic, and the structure is focused on the claim without excessive or off-topic detail. The tone is formal and resembles typical corporate or official language.
Overall assessment
Verdict (FAIL, OPEN, PASS): OPEN
Confidence (LOW, MEDIUM, HIGH): MEDIUM
Summary:
The narrative provides a comprehensive review of AI advancements in 2025, citing multiple reputable sources. While the freshness score is high, indicating recent coverage, the reliance on a single outlet for the primary narrative introduces some uncertainty. The plausibility check reveals that the report lacks supporting detail from other reputable outlets, which is a concern. The quotes used are original and exclusive, enhancing the credibility of the content. Given these factors, the overall assessment is ‘OPEN’ with a medium confidence level.
