Deli AutoResearch Papers

Paper #1

From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models

This survey proposes a five-level autonomy taxonomy (L1–L5), identifies four dominant architectural patterns, and systematically compares 17 major systems across a six-dimensional feature matrix. It also includes an illustrative pilot study comparing 5 frontier models across 3 research tasks and 3 agent architectures, and states a formal Architecture-Capability Trade-off Conjecture.

228 citations

63 pages

60.5% from the past year

29.8% accepted

8.5/10 review score

Download (628 KB) V5 · 2026-06-04

@article{chendeli_202606_auto_research_survey,
  title={From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/auto_research_survey.pdf},
  note={Generated by Deli AutoResearch framework. 228 citations, 63 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}

Paper #2

Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models

Unifies continual learning and self-improvement under a three-axis taxonomy (Strategy × Scope × Objective). Formalizes CL×SI interaction via bilevel optimization with impossibility conjectures. Includes two pilot experiments: a CL×SI interaction study revealing GPT-5.2’s deterministic SI collapse, and a knowledge retention-acquisition trade-off study identifying Self-Verification as Pareto-optimal across 5 domains.

329 citations

70 pages

54.3% from the past year

30.1% accepted

8.5/10 review score

Download (777 KB) V5 · 2026-06-04

@article{chendeli_2026_continue_learning_survey,
  title={Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/continual_learning_survey.pdf},
  note={Generated by Deli AutoResearch framework. 329 citations, 70 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}

Paper #3

Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making

Surveys 384 papers on long-horizon sequential decision-making, covering hierarchical planning, reactive agents, search-based methods (MCTS, PRM), and RL for agents. Features a rigorous horizon scaling experiment across 5 frontier models × 5 horizon lengths × 3 conditions × 3 task types, with exponential decay fitting (R² > 0.93). Chain-of-thought and hierarchical planning significantly reduce horizon degradation.

384 citations

57 pages

35.4% from the past year

49.2% accepted

8.5/10 review score

Download (762 KB) V4 · 2026-06-04

@article{chendeli_202606_long_horizon_survey,
  title={Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/long_horizon_survey.pdf},
  note={Generated by Deli AutoResearch framework. 384 citations, 57 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
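The summary of Paper #3 mentions an exponential decay fit with R² > 0.93 for the horizon scaling experiment. The page does not give the fitting procedure, so the following is only a minimal sketch of one common approach: assume success rate follows s(h) = a · exp(−b · h), fit it by ordinary least squares on log(s), and compute R² against the original (non-log) values. The function names and the sample horizons are hypothetical, and the demo points are generated from the formula itself rather than taken from the paper.

```python
import math


def fit_exp_decay(horizons, success_rates):
    """Fit s(h) = a * exp(-b * h) via least squares on log(s).

    Assumes every success rate is strictly positive.
    Returns the pair (a, b).
    """
    xs = list(horizons)
    ys = [math.log(s) for s in success_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def r_squared(horizons, success_rates, a, b):
    """Coefficient of determination of the fitted curve in the original space."""
    preds = [a * math.exp(-b * h) for h in horizons]
    mean_s = sum(success_rates) / len(success_rates)
    ss_res = sum((s - p) ** 2 for s, p in zip(success_rates, preds))
    ss_tot = sum((s - mean_s) ** 2 for s in success_rates)
    return 1.0 - ss_res / ss_tot


# Synthetic, noiseless demo points (NOT measurements from the paper).
demo_horizons = [5, 10, 20, 40, 80]
demo_rates = [0.9 * math.exp(-0.05 * h) for h in demo_horizons]
a_hat, b_hat = fit_exp_decay(demo_horizons, demo_rates)
print(round(a_hat, 6), round(b_hat, 6), round(r_squared(demo_horizons, demo_rates, a_hat, b_hat), 6))
```

Fitting in log space is simple and needs no external libraries, but it weights small success rates more heavily than a direct nonlinear fit would; a nonlinear least-squares routine may be preferable for noisy data.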

Production Cost Statistics

Metric	Paper #1	Paper #2	Paper #3	Total
BibTeX entries	228	329	384	941
PDF pages	63	70	57	190
Figures	5+	8+	13	26+
Tables	14+	15+	30+	59+
Peer review (final)	8.5/10	8.5/10	8.5/10	8.5 avg
Review rounds	V1→V5	V1→V5	V1→V4	14 rounds
Compute cost
Total iterations	~60	~80	~70	~210
Output tokens	~550K	~720K	~680K	~1.95M
Tool calls	~380	~470	~520	~1,370
Subagents	12+	18+	18+	48+
Total wall-clock time	~10h	~12h	~16h	~38h
Citation quality
Venue upgrades	16	14	6	36
New citations (June)	34	41	66	141
Citations woven into text	15	25	33	73
Share of citations from the past year	60.5%	54.3%	35.4%	—
Share of accepted citations	29.8%	30.1%	49.2%	—

Subagent Consumption (Literature + Experiment + Review Cycle)

Phase	Subagents	Tokens	Tool Uses	Wall Clock
Literature collection (3 papers)	3	386,359	332	58 min
Text weaving (3 papers)	3	203,204	117	44 min
Experiment design + execution	2	111,115	100	46 min
Experiment integration + Review V3	1	64,460	45	27 min
Weakness fix + Review V4	1	87,498	58	26 min
Total	10+	~852,636	652	~201 min
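The totals row above can be re-derived from the five phase rows. The sketch below copies the per-phase figures from the table and sums each column; the names `PHASES` and `column_totals` are illustrative only.

```python
# Rows copied from the Subagent Consumption table:
# (phase, subagents, tokens, tool uses, wall-clock minutes)
PHASES = [
    ("Literature collection (3 papers)", 3, 386_359, 332, 58),
    ("Text weaving (3 papers)", 3, 203_204, 117, 44),
    ("Experiment design + execution", 2, 111_115, 100, 46),
    ("Experiment integration + Review V3", 1, 64_460, 45, 27),
    ("Weakness fix + Review V4", 1, 87_498, 58, 26),
]


def column_totals(rows):
    """Sum the four numeric columns across all phase rows."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


subagents, tokens, tool_uses, minutes = column_totals(PHASES)
print(subagents, tokens, tool_uses, minutes)  # 10 852636 652 201
```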

Review Score Trajectory

Paper	V1	V2	V3	V4	V5 (Final)
Paper #1 (Auto-Research)	6.0	6.5	7.5	8.0	8.5 ✓
Paper #2 (Continual Learning)	6.0	6.5	7.0	8.0	8.5 ✓
Paper #3 (Long-Horizon)	7.0	3.0*	8.0	8.5 ✓	—

* Paper #3 V2 was scored by an adversarial reviewer applying strict experimental standards; V3 addressed all concerns with a redesigned horizon scaling experiment. V5 improvements focus on analytical depth, structural cohesion, and cross-benchmark validation.

Literature Funnel (4-Stage Pipeline)

Each paper goes through a systematic 4-stage literature review pipeline: Recall (broad keyword search via site:arxiv.org) → Score (LQS multi-dimensional quality scoring) → Classify (A/B/C/D citation depth assignment) → Upgrade (arXiv preprint → accepted venue via DBLP).

Stage	Paper #1	Paper #2	Paper #3	Total
Stage 1: Recall (keyword queries × site:arxiv.org)	20 queries, 170 results	10 queries, 83 results	20+ queries, 134 results	50+ queries, 387 results
Stage 2: Score (LQS = Recency 30% + Citation 25% + Venue 20% + Institution 10% + Acceptance 15%)	50 scored: 14 must-cite, 36 conditional, 0 dropped	45 scored: 45 must-cite, 0 conditional, 0 dropped	133 scored: 72 must-cite, 51 conditional, 10 dropped	228 scored: 131 must-cite, 87 conditional, 10 dropped
Stage 3: Classify (A = deep discussion, B = detailed cite, C = brief cite, D = drop)	A: 5 • B: 10 • C: 35 • D: 0	A: 4 • B: 12 • C: 29 • D: 0	A: 7 • B: 13 • C: 103 • D: 10	A: 16 • B: 35 • C: 167 • D: 10
Stage 4: Upgrade (arXiv → @inproceedings via DBLP/OpenReview)	16 upgraded	14 upgraded	6 upgraded	36 upgraded
Final BibTeX	228 entries	329 entries	384 entries	941 entries

LQS thresholds: ≥7.0 = must-cite (high quality + high relevance), 5.0 to below 7.0 = conditional (fills taxonomy gap), <5.0 = dropped.
Citation depth: A-level papers get 1–3 paragraphs of discussion; B-level get 2–5 sentences; C-level get a single citation in context; D-level are excluded from the paper.
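The Stage 2 weights and thresholds above suggest a weighted-sum score. The page does not publish the actual LQS implementation, so the following is a minimal sketch under two assumptions: each dimension is scored on a 0–10 scale, and LQS is the weighted sum of those sub-scores. The names `PaperScores`, `lqs`, and `lqs_bucket` are hypothetical.

```python
from dataclasses import dataclass

# Weights as listed in the Stage 2 description of the funnel.
LQS_WEIGHTS = {
    "recency": 0.30,
    "citation": 0.25,
    "venue": 0.20,
    "institution": 0.10,
    "acceptance": 0.15,
}


@dataclass
class PaperScores:
    """Per-dimension sub-scores; the 0-10 scale is an assumption."""
    recency: float
    citation: float
    venue: float
    institution: float
    acceptance: float


def lqs(scores: PaperScores) -> float:
    """Weighted sum of the five sub-scores (assumed aggregation rule)."""
    return sum(w * getattr(scores, name) for name, w in LQS_WEIGHTS.items())


def lqs_bucket(score: float) -> str:
    """Apply the stated thresholds: >=7.0 must-cite, 5.0 to <7.0 conditional, <5.0 dropped."""
    if score >= 7.0:
        return "must-cite"
    if score >= 5.0:
        return "conditional"
    return "dropped"


# Hypothetical candidate paper, for illustration only.
candidate = PaperScores(recency=9, citation=6, venue=8, institution=5, acceptance=7)
print(round(lqs(candidate), 2), lqs_bucket(lqs(candidate)))
```

The mapping from LQS to citation depth (A/B/C/D) is not specified on the page, so it is deliberately left out of the sketch.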

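Skill Invocation Statistics

Skills invoked in the research pipeline.

Skill	ID	Calls	Stage	Purpose
paper_writing (open source)	—	3	Writing	Parent skill group: LaTeX generation, section structure, figure and table conventions, compilation
— literature_survey (open source)	—	12+	Literature survey	Keyword generation, LQS scoring, citation depth classification, venue upgrades
— paper_structure (open source)	—	6+	Writing	Section outlines, paragraph transitions, cross-reference consistency, taxonomy design
— experiment_design (open source)	—	2	Experiments	Horizon scaling experiment design, CL×SI interaction experiment design
— figures_tables (open source)	—	8+	Writing	Figure layout, table formatting, caption generation, visualization conventions
— peer_review_simulation (open source)	—	14	Review	Multi-role scoring (5 reviewer types), iterative fixes (V1→V5)
Internal skills (cannot be released)
search_agent	#5	12+	Literature & verification	arXiv search, citation verification, DBLP cross-checking, acceptance status lookup
call_api	#2	8+	Review & experiments	Multi-model peer review (3–5 reviewers × 5 rounds), horizon scaling experiment (3,300 API calls)
static_file_service	#6	4	Deployment	PDF hosting, index.html generation, service restarts
skill-router	#57	3	Orchestration	Dynamic skill matching (literature, experiment, and deployment subtasks)
Deli_AutoResearch^*	—	3	Orchestration	Main framework: loop prevention, heartbeat, state management, multi-track coordination

Total skill calls	—	68+	—	Across 3 papers × 5+ review rounds × multi-stage pipeline

paper_writing is an open-source skill group containing 5 sub-skills. Skills with an ID number (#2, #5, #6, #57) depend on internal infrastructure and cannot be released.
^* Deli_AutoResearch is still under active iteration; there is no stable public release yet.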
