3 papers · 941 citations · 190 pages · 8.5 average score · ~2M tokens · 17+ subagents
Paper #1

From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models

This survey proposes a five-level autonomy taxonomy (L1–L5), identifies four dominant architectural patterns, and systematically compares 17 major systems across a six-dimensional feature matrix. Includes an illustrative pilot study comparing 5 frontier models across 3 research tasks and 3 agent architectures, with a formal Architecture-Capability Trade-off Conjecture.
228 citations · 63 pages · 60.5% from the past year · 29.8% accepted · 8.5/10 review score
Download (628 KB) · V5 · 2026-06-04
@article{chendeli_202606_auto_research_survey,
  title={From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/auto_research_survey.pdf},
  note={Generated by Deli AutoResearch framework. 228 citations, 63 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
Paper #2

Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models

Unifies continual learning and self-improvement under a three-axis taxonomy (Strategy × Scope × Objective). Formalizes CL×SI interaction via bilevel optimization with impossibility conjectures. Includes two pilot experiments: a CL×SI interaction study revealing GPT-5.2’s deterministic SI collapse, and a knowledge retention-acquisition trade-off study identifying Self-Verification as Pareto-optimal across 5 domains.
329 citations · 70 pages · 54.3% from the past year · 30.1% accepted · 8.5/10 review score
Download (777 KB) · V5 · 2026-06-04
@article{chendeli_2026_continue_learning_survey,
  title={Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/continual_learning_survey.pdf},
  note={Generated by Deli AutoResearch framework. 329 citations, 70 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
Paper #3

Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making

Surveys 384 papers on long-horizon sequential decision-making, covering hierarchical planning, reactive agents, search-based methods (MCTS, PRM), and RL for agents. Features a rigorous horizon scaling experiment across 5 frontier models × 5 horizon lengths × 3 conditions × 3 task types, with exponential decay fitting (R² > 0.93). Chain-of-thought and hierarchical planning significantly reduce horizon degradation.
384 citations · 57 pages · 35.4% from the past year · 49.2% accepted · 8.5/10 review score
Download (762 KB) · V4 · 2026-06-04
@article{chendeli_202606_long_horizon_survey,
  title={Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/long_horizon_survey.pdf},
  note={Generated by Deli AutoResearch framework. 384 citations, 57 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}

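Production Cost Statistics

| Metric | Paper #1 | Paper #2 | Paper #3 | Total |
| --- | --- | --- | --- | --- |
| BibTeX entries | 228 | 326 | 384 | 938 |
| PDF pages | 63 | 70 | 57 | 190 |
| Figures | 5+ | 8+ | 13 | 26+ |
| Tables | 14+ | 15+ | 30+ | 59+ |
| Peer review (final) | 8.5/10 | 8.5/10 | 8.5/10 | 8.5 avg |
| Review rounds | V1→V5 | V1→V5 | V1→V4 | 14 rounds |

Compute consumption

| Metric | Paper #1 | Paper #2 | Paper #3 | Total |
| --- | --- | --- | --- | --- |
| Total iterations | ~60 | ~80 | ~70 | ~210 |
| Output tokens | ~550K | ~720K | ~680K | ~1.95M |
| Tool calls | ~380 | ~470 | ~520 | ~1,370 |
| Subagents | 12+ | 18+ | 18+ | 48+ |
| Total time | ~10h | ~12h | ~16h | ~38h |

Citation quality

| Metric | Paper #1 | Paper #2 | Paper #3 | Total |
| --- | --- | --- | --- | --- |
| Venue upgrades | 16 | 14 | 6 | 36 |
| New citations (June) | 34 | 41 | 66 | 141 |
| Citations woven into text | 15 | 25 | 33 | 73 |
| Share of citations from the past year | 60.5% | 54.3% | 35.4% | |
| Share accepted | 29.8% | 30.1% | 49.2% | |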

Subagent Consumption (Literature + Experiment + Review Cycle)

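| Phase | Subagents | Tokens | Tool Uses | Wall Clock |
| --- | --- | --- | --- | --- |
| Literature collection (3 papers) | 3 | 386,359 | 332 | 58 min |
| Text weaving (3 papers) | 3 | 203,204 | 117 | 44 min |
| Experiment design + execution | 2 | 111,115 | 100 | 46 min |
| Experiment integration + Review V3 | 1 | 64,460 | 45 | 27 min |
| Weakness fix + Review V4 | 1 | 87,498 | 58 | 26 min |
| Total | 10+ | ~852,636 | 652 | ~201 min |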
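
The totals row can be reproduced by summing the per-phase rows. The snippet below is a minimal sketch of that cross-check; the names `PHASES` and `column_totals` are hypothetical and introduced only for illustration, and the figures are copied from the table above.

```python
# Each row: (phase, subagents, tokens, tool uses, wall-clock minutes),
# copied from the subagent consumption table.
PHASES = [
    ("Literature collection (3 papers)", 3, 386_359, 332, 58),
    ("Text weaving (3 papers)", 3, 203_204, 117, 44),
    ("Experiment design + execution", 2, 111_115, 100, 46),
    ("Experiment integration + Review V3", 1, 64_460, 45, 27),
    ("Weakness fix + Review V4", 1, 87_498, 58, 26),
]


def column_totals(rows):
    """Sum each numeric column (everything after the phase name)."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


totals = column_totals(PHASES)
print(totals)  # (10, 852636, 652, 201)
```

The sums match the numeric part of the totals row; the "+" and "~" markers there indicate that the reported totals are treated as approximate.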

Review Score Trajectory

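| Paper | V1 | V2 | V3 | V4 | V5 (Final) |
| --- | --- | --- | --- | --- | --- |
| Paper #1 (Auto-Research) | 6.0 | 6.5 | 7.5 | 8.0 | 8.5 ✓ |
| Paper #2 (Continual Learning) | 6.0 | 6.5 | 7.0 | 8.0 | 8.5 ✓ |
| Paper #3 (Long-Horizon) | 7.0 | 3.0* | 8.0 | 8.5 ✓ | |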

* Paper #3 V2 was scored by an adversarial reviewer with strict experimental standards; V3 addressed all concerns with a redesigned horizon scaling experiment. V5 improvements focus on analytical depth, structural cohesion, and cross-benchmark validation.

Literature Funnel (4-Stage Pipeline)

Each paper goes through a systematic 4-stage literature review pipeline: Recall (broad keyword search via site:arxiv.org) → Score (LQS multi-dimensional quality scoring) → Classify (A/B/C/D citation depth assignment) → Upgrade (arXiv preprint → accepted venue via DBLP).

| Stage | Paper #1 | Paper #2 | Paper #3 | Total |
| --- | --- | --- | --- | --- |
| Stage 1: Recall (keyword queries × site:arxiv.org) | 20 queries; 170 results | 10 queries; 83 results | 20+ queries; 134 results | 50+ queries; 387 results |
| Stage 2: Score (LQS: Recency 30% + Citation 25% + Venue 20% + Institution 10% + Acceptance 15%) | 50 scored; 14 must-cite; 36 conditional; 0 dropped | 45 scored; 45 must-cite; 0 conditional; 0 dropped | 133 scored; 72 must-cite; 51 conditional; 10 dropped | 228 scored; 131 must-cite; 87 conditional; 10 dropped |
| Stage 3: Classify (A = deep discussion, B = detailed cite, C = brief cite, D = drop) | A: 5 • B: 10 • C: 35 • D: 0 | A: 4 • B: 12 • C: 29 • D: 0 | A: 7 • B: 13 • C: 103 • D: 10 | A: 16 • B: 35 • C: 167 • D: 10 |
| Stage 4: Upgrade (arXiv → @inproceedings via DBLP/OpenReview) | 16 upgraded | 14 upgraded | 6 upgraded | 36 upgraded |
| Final BibTeX | 228 entries | 329 entries | 384 entries | 941 entries |

LQS thresholds: ≥7.0 = must-cite (high quality + high relevance), 5.0–7.0 = conditional (fills taxonomy gap), <5.0 = dropped.
Citation depth: A-level papers get 1–3 paragraphs of discussion; B-level get 2–5 sentences; C-level get a single citation in context; D-level are excluded from the paper.
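
The Stage 2 weights and the thresholds above can be read as a weighted sum followed by a bucketing step. The following is a minimal sketch under stated assumptions, not the framework's actual implementation: the names `LQS_WEIGHTS`, `lqs_score`, and `lqs_bucket` are hypothetical, each dimension is assumed to be scored on a 0–10 scale (the page does not state the scale), and a score of exactly 7.0 is assigned to must-cite because that threshold is written as "≥7.0".

```python
from typing import Mapping

# Weights as listed for Stage 2: Score (LQS). They sum to 1.0.
LQS_WEIGHTS = {
    "recency": 0.30,
    "citation": 0.25,
    "venue": 0.20,
    "institution": 0.10,
    "acceptance": 0.15,
}


def lqs_score(subscores: Mapping[str, float]) -> float:
    """Weighted sum of per-dimension sub-scores.

    Assumption: every sub-score is on a 0-10 scale, so the result is too.
    """
    return sum(weight * subscores[name] for name, weight in LQS_WEIGHTS.items())


def lqs_bucket(score: float) -> str:
    """Map an LQS score to the buckets described in the thresholds note."""
    if score >= 7.0:
        return "must-cite"
    if score >= 5.0:
        return "conditional"
    return "dropped"


# Hypothetical paper: recent and accepted at a venue, but lightly cited.
example = {"recency": 9, "citation": 4, "venue": 8, "institution": 6, "acceptance": 10}
score = lqs_score(example)
print(round(score, 2), lqs_bucket(score))
```

How a bucket then maps to an A/B/C/D citation depth is not specified on this page, so the sketch stops at the bucketing step.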

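Skill Invocation Statistics

Skills invoked in the research pipeline.

| Skill | ID | Invocations | Stage | Purpose |
| --- | --- | --- | --- | --- |
| paper_writing (open source) | | 3 | Writing | Parent skill group: LaTeX generation, section structure, figure and table conventions, compilation |
| literature_survey (sub-skill, open source) | | 12+ | Literature survey | Keyword generation, LQS scoring, citation depth classification, venue upgrade |
| paper_structure (sub-skill, open source) | | 6+ | Writing | Section outlines, paragraph transitions, cross-reference consistency, taxonomy design |
| experiment_design (sub-skill, open source) | | 2 | Experiment | Horizon scaling experiment design, CL×SI interaction experiment design |
| figures_tables (sub-skill, open source) | | 8+ | Writing | Figure layout, table formatting, caption generation, visualization conventions |
| peer_review_simulation (sub-skill, open source) | | 14 | Review | Multi-role scoring (5 reviewer types), iterative fixes (V1→V5) |

Internal skills (cannot be released)

| Skill | ID | Invocations | Stage | Purpose |
| --- | --- | --- | --- | --- |
| search_agent | #5 | 12+ | Literature & verification | arXiv search, citation verification, DBLP cross-check, acceptance status lookup |
| call_api | #2 | 8+ | Review & experiment | Multi-model peer review (3–5 reviewers × 5 rounds), horizon scaling experiment (3300 API calls) |
| static_file_service | #6 | 4 | Deployment | PDF hosting, index.html generation, service restart |
| skill-router | #57 | 3 | Orchestration | Dynamic skill matching (literature, experiment, deployment subtasks) |
| Deli_AutoResearch* | | 3 | Orchestration | Main framework: loop prevention, heartbeat, state management, multi-track coordination |

Total skill invocations: 68+ (across 3 papers × 5+ review rounds × multi-stage pipeline)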

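paper_writing is an open-source skill group containing 5 sub-skills. Skills marked with an ID number (#2, #5, #6, #57) depend on internal infrastructure and cannot be released.
* Deli_AutoResearch is still under active iteration, and there is no stable public release yet.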

The paper_writing skill is open source; the other skills are for internal use.
