3 Papers · 941 Citations · 190 Pages · 8.5 Avg Score · ~2M Tokens · 17+ Subagents
Paper #1

From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models

This survey proposes a five-level autonomy taxonomy (L1–L5), identifies four dominant architectural patterns, and systematically compares 17 major systems across a six-dimensional feature matrix. Includes an illustrative pilot study comparing 5 frontier models across 3 research tasks and 3 agent architectures, with a formal Architecture-Capability Trade-off Conjecture.
228 citations
63 pages
60.5% within 1yr
29.8% accepted
8.5/10 review
Download (628 KB) V5 · 2026-06-04
@article{chendeli_202606_auto_research_survey,
  title={From Copilots to Colleagues: A Survey of Autonomous Research Agents in the Age of Foundation Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/auto_research_survey.pdf},
  note={Generated by Deli AutoResearch framework. 228 citations, 63 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
Paper #2

Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models

Unifies continual learning and self-improvement under a three-axis taxonomy (Strategy × Scope × Objective). Formalizes CL×SI interaction via bilevel optimization with impossibility conjectures. Includes two pilot experiments: a CL×SI interaction study revealing GPT-5.2’s deterministic SI collapse, and a knowledge retention-acquisition trade-off study identifying Self-Verification as Pareto-optimal across 5 domains.
329 citations
70 pages
54.3% within 1yr
30.1% accepted
8.5/10 review
Download (777 KB) V5 · 2026-06-04
@article{chendeli_2026_continue_learning_survey,
  title={Never Stop Learning: A Unified Survey of Continual Learning and Self-Improvement in Large Language Models},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/continual_learning_survey.pdf},
  note={Generated by Deli AutoResearch framework. 329 citations, 70 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
Paper #3

Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making

Surveys 384 papers on long-horizon sequential decision-making, covering hierarchical planning, reactive agents, search-based methods (MCTS, PRM), and RL for agents. Features a rigorous horizon scaling experiment across 5 frontier models × 5 horizon lengths × 3 conditions × 3 task types, with exponential decay fitting (R² > 0.93). Chain-of-thought and hierarchical planning significantly reduce horizon degradation.
384 citations
57 pages
35.4% within 1yr
49.2% accepted
8.5/10 review
Download (762 KB) V4 · 2026-06-04
@article{chendeli_202606_long_horizon_survey,
  title={Navigating the Long Horizon: A Comprehensive Survey of Agent Architectures and Reinforcement Learning for Extended Sequential Decision-Making},
  author={Chen, Deli},
  year={2026},
  url={https://victorchen96.github.io/auto_research/long_horizon_survey.pdf},
  note={Generated by Deli AutoResearch framework. 384 citations, 57 pages. Available at https://victorchen96.github.io/auto_research/paper.html}
}
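The Paper #3 summary mentions exponential decay fitting with R² > 0.93 but does not say how the fit was computed. As a minimal sketch only, one common approach is ordinary least squares on the logarithm of the success rate. The function name, the model form `s(h) = a * exp(-k * h)`, and the sample numbers below are illustrative assumptions and are not taken from the paper.

```python
import math


def fit_exponential_decay(horizons, success_rates):
    """Fit s(h) = a * exp(-k * h) by linear least squares on log(s).

    Returns (a, k, r2), where r2 is the coefficient of determination
    computed in log space. All success rates must be strictly positive.
    This is an assumed fitting procedure, not the paper's documented one.
    """
    xs = [float(h) for h in horizons]
    ys = [math.log(s) for s in success_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope and intercept of the regression line log(s) = intercept + slope * h.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Goodness of fit in log space.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot

    return math.exp(intercept), -slope, r2


if __name__ == "__main__":
    # Hypothetical horizon lengths and success rates, for illustration only.
    horizons = [1, 2, 4, 8, 16]
    rates = [0.88, 0.80, 0.62, 0.41, 0.17]
    a, k, r2 = fit_exponential_decay(horizons, rates)
    print(f"a={a:.3f}, k={k:.3f}, R^2={r2:.3f}")
```

A larger fitted `k` means success degrades faster as the horizon grows, so comparing `k` across conditions is one way to quantify how much a prompting or planning strategy reduces horizon degradation.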

Production Statistics

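| Metric | Paper #1 | Paper #2 | Paper #3 | Total |
|---|---|---|---|---|
| BibTeX entries | 228 | 326 | 384 | 938 |
| PDF pages | 63 | 70 | 57 | 190 |
| Figures | 5+ | 8+ | 13 | 26+ |
| Tables | 14+ | 15+ | 30+ | 59+ |
| Peer review (final) | 8.5/10 | 8.5/10 | 8.5/10 | 8.5 avg |
| Review iterations | V1→V5 | V1→V5 | V1→V4 | 14 rounds |
| Compute Consumption | | | | |
| Total iterations (agent turns) | ~60 | ~80 | ~70 | ~210 |
| Output tokens | ~550K | ~720K | ~680K | ~1.95M |
| Tool invocations | ~380 | ~470 | ~520 | ~1,370 |
| Subagents spawned | 12+ | 18+ | 18+ | 48+ |
| Wall clock (total) | ~10h | ~12h | ~16h | ~38h |
| Citation Quality | | | | |
| Venue upgrades | 16 | 14 | 6 | 36 |
| New refs added (June) | 34 | 41 | 66 | 141 |
| Papers woven in | 15 | 25 | 33 | 73 |
| 1yr citation ratio | 60.5% | 54.3% | 35.4% | |
| Accepted ratio | 29.8% | 30.1% | 49.2% | |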

Subagent Consumption (Literature + Experiment + Review Cycle)

| Phase | Subagents | Tokens | Tool Uses | Wall Clock |
|---|---|---|---|---|
| Literature collection (3 papers) | 3 | 386,359 | 332 | 58 min |
| Text weaving (3 papers) | 3 | 203,204 | 117 | 44 min |
| Experiment design + execution | 2 | 111,115 | 100 | 46 min |
| Experiment integration + Review V3 | 1 | 64,460 | 45 | 27 min |
| Weakness fix + Review V4 | 1 | 87,498 | 58 | 26 min |
| Total | 10+ | ~852,636 | 652 | ~201 min |

Review Score Trajectory

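| Paper | V1 | V2 | V3 | V4 | V5 (Final) |
|---|---|---|---|---|---|
| Paper #1 (Auto-Research) | 6.0 | 6.5 | 7.5 | 8.0 | 8.5 ✓ |
| Paper #2 (Continual Learning) | 6.0 | 6.5 | 7.0 | 8.0 | 8.5 ✓ |
| Paper #3 (Long-Horizon) | 7.0 | 3.0* | 8.0 | 8.5 ✓ | |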

* Paper #3 V2 scored by adversarial reviewer with strict experimental standards; V3 addressed all concerns with redesigned horizon scaling experiment. V5 improvements focus on analytical depth, structural cohesion, and cross-benchmark validation.

Literature Funnel (4-Stage Pipeline)

Each paper goes through a systematic 4-stage literature review pipeline: Recall (broad keyword search via site:arxiv.org) → Score (LQS multi-dimensional quality scoring) → Classify (A/B/C/D citation depth assignment) → Upgrade (arXiv preprint → accepted venue via DBLP).

| Stage | Paper #1 | Paper #2 | Paper #3 | Total |
|---|---|---|---|---|
| Stage 1: Recall (keyword queries × site:arxiv.org) | 20 queries; 170 results | 10 queries; 83 results | 20+ queries; 134 results | 50+ queries; 387 results |
| Stage 2: Score (LQS = Recency 30% + Citation 25% + Venue 20% + Institution 10% + Acceptance 15%) | 50 scored; 14 must-cite; 36 conditional; 0 dropped | 45 scored; 45 must-cite; 0 conditional; 0 dropped | 133 scored; 72 must-cite; 51 conditional; 10 dropped | 228 scored; 131 must-cite; 87 conditional; 10 dropped |
| Stage 3: Classify (A = deep discussion, B = detailed cite, C = brief cite, D = drop) | A: 5 • B: 10 • C: 35 • D: 0 | A: 4 • B: 12 • C: 29 • D: 0 | A: 7 • B: 13 • C: 103 • D: 10 | A: 16 • B: 35 • C: 167 • D: 10 |
| Stage 4: Upgrade (arXiv → @inproceedings via DBLP/OpenReview) | 16 upgraded | 14 upgraded | 6 upgraded | 36 upgraded |
| Final BibTeX | 228 entries | 329 entries | 384 entries | 941 entries |

LQS thresholds: ≥7.0 = must-cite (high quality + high relevance), 5.0–7.0 = conditional (fills taxonomy gap), <5.0 = dropped.
Citation depth: A-level papers get 1–3 paragraphs of discussion; B-level get 2–5 sentences; C-level get a single citation in context; D-level are excluded from the paper.
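The Stage 2 weights and the LQS thresholds above can be combined into a small scoring sketch. The page gives only the weights and the cut-offs. The function names, the dictionary keys, and the assumption that each dimension is scored on a 0 to 10 scale are illustrative choices, not part of the published pipeline.

```python
# Weights as listed in the Stage 2 row of the funnel table.
LQS_WEIGHTS = {
    "recency": 0.30,
    "citation": 0.25,
    "venue": 0.20,
    "institution": 0.10,
    "acceptance": 0.15,
}


def lqs(subscores):
    """Weighted sum of per-dimension sub-scores.

    Assumption: every sub-score lies in 0..10, so the result is also in
    0..10 and can be compared against the 7.0 and 5.0 thresholds.
    """
    missing = set(LQS_WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(weight * subscores[name] for name, weight in LQS_WEIGHTS.items())


def lqs_bucket(score):
    """Map an LQS value to the bucket named in the threshold note."""
    if score >= 7.0:
        return "must-cite"
    if score >= 5.0:
        return "conditional"
    return "dropped"


if __name__ == "__main__":
    # Hypothetical sub-scores for a single candidate paper.
    candidate = {
        "recency": 9.0,
        "citation": 6.0,
        "venue": 7.0,
        "institution": 5.0,
        "acceptance": 8.0,
    }
    score = lqs(candidate)
    print(round(score, 2), lqs_bucket(score))
```

The sketch treats a score of exactly 7.0 as must-cite and reads the conditional band as 5.0 up to but not including 7.0, which is one reading of the "≥7.0" and "5.0–7.0" wording.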

Skill Hub Usage

Skills invoked during the research pipeline.

| Skill | ID | Invocations | Phase | Purpose |
|---|---|---|---|---|
| paper_writing | OPEN SOURCE | 3 | Writing | Parent skill group: LaTeX generation, section structure, figure/table standards, compilation |
| - literature_survey | OPEN SOURCE | 12+ | Literature | Keyword generation, LQS scoring, citation depth classification, venue upgrade |
| - paper_structure | OPEN SOURCE | 6+ | Writing | Section outline, paragraph flow, cross-reference consistency, taxonomy design |
| - experiment_design | OPEN SOURCE | 2 | Experiment | Horizon scaling study design, CL×SI interaction experiment design |
| - figures_tables | OPEN SOURCE | 8+ | Writing | Figure layout, table formatting, caption generation, visualization standards |
| - peer_review_simulation | OPEN SOURCE | 14 | Review | Multi-persona scoring (5 reviewer types), iterative fix cycles (V1→V5) |
| Internal Skills (not publicly available) | | | | |
| search_agent | #5 | 12+ | Literature & Verify | arXiv search, citation verification, DBLP cross-check, acceptance status lookup |
| call_api | #2 | 8+ | Review & Experiment | Multi-model peer review (3–5 reviewers × 5 rounds), horizon scaling experiment (3300 API calls) |
| static_file_service | #6 | 4 | Deploy | PDF hosting, index.html generation, service restart |
| skill-router | #57 | 3 | Orchestration | Dynamic skill matching for sub-tasks (literature, experiment, deployment) |
| Deli_AutoResearch | * | 3 | Orchestration | Master framework: anti-loop, heartbeat, state management, multi-track coordination |
| Total skill invocations | | 68+ | | Across 3 papers × 5+ review rounds × multi-stage pipeline |

paper_writing is the open-source skill group containing 5 sub-skills. Skills with IDs (#2, #5, #6, #57) depend on internal infrastructure and are not publicly available.
* Deli_AutoResearch is still actively iterating and does not have a stable public release yet.

The paper_writing skill is open source. Other skills in the pipeline are internal.

View Open Source Skill: paper_writing →