
Sub-skills

01

Literature Survey

4-stage pipeline: Recall → Score (LQS) → Classify (A/B/C/D) → Upgrade (arXiv→accepted).
IN: topic + taxonomy keywords
OUT: references.bib + citation_plan.jsonl
Stage 1: High-Recall Retrieval
  • 20-30 keyword queries via search.py -o "site:arxiv.org ..."
  • Each taxonomy cell: 3+ query variants (core terms, synonyms, method names)
  • Snowball: seed paper citation networks
  • Target: 200-500 raw candidates
Stage 2: LQS Multi-Dimensional Scoring
| Dimension | Weight | Scoring |
|---|---|---|
| Recency | 30% | 6mo=10, 1yr=8, 2yr=5, 3yr=3 |
| Citation Impact | 25% | cites/mo: ≥50=10, ≥10=8, ≥3=6 |
| Venue | 20% | Top-tier=10, Strong=7, Workshop=4 |
| Institution | 10% | Top lab=10, Top uni=9 |
| Acceptance | 15% | Accepted=10, Under review=5, None=3 |

Thresholds: LQS≥7.0 must-cite, 5.0-7.0 conditional, <5.0 drop
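
The weights and thresholds above can be sketched as a small scoring function. This is a minimal illustration that assumes LQS is a plain weighted sum of the five dimension scores (the page gives the weights but not the aggregation formula) and that a score of exactly 5.0 falls in the conditional band. The names `LQS_WEIGHTS`, `lqs`, and `decision` are hypothetical.

```python
# Weights from the Stage 2 table (they sum to 1.0).
LQS_WEIGHTS = {
    "recency": 0.30,
    "citation_impact": 0.25,
    "venue": 0.20,
    "institution": 0.10,
    "acceptance": 0.15,
}


def lqs(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each on a 0-10 scale (assumed formula)."""
    return sum(LQS_WEIGHTS[dim] * scores[dim] for dim in LQS_WEIGHTS)


def decision(score: float) -> str:
    """Map an LQS value to the threshold bands; 7.0 itself counts as must-cite."""
    if score >= 7.0:
        return "must-cite"
    if score >= 5.0:
        return "conditional"
    return "drop"


# Hypothetical paper: 1 year old (8), >=10 cites/mo (8), top-tier venue (10),
# top university (9), accepted (10).
example = {"recency": 8, "citation_impact": 8, "venue": 10,
           "institution": 9, "acceptance": 10}
print(round(lqs(example), 2), decision(lqs(example)))
```

For the hypothetical paper the weighted sum is 2.4 + 2.0 + 2.0 + 0.9 + 1.5 = 8.8, which lands in the must-cite band.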

Stage 3: Citation Depth Classification
  • A-level (1-3 paragraphs): section protagonist, 3-5 per chapter
  • B-level (2-5 sentences): important insight, 5-10 per chapter
  • C-level (1 sentence): supporting evidence
  • D-level: dropped, not cited
Stage 4: Venue Upgrade
  • Cross-check DBLP + OpenReview for acceptance status
  • arXiv with "Accepted at X" → @inproceedings
  • Target: arXiv-only ratio ≤ 60%
Verification
  • Every 20 citations: title match, author, year, venue check
  • Target: verification rate ≥80%, hallucinated = 0
  • Year distribution: within-1yr ≥40%, accepted ≥30%
02

Paper Structure & Logic

Chapter architecture, paragraph logic chains, taxonomy design, formal claims, hedge language, abstract-conclusion alignment.
IN: bib + experiment findings
OUT: sections/*.tex (full manuscript)
Chapter Architecture (Survey Standard)
  • §1 Introduction: Hook → Gap → Contributions → Roadmap
  • §2 Background: formal definitions, taxonomy overview
  • §3-6 Core: one method family per chapter, with critical assessment
  • §7 Benchmarks + Experiments
  • §8 Future: specific open problems (Barrier + Attack vector)
  • §9 Conclusion: numbered key findings (not a repeat of the abstract)
Paragraph Logic Patterns
| Pattern | Structure | Use Case |
|---|---|---|
| Claim-Evidence-Implication | Assert → Data → So what | Main body |
| Compare-Contrast | A → B → Difference → Trade-off | Method comparison |
| Concession-Rebuttal | Admit strength → But limitation | Critical analysis |
| Funnel | Broad → Narrow → This paper | Introduction |
Taxonomy Design
  • Multi-axis matrix (not flat list)
  • MECE: mutually exclusive, collectively exhaustive
  • Must have empty cells → gap analysis material
  • Spanning methods show taxonomy tension (good)
Formal Claims
  • Default: Conjecture + Remark (not Theorem)
  • Hedge ladder: demonstrates > suggests > may > hypothesize
  • Rule: claim strength ≤ evidence strength
Related Work Differentiation
  • Mandatory comparison table with existing surveys
  • "We're more recent" is NOT sufficient differentiation
  • Need structural novelty: new taxonomy, new angle, new experiment
03

Experiment Design

4-stage loop: Design (hypothesis) → Execute (API/GPU) → Iterate (adjust) → Report (structured JSON).
IN: conjecture or gap
OUT: results.json + experiment_summary.md
Stage 1: Design (Most Important)
  • Must answer: "which paper claim does this support?"
  • Experiment spec: hypothesis, independent/dependent vars, control vars, expected results
  • Statistical plan decided BEFORE running (no HARKing)
  • Principles: falsifiable, minimal first, pre-registered, has control
Stage 2: Execute
| Path | Scale | Use Case |
|---|---|---|
| Path A: API | Hours, lightweight | Multi-model comparison, prompt ablation |
| Path B: GPU RL | Days, heavyweight | Agent training, reward shaping |
  • API: 3-5 frontier models × 2-3 conditions × 15-25 tasks × 3 trials
  • GPU: via cluster job submission + auto-monitoring loop
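
To get a feel for the size of the API path, the run grid can be enumerated directly. The sketch below is illustrative only: `run_grid` and the placeholder identifiers are hypothetical, and only the counts (3-5 models, 2-3 conditions, 15-25 tasks, 3 trials) come from the bullet above.

```python
from itertools import product


def run_grid(models, conditions, tasks, trials=3):
    """Enumerate every (model, condition, task, trial) combination."""
    return [
        {"model": m, "condition": c, "task": t, "trial": k}
        for m, c, t, k in product(models, conditions, tasks, range(trials))
    ]


# Placeholder identifiers; only the counts come from the text above.
smallest = run_grid(
    models=[f"model-{i}" for i in range(3)],
    conditions=["baseline", "treatment"],
    tasks=[f"task-{i}" for i in range(15)],
)
largest_count = 5 * 3 * 25 * 3
print(len(smallest), largest_count)  # 270 1125
```

Under these assumptions a single API experiment costs between 270 and 1,125 calls, which helps when budgeting before the design is frozen.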
Stage 3: Iterate
  • Ceiling effect → increase difficulty
  • Floor effect → decrease difficulty or check for bugs
  • Not significant → increase trials or change hypothesis
  • Surprise finding → design follow-up
  • Max 5 iterations, then accept best result
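
The iteration rules above amount to a small decision procedure. The following sketch assumes scores normalized to [0, 1] and uses hypothetical cut-offs (`ceiling=0.95`, `floor=0.05`, `alpha=0.05`) that the page does not specify. It also leaves out the "surprise finding" branch, which needs human judgment.

```python
def next_action(mean_score, p_value, iteration, *,
                ceiling=0.95, floor=0.05, alpha=0.05, max_iterations=5):
    """Pick the next step of the Stage 3 loop.

    iteration counts completed iterations; all thresholds are assumptions.
    """
    if iteration >= max_iterations:
        return "accept best result"
    if mean_score >= ceiling:
        return "increase difficulty"            # ceiling effect
    if mean_score <= floor:
        return "decrease difficulty or check for bugs"  # floor effect
    if p_value >= alpha:
        return "increase trials or change hypothesis"   # not significant
    return "report"


print(next_action(0.97, 0.01, iteration=1))  # increase difficulty
print(next_action(0.60, 0.20, iteration=2))  # increase trials or change hypothesis
print(next_action(0.60, 0.20, iteration=5))  # accept best result
```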
Stage 4: Report (Data Only)
  • Output: results.json (schema: config + results + statistics + findings)
  • Output: experiment_summary.md (purpose, results, limitations)
  • Does NOT produce LaTeX tables or figures — that's the Figures skill's job
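
As an illustration of the four-part schema (config + results + statistics + findings), the sketch below assembles a report from raw trial scores. Only the four top-level keys come from the page. The nested field names, the helper `build_report`, and all values are hypothetical placeholders, not real measurements.

```python
import json
from statistics import mean, stdev


def build_report(config, trial_scores, findings):
    """Assemble a results.json-style dict; nested layout is an assumption."""
    statistics = {
        condition: {"mean": mean(scores), "std": stdev(scores), "n": len(scores)}
        for condition, scores in trial_scores.items()
    }
    return {
        "config": config,
        "results": trial_scores,
        "statistics": statistics,
        "findings": findings,
    }


# Placeholder values for illustration only.
report = build_report(
    config={"hypothesis": "H1 (placeholder)", "trials": 3},
    trial_scores={"baseline": [0.5, 0.75, 1.0], "treatment": [1.0, 1.0, 1.0]},
    findings=["placeholder finding"],
)
print(json.dumps(report, indent=2))
```

Keeping the raw per-trial scores next to the summary statistics lets the Figures skill recompute mean ± std without rerunning anything.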
04

Academic Figures & Tables

High information-density tables and vector figures. Presentation layer for all data in the paper.
IN: results.json + section placeholders
OUT: figures/*.pdf + tables/*.tex
Table Types
| Type | Use | Info Density |
|---|---|---|
| Comparison Matrix | Methods × features | Very high |
| Benchmark Table | Models × metrics | High |
| Ablation Table | Conditions × results | High |
| Taxonomy Table | Classification visualization | Medium |
| Meta-analysis | Aggregated cross-paper data | Very high |
Table Rules
  • No vertical lines — booktabs three-line style only
  • Alternating row color: \rowcolor{gray!6}
  • Bold best results in each column
  • All experimental data: mean ± std
  • Caption must contain key finding, not just description
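
The table rules above can be applied mechanically when generating LaTeX from results. The sketch below is one possible generator, not the skill's actual implementation: `booktabs_table` is a hypothetical helper, it assumes higher values are better, and the numbers are placeholders. The emitted `\rowcolor` command requires the `colortbl` package (for example via `\usepackage[table]{xcolor}`).

```python
def booktabs_table(rows, columns, caption):
    """rows: {method: {column: (mean, std)}}; higher mean is assumed better."""
    best = {c: max(rows, key=lambda m: rows[m][c][0]) for c in columns}
    lines = [
        r"\begin{table}[t]",
        r"\centering",
        rf"\caption{{{caption}}}",
        r"\begin{tabular}{l" + "c" * len(columns) + "}",  # no vertical lines
        r"\toprule",
        "Method & " + " & ".join(columns) + r" \\",
        r"\midrule",
    ]
    for i, (method, cells) in enumerate(rows.items()):
        if i % 2 == 1:
            lines.append(r"\rowcolor{gray!6}")  # alternating row color
        formatted = []
        for c in columns:
            m, s = cells[c]
            text = rf"{m:.1f} $\pm$ {s:.1f}"  # mean ± std
            formatted.append(rf"\textbf{{{text}}}" if best[c] == method else text)
        lines.append(method + " & " + " & ".join(formatted) + r" \\")
    lines += [r"\bottomrule", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


# Placeholder numbers for illustration only.
table = booktabs_table(
    rows={"Method A": {"Acc": (71.2, 1.3)}, "Method B": {"Acc": (74.5, 0.9)}},
    columns=["Acc"],
    caption="Method B outperforms Method A on accuracy (placeholder data).",
)
print(table)
```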
Figure Types & Tools
  • Data-driven (curves, bars, heatmaps): matplotlib → PDF
  • Architecture/flow diagrams: TikZ or SVG→PDF
  • Simple schematics: PIL → PNG (acceptable per reviewer feedback)
  • Priority: TikZ > matplotlib PDF > SVG→PDF > PIL PNG
Quality Checklist
  • Vector format (PDF) preferred, PNG ≥ 300 DPI
  • Font size ≥ 10pt after scaling
  • Academic palette: blue #2196F3, red #F44336, green #4CAF50, orange #FF9800
  • All axes labeled, all lines have legend
  • Light grid (alpha=0.3) for readability
  • Self-contained: understandable without reading main text
Quantity Targets
  • Full survey (50+ pages): ≥10 tables, ≥6 figures
  • Short survey (30 pages): ≥5 tables, ≥3 figures
05

Peer Review Simulation

Multi-persona scoring that drives the iteration loop by routing weaknesses back to sub-skills #1-4.
IN: compiled PDF
OUT: score + weakness list → routed to corresponding sub-skill
Reviewer Personas (3-5 per round)
| Persona | Focus | Scoring Weight |
|---|---|---|
| R1 Experimentalist | Statistical rigor, baselines, replication | Experimental 30% |
| R2 Theorist | Formal definitions, proofs, MECE taxonomy | Technical depth 35% |
| R3 Perfectionist | Writing quality, figures, formatting | Clarity 30% |
| R4 Synthesizer | Cross-cutting analysis, gap identification | Novelty 25% |
| R5 Newcomer | Accessibility, definitions, examples | Clarity 35% |
Scoring Protocol
  • Each reviewer scores independently (no anchoring)
  • Final score = median of all reviewers
  • Dimensions: Novelty, Comprehensiveness, Clarity, Technical Depth, Experimental Validation
  • Calibration: 6.0=workshop, 7.0=main conference, 8.0=Strong Accept (top 20%), 9.0=Oral
Anti-Inflation Rules
  • First round score capped at 7.0 (every paper has room to improve)
  • Max +1.5 per round
  • At least 1 "unresolved" weakness must remain
  • At least 1 reviewer per round uses a different LLM (diversity)
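
The median rule and the two numeric anti-inflation caps combine into one small function. This is a sketch under the assumption that the caps are applied to the median after aggregation. `round_score` and its parameter names are hypothetical.

```python
from statistics import median


def round_score(reviewer_scores, previous=None, *,
                first_round_cap=7.0, max_gain=1.5):
    """Median of independent reviewer scores with anti-inflation caps applied."""
    score = median(reviewer_scores)
    if previous is None:                      # first round: cap at 7.0
        return min(score, first_round_cap)
    return min(score, previous + max_gain)    # later rounds: at most +1.5


print(round_score([6.5, 7.5, 8.0]))                # 7.0 (median 7.5, capped)
print(round_score([8.0, 9.0, 8.5], previous=6.0))  # 7.5 (median 8.5, capped)
print(round_score([6.0, 6.5, 7.0], previous=6.0))  # 6.5 (no cap needed)
```

Using the median rather than the mean keeps a single outlier persona from moving the round score.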
Output Format
  • Overall score + per-dimension scores
  • 3-5 Strengths, 3-5 Weaknesses (prioritized Major/Minor)
  • Concrete suggestions (actionable)
  • Recommendation: Accept / Weak Accept / Borderline / Reject
  • Regression check: are previously-fixed weaknesses still fixed?

Workflow and Phase Routing

Phase 0: Topic Selection (before pipeline starts)
  • 3-question test: Scope? Angle? Audience?
Phase 1: Draft (Iter 1-6, target: 6.0/10)
  • Iter 1 [Structure] skeleton + §1-2 + compile
  • Iter 2 [Literature] Stage 1-2: recall + LQS scoring
  • Iter 3 [Structure] §3-6 core || [Figures] 2+ figures
  • Iter 4 [Literature] Stage 3-4 || [Structure] §7-8
  • Iter 5 verify citations → compile → [Review] first score
  • Iter 6 route fixes → compile
Phase 2: Deep Improvement (Iter 7-9, target: 7.5-8.0)
  • Iter 7 [Experiment] design + execute
  • Iter 8 [Figures] present results + [Structure] integrate
  • Iter 9 compile → [Review] → route fixes
Phase 3: Sprint (Iter 10+, target: 8.5+)
  • Loop: [Review] → weakness routing → fix → compile → [Review]
  • Stop: score ≥ 8.5 OR Δ ≤ 0.3 for 2 rounds OR iter > 12
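
The Phase 3 stop rule can be written as a predicate over the score history. The sketch below assumes that Δ means the score change between consecutive review rounds and that "for 2 rounds" means the two most recent changes. `should_stop` and its parameters are hypothetical names.

```python
def should_stop(history, iteration, *,
                target=8.5, min_delta=0.3, patience=2, max_iter=12):
    """history: review scores in order, newest last; iteration: current index."""
    if history and history[-1] >= target:
        return True                       # score >= 8.5
    if iteration > max_iter:
        return True                       # iter > 12
    if len(history) > patience:
        recent = history[-(patience + 1):]
        deltas = [b - a for a, b in zip(recent, recent[1:])]
        return all(d <= min_delta for d in deltas)  # plateau for 2 rounds
    return False


print(should_stop([7.0, 7.2, 7.3], iteration=10))  # True (plateau)
print(should_stop([7.0, 7.8, 8.0], iteration=10))  # False
print(should_stop([8.6], iteration=10))            # True (target reached)
```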

Weakness Routing Table

When peer review identifies a weakness, route it by type to the corresponding sub-skill:

| Reviewer weakness | Routed to | Fix action |
|---|---|---|
| "Citation coverage insufficient" | Literature | Stage 1-2 targeted search |
| "Too many arXiv-only refs" | Literature | Stage 4 upgrade via DBLP |
| "Missing recent papers" | Literature | 2025-2026 focused search |
| "Structure unclear" | Structure | Reorganize + add transitions |
| "Analysis lacks depth" | Structure | Add Critical Assessment |
| "Taxonomy not novel" | Structure | Redesign multi-axis |
| "Claims too strong" | Structure | Hedge language downgrade |
| "No experiments" | Experiment | Design pilot study |
| "Experiment not rigorous" | Experiment | Add trials / ablation |
| "Tables incomparable" | Figures | Regroup + add Δ column |
| "Missing visualizations" | Figures | Add figure |
| "No error bars" | Figures | Add ± std |

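The routing table maps naturally onto a lookup structure. The sketch below uses exact string matching for simplicity, which is an assumption: real reviewer comments would need keyword or semantic matching. The `route` helper and its "Unrouted" fallback are hypothetical.

```python
# Weakness -> (target sub-skill, fix action), copied from the routing table.
ROUTING = {
    "Citation coverage insufficient": ("Literature", "Stage 1-2 targeted search"),
    "Too many arXiv-only refs": ("Literature", "Stage 4 upgrade via DBLP"),
    "Missing recent papers": ("Literature", "2025-2026 focused search"),
    "Structure unclear": ("Structure", "Reorganize + add transitions"),
    "Analysis lacks depth": ("Structure", "Add Critical Assessment"),
    "Taxonomy not novel": ("Structure", "Redesign multi-axis"),
    "Claims too strong": ("Structure", "Hedge language downgrade"),
    "No experiments": ("Experiment", "Design pilot study"),
    "Experiment not rigorous": ("Experiment", "Add trials / ablation"),
    "Tables incomparable": ("Figures", "Regroup + add Δ column"),
    "Missing visualizations": ("Figures", "Add figure"),
    "No error bars": ("Figures", "Add ± std"),
}


def route(weakness: str) -> tuple[str, str]:
    """Return (sub-skill, fix action); unknown weaknesses fall back to manual triage."""
    return ROUTING.get(weakness, ("Unrouted", "triage manually"))


print(route("No error bars"))        # ('Figures', 'Add ± std')
print(route("Something unexpected"))  # ('Unrouted', 'triage manually')
```
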
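Quality Gates

Each sub-skill's output must pass its gate before entering integration. Gates 1 and 2 can run in parallel; Gate 5 is a blocking final review.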

Gate 1: Literature

  • Citations ≥ 80 (draft) / ≥ pages×3 (final)
  • Within 1yr ≥ 40%
  • Accepted ≥ 30%
  • arXiv-only ≤ 60%
  • Verification rate ≥ 80%
  • Every taxonomy cell ≥ 2 A/B refs
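
Gate 1 is a set of numeric thresholds, so it can be checked automatically. The sketch below assumes every ratio is computed against the total citation count and that the verification rate is verified citations over total. `BibStats` and `gate1_failures` are hypothetical names, and the example counts are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BibStats:
    """Hypothetical summary of references.bib; field names are illustrative."""
    total: int
    within_1yr: int
    accepted: int
    arxiv_only: int
    verified: int
    min_ab_refs_per_cell: int  # fewest A/B-level refs in any taxonomy cell


def gate1_failures(s: BibStats, pages: Optional[int] = None) -> list[str]:
    """Return the Gate 1 checks that fail; an empty list means the gate passes."""
    required = 80 if pages is None else pages * 3  # draft vs. final threshold
    checks = {
        f"citations >= {required}": s.total >= required,
        "within 1yr >= 40%": s.within_1yr >= 0.40 * s.total,
        "accepted >= 30%": s.accepted >= 0.30 * s.total,
        "arXiv-only <= 60%": s.arxiv_only <= 0.60 * s.total,
        "verification rate >= 80%": s.verified >= 0.80 * s.total,
        "every cell >= 2 A/B refs": s.min_ab_refs_per_cell >= 2,
    }
    return [name for name, ok in checks.items() if not ok]


draft = BibStats(total=100, within_1yr=45, accepted=35,
                 arxiv_only=55, verified=90, min_ab_refs_per_cell=2)
print(gate1_failures(draft))            # [] -> passes as a draft
print(gate1_failures(draft, pages=50))  # ['citations >= 150']
```

Returning the list of failed checks, rather than a single boolean, gives the routing step something concrete to send back to the Literature skill.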

Gate 2: Experiment

  • Clear hypothesis pre-registered
  • Statistical test reported (p or CI)
  • ≥ 3 trials with std
  • No ceiling/floor effect
  • Links to specific paper claim
  • (Bonus) Surprise finding

Gate 3: Structure

  • Compiles with 0 errors & 0 undefined refs
  • Every .tex file ≤ 300 lines
  • Abstract-conclusion alignment
  • Inter-section transitions present
  • Critical assessment in core sections
  • ≥ 1 formal claim (conjecture/observation)
  • Terminology consistent throughout

Gate 4: Figures & Tables

  • Tables ≥ 10, Figures ≥ 6 (full survey)
  • booktabs format, no vertical lines
  • Each carries a non-trivial insight
  • Captions contain conclusion, not just description
  • Every figure/table referenced in text
  • Experimental data has mean ± std

Gate 5: Final Review (Blocking)

Score Progression Path (validated)

| Score | New requirements vs. previous level | Typical additions |
|---|---|---|
| 6.0 | Complete draft, 80+ refs, compiles | Full 8 sections + basic tables |
| 7.0 | + logical transitions, quantitative data, gap analysis | Formal conjecture + grouped tables |
| 8.0 | + original experiment, critical assessment, 150+ refs | Multi-model pilot study + vector figures |
| 8.5 | + cross-validation, meta-analysis, key takeaways, proof sketch | Cross-benchmark table + deeper theory |

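Production Statistics

| Sub-skill | Time share | Score contribution | Key output |
|---|---|---|---|
| Literature Survey | 20% | Baseline (without it: ≤6.0) | 941 citations (across 3 papers) |
| Structure & Logic | 35% | Main driver (6.0→7.5) | 190 manuscript pages |
| Experiment Design | 20% | +1.0~1.5 points | 3,300+ API calls, 9 models |
| Figures & Tables | 10% | +0.5~1.0 points | 59+ tables, 26+ figures |
| Review + Integration | 15% | Drives iteration | 14 review rounds |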
