# Task: Refactor the Analysis Module Context Mechanism (Merging Two-Stage Selection with Unified I/O Binding)

**Status**: In design (Finalizing)
**Date**: 2025-11-27
**Priority**: High
**Owner**: @User / @Assistant

## 1. Core Idea: Decoupling Intent from Implementation

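Our thinking went through three stages, which now need to be merged into one coherent system:
1.  **Context Projection**: A module needs to "project" the data it requires out of the global context.
2.  **Two-Stage Selection**: This projection splits into two stages, "selection" (what do I need?) and "analysis" (how do I process it?), and both are driven by a prompt/model.
3.  **Unified I/O Binding**: Modules should not deal with physical paths; the Orchestrator is responsible for I/O binding.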

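**Merged approach**:
*   **The module defines intent**: Through its configuration (prompt/rules), the module describes what input it needs (for example, "I need last year's financial data" or "match files with this glob").
*   **The Orchestrator performs resolution**: The Orchestrator (with the help of the IO Binder) computes concrete **physical path** bindings from the module's intent and the current state of the global context.
*   **The module performs execution**: The module receives the physical paths from the Orchestrator, then reads, analyzes, and writes.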

## 2. Architecture

### 2.1. Module Configuration: Describing "What I Need"

`AnalysisModuleConfig` keeps its two-stage structure, but the input/context selector here describes a **logical requirement**, not a location.

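```rust
use serde::{Deserialize, Serialize};

// `ContextSelectorConfig` and `LlmConfig` are defined alongside this struct
// in common-contracts (`configs.rs`).
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AnalysisModuleConfig {
    pub id: String,

    // Phase 1: input intent (what data do I need?)
    // Manual: explicit rules (e.g., "financials/*.json")
    // Auto: a vague requirement; the Orchestrator/agent infers the inputs
    // Hybrid: a concrete prompt (e.g., "Find all news about 'Environment' from last year")
    pub context_selector: ContextSelectorConfig,

    // Phase 2: analysis intent (how should the data be processed?)
    pub analysis_prompt: String,
    pub llm_config: Option<LlmConfig>,

    // Output intent (what is the result?)
    // The module only declares the kind of result it produces;
    // the Orchestrator assigns the physical path.
    pub output_type: String, // e.g., "markdown_report", "json_summary"
}
```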
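
The selector and LLM types referenced by `AnalysisModuleConfig` are listed field by field in the checklist in section 5. A dependency-free sketch of them follows, assuming `rules` holds glob strings, `prompt` is optional, and `parameters` is a string map. Serde derives are left out so it compiles on its own, and the `validate` helper is hypothetical, only illustrating which fields each mode relies on.

```rust
use std::collections::BTreeMap;

/// How a module expresses its input intent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SelectionMode {
    Manual, // explicit glob rules
    Auto,   // inferred by the Orchestrator / an agent
    Hybrid, // guided by a natural-language selection prompt
}

#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub model_id: String,
    // Assumption: free-form parameters such as "temperature" -> "0.2".
    pub parameters: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ContextSelectorConfig {
    pub mode: SelectionMode,
    pub rules: Vec<String>,            // used by Manual, e.g. "financials/*.json"
    pub prompt: Option<String>,        // used by Hybrid
    pub llm_config: Option<LlmConfig>, // model for the selection stage (Auto/Hybrid)
}

impl ContextSelectorConfig {
    /// Hypothetical sanity check: each mode must carry the fields it relies on.
    pub fn validate(&self) -> Result<(), String> {
        match self.mode {
            SelectionMode::Manual if self.rules.is_empty() => {
                Err("Manual mode requires at least one rule".to_string())
            }
            SelectionMode::Hybrid if self.prompt.is_none() => {
                Err("Hybrid mode requires a selection prompt".to_string())
            }
            _ => Ok(()),
        }
    }
}
```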

### 2.2. Orchestrator Runtime: Resolving "Where"

Before dispatching a task, the Orchestrator runs a **Resolution Step**.

*   **Manual selector**:
    *   The Orchestrator matches the rules (globs) against the files in the current VGCS head commit.
    *   It produces concrete `InputBindings` (`Map<FileName, PhysicalPath>`).
*   **Auto/Hybrid selector**:
    *   **This is where the ideas merge**: the Orchestrator (or a dedicated Resolution Agent) runs a lightweight LLM task.
    *   Input: the current VGCS directory tree plus the module's selection prompt (or Auto strategy).
    *   Output: a list of concrete VGCS file paths.
    *   The Orchestrator packages these paths into `InputBindings`.
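
For a Manual selector, resolution reduces to matching glob rules against the files in the pinned commit. A minimal sketch, assuming the VGCS tree has already been flattened into slash-separated paths (for example by a recursive `list_dir` walk). It supports only `*` (any characters inside one segment) and `**` (zero or more whole segments); the function names are hypothetical, and a real resolver could equally use an existing glob crate.

```rust
/// Matches one path segment against a pattern segment, where `*` matches
/// any run of characters (including none) within that segment.
fn match_segment(pat: &[u8], s: &[u8]) -> bool {
    match (pat.first(), s.first()) {
        (None, None) => true,
        (Some(b'*'), _) => {
            match_segment(&pat[1..], s) || (!s.is_empty() && match_segment(pat, &s[1..]))
        }
        (Some(p), Some(c)) if p == c => match_segment(&pat[1..], &s[1..]),
        _ => false,
    }
}

/// Matches a whole path, where a `**` pattern segment matches zero or more segments.
fn match_path(pat: &[&str], path: &[&str]) -> bool {
    match pat.first() {
        None => path.is_empty(),
        Some(seg) if *seg == "**" => {
            match_path(&pat[1..], path) || (!path.is_empty() && match_path(pat, &path[1..]))
        }
        Some(seg) => {
            !path.is_empty()
                && match_segment(seg.as_bytes(), path[0].as_bytes())
                && match_path(&pat[1..], &path[1..])
        }
    }
}

/// Resolves Manual-mode rules against the file list of the pinned commit.
/// Returns matches in tree order; each path appears at most once.
pub fn resolve_manual(rules: &[&str], tree: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for path in tree {
        let segs: Vec<&str> = path.split('/').collect();
        let hit = rules.iter().any(|rule| {
            let pat: Vec<&str> = rule.split('/').collect();
            match_path(&pat, &segs)
        });
        if hit {
            out.push(path.to_string());
        }
    }
    out
}
```

Because each tree entry is tested once, a file matched by several rules still appears only once in the bindings.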

### 2.3. Module Execution: Performing the "Transformation"

When the module actually runs (the worker receives a command), it sees an **already-resolved**, fully determined world.

```rust
use uuid::Uuid;

// The final command sent to the worker.
// `LlmConfig` is defined in common-contracts (`configs.rs`).
pub struct GenerateReportCommand {
    pub request_id: Uuid,
    pub commit_hash: String, // pinned world state

    // Concrete I/O bindings (already resolved by the Orchestrator)
    pub input_bindings: Vec<String>, // e.g., ["raw/tushare/AAPL/financials.json", ...]
    pub output_path: String,         // e.g., "analysis/financial_v1/report.md"

    // Analysis logic (passed through to the worker unchanged)
    pub analysis_prompt: String,
    pub llm_config: Option<LlmConfig>,
}
```

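**What changes**:
*   **Complex selection logic moves up**: The `Select_Smart` logic originally planned for the worker fits better as an Orchestrator preprocessing step (or as a separate micro-task).
*   **The worker gets lighter**: The worker becomes deliberately "dumb" and only does `Read(paths) -> Expand -> Prompt -> Write(output_path)`. This is what lets a module focus solely on its core task.
*   **Flexibility is preserved**: In Auto/Hybrid mode, the Orchestrator decides the input bindings dynamically; in Manual mode, they come from static rule resolution. Either way, the worker always receives a definite list of files.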
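
The worker's `Read(paths) -> Expand -> Prompt -> Write(output_path)` loop can be sketched against an abstract store. Everything here is illustrative: `ContextStore` and `MemStore` are hypothetical stand-ins for a VGCS checkout (not the `workflow-context` API), and the expander and LLM call are passed in as closures so the sketch runs without a model.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a VGCS checkout pinned at `commit_hash`.
pub trait ContextStore {
    fn read_file(&self, path: &str) -> Result<String, String>;
    fn write_file(&mut self, path: &str, content: &str) -> Result<(), String>;
}

/// In-memory store used only to make the sketch runnable.
#[derive(Default)]
pub struct MemStore {
    pub files: HashMap<String, String>,
}

impl ContextStore for MemStore {
    fn read_file(&self, path: &str) -> Result<String, String> {
        self.files.get(path).cloned().ok_or_else(|| format!("missing input: {path}"))
    }
    fn write_file(&mut self, path: &str, content: &str) -> Result<(), String> {
        self.files.insert(path.to_string(), content.to_string());
        Ok(())
    }
}

/// The "dumb" worker: no selection logic, only the resolved bindings.
pub fn run_worker<S: ContextStore>(
    store: &mut S,
    input_bindings: &[String],
    output_path: &str,
    analysis_prompt: &str,
    expand: impl Fn(&str, &str) -> String, // (path, raw content) -> LLM-friendly text
    llm: impl Fn(&str) -> String,           // stands in for the real LLM client
) -> Result<(), String> {
    // Read + Expand: fail fast if any bound input is missing.
    let mut context = String::new();
    for path in input_bindings {
        let raw = store.read_file(path)?;
        context.push_str(&expand(path.as_str(), raw.as_str()));
        context.push('\n');
    }
    // Prompt: the analysis prompt followed by the expanded inputs.
    let prompt = format!("{analysis_prompt}\n\n{context}");
    // Write: the Orchestrator already decided where the result goes.
    store.write_file(output_path, &llm(prompt.as_str()))
}
```

A missing input aborts the run before anything is written, so a bad binding cannot produce a partial report.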

## 3. Implementation Roadmap (Revised)

### Phase 1: Contracts and Configuration
1.  Define `AnalysisModuleConfig` (selector, prompt, `LlmConfig`).
2.  Define `GenerateReportCommand` (the `input_bindings` list of physical paths, `output_path`, `commit_hash`).

### Phase 2: Orchestrator Resolution Logic
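1.  Implement the `ContextResolver` component:
    *   Support glob resolution (Manual).
    *   (Later) Support LLM reasoning over the directory tree (Auto/Hybrid).
2.  In the scheduling loop, call `ContextResolver` before building the command.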
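
At its core, `ContextResolver` is a dispatch on the selection mode. A minimal sketch, assuming the Manual matcher is injected as a `(rule, path) -> bool` function (the glob logic itself is omitted to keep this self-contained) and that the unimplemented Auto/Hybrid branch returns an explicit error rather than an empty list, one of the two options the checklist allows. `ResolveError` and its `NoRules` case are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SelectionMode {
    Manual,
    Auto,
    Hybrid,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    NoRules,                       // Manual selector without any rule
    NotImplemented(SelectionMode), // LLM-based resolution not wired in yet
}

/// Hypothetical shape of `resolve_input`: `tree` is the file list of the pinned
/// commit, `matches` is whatever glob matcher the Manual branch uses.
pub fn resolve_input(
    mode: SelectionMode,
    rules: &[&str],
    tree: &[&str],
    matches: impl Fn(&str, &str) -> bool, // (rule, path) -> matched?
) -> Result<Vec<String>, ResolveError> {
    match mode {
        SelectionMode::Manual => {
            if rules.is_empty() {
                return Err(ResolveError::NoRules);
            }
            Ok(tree
                .iter()
                .filter(|path| rules.iter().any(|rule| matches(*rule, **path)))
                .map(|path| path.to_string())
                .collect())
        }
        // Placeholder until directory-tree reasoning with an LLM lands.
        SelectionMode::Auto | SelectionMode::Hybrid => Err(ResolveError::NotImplemented(mode)),
    }
}
```

Failing loudly on Auto/Hybrid keeps a misconfigured module from being dispatched with zero inputs and silently producing an empty report.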

### Phase 3: Module Refactor
1.  **Provider**: Receives `output_path` (generated by the Orchestrator by convention, e.g., `raw/{provider}/{symbol}`) and writes to it.
2.  **Generator**:
    *   Remove all selection logic.
    *   Read the files in `cmd.input_bindings` directly.
    *   Run the Expander (JSON -> table, etc.).
    *   Run the prompt.
    *   Write to `cmd.output_path`.
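
On the provider side, the write location comes from the command. A minimal sketch, assuming the command carries a `symbol` field next to the `output_path: Option<String>` from the checklist, and that an absent path falls back to the `raw/{provider}/{symbol}` convention; the fallback and the helper names are assumptions, not confirmed behavior.

```rust
/// Simplified stand-in for FetchCompanyDataCommand (other fields omitted;
/// `symbol` is assumed).
pub struct FetchCompanyDataCommand {
    pub symbol: String,
    pub output_path: Option<String>,
}

/// Convention the Orchestrator is expected to apply: raw/{provider}/{symbol}.
pub fn default_provider_path(provider: &str, symbol: &str) -> String {
    format!("raw/{provider}/{symbol}")
}

/// Provider side: trust the command; fall back to the convention only when no
/// path was bound (assumption: useful while older commands are still in flight).
pub fn target_file(cmd: &FetchCompanyDataCommand, provider: &str, file_name: &str) -> String {
    let dir = cmd
        .output_path
        .clone()
        .unwrap_or_else(|| default_provider_path(provider, &cmd.symbol));
    format!("{}/{}", dir.trim_end_matches('/'), file_name)
}
```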

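## 4. Summary
This design brings together the ideas from our discussion:
*   **Input/output symmetry**: Both are bound explicitly in the command.
*   **Two-stage**:
    *   Stage 1 (selection) happens at **orchestration time** (resolving bindings).
    *   Stage 2 (analysis) happens at **execution time** (when the worker runs).
*   **Module focus**: A module does not need to know where to look; it only knows "give me these files and I'll give you that result."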

## 5. Implementation Checklist

### Phase 1: Contracts & Configs
- [x] **Common Contracts**: Create or update `configs.rs` in `services/common-contracts/src`.
    - [x] 定义 `SelectionMode` (Manual, Auto, Hybrid)。
    - [x] 定义 `LlmConfig` (model_id, parameters)。
    - [x] 定义 `ContextSelectorConfig` (mode, rules, prompt, llm_config)。
    - [x] 定义 `AnalysisModuleConfig` (id, selector, analysis_prompt, llm_config, output_type)。
- [x] **Messages**: Update `services/common-contracts/src/messages.rs`.
    - [x] `GenerateReportCommand`: add `commit_hash`, `input_bindings: Vec<String>`, `output_path: String`, `llm_config`.
    - [x] `FetchCompanyDataCommand`: add `output_path: Option<String>`.
- [x] **VGCS Types**: Ensure the types in the `workflow-context` crate are sufficient for path operations. (Confirmed: the `Vgcs` struct has the needed methods.)

### Phase 2: Orchestrator Refactor (Resolution Logic)
- [x] **Context Resolver**: Create `context_resolver.rs` in `workflow-orchestrator-service`.
    - [x] Implement `resolve_input(selector, vgcs_client, commit_hash) -> Result<Vec<String>>`.
    - [x] `Manual` mode: implement glob matching (recursive lookup via VGCS `list_dir`).
    - [x] `Auto/Hybrid` mode: (interface stub for now) return an empty list or NotImplemented; wire in the LLM later.
- [x] **IO Binder**: Implement `io_binder.rs`.
    - [x] Implement the convention-based `allocate_output_path(task_type, task_id) -> String`.
- [x] **Scheduler**: Update `dag_scheduler.rs`.
    - [x] Before dispatching a task, call `ContextResolver` and `IOBinder`.
    - [x] Fill the resolved results into the command.
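
The IO Binder and the scheduler hook fit together roughly as follows. This sketch assumes `task_type` carries the module's `output_type` and uses a hypothetical `analysis/{task_id}/<file>` layout chosen to reproduce the example path `analysis/financial_v1/report.md`. `ReportCommandDraft` is a simplified stand-in for `GenerateReportCommand` (no `request_id` or `llm_config`), and resolution is injected as a closure.

```rust
/// Hypothetical convention: one directory per task, file name from the output type.
pub fn allocate_output_path(task_type: &str, task_id: &str) -> String {
    let file = match task_type {
        "markdown_report" => "report.md",
        "json_summary" => "summary.json",
        _ => "output.txt",
    };
    format!("analysis/{task_id}/{file}")
}

/// Simplified stand-in for GenerateReportCommand.
#[derive(Debug, PartialEq)]
pub struct ReportCommandDraft {
    pub commit_hash: String,
    pub input_bindings: Vec<String>,
    pub output_path: String,
    pub analysis_prompt: String,
}

/// Scheduler hook: resolve inputs and bind the output before dispatch.
pub fn build_command(
    task_id: &str,
    output_type: &str,
    analysis_prompt: &str,
    commit_hash: &str,
    resolve: impl Fn(&str) -> Result<Vec<String>, String>, // commit_hash -> input paths
) -> Result<ReportCommandDraft, String> {
    let input_bindings = resolve(commit_hash)?;
    if input_bindings.is_empty() {
        // Assumption: dispatching with no inputs is treated as a resolution failure.
        return Err(format!("task {task_id}: no inputs resolved"));
    }
    Ok(ReportCommandDraft {
        commit_hash: commit_hash.to_string(),
        input_bindings,
        output_path: allocate_output_path(output_type, task_id),
        analysis_prompt: analysis_prompt.to_string(),
    })
}
```

Because the command is built only after both steps succeed, the worker never sees a partially bound command.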

### Phase 3: Write-Side Refactor (Provider Adaptation)
- [x] **Tushare Provider**: Update `services/tushare-provider-service/src/generic_worker.rs`.
    - [x] Read `output_path` from the command (if present).
    - [x] Use `WorkerContext` to write data to the given path (no more hardcoded `raw/tushare/...`; trust the command instead).
    - [x] Commit and return the new commit hash.

### Phase 4: Read-Side Refactor (Report Generator Adaptation)
- [x] **Worker Refactor**: Rewrite `services/report-generator-service/src/worker.rs`.
    - [x] **Remove**: Delete `fetch_data_and_configs` (the old DB read logic).
    - [x] **Checkout**: Use `vgcs.checkout(cmd.commit_hash)`.
    - [x] **Read Input**: Iterate over `cmd.input_bindings` and read each file with `vgcs.read_file`.
    - [x] **Expand**: Implement a simple `Expander` (JSON -> Markdown table).
    - [x] **Prompt**: Render `cmd.analysis_prompt`.
    - [x] **LLM Call**: Initialize the client from `cmd.llm_config` and call it.
    - [x] **Write Output**: Write the result to `cmd.output_path`.
    - [x] **Commit**: Commit the changes and broadcast an event.
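
The `Expander` turns structured inputs into text an LLM reads well. A minimal sketch, assuming the JSON has already been parsed into flat string records (a JSON library such as `serde_json` would handle parsing in the real service) and that columns come from the first record. The `{{context}}` placeholder in `render_prompt` is a hypothetical template convention, not one the design specifies.

```rust
use std::collections::BTreeMap;

/// One flat JSON object, e.g. {"period": "2024", "revenue": "100"}.
pub type Record = BTreeMap<String, String>;

/// Renders records as a Markdown table. Columns come from the first record
/// (BTreeMap keeps them sorted); missing cells are left empty and `|` is escaped.
pub fn records_to_markdown(records: &[Record]) -> String {
    let Some(first) = records.first() else {
        return String::new();
    };
    let cols: Vec<&String> = first.keys().collect();
    let header: Vec<&str> = cols.iter().map(|c| c.as_str()).collect();
    let mut out = format!("| {} |\n", header.join(" | "));
    out.push_str(&format!("|{}\n", "---|".repeat(cols.len())));
    for r in records {
        let cells: Vec<String> = cols
            .iter()
            .map(|c| r.get(*c).map(|v| v.replace('|', "\\|")).unwrap_or_default())
            .collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
    }
    out
}

/// Hypothetical prompt rendering: substitute the expanded context into the template.
pub fn render_prompt(template: &str, context: &str) -> String {
    template.replace("{{context}}", context)
}
```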

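### Phase 5: Integration and Verification
- [x] **Config Migration**: Update `config/analysis-config.json` (or the config stored in the DB) to the new `AnalysisModuleConfig` structure.
- [ ] **End-to-End Test**: Run the full flow and verify that:
    1. The provider writes files to Git.
    2. The Orchestrator resolves the paths.
    3. The generator reads the files and produces the report.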