
feat: AI smart news filtering system + translation scope control + multiple fixes

sansan 2 months ago
parent
commit
94e907ab39
47 changed files, 4429 additions and 537 deletions
  1. 99 56
      README-EN.md
  2. 95 44
      README.md
  3. 0 1
      config/ai_analysis_prompt.txt
  4. 25 0
      config/ai_filter/extract_prompt.txt
  5. 32 0
      config/ai_filter/prompt.txt
  6. 43 0
      config/ai_filter/update_tags_prompt.txt
  7. 33 0
      config/ai_interests.txt
  8. 3 2
      config/ai_translation_prompt.txt
  9. 114 37
      config/config.yaml
  10. 0 0
      config/custom/ai/.gitkeep
  11. 0 0
      config/custom/keyword/.gitkeep
  12. 43 2
      config/timeline.yaml
  13. 8 0
      docker/.env
  14. 1 1
      docker/Dockerfile
  15. 1 1
      docker/Dockerfile.mcp
  16. 2 0
      docker/docker-compose-build.yml
  17. 2 0
      docker/docker-compose.yml
  18. 18 1
      docker/entrypoint.sh
  19. 202 40
      docker/manage.py
  20. 347 38
      docs/assets/script.js
  21. 189 66
      docs/assets/style.css
  22. 125 69
      docs/index.html
  23. 2 1
      pyproject.toml
  24. 1 1
      trendradar/__init__.py
  25. 539 52
      trendradar/__main__.py
  26. 4 0
      trendradar/ai/__init__.py
  27. 113 36
      trendradar/ai/analyzer.py
  28. 9 2
      trendradar/ai/client.py
  29. 586 0
      trendradar/ai/filter.py
  30. 6 5
      trendradar/ai/formatter.py
  31. 21 6
      trendradar/ai/translator.py
  32. 637 3
      trendradar/context.py
  33. 7 4
      trendradar/core/frequency.py
  34. 53 1
      trendradar/core/loader.py
  35. 13 2
      trendradar/core/scheduler.py
  36. 2 3
      trendradar/crawler/rss/fetcher.py
  37. 82 36
      trendradar/notification/dispatcher.py
  38. 18 5
      trendradar/notification/formatters.py
  39. 7 8
      trendradar/notification/senders.py
  40. 66 0
      trendradar/storage/ai_filter_schema.sql
  41. 63 3
      trendradar/storage/base.py
  42. 57 3
      trendradar/storage/local.py
  43. 80 2
      trendradar/storage/manager.py
  44. 108 2
      trendradar/storage/remote.py
  45. 568 0
      trendradar/storage/sqlite_mixin.py
  46. 1 1
      version
  47. 4 3
      version_configs

+ 99 - 56
README-EN.md

@@ -11,7 +11,7 @@ Deploy in <strong>30 seconds</strong> — Say goodbye to endless scrolling, only
 [![GitHub Stars](https://img.shields.io/github/stars/sansan0/TrendRadar?style=flat-square&logo=github&color=yellow)](https://github.com/sansan0/TrendRadar/stargazers)
 [![GitHub Forks](https://img.shields.io/github/forks/sansan0/TrendRadar?style=flat-square&logo=github&color=blue)](https://github.com/sansan0/TrendRadar/network/members)
 [![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg?style=flat-square)](LICENSE)
-[![Version](https://img.shields.io/badge/version-v6.0.0-blue.svg)](https://github.com/sansan0/TrendRadar)
+[![Version](https://img.shields.io/badge/version-v6.5.0-blue.svg)](https://github.com/sansan0/TrendRadar)
 [![MCP](https://img.shields.io/badge/MCP-v4.0.0-green.svg)](https://github.com/sansan0/TrendRadar)
 [![RSS](https://img.shields.io/badge/RSS-Feed_Support-orange.svg?style=flat-square&logo=rss&logoColor=white)](https://github.com/sansan0/TrendRadar)
 [![AI Translation](https://img.shields.io/badge/AI-Multi--Language-purple.svg?style=flat-square)](https://github.com/sansan0/TrendRadar)
@@ -33,6 +33,7 @@ Deploy in <strong>30 seconds</strong> — Say goodbye to endless scrolling, only
 [![Docker](https://img.shields.io/badge/Docker-Deployment-2496ED?style=flat-square&logo=docker&logoColor=white)](https://hub.docker.com/r/wantcat/trendradar)
 [![MCP Support](https://img.shields.io/badge/MCP-AI_Analysis-FF6B6B?style=flat-square&logo=ai&logoColor=white)](https://modelcontextprotocol.io/)
 [![AI Analysis Push](https://img.shields.io/badge/AI-Analysis_Push-FF6B6B?style=flat-square&logo=openai&logoColor=white)](#)
+[![AI Smart Filter](https://img.shields.io/badge/AI-Smart_News_Filter-9B59B6?style=flat-square&logo=openai&logoColor=white)](#)
 
 </div>
 
@@ -191,6 +192,33 @@ This contributes to the sustainable maintenance of the project and the growth of
 - **Tip**: Check [Changelog] to understand specific [Features]
 
 
+### 2026/03/12 - v6.5.0
+
+- **AI Smart News Filtering**: No more manual keyword setup! Describe your interests in everyday language in `ai_interests.txt` (e.g., "I want AI and renewable energy news"), and AI automatically extracts tags, scores every headline, and only pushes what truly matters to you. If AI filtering encounters issues, it auto-falls back to keyword matching — push delivery never stops
+- **Per-Period Filter Strategy & Interests**: Each time period in Timeline can now independently choose its filtering method and what topics to focus on. For example: mornings use a "tech keyword list" for quick filtering, evenings switch to "finance AI interests" for in-depth AI filtering — same system, different content at different times
+- **AI Analysis Independent from Push Mode**: AI analysis scope can differ from push content. For example: push only delivers new items (avoiding repeated notifications), while AI analyzes the full day's news (capturing complete trends). Each time period can also set its own AI analysis mode
+- **AI Filter Token Savings**: Previously analyzed news won't be re-processed; when you edit your interests, AI auto-evaluates the change magnitude — minor tweaks only update affected tags, major changes trigger full reclassification
+- **Multi-File Config & Tag Isolation**: Custom keyword files go in `config/custom/keyword/`, AI interest files go in `config/custom/ai/` — tags from different files are fully isolated and independent
+- **AI Translation Precision Control**: Independently toggle translation for hotlist, RSS, and standalone sections; regions with display turned off are automatically skipped, saving tokens
+- **Remote Storage Batch Upload**: Multiple write operations are batched and submitted to cloud in one go, reducing API call count
+- **Per-Group Display Limit**: New `max_news_per_keyword` controls max items shown per keyword/tag group, preventing a single hot topic from filling the entire push
+- **Time Period Conflict Detection**: Overlapping time periods are automatically detected — system alerts you to fix the config, preventing unexpected behavior
+- Various bug fixes
+
+
+
+### 2026/02/09 - mcp-v4.0.0
+
+- **🔥 Push any AI message to all channels**: Send AI-generated content to Feishu, DingTalk, Telegram, Email and all 9 channels with one call — Markdown auto-adapts to each platform's native format
+- **New format guide tool**: `get_channel_format_guide` tells AI what each channel supports and its limitations, so generated content looks great everywhere
+- **Smart batch splitting**: Long messages auto-split per channel byte limits (Feishu 30KB, DingTalk 20KB, etc.), reads config from config.yaml
+- **Fixed channel detection**: ntfy no longer falsely reported as "configured" due to default server URL
+- **Code reuse**: Batch utilities now imported from trendradar core instead of duplicated
+
+
+<details>
+<summary>👉 Click to expand: <strong>Historical Updates</strong></summary>
+
 ### 2026/02/09 - v6.0.0
 
 > **Breaking Change**: Config file upgrade (config.yaml 2.0.0), old `push_window` and `analysis_window` configs are no longer compatible, please refer to the new config.yaml for migration
@@ -217,19 +245,6 @@ This contributes to the sustainable maintenance of the project and the growth of
   - New `standalone_summaries` JSON field ("Source Snapshot"), all notification channels adapted for rendering
 
 
-### 2026/02/09 - mcp-v4.0.0
-
-- **🔥 Push any AI message to all channels**: Send AI-generated content to Feishu, DingTalk, Telegram, Email and all 9 channels with one call — Markdown auto-adapts to each platform's native format
-- **New format guide tool**: `get_channel_format_guide` tells AI what each channel supports and its limitations, so generated content looks great everywhere
-- **Smart batch splitting**: Long messages auto-split per channel byte limits (Feishu 30KB, DingTalk 20KB, etc.), reads config from config.yaml
-- **Fixed channel detection**: ntfy no longer falsely reported as "configured" due to default server URL
-- **Code reuse**: Batch utilities now imported from trendradar core instead of duplicated
-
-
-<details>
-<summary>👉 Click to expand: <strong>Historical Updates</strong></summary>
-
-
 ### 2026/01/28 - v5.5.0
 
 > Like the MCP feature, I'm not creating a separate repo for this tool either — it's pure frontend, so bundling it together
@@ -904,6 +919,7 @@ Supports RSS/Atom feed crawling, keyword-based grouping and statistics (consiste
 - **Unified Format**: RSS and trending use the same keyword matching and display format
 - **Simple Config**: Add RSS sources directly in `config.yaml`
 - **Merged Push**: Trending and RSS are merged into a single notification
+- **Freshness Filter**: Automatically filters out articles older than a specified number of days to avoid repeated pushes. Supports both global default and per-feed settings
 
 > 💡 RSS uses the same `frequency_words.txt` for keyword filtering as trending
 
@@ -936,7 +952,7 @@ A web-based graphical configuration interface — no need to manually edit YAML
 
 | Feature | Description | Default |
 |---------|-------------|---------|
-| **Scheduling System** | Per-day-of-week scheduling: assign different time periods, push modes, and AI analysis strategies to each day (Mon–Sun). 5 built-in presets (always_on / morning_evening / office_hours / night_owl / custom), or define your own. Supports weekday vs weekend differentiation, cross-midnight periods, and per-period once-only dedup (v6.0.0) | morning_evening |
+| **Scheduling System** | Per-day-of-week scheduling: assign different time periods, push modes, and AI analysis strategies to each day (Mon–Sun). **Each period can independently set its filter method (keyword/AI) and interest focus**, enabling different content at different times. 5 built-in presets (always_on / morning_evening / office_hours / night_owl / custom), or define your own. Supports weekday vs weekend differentiation, cross-midnight periods, per-period once-only dedup, and overlap conflict detection (v6.0.0 + v6.5.0) | morning_evening |
 | **Content Order Configuration** | Use `display.region_order` to adjust display order of all regions (hotlist, new items, RSS, standalone, AI analysis); use `display.regions` to toggle each region on/off (v5.2.0) | See config |
 | **Display Mode Switch** | `keyword`=group by keyword, `platform`=group by platform (v4.6.0 new) | keyword |
 
@@ -953,6 +969,28 @@ Set personal keywords (e.g., AI, BYD, Education Policy) to receive only relevant
 > 💡 You can also skip filtering and receive all trending news (leave frequency_words.txt empty)
 
 
+### **AI Smart News Filtering** (v6.5.0 New)
+
+Describe your interests in natural language and let AI automatically classify news — replacing traditional keyword matching
+
+- **Natural Language Interests**: Write your focus areas in everyday language in `ai_interests.txt`, no keyword syntax to learn
+- **Two-Stage Smart Processing**: AI first extracts structured tags from interest descriptions, then batch-classifies and scores news against those tags
+- **Score Threshold Control**: Fine-tune push quality with `ai_filter.min_score` — only highly relevant news gets delivered
+- **Auto Fallback**: Automatically falls back to keyword matching if AI filtering fails, ensuring uninterrupted push delivery
+- **Smart Tag Updates**: When interests change, AI evaluates the change magnitude to decide incremental or full reclassification
+- **Flexible Switching**: `filter.method` supports `keyword` (default) and `ai` modes, Timeline can override per time period
+- **Per-Period Personalization**: Different time periods can use different keyword files or AI interest descriptions. For example: mornings use a "tech keyword list" for quick filtering, evenings switch to "finance interests" for AI deep filtering
+
+```yaml
+# config.yaml quick enable example
+filter:
+  method: ai          # keyword (default) | ai
+ai_filter:
+  min_score: 6         # Minimum push score threshold (1-10)
+```
+
+> 💡 AI filtering shares model config with AI analysis/translation — just configure `ai.api_key` once
+
 ### **Trending Analysis**
 
 Real-time tracking of news popularity changes helps you understand not just "what's trending" but "how trends evolve."
@@ -973,7 +1011,7 @@ No longer controlled by platform algorithms, TrendRadar reorganizes all trending
 
 ### **Multi-Channel Multi-Account Push**
 
-Supports **WeWork** (+ WeChat push solution), **Feishu**, **DingTalk**, **Telegram**, **Email**, **ntfy**, **Bark**, **Slack** — messages delivered directly to phone and email.
+Supports **WeWork** (+ WeChat push solution), **Feishu**, **DingTalk**, **Telegram**, **Email**, **ntfy**, **Bark**, **Slack**, **Generic Webhook** (connect to Discord, IFTTT, or any platform) — messages delivered directly to phone and email.
 
 > 💡 For detailed configuration, see [Configuration Guide - Multi-Account Push Configuration](#10-multiple-account-push-configuration)
 
@@ -1022,7 +1060,8 @@ ai_translation:
 Use AI models to deeply analyze push content, automatically generate trending insights report
 
 - **Smart Analysis**: Automatically analyze trending topics, keyword popularity, cross-platform correlation, potential impact
-- **Multi Provider**: Supports DeepSeek, OpenAI, Gemini, and OpenAI-compatible APIs
+- **Multi Provider**: Built on LiteLLM unified interface, supports 100+ AI providers (DeepSeek, OpenAI, Gemini, Anthropic, local Ollama, etc.), with automatic fallback model switching
+- **Independent Analysis Mode**: AI analysis scope can differ from push content — push only new items (less noise), while AI analyzes the full day's news (complete trend picture)
 - **Flexible Push**: Choose original content only, AI analysis only, or both
 - **Custom Prompts**: Customize analysis perspective via `config/ai_analysis_prompt.txt`
 
@@ -2558,46 +2597,56 @@ TrendRadar provides two independent Docker images, deploy according to your need
 
 1. **Create Project Directory and Config**:
 
-   **Method 1-A: Using git clone (Recommended, Simplest)**
    ```bash
    # Clone project to local
    git clone https://github.com/sansan0/TrendRadar.git
    cd TrendRadar
    ```
 
-   **Method 1-B: Using wget to download config files**
-   ```bash
-   # Create directory structure
-   mkdir -p trendradar/{config,docker}
-   cd trendradar
-
-   # Download config file templates
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/config.yaml -P config/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/frequency_words.txt -P config/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/ai_analysis_prompt.txt -P config/
-
-   # Download docker compose config
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/.env -P docker/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/docker-compose.yml -P docker/
-   ```
-
    > 💡 **Note**: Key directory structure required for Docker deployment:
 ```
 current directory/
 ├── config/
-│   ├── config.yaml
-│   ├── frequency_words.txt
-│   └── ai_analysis_prompt.txt    # AI analysis prompt (v5.0.0 new, optional)
+│   ├── config.yaml                 # Core config (required)
+│   ├── frequency_words.txt         # Keyword config (required)
+│   ├── timeline.yaml               # Timeline config
+│   ├── ai_analysis_prompt.txt      # AI analysis prompt (optional)
+│   ├── ai_translation_prompt.txt   # AI translation prompt (optional)
+│   ├── ai_interests.txt            # AI interest filtering config (optional)
+│   ├── ai_filter/                  # AI filter prompts
+│   │   ├── prompt.txt
+│   │   ├── extract_prompt.txt
+│   │   └── update_tags_prompt.txt
+│   └── custom/                     # User custom config (optional)
+│       ├── ai/                     # Custom AI prompts
+│       └── keyword/                # Custom keyword files
 └── docker/
-    ├── .env
-    └── docker-compose.yml
+    ├── .env                        # Sensitive info + Docker-specific config
+    └── docker-compose.yml          # Docker Compose orchestration file
 ```
 
 2. **Config File Description**:
-   - `config/config.yaml` - Application main config (report mode, push settings, AI analysis, etc.)
-   - `config/frequency_words.txt` - Keyword config (set your interested trending keywords)
-   - `config/ai_analysis_prompt.txt` - AI prompt config (customize AI analysis role and output format, v5.0.0 new)
-   - `.env` - Environment variable config (webhook URLs, API Keys, scheduled tasks)
+
+   **Configuration Division Principles (v4.6.0 optimized)**:
+
+   | File | Purpose | Change Frequency | Description |
+   |------|---------|-----------------|-------------|
+   | `config/config.yaml` | **Core config** | Low | Report mode, push settings, storage format, push window, AI analysis toggle, platform enable/disable, etc. |
+   | `config/frequency_words.txt` | **Keyword config** | High | Set your interested trending keywords, supports groups, regex, aliases, and advanced syntax |
+   | `config/timeline.yaml` | **Timeline config** | Low | Controls news timeline display and filtering rules |
+   | `config/ai_analysis_prompt.txt` | **AI analysis prompt** | Medium | Customize AI analysis role definition and output format (v5.0.0+) |
+   | `config/ai_translation_prompt.txt` | **AI translation prompt** | Low | Customize AI translation prompt template |
+   | `config/ai_interests.txt` | **AI interest filtering** | Medium | Define rules for AI to auto-filter news based on interests |
+   | `config/ai_filter/` | **AI filter prompts** | Low | Internal prompts for AI filter module (usually no need to modify) |
+   | `config/custom/` | **User custom extensions** | As needed | `custom/ai/` for custom AI prompts, `custom/keyword/` for custom keyword files |
+   | `docker/.env` | **Sensitive info + Docker-specific config** | Low | Webhook URLs, API Keys, S3 credentials, scheduled tasks, **not tracked by git** |
+
+   > 💡 **Division Guidelines**:
+   > - **Feature behavior** → Edit `config.yaml` (e.g., enable/disable platforms, adjust push mode)
+   > - **Content of interest** → Edit `frequency_words.txt` (e.g., add new keywords to follow)
+   > - **AI output style** → Edit `ai_analysis_prompt.txt` or `ai_translation_prompt.txt`
+   > - **Keys & credentials** → Edit `docker/.env` (API Keys, Webhook URLs, and other sensitive info go here)
+   > - **Custom extensions** → Use `config/custom/` directory to avoid default configs being overwritten by upgrades
 
    **⚙️ Environment Variable Override Mechanism (v3.0.5+)**
 
@@ -2607,6 +2656,8 @@ current directory/
    |---------------------|---------------------|---------------|-------------|
    | `ENABLE_WEBSERVER` | - | `true` / `false` | Auto-start web server |
    | `WEBSERVER_PORT` | - | `8080` | Web server port |
+   | `WEBSERVER_WATCHDOG` | - | `true` / `false` | Turn on "auto-recover web page service" (restarts it if it crashes) |
+   | `WEBSERVER_WATCHDOG_INTERVAL` | - | `60` | How often to check and auto-recover (seconds) |
    | `FEISHU_WEBHOOK_URL` | `notification.channels.feishu.webhook_url` | `https://...` | Feishu Webhook (multi-account use `;` separator) |
    | `AI_ANALYSIS_ENABLED` | `ai_analysis.enabled` | `true` / `false` | Enable AI analysis (v5.0.0 new) |
    | `AI_API_KEY` | `ai.api_key` | `sk-xxx...` | AI API Key (shared by ai_analysis and ai_translation) |
@@ -2771,6 +2822,9 @@ docker rm trendradar
 > - Access historical reports via directory navigation (e.g., `http://localhost:8080/2025-xx-xx/`)
 > - Port can be configured in `.env` file with `WEBSERVER_PORT` parameter
 > - Auto-start: Set `ENABLE_WEBSERVER=true` in `.env`
+> - Auto-recover: `WEBSERVER_WATCHDOG=true` (default). It checks every `WEBSERVER_WATCHDOG_INTERVAL` seconds and restarts the web page service if needed
+> - `stop_webserver` means you manually turn off the web page service (command: `docker exec -it trendradar python manage.py stop_webserver`)
+> - "Auto restart" means the system turns that web page service back on automatically. If you stopped it manually and want it back, run `docker exec -it trendradar python manage.py start_webserver`
 > - Security: Static files only, limited to output directory, localhost binding only
 
 #### Data Persistence
@@ -2859,22 +2913,11 @@ flowchart TB
 Use docker compose to start both news push and MCP services:
 
 ```bash
-# Method 1: Clone project (Recommended)
+# Clone project (Recommended)
 git clone https://github.com/sansan0/TrendRadar.git
 cd TrendRadar/docker
 docker compose up -d
 
-# Method 2: Download docker-compose.yml separately
-mkdir trendradar && cd trendradar
-wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/docker-compose.yml
-wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/.env
-mkdir -p config output
-# Download config files
-wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/config.yaml -P config/
-wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/frequency_words.txt -P config/
-# Modify volume paths in docker-compose.yml: ../config -> ./config, ../output -> ./output
-docker compose up -d
-
 # Check running status
 docker ps | grep trendradar
 ```
@@ -3775,4 +3818,4 @@ GPL-3.0 License
 
 [🔝 Back to Top](#trendradar)
 
-</div>
+</div>

+ 95 - 44
README.md

@@ -12,7 +12,7 @@
 [![GitHub Stars](https://img.shields.io/github/stars/sansan0/TrendRadar?style=flat-square&logo=github&color=yellow)](https://github.com/sansan0/TrendRadar/stargazers)
 [![GitHub Forks](https://img.shields.io/github/forks/sansan0/TrendRadar?style=flat-square&logo=github&color=blue)](https://github.com/sansan0/TrendRadar/network/members)
 [![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg?style=flat-square)](LICENSE)
-[![Version](https://img.shields.io/badge/version-v6.0.0-blue.svg)](https://github.com/sansan0/TrendRadar)
+[![Version](https://img.shields.io/badge/version-v6.5.0-blue.svg)](https://github.com/sansan0/TrendRadar)
 [![MCP](https://img.shields.io/badge/MCP-v4.0.0-green.svg)](https://github.com/sansan0/TrendRadar)
 [![RSS](https://img.shields.io/badge/RSS-订阅源支持-orange.svg?style=flat-square&logo=rss&logoColor=white)](https://github.com/sansan0/TrendRadar)
 [![AI Translation](https://img.shields.io/badge/AI-多语言推送-purple.svg?style=flat-square)](https://github.com/sansan0/TrendRadar)
@@ -34,6 +34,7 @@
 [![Docker](https://img.shields.io/badge/Docker-部署-2496ED?style=flat-square&logo=docker&logoColor=white)](https://hub.docker.com/r/wantcat/trendradar)
 [![MCP Support](https://img.shields.io/badge/MCP-AI分析支持-FF6B6B?style=flat-square&logo=ai&logoColor=white)](https://modelcontextprotocol.io/)
 [![AI Analysis Push](https://img.shields.io/badge/AI-分析推送-FF6B6B?style=flat-square&logo=openai&logoColor=white)](#)
+[![AI Smart Filter](https://img.shields.io/badge/AI-智能筛选新闻-9B59B6?style=flat-square&logo=openai&logoColor=white)](#)
 
 </div>
 
@@ -239,6 +240,32 @@
 - **Tip**: Check [Historical Updates] to see the specific [Features]
 
 
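+### 2026/03/12 - v6.5.0
+
+- **AI smart filtering system**: No more manual keyword setup! Describe what you follow in everyday language in `ai_interests.txt` (e.g., "I want news about AI and new energy"), and AI automatically extracts tags and scores every news item, pushing only what is truly relevant to you. If AI filtering runs into problems, it automatically switches back to keyword matching, so pushes are not interrupted
+- **Different filter methods and interests per time period**: Each time period in Timeline can now independently set how to filter and what type of news to see. For example: use "tech keywords" for quick filtering in the morning, then switch to a "finance AI interest description" for in-depth filtering in the evening. Same system, different content at different times
+- **AI analysis scope independent of push**: The data range AI analyzes can differ from the pushed content. For example, push only new items (avoiding repeated interruptions) while AI analyzes all of the day's news (seeing the full trend). Each time period can also set its own AI analysis mode
+- **AI filtering saves tokens intelligently**: News that has already been analyzed does not consume tokens again; after the interest description is edited, AI automatically judges the size of the change: small edits only update the affected tags, and only large edits trigger full reclassification
+- **Multi-file config and tag isolation**: Custom keyword files go in `config/custom/keyword/`, AI interest files go in `config/custom/ai/`, and tags produced by different files are independent and do not interfere with each other
+- **Precise AI translation control**: Separately control whether the hotlist, RSS, and standalone display section are translated; regions whose display is turned off are skipped automatically, wasting no tokens
+- **Remote storage batch upload**: Multiple write operations are accumulated and submitted to the cloud together, reducing the number of API calls
+- **Display limit per keyword/tag group**: Use `max_news_per_keyword` to control the maximum number of news items shown per group, preventing a single hot topic from filling the entire push
+- **Smart time period conflict detection**: If two time periods overlap, the system automatically reports an error and prompts you to fix it, avoiding unexpected behavior caused by config conflicts
+- Various bug fixes
+
+
+### 2026/02/09 - mcp-v4.0.0
+
+- **🔥 Push AI messages directly to all channels**: Push AI-written content to Feishu, DingTalk, Telegram, Email, and other channels (9 in total) in one step; Markdown adapts automatically to each platform's format, so you need not worry about format differences
+- **New formatting strategy guide**: The new `get_channel_format_guide` tool tells AI which formats each channel supports and what limits it has, so generated content is laid out better
+- **Smart batch sending**: Overlong messages are split automatically by each channel's byte limit (Feishu 30KB, DingTalk 20KB, etc.), with config read from config.yaml
+- **Fixed channel misdetection**: ntfy is no longer falsely reported as "configured" because of its default address
+- **Code reuse optimization**: Batch processing functions directly reuse the trendradar core module instead of reinventing the wheel
+
+
+<details>
+<summary>👉 Click to expand: <strong>Historical Updates</strong></summary>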
+
 ### 2026/02/09 - v6.0.0
 
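 > **Breaking Change**: Config file upgrade (config.yaml 2.0.0); the old `push_window` and `analysis_window` configs are no longer compatible, please refer to the new config.yaml to migrate
@@ -265,19 +292,6 @@
   - New `standalone_summaries` JSON field ("Source Snapshot"), all push channels adapted for rendering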
 
 
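-### 2026/02/09 - mcp-v4.0.0
-
-- **🔥 Push AI messages directly to all channels**: Push AI-written content to Feishu, DingTalk, Telegram, Email, and other channels (9 in total) in one step; Markdown adapts automatically to each platform's format, so you need not worry about format differences
-- **New formatting strategy guide**: The new `get_channel_format_guide` tool tells AI which formats each channel supports and what limits it has, so generated content is laid out better
-- **Smart batch sending**: Overlong messages are split automatically by each channel's byte limit (Feishu 30KB, DingTalk 20KB, etc.), with config read from config.yaml
-- **Fixed channel misdetection**: ntfy is no longer falsely reported as "configured" because of its default address
-- **Code reuse optimization**: Batch processing functions directly reuse the trendradar core module instead of reinventing the wheel
-
-
-<details>
-<summary>👉 Click to expand: <strong>Historical Updates</strong></summary>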
-
-
 ### 2026/01/28 - v5.5.0
 
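 > As with the MCP feature, I'm not opening a separate repo to maintain this little tool either; it's pure frontend anyway, so it all lives together
@@ -960,6 +974,7 @@ frequency_words.txt adds a [required word] feature, using the + sign
 - **Unified format**: RSS and the hotlist use the same keyword matching and display format
 - **Simple config**: Add RSS sources directly in `config.yaml`
 - **Merged push**: The hotlist and RSS are merged into a single push message
+- **Freshness filter**: Automatically filters out articles older than a specified number of days to avoid repeated pushes. Supports a global default number of days and independent per-feed settings
 
 > 💡 RSS uses the same `frequency_words.txt` as the hotlist for keyword filtering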
 
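@@ -992,7 +1007,7 @@ frequency_words.txt adds a [required word] feature, using the + sign
 
 | Feature | Description | Default |
 |------|------|------|
-| **Scheduling system** | Day-by-day scheduling from Monday to Sunday: assign different time periods, push modes, and AI analysis strategies to each day. 5 built-in presets (always_on / morning_evening / office_hours / night_owl / custom), or define your own. Supports weekday/weekend differentiation, cross-midnight periods, and per-period dedup (v6.0.0) | morning_evening |
+| **Scheduling system** | Day-by-day scheduling from Monday to Sunday: assign different time periods, push modes, and AI analysis strategies to each day. **Each period can independently set its filter method (keyword/AI) and interest focus**, so different types of news are shown at different times. 5 built-in presets (always_on / morning_evening / office_hours / night_owl / custom), or define your own. Supports weekday/weekend differentiation, cross-midnight periods, per-period dedup, and time period conflict detection (v6.0.0 + v6.5.0) | morning_evening |
 | **Content order config** | Use `display.region_order` to adjust the display order of the regions (hotlist, new hot topics, RSS, standalone display section, AI analysis); use `display.regions` to control whether each region is shown (v5.2.0) | See config file |
 | **Display mode switch** | `keyword`=group by keyword, `platform`=group by platform (new in v4.6.0) | keyword |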
 
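@@ -1008,6 +1023,28 @@ frequency_words.txt adds a [required word] feature, using the + sign
 >
 > 💡 You can also skip filtering and push all hot topics in full (leave frequency_words.txt empty)
 
+### **AI Smart News Filtering** (new in v6.5.0)
+
+Describe your interests in natural language and AI classifies news automatically, replacing traditional keyword matching
+
+- **Natural-language interest description**: Write what you follow in everyday language in `ai_interests.txt`, with no keyword syntax to learn
+- **Two-stage smart processing**: AI first extracts structured tags from the interest description, then batch-classifies and scores news by tag
+- **Score threshold control**: Use `ai_filter.min_score` to precisely control push quality, pushing only highly relevant news
+- **Automatic fallback**: Automatically falls back to keyword matching when AI filtering fails, ensuring pushes are not interrupted
+- **Smart tag updates**: When interests change, AI automatically assesses the size of the change and decides between incremental and full reclassification
+- **Flexible switching**: `filter.method` supports two modes, `keyword` (default) and `ai`; Timeline can override it per time period
+- **Per-period personalization**: Different time periods can use different keyword files or AI interest descriptions. For example, use a "tech word list" for quick filtering in the morning and switch to "finance interests" for in-depth AI filtering in the evening
+
+```yaml
+# config.yaml quick-enable example
+filter:
+  method: ai          # keyword (default) | ai
+ai_filter:
+  min_score: 6         # Minimum score threshold for pushing (1-10)
+```
+
+> 💡 AI filtering shares model config with AI analysis/translation; you only need to configure `ai.api_key` once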
+
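 ### **Hot Topic Trend Analysis**
 
 Track changes in news popularity in real time, so you know not only "what is trending" but also "how hot topics evolve"
@@ -1028,7 +1065,7 @@ frequency_words.txt adds a [required word] feature, using the + sign
 
 ### **Multi-Channel Multi-Account Push**
 
-Supports **WeWork** (+ WeChat push solution), **Feishu**, **DingTalk**, **Telegram**, **Email**, **ntfy**, **Bark**, **Slack**, with messages delivered straight to phone and email
+Supports **WeWork** (+ WeChat push solution), **Feishu**, **DingTalk**, **Telegram**, **Email**, **ntfy**, **Bark**, **Slack**, **Generic Webhook** (can connect to Discord, IFTTT, or any other platform), with messages delivered straight to phone and email
 
 > 💡 For the detailed configuration tutorial, see [Push to Multiple Groups/Devices](#10-推送到多个群设备)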
 
@@ -1077,7 +1114,8 @@ ai_translation:
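 Use large AI models to analyze push content in depth and automatically generate a hot-topic insight report
 
 - **Smart analysis**: Automatically analyzes hot-topic trends, keyword popularity, cross-platform correlation, and potential impact
-- **Multiple providers**: Supports DeepSeek, OpenAI, Gemini, and OpenAI-compatible APIs
+- **Multiple providers**: Built on the LiteLLM unified interface, supports 100+ AI providers (DeepSeek, OpenAI, Gemini, Anthropic, local Ollama, etc.), and also supports automatic switching to fallback models
+- **Independent analysis mode**: The scope AI analyzes can differ from what is pushed: push only new items (avoiding interruptions) while AI analyzes all of the day's news (seeing the full trend)
 - **Flexible push**: Choose original content only, AI analysis only, or both
 - **Custom prompts**: Customize the analysis angle via `config/ai_analysis_prompt.txt`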
 
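@@ -2607,48 +2645,56 @@ TrendRadar provides two independent Docker images, which you can choose to deploy according to your needs:
 
 1. **Create the project directory and config**:
 
-   **Method 1-A: Use git clone (recommended, simplest)**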
    ```bash
    # Clone the project locally
    git clone https://github.com/sansan0/TrendRadar.git
    cd TrendRadar
    ```
 
-   **Method 1-B: Use wget to download the config files**
-   ```bash
-   # Create the directory structure
-   mkdir -p trendradar/{config,docker}
-   cd trendradar
-
-   # Download the config file templates
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/config.yaml -P config/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/frequency_words.txt -P config/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/config/ai_analysis_prompt.txt -P config/
-
-   # Download the docker compose config
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/.env  -P docker/
-   wget https://raw.githubusercontent.com/sansan0/TrendRadar/master/docker/docker-compose.yml  -P docker/
-   ```
-
    > 💡 **Note**: The key directory structure required for Docker deployment is as follows:
 ```
 current directory/
 ├── config/
-│   ├── config.yaml
-│   ├── frequency_words.txt
-│   └── ai_analysis_prompt.txt    # AI analysis prompt (new in v5.0.0, optional)
+│   ├── config.yaml                 # Core feature config (required)
+│   ├── frequency_words.txt         # Keyword config (required)
+│   ├── timeline.yaml               # Timeline config
+│   ├── ai_analysis_prompt.txt      # AI analysis prompt (optional)
+│   ├── ai_translation_prompt.txt   # AI translation prompt (optional)
+│   ├── ai_interests.txt            # AI interest filtering config (optional)
+│   ├── ai_filter/                  # AI filter prompts
+│   │   ├── prompt.txt
+│   │   ├── extract_prompt.txt
+│   │   └── update_tags_prompt.txt
+│   └── custom/                     # User custom config (optional)
+│       ├── ai/                     # Custom AI prompts
+│       └── keyword/                # Custom keyword files
 └── docker/
-    ├── .env
-    └── docker-compose.yml
+    ├── .env                        # Sensitive info + Docker-specific config
+    └── docker-compose.yml          # Docker Compose orchestration file
 ```
 
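 2. **Config file description**:
 
    **Config division principles (optimized in v4.6.0)**:
-   - `config/config.yaml` - **Feature config** (report mode, push settings, storage format, push window, AI analysis, etc.)
-   - `config/frequency_words.txt` - **Keyword config** (set the hot-topic terms you care about)
-   - `config/ai_analysis_prompt.txt` - **AI prompt config** (customize the AI analysis role and output format, new in v5.0.0)
-   - `docker/.env` - **Sensitive info + Docker-specific config** (webhook URLs, API Key, S3 keys, scheduled tasks)
+
+   | File | Purpose | Change frequency | Description |
+   |------|------|---------|------|
+   | `config/config.yaml` | **Core feature config** | Low | Global behavior controls such as report mode, push settings, storage format, push window, AI analysis toggle, and platform enablement |
+   | `config/frequency_words.txt` | **Keyword config** | High | Set the hot-topic terms you care about; supports advanced syntax such as groups, regex, and aliases |
+   | `config/timeline.yaml` | **Timeline config** | Low | Controls the display and filtering rules of the news timeline |
+   | `config/ai_analysis_prompt.txt` | **AI analysis prompt** | Medium | Customize the role definition and output format of AI analysis (v5.0.0+) |
+   | `config/ai_translation_prompt.txt` | **AI translation prompt** | Low | Customize the prompt template for AI translation |
+   | `config/ai_interests.txt` | **AI interest filtering** | Medium | Define the rules by which AI filters news automatically based on interests |
+   | `config/ai_filter/` | **AI filter prompts** | Low | Internal prompts of the AI filter module (usually no need to modify) |
+   | `config/custom/` | **User custom extensions** | As needed | `custom/ai/` holds custom AI prompts, `custom/keyword/` holds custom keyword files |
+   | `docker/.env` | **Sensitive info + Docker-specific config** | Low | webhook URLs, API Key, S3 keys, scheduled tasks, etc., **not tracked by git** |
+
+   > 💡 **Division guidelines**:
+   > - **Feature behavior** → edit `config.yaml` (e.g., enable/disable a platform, adjust the push mode)
+   > - **Content of interest** → edit `frequency_words.txt` (e.g., add new keywords to follow)
+   > - **AI output style** → edit `ai_analysis_prompt.txt` or `ai_translation_prompt.txt`
+   > - **Keys and credentials** → edit `docker/.env` (API Key, Webhook URL, and other sensitive info all go here)
+   > - **Personal extensions** → use the `config/custom/` directory, so that direct edits to the default config are not overwritten by upgrades
 
    > 💡 **Applying config changes**: After modifying `config.yaml`, run `docker compose up -d` to restart the container and the changes take effect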
 
@@ -2660,6 +2706,8 @@ TrendRadar provides two independent Docker images, which you can choose to deploy according to your needs:
    |---------|---------|-------|------|
    | `ENABLE_WEBSERVER` | - | `true` / `false` | Whether to auto-start the web server |
    | `WEBSERVER_PORT` | - | `8080` | Web server port |
+   | `WEBSERVER_WATCHDOG` | - | `true` / `false` | Whether to enable "web service auto-recovery" (reopens the service automatically when it fails) |
+   | `WEBSERVER_WATCHDOG_INTERVAL` | - | `60` | Auto-recovery check interval (seconds) |
    | `FEISHU_WEBHOOK_URL` | `notification.channels.feishu.webhook_url` | `https://...` | Feishu Webhook (use `;` to separate multiple accounts) |
    | `AI_ANALYSIS_ENABLED` | `ai_analysis.enabled` | `true` / `false` | Whether to enable AI analysis (new in v5.0.0) |
    | `AI_API_KEY` | `ai.api_key` | `sk-xxx...` | AI API Key (shared by ai_analysis and ai_translation) |
@@ -2824,6 +2872,9 @@ docker rm trendradar
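 > - Access historical reports via directory navigation (e.g., `http://localhost:8080/2025-xx-xx/`)
 > - The port can be set in the `.env` file via the `WEBSERVER_PORT` parameter
 > - Auto-start: set `ENABLE_WEBSERVER=true` in `.env`
+> - Auto-recovery: `WEBSERVER_WATCHDOG=true` (enabled by default); it checks every `WEBSERVER_WATCHDOG_INTERVAL` seconds and reopens the web service automatically on failure
+> - `stop_webserver` means "you deliberately shut down the web service by hand" (command: `docker exec -it trendradar python manage.py stop_webserver`)
+> - "Auto restart" means "the system automatically reopens the web service"; if you shut it down by hand and want it back, run `docker exec -it trendradar python manage.py start_webserver`
 > - Security note: serves static files only, restricted to the output directory, bound to local access only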
 
 #### 数据持久化
@@ -3768,4 +3819,4 @@ GPL-3.0 License
 
 [🔝 回到顶部](#trendradar)
 
-</div>
+</div>

+ 0 - 1
config/ai_analysis_prompt.txt

@@ -181,7 +181,6 @@
 ```
 
 要求:
-- 必须返回有效的 JSON,用 ```json 代码块包裹
 - 使用 {language} 输出,语言简练专业
 - 6个板块内容不重叠不冗余
 - 若某板块无明显内容,可简写"暂无显著异常"

+ 25 - 0
config/ai_filter/extract_prompt.txt

@@ -0,0 +1,25 @@
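+[system]
+You are an expert at extracting interest tags. Your task is to extract structured news classification tags from the user's interest description.
+
+Extraction rules:
+1. Keep each tag concise (2-6 characters) and pair it with a one-sentence description of the topics and keywords the tag covers
+2. Tags should overlap as little as possible
+3. Keep the number of tags between 5 and 20; prefer fine-grained tags and merge only when meanings overlap heavily
+4. Descriptions must be specific and include concrete keywords such as person, company, and product names, to make later classification easier
+5. Return the tags in the same order as the user's interest description wherever possible; earlier means higher priority
+
+[user]
+The user's interest description is as follows:
+
+{interests_content}
+
+Extract news classification tags from it.
+
+Return strict JSON (do not add anything else):
+```json
+{
+  "tags": [
+    {"tag": "tag name", "description": "topics and keywords covered by this tag"}
+  ]
+}
+```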
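The template above mixes `{interests_content}` with a literal JSON example, so it cannot be passed through `str.format()` as-is: the JSON braces would be read as format fields. A minimal sketch of one way to load such a template, assuming a plain `[system]` / `[user]` section layout as shown; `split_prompt` and `render_prompt` are hypothetical helper names, not functions from `trendradar/core/loader.py`:

```python
def split_prompt(template: str) -> dict:
    """Split a template into its [system] and [user] sections."""
    sections = {}
    current = None
    for line in template.splitlines():
        marker = line.strip().lower()
        if marker in ("[system]", "[user]"):
            current = marker[1:-1]          # "[system]" -> "system"
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def render_prompt(template: str, **values: str) -> dict:
    """Fill {placeholders} by literal replacement.

    str.format() is avoided on purpose: the template embeds a JSON
    example whose braces would be misread as format fields.
    """
    parts = split_prompt(template)
    for name, value in values.items():
        for key in parts:
            parts[key] = parts[key].replace("{" + name + "}", value)
    return parts


# Hypothetical, shortened template used only for illustration.
TEMPLATE = """[system]
You extract tags.

[user]
Interests:

{interests_content}

Return JSON:
{"tags": [{"tag": "name", "description": "text"}]}
"""

messages = render_prompt(TEMPLATE, interests_content="1. AI chips")
```

Literal replacement leaves every other brace in the template untouched, which is why the JSON example survives rendering.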

+ 32 - 0
config/ai_filter/prompt.txt

@@ -0,0 +1,32 @@
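+[system]
+You are an efficient news classification expert. Given a list of tags, quickly decide which tag best fits each news title.
+
+Classification rules:
+1. Assign each news item to exactly one tag, the most relevant one
+2. Do not output news that matches no tag (do not return entries with empty tags)
+3. Give a relevance score from 0.0 to 1.0 (1.0 = fully relevant, 0.5 = partly relevant)
+4. Judge by the title only; do not over-infer
+5. Strictly follow any extra filtering requirements in the user preferences
+6. If two tags are about equally relevant, prefer the one listed earlier (earlier tags have higher priority)
+
+[user]
+## User preferences
+
+{interests_content}
+
+## Classification tags
+
+{tags_list}
+
+## News list ({news_count} items)
+
+{news_list}
+
+Classify each news item. Return a strict JSON array (do not add anything else):
+```json
+[
+  {"id": 1, "tag_id": 1, "score": 0.9},
+  {"id": 5, "tag_id": 2, "score": 0.8}
+]
+```
+Return only the news items that match; leave unmatched items out of the result.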
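The reply format requested above is a JSON array of `{"id", "tag_id", "score"}` objects, optionally wrapped in a `json` code fence. A minimal sketch of parsing such a reply and applying the `ai_filter.min_score` threshold from `config/config.yaml`; `parse_classification` is a hypothetical name, and this is not the parser in `trendradar/ai/filter.py`, whose code is not shown here:

```python
import json
import re


def parse_classification(raw: str, min_score: float = 0.7) -> list:
    """Extract the JSON array from a model reply and keep entries >= min_score."""
    # Accept either a fenced block or a bare JSON array.
    match = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    payload = match.group(1) if match else raw
    try:
        items = json.loads(payload)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        try:
            news_id = int(item["id"])
            tag_id = int(item["tag_id"])
            score = float(item["score"])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed entries instead of failing the batch
        # Scores outside 0.0-1.0 are treated as invalid (an assumption of this sketch).
        if 0.0 <= score <= 1.0 and score >= min_score:
            kept.append({"id": news_id, "tag_id": tag_id, "score": score})
    return kept


reply = '```json\n[{"id": 1, "tag_id": 1, "score": 0.9}, {"id": 5, "tag_id": 2, "score": 0.5}]\n```'
results = parse_classification(reply, min_score=0.7)
```

With `min_score=0` every valid entry passes, which matches the "0 = no filtering" note in the config comments.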

+ 43 - 0
config/ai_filter/update_tags_prompt.txt

@@ -0,0 +1,43 @@
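+[system]
+You are a tag management expert. After the user edits the interest description, compare the old tag set with the new interest description and propose a tag update plan.
+
+Core principles:
+1. Treat semantically equivalent tags as the same tag (for example "AI/大模型" and "AI与大模型" are the same tag), and prefer the old tag name
+2. Mark a tag for removal only when the user has clearly stopped following that direction
+3. Add a tag only for a newly added interest direction
+4. Keep tag names concise (2-10 characters); descriptions must be specific and include keywords, person names, company names, and product names
+5. Keep the total number of tags within 20; prefer fine-grained tags and merge only when meanings overlap heavily
+6. The output order of keep and add should follow the order of the user's interest description wherever possible (earlier means higher priority)
+
+change_ratio scale:
+- 0.0 = interests barely changed (rewording or added detail only)
+- 0.1~0.3 = small adjustment (1-2 directions added or removed)
+- 0.4~0.6 = moderate change (several directions adjusted)
+- 0.7~1.0 = major change (interest directions largely rewritten)
+
+[user]
+## Current tag set
+
+{old_tags_json}
+
+## New interest description
+
+{interests_content}
+
+## Task
+
+Compare the current tag set with the new interest description, decide whether each old tag is kept or removed, and whether any tags need to be added.
+
+Return strict JSON (do not add anything else):
+```json
+{
+  "keep": [
+    {"tag": "old tag name", "description": "description updated for the new interests"}
+  ],
+  "add": [
+    {"tag": "new tag name", "description": "topics and keywords covered by this tag"}
+  ],
+  "remove": ["old tag name to retire"],
+  "change_ratio": 0.2
+}
+```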
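The update plan above feeds the `reclassify_threshold` rule documented in `config/config.yaml`: when `change_ratio >= reclassify_threshold` everything is reclassified, otherwise the update is incremental. A minimal sketch of applying such a plan; `apply_tag_update` is a hypothetical name, and both the concatenation of `keep` before `add` and the fallback to a full reclassification when `change_ratio` is missing are assumptions of this sketch:

```python
def apply_tag_update(old_tags, update, reclassify_threshold=0.6):
    """Return (new_tags, full_reclassify) for a keep/add/remove update plan."""
    # Assumption: a missing change_ratio is treated as a complete change.
    change_ratio = float(update.get("change_ratio", 1.0))
    removed = set(update.get("remove", []))
    old_descriptions = {t["tag"]: t.get("description", "") for t in old_tags}

    new_tags = []
    seen = set()
    for entry in list(update.get("keep", [])) + list(update.get("add", [])):
        name = entry["tag"]
        if name in removed or name in seen:
            continue  # a removed or duplicated tag is not carried over
        seen.add(name)
        # Fall back to the old description if the plan did not supply one.
        description = entry.get("description") or old_descriptions.get(name, "")
        new_tags.append({"tag": name, "description": description})

    return new_tags, change_ratio >= reclassify_threshold


old = [
    {"tag": "AI", "description": "models and products"},
    {"tag": "Cars", "description": "EVs and self-driving"},
]
plan = {
    "keep": [{"tag": "AI", "description": "models, products, chips"}],
    "add": [{"tag": "Space", "description": "launches and satellites"}],
    "remove": ["Cars"],
    "change_ratio": 0.2,
}
tags, full_reclassify = apply_tag_update(old, plan)
```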

+ 33 - 0
config/ai_interests.txt

@@ -0,0 +1,33 @@
+# ═══════════════════════════════════════════════════════════════
+#                    TrendRadar AI 兴趣描述文件
+#                         Version: 1.1.0
+# ═══════════════════════════════════════════════════════════════
+# 用自然语言描述你关注的话题,AI 会自动提取标签并对新闻进行分类
+# 修改此文件后,下次运行时自动生效(旧分类会被标记废弃,重新分类)
+
+
+下面是我要关注的内容:
+# 重要性排序说明:从上到下优先级递减,越靠前越重要。
+# 如果一条新闻同时可能匹配多个方向,请优先归入更靠前的方向。
+
+1. 中国科技与互联网公司:重点关注 DeepSeek、华为、腾讯、字节跳动、京东及相关核心人物和业务线(含鸿蒙、海思、昇腾、抖音、微信等)的战略、组织调整、产品节奏、资本动作与监管影响。
+2. 大模型与 AI 产品:关注 OpenAI、Claude、ChatGPT、Sora、DALL-E、Qwen、MiniMax、GLM 等模型和产品的能力演进、开源闭源策略与生态竞争。
+3. AI 基础设施与云算力:关注英伟达、AMD、华为算力体系、CUDA、Azure、Google Cloud 相关的算力供给、推理成本、训练效率与供应链变化。
+4. 芯片与半导体制造:关注芯片、半导体、光刻机、先进封装、国产替代、关键材料设备与供应安全。
+5. 智能汽车与自动驾驶:关注比亚迪、特斯拉、FSD、无人驾驶、智驾、刀片电池、云辇等技术路线,以及销量、定价与出海变化。
+6. 机器人与具身智能:关注宇树、智元、众擎、大疆在机器人、机械狗、四足、人形、具身智能方向的产品发布、量产和场景落地。
+7. 全球科技巨头:关注苹果、微软、谷歌、Anthropic、OpenAI 的财报、发布会、产品路线、合作与竞争格局。
+8. 地缘政治与国际关系(独立于金融):重点关注中美欧日印及俄罗斯相关的关税、制裁、外交、冲突、产业脱钩和关键供应链博弈。
+9. 金融市场与宏观政策:关注美联储利率路径、汇率波动、通胀、就业、股债商品表现及全球流动性变化。
+10. 能源与电力系统:关注光伏、太阳能、水电(含雅鲁藏布江项目)、核能和新型电力系统建设。
+11. 航天与深空探索:关注 SpaceX、登月、火星、飞船、卫星、空间站、商业航天的技术节点与产业化进展。
+12. 前沿科学技术:关注量子、脑机接口、基因工程等前沿方向的重要科研突破与产业应用。
+13. 文化 IP 与内容产业:关注黑神话悟空、影之刃零、三体、流浪地球、申奥相关内容,以及游戏工业化和文化出海。
+14. 零售与消费品牌:关注胖东来等零售标杆在组织效率、供应链管理、门店运营和消费趋势方面的信号。
+15. 国家与区域观察:关注中国、美国、加拿大、日本、韩国、朝鲜、英国、法国、印度、俄罗斯相关的政策、科技、产业与社会议题(作为背景参考,不高于上述核心方向)。
+
+
+# 标题质量要求(即使匹配了上面的标签,符合以下特征的标题也请跳过)
+# 可自由增删改,按你的偏好来
+- 不要标题党/震惊体(如"震惊!"、"太可怕了!"、"竟然..."、"刚刚!")
+- 不要营销软文、广告推广类标题

+ 3 - 2
config/ai_translation_prompt.txt

@@ -1,6 +1,6 @@
 # ═══════════════════════════════════════════════════════════════
 #                    TrendRadar AI 翻译提示词配置
-#                      Version: 1.1.0
+#                      Version: 1.2.0
 # ═══════════════════════════════════════════════════════════════
 #
 # 此文件定义 AI 翻译内容时使用的提示词模板
@@ -19,7 +19,8 @@
 2. 保持新闻标题的吸引力,但不要做标题党。
 3. 专有名词(人名、地名、机构名)若有通用译名请使用通用译名,否则保留原文或在括号内备注。
 4. 输出格式必须严格遵循要求,不要输出任何多余的解释性文字。
-5. 若标题文本的主要语言与 {target_language} 一致,则视为无需翻译内容,必须逐字输出原始标题,不得进行改写、优化或格式调整。
+5. ⚠️重点:输入可能包含混合语言列表。请务必逐行检查每一条内容。如果某条内容不是 {target_language},**必须**将其翻译为 {target_language}。严禁保留非 {target_language} 的原文(除非是纯专有名词)。即使列表中 99% 已经是目标语言,也绝对不能忽略剩下的 1%。
+6. 格式严格限制:输出结果中**只允许包含目标语言**的文本。绝对禁止“原文 + 译文”的形式。如果进行了翻译,直接用译文替换原文,不要在后面括号备注原文,也不要保留原文。
 
 [user]
 请将以下内容翻译成 {target_language}:

+ 114 - 37
config/config.yaml

@@ -1,6 +1,6 @@
 # ═══════════════════════════════════════════════════════════════
 #                    TrendRadar 配置文件
-#                      Version: 2.0.0
+#                      Version: 2.2.0
 # ═══════════════════════════════════════════════════════════════
 
 
@@ -104,7 +104,7 @@ rss:
   freshness_filter:
     enabled: true                     # 是否启用新鲜度过滤(默认启用)
 
-    max_age_days: 3                   # 最大文章年龄(天)
+    max_age_days: 1                   # 最大文章年龄(天)
                                       # - 正整数:只推送 N 天内的文章
                                       # - 0:禁用过滤,推送所有文章
 
@@ -116,17 +116,16 @@ rss:
     - id: "hacker-news"
       name: "Hacker News"
       url: "https://hnrss.org/frontpage"
-      # max_age_days: 1               # 示例:只推送1天内的文章
 
     - id: "ruanyifeng"
       name: "阮一峰的网络日志"
       url: "http://www.ruanyifeng.com/blog/atom.xml"
-      # max_age_days: 7               # 示例:推送7天内的文章(更新较慢的博客)
-    
+      enabled: false                  # 禁用
+      # max_age_days: 3               # 示例:推送 3 天内的文章(更新较慢的博客)
+     
     - id: "yahoo-finance"
       name: "雅虎财经"
       url: "https://finance.yahoo.com/news/rssindex"
-      enabled: false                  # 禁用
 
     # 自定义源示例
     # - id: "custom-feed"
@@ -139,38 +138,99 @@ rss:
 # ===============================================================
 # 4. 报告模式
 #
-# 🔸 daily(当日汇总模式)
-#    • 推送时机:按时推送(默认每小时推送一次)
-#    • 显示内容:当日所有匹配新闻 + 新增新闻区域
-#    • 适用场景:日报总结、全面了解当日热点趋势
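+# Quick start in 5 lines:
+# 1) Pick mode first: daily (full-day summary) / current (current rankings) / incremental (new items only)
+# 2) Then pick display_mode: keyword (group by keyword/tag) / platform (group by platform)
+# 3) If schedule is enabled, mode here is only the default and is overridden by the timeline period
+# 4) sort_by_position_first only affects ordering in keyword mode
+# 5) rank_threshold and max_news_per_keyword only affect display, not crawling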
 #
-# 🔸 current(当前榜单模式)
-#    • 推送时机:按时推送(默认每小时推送一次)
-#    • 显示内容:当前榜单匹配新闻 + 新增新闻区域
-#    • 适用场景:实时热点追踪、了解当前最火的内容
-#
-# 🔸 incremental(增量监控模式)
-#    • 推送时机:有新增才推送
-#    • 显示内容:新出现的匹配频率词新闻
-#    • 适用场景:避免重复信息干扰
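+# Advanced notes:
+# - daily: the most complete, but also the most repetition
+# - current: for watching what is hot right now
+# - incremental: the least intrusive, new items only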
 # ===============================================================
 report:
-  mode: "current"                     # 可选: daily | current | incremental
-                                      # ⚠️ 开启调度系统后,此值会被当前时间段的 report_mode 覆盖
-
+  mode: "current"                     # daily | current | incremental(schedule 开启时作为默认值)
 
   display_mode: "keyword"             # 分组维度: keyword | platform
                                       # keyword: 按关键词分组显示(默认)
                                       # platform: 按平台/来源分组显示
 
-  # 关键词组排序方式(仅 display_mode: keyword 时生效)
-  # true: 按 frequency_words.txt 的定义顺序排列
+  # 关键词模式分组排序方式(仅 keyword 模式生效)
+  # true: 按 frequency_words.txt 的定义顺序排列
   # false: 按匹配到的热点条数排序(条数多的在前)
   sort_by_position_first: false
 
-  rank_threshold: 5                   # 排名高亮阈值
+  rank_threshold: 5                   # 排名高亮阈值(影响展示强调,不改变抓取范围)
 
-  max_news_per_keyword: 0             # 每个关键词最大显示数量(0=不限制)
+  max_news_per_keyword: 0             # 每个关键词/标签最大显示数量(0=不限制,仅影响展示裁剪)
+
+
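+# ===============================================================
+# 4.5 Filter strategy
+#
+# Quick start in 5 lines:
+# 1) Pick method first: keyword (keyword matching) or ai (interest classification)
+# 2) keyword mode: see config/frequency_words.txt
+# 3) ai mode: see config/ai_interests.txt plus the ai_filter section below
+# 4) priority_sort_enabled only affects tag ordering in ai mode
+# 5) This section selects the filtering path, not the AI model (the model is set in the ai section)
+# ===============================================================
+filter:
+  method: "ai"                     # Options: keyword | ai
+
+  # Tag ordering switch for AI mode (ai mode only)
+  # true: order by tag priority (the order in which tags were extracted from the interest description)
+  # false: order by match count (most matches first)
+  priority_sort_enabled: true
+
+
+# ===============================================================
+# 4.6 AI smart filter settings (effective when filter.method=ai)
+#
+# Quick start in 5 lines:
+# 1) Tune min_score first (0.5~0.7 recommended)
+# 2) Then tune reclassify_threshold (a lower value is advised if you make large changes to your interests)
+# 3) Batch parameters only affect speed and rate limiting, not classification logic
+# 4) If interests_file is unset, config/ai_interests.txt is used
+# 5) The prompt_file options are advanced; the defaults rarely need changing
+#
+# Advanced notes:
+# - A higher min_score gives more precise results but lowers recall
+# - A lower reclassify_threshold favors full reclassification (uses more tokens)
+# - Model settings all live in the ai section below
+# ===============================================================
+ai_filter:
+  batch_size: 200                         # Titles sent to the AI per batch (bounds the size of one API call)
+                                          # News beyond this count is split into batches automatically
+  batch_interval: 2                       # Wait between batches, in seconds
+                                          # Avoids rate limiting from rapid API calls; 0 means no wait
+
+  min_score: 0.7                          # Minimum score for pushing (0.0 ~ 1.0)
+                                          # 0 = no filtering; higher is stricter (start with 0.5~0.7)
+
+  # Interest description file
+  # config/ai_interests.txt is used by default; nothing needs to be set here
+  # This is the global default; a period-level interests_file in timeline.yaml overrides it
+  # To use a custom file, put it in config/custom/ai/ and give its file name:
+  # interests_file: "finance.txt"    # → loads config/custom/ai/finance.txt
+
+  # Full reclassification threshold (0~1)
+  # change_ratio >= this value: full reclassification; otherwise incremental update
+  # 0.0 is the most accurate and most expensive; 1.0 is the cheapest but may go stale; 0.6 is a balance
+  reclassify_threshold: 0.6
+
+  # The prompt templates below rarely need changing (editing is not recommended)
+
+  # Classification prompt template
+  prompt_file: "prompt.txt"
+
+  # Tag extraction prompt template (used on first run)
+  extract_prompt_file: "extract_prompt.txt"
+
+  # Tag update prompt template (the AI compares old and new tags when interests change)
+  update_tags_prompt_file: "update_tags_prompt.txt"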
 
 
 # ===============================================================
@@ -191,16 +251,16 @@ display:
   #   1. 在此列表中
   #   2. 下方 regions 中对应开关为 true
   region_order:
-    - new_items                       # 1️⃣ 新增热点区域
-    - hotlist                         # 2️⃣ 热榜区域(关键词匹配)
-    - rss                             # 3️⃣ RSS 订阅区域
-    - standalone                      # 4️⃣ 独立展示区
-    - ai_analysis                     # 5️⃣ AI 分析区域
+    - new_items                           # 1️⃣ New hot items
+    - hotlist                             # 2️⃣ Hot list (keyword matching / AI smart filter)
+    - rss                                 # 3️⃣ RSS feeds
+    - standalone                          # 4️⃣ Standalone display
+    - ai_analysis                         # 5️⃣ AI analysis
 
   # 推送区域开关
   # 控制各区域是否启用(配合 region_order 使用)
   regions:
-    hotlist: true                     # 热榜区域(关键词匹配的热点新闻
+    hotlist: true                     # Hot list (keyword matching / AI smart filter)
     new_items: false                   # 新增热点区域(含热榜新增 + RSS 新增)
                                       # 注:热点词汇统计中的新增标记🆕不受此配置影响
 
@@ -215,7 +275,7 @@ display:
   # 用途:将指定平台的完整热榜/RSS 数据独立提取,不受关键词过滤影响
   # 两个独立用途:
   #   - 推送展示:由 regions.standalone 开关控制,在推送中单独展示完整热榜
-  #   - AI 分析:由 ai.include_standalone 开关控制,将完整数据送入 AI 做深度分析
+  #   - AI 分析:由 ai_analysis.include_standalone 开关控制,将完整数据送入 AI 做深度分析
   # 两者共享此处的平台/RSS 配置,但开关互相独立(可只开 AI 分析、不推送展示)
   standalone:
     platforms: ["zhihu", "wallstreetcn-hot"]     # 热榜平台 ID 列表(如 ["zhihu", "weibo"])
@@ -242,6 +302,10 @@ display:
 # • 需要配对的配置(如 Telegram 的 token 和 chat_id)数量必须一致
 # • 每个渠道最多支持 max_accounts_per_channel 个账号
 # • 邮箱已支持多收件人(逗号分隔)
+#
+# 新手建议:
+# • 第一次先只配置 1 个渠道(建议 ntfy 或 telegram)验证通路
+# • 跑通后再增加多渠道和多账号,排障成本最低
 # ===============================================================
 notification:
   enabled: true                       # 是否启用通知功能(总开关)
@@ -337,7 +401,7 @@ storage:
 # ===============================================================
 # 8. AI 模型配置(共享)
 #
-# ai_analysis 和 ai_translation 共用此模型配置
+# ai_analysis / ai_translation / ai_filter 共用此模型配置
 # 基于 LiteLLM 统一接口,支持 100+ AI 提供商
 # ===============================================================
 ai:
@@ -445,7 +509,8 @@ ai_analysis:
 
   include_rss: false                # 是否包含 RSS 内容进行分析
   
-  include_standalone: true          # 是否将独立展示区数据纳入 AI 分析(只需上方 display 区的 standalone 配置了平台/RSS 即可)
+  include_standalone: true          # 是否将独立展示区数据纳入 AI 分析
+                                    # 数据源列表来自 display.standalone.platforms / display.standalone.rss_feeds
 
   include_rank_timeline: true       # 是否传递完整排名时间线
                                     # false: 使用简化格式(排名范围+时间范围+出现次数)
@@ -458,11 +523,11 @@ ai_analysis:
 # ===============================================================
 # 10. AI 翻译功能
 #
-# 对推送内容进行多语言翻译,不包含 ai_analysis 分析的内容 
+# 对推送内容进行多语言翻译,不包含 ai_analysis 分析的内容
 # 模型配置见上方 ai 配置段
 # ===============================================================
 ai_translation:
-  enabled: false                    # 是否启用翻译功能
+  enabled: true                    # 是否启用翻译功能
 
   # 翻译目标语言
   # 格式:自然语言描述
@@ -472,6 +537,17 @@ ai_translation:
   # 提示词配置文件路径(相对于 config 目录)
   prompt_file: "ai_translation_prompt.txt"
 
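+  # Translation scope
+  # Controls which regions have their titles translated
+  # hotlist: hot list titles + new hot items
+  # rss: RSS statistics + new RSS items
+  # standalone: standalone display (hot list platforms + RSS feeds)
+  # A region that is turned off in display.regions is not translated even if it is enabled here
+  scope:
+    hotlist: false                  # Hot list region
+    rss: true                      # RSS region
+    standalone: true               # Standalone display region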
+
 
 # ===============================================================
 # 11. 高级设置(一般无需修改)
@@ -508,6 +584,7 @@ advanced:
   # 多账号限制
   max_accounts_per_channel: 3         # 每个渠道最大账号数量
 
+  # 以下为内部参数(一般无需修改)
   # 消息分批大小(字节)- 内部配置,请勿修改
   batch_size:
     default: 4000
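The `ai_filter.batch_size` and `ai_filter.batch_interval` settings in this file describe splitting titles into batches and pausing between API calls. A minimal sketch of that behavior, assuming a caller-supplied `classify` callable that stands in for the real AI request; `iter_batches` and `classify_in_batches` are hypothetical names, not the project's implementation:

```python
import time
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def classify_in_batches(
    titles: Sequence[str],
    classify: Callable[[List[str]], list],
    batch_size: int = 200,
    batch_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Run classify() per batch, waiting batch_interval seconds between batches."""
    results = []
    for index, batch in enumerate(iter_batches(titles, batch_size)):
        # Wait only between batches, never before the first one;
        # batch_interval = 0 disables the wait entirely.
        if index > 0 and batch_interval > 0:
            sleep(batch_interval)
        results.extend(classify(batch))
    return results


# Illustration with a fake classifier and a recording sleep function.
calls, waits = [], []
output = classify_in_batches(
    [f"t{i}" for i in range(5)],
    lambda batch: (calls.append(len(batch)) or [title.upper() for title in batch]),
    batch_size=2,
    batch_interval=2,
    sleep=waits.append,
)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; as the config comments note, the batch parameters change speed and rate-limit exposure, not the classification result.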

+ 0 - 0
config/custom/ai/.gitkeep


+ 0 - 0
config/custom/keyword/.gitkeep


+ 43 - 2
config/timeline.yaml

@@ -1,6 +1,6 @@
 # ═══════════════════════════════════════════════════════════════
 #                   TrendRadar 时间线配置
-#                      Version: 1.0.0
+#                      Version: 1.2.0
 # ═══════════════════════════════════════════════════════════════
 #
 # 这个文件控制「什么时间做什么事」。
@@ -169,6 +169,9 @@ presets:
       ai_mode: "current"           # AI 分析当前榜单
       push: true                   # 每次推送当前在榜热点
       report_mode: "current"       # 当前在榜的新闻
+      # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+      # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+      # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
       once:
         analyze: false             # 不限制分析次数
         push: false                # 不限制推送次数
@@ -179,6 +182,9 @@ presets:
         name: "晚间汇总"
         start: "20:00"
         end: "22:00"
+        # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+        # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+        # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
         analyze: true              # 晚间做 AI 分析
         ai_mode: "daily"           # AI 也汇总全天内容
         report_mode: "daily"       # 切换为当日全部新闻汇总
@@ -242,7 +248,7 @@ presets:
       noon_update:
         name: "午间热点"
         start: "13:00"
-        end: "15:00"              
+        end: "15:00"
         push: true                 # 午间推送当前在榜热点
         report_mode: "current"     # 当前在榜的新闻
         # analyze 继承 default: false → 午间不做 AI 分析,节省 API
@@ -411,6 +417,29 @@ custom:
                                    #   daily       → 当日所有新闻的汇总
                                    #   current     → 当前在榜的新闻
                                    #   incremental → 只推送新增内容
+
+                                   
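+    # frequency_file: "general.txt"
+                                   # Keyword file (optional, located in config/custom/keyword/)
+                                   # Defaults to config/frequency_words.txt when unset
+                                   # A period can also set this field to override the default
+                                   # For example, a tech word list for the evening summary:
+                                   #   frequency_file: "tech.txt"
+                                   # Note: only takes effect when filter_method is keyword
+
+    # interests_file: "finance.txt"
+                                   # AI interest description file (optional, located in config/custom/ai/)
+                                   # Defaults to config/ai_interests.txt when unset
+                                   # A period can also set this field to override the default
+                                   # For example, finance interests for the evening summary:
+                                   #   interests_file: "finance.txt"
+                                   # Note: only takes effect when filter_method is ai
+
+    # filter_method: "keyword"     # Filter strategy (options: keyword | ai)
+                                   # Defaults to filter.method in the global config.yaml when unset
+                                   # A period can also set this field to override it
+                                   # For example, AI filtering for the evening summary:
+                                   #   filter_method: "ai"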
     once:
       analyze: true                # 该时间段内只分析一次(省 API)
       push: true                   # 该时间段内只推送一次(省打扰)
@@ -434,6 +463,9 @@ custom:
       name: "深夜静默"
       start: "23:00"
       end: "06:00"                 # 23:00 → 次日 06:00(跨日时间段)
+      # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+      # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+      # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
       collect: true                # 夜间继续采集数据
       analyze: true                # 夜间可以跑 AI 分析(反正不推送)
       push: false                  # 深夜不推送,避免打扰
@@ -442,6 +474,9 @@ custom:
       name: "工作日早间"
       start: "08:00"
       end: "10:00"                 # 跨度 2h,留足触发裕量
+      # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+      # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+      # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
       push: true                   # 早上推送一次
       report_mode: "incremental"   # 只推新增内容
       # once 继承 default(push: true)→ 窗口内只推一次
@@ -450,6 +485,9 @@ custom:
       name: "周末早间"
       start: "10:00"
       end: "12:00"                 # 跨度 2h
+      # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+      # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+      # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
       push: true
       report_mode: "daily"         # 周末看全天汇总
       # once 继承 default(push: true)→ 窗口内只推一次
@@ -458,6 +496,9 @@ custom:
       name: "晚间汇总"
       start: "19:00"
       end: "21:00"
+      # frequency_file: "xxx.txt"               # 关键词文件(可选,位于 config/custom/keyword/)
+      # interests_file: "xxx.txt"                # AI 兴趣文件(可选,位于 config/custom/ai/)
+      # filter_method: "keyword"                # 筛选策略(可选: keyword | ai,不填用全局 filter.method)
       analyze: true                # 晚间做 AI 分析
       ai_mode: "daily"             # AI 也分析全天内容
       push: true                   # 晚间推送
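The comments added in this file describe a layered lookup for `filter_method`, `frequency_file`, and `interests_file`: a period overrides the preset's default block, which overrides the global `config.yaml`. A minimal sketch of that precedence, using plain dictionaries in place of the parsed YAML; `resolve_filter_settings` is a hypothetical name, and falling back to `"keyword"` when `filter.method` is absent is an assumption of this sketch:

```python
from typing import Any, Dict, Optional

OVERRIDABLE = ("filter_method", "frequency_file", "interests_file")


def resolve_filter_settings(
    global_cfg: Dict[str, Any],
    preset_default: Dict[str, Any],
    period: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Resolve filter settings: period > preset default > global config."""
    resolved = {
        # Assumption: "keyword" is used if filter.method is not configured.
        "filter_method": global_cfg.get("filter", {}).get("method", "keyword"),
        # None means "use the default config/frequency_words.txt".
        "frequency_file": None,
        # None means "use the default config/ai_interests.txt".
        "interests_file": global_cfg.get("ai_filter", {}).get("interests_file"),
    }
    # Later layers win; unset (missing or None) keys leave the earlier value.
    for layer in (preset_default, period or {}):
        for key in OVERRIDABLE:
            if layer.get(key) is not None:
                resolved[key] = layer[key]
    return resolved


global_cfg = {"filter": {"method": "ai"}, "ai_filter": {}}
preset_default = {"interests_file": "finance.txt"}
evening = {"filter_method": "keyword", "frequency_file": "tech.txt"}

outside_period = resolve_filter_settings(global_cfg, preset_default)
inside_period = resolve_filter_settings(global_cfg, preset_default, evening)
```

Note that `frequency_file` only matters when the resolved `filter_method` is `keyword`, and `interests_file` only when it is `ai`, as the comments above state.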

+ 8 - 0
docker/.env

@@ -11,6 +11,14 @@ ENABLE_WEBSERVER=false
 # 注意:修改后需要重启容器生效
 WEBSERVER_PORT=8080
 
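+# Whether to enable web server auto-recovery (true/false)
+# true: restart the web server automatically if it goes down (recommended)
+# false: never restart it automatically; useful if you want to keep the web server stopped manually for a long time
+WEBSERVER_WATCHDOG=true
+
+# Auto-recovery check interval in seconds; defaults to one check every 60 seconds
+WEBSERVER_WATCHDOG_INTERVAL=60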
+
 # ============================================
 # 通知渠道配置(多账号用 ; 分隔)
 # ============================================

+ 1 - 1
docker/Dockerfile

@@ -1,4 +1,4 @@
-FROM python:3.10-slim
+FROM python:3.12-slim-bookworm
 
 WORKDIR /app
 

+ 1 - 1
docker/Dockerfile.mcp

@@ -1,4 +1,4 @@
-FROM python:3.10-slim
+FROM python:3.12-slim-bookworm
 
 WORKDIR /app
 

+ 2 - 0
docker/docker-compose-build.yml

@@ -18,6 +18,8 @@ services:
       # Web 服务器
       - ENABLE_WEBSERVER=${ENABLE_WEBSERVER:-false}
       - WEBSERVER_PORT=${WEBSERVER_PORT:-8080}
+      - WEBSERVER_WATCHDOG=${WEBSERVER_WATCHDOG:-true}
+      - WEBSERVER_WATCHDOG_INTERVAL=${WEBSERVER_WATCHDOG_INTERVAL:-60}
       # 通知渠道
       - FEISHU_WEBHOOK_URL=${FEISHU_WEBHOOK_URL:-}
       - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}

+ 2 - 0
docker/docker-compose.yml

@@ -16,6 +16,8 @@ services:
       # Web 服务器
       - ENABLE_WEBSERVER=${ENABLE_WEBSERVER:-false}
       - WEBSERVER_PORT=${WEBSERVER_PORT:-8080}
+      - WEBSERVER_WATCHDOG=${WEBSERVER_WATCHDOG:-true}
+      - WEBSERVER_WATCHDOG_INTERVAL=${WEBSERVER_WATCHDOG_INTERVAL:-60}
       # 通知渠道
       - FEISHU_WEBHOOK_URL=${FEISHU_WEBHOOK_URL:-}
       - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}

+ 18 - 1
docker/entrypoint.sh

@@ -37,6 +37,23 @@ case "${RUN_MODE:-cron}" in
     if [ "${ENABLE_WEBSERVER:-false}" = "true" ]; then
         echo "🌐 启动 Web 服务器..."
         /usr/local/bin/python manage.py start_webserver
+
+        WEBSERVER_WATCHDOG_ENABLED=$(echo "${WEBSERVER_WATCHDOG:-true}" | tr '[:upper:]' '[:lower:]')
+        WEBSERVER_WATCHDOG_INTERVAL=${WEBSERVER_WATCHDOG_INTERVAL:-60}
+        if [ "$WEBSERVER_WATCHDOG_ENABLED" = "true" ] || [ "$WEBSERVER_WATCHDOG_ENABLED" = "1" ] || [ "$WEBSERVER_WATCHDOG_ENABLED" = "yes" ] || [ "$WEBSERVER_WATCHDOG_ENABLED" = "on" ]; then
+            # 启动后台 watchdog 定期检查 Web 服务器健康状态
+            echo "🔄 启动 Web 服务器 watchdog (间隔: ${WEBSERVER_WATCHDOG_INTERVAL}s)..."
+            (
+                while true; do
+                    sleep "$WEBSERVER_WATCHDOG_INTERVAL"
+                    /usr/local/bin/python manage.py webserver_autofix
+                done
+            ) &
+            WEBSERVER_WATCHDOG_PID=$!
+            echo "  ✅ watchdog 已启动 (PID: $WEBSERVER_WATCHDOG_PID)"
+        else
+            echo "⏸️ Web 服务器 watchdog 已禁用"
+        fi
     fi
 
     echo "⏰ 启动supercronic: ${CRON_SCHEDULE:-*/30 * * * *}"
@@ -47,4 +64,4 @@ case "${RUN_MODE:-cron}" in
 *)
     exec "$@"
     ;;
-esac
+esac

+ 202 - 40
docker/manage.py

@@ -10,11 +10,29 @@ import subprocess
 import time
 import signal
 from pathlib import Path
+from datetime import datetime
 
 # Web 服务器配置
 WEBSERVER_PORT = int(os.environ.get("WEBSERVER_PORT", "8080"))
 WEBSERVER_DIR = "/app/output"
 WEBSERVER_PID_FILE = "/tmp/webserver.pid"
+WEBSERVER_MANUAL_STOP_FILE = "/tmp/webserver.manual_stop"
+
+
+def _env_bool(name: str, default: bool) -> bool:
+    """读取布尔环境变量,兼容 true/1/yes/on。"""
+    value = os.environ.get(name)
+    if value is None:
+        return default
+    return value.strip().lower() in {"1", "true", "yes", "on"}
+
+
+WEBSERVER_AUTOFIX_LOG_HEALTHY = _env_bool("WEBSERVER_AUTOFIX_LOG_HEALTHY", False)
+
+
+def get_timestamp():
+    """获取当前时间戳字符串"""
+    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
 
 
 def run_command(cmd, shell=True, capture_output=True):
@@ -447,31 +465,150 @@ def restart_supercronic():
         print("  💡 建议重启容器: docker restart trendradar")
 
 
-def start_webserver():
+def _read_proc_cmdline(pid: int) -> str:
+    """读取进程 cmdline,失败时返回空字符串。"""
+    proc_cmdline = Path(f"/proc/{pid}/cmdline")
+    if not proc_cmdline.exists():
+        return ""
+    try:
+        with open(proc_cmdline, "rb") as f:
+            return f.read().replace(b"\x00", b" ").decode("utf-8", errors="ignore").strip()
+    except Exception:
+        return ""
+
+
+def _is_expected_webserver_process(pid: int) -> bool:
+    """检查 pid 是否是当前端口的 http.server 进程。"""
+    cmdline = _read_proc_cmdline(pid)
+    if not cmdline:
+        return False
+    return "http.server" in cmdline and str(WEBSERVER_PORT) in cmdline
+
+
+def _is_manual_stop_requested() -> bool:
+    """是否处于手动停服状态。"""
+    return Path(WEBSERVER_MANUAL_STOP_FILE).exists()
+
+
+def _set_manual_stop_marker():
+    """写入手动停服标记,防止 watchdog 自动拉起。"""
+    try:
+        with open(WEBSERVER_MANUAL_STOP_FILE, "w", encoding="utf-8") as f:
+            f.write(get_timestamp())
+    except Exception:
+        pass
+
+
+def _clear_manual_stop_marker():
+    """清理手动停服标记。"""
+    try:
+        if Path(WEBSERVER_MANUAL_STOP_FILE).exists():
+            os.remove(WEBSERVER_MANUAL_STOP_FILE)
+    except Exception:
+        pass
+
+
+def _terminate_webserver_process(pid: int, require_expected: bool = True) -> bool:
+    """尝试终止 Web 服务器进程。
+
+    require_expected=True 时,仅终止确认是 http.server 的进程,避免误杀。
+    """
+    try:
+        os.kill(pid, 0)
+    except OSError:
+        return True
+
+    if require_expected and not _is_expected_webserver_process(pid):
+        print(f"  ⚠️ PID {pid} 存在但并非 Web 服务器进程,跳过终止")
+        return False
+
+    try:
+        os.kill(pid, signal.SIGTERM)
+        time.sleep(0.5)
+        try:
+            os.kill(pid, 0)
+            os.kill(pid, signal.SIGKILL)
+            print(f"  ⚠️ 强制停止 Web 服务器 (PID: {pid})")
+        except OSError:
+            print(f"  ✅ Web 服务器已停止 (PID: {pid})")
+        return True
+    except OSError:
+        return True
+
+
+def _is_webserver_running(pid: int) -> bool:
+    """Check that the web server process is alive, is the expected one, and answers HTTP."""
+    try:
+        os.kill(pid, 0)
+    except OSError:
+        return False
+
+    if not _is_expected_webserver_process(pid):
+        return False
+
+    import urllib.request
+
+    # Probe up to twice, pausing 1 second before the retry, to ride out transient failures.
+    for attempt in range(2):
+        if attempt:
+            time.sleep(1)
+        try:
+            req = urllib.request.Request(f"http://127.0.0.1:{WEBSERVER_PORT}/", method="HEAD")
+            with urllib.request.urlopen(req, timeout=3):
+                return True
+        except Exception:
+            continue
+    return False
+
+
+def _cleanup_stale_pid():
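+    """Remove a stale PID file; return True if a file was removed."""
+    pid_file = Path(WEBSERVER_PID_FILE)
+    if not pid_file.exists():
+        return False
+
+    try:
+        # The old PID is read for logging only; unreadable content must not block removal.
+        try:
+            old_pid = pid_file.read_text().strip() or "unknown"
+        except Exception:
+            old_pid = "unknown"
+        os.remove(WEBSERVER_PID_FILE)
+        print(f"  🧹 清理失效 PID 文件 (PID: {old_pid})")
+        return True
+    except Exception:
+        return False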
+
+
+def start_webserver(force: bool = False):
     """启动 Web 服务器托管 output 目录"""
     print(f"🌐 启动 Web 服务器 (端口: {WEBSERVER_PORT})...")
     print(f"  🔒 安全提示:仅提供静态文件访问,限制在 {WEBSERVER_DIR} 目录")
 
+    if force:
+        _clear_manual_stop_marker()
+    elif _is_manual_stop_requested():
+        print("  ℹ️ 检测到手动停服标记,跳过自动启动")
+        return
+
     # 检查是否已经运行
     if Path(WEBSERVER_PID_FILE).exists():
         try:
             with open(WEBSERVER_PID_FILE, 'r') as f:
                 old_pid = int(f.read().strip())
-            try:
-                os.kill(old_pid, 0)  # 检查进程是否存在
+
+            # 使用增强的进程检查
+            if _is_webserver_running(old_pid):
                 print(f"  ⚠️ Web 服务器已在运行 (PID: {old_pid})")
                 print(f"  💡 访问: http://localhost:{WEBSERVER_PORT}")
                 print("  💡 停止服务: python manage.py stop_webserver")
                 return
-            except OSError:
-                # 进程不存在,删除旧的 PID 文件
-                os.remove(WEBSERVER_PID_FILE)
+
+            # The process is unhealthy: terminate the old one first so a held port does not block the restart
+            _terminate_webserver_process(old_pid, require_expected=True)
+            _cleanup_stale_pid()
+            print("  ℹ️ 检测到失效的 PID 文件,已清理")
+
         except Exception as e:
             print(f"  ⚠️ 清理旧的 PID 文件: {e}")
-            try:
-                os.remove(WEBSERVER_PID_FILE)
-            except:
-                pass
+            _cleanup_stale_pid()
 
     # 检查目录是否存在
     if not Path(WEBSERVER_DIR).exists():
@@ -498,6 +635,7 @@ def start_webserver():
             # 保存 PID
             with open(WEBSERVER_PID_FILE, 'w') as f:
                 f.write(str(process.pid))
+            _clear_manual_stop_marker()
 
             print(f"  ✅ Web 服务器已启动 (PID: {process.pid})")
             print(f"  📁 服务目录: {WEBSERVER_DIR} (只读,仅静态文件)")
@@ -513,36 +651,20 @@ def start_webserver():
 def stop_webserver():
     """停止 Web 服务器"""
     print("🛑 停止 Web 服务器...")
+    _set_manual_stop_marker()
 
     if not Path(WEBSERVER_PID_FILE).exists():
         print("  ℹ️ Web 服务器未运行")
+        print("  ℹ️ 已写入手动停服标记,watchdog 不会自动拉起")
         return
 
     try:
         with open(WEBSERVER_PID_FILE, 'r') as f:
             pid = int(f.read().strip())
-
-        try:
-            # 尝试终止进程
-            os.kill(pid, signal.SIGTERM)
-            time.sleep(0.5)
-
-            # 检查进程是否已终止
-            try:
-                os.kill(pid, 0)
-                # 进程还在,强制杀死
-                os.kill(pid, signal.SIGKILL)
-                print(f"  ⚠️ 强制停止 Web 服务器 (PID: {pid})")
-            except OSError:
-                print(f"  ✅ Web 服务器已停止 (PID: {pid})")
-        except OSError as e:
-            if e.errno == 3:  # No such process
-                print(f"  ℹ️ 进程已不存在 (PID: {pid})")
-            else:
-                raise
-
-        # 删除 PID 文件
-        os.remove(WEBSERVER_PID_FILE)
+        _terminate_webserver_process(pid, require_expected=True)
+        if Path(WEBSERVER_PID_FILE).exists():
+            os.remove(WEBSERVER_PID_FILE)
+        print("  ℹ️ 已写入手动停服标记,watchdog 不会自动拉起")
     except Exception as e:
         print(f"  ❌ 停止失败: {e}")
         # 尝试清理 PID 文件
@@ -558,6 +680,8 @@ def webserver_status():
 
     if not Path(WEBSERVER_PID_FILE).exists():
         print("  ⭕ 未运行")
+        if _is_manual_stop_requested():
+            print("  ℹ️ 当前为手动停服状态,watchdog 不会自动拉起")
         print(f"  💡 启动服务: python manage.py start_webserver")
         return
 
@@ -565,21 +689,58 @@ def webserver_status():
         with open(WEBSERVER_PID_FILE, 'r') as f:
             pid = int(f.read().strip())
 
-        try:
-            os.kill(pid, 0)  # 检查进程是否存在
+        # 使用增强的进程检查
+        if _is_webserver_running(pid):
             print(f"  ✅ 运行中 (PID: {pid})")
             print(f"  📁 服务目录: {WEBSERVER_DIR}")
             print(f"  🌐 访问地址: http://localhost:{WEBSERVER_PORT}")
             print(f"  📄 首页: http://localhost:{WEBSERVER_PORT}/index.html")
             print("  💡 停止服务: python manage.py stop_webserver")
-        except OSError:
-            print(f"  ⭕ 未运行 (PID 文件存在但进程不存在)")
-            os.remove(WEBSERVER_PID_FILE)
+        else:
+            print(f"  ⭕ 未运行 (PID 文件存在但进程不可用)")
+            _cleanup_stale_pid()
             print("  💡 启动服务: python manage.py start_webserver")
     except Exception as e:
         print(f"  ❌ 状态检查失败: {e}")
 
 
+def webserver_autofix():
+    """Web 服务器健康检查和自动修复
+
+    供 watchdog/定时任务调用,检查服务状态并在需要时自动重启。
+    输出日志格式便于外部监控系统解析。
+    """
+    if _is_manual_stop_requested():
+        if WEBSERVER_AUTOFIX_LOG_HEALTHY:
+            print(f"[{get_timestamp()}] ℹ️ 手动停服状态,跳过自动修复")
+        return
+
+    if not Path(WEBSERVER_PID_FILE).exists():
+        print(f"[{get_timestamp()}] ℹ️ Web 服务器未运行,启动中...")
+        start_webserver(force=False)
+        return
+
+    try:
+        with open(WEBSERVER_PID_FILE, 'r') as f:
+            pid = int(f.read().strip())
+
+        # 使用增强检查
+        if not _is_webserver_running(pid):
+            print(f"[{get_timestamp()}] ⚠️ Web 服务器不可用 (PID: {pid}),尝试重启...")
+            _terminate_webserver_process(pid, require_expected=True)
+            _cleanup_stale_pid()
+            start_webserver(force=False)
+            return
+
+        if WEBSERVER_AUTOFIX_LOG_HEALTHY:
+            print(f"[{get_timestamp()}] ✅ Web 服务器健康 (PID: {pid})")
+
+    except Exception as e:
+        print(f"[{get_timestamp()}] ❌ 健康检查异常: {e}")
+        _cleanup_stale_pid()
+        start_webserver(force=False)
+
+
 def show_help():
     """显示帮助信息"""
     help_text = """
@@ -630,7 +791,7 @@ def show_help():
 
   5. Web 服务器管理:
      - 启动: start_webserver
-     - 停止: stop_webserver
+     - 停止: stop_webserver(写入手动停服标记,watchdog 不自动拉起)
      - 状态: webserver_status
      - 访问: http://localhost:8080
 """
@@ -650,9 +811,10 @@ def main():
         "files": show_files,
         "logs": show_logs,
         "restart": restart_supercronic,
-        "start_webserver": start_webserver,
+        "start_webserver": lambda: start_webserver(force=True),
         "stop_webserver": stop_webserver,
         "webserver_status": webserver_status,
+        "webserver_autofix": webserver_autofix,
         "help": show_help,
     }
 
@@ -669,4 +831,4 @@ def main():
 
 
 if __name__ == "__main__":
-    main()
+    main()

+ 347 - 38
docs/assets/script.js

@@ -41,48 +41,58 @@ function syncScroll(textareaId, backdropId) {
 }
 
 // ==========================================
-// 12. 支持项目弹窗逻辑
+// 12. QR code enlarge-modal logic
 // ==========================================
 
-/**
- * 打开支持弹窗
- */
-function openSupportModal() {
-    const modal = document.getElementById('support-modal');
-    if (modal) {
-        modal.classList.remove('hidden');
-        document.body.style.overflow = 'hidden'; // 禁止背景滚动
+const QR_MODAL_DATA = {
+    weixin: {
+        icon: '<i class="fa-brands fa-weixin text-green-600"></i>',
+        iconBg: 'bg-green-100',
+        title: '不迷路',
+        subtitle: '第一时间获取更新通知',
+        img: './assets/weixin.webp',
+        alt: '微信公众号',
+        hint: '微信扫码关注公众号'
+    },
+    donate: {
+        icon: '<i class="fa-solid fa-hand-holding-heart text-emerald-600"></i>',
+        iconBg: 'bg-emerald-100',
+        title: '随心赞赏',
+        subtitle: '金额随意,1 元也是鼓励 (´▽`ʃ♡ƪ)',
+        img: 'https://cdn-1258574687.cos.ap-shanghai.myqcloud.com/img/%2F2026%2F01%2F18ecce7c224ce0ea4c59394c29e408f8-e0d1db45.webp',
+        alt: '微信支付',
+        hint: '微信扫码 · 丰俭由人'
     }
-}
+};
 
-/**
- * 关闭支持弹窗
- */
-function closeSupportModal() {
-    const modal = document.getElementById('support-modal');
-    if (modal) {
-        modal.classList.add('hidden');
-        document.body.style.overflow = ''; // 恢复滚动
-    }
+function openQrModal(type) {
+    const data = QR_MODAL_DATA[type];
+    if (!data) return;
+    const modal = document.getElementById('qr-modal');
+    document.getElementById('qr-modal-icon').className = 'w-10 h-10 rounded-xl flex items-center justify-center text-lg ' + data.iconBg;
+    document.getElementById('qr-modal-icon').innerHTML = data.icon;
+    document.getElementById('qr-modal-title').textContent = data.title;
+    document.getElementById('qr-modal-subtitle').textContent = data.subtitle;
+    document.getElementById('qr-modal-img').src = data.img;
+    document.getElementById('qr-modal-img').alt = data.alt;
+    document.getElementById('qr-modal-hint').textContent = data.hint;
+    if (modal) modal.classList.remove('hidden');
 }
 
-/**
- * 点击外部关闭
- */
-function closeSupportModalOutside(event) {
-    if (event.target.id === 'support-modal') {
-        closeSupportModal();
-    }
+function closeQrModal() {
+    const modal = document.getElementById('qr-modal');
+    if (modal) modal.classList.add('hidden');
 }
 
-window.openSupportModal = openSupportModal;
-window.closeSupportModal = closeSupportModal;
-window.closeSupportModalOutside = closeSupportModalOutside;
+window.openQrModal = openQrModal;
+window.closeQrModal = closeQrModal;
 const MODULE_DEFS = [
     { id: 1, name: "1. 基础设置", key: "app", editable: false },
     { id: 2, name: "2. 数据源 - 热榜平台", key: "platforms", editable: true },
     { id: 3, name: "3. 数据源 - RSS 订阅", key: "rss", editable: true },
     { id: 4, name: "4. 报告模式", key: "report", editable: true },
+    { id: "4.5", name: "4.5 筛选策略", key: "filter", editable: true },
+    { id: "4.6", name: "4.6 AI 智能筛选", key: "ai_filter", editable: true },
     { id: 5, name: "5. 推送内容控制", key: "display", editable: true },
     { id: 6, name: "6. 推送通知", key: "notification", editable: true, partial: true },
     { id: 7, name: "7. 存储配置", key: "storage", editable: false },
@@ -791,8 +801,9 @@ window.scrollToModuleInEditor = function(modKey) {
     const mod = MODULE_DEFS.find(m => m.key === modKey);
     if (!mod) return;
 
-    // 直接匹配包含模块编号的标题行,如:# 5. 推送内容控制
-    const moduleTitlePattern = new RegExp(`^#\\s*${mod.id}\\.\\s+`, 'i');
+    // Match the heading line that carries the module number directly; supports both the "4." and "4.5" numbering formats
+    const escapedId = String(mod.id).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+    const moduleTitlePattern = new RegExp(`^#\\s*${escapedId}(?:\\.)?\\s+`, 'i');
 
     for (let i = 0; i < lines.length; i++) {
         const line = lines[i];
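
A quick sanity check of the heading pattern built in this hunk. The sketch below wraps the same escape-and-match logic in a hypothetical helper, `buildModuleTitlePattern`, and runs it against a few sample headings. The helper name, `matchingLines`, and the `# 45.` heading are illustrative assumptions, not part of the commit.

```javascript
// Hypothetical helper mirroring the pattern built in scrollToModuleInEditor.
function buildModuleTitlePattern(id) {
    // Escape regex metacharacters so the "." in "4.5" is matched literally.
    const escapedId = String(id).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // The optional trailing dot covers "# 4. Title" as well as "# 4.5 Title".
    return new RegExp(`^#\\s*${escapedId}(?:\\.)?\\s+`, 'i');
}

// Sample headings; the last one is invented to show that "4" does not match "45.".
const sampleLines = [
    '# 4. 报告模式',
    '# 4.5 筛选策略',
    '# 4.6 AI 智能筛选',
    '# 45. hypothetical heading',
];

// Return every sample heading that the pattern for `id` accepts.
function matchingLines(id) {
    const pattern = buildModuleTitlePattern(id);
    return sampleLines.filter(line => pattern.test(line));
}

console.log(matchingLines(4));
console.log(matchingLines('4.5'));
```

Because the pattern requires whitespace right after the (optionally dotted) id, module `4` should not accidentally land on the `4.5` or `4.6` headings.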
@@ -882,6 +893,35 @@ function renderControls(mod) {
             html += createNumberControl(mod.key, "rank_threshold", "排名高亮阈值");
             html += createNumberControl(mod.key, "max_news_per_keyword", "每个关键词最大显示数量");
             break;
+        case "filter":
+            html = createSelectControl(mod.key, "method", "筛选方法", ["keyword", "ai"]);
+            html += createToggleControl(mod.key, "priority_sort_enabled", "AI 模式按标签优先级排序");
+            html += `<div class="text-xs text-gray-500 mt-2 p-2 bg-blue-50 rounded border border-blue-200">
+                        <i class="fa-solid fa-info-circle mr-1 text-blue-500"></i>
+                        <strong>说明:</strong><code>method=keyword</code> 使用 <code>frequency_words.txt</code>;
+                        <code>method=ai</code> 使用 <code>ai_interests.txt</code> + AI 筛选配置。<br>
+                        <code>priority_sort_enabled</code> 仅在 <code>method=ai</code> 时生效。
+                     </div>`;
+            break;
+        case "ai_filter":
+            html = `<div class="text-xs text-gray-500 mb-3 p-2 bg-blue-50 rounded border border-blue-200">
+                        <i class="fa-solid fa-info-circle mr-1 text-blue-500"></i>
+                        仅当 <strong>filter.method=ai</strong> 时生效。
+                    </div>`;
+            html += createNumberControl(mod.key, "batch_size", "每批标题数量");
+            html += createNumberControl(mod.key, "batch_interval", "分批间隔 (秒)");
+            html += createNumberControl(mod.key, "min_score", "最低分数阈值 (0~1)");
+            html += createInputControl(mod.key, "interests_file", "兴趣描述文件 (可选)");
+            html += `<div class="text-xs text-amber-700 mt-1 mb-3 p-2 bg-amber-50 rounded border border-amber-200">
+                        <i class="fa-solid fa-folder-tree mr-1"></i>
+                        留空时使用 <code>config/ai_interests.txt</code>;填写后仅从
+                        <code>config/custom/ai/</code> 查找该文件名。
+                     </div>`;
+            html += createNumberControl(mod.key, "reclassify_threshold", "全量重分类阈值 (0~1)");
+            html += createInputControl(mod.key, "prompt_file", "分类提示词文件");
+            html += createInputControl(mod.key, "extract_prompt_file", "标签提取提示词文件");
+            html += createInputControl(mod.key, "update_tags_prompt_file", "标签更新提示词文件");
+            break;
         case "display":
             html = `<div class="text-xs font-bold text-gray-700 mb-2">推送内容控制 <span class="text-gray-400 font-normal">(可拖拽排序)</span></div>`;
             html += `<div id="display-regions-list" class="space-y-2"></div>`;
@@ -1074,7 +1114,22 @@ function updateYamlFromUI(modKey, path, el) {
         }
     }
 
-    if (targetLine < 0) return;
+    if (targetLine < 0) {
+        // Allow adding a new first-level field to a module (e.g. ai_filter.interests_file, which is commented out by default)
+        if (pathParts.length === 1) {
+            let formattedVal = newVal;
+            if (typeof newVal === 'string') {
+                formattedVal = `"${newVal.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;
+            }
+
+            lines.splice(moduleEndLine, 0, `  ${searchKey}: ${formattedVal}`);
+            editor.value = lines.join('\n');
+            currentYaml = editor.value;
+            updateBackdrop('yaml-editor', 'yaml-backdrop');
+            debounceSaveConfig();
+        }
+        return;
+    }
 
     // 更新该行,保留注释
     const originalLine = lines[targetLine];
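
A note on quoting when a new field is spliced into the YAML text. In a YAML double-quoted scalar the backslash is the escape character, so a value needs its backslashes escaped as well as its double quotes. The sketch below shows one way to do that and to insert the line. `toYamlDoubleQuoted` and `insertTopLevelField` are hypothetical names, and the sample YAML lines and values are made up for illustration.

```javascript
// Hypothetical helper: wrap a value as a YAML double-quoted scalar.
function toYamlDoubleQuoted(value) {
    // Escape backslashes first so the backslashes added for quotes are not doubled.
    return '"' + String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: insert `key: value` as a two-space-indented child
// at index `moduleEndLine`, returning a new array instead of mutating the input.
function insertTopLevelField(lines, moduleEndLine, key, value) {
    const formatted = typeof value === 'string' ? toYamlDoubleQuoted(value) : String(value);
    const next = lines.slice();
    next.splice(moduleEndLine, 0, `  ${key}: ${formatted}`);
    return next;
}

// Sample input (assumed values, not taken from config.yaml).
const yamlLines = ['ai_filter:', '  batch_size: 200', '', 'display:'];
const updated = insertTopLevelField(yamlLines, 2, 'interests_file', 'my "tech".txt');
console.log(updated.join('\n'));
```

Non-string values (numbers, booleans) are written unquoted here, matching the hunk's behaviour of quoting only strings.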
@@ -3937,7 +3992,7 @@ function renderPeriodDetails(config, presetName) {
         </div>
         <div class="tl-collapsible-body">
             <div class="text-xs text-gray-500 mb-2">不在任何时间段内时,使用以下配置:</div>
-            ${renderBehaviorToggles(defaults, presetName, 'default')}
+            ${renderBehaviorToggles(defaults, presetName, 'default', defaults)}
         </div>
     </div>`;
 
@@ -3967,7 +4022,7 @@ function renderPeriodDetails(config, presetName) {
                         <button onclick="deleteTlPeriod('${presetName}','${key}')" class="tl-inline-btn text-red-400 hover:text-red-600" title="删除"><i class="fa-regular fa-trash-can"></i></button>
                     </div>
                 </div>
-                ${renderBehaviorToggles(merged, presetName, key)}
+                ${renderBehaviorToggles(merged, presetName, key, p)}
             </div>`;
         });
         html += `</div>`;
@@ -4057,6 +4112,28 @@ function renderPeriodDetails(config, presetName) {
         </div>
     </div>`;
 
+    // custom only: overlap policy for time periods
+    if (isCustom) {
+        const overlapPolicy = (config.overlap && config.overlap.policy) || 'error_on_overlap';
+        html += `<div class="mt-6">
+            <div class="tl-section-title"><i class="fa-solid fa-code-branch"></i>冲突策略 (Overlap)</div>
+            <div class="bg-white border border-gray-200 rounded-lg px-3 py-3">
+                <div class="flex items-center gap-2">
+                    <span class="text-xs text-gray-500">policy:</span>
+                    <select class="text-xs border border-gray-200 rounded px-2 py-1 bg-white"
+                            onchange="onTlCustomOverlapPolicy(this.value)">
+                        <option value="error_on_overlap" ${overlapPolicy === 'error_on_overlap' ? 'selected' : ''}>error_on_overlap(推荐)</option>
+                        <option value="last_wins" ${overlapPolicy === 'last_wins' ? 'selected' : ''}>last_wins(后定义优先)</option>
+                    </select>
+                </div>
+                <div class="text-[10px] text-gray-400 mt-2">
+                    <i class="fa-solid fa-info-circle mr-1"></i>
+                    <code>error_on_overlap</code> 会在时间段重叠时直接报错;<code>last_wins</code> 会按 day_plans 中靠后的时间段覆盖。
+                </div>
+            </div>
+        </div>`;
+    }
+
     // 提示
     if (!isCustom) {
         html += `<div class="mt-4 text-xs text-gray-400 p-3 bg-gray-50 rounded-lg border border-gray-200">
@@ -4078,7 +4155,7 @@ function renderPeriodDetails(config, presetName) {
  * presetName: 当前预设名(用于定位 YAML 中的位置)
  * periodKey: 'default' 或时间段 key(如 'weekday_morning')
  */
-function renderBehaviorToggles(cfg, presetName, periodKey) {
+function renderBehaviorToggles(cfg, presetName, periodKey, rawCfg = null) {
     const toggleItems = [
         { k: 'collect', label: '采集', icon: 'fa-download' },
         { k: 'analyze', label: '分析', icon: 'fa-brain' },
@@ -4156,6 +4233,44 @@ function renderBehaviorToggles(cfg, presetName, periodKey) {
         </div>`;
     }
 
+    // Optional filter overrides (show only fields set on the current layer, so inherited values are not mistaken for explicit config)
+    const baseCfg = rawCfg || {};
+    const filterMethod = baseCfg.filter_method || '';
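+    const frequencyFile = String(baseCfg.frequency_file || '').replace(/&/g, '&amp;').replace(/"/g, '&quot;');
+    const interestsFile = String(baseCfg.interests_file || '').replace(/&/g, '&amp;').replace(/"/g, '&quot;');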
+    const methodHint = periodKey === 'default' ? '不填则跟随全局 filter.method' : '不填则继承 default(再回退全局)';
+
+    html += `<div class="mt-3 pt-3 border-t border-gray-100">
+        <div class="text-[10px] uppercase tracking-wider font-bold text-gray-400 mb-2">筛选覆盖(可选)</div>
+        <div class="grid grid-cols-1 md:grid-cols-3 gap-2">
+            <div>
+                <label class="block text-[10px] text-gray-400 mb-1">filter_method</label>
+                <select class="text-[10px] w-full border border-gray-200 rounded px-1.5 py-1 bg-white"
+                        onchange="onTlOptionalSelect('${presetName}','${periodKey}','filter_method',this.value)">
+                    <option value="" ${filterMethod === '' ? 'selected' : ''}>继承</option>
+                    <option value="keyword" ${filterMethod === 'keyword' ? 'selected' : ''}>keyword</option>
+                    <option value="ai" ${filterMethod === 'ai' ? 'selected' : ''}>ai</option>
+                </select>
+            </div>
+            <div>
+                <label class="block text-[10px] text-gray-400 mb-1">frequency_file</label>
+                <input type="text" value="${frequencyFile}" placeholder="如 tech.txt"
+                       class="text-[10px] w-full border border-gray-200 rounded px-1.5 py-1 bg-white"
+                       onchange="onTlOptionalInput('${presetName}','${periodKey}','frequency_file',this.value)">
+            </div>
+            <div>
+                <label class="block text-[10px] text-gray-400 mb-1">interests_file</label>
+                <input type="text" value="${interestsFile}" placeholder="如 geopolitics.txt"
+                       class="text-[10px] w-full border border-gray-200 rounded px-1.5 py-1 bg-white"
+                       onchange="onTlOptionalInput('${presetName}','${periodKey}','interests_file',this.value)">
+            </div>
+        </div>
+        <div class="text-[10px] text-gray-400 mt-2">
+            <i class="fa-solid fa-lightbulb mr-1"></i>${methodHint}。<code>frequency_file</code> 从 <code>config/custom/keyword/</code> 查找,
+            <code>interests_file</code> 从 <code>config/custom/ai/</code> 查找;留空会删除该字段并恢复继承。
+        </div>
+    </div>`;
+
     return html;
 }
 
@@ -4190,6 +4305,27 @@ window.onTlSelect = function(presetName, periodKey, field, value) {
     updateTimelineField(presetName, periodKey, field, value);
 }
 
+window.onTlOptionalInput = function(presetName, periodKey, field, rawValue) {
+    const value = (rawValue || '').trim();
+    if (!value) {
+        removeTimelineField(presetName, periodKey, field);
+        return;
+    }
+    updateTimelineField(presetName, periodKey, field, value);
+}
+
+window.onTlOptionalSelect = function(presetName, periodKey, field, value) {
+    if (!value) {
+        removeTimelineField(presetName, periodKey, field);
+        return;
+    }
+    updateTimelineField(presetName, periodKey, field, value);
+}
+
+window.onTlCustomOverlapPolicy = function(value) {
+    updateTimelineSectionField('custom', 'overlap.policy', value);
+}
+
 /**
  * 周映射下拉变更 → 更新 timeline YAML 中的 week_map.N
  */
@@ -4366,6 +4502,168 @@ function updateTimelineField(presetName, periodKey, field, value) {
     window._tlRenderTimer = setTimeout(() => syncTimelineToUI(), 300);
 }
 
+function resolveTimelineSection(lines, presetName) {
+    const isCustom = presetName === 'custom';
+    let sectionStart = -1;
+    let sectionIndent = 0;
+
+    if (isCustom) {
+        for (let i = 0; i < lines.length; i++) {
+            if (/^custom:\s*/.test(lines[i])) {
+                sectionStart = i;
+                sectionIndent = 0;
+                break;
+            }
+        }
+    } else {
+        let inPresets = false;
+        for (let i = 0; i < lines.length; i++) {
+            const line = lines[i];
+            if (/^presets:\s*/.test(line)) {
+                inPresets = true;
+                continue;
+            }
+            if (inPresets && /^\S/.test(line) && !line.startsWith('#')) {
+                break;
+            }
+            if (inPresets) {
+                const m = line.match(/^(\s+)(\S+):\s*/);
+                if (m && m[2] === presetName) {
+                    sectionStart = i;
+                    sectionIndent = m[1].length;
+                    break;
+                }
+            }
+        }
+    }
+
+    if (sectionStart < 0) return null;
+
+    let sectionEnd = lines.length;
+    for (let i = sectionStart + 1; i < lines.length; i++) {
+        const line = lines[i];
+        if (line.trim() === '' || line.trim().startsWith('#')) continue;
+        const indent = line.search(/\S/);
+        if (indent <= sectionIndent) {
+            sectionEnd = i;
+            break;
+        }
+    }
+
+    return { sectionStart, sectionEnd, sectionIndent };
+}
+
+function resolveTimelineTarget(lines, presetName, periodKey) {
+    const section = resolveTimelineSection(lines, presetName);
+    if (!section) return null;
+
+    const { sectionStart, sectionEnd, sectionIndent } = section;
+    let targetStart = -1;
+
+    if (periodKey === 'default') {
+        targetStart = findChildKey(lines, sectionStart, sectionEnd, sectionIndent, 'default');
+    } else {
+        const periodsLine = findChildKey(lines, sectionStart, sectionEnd, sectionIndent, 'periods');
+        if (periodsLine < 0) return null;
+        const periodsIndent = lines[periodsLine].search(/\S/);
+        const periodsEnd = findBlockEnd(lines, periodsLine, periodsIndent, sectionEnd);
+        targetStart = findChildKey(lines, periodsLine, periodsEnd, periodsIndent, periodKey);
+    }
+
+    if (targetStart < 0) return null;
+
+    const targetIndent = lines[targetStart].search(/\S/);
+    const targetEnd = findBlockEnd(lines, targetStart, targetIndent, sectionEnd);
+
+    return { sectionStart, sectionEnd, sectionIndent, targetStart, targetEnd, targetIndent };
+}
+
+function applyTimelineEditorChanges(editor, lines) {
+    editor.value = lines.join('\n');
+    currentTimeline = editor.value;
+    updateBackdrop('timeline-editor', 'timeline-backdrop');
+    debounceSaveTimeline();
+    clearTimeout(window._tlRenderTimer);
+    window._tlRenderTimer = setTimeout(() => syncTimelineToUI(), 300);
+}
+
+function removeTimelineField(presetName, periodKey, field) {
+    const editor = document.getElementById('timeline-editor');
+    const lines = editor.value.split('\n');
+    const target = resolveTimelineTarget(lines, presetName, periodKey);
+    if (!target) return;
+
+    const { targetStart, targetEnd, targetIndent } = target;
+    const fieldParts = field.split('.');
+
+    if (fieldParts.length === 1) {
+        const lineIdx = findChildKey(lines, targetStart, targetEnd, targetIndent, fieldParts[0]);
+        if (lineIdx < 0) return;
+        const lineIndent = lines[lineIdx].search(/\S/);
+        const lineEnd = findBlockEnd(lines, lineIdx, lineIndent, targetEnd);
+        lines.splice(lineIdx, lineEnd - lineIdx);
+        applyTimelineEditorChanges(editor, lines);
+        return;
+    }
+
+    const parentLine = findChildKey(lines, targetStart, targetEnd, targetIndent, fieldParts[0]);
+    if (parentLine < 0) return;
+    const parentIndent = lines[parentLine].search(/\S/);
+    const parentEnd = findBlockEnd(lines, parentLine, parentIndent, targetEnd);
+    const childLine = findChildKey(lines, parentLine, parentEnd, parentIndent, fieldParts[1]);
+    if (childLine < 0) return;
+
+    const childIndent = lines[childLine].search(/\S/);
+    const childEnd = findBlockEnd(lines, childLine, childIndent, parentEnd);
+    lines.splice(childLine, childEnd - childLine);
+
+    const parentEndAfter = findBlockEnd(lines, parentLine, parentIndent, targetEnd);
+    let hasChild = false;
+    for (let i = parentLine + 1; i < parentEndAfter; i++) {
+        const line = lines[i];
+        if (line.trim() === '' || line.trim().startsWith('#')) continue;
+        if (line.search(/\S/) > parentIndent) {
+            hasChild = true;
+            break;
+        }
+    }
+    if (!hasChild) {
+        lines.splice(parentLine, 1);
+    }
+
+    applyTimelineEditorChanges(editor, lines);
+}
+
+function updateTimelineSectionField(presetName, field, value) {
+    const editor = document.getElementById('timeline-editor');
+    const lines = editor.value.split('\n');
+    const section = resolveTimelineSection(lines, presetName);
+    if (!section) return;
+
+    const { sectionStart, sectionEnd, sectionIndent } = section;
+    const fieldParts = field.split('.');
+    let lineIdx = -1;
+
+    if (fieldParts.length === 1) {
+        lineIdx = findChildKey(lines, sectionStart, sectionEnd, sectionIndent, fieldParts[0]);
+    } else {
+        const parentLine = findChildKey(lines, sectionStart, sectionEnd, sectionIndent, fieldParts[0]);
+        if (parentLine >= 0) {
+            const parentIndent = lines[parentLine].search(/\S/);
+            const parentEnd = findBlockEnd(lines, parentLine, parentIndent, sectionEnd);
+            lineIdx = findChildKey(lines, parentLine, parentEnd, parentIndent, fieldParts[1]);
+        }
+    }
+
+    if (lineIdx < 0) {
+        insertTimelineField(lines, sectionStart, sectionEnd, sectionIndent, field, value, fieldParts);
+    } else {
+        replaceLineValue(lines, lineIdx, value);
+    }
+
+    applyTimelineEditorChanges(editor, lines);
+}
+
 /**
  * 查找子级 key 行
  */
@@ -4569,11 +4867,11 @@ function scrollTimelineEditorToPreset(presetName) {
     editor.setSelectionRange(charCount, charCount + lines[targetLine].length);
     editor.scrollTop = scrollPosition - 50;
 
-    // 高亮闪烁
+    // Flash highlight (guards against races from rapid clicks)
+    clearTimeout(window._tlEditorFlashTimer);
     editor.style.transition = 'background-color 0.3s';
-    const originalBg = editor.style.backgroundColor;
     editor.style.backgroundColor = '#2d4a7c';
-    setTimeout(() => { editor.style.backgroundColor = originalBg; }, 300);
+    window._tlEditorFlashTimer = setTimeout(() => { editor.style.backgroundColor = ''; }, 300);
 }
 
 // ==========================================
@@ -5471,3 +5769,14 @@ function reorderDayPlanPeriods(presetName, planKey, orderedKeys) {
     clearTimeout(window._tlRenderTimer);
     window._tlRenderTimer = setTimeout(() => syncTimelineToUI(), 500);
 }
+
+// ==========================================
+// Support sidebar: collapse/expand
+// ==========================================
+function toggleSupportSidebar() {
+    const wrap = document.querySelector('.support-sidebar-wrap');
+    const btn = document.getElementById('sidebar-toggle-btn');
+    const isCollapsed = wrap.classList.toggle('collapsed');
+    btn.classList.toggle('is-collapsed', isCollapsed);
+    btn.title = isCollapsed ? '展开侧栏' : '收起侧栏';
+}
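
The removal path used by the optional override inputs above (an empty value deletes the field and restores inheritance) can be sketched roughly as follows. The bodies of `findChildKey` and `findBlockEnd` are not shown in this diff, so `blockEnd` and `removeChildKey` below are simplified, hypothetical stand-ins based on indentation only, and the sample timeline is invented.

```javascript
// Indentation of a line (index of its first non-whitespace character).
function indentOf(line) {
    return line.search(/\S/);
}

// Hypothetical stand-in for findBlockEnd: first later line (before `limit`)
// whose indent is not deeper than the block's own line; blanks and comments are skipped.
function blockEnd(lines, start, limit) {
    const base = indentOf(lines[start]);
    for (let i = start + 1; i < limit; i++) {
        const t = lines[i].trim();
        if (t === '' || t.startsWith('#')) continue;
        if (indentOf(lines[i]) <= base) return i;
    }
    return limit;
}

// Remove a direct child `key` (and its nested block) under the mapping that
// starts at `parentStart`. Returns a new array; the input is left untouched.
function removeChildKey(lines, parentStart, key) {
    const parentEnd = blockEnd(lines, parentStart, lines.length);
    let childIndent = -1;
    for (let i = parentStart + 1; i < parentEnd; i++) {
        const t = lines[i].trim();
        if (t === '' || t.startsWith('#')) continue;
        // The first real line under the parent defines the direct-child indent.
        if (childIndent < 0) childIndent = indentOf(lines[i]);
        if (indentOf(lines[i]) === childIndent && t.startsWith(key + ':')) {
            const out = lines.slice();
            out.splice(i, blockEnd(lines, i, parentEnd) - i);
            return out;
        }
    }
    return lines.slice();
}

// Invented sample timeline.
const timeline = [
    'custom:',
    '  default:',
    '    collect: true',
    '    filter_method: ai',
    '    interests_file: geopolitics.txt',
    '  periods:',
    '    morning:',
    '      filter_method: keyword',
];

const withoutOverride = removeChildKey(timeline, 1, 'filter_method');
console.log(withoutOverride.join('\n'));
```

Only the `filter_method` directly under `default` is dropped; the one nested under `periods.morning` sits at a different level and is left alone.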

+ 189 - 66
docs/assets/style.css

@@ -501,72 +501,6 @@ input[type="checkbox"]:disabled {
     border-radius: 1.5rem;
 }
 
-/* 柔软卡片设计 */
-.support-card {
-    position: relative;
-    display: flex;
-    flex-direction: column;
-    align-items: center;
-    padding: 1.5rem;
-    background: #fdfdfd;
-    border: 1px solid #f3f4f6;
-    border-radius: 1.25rem;
-    transition: all 0.4s cubic-bezier(0.175, 0.885, 0.32, 1.275);
-    text-decoration: none;
-    cursor: pointer;
-    overflow: hidden;
-}
-
-.support-card:hover {
-    transform: translateY(-8px) scale(1.02);
-    background: white;
-    box-shadow: 0 20px 25px -5px rgba(0, 0, 0, 0.05), 0 10px 10px -5px rgba(0, 0, 0, 0.02);
-    border-color: #e5e7eb;
-}
-
-.support-card-num {
-    position: absolute;
-    top: 1rem;
-    right: 1.25rem;
-    font-size: 0.75rem;
-    font-weight: 800;
-    color: #f3f4f6;
-    font-style: italic;
-    transition: color 0.3s;
-}
-
-.support-card:hover .support-card-num {
-    color: #e5e7eb;
-}
-
-.support-icon {
-    width: 3.5rem;
-    height: 3.5rem;
-    border-radius: 1rem;
-    display: flex;
-    align-items: center;
-    justify-content: center;
-    font-size: 1.5rem;
-    margin-bottom: 1rem;
-    transition: all 0.3s ease;
-}
-
-.support-card:hover .support-icon {
-    transform: rotate(12deg) scale(1.1);
-}
-
-.support-btn {
-    margin-top: auto;
-    width: 100%;
-    text-align: center;
-    padding: 0.5rem;
-    border-radius: 0.75rem;
-    font-size: 0.75rem;
-    font-weight: bold;
-    color: white;
-    transition: all 0.3s;
-}
-
 /* ==========================================
    Timeline 编辑器样式
    ========================================== */
@@ -1097,3 +1031,192 @@ input[type="checkbox"]:disabled {
     transform: rotate(2deg);
     box-shadow: 0 4px 12px rgba(0,0,0,0.15);
 }
+
+/* ==========================================
+   Support sidebar
+   ========================================== */
+/* Outer container: owns the width and the flex-layout role */
+.support-sidebar-wrap {
+    width: 20%;
+    min-width: 180px;
+    max-width: 280px;
+    overflow: visible;
+    transition: width 0.3s ease, min-width 0.3s ease, max-width 0.3s ease;
+}
+.support-sidebar-wrap.collapsed {
+    width: 0;
+    min-width: 0;
+    max-width: 0;
+}
+
+/* Inner sidebar: fills the wrap */
+.support-sidebar {
+    width: 100%;
+    height: 100%;
+    overflow: hidden;
+    transition: opacity 0.3s ease;
+}
+.support-sidebar-wrap.collapsed .support-sidebar {
+    opacity: 0;
+    pointer-events: none;
+}
+
+/* Collapse/expand button */
+.sidebar-toggle-btn {
+    position: absolute;
+    left: 0;
+    top: 50%;
+    transform: translate(-100%, -50%);
+    width: 20px;
+    height: 40px;
+    background: white;
+    border: 1px solid #e5e7eb;
+    border-radius: 6px 0 0 6px;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    cursor: pointer;
+    z-index: 10;
+    opacity: 0;
+    transition: opacity 0.2s ease, background 0.2s ease;
+    color: #9ca3af;
+}
+.support-sidebar-wrap:hover .sidebar-toggle-btn {
+    opacity: 1;
+}
+.sidebar-toggle-btn:hover {
+    background: #f3f4f6;
+    color: #6b7280;
+}
+/* When collapsed, the button stays visible and the arrow points left */
+.sidebar-toggle-btn.is-collapsed {
+    opacity: 1;
+}
+.sidebar-toggle-btn.is-collapsed i {
+    transform: rotate(180deg);
+}
+
+/* Sidebar scrollbar */
+.sidebar-scroll::-webkit-scrollbar {
+    width: 4px;
+}
+.sidebar-scroll::-webkit-scrollbar-track {
+    background: transparent;
+}
+.sidebar-scroll::-webkit-scrollbar-thumb {
+    background: #e5e7eb;
+    border-radius: 2px;
+}
+.sidebar-scroll::-webkit-scrollbar-thumb:hover {
+    background: #d1d5db;
+}
+
+/* Sidebar card */
+.sidebar-card {
+    background: white;
+    border: 1px solid #f3f4f6;
+    border-radius: 0.75rem;
+    padding: 0.75rem;
+    transition: all 0.3s cubic-bezier(0.175, 0.885, 0.32, 1.275);
+    text-decoration: none;
+    display: block;
+    cursor: pointer;
+}
+.sidebar-card:hover {
+    border-color: #e5e7eb;
+    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.06);
+    transform: translateY(-2px);
+}
+
+/* Sidebar card icon */
+.sidebar-card-icon {
+    width: 2rem;
+    height: 2rem;
+    border-radius: 0.5rem;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    font-size: 0.75rem;
+    flex-shrink: 0;
+    transition: all 0.2s ease;
+}
+.sidebar-card:hover .sidebar-card-icon {
+    transform: rotate(8deg) scale(1.1);
+}
+
+/* Sidebar CTA button */
+.sidebar-cta {
+    text-align: center;
+    padding: 0.375rem 0.5rem;
+    border-radius: 0.5rem;
+    font-size: 0.625rem;
+    font-weight: 700;
+    transition: all 0.2s ease;
+    letter-spacing: 0.02em;
+}
+
+/* Sidebar QR code */
+.sidebar-qr {
+    width: 100%;
+    max-width: 120px;
+    aspect-ratio: 1;
+    background: white;
+    border: 1px solid #f3f4f6;
+    border-radius: 0.625rem;
+    padding: 0.375rem;
+    transition: all 0.3s ease;
+}
+
+/* Link style reset */
+a.sidebar-card {
+    color: inherit;
+}
+a.sidebar-card:hover {
+    color: inherit;
+    text-decoration: none;
+}
+
+/* Quote shown in the sidebar header area */
+.sidebar-quote {
+    max-height: 0;
+    overflow: hidden;
+    opacity: 0;
+    margin-top: 0;
+    transition: max-height 0.4s ease, opacity 0.3s ease, margin-top 0.3s ease;
+}
+.sidebar-header-hover:hover .sidebar-quote {
+    max-height: 3rem;
+    opacity: 1;
+    margin-top: 0.375rem;
+}
+
+/* Clickable QR code card */
+.sidebar-card-clickable {
+    cursor: pointer;
+    position: relative;
+}
+.sidebar-card-clickable:hover {
+    border-color: #d1d5db;
+    box-shadow: 0 6px 16px rgba(0, 0, 0, 0.08);
+}
+
+/* "Click to enlarge" hint */
+.sidebar-enlarge-hint {
+    position: absolute;
+    bottom: 0;
+    left: 50%;
+    transform: translateX(-50%) translateY(4px);
+    background: rgba(0, 0, 0, 0.65);
+    color: white;
+    font-size: 0.5625rem;
+    padding: 0.125rem 0.5rem;
+    border-radius: 0.25rem;
+    white-space: nowrap;
+    opacity: 0;
+    transition: all 0.2s ease;
+    pointer-events: none;
+}
+.sidebar-card-clickable:hover .sidebar-enlarge-hint {
+    opacity: 1;
+    transform: translateX(-50%) translateY(-4px);
+}

+ 125 - 69
docs/index.html

@@ -38,9 +38,6 @@
                 <button onclick="copyResult()" class="bg-blue-600 hover:bg-blue-700 text-white px-4 py-1.5 rounded text-sm font-medium transition-colors shadow-sm">
                     <i class="fa-regular fa-copy mr-1.5"></i>复制配置
                 </button>
-                <button onclick="openSupportModal()" class="bg-gradient-to-r from-orange-400 to-pink-500 hover:from-orange-500 hover:to-pink-600 text-white px-4 py-1.5 rounded text-sm font-medium transition-all shadow-md hover:shadow-lg flex items-center gap-1.5">
-                    <i class="fa-solid fa-heart-pulse"></i>支持一下
-                </button>
             </div>
         </div>
     </nav>
@@ -99,8 +96,10 @@
             </div>
         </div>
 
-        <!-- 右侧:可视化配置 (Visual) -->
-        <div class="w-1/2 flex flex-col bg-gray-50">
+        <!-- Right side: visual configuration + support sidebar -->
+        <div class="w-1/2 flex">
+        <!-- Visual configuration (Visual) -->
+        <div class="flex-1 flex flex-col bg-gray-50 min-w-0">
             <div class="flex items-center justify-between px-6 py-3 bg-white border-b border-gray-200">
                 <div class="flex items-center gap-3">
                     <span class="text-sm font-bold text-gray-700"><i class="fa-solid fa-list-check mr-2"></i><span id="right-panel-title">配置模块</span></span>
@@ -130,6 +129,112 @@
             <div id="timeline-panel" class="tab-content hidden flex-grow overflow-y-auto p-6 space-y-6">
             </div>
         </div>
+
+        <!-- Support sidebar (fixed, does not scroll with the content) -->
+        <div class="support-sidebar-wrap flex-shrink-0 relative">
+            <!-- Collapse/expand button (left edge of the sidebar) -->
+            <button id="sidebar-toggle-btn" class="sidebar-toggle-btn" onclick="toggleSupportSidebar()" title="收起侧栏">
+                <i class="fa-solid fa-chevron-right text-[10px]"></i>
+            </button>
+            <div id="support-sidebar" class="support-sidebar border-l border-gray-200 bg-gradient-to-b from-orange-50/30 via-white to-pink-50/20 flex flex-col">
+            <!-- Sidebar header -->
+            <div class="px-3 py-3 border-b border-gray-100 bg-white/80 sidebar-header-hover group/header">
+                <div class="flex items-center gap-2">
+                    <div class="w-6 h-6 bg-gradient-to-br from-orange-400 to-pink-500 rounded-lg flex items-center justify-center">
+                        <i class="fa-solid fa-heart text-white text-[10px]"></i>
+                    </div>
+                    <span class="text-sm font-bold text-gray-700 tracking-tight">支持项目</span>
+                </div>
+                <p class="sidebar-quote text-[10px] text-gray-400 mt-1.5 leading-relaxed italic">若 TrendRadar 曾为你捕捉价值,不妨为它注入动力,助其持续进化</p>
+            </div>
+
+            <!-- Card list -->
+            <div class="flex-1 p-3 space-y-3 overflow-y-auto sidebar-scroll">
+
+                <!-- 01: 点亮 Star -->
+                <a href="https://github.com/sansan0/TrendRadar" target="_blank" class="sidebar-card group block">
+                    <div class="flex items-center gap-2 mb-2.5">
+                        <div class="sidebar-card-icon bg-orange-100 text-orange-500 group-hover:bg-orange-200">
+                            <i class="fa-solid fa-star"></i>
+                        </div>
+                        <div class="min-w-0">
+                            <div class="text-xs font-bold text-gray-800 leading-tight">点亮 Star</div>
+                            <div class="text-[10px] text-gray-400 leading-tight mt-0.5">让更多人发现它</div>
+                        </div>
+                    </div>
+                    <div class="sidebar-cta bg-gradient-to-r from-orange-400 to-red-500 text-white group-hover:from-orange-500 group-hover:to-red-600 shadow-sm group-hover:shadow-md">
+                        <i class="fa-brands fa-github mr-1"></i>前往 GitHub
+                    </div>
+                </a>
+
+                <!-- 02: 不迷路 (微信) -->
+                <div class="sidebar-card sidebar-card-clickable group" onclick="openQrModal('weixin')">
+                    <div class="flex items-center gap-2 mb-2.5">
+                        <div class="sidebar-card-icon bg-green-100 text-green-600 group-hover:bg-green-200">
+                            <i class="fa-brands fa-weixin"></i>
+                        </div>
+                        <div class="min-w-0">
+                            <div class="text-xs font-bold text-gray-800 leading-tight">不迷路</div>
+                            <div class="text-[10px] text-gray-400 leading-tight mt-0.5">获取更新通知</div>
+                        </div>
+                    </div>
+                    <div class="flex justify-center relative">
+                        <div class="sidebar-qr group-hover:shadow-md">
+                            <img src="./assets/weixin.webp" alt="微信公众号" class="w-full h-full object-contain">
+                        </div>
+                        <div class="sidebar-enlarge-hint">
+                            <i class="fa-solid fa-expand mr-1"></i>点击放大
+                        </div>
+                    </div>
+                    <p class="text-[10px] text-gray-400 text-center mt-2">扫码关注公众号</p>
+                </div>
+
+                <!-- 03: 随心赞赏 -->
+                <div class="sidebar-card sidebar-card-clickable group" onclick="openQrModal('donate')">
+                    <div class="flex items-center gap-2 mb-2.5">
+                        <div class="sidebar-card-icon bg-emerald-100 text-emerald-600 group-hover:bg-emerald-200">
+                            <i class="fa-solid fa-hand-holding-heart"></i>
+                        </div>
+                        <div class="min-w-0">
+                            <div class="text-xs font-bold text-gray-800 leading-tight">随心赞赏</div>
+                            <div class="text-[10px] text-gray-400 leading-tight mt-0.5">1 元也是鼓励</div>
+                        </div>
+                    </div>
+                    <div class="flex justify-center relative">
+                        <div class="sidebar-qr group-hover:shadow-md">
+                            <img src="https://cdn-1258574687.cos.ap-shanghai.myqcloud.com/img/%2F2026%2F01%2F18ecce7c224ce0ea4c59394c29e408f8-e0d1db45.webp" alt="微信支付" class="w-full h-full object-contain">
+                        </div>
+                        <div class="sidebar-enlarge-hint">
+                            <i class="fa-solid fa-expand mr-1"></i>点击放大
+                        </div>
+                    </div>
+                    <p class="text-[10px] text-gray-400 text-center mt-2">微信扫码 · 丰俭由人</p>
+                </div>
+
+                <!-- 04: 探索更多 -->
+                <a href="https://sansan0.github.io/mao-map/" target="_blank" rel="noopener" class="sidebar-card group block">
+                    <div class="flex items-center gap-2 mb-2.5">
+                        <div class="sidebar-card-icon bg-red-100 text-red-500 group-hover:bg-red-200">
+                            <i class="fa-solid fa-map-location-dot"></i>
+                        </div>
+                        <div class="min-w-0">
+                            <div class="text-xs font-bold text-gray-800 leading-tight">探索更多</div>
+                            <div class="text-[10px] text-gray-400 leading-tight mt-0.5">另一个用心的作品</div>
+                        </div>
+                    </div>
+                    <div class="sidebar-cta bg-red-50 text-red-600 border border-red-100 group-hover:bg-red-100 group-hover:text-red-700">
+                        <i class="fa-solid fa-arrow-up-right-from-square mr-1"></i>去看看
+                    </div>
+                </a>
+            </div>
+
+            <!-- 底部寄语 -->
+            <div class="px-3 py-2.5 border-t border-gray-100 bg-white/60">
+                <p class="text-[10px] text-gray-300 text-center italic font-serif tracking-wide">"开源不易,感谢支持"</p>
+            </div>
+            </div>
+        </div>
+        </div>
     </main>
 
     <!-- RSS 添加弹窗 -->
@@ -354,76 +459,27 @@
         </div>
     </div>
 
-    <!-- 支持项目弹窗 -->
-    <div id="support-modal" class="modal-overlay hidden">
-        <div class="modal-content support-modal-content max-w-5xl w-[95%] max-h-[90vh] overflow-y-auto p-8">
-            <div class="flex items-center justify-between mb-8">
-                <div class="flex items-center gap-4">
-                    <div class="w-12 h-12 bg-orange-50 rounded-full flex items-center justify-center text-orange-500 shadow-sm relative overflow-hidden">
-                        <div class="absolute inset-0 bg-orange-400 opacity-20 animate-ping"></div>
-                        <i class="fa-solid fa-heart text-2xl animate-pulse relative z-10"></i>
-                    </div>
-                    <div>
-                        <h3 class="text-2xl font-bold text-gray-800 tracking-tight">觉得好用?支持一下 ✨</h3>
-                        <p class="text-sm text-gray-500 mt-1">若 TrendRadar 曾为你捕捉价值,不妨为它注入动力,助其持续进化</p>
+    <!-- 二维码放大弹窗 -->
+    <div id="qr-modal" class="modal-overlay hidden" onclick="if(event.target===this){closeQrModal()}">
+        <div class="modal-content support-modal-content max-w-sm w-[90%] p-6 text-center">
+            <div class="flex items-center justify-between mb-5">
+                <div class="flex items-center gap-3">
+                    <div id="qr-modal-icon" class="w-10 h-10 rounded-xl flex items-center justify-center text-lg"></div>
+                    <div class="text-left">
+                        <h3 id="qr-modal-title" class="text-lg font-bold text-gray-800"></h3>
+                        <p id="qr-modal-subtitle" class="text-xs text-gray-500 mt-0.5"></p>
                     </div>
                 </div>
-                <button onclick="closeSupportModal()" class="w-10 h-10 flex items-center justify-center rounded-full hover:bg-gray-100 text-gray-400 transition-colors">
-                    <i class="fa-solid fa-times text-xl"></i>
+                <button onclick="closeQrModal()" class="w-8 h-8 flex items-center justify-center rounded-full hover:bg-gray-100 text-gray-400 transition-colors">
+                    <i class="fa-solid fa-times"></i>
                 </button>
             </div>
-
-            <div class="grid grid-cols-1 md:grid-cols-4 gap-6">
-                <a href="https://github.com/sansan0/TrendRadar" target="_blank" class="support-card group border-orange-200 bg-orange-50/30">
-                    <div class="support-card-num opacity-50">01</div>
-                    <div class="support-icon text-orange-500 bg-orange-100 group-hover:bg-orange-200 mb-4 group-hover:scale-110 transition-transform">
-                        <i class="fa-solid fa-star text-2xl"></i>
-                    </div>
-                    <h4 class="text-lg font-bold text-gray-800 mb-2">点亮 Star</h4>
-                    <p class="text-sm text-gray-500 mb-6 text-center leading-relaxed">只需 1 秒,让更多人发现它</p>
-                    <span class="support-btn bg-gradient-to-r from-orange-400 to-red-500 shadow-lg shadow-orange-200 group-hover:shadow-xl group-hover:from-orange-500 group-hover:to-red-600">立即前往 GitHub</span>
-                </a>
-
-                <div class="support-card group">
-                    <div class="support-card-num">02</div>
-                    <div class="support-icon text-green-600 bg-green-50 group-hover:bg-green-100 mb-4">
-                        <i class="fa-brands fa-weixin text-2xl"></i>
-                    </div>
-                    <h4 class="text-lg font-bold text-gray-800 mb-2">不迷路</h4>
-                    <p class="text-sm text-gray-500 mb-4 text-center">第一时间获取更新通知</p>
-                    <div class="w-36 h-36 bg-white border border-gray-100 rounded-xl p-2 shadow-sm group-hover:shadow-md transition-shadow">
-                        <img src="./assets/weixin.webp" alt="微信公众号" class="w-full h-full object-contain">
-                    </div>
-                    <p class="text-xs text-gray-400 mt-3">扫码加入社区</p>
-                </div>
-
-                <div class="support-card group">
-                    <div class="support-card-num">03</div>
-                    <div class="support-icon text-emerald-600 bg-emerald-50 group-hover:bg-emerald-100 mb-4">
-                        <i class="fa-solid fa-hand-holding-heart text-2xl"></i>
-                    </div>
-                    <h4 class="text-lg font-bold text-gray-800 mb-2">随心赞赏</h4>
-                    <p class="text-sm text-gray-500 mb-4 text-center">金额随意,1 元也是鼓励 (´▽`ʃ♡ƪ)</p>
-                    <div class="w-36 h-36 bg-white border border-gray-100 rounded-xl p-2 shadow-sm group-hover:shadow-md transition-shadow">
-                        <img src="https://cdn-1258574687.cos.ap-shanghai.myqcloud.com/img/%2F2026%2F01%2F18ecce7c224ce0ea4c59394c29e408f8-e0d1db45.webp" alt="微信支付" class="w-full h-full object-contain">
-                    </div>
-                    <p class="text-xs text-gray-400 mt-3">微信扫码 • 丰俭由人</p>
+            <div class="flex justify-center">
+                <div class="w-56 h-56 bg-white border border-gray-100 rounded-2xl p-3 shadow-sm">
+                    <img id="qr-modal-img" src="" alt="" class="w-full h-full object-contain">
                 </div>
-
-                <a href="https://sansan0.github.io/mao-map/" target="_blank" class="support-card group">
-                    <div class="support-card-num">04</div>
-                    <div class="support-icon text-red-500 bg-red-50 group-hover:bg-red-100 mb-4">
-                        <i class="fa-solid fa-map-location-dot text-2xl"></i>
-                    </div>
-                    <h4 class="text-lg font-bold text-gray-800 mb-2">探索更多</h4>
-                    <p class="text-sm text-gray-500 mb-6 text-center leading-relaxed">另一个用心的作品</p>
-                    <span class="support-btn bg-red-50 text-red-600 group-hover:bg-red-100 group-hover:text-red-700 border border-red-100">去看看</span>
-                </a>
-            </div>
-
-            <div class="mt-8 pt-6 border-t border-gray-100 text-center">
-                <p class="text-sm text-gray-400 font-serif italic tracking-wide">"开源不易,感谢每一份支持"</p>
             </div>
+            <p id="qr-modal-hint" class="text-xs text-gray-400 mt-4"></p>
         </div>
     </div>
 

+ 2 - 1
pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "trendradar"
-version = "6.0.0"
+version = "6.5.0"
 description = "TrendRadar - 热点新闻聚合与分析工具"
 requires-python = ">=3.10"
 dependencies = [
@@ -12,6 +12,7 @@ dependencies = [
     "feedparser>=6.0.0,<7.0.0",
     "boto3>=1.35.0,<2.0.0",
     "litellm>=1.57.0,<2.0.0",
+    "json-repair>=0.58.3,<1.0.0",
     "tenacity==8.5.0"
 ]
 

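Review note: the hunk above only adds `json-repair` as a dependency; this part of the diff does not show where it is called. A minimal sketch of one plausible use, assuming it is meant to make parsing of model output tolerant of malformed JSON. The helper name `parse_ai_json` is hypothetical, and the sketch falls back to the standard library when the package is not installed:

```python
import json
from typing import Any

try:
    import json_repair  # optional third-party package added in pyproject.toml
except ImportError:  # keep the sketch usable without the dependency
    json_repair = None


def parse_ai_json(text: str) -> Any:
    """Parse JSON strictly first; only attempt a repair if strict parsing fails.

    Hypothetical helper for illustration, not taken from the commit.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if json_repair is None:
            raise
        # json_repair.loads tries to fix common defects before parsing.
        return json_repair.loads(text)
```

Trying the strict parser first keeps well-formed responses on the standard path and limits the repair step to inputs that actually need it.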
+ 1 - 1
trendradar/__init__.py

@@ -9,5 +9,5 @@ TrendRadar - 热点新闻聚合与分析工具
 
 from trendradar.context import AppContext
 
-__version__ = "6.0.0"
+__version__ = "6.5.0"
 __all__ = ["AppContext", "__version__"]

+ 539 - 52
trendradar/__main__.py

@@ -7,9 +7,13 @@ TrendRadar 主程序
 """
 
 import argparse
+import copy
+import json
 import os
 import re
+import sys
 import webbrowser
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import Dict, List, Tuple, Optional
 
@@ -17,7 +21,7 @@ import requests
 
 from trendradar.context import AppContext
 from trendradar import __version__
-from trendradar.core import load_config
+from trendradar.core import load_config, parse_multi_account_config, validate_paired_configs
 from trendradar.core.analyzer import convert_keyword_stats_to_platform_stats
 from trendradar.crawler import DataFetcher
 from trendradar.storage import convert_crawl_results_to_news_data
@@ -136,7 +140,9 @@ def check_all_versions(
 
     config_files = [
         Path("config/config.yaml"),
+        Path("config/timeline.yaml"),
         Path("config/frequency_words.txt"),
+        Path("config/ai_interests.txt"),
         Path("config/ai_analysis_prompt.txt"),
         Path("config/ai_translation_prompt.txt"),
     ]
@@ -222,6 +228,9 @@ class NewsAnalyzer:
 
         self.request_interval = self.ctx.config["REQUEST_INTERVAL"]
         self.report_mode = self.ctx.config["REPORT_MODE"]
+        self.frequency_file = None
+        self.filter_method = None  # None = use the global setting ctx.filter_method
+        self.interests_file = None  # None = use the global setting ai_filter.interests_file
         self.rank_threshold = self.ctx.rank_threshold
         self.is_github_actions = os.environ.get("GITHUB_ACTIONS") == "true"
         self.is_docker_container = self._detect_docker_environment()
@@ -357,7 +366,7 @@ class NewsAnalyzer:
             Tuple[stats, id_to_name]: 统计数据和平台映射
         """
         try:
-            word_groups, filter_words, global_filters = self.ctx.load_frequency_words()
+            word_groups, filter_words, global_filters = self.ctx.load_frequency_words(self.frequency_file)
 
             if ai_mode == "incremental":
                 # incremental 模式:使用当前抓取的数据
@@ -597,7 +606,7 @@ class NewsAnalyzer:
                 print(f"读取到 {total_titles} 个标题(已按当前监控平台过滤)")
 
             new_titles = self.ctx.detect_new_titles(current_platform_ids, quiet=quiet)
-            word_groups, filter_words, global_filters = self.ctx.load_frequency_words()
+            word_groups, filter_words, global_filters = self.ctx.load_frequency_words(self.frequency_file)
 
             return (
                 all_results,
@@ -798,21 +807,44 @@ class NewsAnalyzer:
         rss_new_items: Optional[List[Dict]] = None,
         standalone_data: Optional[Dict] = None,
         schedule: ResolvedSchedule = None,
-    ) -> Tuple[List[Dict], Optional[str], Optional[AIAnalysisResult]]:
-        """统一的分析流水线:数据处理 → 统计计算 → AI分析 → HTML生成"""
-
-        # 统计计算(使用 AppContext)
-        stats, total_titles = self.ctx.count_frequency(
-            data_source,
-            word_groups,
-            filter_words,
-            id_to_name,
-            title_info,
-            new_titles,
-            mode=mode,
-            global_filters=global_filters,
-            quiet=quiet,
-        )
+        rss_new_urls: Optional[set] = None,
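+    ) -> Tuple[List[Dict], Optional[str], Optional[AIAnalysisResult], Optional[List[Dict]]]:
+        """Unified analysis pipeline: data processing → statistics (keyword or AI filtering) → AI analysis → HTML generation."""
+
+        # Choose the data-processing path according to the filter strategy
+        if self.filter_method == "ai":
+            # === AI filtering strategy ===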
+            print("[筛选] 使用 AI 智能筛选策略")
+            ai_filter_result = self.ctx.run_ai_filter(interests_file=self.interests_file)
+
+            if ai_filter_result and ai_filter_result.success:
+                print(f"[筛选] AI 筛选完成: {ai_filter_result.total_matched} 条匹配, {len(ai_filter_result.tags)} 个标签")
+                # Convert to the same data structure that keyword matching produces
+                stats, ai_rss_stats = self.ctx.convert_ai_filter_to_report_data(
+                    ai_filter_result, mode=mode,
+                    new_titles=new_titles, rss_new_urls=rss_new_urls,
+                )
+                total_titles = sum(len(titles) for titles in data_source.values())
+
+                # RSS results from AI filtering replace the keyword-matched RSS results
+                if ai_rss_stats:
+                    rss_items = ai_rss_stats
+            else:
+                # AI filtering failed; fall back to keyword matching
+                error_msg = ai_filter_result.error if ai_filter_result else "未知错误"
+                print(f"[筛选] AI 筛选失败: {error_msg},回退到关键词匹配")
+                stats, total_titles = self.ctx.count_frequency(
+                    data_source, word_groups, filter_words,
+                    id_to_name, title_info, new_titles,
+                    mode=mode, global_filters=global_filters, quiet=quiet,
+                )
+        else:
+            # === Keyword matching strategy (default) ===
+            stats, total_titles = self.ctx.count_frequency(
+                data_source, word_groups, filter_words,
+                id_to_name, title_info, new_titles,
+                mode=mode, global_filters=global_filters, quiet=quiet,
+            )
 
         # 如果是 platform 模式,转换数据结构
         if self.ctx.display_mode == "platform" and stats:
@@ -850,9 +882,10 @@ class NewsAnalyzer:
                 rss_new_items=rss_new_items,
                 ai_analysis=ai_result,
                 standalone_data=standalone_data,
+                frequency_file=self.frequency_file,
             )
 
-        return stats, html_file, ai_result
+        return stats, html_file, ai_result, rss_items
 
     def _send_notification_if_needed(
         self,
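Review note: the pipeline change above selects a filter strategy and falls back to keyword matching when AI filtering fails. A minimal sketch of that control flow, with the real `ctx.run_ai_filter` and `ctx.count_frequency` calls replaced by injected callables. `FilterResult` and `choose_stats` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class FilterResult:
    """Hypothetical stand-in for the AI filter result object used in the diff."""
    success: bool
    stats: List[dict] = field(default_factory=list)
    error: str = ""


def choose_stats(
    filter_method: Optional[str],
    run_ai: Callable[[], Optional[FilterResult]],
    run_keyword: Callable[[], List[dict]],
) -> Tuple[List[dict], str]:
    """Return (stats, name of the strategy that actually produced them)."""
    if filter_method == "ai":
        result = run_ai()
        if result is not None and result.success:
            return result.stats, "ai"
        # AI filtering failed or returned nothing: fall through to keywords.
    return run_keyword(), "keyword"
```

Returning the strategy name alongside the stats makes it visible to the caller when a fallback happened, which the diff currently reports only through a printed message.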
@@ -921,7 +954,7 @@ class NewsAnalyzer:
                     )
 
             # 准备报告数据
-            report_data = self.ctx.prepare_report(stats, failed_ids, new_titles, id_to_name, mode)
+            report_data = self.ctx.prepare_report(stats, failed_ids, new_titles, id_to_name, mode, frequency_file=self.frequency_file)
 
             # 是否发送版本更新信息
             update_info_to_send = self.update_info if cfg["SHOW_VERSION_UPDATE"] else None
@@ -1034,24 +1067,25 @@ class NewsAnalyzer:
 
         return results, id_to_name, failed_ids
 
-    def _crawl_rss_data(self) -> Tuple[Optional[List[Dict]], Optional[List[Dict]], Optional[List[Dict]]]:
+    def _crawl_rss_data(self) -> Tuple[Optional[List[Dict]], Optional[List[Dict]], Optional[List[Dict]], set]:
         """
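         Fetch RSS data.
 
         Returns:
-            (rss_items, rss_new_items, raw_rss_items) tuple:
+            (rss_items, rss_new_items, raw_rss_items, rss_new_urls) tuple:
             - rss_items: list of statistics entries (processed per report mode, used for the statistics block)
             - rss_new_items: list of new entries (used for the "new" block)
             - raw_rss_items: list of raw RSS entries (used for the standalone display area)
-            Returns (None, None, None) if RSS is disabled or fetching fails
+            - rss_new_urls: set of URLs of the raw new RSS entries (used for is_new detection in AI mode)
+            Returns (None, None, None, set()) if RSS is disabled or fetching fails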
         """
         if not self.ctx.rss_enabled:
-            return None, None, None
+            return None, None, None, set()
 
         rss_feeds = self.ctx.rss_feeds
         if not rss_feeds:
             print("[RSS] 未配置任何 RSS 源")
-            return None, None, None
+            return None, None, None, set()
 
         try:
             from trendradar.crawler.rss import RSSFetcher, RSSFeedConfig
@@ -1087,7 +1121,7 @@ class NewsAnalyzer:
 
             if not feeds:
                 print("[RSS] 没有启用的 RSS 源")
-                return None, None, None
+                return None, None, None, set()
 
             # 创建抓取器
             rss_config = self.ctx.rss_config
@@ -1122,17 +1156,17 @@ class NewsAnalyzer:
                 return self._process_rss_data_by_mode(rss_data)
             else:
                 print(f"[RSS] 数据保存失败")
-                return None, None, None
+                return None, None, None, set()
 
         except ImportError as e:
             print(f"[RSS] 缺少依赖: {e}")
             print("[RSS] 请安装 feedparser: pip install feedparser")
-            return None, None, None
+            return None, None, None, set()
         except Exception as e:
             print(f"[RSS] 抓取失败: {e}")
-            return None, None, None
+            return None, None, None, set()
 
-    def _process_rss_data_by_mode(self, rss_data) -> Tuple[Optional[List[Dict]], Optional[List[Dict]], Optional[List[Dict]]]:
+    def _process_rss_data_by_mode(self, rss_data) -> Tuple[Optional[List[Dict]], Optional[List[Dict]], Optional[List[Dict]], set]:
         """
         按报告模式处理 RSS 数据,返回与热榜相同格式的统计结构
 
@@ -1145,10 +1179,11 @@ class NewsAnalyzer:
             rss_data: 当前抓取的 RSSData 对象
 
         Returns:
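-            (rss_stats, rss_new_stats, raw_rss_items) tuple:
+            (rss_stats, rss_new_stats, raw_rss_items, rss_new_urls) tuple:
             - rss_stats: list of RSS keyword statistics (same format as the hot-list stats)
             - rss_new_stats: list of keyword statistics for new RSS entries (same format as the hot-list stats)
             - raw_rss_items: list of raw RSS entries (used for the standalone display area)
+            - rss_new_urls: set of URLs of the raw new RSS entries (before keyword filtering, used for is_new detection in AI mode)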
         """
         from trendradar.core.analyzer import count_rss_frequency
 
@@ -1157,7 +1192,7 @@ class NewsAnalyzer:
 
         # 加载关键词配置
         try:
-            word_groups, filter_words, global_filters = self.ctx.load_frequency_words()
+            word_groups, filter_words, global_filters = self.ctx.load_frequency_words(self.frequency_file)
         except FileNotFoundError:
             word_groups, filter_words, global_filters = [], [], []
 
@@ -1168,6 +1203,7 @@ class NewsAnalyzer:
         rss_stats = None
         rss_new_stats = None
         raw_rss_items = None  # 原始 RSS 条目列表(用于独立展示区)
+        rss_new_urls = set()  # URLs of the raw new RSS entries (before keyword filtering)
 
         # 1. 首先获取原始条目(用于独立展示区,不受 display.regions.rss 影响)
         # 根据模式获取原始条目
@@ -1186,7 +1222,7 @@ class NewsAnalyzer:
 
         # 如果 RSS 展示未启用,跳过关键词分析,只返回原始条目用于独立展示区
         if not rss_display_enabled:
-            return None, None, raw_rss_items
+            return None, None, raw_rss_items, rss_new_urls
 
         # 2. 获取新增条目(用于统计)
         new_items_dict = self.storage_manager.detect_new_rss_items(rss_data)
@@ -1195,13 +1231,15 @@ class NewsAnalyzer:
             new_items_list = self._convert_rss_items_to_list(new_items_dict, rss_data.id_to_name)
             if new_items_list:
                 print(f"[RSS] 检测到 {len(new_items_list)} 条新增")
+                # Collect the raw new URLs (before keyword filtering, used for is_new detection in AI mode)
+                rss_new_urls = {item["url"] for item in new_items_list if item.get("url")}
 
         # 3. 根据模式获取统计条目
         if self.report_mode == "incremental":
             # 增量模式:统计条目就是新增条目
             if not new_items_list:
                 print("[RSS] 增量模式:没有新增 RSS 条目")
-                return None, None, raw_rss_items
+                return None, None, raw_rss_items, rss_new_urls
 
             rss_stats, total = count_rss_frequency(
                 rss_items=new_items_list,
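Review note: the hunk above collects `rss_new_urls` before any keyword filtering so that AI mode can still tell which entries are new. A minimal sketch of how such a set could be used to flag entries, assuming each entry is a dict with an optional `url` key. `collect_new_urls` and `mark_new` are hypothetical helpers, not functions from the commit:

```python
from typing import Dict, Iterable, List, Set


def collect_new_urls(new_items: Iterable[Dict]) -> Set[str]:
    """Gather the non-empty URLs of newly detected entries."""
    return {item["url"] for item in new_items if item.get("url")}


def mark_new(items: List[Dict], new_urls: Set[str]) -> List[Dict]:
    """Return copies of the entries with an is_new flag derived from the URL set."""
    return [{**item, "is_new": item.get("url") in new_urls} for item in items]
```

Entries without a URL can never match the set, so they are reported as not new rather than raising an error.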
@@ -1218,14 +1256,14 @@ class NewsAnalyzer:
             if not rss_stats:
                 print("[RSS] 增量模式:关键词匹配后没有内容")
                 # 即使关键词匹配为空,也返回原始条目用于独立展示区
-                return None, None, raw_rss_items
+                return None, None, raw_rss_items, rss_new_urls
 
         elif self.report_mode == "current":
             # 当前榜单模式:统计=当前榜单所有条目
             # raw_rss_items 已在前面获取
             if not raw_rss_items:
                 print("[RSS] 当前榜单模式:没有 RSS 数据")
-                return None, None, None
+                return None, None, None, rss_new_urls
 
             rss_stats, total = count_rss_frequency(
                 rss_items=raw_rss_items,
@@ -1242,7 +1280,7 @@ class NewsAnalyzer:
             if not rss_stats:
                 print("[RSS] 当前榜单模式:关键词匹配后没有内容")
                 # 即使关键词匹配为空,也返回原始条目用于独立展示区
-                return None, None, raw_rss_items
+                return None, None, raw_rss_items, rss_new_urls
 
             # 生成新增统计
             if new_items_list:
@@ -1264,7 +1302,7 @@ class NewsAnalyzer:
             # raw_rss_items 已在前面获取
             if not raw_rss_items:
                 print("[RSS] 当日汇总模式:没有 RSS 数据")
-                return None, None, None
+                return None, None, None, rss_new_urls
 
             rss_stats, total = count_rss_frequency(
                 rss_items=raw_rss_items,
@@ -1281,7 +1319,7 @@ class NewsAnalyzer:
             if not rss_stats:
                 print("[RSS] 当日汇总模式:关键词匹配后没有内容")
                 # 即使关键词匹配为空,也返回原始条目用于独立展示区
-                return None, None, raw_rss_items
+                return None, None, raw_rss_items, rss_new_urls
 
             # 生成新增统计
             if new_items_list:
@@ -1298,7 +1336,7 @@ class NewsAnalyzer:
                     quiet=True,
                 )
 
-        return rss_stats, rss_new_stats, raw_rss_items
+        return rss_stats, rss_new_stats, raw_rss_items, rss_new_urls
 
     def _convert_rss_items_to_list(self, items_dict: Dict, id_to_name: Dict) -> List[Dict]:
         """将 RSS 条目字典转换为列表格式,并应用新鲜度过滤(用于推送)"""
@@ -1373,9 +1411,9 @@ class NewsAnalyzer:
         return rss_items
 
     def _filter_rss_by_keywords(self, rss_items: List[Dict]) -> List[Dict]:
-        """Filter RSS entries using frequency_words.txt"""
+        """Filter RSS entries using the keyword file"""
         try:
-            word_groups, filter_words, global_filters = self.ctx.load_frequency_words()
+            word_groups, filter_words, global_filters = self.ctx.load_frequency_words(self.frequency_file)
             if word_groups or filter_words or global_filters:
                 from trendradar.core.frequency import matches_word_groups
                 filtered_items = []
@@ -1392,7 +1430,7 @@ class NewsAnalyzer:
                     print("[RSS] 关键词过滤后没有匹配内容")
                     return []
         except FileNotFoundError:
-            # Skip filtering when frequency_words.txt does not exist
+            # Skip filtering when the keyword file does not exist
             pass
         return rss_items
 
@@ -1431,6 +1469,7 @@ class NewsAnalyzer:
         rss_items: Optional[List[Dict]] = None,
         rss_new_items: Optional[List[Dict]] = None,
         raw_rss_items: Optional[List[Dict]] = None,
+        rss_new_urls: Optional[set] = None,
     ) -> Optional[str]:
         """执行模式特定逻辑,支持热榜+RSS合并推送
 
@@ -1448,6 +1487,18 @@ class NewsAnalyzer:
             print(f"[调度] 报告模式覆盖: {self.report_mode} -> {effective_mode}")
         self.report_mode = effective_mode
 
+        # Re-fetch mode_strategy so that report_type matches the overridden report_mode
+        mode_strategy = self._get_mode_strategy()
+
+        # Override the default frequency_file with the one chosen by the schedule
+        self.frequency_file = schedule.frequency_file
+
+        # Override the default filter strategy with the one chosen by the schedule
+        self.filter_method = schedule.filter_method or self.ctx.filter_method
+
+        # Override the default AI-filter interests file with the one chosen by the schedule
+        self.interests_file = schedule.interests_file
+
         # 如果调度器说不采集,则直接跳过
         if not schedule.collect:
             print("[调度] 当前时间段不执行数据采集,跳过分析流水线")
@@ -1457,7 +1508,7 @@ class NewsAnalyzer:
 
         new_titles = self.ctx.detect_new_titles(current_platform_ids)
         time_info = self.ctx.format_time()
-        word_groups, filter_words, global_filters = self.ctx.load_frequency_words()
+        word_groups, filter_words, global_filters = self.ctx.load_frequency_words(self.frequency_file)
 
         html_file = None
         stats = []
@@ -1487,7 +1538,7 @@ class NewsAnalyzer:
                     all_results, historical_id_to_name, historical_title_info, raw_rss_items
                 )
 
-                stats, html_file, ai_result = self._run_analysis_pipeline(
+                stats, html_file, ai_result, rss_items = self._run_analysis_pipeline(
                     all_results,
                     self.report_mode,
                     historical_title_info,
@@ -1501,6 +1552,7 @@ class NewsAnalyzer:
                     rss_new_items=rss_new_items,
                     standalone_data=standalone_data,
                     schedule=schedule,
+                    rss_new_urls=rss_new_urls,
                 )
 
                 combined_id_to_name = {**historical_id_to_name, **id_to_name}
@@ -1530,7 +1582,7 @@ class NewsAnalyzer:
                     all_results, historical_id_to_name, historical_title_info, raw_rss_items
                 )
 
-                stats, html_file, ai_result = self._run_analysis_pipeline(
+                stats, html_file, ai_result, rss_items = self._run_analysis_pipeline(
                     all_results,
                     self.report_mode,
                     historical_title_info,
@@ -1544,6 +1596,7 @@ class NewsAnalyzer:
                     rss_new_items=rss_new_items,
                     standalone_data=standalone_data,
                     schedule=schedule,
+                    rss_new_urls=rss_new_urls,
                 )
 
                 combined_id_to_name = {**historical_id_to_name, **id_to_name}
@@ -1557,7 +1610,7 @@ class NewsAnalyzer:
                 standalone_data = self._prepare_standalone_data(
                     results, id_to_name, title_info, raw_rss_items
                 )
-                stats, html_file, ai_result = self._run_analysis_pipeline(
+                stats, html_file, ai_result, rss_items = self._run_analysis_pipeline(
                     results,
                     self.report_mode,
                     title_info,
@@ -1571,6 +1624,7 @@ class NewsAnalyzer:
                     rss_new_items=rss_new_items,
                     standalone_data=standalone_data,
                     schedule=schedule,
+                    rss_new_urls=rss_new_urls,
                 )
         else:
             # incremental 模式:只使用当前抓取的数据
@@ -1578,7 +1632,7 @@ class NewsAnalyzer:
             standalone_data = self._prepare_standalone_data(
                 results, id_to_name, title_info, raw_rss_items
             )
-            stats, html_file, ai_result = self._run_analysis_pipeline(
+            stats, html_file, ai_result, rss_items = self._run_analysis_pipeline(
                 results,
                 self.report_mode,
                 title_info,
@@ -1592,6 +1646,7 @@ class NewsAnalyzer:
                 rss_new_items=rss_new_items,
                 standalone_data=standalone_data,
                 schedule=schedule,
+                rss_new_urls=rss_new_urls,
             )
 
         if html_file:
@@ -1640,13 +1695,13 @@ class NewsAnalyzer:
             results, id_to_name, failed_ids = self._crawl_data()
 
             # 抓取 RSS 数据(如果启用),返回统计条目、新增条目和原始条目
-            rss_items, rss_new_items, raw_rss_items = self._crawl_rss_data()
+            rss_items, rss_new_items, raw_rss_items, rss_new_urls = self._crawl_rss_data()
 
             # 执行模式策略,传递 RSS 数据用于合并推送
             self._execute_mode_strategy(
                 mode_strategy, results, id_to_name, failed_ids,
                 rss_items=rss_items, rss_new_items=rss_new_items,
-                raw_rss_items=raw_rss_items
+                raw_rss_items=raw_rss_items, rss_new_urls=rss_new_urls
             )
 
         except Exception as e:
@@ -1658,6 +1713,409 @@ class NewsAnalyzer:
             self.ctx.cleanup()
 
 
+def _record_doctor_result(results: List[Tuple[str, str, str]], status: str, item: str, detail: str) -> None:
+    """Record and print a single doctor check result."""
+    icon_map = {
+        "pass": "✅",
+        "warn": "⚠️",
+        "fail": "❌",
+    }
+    icon = icon_map.get(status, "•")
+    results.append((status, item, detail))
+    print(f"{icon} {item}: {detail}")
+
+
+def _save_doctor_report(
+    results: List[Tuple[str, str, str]],
+    pass_count: int,
+    warn_count: int,
+    fail_count: int,
+    config_path: Optional[str],
+) -> None:
+    """Save the doctor health-check report to a JSON file."""
+    report = {
+        "version": __version__,
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "config_path": config_path or os.environ.get("CONFIG_PATH", "config/config.yaml"),
+        "summary": {
+            "pass": pass_count,
+            "warn": warn_count,
+            "fail": fail_count,
+            "ok": fail_count == 0,
+        },
+        "checks": [
+            {"status": status, "item": item, "detail": detail}
+            for status, item, detail in results
+        ],
+    }
+
+    try:
+        output_dir = Path("output") / "meta"
+        output_dir.mkdir(parents=True, exist_ok=True)
+        output_path = output_dir / "doctor_report.json"
+        output_path.write_text(
+            json.dumps(report, ensure_ascii=False, indent=2),
+            encoding="utf-8",
+        )
+        print(f"体检报告已保存: {output_path}")
+    except Exception as e:
+        print(f"⚠️ 体检报告保存失败: {e}")
+
+
+def _run_doctor(config_path: Optional[str] = None) -> bool:
+    """Run the environment health check."""
+    print("=" * 60)
+    print(f"TrendRadar v{__version__} 环境体检")
+    print("=" * 60)
+
+    results: List[Tuple[str, str, str]] = []
+    config = None
+
+    # 1) Python version check
+    py_ok = sys.version_info >= (3, 10)
+    py_version = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
+    if py_ok:
+        _record_doctor_result(results, "pass", "Python版本", f"{py_version} (满足 >= 3.10)")
+    else:
+        _record_doctor_result(results, "fail", "Python版本", f"{py_version} (不满足 >= 3.10)")
+
+    # 2) Key file check
+    if config_path is None:
+        config_path = os.environ.get("CONFIG_PATH", "config/config.yaml")
+
+    required_files = [
+        (config_path, "主配置文件"),
+        ("config/frequency_words.txt", "关键词文件"),
+    ]
+    optional_files = [
+        ("config/timeline.yaml", "调度文件"),
+    ]
+
+    for path_str, desc in required_files:
+        if Path(path_str).exists():
+            _record_doctor_result(results, "pass", desc, f"已找到: {path_str}")
+        else:
+            _record_doctor_result(results, "fail", desc, f"缺失: {path_str}")
+
+    for path_str, desc in optional_files:
+        if Path(path_str).exists():
+            _record_doctor_result(results, "pass", desc, f"已找到: {path_str}")
+        else:
+            _record_doctor_result(results, "warn", desc, f"未找到: {path_str}(将使用默认调度模板)")
+
+    # 3) Config loading check
+    try:
+        config = load_config(config_path)
+        _record_doctor_result(results, "pass", "配置加载", f"加载成功: {config_path}")
+    except Exception as e:
+        _record_doctor_result(results, "fail", "配置加载", f"加载失败: {e}")
+
+    # The remaining checks depend on a loaded config object
+    if config:
+        # 4) Schedule config check
+        try:
+            ctx = AppContext(config)
+            schedule = ctx.create_scheduler().resolve()
+            detail = f"调度解析成功(report_mode={schedule.report_mode}, ai_mode={schedule.ai_mode})"
+            _record_doctor_result(results, "pass", "调度配置", detail)
+        except Exception as e:
+            _record_doctor_result(results, "fail", "调度配置", f"解析失败: {e}")
+
+        # 5) AI config check (severity depends on which AI features are enabled)
+        ai_analysis_enabled = config.get("AI_ANALYSIS", {}).get("ENABLED", False)
+        ai_translation_enabled = config.get("AI_TRANSLATION", {}).get("ENABLED", False)
+        ai_filter_enabled = config.get("FILTER", {}).get("METHOD", "keyword") == "ai"
+        ai_enabled = ai_analysis_enabled or ai_translation_enabled or ai_filter_enabled
+
+        if ai_enabled:
+            try:
+                from trendradar.ai.client import AIClient
+                valid, message = AIClient(config.get("AI", {})).validate_config()
+                if valid:
+                    _record_doctor_result(results, "pass", "AI配置", f"模型: {config.get('AI', {}).get('MODEL', '')}")
+                else:
+                    # AI analysis/translation are hard dependencies; AI filtering falls back to keyword matching when the AI config is missing
+                    if ai_analysis_enabled or ai_translation_enabled:
+                        _record_doctor_result(results, "fail", "AI配置", message)
+                    else:
+                        _record_doctor_result(results, "warn", "AI配置", f"{message}(AI 筛选将回退关键词模式)")
+            except Exception as e:
+                _record_doctor_result(results, "fail", "AI配置", f"校验异常: {e}")
+        else:
+            _record_doctor_result(results, "warn", "AI配置", "未启用 AI 功能,跳过校验")
+
+        # 6) Storage config check
+        try:
+            storage_cfg = config.get("STORAGE", {})
+            backend = storage_cfg.get("BACKEND", "auto")
+            remote = storage_cfg.get("REMOTE", {})
+            missing_remote_keys = [
+                k for k in ("BUCKET_NAME", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "ENDPOINT_URL")
+                if not remote.get(k)
+            ]
+
+            if backend == "remote" and missing_remote_keys:
+                _record_doctor_result(
+                    results, "fail", "存储配置",
+                    f"remote 模式缺少配置: {', '.join(missing_remote_keys)}"
+                )
+            elif backend == "auto" and os.environ.get("GITHUB_ACTIONS") == "true" and missing_remote_keys:
+                _record_doctor_result(
+                    results, "warn", "存储配置",
+                    "GitHub Actions + auto 模式未完整配置远程存储,可能导致数据丢失"
+                )
+            else:
+                sm = AppContext(config).get_storage_manager()
+                _record_doctor_result(results, "pass", "存储配置", f"当前后端: {sm.backend_name}")
+        except Exception as e:
+            _record_doctor_result(results, "fail", "存储配置", f"检查失败: {e}")
+
+        # 7) Notification channel config check
+        channel_details = []
+        channel_issues = []
+        max_accounts = config.get("MAX_ACCOUNTS_PER_CHANNEL", 3)
+
+        # Plain single-value / multi-value channels
+        for key, name in [
+            ("FEISHU_WEBHOOK_URL", "飞书"),
+            ("DINGTALK_WEBHOOK_URL", "钉钉"),
+            ("WEWORK_WEBHOOK_URL", "企业微信"),
+            ("BARK_URL", "Bark"),
+            ("SLACK_WEBHOOK_URL", "Slack"),
+            ("GENERIC_WEBHOOK_URL", "通用Webhook"),
+        ]:
+            values = parse_multi_account_config(config.get(key, ""))
+            if values:
+                channel_details.append(f"{name}({min(len(values), max_accounts)}个)")
+
+        # Telegram pairing validation
+        tg_tokens = parse_multi_account_config(config.get("TELEGRAM_BOT_TOKEN", ""))
+        tg_chats = parse_multi_account_config(config.get("TELEGRAM_CHAT_ID", ""))
+        if tg_tokens or tg_chats:
+            valid, count = validate_paired_configs(
+                {"bot_token": tg_tokens, "chat_id": tg_chats},
+                "Telegram",
+                required_keys=["bot_token", "chat_id"],
+            )
+            if valid and count > 0:
+                channel_details.append(f"Telegram({min(count, max_accounts)}个)")
+            else:
+                channel_issues.append("Telegram bot_token/chat_id 配置不完整或数量不一致")
+
+        # ntfy pairing validation (token is optional)
+        ntfy_server = config.get("NTFY_SERVER_URL", "")
+        ntfy_topics = parse_multi_account_config(config.get("NTFY_TOPIC", ""))
+        ntfy_tokens = parse_multi_account_config(config.get("NTFY_TOKEN", ""))
+        if ntfy_server and ntfy_topics:
+            if ntfy_tokens:
+                valid, count = validate_paired_configs(
+                    {"topic": ntfy_topics, "token": ntfy_tokens},
+                    "ntfy",
+                )
+                if valid and count > 0:
+                    channel_details.append(f"ntfy({min(count, max_accounts)}个)")
+                else:
+                    channel_issues.append("ntfy topic/token 数量不一致")
+            else:
+                channel_details.append(f"ntfy({min(len(ntfy_topics), max_accounts)}个)")
+
+        # Email config completeness
+        email_ready = all(
+            [
+                config.get("EMAIL_FROM"),
+                config.get("EMAIL_PASSWORD"),
+                config.get("EMAIL_TO"),
+            ]
+        )
+        if email_ready:
+            channel_details.append("邮件")
+        elif any([config.get("EMAIL_FROM"), config.get("EMAIL_PASSWORD"), config.get("EMAIL_TO")]):
+            channel_issues.append("邮件配置不完整(需要 from/password/to 同时配置)")
+
+        if channel_issues and not channel_details:
+            _record_doctor_result(results, "fail", "通知配置", ";".join(channel_issues))
+        elif channel_issues and channel_details:
+            detail = f"可用渠道: {', '.join(channel_details)};问题: {';'.join(channel_issues)}"
+            _record_doctor_result(results, "warn", "通知配置", detail)
+        elif channel_details:
+            _record_doctor_result(results, "pass", "通知配置", f"可用渠道: {', '.join(channel_details)}")
+        else:
+            _record_doctor_result(results, "warn", "通知配置", "未配置任何通知渠道")
+
+        # 8) Output directory writability check
+        try:
+            output_dir = Path("output")
+            output_dir.mkdir(parents=True, exist_ok=True)
+            probe_file = output_dir / ".doctor_write_probe"
+            probe_file.write_text("ok", encoding="utf-8")
+            probe_file.unlink(missing_ok=True)
+            _record_doctor_result(results, "pass", "输出目录", f"可写: {output_dir}")
+        except Exception as e:
+            _record_doctor_result(results, "fail", "输出目录", f"不可写: {e}")
+
+    pass_count = sum(1 for status, _, _ in results if status == "pass")
+    warn_count = sum(1 for status, _, _ in results if status == "warn")
+    fail_count = sum(1 for status, _, _ in results if status == "fail")
+
+    _save_doctor_report(results, pass_count, warn_count, fail_count, config_path)
+
+    print("-" * 60)
+    print(f"体检结果: ✅ {pass_count} 项通过  ⚠️ {warn_count} 项警告  ❌ {fail_count} 项失败")
+    print("=" * 60)
+
+    if fail_count == 0:
+        print("体检通过。")
+        return True
+
+    print("体检未通过,请先修复失败项。")
+    return False
+
+
+def _build_test_report_data(ctx: AppContext) -> Dict:
+    """Build the report data used for the notification test."""
+    now = ctx.get_time()
+    time_display = now.strftime("%H:%M")
+    title = f"TrendRadar 通知测试消息({now.strftime('%Y-%m-%d %H:%M:%S')})"
+
+    return {
+        "stats": [
+            {
+                "word": "连通性测试",
+                "count": 1,
+                "titles": [
+                    {
+                        "title": title,
+                        "source_name": "TrendRadar",
+                        "url": "https://github.com/sansan0/TrendRadar",
+                        "mobile_url": "",
+                        "ranks": [1],
+                        "rank_threshold": ctx.rank_threshold,
+                        "count": 1,
+                        "is_new": True,
+                        "time_display": time_display,
+                        "matched_keyword": "连通性测试",
+                    }
+                ],
+            }
+        ],
+        "failed_ids": [],
+        "new_titles": [],
+        "id_to_name": {},
+    }
+
+
+def _create_test_html_file(ctx: AppContext) -> Optional[str]:
+    """Create the HTML file used for the email channel test."""
+    try:
+        now = ctx.get_time()
+        output_dir = Path("output") / "html" / ctx.format_date()
+        output_dir.mkdir(parents=True, exist_ok=True)
+        html_path = output_dir / f"notification_test_{ctx.format_time()}.html"
+        html_content = f"""<!DOCTYPE html>
+<html lang="zh-CN">
+<head><meta charset="UTF-8"><title>TrendRadar 通知测试</title></head>
+<body>
+<h2>TrendRadar 通知连通性测试</h2>
+<p>测试时间:{now.strftime('%Y-%m-%d %H:%M:%S')} ({ctx.timezone})</p>
+<p>这是一条测试消息,用于验证邮件渠道是否可达。</p>
+</body>
+</html>"""
+        html_path.write_text(html_content, encoding="utf-8")
+        return str(html_path)
+    except Exception as e:
+        print(f"[测试通知] 创建测试 HTML 失败: {e}")
+        return None
+
+
+def _run_test_notification(config: Dict) -> bool:
+    """Send a test notification to the configured channels."""
+    from trendradar.notification import NotificationDispatcher
+
+    ctx = AppContext(config)
+
+    try:
+        # Check whether any notification channel is configured
+        has_notification = any(
+            [
+                config.get("FEISHU_WEBHOOK_URL"),
+                config.get("DINGTALK_WEBHOOK_URL"),
+                config.get("WEWORK_WEBHOOK_URL"),
+                (config.get("TELEGRAM_BOT_TOKEN") and config.get("TELEGRAM_CHAT_ID")),
+                (config.get("EMAIL_FROM") and config.get("EMAIL_PASSWORD") and config.get("EMAIL_TO")),
+                (config.get("NTFY_SERVER_URL") and config.get("NTFY_TOPIC")),
+                config.get("BARK_URL"),
+                config.get("SLACK_WEBHOOK_URL"),
+                config.get("GENERIC_WEBHOOK_URL"),
+            ]
+        )
+        if not has_notification:
+            print("未检测到可用通知渠道,请先在 config.yaml 或环境变量中配置。")
+            return False
+
+        # Pin the display regions during the test so the content is not empty when the user has disabled HOTLIST
+        test_config = copy.deepcopy(config)
+        test_display = test_config.setdefault("DISPLAY", {})
+        test_regions = test_display.setdefault("REGIONS", {})
+        test_regions.update(
+            {
+                "HOTLIST": True,
+                "NEW_ITEMS": False,
+                "RSS": False,
+                "STANDALONE": False,
+                "AI_ANALYSIS": False,
+            }
+        )
+
+        # Disable translation during the test to avoid triggering extra AI calls
+        if "AI_TRANSLATION" in test_config:
+            test_config["AI_TRANSLATION"]["ENABLED"] = False
+
+        proxy_url = test_config.get("DEFAULT_PROXY", "") if test_config.get("USE_PROXY") else None
+        if proxy_url:
+            print("[测试通知] 检测到代理配置,将使用代理发送")
+
+        dispatcher = NotificationDispatcher(
+            config=test_config,
+            get_time_func=ctx.get_time,
+            split_content_func=ctx.split_content,
+            translator=None,
+        )
+
+        report_data = _build_test_report_data(ctx)
+        html_file_path = _create_test_html_file(ctx)
+
+        print("=" * 60)
+        print("通知连通性测试")
+        print("=" * 60)
+
+        results = dispatcher.dispatch_all(
+            report_data=report_data,
+            report_type="通知连通性测试",
+            proxy_url=proxy_url,
+            mode="daily",
+            html_file_path=html_file_path,
+        )
+
+        if not results:
+            print("没有可测试的有效通知渠道(可能配置不完整)。")
+            return False
+
+        print("-" * 60)
+        success_count = 0
+        for channel, ok in results.items():
+            if ok:
+                success_count += 1
+                print(f"✅ {channel}: 测试成功")
+            else:
+                print(f"❌ {channel}: 测试失败")
+
+        print("-" * 60)
+        print(f"测试结果: {success_count}/{len(results)} 个渠道成功")
+        return success_count > 0
+    finally:
+        ctx.cleanup()
+
+
 def main():
     """主程序入口"""
     # Parse command-line arguments
@@ -1667,10 +2125,15 @@ def main():
         epilog="""
 调度状态命令:
   --show-schedule        显示当前调度状态(时间段、行为开关)
+诊断命令:
+  --doctor               运行环境与配置体检
+  --test-notification    发送测试通知到已配置渠道
 
 示例:
   python -m trendradar                    # 正常运行
   python -m trendradar --show-schedule    # 查看当前调度状态
+  python -m trendradar --doctor           # 运行一键体检
+  python -m trendradar --test-notification # 测试通知渠道连通性
 """
     )
     parser.add_argument(
@@ -1678,17 +2141,41 @@ def main():
         action="store_true",
         help="显示当前调度状态"
     )
+    parser.add_argument(
+        "--doctor",
+        action="store_true",
+        help="运行环境与配置体检"
+    )
+    parser.add_argument(
+        "--test-notification",
+        action="store_true",
+        help="发送测试通知到已配置渠道"
+    )
 
     args = parser.parse_args()
 
     debug_mode = False
     try:
+        # Handle the doctor command (does not depend on the full run pipeline)
+        if args.doctor:
+            ok = _run_doctor()
+            if not ok:
+                raise SystemExit(1)
+            return
+
         # Load the config first
         config = load_config()
 
         # Handle the status display command
         if args.show_schedule:
-            _handle_status_commands(config, args)
+            _handle_status_commands(config)
+            return
+
+        # Handle the notification test command
+        if args.test_notification:
+            ok = _run_test_notification(config)
+            if not ok:
+                raise SystemExit(1)
             return
 
         version_url = config.get("VERSION_CHECK_URL", "")
@@ -1725,7 +2212,7 @@ def main():
             raise
 
 
-def _handle_status_commands(config: Dict, args) -> None:
+def _handle_status_commands(config: Dict) -> None:
     """处理状态查看命令 - 显示当前调度状态"""
     from trendradar.context import AppContext
 

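Review note: `_run_doctor` collects `(status, item, detail)` tuples and treats the run as passing only when no check has status `"fail"`. The aggregation can be sketched as below. This is a minimal sketch, not the commit's code: `summarize_checks` is a hypothetical helper name, and the `"summary"` key is an assumption, because the start of the report dict is not visible in this part of the diff.

```python
from typing import Dict, List, Tuple

# (status, item, detail); status is one of "pass", "warn", "fail"
CheckResult = Tuple[str, str, str]


def summarize_checks(results: List[CheckResult]) -> Dict:
    """Hypothetical helper: aggregate doctor results into a report dict."""
    counts = {"pass": 0, "warn": 0, "fail": 0}
    for status, _item, _detail in results:
        if status in counts:
            counts[status] += 1
    return {
        # "summary" is an assumed key name; "ok" mirrors `fail_count == 0`
        "summary": {**counts, "ok": counts["fail"] == 0},
        "checks": [
            {"status": status, "item": item, "detail": detail}
            for status, item, detail in results
        ],
    }
```

Warnings do not affect `ok`, which matches the diff: `--doctor` exits with status 1 only when at least one check fails.
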
+ 4 - 0
trendradar/ai/__init__.py

@@ -6,6 +6,7 @@ TrendRadar AI 模块
 """
 
 from .analyzer import AIAnalyzer, AIAnalysisResult
+from .filter import AIFilter, AIFilterResult
 from .translator import AITranslator, TranslationResult, BatchTranslationResult
 from .formatter import (
     get_ai_analysis_renderer,
@@ -21,6 +22,9 @@ __all__ = [
     # Analyzer
     "AIAnalyzer",
     "AIAnalysisResult",
+    # AI filter
+    "AIFilter",
+    "AIFilterResult",
     # Translator
     "AITranslator",
     "TranslationResult",

+ 113 - 36
trendradar/ai/analyzer.py

@@ -219,6 +219,17 @@ class AIAnalyzer:
             response = self._call_ai(user_prompt)
             result = self._parse_response(response)
 
+            # Fallback when JSON parsing fails: ask the AI to repair it (one retry only)
+            if result.error and "JSON 解析错误" in result.error:
+                print("[AI] JSON 解析失败,尝试让 AI 修复...")
+                retry_result = self._retry_fix_json(response, result.error)
+                if retry_result and retry_result.success and not retry_result.error:
+                    print("[AI] JSON 修复成功")
+                    retry_result.raw_response = response
+                    result = retry_result
+                else:
+                    print("[AI] JSON 修复失败,使用原始文本兜底")
+
             # If RSS analysis is not enabled in the config, force-clear any RSS insights returned by the AI
             if not self.include_rss:
                 result.rss_insights = ""
@@ -376,6 +387,49 @@ class AIAnalyzer:
 
         return self.client.chat(messages)
 
+    def _retry_fix_json(self, original_response: str, error_msg: str) -> Optional[AIAnalysisResult]:
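+        """
+        When JSON parsing fails, ask the AI to repair the JSON (one retry only).
+
+        Uses a lightweight prompt and does not repeat the original analysis system prompt, to save tokens.
+
+        Args:
+            original_response: the raw AI response (malformed JSON)
+            error_msg: the JSON parse error message
+
+        Returns:
+            The repaired analysis result, or None on failure
+        """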
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "你是一个 JSON 修复助手。用户会提供一段格式有误的 JSON 和错误信息,"
+                    "你需要修复 JSON 格式错误并返回正确的 JSON。\n"
+                    "常见问题:字符串值内的双引号未转义、缺少逗号、字符串未正确闭合等。\n"
+                    "只返回纯 JSON,不要包含 markdown 代码块标记(如 ```json)或任何说明文字。"
+                ),
+            },
+            {
+                "role": "user",
+                "content": (
+                    f"以下 JSON 解析失败:\n\n"
+                    f"错误:{error_msg}\n\n"
+                    f"原始内容:\n{original_response}\n\n"
+                    f"请修复以上 JSON 中的格式问题(如值中的双引号改用中文引号「」或转义 \\\"、"
+                    f"缺少逗号、不完整的字符串等),保持原始内容语义不变,只修复格式。"
+                    f"直接返回修复后的纯 JSON。"
+                ),
+            },
+        ]
+
+        try:
+            response = self.client.chat(messages)
+            return self._parse_response(response)
+        except Exception as e:
+            print(f"[AI] 重试修复 JSON 异常: {type(e).__name__}: {e}")
+            return None
+
     def _format_time_range(self, first_time: str, last_time: str) -> str:
         """格式化时间范围(简化显示,只保留时分)"""
         def extract_time(time_str: str) -> str:
@@ -511,30 +565,66 @@ class AIAnalyzer:
             result.error = "AI 返回空响应"
             return result
 
-        try:
-            json_str = response
-
-            if "```json" in response:
-                parts = response.split("```json", 1)
-                if len(parts) > 1:
-                    code_block = parts[1]
-                    end_idx = code_block.find("```")
-                    if end_idx != -1:
-                        json_str = code_block[:end_idx]
-                    else:
-                        json_str = code_block
-            elif "```" in response:
-                parts = response.split("```", 2)
-                if len(parts) >= 2:
-                    json_str = parts[1]
+        # Extract the JSON text (strip markdown code-fence markers)
+        json_str = response
+
+        if "```json" in response:
+            parts = response.split("```json", 1)
+            if len(parts) > 1:
+                code_block = parts[1]
+                end_idx = code_block.find("```")
+                if end_idx != -1:
+                    json_str = code_block[:end_idx]
+                else:
+                    json_str = code_block
+        elif "```" in response:
+            parts = response.split("```", 2)
+            if len(parts) >= 2:
+                json_str = parts[1]
+
+        json_str = json_str.strip()
+        if not json_str:
+            result.error = "提取的 JSON 内容为空"
+            result.core_trends = response[:500] + "..." if len(response) > 500 else response
+            result.success = True
+            return result
 
-            json_str = json_str.strip()
-            if not json_str:
-                raise ValueError("提取的 JSON 内容为空")
+        # Step 1: standard JSON parsing
+        data = None
+        parse_error = None
 
+        try:
             data = json.loads(json_str)
+        except json.JSONDecodeError as e:
+            parse_error = e
+
+        # Step 2: local repair with json_repair
+        if data is None:
+            try:
+                from json_repair import repair_json
+                repaired = repair_json(json_str, return_objects=True)
+                if isinstance(repaired, dict):
+                    data = repaired
+                    print("[AI] JSON 本地修复成功(json_repair)")
+            except Exception:
+                pass
+
+        # Both steps failed: record the error (the retry logic in the analyze method handles it afterwards)
+        if data is None:
+            if parse_error:
+                error_context = json_str[max(0, parse_error.pos - 30):parse_error.pos + 30] if json_str and parse_error.pos else ""
+                result.error = f"JSON 解析错误 (位置 {parse_error.pos}): {parse_error.msg}"
+                if error_context:
+                    result.error += f",上下文: ...{error_context}..."
+            else:
+                result.error = "JSON 解析失败"
+            # Fallback: use the already extracted json_str (without markdown markers) so ```json never shows up in notifications
+            result.core_trends = json_str[:500] + "..." if len(json_str) > 500 else json_str
+            result.success = True
+            return result
 
-            # 新版字段解析
+        # Parsing succeeded: extract the fields
+        try:
             result.core_trends = data.get("core_trends", "")
             result.sentiment_controversy = data.get("sentiment_controversy", "")
             result.signals = data.get("signals", "")
@@ -547,24 +637,11 @@ class AIAnalyzer:
                 result.standalone_summaries = {
                     str(k): str(v) for k, v in summaries.items()
                 }
-            
-            result.success = True
 
-        except json.JSONDecodeError as e:
-            error_context = json_str[max(0, e.pos - 30):e.pos + 30] if json_str and e.pos else ""
-            result.error = f"JSON 解析错误 (位置 {e.pos}): {e.msg}"
-            if error_context:
-                result.error += f",上下文: ...{error_context}..."
-            # 使用原始响应填充 core_trends,确保有输出
-            result.core_trends = response[:500] + "..." if len(response) > 500 else response
-            result.success = True
-        except (IndexError, KeyError, TypeError, ValueError) as e:
-            result.error = f"响应解析错误: {type(e).__name__}: {str(e)}"
-            result.core_trends = response[:500] if len(response) > 500 else response
             result.success = True
-        except Exception as e:
-            result.error = f"解析时发生未知错误: {type(e).__name__}: {str(e)}"
-            result.core_trends = response[:500] if len(response) > 500 else response
+        except (KeyError, TypeError, AttributeError) as e:
+            result.error = f"字段提取错误: {type(e).__name__}: {e}"
+            result.core_trends = json_str[:500] + "..." if len(json_str) > 500 else json_str
             result.success = True
 
         return result

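Review note: after this change `_parse_response` parses in tiers: strip the markdown fence, try `json.loads`, then try a local repair with `json_repair`, and only then report an error so that `analyze` can request one AI-side fix. A minimal standalone sketch of the first tiers follows. The function names `extract_json_text` and `parse_ai_json` are hypothetical, the explicit `dict` check after `json.loads` is an addition that the diff does not have, and `json_repair` is treated as optional so the sketch runs without it.

```python
import json
from typing import Optional, Tuple

FENCE = "`" * 3  # markdown code-fence marker


def extract_json_text(response: str) -> str:
    """Return the text inside a markdown code fence, or the whole response."""
    text = response
    if FENCE + "json" in response:
        block = response.split(FENCE + "json", 1)[1]
        end = block.find(FENCE)
        text = block[:end] if end != -1 else block
    elif FENCE in response:
        parts = response.split(FENCE, 2)
        if len(parts) >= 2:
            text = parts[1]
    return text.strip()


def parse_ai_json(response: str) -> Tuple[Optional[dict], Optional[str]]:
    """Hypothetical helper: return (data, error); exactly one of them is None."""
    json_str = extract_json_text(response)
    if not json_str:
        return None, "empty JSON content"

    # Tier 1: strict parsing
    try:
        data = json.loads(json_str)
        if isinstance(data, dict):
            return data, None
        return None, "top-level JSON value is not an object"
    except json.JSONDecodeError as exc:
        error = f"JSON parse error at position {exc.pos}: {exc.msg}"

    # Tier 2: local repair, only if the optional json_repair package is installed
    try:
        from json_repair import repair_json

        repaired = repair_json(json_str, return_objects=True)
        if isinstance(repaired, dict):
            return repaired, None
    except Exception:
        pass

    # Caller decides whether to ask the model to repair its own output
    return None, error
```

Returning the error instead of raising keeps the caller in control of the retry, which is how the diff lets `analyze` decide whether to call `_retry_fix_json`.
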
+ 9 - 2
trendradar/ai/client.py

@@ -7,7 +7,7 @@ AI 客户端模块
 """
 
 import os
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List
 
 from litellm import completion
 
@@ -92,7 +92,14 @@ class AIClient:
         response = completion(**params)
 
         # Extract the response content
-        return response.choices[0].message.content
+        # Some models/providers return a list (content blocks) instead of a str; normalize to str
+        content = response.choices[0].message.content
+        if isinstance(content, list):
+            content = "\n".join(
+                item.get("text", str(item)) if isinstance(item, dict) else str(item)
+                for item in content
+            )
+        return content or ""
 
     def validate_config(self) -> tuple[bool, str]:
         """

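Review note: the new return path in `AIClient.chat` flattens list-style content blocks into one string and maps `None` to an empty string. A standalone sketch of that normalization is below. `normalize_content` is a hypothetical name, and the extra `str()` around the `text` value is an assumption added so that a non-string `text` cannot break the join.

```python
from typing import Any


def normalize_content(content: Any) -> str:
    """Hypothetical helper: coerce a chat completion's content to a plain string."""
    if isinstance(content, list):
        # Content blocks usually look like {"type": "text", "text": "..."};
        # anything else is stringified as-is.
        content = "\n".join(
            str(item.get("text", item)) if isinstance(item, dict) else str(item)
            for item in content
        )
    return content or ""
```

Callers such as `_parse_response` can then rely on always receiving a `str`, so their empty-response checks stay simple.
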
+ 586 - 0
trendradar/ai/filter.py

@@ -0,0 +1,586 @@
+# coding=utf-8
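+"""
+AI smart filtering module
+
+Classifies news by tag using AI:
+1. Stage A: extract structured tags from the user's interests description
+2. Stage B: classify news titles against the tags in batches
+"""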
+
+import hashlib
+import json
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Callable, Dict, List, Optional
+
+from trendradar.ai.client import AIClient
+
+
+@dataclass
+class AIFilterResult:
+    """AI filter result, passed to the report and notification modules"""
+    tags: List[Dict] = field(default_factory=list)
+    # [{"tag": str, "description": str, "count": int, "items": [
+    #     {"title": str, "source_id": str, "source_name": str,
+    #      "url": str, "mobile_url": str, "rank": int, "ranks": [...],
+    #      "first_time": str, "last_time": str, "count": int,
+    #      "relevance_score": float, "source_type": str}
+    # ]}]
+    total_matched: int = 0       # total number of matched news items
+    total_processed: int = 0     # total number of processed news items
+    success: bool = False
+    error: str = ""
+
+
+class AIFilter:
+    """AI smart filter"""
+
+    def __init__(
+        self,
+        ai_config: Dict[str, Any],
+        filter_config: Dict[str, Any],
+        get_time_func: Callable,
+        debug: bool = False,
+    ):
+        self.client = AIClient(ai_config)
+        self.filter_config = filter_config
+        self.batch_size = filter_config.get("BATCH_SIZE", 200)
+        self.get_time_func = get_time_func
+        self.debug = debug
+
+        # Load the prompt templates (defaults match the files this commit adds under config/ai_filter/)
+        self.classify_system, self.classify_user = self._load_prompt(
+            filter_config.get("PROMPT_FILE", "prompt.txt")
+        )
+        self.extract_system, self.extract_user = self._load_prompt(
+            filter_config.get("EXTRACT_PROMPT_FILE", "extract_prompt.txt")
+        )
+        self.update_tags_system, self.update_tags_user = self._load_prompt(
+            filter_config.get("UPDATE_TAGS_PROMPT_FILE", "update_tags_prompt.txt")
+        )
+
+    def _load_prompt(self, filename: str) -> tuple:
+        """Load a prompt file and return (system_prompt, user_prompt_template)"""
+        config_dir = Path(__file__).parent.parent.parent / "config" / "ai_filter"
+        prompt_path = config_dir / filename
+
+        if not prompt_path.exists():
+            print(f"[AI筛选] 提示词文件不存在: {prompt_path}")
+            return "", ""
+
+        content = prompt_path.read_text(encoding="utf-8")
+
+        system_prompt = ""
+        user_prompt = ""
+
+        if "[system]" in content and "[user]" in content:
+            parts = content.split("[user]", 1)
+            system_part = parts[0]
+            user_part = parts[1] if len(parts) > 1 else ""
+
+            if "[system]" in system_part:
+                system_prompt = system_part.split("[system]")[1].strip()
+            user_prompt = user_part.strip()
+        else:
+            user_prompt = content
+
+        return system_prompt, user_prompt
+
+    def compute_interests_hash(self, interests_content: str, filename: str = "ai_interests.txt") -> str:
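+        """Compute the hash of the interests description, formatted as filename:md5"""
+        # Strip surrounding whitespace and comment lines so the hash changes only when the content changes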
+        lines = []
+        for line in interests_content.strip().splitlines():
+            line = line.strip()
+            if line and not line.startswith("#"):
+                lines.append(line)
+        normalized = "\n".join(lines)
+        content_hash = hashlib.md5(normalized.encode("utf-8")).hexdigest()
+        return f"{filename}:{content_hash}"
+
+    def load_interests_content(self, interests_file: Optional[str] = None) -> Optional[str]:
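+        """Load the contents of the interests description file.
+
+        Resolution logic:
+        - interests_file is None: use the default config/ai_interests.txt
+        - interests_file is set: look only in config/custom/ai/{filename}
+
+        Note: the caller (context.py) has already merged the config/timeline decision,
+        so filter_config is not read again here, which avoids conflicting semantics.
+        """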
+        config_dir = Path(__file__).parent.parent.parent / "config"
+        configured_file = interests_file
+
+        if configured_file:
+            # Custom interests file: look only in the custom/ai directory
+            filename = configured_file
+            interests_path = config_dir / "custom" / "ai" / filename
+            if not interests_path.exists():
+                print(f"[AI筛选] 自定义兴趣描述文件不存在: {filename}")
+                print(f"[AI筛选]   已查找: {interests_path}")
+                return None
+        else:
+            # Default interests file: always config/ai_interests.txt
+            filename = "ai_interests.txt"
+            interests_path = config_dir / filename
+            if not interests_path.exists():
+                print(f"[AI筛选] 默认兴趣描述文件不存在: {filename}")
+                print(f"[AI筛选]   已查找: {interests_path}")
+                return None
+
+        if not interests_path.exists():
+            print(f"[AI筛选] 兴趣描述文件不存在: {interests_path}")
+            return None
+
+        content = interests_path.read_text(encoding="utf-8").strip()
+        if not content:
+            print("[AI筛选] 兴趣描述文件为空")
+            return None
+
+        return content
+
+    def extract_tags(self, interests_content: str) -> List[Dict]:
+        """
+        Stage A: extract structured tags from the interests description
+
+        Args:
+            interests_content: the user's interests description text
+
+        Returns:
+            [{"tag": str, "description": str}, ...]
+        """
+        if not self.extract_user:
+            print("[AI筛选] 标签提取提示词模板为空")
+            return []
+
+        user_prompt = self.extract_user.replace("{interests_content}", interests_content)
+
+        messages = []
+        if self.extract_system:
+            messages.append({"role": "system", "content": self.extract_system})
+        messages.append({"role": "user", "content": user_prompt})
+
+        if self.debug:
+            print(f"\n[AI筛选][DEBUG] === 标签提取 Prompt ===")
+            for m in messages:
+                print(f"[{m['role']}]\n{m['content']}")
+            print(f"[AI筛选][DEBUG] === Prompt 结束 ===")
+        response = ""  # pre-bind so the JSONDecodeError handler below can always reference it
+        try:
+            response = self.client.chat(messages)
+
+            if self.debug:
+                print(f"\n[AI筛选][DEBUG] === 标签提取 AI 原始响应 ===")
+                # Try to pretty-print the JSON for readability
+                self._print_formatted_json(response)
+                print(f"[AI筛选][DEBUG] === 响应结束 ===")
+
+            tags = self._parse_tags_response(response)
+            print(f"[AI筛选] 提取到 {len(tags)} 个标签")
+            for t in tags:
+                print(f"   {t['tag']}: {t.get('description', '')}")
+
+            if self.debug:
+                json_str = self._extract_json(response)
+                if not json_str:
+                    print(f"[AI筛选][DEBUG] 无法从响应中提取 JSON")
+                else:
+                    raw_data = json.loads(json_str)
+                    raw_tags = raw_data.get("tags", [])
+                    skipped = len(raw_tags) - len(tags)
+                    if skipped > 0:
+                        print(f"[AI筛选][DEBUG] 原始标签 {len(raw_tags)} 个,有效 {len(tags)} 个,跳过 {skipped} 个(缺少 tag 字段或格式无效)")
+
+            return tags
+        except json.JSONDecodeError as e:
+            print(f"[AI筛选] 标签提取失败: JSON 解析错误: {e}")
+            if self.debug:
+                print(f"[AI筛选][DEBUG] 尝试解析的 JSON 内容: {self._extract_json(response) if response else '(空响应)'}")
+            return []
+        except Exception as e:
+            print(f"[AI筛选] 标签提取失败: {type(e).__name__}: {e}")
+            return []
+
+    def update_tags(self, old_tags: List[Dict], interests_content: str) -> Optional[Dict]:
+        """
+        Stage A': the AI compares the old tags with the new interests description and proposes an update plan
+
+        Args:
+            old_tags: [{"tag": str, "description": str, "id": int}, ...]
+            interests_content: the new interests description text
+
+        Returns:
+            {"keep": [{"tag": str, "description": str}],
+             "add": [{"tag": str, "description": str}],
+             "remove": [str],
+             "change_ratio": float}
+            Returns None on failure
+        """
+        if not self.update_tags_user:
+            print("[AI筛选] 标签更新提示词模板为空,回退到重新提取")
+            return None
+
+        # Build the JSON for the old tags
+        old_tags_json = json.dumps(
+            [{"tag": t["tag"], "description": t.get("description", "")} for t in old_tags],
+            ensure_ascii=False, indent=2
+        )
+
+        user_prompt = self.update_tags_user.replace(
+            "{old_tags_json}", old_tags_json
+        ).replace(
+            "{interests_content}", interests_content
+        )
+
+        messages = []
+        if self.update_tags_system:
+            messages.append({"role": "system", "content": self.update_tags_system})
+        messages.append({"role": "user", "content": user_prompt})
+
+        if self.debug:
+            print(f"\n[AI筛选][DEBUG] === 标签更新 Prompt ===")
+            for m in messages:
+                print(f"[{m['role']}]\n{m['content']}")
+            print(f"[AI筛选][DEBUG] === Prompt 结束 ===")
+
+        try:
+            response = self.client.chat(messages)
+
+            if self.debug:
+                print(f"\n[AI筛选][DEBUG] === 标签更新 AI 原始响应 ===")
+                self._print_formatted_json(response)
+                print(f"[AI筛选][DEBUG] === 响应结束 ===")
+
+            result = self._parse_update_tags_response(response)
+            if result is None:
+                return None
+
+            keep_count = len(result.get("keep", []))
+            add_count = len(result.get("add", []))
+            remove_count = len(result.get("remove", []))
+            ratio = result.get("change_ratio", 0)
+            print(f"[AI筛选] AI 标签更新方案: 保留 {keep_count}, 新增 {add_count}, 移除 {remove_count}, change_ratio={ratio:.2f}")
+
+            return result
+        except Exception as e:
+            print(f"[AI筛选] 标签更新失败: {type(e).__name__}: {e}")
+            return None
+
+    def _parse_update_tags_response(self, response: str) -> Optional[Dict]:
+        """解析标签更新的 AI 响应"""
+        json_str = self._extract_json(response)
+        if not json_str:
+            print("[AI筛选] 无法从标签更新响应中提取 JSON")
+            return None
+
+        data = json.loads(json_str)
+
+        # 校验必需字段
+        keep = data.get("keep", [])
+        add = data.get("add", [])
+        remove = data.get("remove", [])
+        change_ratio = float(data.get("change_ratio", 0))
+
+        # 校验 keep/add 格式
+        validated_keep = []
+        for t in keep:
+            if isinstance(t, dict) and "tag" in t:
+                validated_keep.append({
+                    "tag": str(t["tag"]).strip(),
+                    "description": str(t.get("description", "")).strip(),
+                })
+
+        validated_add = []
+        for t in add:
+            if isinstance(t, dict) and "tag" in t:
+                validated_add.append({
+                    "tag": str(t["tag"]).strip(),
+                    "description": str(t.get("description", "")).strip(),
+                })
+
+        validated_remove = [str(r).strip() for r in remove if r]
+
+        # change_ratio 限制在 0~1
+        change_ratio = max(0.0, min(1.0, change_ratio))
+
+        return {
+            "keep": validated_keep,
+            "add": validated_add,
+            "remove": validated_remove,
+            "change_ratio": change_ratio,
+        }
+
+    def _parse_tags_response(self, response: str) -> List[Dict]:
+        """解析标签提取的 AI 响应"""
+        json_str = self._extract_json(response)
+        if not json_str:
+            return []
+
+        data = json.loads(json_str)
+        tags_raw = data.get("tags", [])
+
+        tags = []
+        for t in tags_raw:
+            if not isinstance(t, dict) or "tag" not in t:
+                continue
+            tags.append({
+                "tag": str(t["tag"]).strip(),
+                "description": str(t.get("description", "")).strip(),
+            })
+
+        return tags
+
+    def classify_batch(
+        self,
+        titles: List[Dict],
+        tags: List[Dict],
+        interests_content: str = "",
+    ) -> List[Dict]:
+        """
+        Stage B: classify one batch of news titles.
+
+        Args:
+            titles: [{"id": news_item_id, "title": str, "source": str}]
+            tags: [{"id": tag_id, "tag": str, "description": str}]
+            interests_content: the user's interest description (including quality-filter requirements)
+
+        Returns:
+            [{"news_item_id": int, "tag_id": int, "relevance_score": float}, ...]
+        """
+        if not titles or not tags:
+            return []
+
+        if not self.classify_user:
+            print("[AI筛选] 分类提示词模板为空")
+            return []
+
+        # 构建标签列表文本
+        tags_list = "\n".join(
+            f"{t['id']}. {t['tag']}: {t.get('description', '')}"
+            for t in tags
+        )
+
+        # 构建新闻列表文本
+        news_list = "\n".join(
+            f"{t['id']}. [{t.get('source', '')}] {t['title']}"
+            for t in titles
+        )
+
+        # 填充模板
+        user_prompt = self.classify_user
+        user_prompt = user_prompt.replace("{interests_content}", interests_content)
+        user_prompt = user_prompt.replace("{tags_list}", tags_list)
+        user_prompt = user_prompt.replace("{news_count}", str(len(titles)))
+        user_prompt = user_prompt.replace("{news_list}", news_list)
+
+        messages = []
+        if self.classify_system:
+            messages.append({"role": "system", "content": self.classify_system})
+        messages.append({"role": "user", "content": user_prompt})
+
+        if self.debug:
+            print(f"\n[AI筛选][DEBUG] === 分类 Prompt (标题数={len(titles)}, 标签={len(tags)}) ===")
+            for m in messages:
+                role = m['role']
+                content = m['content']
+                # Truncate long prompts for display: keep the first 15 and last 10 lines
+                lines = content.split('\n')
+                if len(lines) > 30:
+                    # Show the first 15 lines, an omission notice, then the last 10 lines
+                    head = lines[:15]
+                    tail = lines[-10:]
+                    omitted = len(lines) - 25
+                    truncated = '\n'.join(head) + f'\n... (省略 {omitted} 行) ...\n' + '\n'.join(tail)
+                    print(f"[{role}]\n{truncated}")
+                else:
+                    print(f"[{role}]\n{content}")
+            print(f"[AI筛选][DEBUG] === Prompt 结束 (长度: {sum(len(m['content']) for m in messages)} 字符) ===")
+
+        try:
+            response = self.client.chat(messages)
+
+            return self._parse_classify_response(response, titles, tags)
+        except Exception as e:
+            print(f"[AI筛选] 分类请求失败: {type(e).__name__}: {e}")
+            return []
+
+    def _parse_classify_response(
+        self,
+        response: str,
+        titles: List[Dict],
+        tags: List[Dict],
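+    ) -> List[Dict]:
+        """Parse the AI response for a classification request.
+
+        Two JSON shapes are accepted:
+        - New (flat): [{"id": 1, "tag_id": 1, "score": 0.9}, ...]
+        - Old (nested): [{"id": 1, "tags": [{"tag_id": 1, "score": 0.9}]}, ...]
+
+        Only the single highest-scoring tag is kept per news item, so the same item never appears under more than one tag.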
+        """
+        json_str = self._extract_json(response)
+        if not json_str:
+            if self.debug:
+                print(f"[AI筛选][DEBUG] 无法从分类响应中提取 JSON,原始响应前 500 字符: {(response or '')[:500]}")
+            return []
+
+        try:
+            data = json.loads(json_str)
+        except json.JSONDecodeError as e:
+            if self.debug:
+                print(f"[AI筛选][DEBUG] 分类响应 JSON 解析失败: {e}")
+                print(f"[AI筛选][DEBUG] 提取的 JSON 文本前 500 字符: {json_str[:500]}")
+            return []
+
+        if not isinstance(data, list):
+            if self.debug:
+                print(f"[AI筛选][DEBUG] 分类响应顶层不是数组,实际类型: {type(data).__name__}")
+            return []
+
+        # 构建 id 映射
+        title_ids = {t["id"] for t in titles}
+        title_map = {t["id"]: t["title"] for t in titles}
+        tag_id_set = {t["id"] for t in tags}
+        tag_name_map = {t["id"]: t["tag"] for t in tags}
+
+        # Keep only the single highest-scoring tag for each news item
+        best_per_news: Dict[int, Dict] = {}  # news_id -> {"news_item_id": ..., "tag_id": ..., "relevance_score": ...}
+        skipped_news_ids = 0
+        skipped_tag_ids = 0
+        skipped_empty = 0
+
+        for item in data:
+            if not isinstance(item, dict):
+                continue
+            news_id = item.get("id")
+            if news_id not in title_ids:
+                skipped_news_ids += 1
+                continue
+
+            # 收集此条新闻的所有候选 tag
+            candidates = []
+
+            if "tag_id" in item:
+                # 新格式(扁平): {"id": 1, "tag_id": 1, "score": 0.9}
+                candidates.append({"tag_id": item["tag_id"], "score": item.get("score", 0.5)})
+            elif "tags" in item:
+                # 旧格式(嵌套): {"id": 1, "tags": [{"tag_id": 1, "score": 0.9}]}
+                matched_tags = item.get("tags", [])
+                if isinstance(matched_tags, list):
+                    if not matched_tags:
+                        skipped_empty += 1
+                        continue
+                    candidates.extend(matched_tags)
+
+            if not candidates:
+                skipped_empty += 1
+                continue
+
+            # 取最高分的有效 tag
+            best_tag_id = None
+            best_score = -1.0
+
+            for tag_match in candidates:
+                if not isinstance(tag_match, dict):
+                    continue
+                tag_id = tag_match.get("tag_id")
+                if tag_id not in tag_id_set:
+                    skipped_tag_ids += 1
+                    continue
+
+                score = tag_match.get("score", 0.5)
+                try:
+                    score = float(score)
+                    score = max(0.0, min(1.0, score))
+                except (ValueError, TypeError):
+                    score = 0.5
+
+                if score > best_score:
+                    best_score = score
+                    best_tag_id = tag_id
+
+            if best_tag_id is not None:
+                # 如果同一条新闻被多次返回,只保留分数更高的
+                existing = best_per_news.get(news_id)
+                if existing is None or best_score > existing["relevance_score"]:
+                    best_per_news[news_id] = {
+                        "news_item_id": news_id,
+                        "tag_id": best_tag_id,
+                        "relevance_score": best_score,
+                    }
+
+        results = list(best_per_news.values())
+
+        if self.debug:
+            ai_returned = len(data)
+            print(f"[AI筛选][DEBUG] --- 分类解析结果 ---")
+            print(f"[AI筛选][DEBUG] AI 返回 {ai_returned} 条, 有效 {len(results)} 条 (每条新闻仅保留最高分 tag)")
+            if skipped_empty > 0:
+                print(f"[AI筛选][DEBUG] 跳过空 tags: {skipped_empty} 条")
+            if skipped_news_ids > 0:
+                print(f"[AI筛选][DEBUG] !! 跳过无效 news_id: {skipped_news_ids} 条")
+            if skipped_tag_ids > 0:
+                print(f"[AI筛选][DEBUG] !! 跳过无效 tag_id: {skipped_tag_ids} 条")
+
+            # 按标签汇总
+            tag_summary: Dict[int, List[str]] = {}
+            for r in results:
+                tid = r["tag_id"]
+                if tid not in tag_summary:
+                    tag_summary[tid] = []
+                tag_summary[tid].append(
+                    f"  [{r['news_item_id']}] {title_map.get(r['news_item_id'], '?')[:40]} (score={r['relevance_score']:.2f})"
+                )
+
+            for tid, items in tag_summary.items():
+                tname = tag_name_map.get(tid, f"tag_{tid}")
+                print(f"[AI筛选][DEBUG] 标签「{tname}」匹配 {len(items)} 条:")
+                for line in items:
+                    print(line)
+
+        return results
+
+    def _extract_json(self, response: str) -> Optional[str]:
+        """从 AI 响应中提取 JSON 字符串"""
+        if not response or not response.strip():
+            return None
+
+        json_str = response.strip()
+
+        if "```json" in json_str:
+            parts = json_str.split("```json", 1)
+            if len(parts) > 1:
+                code_block = parts[1]
+                end_idx = code_block.find("```")
+                json_str = code_block[:end_idx] if end_idx != -1 else code_block
+        elif "```" in json_str:
+            parts = json_str.split("```", 2)
+            if len(parts) >= 2:
+                json_str = parts[1]
+
+        json_str = json_str.strip()
+        return json_str if json_str else None
+
+    def _print_formatted_json(self, response: str) -> None:
+        """格式化打印 AI 响应中的 JSON,便于 debug 阅读"""
+        if not response:
+            print("(空响应)")
+            return
+
+        json_str = self._extract_json(response)
+        if json_str:
+            try:
+                data = json.loads(json_str)
+                if isinstance(data, list):
+                    # 数组:每个元素压成一行
+                    lines = [json.dumps(item, ensure_ascii=False) for item in data]
+                    print("[\n  " + ",\n  ".join(lines) + "\n]")
+                else:
+                    print(json.dumps(data, ensure_ascii=False, indent=2))
+                return
+            except json.JSONDecodeError:
+                pass
+
+        # JSON 解析失败,直接打印原始响应
+        print(response)
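
The "one tag per news item" rule in `_parse_classify_response` can be tried in isolation. The following is only a minimal sketch under stated assumptions: `pick_best_tags` and its parameter names are hypothetical and not part of the commit, and the sketch omits the debug counters. It accepts both the flat and the nested response shapes, clamps scores to 0..1, falls back to 0.5 for unparseable scores, and keeps the highest-scoring valid tag for each news id.

```python
from typing import Any, Dict, List, Set


def pick_best_tags(data: List[Any], valid_news: Set[int], valid_tags: Set[int]) -> List[Dict]:
    """Hypothetical helper: keep at most one tag (the best valid one) per news id."""
    best: Dict[int, Dict] = {}
    for item in data:
        if not isinstance(item, dict) or item.get("id") not in valid_news:
            continue
        # Flat shape carries tag_id directly; nested shape carries a "tags" list.
        candidates = [item] if "tag_id" in item else item.get("tags") or []
        if not isinstance(candidates, list):
            continue
        for cand in candidates:
            if not isinstance(cand, dict) or cand.get("tag_id") not in valid_tags:
                continue
            try:
                # Clamp the score into the 0..1 range.
                score = max(0.0, min(1.0, float(cand.get("score", 0.5))))
            except (TypeError, ValueError):
                score = 0.5  # unparseable score: use the neutral default
            current = best.get(item["id"])
            if current is None or score > current["relevance_score"]:
                best[item["id"]] = {
                    "news_item_id": item["id"],
                    "tag_id": cand["tag_id"],
                    "relevance_score": score,
                }
    return list(best.values())


sample = [
    {"id": 1, "tag_id": 10, "score": 0.4},
    {"id": 1, "tags": [{"tag_id": 11, "score": 0.9}, {"tag_id": 99, "score": 1.0}]},
    {"id": 2, "tag_id": 10, "score": "bad"},
    {"id": 3, "tag_id": 10, "score": 0.8},
]
picked = pick_best_tags(sample, valid_news={1, 2}, valid_tags={10, 11})
print(picked)
```

In the sample, news id 1 is returned twice by the model; the nested entry wins because its valid tag scores higher, while tag 99 and news id 3 are ignored as unknown ids.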

+ 6 - 5
trendradar/ai/formatter.py

@@ -36,13 +36,14 @@ def _format_list_content(text: str) -> str:
     result = re.sub(r'(\d+)\.([^ \d])', r'\1. \2', text)
 
     # 2. 强制换行:匹配 "数字.",且前面不是换行符
-    result = re.sub(r'(?<=[^\n])\s+(\d+\.)', r'\n\1', result)
+    #    (?!\d) 排除版本号/小数(如 2.0、3.5),避免将其误判为列表序号
+    result = re.sub(r'(?<=[^\n])\s+(\d+\.)(?!\d)', r'\n\1', result)
     
     # 3. 处理 "1.**粗体**" 这种情况(虽然 Prompt 要求不输出 Markdown,但防御性处理)
     result = re.sub(r'(?<=[^\n])(\d+\.\*\*)', r'\n\1', result)
 
-    # 4. 处理中文标点后的换行
-    result = re.sub(r'([::;,。;,])\s*(\d+\.)', r'\1\n\2', result)
+    # 4. 处理中文标点后的换行(排除版本号/小数)
+    result = re.sub(r'([::;,。;,])\s*(\d+\.)(?!\d)', r'\1\n\2', result)
 
     # 5. 处理 "XX方面:"、"XX领域:" 等子标题换行
     # 只有在中文标点(句号、逗号、分号等)后才触发换行,避免破坏 "1. XX领域:" 格式
@@ -57,9 +58,9 @@ def _format_list_content(text: str) -> str:
     # 用 (?=[^\s::]) 避免正则回溯将冒号误判为"内容"而拆开 【tag】:
     result = re.sub(r'(【[^】]+】[::]?)[ \t]*(?=[^\s::])', r'\1\n', result)
 
-    # 7. 在列表项之间增加视觉空行
+    # 7. 在列表项之间增加视觉空行(排除版本号/小数)
     # 排除 【标签】 行(以】结尾)和子标题行(以冒号结尾)之后的情况,避免标题与首项之间出现空行
-    result = re.sub(r'(?<![::】])\n(\d+\.)', r'\n\n\1', result)
+    result = re.sub(r'(?<![::】])\n(\d+\.)(?!\d)', r'\n\n\1', result)
 
     return result
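
The effect of the added `(?!\d)` lookahead can be checked with a small experiment. This is an illustrative sketch only: it isolates the step 2 rule from `_format_list_content`, and the sample sentence and the helper name `break_before_markers` are assumptions made for the demonstration.

```python
import re

# Step 2 rule before and after this change (patterns copied from the diff).
OLD_RULE = re.compile(r'(?<=[^\n])\s+(\d+\.)')
NEW_RULE = re.compile(r'(?<=[^\n])\s+(\d+\.)(?!\d)')


def break_before_markers(pattern: "re.Pattern[str]", text: str) -> str:
    """Put a newline before every 'N.' list marker matched by the pattern."""
    return pattern.sub(r'\n\1', text)


sample = "Released version 2.0 today. 1. Faster 2. Smaller"
old_output = break_before_markers(OLD_RULE, sample)
new_output = break_before_markers(NEW_RULE, sample)

print(repr(old_output))  # the old rule also breaks the line before "2.0"
print(repr(new_output))  # the new rule leaves "2.0" in place
```

The lookahead rejects a marker whose dot is directly followed by another digit, so version numbers and decimals stay inline while genuine list markers still start a new line.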
 

+ 21 - 6
trendradar/ai/translator.py

@@ -6,10 +6,9 @@ AI 翻译器模块
 基于 LiteLLM 统一接口,支持 100+ AI 提供商
 """
 
-import json
 from dataclasses import dataclass, field
 from pathlib import Path
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List
 
 from trendradar.ai.client import AIClient
 
@@ -30,6 +29,9 @@ class BatchTranslationResult:
     success_count: int = 0
     fail_count: int = 0
     total_count: int = 0
+    prompt: str = ""                # debug: 发送给 AI 的完整 prompt
+    raw_response: str = ""          # debug: AI 原始响应
+    parsed_count: int = 0           # debug: AI 响应解析出的条目数
 
 
 class AITranslator:
@@ -49,6 +51,7 @@ class AITranslator:
         # 翻译配置
         self.enabled = translation_config.get("ENABLED", False)
         self.target_language = translation_config.get("LANGUAGE", "English")
+        self.scope = translation_config.get("SCOPE", {"HOTLIST": True, "RSS": True, "STANDALONE": True})
 
         # 创建 AI 客户端(基于 LiteLLM)
         self.client = AIClient(ai_config)
@@ -196,11 +199,21 @@ class AITranslator:
             user_prompt = user_prompt.replace("{target_language}", self.target_language)
             user_prompt = user_prompt.replace("{content}", batch_content)
 
+            # 记录 debug 信息(包含完整的 system + user prompt)
+            if self.system_prompt:
+                batch_result.prompt = f"[system]\n{self.system_prompt}\n\n[user]\n{user_prompt}"
+            else:
+                batch_result.prompt = user_prompt
+
             # 调用 AI API
             response = self._call_ai(user_prompt)
 
+            # 记录 AI 原始响应
+            batch_result.raw_response = response
+
             # 解析批量翻译结果
-            translated_texts = self._parse_batch_response(response, len(non_empty_texts))
+            translated_texts, raw_parsed_count = self._parse_batch_response(response, len(non_empty_texts))
+            batch_result.parsed_count = raw_parsed_count
 
             # 填充结果
             for idx, translated in zip(non_empty_indices, translated_texts):
@@ -223,7 +236,7 @@ class AITranslator:
             lines.append(f"[{i}] {text}")
         return "\n".join(lines)
 
-    def _parse_batch_response(self, response: str, expected_count: int) -> List[str]:
+    def _parse_batch_response(self, response: str, expected_count: int) -> tuple:
         """
         解析批量翻译响应
 
@@ -232,7 +245,7 @@ class AITranslator:
             expected_count: 期望的翻译数量
 
         Returns:
-            List[str]: 翻译结果列表
+            tuple: (翻译结果列表, AI 原始解析出的条目数)
         """
         results = []
         lines = response.strip().split("\n")
@@ -266,6 +279,7 @@ class AITranslator:
         # 按索引排序并提取文本
         results.sort(key=lambda x: x[0])
         translated = [text for _, text in results]
+        raw_parsed_count = len(translated)
 
         # 如果解析结果数量不匹配,尝试简单按行分割
         if len(translated) != expected_count:
@@ -278,12 +292,13 @@ class AITranslator:
                     translated.append(stripped[bracket_end + 1:].strip())
                 elif stripped:
                     translated.append(stripped)
+            raw_parsed_count = len(translated)
 
         # 确保返回正确数量
         while len(translated) < expected_count:
             translated.append("")
 
-        return translated[:expected_count]
+        return translated[:expected_count], raw_parsed_count
 
     def _call_ai(self, user_prompt: str) -> str:
         """调用 AI API(使用 LiteLLM)"""

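The new return contract of `_parse_batch_response` (padded translations plus the number of entries actually parsed) can be illustrated with a simplified stand-in. This is a sketch under assumptions, not the project's parser: `parse_numbered_lines` is a hypothetical name, it handles only the `[n] text` line format, and it omits the line-splitting fallback that the real method applies when counts do not match.

```python
from typing import List, Tuple


def parse_numbered_lines(response: str, expected_count: int) -> Tuple[List[str], int]:
    """Hypothetical simplified parser for '[n] text' lines.

    Returns the translations padded or trimmed to expected_count, together
    with the number of entries that were actually parsed from the response.
    """
    indexed = []
    for line in response.strip().split("\n"):
        stripped = line.strip()
        if stripped.startswith("[") and "]" in stripped:
            end = stripped.index("]")
            label = stripped[1:end]
            if label.isdecimal():
                indexed.append((int(label), stripped[end + 1:].strip()))

    # Restore the order given by the numeric labels.
    indexed.sort(key=lambda pair: pair[0])
    translated = [text for _, text in indexed]
    parsed_count = len(translated)

    # Pad with empty strings so callers can zip against their inputs safely.
    while len(translated) < expected_count:
        translated.append("")
    return translated[:expected_count], parsed_count


texts, parsed = parse_numbered_lines("[2] World\n[1] Hello", expected_count=3)
print(texts, parsed)
```

Returning the raw parsed count alongside the padded list lets a caller notice that the model answered fewer items than requested, which the padded list alone would hide.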
+ 637 - 3
trendradar/context.py

@@ -16,6 +16,8 @@ from trendradar.utils.time import (
     format_time_filename,
     get_current_time_display,
     convert_time_for_display,
+    format_iso_time_friendly,
+    is_within_days,
 )
 from trendradar.core import (
     load_frequency_words,
@@ -26,7 +28,6 @@ from trendradar.core import (
     Scheduler,
 )
 from trendradar.report import (
-    clean_title,
     prepare_report_data,
     generate_html_report,
     render_html_content,
@@ -38,6 +39,7 @@ from trendradar.notification import (
     NotificationDispatcher,
 )
 from trendradar.ai import AITranslator
+from trendradar.ai.filter import AIFilter, AIFilterResult
 from trendradar.storage import get_storage_manager
 
 
@@ -132,6 +134,26 @@ class AppContext:
         default_order = ["hotlist", "rss", "new_items", "standalone", "ai_analysis"]
         return self.config.get("DISPLAY", {}).get("REGION_ORDER", default_order)
 
+    @property
+    def filter_method(self) -> str:
+        """Return the filtering strategy: keyword | ai"""
+        return self.config.get("FILTER", {}).get("METHOD", "keyword")
+
+    @property
+    def ai_priority_sort_enabled(self) -> bool:
+        """Tag-ordering switch for AI mode (decoupled from keyword mode's sort_by_position_first)"""
+        return self.config.get("FILTER", {}).get("PRIORITY_SORT_ENABLED", False)
+
+    @property
+    def ai_filter_config(self) -> Dict:
+        """Return the AI filter configuration"""
+        return self.config.get("AI_FILTER", {})
+
+    @property
+    def ai_filter_enabled(self) -> bool:
+        """Whether AI filtering is enabled (derived from filter.method)"""
+        return self.filter_method == "ai"
+
     # === 时间操作 ===
 
     def get_time(self) -> datetime:
@@ -269,6 +291,7 @@ class AppContext:
         new_titles: Optional[Dict] = None,
         id_to_name: Optional[Dict] = None,
         mode: str = "daily",
+        frequency_file: Optional[str] = None,
     ) -> Dict:
         """准备报告数据"""
         return prepare_report_data(
@@ -279,7 +302,7 @@ class AppContext:
             mode=mode,
             rank_threshold=self.rank_threshold,
             matches_word_groups_func=self.matches_word_groups,
-            load_frequency_words_func=self.load_frequency_words,
+            load_frequency_words_func=lambda: self.load_frequency_words(frequency_file),
             show_new_section=self.show_new_section,
         )
 
@@ -296,6 +319,7 @@ class AppContext:
         rss_new_items: Optional[List[Dict]] = None,
         ai_analysis: Optional[Any] = None,
         standalone_data: Optional[Dict] = None,
+        frequency_file: Optional[str] = None,
     ) -> str:
         """生成HTML报告"""
         return generate_html_report(
@@ -312,7 +336,7 @@ class AppContext:
             time_filename=self.format_time(),
             render_html_func=lambda *args, **kwargs: self.render_html(*args, rss_items=rss_items, rss_new_items=rss_new_items, ai_analysis=ai_analysis, standalone_data=standalone_data, **kwargs),
             matches_word_groups_func=self.matches_word_groups,
-            load_frequency_words_func=self.load_frequency_words,
+            load_frequency_words_func=lambda: self.load_frequency_words(frequency_file),
         )
 
     def render_html(
@@ -468,9 +492,619 @@ class AppContext:
                 timeline_data=timeline_data,
                 storage_backend=self.get_storage_manager(),
                 get_time_func=self.get_time,
+                fallback_report_mode=self.config.get("REPORT_MODE", "current"),
             )
         return self._scheduler
 
+    # === AI 智能筛选 ===
+
+    @staticmethod
+    def _with_ordered_priorities(tags: List[Dict], start_priority: int = 1) -> List[Dict]:
+        """Assign priorities following the current list order (a smaller value means a higher priority)"""
+        normalized: List[Dict] = []
+        priority = start_priority
+        for tag_data in tags:
+            if not isinstance(tag_data, dict):
+                continue
+            tag_name = str(tag_data.get("tag", "")).strip()
+            if not tag_name:
+                continue
+            item = dict(tag_data)
+            item["tag"] = tag_name
+            item["priority"] = priority
+            normalized.append(item)
+            priority += 1
+        return normalized
+
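+    def run_ai_filter(self, interests_file: Optional[str] = None) -> Optional[AIFilterResult]:
+        """
+        Run the full AI smart-filtering pipeline.
+
+        Args:
+            interests_file: name of an interests file under config/custom/ai/; None means the default config/ai_interests.txt
+
+        1. Read the interests file and compute its hash
+        2. Compare against the prompt_hash stored in the database to decide whether tags must be re-extracted
+        3. Collect the news awaiting classification (deduplicated)
+        4. Call the AI classifier in groups of batch_size
+        5. Save the results
+        6. Query the active results and return them grouped by tag
+
+        Returns:
+            AIFilterResult, or None (when disabled or on error)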
+        """
+        if not self.ai_filter_enabled:
+            return None
+
+        filter_config = self.ai_filter_config
+        ai_config = self.config.get("AI", {})
+        debug = self.config.get("DEBUG", False)
+
+        # 创建 AIFilter 实例
+        ai_filter = AIFilter(ai_config, filter_config, self.get_time, debug)
+
+        # 确定实际使用的兴趣文件名
+        # None = 使用默认 config/ai_interests.txt,指定文件名 = config/custom/ai/{name}
+        configured_interests = interests_file or filter_config.get("INTERESTS_FILE")
+        effective_interests_file = configured_interests or "ai_interests.txt"
+
+        if debug:
+            print(f"[AI筛选][DEBUG] === 配置信息 ===")
+            print(f"[AI筛选][DEBUG] 存储后端: {self.get_storage_manager().backend_name}")
+            print(f"[AI筛选][DEBUG] batch_size={filter_config.get('BATCH_SIZE', 200)}, "
+                  f"batch_interval={filter_config.get('BATCH_INTERVAL', 5)}")
+            print(f"[AI筛选][DEBUG] interests_file={effective_interests_file}")
+            print(f"[AI筛选][DEBUG] prompt_file={filter_config.get('PROMPT_FILE', 'prompt.txt')}")
+            print(f"[AI筛选][DEBUG] extract_prompt_file={filter_config.get('EXTRACT_PROMPT_FILE', 'extract_prompt.txt')}")
+
+        # 1. 读取兴趣描述
+        # 传 configured_interests(可能为 None)给 load_interests_content,
+        # 让它区分"默认文件(config/ai_interests.txt)"和"自定义文件(config/custom/ai/)"
+        interests_content = ai_filter.load_interests_content(configured_interests)
+        if not interests_content:
+            return AIFilterResult(success=False, error="兴趣描述文件为空或不存在")
+
+        current_hash = ai_filter.compute_interests_hash(interests_content, effective_interests_file)
+        storage = self.get_storage_manager()
+
+        if debug:
+            print(f"[AI筛选][DEBUG] 兴趣描述 hash: {current_hash}")
+            print(f"[AI筛选][DEBUG] 兴趣描述内容 ({len(interests_content)} 字符):\n{interests_content}")
+
+        # Enter batch mode (remote backends defer uploads until every write has finished)
+        storage.begin_batch()
+
+        # 2. Check whether the interests prompt has changed
+        stored_hash = storage.get_latest_prompt_hash(interests_file=effective_interests_file)
+
+        if debug:
+            print(f"[AI筛选][DEBUG] 数据库存储 hash: {stored_hash}")
+            print(f"[AI筛选][DEBUG] hash 对比: stored={stored_hash} vs current={current_hash} → {'匹配' if stored_hash == current_hash else '不匹配'}")
+
+        if stored_hash != current_hash:
+            new_version = storage.get_latest_ai_filter_tag_version() + 1
+            threshold = filter_config.get("RECLASSIFY_THRESHOLD", 0.6)
+
+            if stored_hash is None:
+                # 首次运行,直接提取并保存全部标签
+                print(f"[AI筛选] 首次运行 ({effective_interests_file}),提取标签...")
+                tags_data = ai_filter.extract_tags(interests_content)
+                if not tags_data:
+                    storage.end_batch()
+                    return AIFilterResult(success=False, error="标签提取失败")
+                tags_data = self._with_ordered_priorities(tags_data, start_priority=1)
+                saved_count = storage.save_ai_filter_tags(tags_data, new_version, current_hash, interests_file=effective_interests_file)
+                print(f"[AI筛选] 已保存 {saved_count} 个标签 (版本 {new_version})")
+            else:
+                # 兴趣描述已变更,让 AI 对比旧标签和新兴趣,给出更新方案
+                old_tags = storage.get_active_ai_filter_tags(interests_file=effective_interests_file)
+                update_result = ai_filter.update_tags(old_tags, interests_content)
+
+                if update_result is None:
+                    # AI 标签更新失败,回退到重新提取全部标签
+                    print(f"[AI筛选] AI 标签更新失败,回退到重新提取")
+                    tags_data = ai_filter.extract_tags(interests_content)
+                    if not tags_data:
+                        storage.end_batch()
+                        return AIFilterResult(success=False, error="标签提取失败")
+                    tags_data = self._with_ordered_priorities(tags_data, start_priority=1)
+                    deprecated_count = storage.deprecate_all_ai_filter_tags(interests_file=effective_interests_file)
+                    storage.clear_analyzed_news(interests_file=effective_interests_file)
+                    saved_count = storage.save_ai_filter_tags(tags_data, new_version, current_hash, interests_file=effective_interests_file)
+                    print(f"[AI筛选] 废弃 {deprecated_count} 个旧标签, 保存 {saved_count} 个新标签 (版本 {new_version})")
+                else:
+                    change_ratio = update_result["change_ratio"]
+                    keep_tags = update_result["keep"]
+                    add_tags = update_result["add"]
+                    remove_tags = update_result["remove"]
+
+                    if debug:
+                        print(f"[AI筛选][DEBUG] AI 标签更新: keep={len(keep_tags)}, add={len(add_tags)}, remove={len(remove_tags)}, change_ratio={change_ratio:.2f}, threshold={threshold:.2f}")
+
+                    if change_ratio >= threshold:
+                        # 全量重分类:废弃所有旧标签,用 extract_tags 重新提取
+                        print(f"[AI筛选] 兴趣文件变更: {effective_interests_file} (AI change_ratio={change_ratio:.2f} >= threshold={threshold:.2f} → 全量重分类)")
+                        tags_data = ai_filter.extract_tags(interests_content)
+                        if not tags_data:
+                            storage.end_batch()
+                            return AIFilterResult(success=False, error="标签提取失败")
+                        tags_data = self._with_ordered_priorities(tags_data, start_priority=1)
+                        deprecated_count = storage.deprecate_all_ai_filter_tags(interests_file=effective_interests_file)
+                        storage.clear_analyzed_news(interests_file=effective_interests_file)
+                        saved_count = storage.save_ai_filter_tags(tags_data, new_version, current_hash, interests_file=effective_interests_file)
+                        print(f"[AI筛选] 废弃 {deprecated_count} 个旧标签, 保存 {saved_count} 个新标签 (版本 {new_version})")
+                    else:
+                        # 增量更新:按 AI 指示操作
+                        print(f"[AI筛选] 兴趣文件变更: {effective_interests_file} (AI change_ratio={change_ratio:.2f} < threshold={threshold:.2f} → 增量更新)")
+                        print(f"[AI筛选]   保留 {len(keep_tags)} 个标签, 新增 {len(add_tags)} 个, 废弃 {len(remove_tags)} 个")
+
+                        # 废弃 AI 标记移除的标签
+                        if remove_tags:
+                            remove_set = set(remove_tags)
+                            removed_ids = [t["id"] for t in old_tags if t["tag"] in remove_set]
+                            if removed_ids:
+                                storage.deprecate_specific_ai_filter_tags(removed_ids)
+                                if debug:
+                                    print(f"[AI筛选][DEBUG] 废弃标签 IDs: {removed_ids}")
+
+                        # 更新保留标签的描述
+                        keep_with_priority = []
+                        if keep_tags:
+                            storage.update_ai_filter_tag_descriptions(keep_tags, interests_file=effective_interests_file)
+                            keep_with_priority = self._with_ordered_priorities(keep_tags, start_priority=1)
+                            storage.update_ai_filter_tag_priorities(keep_with_priority, interests_file=effective_interests_file)
+
+                        # 保存新增标签
+                        if add_tags:
+                            add_start = keep_with_priority[-1]["priority"] + 1 if keep_with_priority else 1
+                            add_with_priority = self._with_ordered_priorities(add_tags, start_priority=add_start)
+                            saved_count = storage.save_ai_filter_tags(add_with_priority, new_version, current_hash, interests_file=effective_interests_file)
+                            if debug:
+                                print(f"[AI筛选][DEBUG] 新增保存 {saved_count} 个标签")
+
+                        # 更新保留标签的 hash(标记为已处理)
+                        storage.update_ai_filter_tags_hash(effective_interests_file, current_hash)
+
+                        # 增量更新:清除不匹配新闻的分析记录,让它们有机会被新标签集重新分析
+                        if add_tags:
+                            cleared = storage.clear_unmatched_analyzed_news(interests_file=effective_interests_file)
+                            if cleared > 0:
+                                print(f"[AI筛选]   清除 {cleared} 条不匹配记录,将在新标签下重新分析")
+
+        # 3. 获取当前 active 标签
+        active_tags = storage.get_active_ai_filter_tags(interests_file=effective_interests_file)
+        if debug:
+            print(f"[AI筛选][DEBUG] 从数据库获取 active 标签: {len(active_tags)} 个")
+            for t in active_tags:
+                print(f"[AI筛选][DEBUG]   id={t['id']} tag={t['tag']} priority={t.get('priority', 9999)} version={t.get('version')} hash={t.get('prompt_hash', '')[:8]}...")
+
+        if not active_tags:
+            storage.end_batch()
+            return AIFilterResult(success=False, error="没有可用的标签")
+
+        print(f"[AI筛选] 使用 {len(active_tags)} 个标签")
+
+        # 4. 收集待分类新闻
+        # 热榜
+        all_news = storage.get_all_news_ids()
+        analyzed_hotlist = storage.get_analyzed_news_ids("hotlist", interests_file=effective_interests_file)
+        pending_news = [n for n in all_news if n["id"] not in analyzed_hotlist]
+
+        # RSS(先做新鲜度过滤,再去除已分类的)
+        pending_rss = []
+        freshness_filtered_rss = 0
+        if self.rss_enabled:
+            all_rss = storage.get_all_rss_ids()
+
+            # 应用新鲜度过滤(与推送阶段一致)
+            rss_config = self.rss_config
+            freshness_config = rss_config.get("FRESHNESS_FILTER", {})
+            freshness_enabled = freshness_config.get("ENABLED", True)
+            default_max_age_days = freshness_config.get("MAX_AGE_DAYS", 3)
+            timezone = self.config.get("TIMEZONE", DEFAULT_TIMEZONE)
+
+            # 构建 feed_id -> max_age_days 的映射
+            feed_max_age_map = {}
+            for feed_cfg in self.rss_feeds:
+                feed_id = feed_cfg.get("id", "")
+                max_age = feed_cfg.get("max_age_days")
+                if max_age is not None:
+                    try:
+                        feed_max_age_map[feed_id] = int(max_age)
+                    except (ValueError, TypeError):
+                        pass
+
+            fresh_rss = []
+            for n in all_rss:
+                published_at = n.get("published_at", "")
+                feed_id = n.get("source_id", "")
+                max_days = feed_max_age_map.get(feed_id, default_max_age_days)
+                if freshness_enabled and max_days > 0 and published_at:
+                    if not is_within_days(published_at, max_days, timezone):
+                        freshness_filtered_rss += 1
+                        continue
+                fresh_rss.append(n)
+
+            analyzed_rss = storage.get_analyzed_news_ids("rss", interests_file=effective_interests_file)
+            pending_rss = [n for n in fresh_rss if n["id"] not in analyzed_rss]
+
+        # 始终打印总量/已分析/待分析 的详细数据
+        hotlist_total = len(all_news)
+        hotlist_skipped = len(analyzed_hotlist)
+        hotlist_pending = len(pending_news)
+        print(f"[AI筛选] 热榜: 总计 {hotlist_total} 条, 已分析跳过 {hotlist_skipped} 条, 本次发送AI分析 {hotlist_pending} 条")
+        if self.rss_enabled:
+            rss_total = len(all_rss)
+            rss_skipped = len(analyzed_rss)
+            rss_pending = len(pending_rss)
+            freshness_info = f", 新鲜度过滤 {freshness_filtered_rss} 条" if freshness_filtered_rss > 0 else ""
+            print(f"[AI筛选] RSS: 总计 {rss_total} 条{freshness_info}, 已分析跳过 {rss_skipped} 条, 本次发送AI分析 {rss_pending} 条")
+
+        total_pending = len(pending_news) + len(pending_rss)
+        if total_pending == 0:
+            print("[AI筛选] 没有新增新闻需要分类")
+
+        # 5. Classify in batches
+        batch_size = filter_config.get("BATCH_SIZE", 200)
+        batch_interval = filter_config.get("BATCH_INTERVAL", 5)
+        total_results = []
+        batch_count = 0  # global batch counter shared by hotlist and RSS
+
+        # Process hotlist items
+        for i in range(0, len(pending_news), batch_size):
+            if batch_count > 0 and batch_interval > 0:
+                import time
+                print(f"[AI筛选] 批次间隔等待 {batch_interval} 秒...")
+                time.sleep(batch_interval)
+            batch = pending_news[i:i + batch_size]
+            titles_for_ai = [
+                {"id": n["id"], "title": n["title"], "source": n.get("source_name", "")}
+                for n in batch
+            ]
+            batch_results = ai_filter.classify_batch(titles_for_ai, active_tags, interests_content)
+            for r in batch_results:
+                r["source_type"] = "hotlist"
+            total_results.extend(batch_results)
+            batch_count += 1
+            print(f"[AI筛选] 热榜批次 {i // batch_size + 1}: {len(batch)} 条 → {len(batch_results)} 条匹配")
+
+        # Process RSS items
+        for i in range(0, len(pending_rss), batch_size):
+            if batch_count > 0 and batch_interval > 0:
+                import time
+                print(f"[AI筛选] 批次间隔等待 {batch_interval} 秒...")
+                time.sleep(batch_interval)
+            batch = pending_rss[i:i + batch_size]
+            titles_for_ai = [
+                {"id": n["id"], "title": n["title"], "source": n.get("source_name", "")}
+                for n in batch
+            ]
+            batch_results = ai_filter.classify_batch(titles_for_ai, active_tags, interests_content)
+            for r in batch_results:
+                r["source_type"] = "rss"
+            total_results.extend(batch_results)
+            batch_count += 1
+            print(f"[AI筛选] RSS 批次 {i // batch_size + 1}: {len(batch)} 条 → {len(batch_results)} 条匹配")
+
+        # 6. Save the results
+        if total_results:
+            saved = storage.save_ai_filter_results(total_results)
+            print(f"[AI筛选] 保存 {saved} 条分类结果")
+            if debug and saved != len(total_results):
+                print(f"[AI筛选][DEBUG] !! 保存数量不一致: 期望 {len(total_results)}, 实际 {saved}(可能有重复记录被跳过)")
+
+        # 6.5 Record every analyzed item (matched and unmatched) for deduplication
+        matched_hotlist_ids = {r["news_item_id"] for r in total_results if r.get("source_type") == "hotlist"}
+        matched_rss_ids = {r["news_item_id"] for r in total_results if r.get("source_type") == "rss"}
+
+        if pending_news:
+            hotlist_ids = [n["id"] for n in pending_news]
+            storage.save_analyzed_news(
+                hotlist_ids, "hotlist", effective_interests_file,
+                current_hash, matched_hotlist_ids
+            )
+
+        if pending_rss:
+            rss_ids = [n["id"] for n in pending_rss]
+            storage.save_analyzed_news(
+                rss_ids, "rss", effective_interests_file,
+                current_hash, matched_rss_ids
+            )
+
+        if pending_news or pending_rss:
+            total_analyzed = len(pending_news) + len(pending_rss)
+            total_matched = len(matched_hotlist_ids) + len(matched_rss_ids)
+            print(f"[AI筛选] 已记录 {total_analyzed} 条新闻分析状态 (匹配 {total_matched}, 不匹配 {total_analyzed - total_matched})")
+
+        # 7. End batch mode (upload the database to remote storage in one go)
+        storage.end_batch()
+
+        # 8. Query and assemble the return value
+        all_results = storage.get_active_ai_filter_results(interests_file=effective_interests_file)
+
+        if debug:
+            print("[AI筛选][DEBUG] === 最终汇总 ===")
+            print(f"[AI筛选][DEBUG] 数据库 active 分类结果: {len(all_results)} 条")
+            # Count results per tag
+            tag_counts: dict = {}
+            for r in all_results:
+                tag_name = r.get("tag", "?")
+                src_type = r.get("source_type", "?")
+                key = f"{tag_name}({src_type})"
+                tag_counts[key] = tag_counts.get(key, 0) + 1
+            for key, count in sorted(tag_counts.items()):
+                print(f"[AI筛选][DEBUG]   {key}: {count} 条")
+
+        return self._build_filter_result(all_results, active_tags, total_pending)
+
+    def _build_filter_result(
+        self,
+        raw_results: List[Dict],
+        tags: List[Dict],
+        total_processed: int,
+    ) -> AIFilterResult:
+        """Assemble database query results into an AIFilterResult"""
+        priority_sort_enabled = self.ai_priority_sort_enabled
+        tag_priority_map = {}
+        for idx, t in enumerate(tags, start=1):
+            tag_name = str(t.get("tag", "")).strip() if isinstance(t, dict) else ""
+            if not tag_name:
+                continue
+            try:
+                tag_priority_map[tag_name] = int(t.get("priority", idx))
+            except (TypeError, ValueError):
+                tag_priority_map[tag_name] = idx
+
+        # Group by tag
+        tag_groups: Dict[str, Dict] = {}
+        seen_titles: Dict[str, set] = {}  # dedupe titles within each tag
+
+        for r in raw_results:
+            tag_name = r["tag"]
+            if tag_name not in tag_groups:
+                raw_priority = r.get("tag_priority", tag_priority_map.get(tag_name, 9999))
+                try:
+                    tag_position = int(raw_priority)
+                except (TypeError, ValueError):
+                    tag_position = 9999
+                tag_groups[tag_name] = {
+                    "tag": tag_name,
+                    "description": r.get("tag_description", ""),
+                    "position": tag_position,
+                    "count": 0,
+                    "items": [],
+                }
+                seen_titles[tag_name] = set()
+
+            title = r["title"]
+            if title in seen_titles[tag_name]:
+                continue
+            seen_titles[tag_name].add(title)
+
+            tag_groups[tag_name]["items"].append({
+                "title": title,
+                "source_id": r.get("source_id", ""),
+                "source_name": r.get("source_name", ""),
+                "url": r.get("url", ""),
+                "mobile_url": r.get("mobile_url", ""),
+                "rank": r.get("rank", 0),
+                "ranks": r.get("ranks", []),
+                "first_time": r.get("first_time", ""),
+                "last_time": r.get("last_time", ""),
+                "count": r.get("count", 1),
+                "relevance_score": r.get("relevance_score", 0),
+                "source_type": r.get("source_type", "hotlist"),
+            })
+            tag_groups[tag_name]["count"] += 1
+
+        # Sort according to config: position first, or count first
+        if priority_sort_enabled:
+            sorted_tags = sorted(
+                tag_groups.values(),
+                key=lambda x: (x.get("position", 9999), -x["count"], x["tag"]),
+            )
+        else:
+            sorted_tags = sorted(
+                tag_groups.values(),
+                key=lambda x: (-x["count"], x.get("position", 9999), x["tag"]),
+            )
+
+        total_matched = sum(t["count"] for t in sorted_tags)
+
+        return AIFilterResult(
+            tags=sorted_tags,
+            total_matched=total_matched,
+            total_processed=total_processed,
+            success=True,
+        )
+
+    def convert_ai_filter_to_report_data(
+        self,
+        ai_filter_result: AIFilterResult,
+        mode: str = "daily",
+        new_titles: Optional[Dict] = None,
+        rss_new_urls: Optional[set] = None,
+    ) -> tuple:
+        """
+        Convert the AI filter result into the same data structure as keyword matching
+
+        Each tag in AIFilterResult.tags corresponds to one "word" (keyword group).
+        Items in tag.items with source_type="hotlist" go into the hotlist stats;
+        items with source_type="rss" go into the rss_items stats.
+
+        Args:
+            ai_filter_result: the AI filter result
+            mode: report mode ("daily" | "current" | "incremental")
+            new_titles: new hotlist titles {source_id: {title: data}}, used for is_new detection
+            rss_new_urls: set of URLs of new RSS items, used for is_new detection
+
+        Returns:
+            (hotlist_stats, rss_stats):
+            - hotlist_stats: same format as the output of count_word_frequency()
+            - rss_stats: same format as rss_items
+        """
+        hotlist_stats = []
+        rss_stats = []
+        max_news = self.config.get("MAX_NEWS_PER_KEYWORD", 0)
+        min_score = self.ai_filter_config.get("MIN_SCORE", 0)
+
+        # current mode: find the latest time and keep only hotlist items still on the list
+        # Mirrors the filtering logic of count_word_frequency(mode="current")
+        latest_time = None
+        if mode == "current":
+            for tag_data in ai_filter_result.tags:
+                for item in tag_data.get("items", []):
+                    if item.get("source_type", "hotlist") == "hotlist":
+                        last_time = item.get("last_time", "")
+                        if last_time and (latest_time is None or last_time > latest_time):
+                            latest_time = last_time
+            if latest_time:
+                print(f"[AI筛选] current 模式:最新时间 {latest_time},过滤已下榜新闻")
+
+        # RSS freshness filter config (same as the push stage)
+        rss_config = self.rss_config
+        freshness_config = rss_config.get("FRESHNESS_FILTER", {})
+        freshness_enabled = freshness_config.get("ENABLED", True)
+        default_max_age_days = freshness_config.get("MAX_AGE_DAYS", 3)
+        timezone = self.config.get("TIMEZONE", DEFAULT_TIMEZONE)
+
+        feed_max_age_map = {}
+        for feed_cfg in self.rss_feeds:
+            feed_id = feed_cfg.get("id", "")
+            max_age = feed_cfg.get("max_age_days")
+            if max_age is not None:
+                try:
+                    feed_max_age_map[feed_id] = int(max_age)
+                except (ValueError, TypeError):
+                    pass
+
+        filtered_count = 0
+        for tag_data in ai_filter_result.tags:
+            tag_name = tag_data.get("tag", "")
+            items = tag_data.get("items", [])
+            if not items:
+                continue
+
+            hotlist_titles = []
+            rss_titles = []
+
+            for item in items:
+                source_type = item.get("source_type", "hotlist")
+
+                # current mode: skip hotlist items that have dropped off the list
+                if mode == "current" and latest_time and source_type == "hotlist":
+                    if item.get("last_time", "") != latest_time:
+                        filtered_count += 1
+                        continue
+
+                # Score threshold: skip items whose relevance is below min_score
+                if min_score > 0:
+                    score = item.get("relevance_score", 0)
+                    if score < min_score:
+                        continue
+
+                # Build the time display
+                first_time = item.get("first_time", "")
+                last_time = item.get("last_time", "")
+                if source_type == "rss":
+                    # RSS freshness filter: skip articles older than max_age_days
+                    if freshness_enabled and first_time:
+                        feed_id = item.get("source_id", "")
+                        max_days = feed_max_age_map.get(feed_id, default_max_age_days)
+                        if max_days > 0 and not is_within_days(first_time, max_days, timezone):
+                            continue
+
+                    # RSS item: first_time is in ISO format; show it in a friendly format
+                    if first_time:
+                        time_display = format_iso_time_friendly(first_time, timezone, include_date=True)
+                    else:
+                        time_display = ""
+                else:
+                    # Hotlist item: use the [HH:MM ~ HH:MM] format (same as keyword mode)
+                    if first_time and last_time and first_time != last_time:
+                        first_display = convert_time_for_display(first_time)
+                        last_display = convert_time_for_display(last_time)
+                        time_display = f"[{first_display} ~ {last_display}]"
+                    elif first_time:
+                        time_display = convert_time_for_display(first_time)
+                    else:
+                        time_display = ""
+
+                # Compute is_new (mirrors keyword mode, core/analyzer.py:335-342)
+                if source_type == "rss":
+                    is_new = False
+                    if rss_new_urls:
+                        item_url = item.get("url", "")
+                        is_new = item_url in rss_new_urls if item_url else False
+                else:
+                    is_new = False
+                    if new_titles:
+                        item_source_id = item.get("source_id", "")
+                        item_title = item.get("title", "")
+                        if item_source_id in new_titles:
+                            is_new = item_title in new_titles[item_source_id]
+
+                title_entry = {
+                    "title": item.get("title", ""),
+                    "source_name": item.get("source_name", ""),
+                    "url": item.get("url", ""),
+                    "mobile_url": item.get("mobile_url", ""),
+                    "ranks": item.get("ranks", []),
+                    "rank_threshold": self.rank_threshold,
+                    "count": item.get("count", 1),
+                    "is_new": is_new,
+                    "time_display": time_display,
+                    "matched_keyword": tag_name,
+                }
+
+                if source_type == "rss":
+                    rss_titles.append(title_entry)
+                else:
+                    hotlist_titles.append(title_entry)
+
+            if hotlist_titles:
+                if max_news > 0:
+                    hotlist_titles = hotlist_titles[:max_news]
+                hotlist_stats.append({
+                    "word": tag_name,
+                    "count": len(hotlist_titles),
+                    "position": tag_data.get("position", 9999),
+                    "titles": hotlist_titles,
+                })
+
+            if rss_titles:
+                if max_news > 0:
+                    rss_titles = rss_titles[:max_news]
+                rss_stats.append({
+                    "word": tag_name,
+                    "count": len(rss_titles),
+                    "position": tag_data.get("position", 9999),
+                    "titles": rss_titles,
+                })
+
+        if mode == "current" and filtered_count > 0:
+            total_kept = sum(s["count"] for s in hotlist_stats)
+            print(f"[AI筛选] current 模式:过滤 {filtered_count} 条已下榜新闻,保留 {total_kept} 条当前在榜")
+
+        if min_score > 0:
+            hotlist_kept = sum(s["count"] for s in hotlist_stats)
+            rss_kept = sum(s["count"] for s in rss_stats)
+            total_kept = hotlist_kept + rss_kept
+            parts = [f"热榜 {hotlist_kept} 条"]
+            if rss_kept > 0:
+                parts.append(f"RSS {rss_kept} 条")
+            print(f"[AI筛选] 分数过滤:min_score={min_score},保留 {total_kept} 条 score≥{min_score} ({', '.join(parts)})")
+
+        priority_sort_enabled = self.ai_priority_sort_enabled
+        if priority_sort_enabled:
+            hotlist_stats.sort(key=lambda x: (x.get("position", 9999), -x["count"], x["word"]))
+            rss_stats.sort(key=lambda x: (x.get("position", 9999), -x["count"], x["word"]))
+        else:
+            hotlist_stats.sort(key=lambda x: (-x["count"], x.get("position", 9999), x["word"]))
+            rss_stats.sort(key=lambda x: (-x["count"], x.get("position", 9999), x["word"]))
+
+        return hotlist_stats, rss_stats
+
     # === Resource cleanup ===
 
     def cleanup(self):

+ 7 - 4
trendradar/core/frequency.py

@@ -111,7 +111,7 @@ def load_frequency_words(
     - @number: maximum number of items shown for this word group
 
     Args:
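-        frequency_file: path to the frequency-words config file; defaults to the FREQUENCY_WORDS_PATH environment variable, else config/frequency_words.txt
+        frequency_file: path to the frequency-words config file; defaults to the FREQUENCY_WORDS_PATH environment variable, else config/frequency_words.txt; a bare file name is looked up in config/custom/keyword/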
 
     Returns:
         (word groups, per-group filter words, global filter words)
@@ -126,7 +126,12 @@ def load_frequency_words(
 
     frequency_path = Path(frequency_file)
     if not frequency_path.exists():
-        raise FileNotFoundError(f"频率词文件 {frequency_file} 不存在")
+        # Treat it as a bare file name and look for it under config/custom/keyword/
+        custom_path = Path("config/custom/keyword") / frequency_file
+        if custom_path.exists():
+            frequency_path = custom_path
+        else:
+            raise FileNotFoundError(f"频率词文件 {frequency_file} 不存在")
 
     with open(frequency_path, "r", encoding="utf-8") as f:
         content = f.read()
@@ -179,7 +184,6 @@ def load_frequency_words(
 
         group_required_words = []
         group_normal_words = []
-        group_filter_words = []
         group_max_count = 0  # no limit by default
 
         for word in words:
@@ -196,7 +200,6 @@ def load_frequency_words(
                 filter_word = word[1:]
                 parsed = _parse_word(filter_word)
                 filter_words.append(parsed)
-                group_filter_words.append(parsed)
             elif word.startswith("+"):
                 # Required word (regex syntax supported)
                 req_word = word[1:]

+ 53 - 1
trendradar/core/loader.py

@@ -15,7 +15,7 @@ from .config import parse_multi_account_config, validate_paired_configs
 from trendradar.utils.time import DEFAULT_TIMEZONE
 
 
-def _get_env_bool(key: str, default: bool = False) -> Optional[bool]:
+def _get_env_bool(key: str) -> Optional[bool]:
     """Read a boolean from an environment variable; return None if it is unset"""
     value = os.environ.get(key, "").strip().lower()
     if not value:
@@ -306,10 +306,56 @@ def _load_ai_translation_config(config_data: Dict) -> Dict:
 
     enabled_env = _get_env_bool("AI_TRANSLATION_ENABLED")
 
+    scope = trans_config.get("scope", {})
+
     return {
         "ENABLED": enabled_env if enabled_env is not None else trans_config.get("enabled", False),
         "LANGUAGE": _get_env_str("AI_TRANSLATION_LANGUAGE") or trans_config.get("language", "English"),
         "PROMPT_FILE": trans_config.get("prompt_file", "ai_translation_prompt.txt"),
+        "SCOPE": {
+            "HOTLIST": scope.get("hotlist", True),
+            "RSS": scope.get("rss", True),
+            "STANDALONE": scope.get("standalone", True),
+        },
+    }
+
+
+def _load_ai_filter_config(config_data: Dict) -> Dict:
+    """Load the AI smart-filter config (whether it is enabled is controlled by filter.method)"""
+    ai_filter = config_data.get("ai_filter", {})
+
+    return {
+        "BATCH_SIZE": ai_filter.get("batch_size", 200),
+        "BATCH_INTERVAL": ai_filter.get("batch_interval", 5),
+        "INTERESTS_FILE": ai_filter.get("interests_file"),  # None = use the default config/ai_interests.txt
+        "PROMPT_FILE": ai_filter.get("prompt_file", "prompt.txt"),
+        "EXTRACT_PROMPT_FILE": ai_filter.get("extract_prompt_file", "extract_prompt.txt"),
+        "UPDATE_TAGS_PROMPT_FILE": ai_filter.get("update_tags_prompt_file", "update_tags_prompt.txt"),
+        "RECLASSIFY_THRESHOLD": ai_filter.get("reclassify_threshold", 0.6),
+        "MIN_SCORE": float(ai_filter.get("min_score", 0)),
+    }
+
+
+def _load_filter_config(config_data: Dict) -> Dict:
+    """Load the filter strategy config"""
+    filter_cfg = config_data.get("filter", {})
+
+    # Environment variable compatibility: AI_FILTER_ENABLED=true → method=ai
+    env_ai_filter = _get_env_bool("AI_FILTER_ENABLED")
+
+    method = filter_cfg.get("method", "keyword")
+    if env_ai_filter is True:
+        method = "ai"
+
+    # Backward compatibility: ai_filter.enabled=true with no explicit filter.method selects "ai"
+    if method == "keyword" and not filter_cfg.get("method"):
+        ai_filter = config_data.get("ai_filter", {})
+        if ai_filter.get("enabled", False):
+            method = "ai"
+
+    return {
+        "METHOD": method,  # "keyword" | "ai"
+        "PRIORITY_SORT_ENABLED": filter_cfg.get("priority_sort_enabled", False),  # toggles tag-priority sorting in AI mode
     }
 
 
@@ -544,6 +590,12 @@ def load_config(config_path: Optional[str] = None) -> Dict[str, Any]:
     # AI translation config
     config["AI_TRANSLATION"] = _load_ai_translation_config(config_data)
 
+    # AI smart-filter config
+    config["AI_FILTER"] = _load_ai_filter_config(config_data)
+
+    # Filter strategy config
+    config["FILTER"] = _load_filter_config(config_data)
+
     # Push content display config
     config["DISPLAY"] = _load_display_config(config_data)
 

+ 13 - 2
trendradar/core/scheduler.py

@@ -27,6 +27,9 @@ class ResolvedSchedule:
     ai_mode: str
     once_analyze: bool
     once_push: bool
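+    frequency_file: Optional[str] = None  # path to the frequency-words file; None = use the default
+    filter_method: Optional[str] = None   # filter strategy: "keyword"|"ai"; None = use the global config
+    interests_file: Optional[str] = None  # interests file for AI filtering; None = use the default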
 
 
 class Scheduler:
@@ -48,6 +51,7 @@ class Scheduler:
         timeline_data: Dict[str, Any],
         storage_backend: Any,
         get_time_func: Callable[[], datetime],
+        fallback_report_mode: str = "current",
     ):
         """
         Initialize the scheduler
@@ -57,11 +61,13 @@ class Scheduler:
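             timeline_data: the full contents of timeline.yaml
             storage_backend: storage backend (holds the dedup records for once)
             get_time_func: function returning the current time (should use the configured timezone)
+            fallback_report_mode: report_mode to fall back on when scheduling is disabled (taken from report.mode in config.yaml)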
         """
         self.schedule_config = schedule_config
         self.storage = storage_backend
         self.get_time = get_time_func
         self.enabled = schedule_config.get("enabled", True)
+        self.fallback_report_mode = fallback_report_mode
 
         # Load and build the final timeline
         self.timeline = self._build_timeline(schedule_config, timeline_data)
@@ -101,7 +107,7 @@ class Scheduler:
             ResolvedSchedule describing the actions to run now
         """
         if not self.enabled:
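-            # When scheduling is disabled, return the default full-featured config
+            # When scheduling is disabled, return the default full-featured config; report_mode falls back to report.mode from config.yaml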
             return ResolvedSchedule(
                 period_key=None,
                 period_name=None,
@@ -109,7 +115,7 @@ class Scheduler:
                 collect=True,
                 analyze=True,
                 push=True,
-                report_mode="current",
+                report_mode=self.fallback_report_mode,
                 ai_mode="follow_report",
                 once_analyze=False,
                 once_push=False,
@@ -162,6 +168,9 @@ class Scheduler:
             ai_mode=self._resolve_ai_mode(merged),
             once_analyze=merged.get("once", {}).get("analyze", False),
             once_push=merged.get("once", {}).get("push", False),
+            frequency_file=merged.get("frequency_file"),
+            filter_method=merged.get("filter_method"),
+            interests_file=merged.get("interests_file"),
         )
 
         # Print a summary of the actions
@@ -173,6 +182,8 @@ class Scheduler:
         if resolved.push:
             actions.append(f"推送(模式:{resolved.report_mode})")
         print(f"[调度] 行为: {', '.join(actions) if actions else '无'}")
+        if resolved.frequency_file:
+            print(f"[调度] 频率词文件: {resolved.frequency_file}")
 
         return resolved
 

+ 2 - 3
trendradar/crawler/rss/fetcher.py

@@ -8,12 +8,11 @@ RSS fetcher
 import time
 import random
 from dataclasses import dataclass
-from datetime import datetime
-from typing import List, Dict, Optional, Tuple, Callable
+from typing import List, Dict, Optional, Tuple
 
 import requests
 
-from .parser import RSSParser, ParsedRSSItem
+from .parser import RSSParser
 from trendradar.storage.base import RSSItem, RSSData
 from trendradar.utils.time import get_configured_time, is_within_days, DEFAULT_TIMEZONE
 

+ 82 - 36
trendradar/notification/dispatcher.py

@@ -78,6 +78,8 @@ class NotificationDispatcher:
         report_data: Dict,
         rss_items: Optional[List[Dict]] = None,
         rss_new_items: Optional[List[Dict]] = None,
+        standalone_data: Optional[Dict] = None,
+        display_regions: Optional[Dict] = None,
     ) -> tuple:
         """
         Translate the push content
@@ -86,54 +88,74 @@ class NotificationDispatcher:
             report_data: report data
             rss_items: RSS stats items
             rss_new_items: new RSS items
+            standalone_data: standalone display area data
+            display_regions: region display config (regions that are not displayed skip translation)
 
         Returns:
-            tuple: (translated report_data, rss_items, rss_new_items)
+            tuple: (translated report_data, rss_items, rss_new_items, standalone_data)
         """
         if not self.translator or not self.translator.enabled:
-            return report_data, rss_items, rss_new_items
+            return report_data, rss_items, rss_new_items, standalone_data
 
         import copy
         print(f"[翻译] 开始翻译内容到 {self.translator.target_language}...")
 
+        scope = self.translator.scope
+        display_regions = display_regions or {}
+
         # Deep-copy to avoid mutating the original data
         report_data = copy.deepcopy(report_data)
         rss_items = copy.deepcopy(rss_items) if rss_items else None
         rss_new_items = copy.deepcopy(rss_new_items) if rss_new_items else None
+        standalone_data = copy.deepcopy(standalone_data) if standalone_data else None
 
         # Collect every title that needs translation
         titles_to_translate = []
         title_locations = []  # records where each title lives so results can be written back
 
-        # 1. Hotlist titles
-        for stat_idx, stat in enumerate(report_data.get("stats", [])):
-            for title_idx, title_data in enumerate(stat.get("titles", [])):
-                titles_to_translate.append(title_data.get("title", ""))
-                title_locations.append(("stats", stat_idx, title_idx))
+        # 1. Hotlist titles (scope enabled and region displayed)
+        if scope.get("HOTLIST", True) and display_regions.get("HOTLIST", True):
+            for stat_idx, stat in enumerate(report_data.get("stats", [])):
+                for title_idx, title_data in enumerate(stat.get("titles", [])):
+                    titles_to_translate.append(title_data.get("title", ""))
+                    title_locations.append(("stats", stat_idx, title_idx))
 
-        # 2. New trending titles
-        for source_idx, source in enumerate(report_data.get("new_titles", [])):
-            for title_idx, title_data in enumerate(source.get("titles", [])):
-                titles_to_translate.append(title_data.get("title", ""))
-                title_locations.append(("new_titles", source_idx, title_idx))
+            # 2. New trending titles
+            for source_idx, source in enumerate(report_data.get("new_titles", [])):
+                for title_idx, title_data in enumerate(source.get("titles", [])):
+                    titles_to_translate.append(title_data.get("title", ""))
+                    title_locations.append(("new_titles", source_idx, title_idx))
 
         # 3. RSS stats titles (same structure as stats: [{word, count, titles: [{title, ...}]}])
-        if rss_items:
+        if rss_items and scope.get("RSS", True) and display_regions.get("RSS", True):
             for stat_idx, stat in enumerate(rss_items):
                 for title_idx, title_data in enumerate(stat.get("titles", [])):
                     titles_to_translate.append(title_data.get("title", ""))
                     title_locations.append(("rss_items", stat_idx, title_idx))
 
         # 4. New RSS titles (same structure as stats)
-        if rss_new_items:
+        if rss_new_items and scope.get("RSS", True) and display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True):
             for stat_idx, stat in enumerate(rss_new_items):
                 for title_idx, title_data in enumerate(stat.get("titles", [])):
                     titles_to_translate.append(title_data.get("title", ""))
                     title_locations.append(("rss_new_items", stat_idx, title_idx))
 
+        # 5. Standalone display area: hotlist platforms
+        if standalone_data and scope.get("STANDALONE", True) and display_regions.get("STANDALONE", False):
+            for plat_idx, platform in enumerate(standalone_data.get("platforms", [])):
+                for item_idx, item in enumerate(platform.get("items", [])):
+                    titles_to_translate.append(item.get("title", ""))
+                    title_locations.append(("standalone_platforms", plat_idx, item_idx))
+
+            # 6. Standalone display area: RSS feeds
+            for feed_idx, feed in enumerate(standalone_data.get("rss_feeds", [])):
+                for item_idx, item in enumerate(feed.get("items", [])):
+                    titles_to_translate.append(item.get("title", ""))
+                    title_locations.append(("standalone_rss", feed_idx, item_idx))
+
         if not titles_to_translate:
             print("[翻译] 没有需要翻译的内容")
-            return report_data, rss_items, rss_new_items
+            return report_data, rss_items, rss_new_items, standalone_data
 
         print(f"[翻译] 共 {len(titles_to_translate)} 条标题待翻译")
 
@@ -142,10 +164,36 @@ class NotificationDispatcher:
 
         if result.success_count == 0:
             print(f"[翻译] 翻译失败: {result.results[0].error if result.results else '未知错误'}")
-            return report_data, rss_items, rss_new_items
+            return report_data, rss_items, rss_new_items, standalone_data
 
         print(f"[翻译] 翻译完成: {result.success_count}/{result.total_count} 成功")
 
+        # Debug mode: print the full prompt, the raw AI response, and an item-by-item comparison
+        if self.config.get("DEBUG", False):
+            if result.prompt:
+                print("[翻译][DEBUG] === 发送给 AI 的 Prompt ===")
+                print(result.prompt)
+                print("[翻译][DEBUG] === Prompt 结束 ===")
+            if result.raw_response:
+                print("[翻译][DEBUG] === AI 原始响应 ===")
+                print(result.raw_response)
+                print("[翻译][DEBUG] === 响应结束 ===")
+            # Warn when the line counts do not match
+            expected = len(titles_to_translate)
+            if result.parsed_count != expected:
+                print(f"[翻译][DEBUG] ⚠️ 行数不匹配:期望 {expected} 条,AI 返回 {result.parsed_count} 条")
+            # Item-by-item comparison
+            unchanged_count = 0
+            for i, res in enumerate(result.results):
+                if not res.success and res.error:
+                    print(f"[翻译][DEBUG] [{i+1}] !! 失败: {res.error}")
+                elif res.original_text == res.translated_text:
+                    unchanged_count += 1
+                else:
+                    print(f"[翻译][DEBUG] [{i+1}] {res.original_text} => {res.translated_text}")
+            if unchanged_count > 0:
+                print(f"[翻译][DEBUG] (另有 {unchanged_count} 条未变化,已省略)")
+
         # Write the translations back
         for i, (loc_type, idx1, idx2) in enumerate(title_locations):
             if i < len(result.results) and result.results[i].success:
@@ -158,8 +206,12 @@ class NotificationDispatcher:
                     rss_items[idx1]["titles"][idx2]["title"] = translated
                 elif loc_type == "rss_new_items" and rss_new_items:
                     rss_new_items[idx1]["titles"][idx2]["title"] = translated
+                elif loc_type == "standalone_platforms" and standalone_data:
+                    standalone_data["platforms"][idx1]["items"][idx2]["title"] = translated
+                elif loc_type == "standalone_rss" and standalone_data:
+                    standalone_data["rss_feeds"][idx1]["items"][idx2]["title"] = translated
 
-        return report_data, rss_items, rss_new_items
+        return report_data, rss_items, rss_new_items, standalone_data
 
     def dispatch_all(
         self,
@@ -179,7 +231,7 @@ class NotificationDispatcher:
 
         Args:
             report_data: report data (produced by prepare_report_data)
-            report_type: report type (e.g. "当日汇总", "实时增量")
+            report_type: report type (e.g. "全天汇总", "当前榜单", "增量分析")
             update_info: version update info (optional)
             proxy_url: proxy URL (optional)
             mode: report mode (daily/current/incremental)
@@ -197,9 +249,9 @@ class NotificationDispatcher:
         # Read the region display config
         display_regions = self.config.get("DISPLAY", {}).get("REGIONS", {})
 
-        # Run translation (if enabled)
-        report_data, rss_items, rss_new_items = self._translate_content(
-            report_data, rss_items, rss_new_items
+        # Run translation (if enabled; regions hidden by display_regions are skipped)
+        report_data, rss_items, rss_new_items, standalone_data = self._translate_content(
+            report_data, rss_items, rss_new_items, standalone_data, display_regions
         )
 
         # Feishu
@@ -317,7 +369,6 @@ class NotificationDispatcher:
     ) -> bool:
         """Send to Feishu (multi-account; supports merged hotlist + RSS, AI analysis, and the standalone display region)"""
         display_regions = display_regions or {}
-        # Decide whether to send each section based on its region switch
         if not display_regions.get("HOTLIST", True):
             report_data = {"stats": [], "failed_ids": [], "new_titles": [], "id_to_name": {}}
 
@@ -337,7 +388,7 @@ class NotificationDispatcher:
                 split_content_func=self.split_content_func,
                 get_time_func=self.get_time_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -377,7 +428,7 @@ class NotificationDispatcher:
                 batch_interval=self.config.get("BATCH_SEND_INTERVAL", 1.0),
                 split_content_func=self.split_content_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -418,7 +469,7 @@ class NotificationDispatcher:
                 msg_type=self.config.get("WEWORK_MSG_TYPE", "markdown"),
                 split_content_func=self.split_content_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -449,7 +500,6 @@ class NotificationDispatcher:
         if not telegram_tokens or not telegram_chat_ids:
             return False
 
-        # Validate that tokens and chat IDs are paired
         valid, count = validate_paired_configs(
             {"bot_token": telegram_tokens, "chat_id": telegram_chat_ids},
             "Telegram",
@@ -458,7 +508,6 @@ class NotificationDispatcher:
         if not valid or count == 0:
             return False
 
-        # Limit the number of accounts
         telegram_tokens = limit_accounts(telegram_tokens, self.max_accounts, "Telegram")
         telegram_chat_ids = telegram_chat_ids[: len(telegram_tokens)]
 
@@ -481,7 +530,7 @@ class NotificationDispatcher:
                     batch_interval=self.config.get("BATCH_SEND_INTERVAL", 1.0),
                     split_content_func=self.split_content_func,
                     rss_items=rss_items if display_regions.get("RSS", True) else None,
-                    rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                    rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                     ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                     display_regions=display_regions,
                     standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -515,14 +564,12 @@ class NotificationDispatcher:
         if not ntfy_server_url or not ntfy_topics:
             return False
 
-        # Check that the token and topic counts match (when tokens are configured)
         if ntfy_tokens and len(ntfy_tokens) != len(ntfy_topics):
             print(
                 f"❌ ntfy 配置错误:topic 数量({len(ntfy_topics)})与 token 数量({len(ntfy_tokens)})不一致,跳过 ntfy 推送"
             )
             return False
 
-        # Limit the number of accounts
         ntfy_topics = limit_accounts(ntfy_topics, self.max_accounts, "ntfy")
         if ntfy_tokens:
             ntfy_tokens = ntfy_tokens[: len(ntfy_topics)]
@@ -545,7 +592,7 @@ class NotificationDispatcher:
                     batch_size=3800,
                     split_content_func=self.split_content_func,
                     rss_items=rss_items if display_regions.get("RSS", True) else None,
-                    rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                    rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                     ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                     display_regions=display_regions,
                     standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -587,7 +634,7 @@ class NotificationDispatcher:
                 batch_interval=self.config.get("BATCH_SEND_INTERVAL", 1.0),
                 split_content_func=self.split_content_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -627,7 +674,7 @@ class NotificationDispatcher:
                 batch_interval=self.config.get("BATCH_SEND_INTERVAL", 1.0),
                 split_content_func=self.split_content_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -670,7 +717,7 @@ class NotificationDispatcher:
                 if i < len(templates):
                     template = templates[i]
                 elif len(templates) == 1:
-                    template = templates[0] # share a single template
+                    template = templates[0]
 
             account_label = f"账号{i+1}" if len(urls) > 1 else ""
 
@@ -687,7 +734,7 @@ class NotificationDispatcher:
                 batch_interval=self.config.get("BATCH_SEND_INTERVAL", 1.0),
                 split_content_func=self.split_content_func,
                 rss_items=rss_items if display_regions.get("RSS", True) else None,
-                rss_new_items=rss_new_items if display_regions.get("RSS", True) else None,
+                rss_new_items=rss_new_items if (display_regions.get("RSS", True) and display_regions.get("NEW_ITEMS", True)) else None,
                 ai_analysis=ai_analysis if display_regions.get("AI_ANALYSIS", True) else None,
                 display_regions=display_regions,
                 standalone_data=standalone_data if display_regions.get("STANDALONE", False) else None,
@@ -922,7 +969,6 @@ class NotificationDispatcher:
         channel: str,
     ) -> bool:
         """Send RSS to Markdown-compatible channels (WeCom, Telegram, ntfy, Bark, Slack)"""
-        import requests
 
         content = render_rss_markdown_content(
             rss_items=rss_items,
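
The same region gating is repeated in every sender call in this file: `rss_new_items` is now passed only when both `RSS` and `NEW_ITEMS` are enabled. A minimal sketch of that rule as one helper follows. The function name `gate_regions` is hypothetical and does not appear in the commit; only the keys and defaults are taken from the diff above.

```python
from typing import Any, Dict, Optional


def gate_regions(
    display_regions: Optional[Dict[str, bool]],
    rss_items: Any,
    rss_new_items: Any,
    ai_analysis: Any,
    standalone_data: Any,
) -> Dict[str, Any]:
    """Return each payload, or None when its display region is switched off.

    Hypothetical helper: mirrors the inline conditionals in the diff.
    """
    regions = display_regions or {}
    rss_on = regions.get("RSS", True)
    return {
        "rss_items": rss_items if rss_on else None,
        # New in this commit: new RSS items also require NEW_ITEMS.
        "rss_new_items": rss_new_items
        if (rss_on and regions.get("NEW_ITEMS", True))
        else None,
        "ai_analysis": ai_analysis if regions.get("AI_ANALYSIS", True) else None,
        # STANDALONE defaults to off, unlike the other regions.
        "standalone_data": standalone_data
        if regions.get("STANDALONE", False)
        else None,
    }


gated = gate_regions({"NEW_ITEMS": False}, ["feed"], ["new"], "summary", {"platforms": []})
```

With `NEW_ITEMS` off, `gated["rss_items"]` is kept while `gated["rss_new_items"]` becomes `None`.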

+ 18 - 5
trendradar/notification/formatters.py

@@ -17,20 +17,29 @@ def strip_markdown(text: str) -> str:
     Returns:
         Plain-text content
     """
+    # Convert links [text](url) -> text url (keep the URL)
+    text = re.sub(r'\[([^\]]+)\]\(([^)]+)\)', r'\1 \2', text)
+
+    # Protect URLs first so the later markdown stripping does not mangle underscores and similar characters inside links
+    protected_urls: list[str] = []
+
+    def _protect_url(match: re.Match) -> str:
+        protected_urls.append(match.group(0))
+        return f"@@URLTOKEN{len(protected_urls) - 1}@@"
+
+    text = re.sub(r'https?://[^\s<>\]]+', _protect_url, text)
+
     # Strip bold **text** or __text__
     text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)
-    text = re.sub(r'__(.+?)__', r'\1', text)
+    text = re.sub(r'(?<!\w)__(?!\s)(.+?)(?<!\s)__(?!\w)', r'\1', text)
 
     # Strip italics *text* or _text_
     text = re.sub(r'\*(.+?)\*', r'\1', text)
-    text = re.sub(r'_(.+?)_', r'\1', text)
+    text = re.sub(r'(?<!\w)_(?!\s)(.+?)(?<!\s)_(?!\w)', r'\1', text)
 
     # Strip strikethrough ~~text~~
     text = re.sub(r'~~(.+?)~~', r'\1', text)
 
-    # Convert links [text](url) -> text url (keep the URL)
-    text = re.sub(r'\[([^\]]+)\]\(([^)]+)\)', r'\1 \2', text)
-
     # Strip images ![alt](url) -> alt
     text = re.sub(r'!\[(.+?)\]\(.+?\)', r'\1', text)
 
@@ -53,6 +62,10 @@ def strip_markdown(text: str) -> str:
     # Collapse extra blank lines (runs of three or more newlines become two)
     text = re.sub(r'\n{3,}', '\n\n', text)
 
+    # Restore the URLs protected earlier
+    for idx, url in enumerate(protected_urls):
+        text = text.replace(f"@@URLTOKEN{idx}@@", url)
+
     return text.strip()
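
The effect of the URL protection and the tightened underscore patterns can be checked in isolation. The sketch below copies only the regexes touched by this hunk into a stand-alone function. The name `strip_emphasis_keep_urls` is hypothetical, and the rest of `strip_markdown` is left out.

```python
import re


def strip_emphasis_keep_urls(text: str) -> str:
    """Hypothetical cut-down version of strip_markdown (underscore rules only)."""
    # Convert links [text](url) -> text url
    text = re.sub(r'\[([^\]]+)\]\(([^)]+)\)', r'\1 \2', text)

    # Swap every URL for a placeholder token before touching underscores
    protected: list[str] = []

    def _protect_url(match: re.Match) -> str:
        protected.append(match.group(0))
        return f"@@URLTOKEN{len(protected) - 1}@@"

    text = re.sub(r'https?://[^\s<>\]]+', _protect_url, text)

    # Only strip __bold__ / _italic_ when the markers sit at word boundaries
    text = re.sub(r'(?<!\w)__(?!\s)(.+?)(?<!\s)__(?!\w)', r'\1', text)
    text = re.sub(r'(?<!\w)_(?!\s)(.+?)(?<!\s)_(?!\w)', r'\1', text)

    # Put the original URLs back
    for idx, url in enumerate(protected):
        text = text.replace(f"@@URLTOKEN{idx}@@", url)
    return text


linked = strip_emphasis_keep_urls("see [doc](https://example.com/a_b_c) and _note_")
identifier = strip_emphasis_keep_urls("snake_case_name stays")
```

Here `linked` keeps the underscores in the URL while `_note_` loses its markers, and `identifier` is returned unchanged because its underscores are surrounded by word characters.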
 
 

+ 7 - 8
trendradar/notification/senders.py

@@ -24,7 +24,7 @@ from email.mime.multipart import MIMEMultipart
 from email.mime.text import MIMEText
 from email.utils import formataddr, formatdate, make_msgid
 from pathlib import Path
-from typing import Any, Callable, Dict, List, Optional
+from typing import Any, Callable, Dict, Optional
 from urllib.parse import urlparse
 
 import requests
@@ -168,7 +168,7 @@ def send_to_feishu(
 
         # The Feishu webhook only displays content.text, so all information is merged into text
         payload = {
-            "msg_type": "text",
+            "msg_type": "interactive",
             "content": {
                 "text": batch_content,
             },
@@ -804,11 +804,10 @@ def send_to_ntfy(
 
     # Avoid HTTP header encoding issues
     report_type_en_map = {
-        "当日汇总": "Daily Summary",
-        "当前榜单汇总": "Current Ranking",
-        "增量更新": "Incremental Update",
-        "实时增量": "Realtime Incremental",
-        "实时当前榜单": "Realtime Current Ranking",
+        "全天汇总": "Daily Summary",
+        "当前榜单": "Current Ranking",
+        "增量分析": "Incremental Update",
+        "通知连通性测试": "Notification Test",
     }
     report_type_en = report_type_en_map.get(report_type, "News Report")
 
@@ -1325,7 +1324,7 @@ def send_to_generic_webhook(
     # Get the batched content
     # Use 'wework' as format_type to get generic markdown output
     # Reserve some room for the template wrapper
-    template_overhead = 200 
+    template_overhead = 200
     batches = split_content_func(
         report_data, "wework", update_info, max_bytes=batch_size - template_overhead, mode=mode,
         rss_items=rss_items,

+ 66 - 0
trendradar/storage/ai_filter_schema.sql

@@ -0,0 +1,66 @@
+-- Table definitions for AI smart filtering
+-- Created in the news database, alongside news_items
+
+-- ============================================
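+-- AI filter interest tags
+-- Stores the structured tags that the AI extracts from the user's interest description
+-- Managed by version; when the prompt changes, the old version is marked deprecated
+-- Supports isolation across multiple interest files (interests_file separates each file's tag set)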
+-- ============================================
+CREATE TABLE IF NOT EXISTS ai_filter_tags (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    tag TEXT NOT NULL,                    -- tag name, e.g. "AI/大模型"
+    description TEXT DEFAULT '',          -- tag description, used by the AI as a reference when classifying
+    priority INTEGER NOT NULL DEFAULT 9999, -- tag priority (smaller value = higher priority)
+    status TEXT DEFAULT 'active',        -- active / deprecated
+    deprecated_at TEXT,                   -- deprecation time
+    version INTEGER NOT NULL,            -- version number, +1 whenever the prompt changes
+    prompt_hash TEXT NOT NULL,           -- hash of the interest description file (format: filename:md5)
+    interests_file TEXT NOT NULL DEFAULT 'ai_interests.txt',  -- name of the associated interest file
+    created_at TEXT NOT NULL
+);
+
+-- ============================================
+-- AI filter classification results
+-- One row per news item x tag
+-- References news_items.id or rss_items.id (distinguished by source_type)
+-- ============================================
+CREATE TABLE IF NOT EXISTS ai_filter_results (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    news_item_id INTEGER NOT NULL,       -- references news_items.id or rss_items.id
+    source_type TEXT NOT NULL DEFAULT 'hotlist',  -- hotlist / rss
+    tag_id INTEGER NOT NULL,             -- references ai_filter_tags.id
+    relevance_score REAL DEFAULT 0,      -- relevance, 0.0 to 1.0
+    status TEXT DEFAULT 'active',        -- active / deprecated
+    deprecated_at TEXT,
+    created_at TEXT NOT NULL,
+    UNIQUE(news_item_id, source_type, tag_id)
+);
+
+-- ============================================
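+-- AI filter analyzed-news log
+-- Records every news item the AI has already analyzed (matched or not)
+-- Used for deduplication, so items are not sent to the AI again and tokens are not wasted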
+-- ============================================
+CREATE TABLE IF NOT EXISTS ai_filter_analyzed_news (
+    news_item_id INTEGER NOT NULL,       -- references news_items.id or rss_items.id
+    source_type TEXT NOT NULL DEFAULT 'hotlist',  -- hotlist / rss
+    interests_file TEXT NOT NULL DEFAULT 'ai_interests.txt',  -- associated interest file
+    prompt_hash TEXT NOT NULL,           -- hash of the tag set used for the analysis
+    matched INTEGER NOT NULL DEFAULT 0,  -- matched flag: 0 = no match, 1 = match
+    created_at TEXT NOT NULL,
+    PRIMARY KEY (news_item_id, source_type, interests_file)
+);
+
+-- ============================================
+-- Indexes
+-- ============================================
+CREATE INDEX IF NOT EXISTS idx_ai_filter_tags_status ON ai_filter_tags(status);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_tags_version ON ai_filter_tags(version);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_tags_file ON ai_filter_tags(interests_file, status);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_tags_priority ON ai_filter_tags(interests_file, status, priority);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_results_status ON ai_filter_results(status);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_results_news ON ai_filter_results(news_item_id, source_type);
+CREATE INDEX IF NOT EXISTS idx_ai_filter_results_tag ON ai_filter_results(tag_id);
+CREATE INDEX IF NOT EXISTS idx_analyzed_news_lookup ON ai_filter_analyzed_news(source_type, interests_file);
+CREATE INDEX IF NOT EXISTS idx_analyzed_news_hash ON ai_filter_analyzed_news(interests_file, prompt_hash);
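
To see how the `UNIQUE(news_item_id, source_type, tag_id)` constraint and the `priority` column might be used together, here is a minimal in-memory sketch. The table definitions are trimmed copies of the schema above. The `INSERT OR IGNORE` statement and the join query are assumptions made for illustration, since the real `_impl` methods in `sqlite_mixin.py` are not shown in this part of the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copies of the two tables from ai_filter_schema.sql
conn.executescript("""
CREATE TABLE ai_filter_tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tag TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 9999,
    status TEXT DEFAULT 'active',
    interests_file TEXT NOT NULL DEFAULT 'ai_interests.txt'
);
CREATE TABLE ai_filter_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    news_item_id INTEGER NOT NULL,
    source_type TEXT NOT NULL DEFAULT 'hotlist',
    tag_id INTEGER NOT NULL,
    relevance_score REAL DEFAULT 0,
    status TEXT DEFAULT 'active',
    UNIQUE(news_item_id, source_type, tag_id)
);
""")

conn.executemany(
    "INSERT INTO ai_filter_tags (tag, priority) VALUES (?, ?)",
    [("AI/LLM", 1), ("Chips", 2)],
)

# The second row repeats (101, 'hotlist', 1); OR IGNORE (an assumption)
# lets the UNIQUE constraint drop it silently.
conn.executemany(
    "INSERT OR IGNORE INTO ai_filter_results "
    "(news_item_id, source_type, tag_id, relevance_score) VALUES (?, ?, ?, ?)",
    [(101, "hotlist", 1, 0.9), (101, "hotlist", 1, 0.4), (7, "rss", 2, 0.8)],
)

# Active matches for one interest file, highest-priority tag first
matches = conn.execute(
    """
    SELECT t.tag, r.news_item_id, r.source_type, r.relevance_score
    FROM ai_filter_results AS r
    JOIN ai_filter_tags AS t ON t.id = r.tag_id
    WHERE t.status = 'active' AND r.status = 'active' AND t.interests_file = ?
    ORDER BY t.priority, r.relevance_score DESC
    """,
    ("ai_interests.txt",),
).fetchall()
```

The duplicate row is discarded, so `matches` holds one row per (item, source, tag) combination, ordered by tag priority.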

+ 63 - 3
trendradar/storage/base.py

@@ -7,7 +7,7 @@
 
 from abc import ABC, abstractmethod
 from dataclasses import dataclass, field
-from typing import Dict, List, Optional, Any
+from typing import Dict, List, Optional, Any, Set
 
 
 @dataclass
@@ -372,14 +372,13 @@ class StorageBackend(ABC):
         pass
 
     @abstractmethod
-    def save_html_report(self, html_content: str, filename: str, is_summary: bool = False) -> Optional[str]:
+    def save_html_report(self, html_content: str, filename: str) -> Optional[str]:
         """
         Save the HTML report
 
         Args:
             html_content: HTML content
             filename: file name
-            is_summary: whether this is a summary report
 
         Returns:
             Path of the saved file
@@ -465,6 +464,67 @@ class StorageBackend(ABC):
         """
         return False
 
+    # === AI smart filtering (default implementations; subclasses override them via the mixin) ===
+
+    def begin_batch(self) -> None:
+        """Enter batch mode (the remote backend defers uploads; the local backend does nothing)"""
+        pass
+
+    def end_batch(self) -> None:
+        """Leave batch mode"""
+        pass
+
+    def get_active_ai_filter_tags(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> List[Dict]:
+        return []
+
+    def get_latest_prompt_hash(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> Optional[str]:
+        return None
+
+    def get_latest_ai_filter_tag_version(self, date: Optional[str] = None) -> int:
+        return 0
+
+    def deprecate_all_ai_filter_tags(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def save_ai_filter_tags(self, tags: List[Dict], version: int, prompt_hash: str, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def save_ai_filter_results(self, results: List[Dict], date: Optional[str] = None) -> int:
+        return 0
+
+    def get_active_ai_filter_results(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> List[Dict]:
+        return []
+
+    def deprecate_specific_ai_filter_tags(self, tag_ids: List[int], date: Optional[str] = None) -> int:
+        return 0
+
+    def update_ai_filter_tags_hash(self, interests_file: str, new_hash: str, date: Optional[str] = None) -> int:
+        return 0
+
+    def update_ai_filter_tag_descriptions(self, tag_updates: List[Dict], date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def update_ai_filter_tag_priorities(self, tag_priorities: List[Dict], date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def save_analyzed_news(self, news_ids: List[str], source_type: str, interests_file: str, prompt_hash: str, matched_ids: Set[str], date: Optional[str] = None) -> int:
+        return 0
+
+    def get_analyzed_news_ids(self, source_type: str = "hotlist", date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> Set[str]:
+        return set()
+
+    def clear_analyzed_news(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def clear_unmatched_analyzed_news(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        return 0
+
+    def get_all_news_ids(self, date: Optional[str] = None) -> List[Dict]:
+        return []
+
+    def get_all_rss_ids(self, date: Optional[str] = None) -> List[Dict]:
+        return []
+
 
 def convert_crawl_results_to_news_data(
     results: Dict[str, Dict],

+ 57 - 3
trendradar/storage/local.py

@@ -13,7 +13,7 @@ from datetime import datetime, timedelta
 from pathlib import Path
 from typing import Dict, List, Optional
 
-from trendradar.storage.base import StorageBackend, NewsItem, NewsData, RSSItem, RSSData
+from trendradar.storage.base import StorageBackend, NewsData, RSSItem, RSSData
 from trendradar.storage.sqlite_mixin import SQLiteStorageMixin
 from trendradar.utils.time import (
     DEFAULT_TIMEZONE,
@@ -227,6 +227,61 @@ class LocalStorageBackend(SQLiteStorageMixin, StorageBackend):
             return None
         return self._get_latest_rss_data_impl(date)
 
+    # ========================================
+    # AI smart filtering
+    # ========================================
+
+    def get_active_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_active_tags_impl(date, interests_file)
+
+    def get_latest_prompt_hash(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_latest_prompt_hash_impl(date, interests_file)
+
+    def get_latest_ai_filter_tag_version(self, date=None):
+        return self._get_latest_tag_version_impl(date)
+
+    def deprecate_all_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        return self._deprecate_all_tags_impl(date, interests_file)
+
+    def save_ai_filter_tags(self, tags, version, prompt_hash, date=None, interests_file="ai_interests.txt"):
+        return self._save_tags_impl(date, tags, version, prompt_hash, interests_file)
+
+    def save_ai_filter_results(self, results, date=None):
+        return self._save_filter_results_impl(date, results)
+
+    def get_active_ai_filter_results(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_active_filter_results_impl(date, interests_file)
+
+    def deprecate_specific_ai_filter_tags(self, tag_ids, date=None):
+        return self._deprecate_specific_tags_impl(date, tag_ids)
+
+    def update_ai_filter_tags_hash(self, interests_file, new_hash, date=None):
+        return self._update_tags_hash_impl(date, interests_file, new_hash)
+
+    def update_ai_filter_tag_descriptions(self, tag_updates, date=None, interests_file="ai_interests.txt"):
+        return self._update_tag_descriptions_impl(date, tag_updates, interests_file)
+
+    def update_ai_filter_tag_priorities(self, tag_priorities, date=None, interests_file="ai_interests.txt"):
+        return self._update_tag_priorities_impl(date, tag_priorities, interests_file)
+
+    def save_analyzed_news(self, news_ids, source_type, interests_file, prompt_hash, matched_ids, date=None):
+        return self._save_analyzed_news_impl(date, news_ids, source_type, interests_file, prompt_hash, matched_ids)
+
+    def get_analyzed_news_ids(self, source_type="hotlist", date=None, interests_file="ai_interests.txt"):
+        return self._get_analyzed_news_ids_impl(date, source_type, interests_file)
+
+    def clear_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        return self._clear_analyzed_news_impl(date, interests_file)
+
+    def clear_unmatched_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        return self._clear_unmatched_analyzed_news_impl(date, interests_file)
+
+    def get_all_news_ids(self, date=None):
+        return self._get_all_news_ids_impl(date)
+
+    def get_all_rss_ids(self, date=None):
+        return self._get_all_rss_ids_impl(date)
+
     # ========================================
     # Local-only features: TXT/HTML snapshots
     # ========================================
@@ -289,7 +344,7 @@ class LocalStorageBackend(SQLiteStorageMixin, StorageBackend):
             print(f"[本地存储] 保存 TXT 快照失败: {e}")
             return None
 
-    def save_html_report(self, html_content: str, filename: str, is_summary: bool = False) -> Optional[str]:
+    def save_html_report(self, html_content: str, filename: str) -> Optional[str]:
         """
         Save the HTML report
 
@@ -298,7 +353,6 @@ class LocalStorageBackend(SQLiteStorageMixin, StorageBackend):
         Args:
             html_content: HTML content
             filename: file name
-            is_summary: whether this is a summary report
 
         Returns:
             Path of the saved file

+ 80 - 2
trendradar/storage/manager.py

@@ -234,9 +234,9 @@ class StorageManager:
         """Save a TXT snapshot"""
         return self.get_backend().save_txt_snapshot(data)
 
-    def save_html_report(self, html_content: str, filename: str, is_summary: bool = False) -> Optional[str]:
+    def save_html_report(self, html_content: str, filename: str) -> Optional[str]:
         """Save the HTML report"""
-        return self.get_backend().save_html_report(html_content, filename, is_summary)
+        return self.get_backend().save_html_report(html_content, filename)
 
     def is_first_crawl_today(self, date: Optional[str] = None) -> bool:
         """Check whether this is the first crawl of the day"""
@@ -289,6 +289,84 @@ class StorageManager:
         """Record that an action ran for a time period"""
         return self.get_backend().record_period_execution(date_str, period_key, action)
 
+    # === AI smart-filter storage operations ===
+
+    def begin_batch(self):
+        """Enter batch mode (the remote backend defers uploads)"""
+        self.get_backend().begin_batch()
+
+    def end_batch(self):
+        """Leave batch mode (upload all dirty databases in one pass)"""
+        self.get_backend().end_batch()
+
+    def get_active_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        """Get the active tags for the given interest file"""
+        return self.get_backend().get_active_ai_filter_tags(date, interests_file)
+
+    def get_latest_prompt_hash(self, date=None, interests_file="ai_interests.txt"):
+        """Get the latest prompt_hash for the given interest file"""
+        return self.get_backend().get_latest_prompt_hash(date, interests_file)
+
+    def get_latest_ai_filter_tag_version(self, date=None):
+        """Get the latest tag version number"""
+        return self.get_backend().get_latest_ai_filter_tag_version(date)
+
+    def deprecate_all_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        """Deprecate the active tags and classification results for the given interest file"""
+        return self.get_backend().deprecate_all_ai_filter_tags(date, interests_file)
+
+    def save_ai_filter_tags(self, tags, version, prompt_hash, date=None, interests_file="ai_interests.txt"):
+        """Save newly extracted tags"""
+        return self.get_backend().save_ai_filter_tags(tags, version, prompt_hash, date, interests_file)
+
+    def save_ai_filter_results(self, results, date=None):
+        """Save classification results"""
+        return self.get_backend().save_ai_filter_results(results, date)
+
+    def get_active_ai_filter_results(self, date=None, interests_file="ai_interests.txt"):
+        """Get the active classification results for the given interest file"""
+        return self.get_backend().get_active_ai_filter_results(date, interests_file)
+
+    def deprecate_specific_ai_filter_tags(self, tag_ids, date=None):
+        """Deprecate the tags with the given IDs and their associated classification results"""
+        return self.get_backend().deprecate_specific_ai_filter_tags(tag_ids, date)
+
+    def update_ai_filter_tags_hash(self, interests_file, new_hash, date=None):
+        """Update the prompt_hash of every active tag for the given interest file"""
+        return self.get_backend().update_ai_filter_tags_hash(interests_file, new_hash, date)
+
+    def update_ai_filter_tag_descriptions(self, tag_updates, date=None, interests_file="ai_interests.txt"):
+        """Match by tag name and update the description of active tags"""
+        return self.get_backend().update_ai_filter_tag_descriptions(tag_updates, date, interests_file)
+
+    def update_ai_filter_tag_priorities(self, tag_priorities, date=None, interests_file="ai_interests.txt"):
+        """Match by tag name and update the priority of active tags"""
+        return self.get_backend().update_ai_filter_tag_priorities(tag_priorities, date, interests_file)
+
+    def save_analyzed_news(self, news_ids, source_type, interests_file, prompt_hash, matched_ids, date=None):
+        """Record analyzed news items in bulk (both matched and unmatched)"""
+        return self.get_backend().save_analyzed_news(news_ids, source_type, interests_file, prompt_hash, matched_ids, date)
+
+    def get_analyzed_news_ids(self, source_type="hotlist", date=None, interests_file="ai_interests.txt"):
+        """Get the set of news IDs that have already been analyzed"""
+        return self.get_backend().get_analyzed_news_ids(source_type, date, interests_file)
+
+    def clear_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        """Clear all analyzed-news records for the given interest file"""
+        return self.get_backend().clear_analyzed_news(date, interests_file)
+
+    def clear_unmatched_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        """Clear the analyzed-news records that did not match"""
+        return self.get_backend().clear_unmatched_analyzed_news(date, interests_file)
+
+    def get_all_news_ids(self, date=None):
+        """Get all news IDs and titles"""
+        return self.get_backend().get_all_news_ids(date)
+
+    def get_all_rss_ids(self, date=None):
+        """Get all RSS IDs and titles"""
+        return self.get_backend().get_all_rss_ids(date)
+
 
 
 def get_storage_manager(

+ 108 - 2
trendradar/storage/remote.py

@@ -28,7 +28,7 @@ except ImportError:
     BotoConfig = None
     ClientError = Exception
 
-from trendradar.storage.base import StorageBackend, NewsItem, NewsData, RSSItem, RSSData
+from trendradar.storage.base import StorageBackend, NewsData, RSSItem, RSSData
 from trendradar.storage.sqlite_mixin import SQLiteStorageMixin
 from trendradar.utils.time import (
     DEFAULT_TIMEZONE,
@@ -119,6 +119,10 @@ class RemoteStorageBackend(SQLiteStorageMixin, StorageBackend):
         self._downloaded_files: List[Path] = []
         self._db_connections: Dict[str, sqlite3.Connection] = {}
 
+        # Batch mode: defer uploads to avoid uploading the same file repeatedly
+        self._batch_mode = False
+        self._batch_dirty: set = set()  # set of (date, db_type) pairs waiting to be uploaded
+
         print(f"[远程存储] 初始化完成,存储桶: {bucket_name},签名版本: {signature_version}")
 
     @property
@@ -248,10 +252,24 @@ class RemoteStorageBackend(SQLiteStorageMixin, StorageBackend):
             print(f"[远程存储] 下载异常: {e}")
             raise
 
+    def begin_batch(self):
+        """Enter batch mode: defer uploads to avoid uploading the same file repeatedly"""
+        self._batch_mode = True
+        self._batch_dirty.clear()
+
+    def end_batch(self):
+        """Leave batch mode: upload every dirty database in one pass"""
+        self._batch_mode = False
+        for date, db_type in self._batch_dirty:
+            self._upload_sqlite(date, db_type)
+        self._batch_dirty.clear()
+
     def _upload_sqlite(self, date: Optional[str] = None, db_type: str = "news") -> bool:
         """
         Upload the local SQLite file to remote storage
 
+        In batch mode the upload is deferred and triggered once by end_batch().
+
         Args:
             date: date string
             db_type: database type ("news" or "rss")
@@ -259,6 +277,9 @@ class RemoteStorageBackend(SQLiteStorageMixin, StorageBackend):
         Returns:
             Whether the upload succeeded
         """
+        if self._batch_mode:
+            self._batch_dirty.add((date, db_type))
+            return True
         local_path = self._get_local_db_path(date, db_type)
         r2_key = self._get_remote_db_key(date, db_type)
 
@@ -461,6 +482,91 @@ class RemoteStorageBackend(SQLiteStorageMixin, StorageBackend):
         """Get the RSS data from the most recent crawl"""
         return self._get_latest_rss_data_impl(date)
 
+    # ========================================
+    # AI smart-filter storage methods
+    # ========================================
+
+    def get_active_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_active_tags_impl(date, interests_file)
+
+    def get_latest_prompt_hash(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_latest_prompt_hash_impl(date, interests_file)
+
+    def get_latest_ai_filter_tag_version(self, date=None):
+        return self._get_latest_tag_version_impl(date)
+
+    def deprecate_all_ai_filter_tags(self, date=None, interests_file="ai_interests.txt"):
+        count = self._deprecate_all_tags_impl(date, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def save_ai_filter_tags(self, tags, version, prompt_hash, date=None, interests_file="ai_interests.txt"):
+        count = self._save_tags_impl(date, tags, version, prompt_hash, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def save_ai_filter_results(self, results, date=None):
+        count = self._save_filter_results_impl(date, results)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def get_active_ai_filter_results(self, date=None, interests_file="ai_interests.txt"):
+        return self._get_active_filter_results_impl(date, interests_file)
+
+    def deprecate_specific_ai_filter_tags(self, tag_ids, date=None):
+        count = self._deprecate_specific_tags_impl(date, tag_ids)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def update_ai_filter_tags_hash(self, interests_file, new_hash, date=None):
+        count = self._update_tags_hash_impl(date, interests_file, new_hash)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def update_ai_filter_tag_descriptions(self, tag_updates, date=None, interests_file="ai_interests.txt"):
+        count = self._update_tag_descriptions_impl(date, tag_updates, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def update_ai_filter_tag_priorities(self, tag_priorities, date=None, interests_file="ai_interests.txt"):
+        count = self._update_tag_priorities_impl(date, tag_priorities, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def save_analyzed_news(self, news_ids, source_type, interests_file, prompt_hash, matched_ids, date=None):
+        count = self._save_analyzed_news_impl(date, news_ids, source_type, interests_file, prompt_hash, matched_ids)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def get_analyzed_news_ids(self, source_type="hotlist", date=None, interests_file="ai_interests.txt"):
+        return self._get_analyzed_news_ids_impl(date, source_type, interests_file)
+
+    def clear_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        count = self._clear_analyzed_news_impl(date, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def clear_unmatched_analyzed_news(self, date=None, interests_file="ai_interests.txt"):
+        count = self._clear_unmatched_analyzed_news_impl(date, interests_file)
+        if count > 0:
+            self._upload_sqlite(date)
+        return count
+
+    def get_all_news_ids(self, date=None):
+        return self._get_all_news_ids_impl(date)
+
+    def get_all_rss_ids(self, date=None):
+        return self._get_all_rss_ids_impl(date)
+
     # ========================================
     # Remote-specific features: TXT/HTML snapshots (temporary directory)
     # ========================================
@@ -511,7 +617,7 @@ class RemoteStorageBackend(SQLiteStorageMixin, StorageBackend):
             print(f"[远程存储] 保存 TXT 快照失败: {e}")
             return None
 
-    def save_html_report(self, html_content: str, filename: str, is_summary: bool = False) -> Optional[str]:
+    def save_html_report(self, html_content: str, filename: str) -> Optional[str]:
         """Save the HTML report to the temporary directory"""
         if not self.enable_html:
             return None

+ 568 - 0
trendradar/storage/sqlite_mixin.py

@@ -68,6 +68,10 @@ class SQLiteStorageMixin:
             return Path(__file__).parent / "rss_schema.sql"
         return Path(__file__).parent / "schema.sql"
 
+    def _get_ai_filter_schema_path(self) -> Path:
+        """Return the path to the AI filter schema file"""
+        return Path(__file__).parent / "ai_filter_schema.sql"
+
     def _init_tables(self, conn: sqlite3.Connection, db_type: str = "news") -> None:
         """
         Initialize the database table structure from schema.sql
@@ -85,6 +89,13 @@ class SQLiteStorageMixin:
         else:
             raise FileNotFoundError(f"Schema file not found: {schema_path}")
 
+        # The news database additionally loads the AI filter table schema
+        if db_type == "news":
+            ai_filter_schema = self._get_ai_filter_schema_path()
+            if ai_filter_schema.exists():
+                with open(ai_filter_schema, "r", encoding="utf-8") as f:
+                    conn.executescript(f.read())
+
         conn.commit()
 
     # ========================================
@@ -1149,3 +1160,560 @@ class SQLiteStorageMixin:
         except Exception as e:
             print(f"[存储] 获取最新 RSS 数据失败: {e}")
             return None
+
+    # ========================================
+    # AI smart filtering - tag management
+    # ========================================
+
+    def _get_active_tags_impl(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> List[Dict[str, Any]]:
+        """Return the list of active tags for the given interests file"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT id, tag, description, version, prompt_hash, priority
+                FROM ai_filter_tags
+                WHERE status = 'active' AND interests_file = ?
+                ORDER BY priority ASC, id ASC
+            """, (interests_file,))
+
+            return [
+                {
+                    "id": row[0], "tag": row[1], "description": row[2],
+                    "version": row[3], "prompt_hash": row[4], "priority": row[5],
+                }
+                for row in cursor.fetchall()
+            ]
+        except Exception as e:
+            print(f"[AI筛选] 获取标签失败: {e}")
+            return []
+
+    def _get_latest_prompt_hash_impl(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> Optional[str]:
+        """Return the prompt_hash of the latest-version tags for the given interests file"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT prompt_hash FROM ai_filter_tags
+                WHERE status = 'active' AND interests_file = ?
+                ORDER BY version DESC
+                LIMIT 1
+            """, (interests_file,))
+            row = cursor.fetchone()
+            return row[0] if row else None
+        except Exception as e:
+            print(f"[AI筛选] 获取 prompt_hash 失败: {e}")
+            return None
+
+    def _get_latest_tag_version_impl(self, date: Optional[str] = None) -> int:
+        """Return the latest tag version number (0 if no tags exist)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT MAX(version) FROM ai_filter_tags
+            """)
+            row = cursor.fetchone()
+            return row[0] if row and row[0] is not None else 0
+        except Exception as e:
+            print(f"[AI筛选] 获取版本号失败: {e}")
+            return 0
+
+    def _deprecate_all_tags_impl(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> int:
+        """Mark the active tags of the given interests file, and their associated classification results, as deprecated"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+            now_str = self._get_configured_time().strftime("%Y-%m-%d %H:%M:%S")
+
+            # Fetch the ids of the active tags for this interests file
+            cursor.execute(
+                "SELECT id FROM ai_filter_tags WHERE status = 'active' AND interests_file = ?",
+                (interests_file,)
+            )
+            tag_ids = [row[0] for row in cursor.fetchall()]
+
+            if not tag_ids:
+                return 0
+
+            # Deprecate the tags
+            placeholders = ",".join("?" * len(tag_ids))
+            cursor.execute(f"""
+                UPDATE ai_filter_tags
+                SET status = 'deprecated', deprecated_at = ?
+                WHERE id IN ({placeholders})
+            """, [now_str] + tag_ids)
+            tag_count = cursor.rowcount
+
+            # Deprecate the associated classification results
+            placeholders = ",".join("?" * len(tag_ids))
+            cursor.execute(f"""
+                UPDATE ai_filter_results
+                SET status = 'deprecated', deprecated_at = ?
+                WHERE tag_id IN ({placeholders}) AND status = 'active'
+            """, [now_str] + tag_ids)
+
+            conn.commit()
+            print(f"[AI筛选] 已废弃 {tag_count} 个标签及关联分类结果")
+            return tag_count
+        except Exception as e:
+            print(f"[AI筛选] 废弃标签失败: {e}")
+            return 0
+
+    def _save_tags_impl(
+        self, date: Optional[str], tags: List[Dict], version: int, prompt_hash: str,
+        interests_file: str = "ai_interests.txt"
+    ) -> int:
+        """Save newly extracted tags"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+            now_str = self._get_configured_time().strftime("%Y-%m-%d %H:%M:%S")
+
+            count = 0
+            for idx, tag_data in enumerate(tags, start=1):
+                priority = tag_data.get("priority", idx)
+                try:
+                    priority = int(priority)
+                except (TypeError, ValueError):
+                    priority = idx
+                cursor.execute("""
+                    INSERT INTO ai_filter_tags
+                    (tag, description, priority, version, prompt_hash, interests_file, created_at)
+                    VALUES (?, ?, ?, ?, ?, ?, ?)
+                """, (
+                    tag_data["tag"],
+                    tag_data.get("description", ""),
+                    priority,
+                    version,
+                    prompt_hash,
+                    interests_file,
+                    now_str,
+                ))
+                count += 1
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 保存标签失败: {e}")
+            return 0
+
+    def _deprecate_specific_tags_impl(
+        self, date: Optional[str], tag_ids: List[int]
+    ) -> int:
+        """Deprecate the tags with the given IDs and their associated classification results (used for incremental updates)"""
+        if not tag_ids:
+            return 0
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+            now_str = self._get_configured_time().strftime("%Y-%m-%d %H:%M:%S")
+
+            placeholders = ",".join("?" * len(tag_ids))
+
+            cursor.execute(f"""
+                UPDATE ai_filter_tags
+                SET status = 'deprecated', deprecated_at = ?
+                WHERE id IN ({placeholders})
+            """, [now_str] + tag_ids)
+            tag_count = cursor.rowcount
+
+            cursor.execute(f"""
+                UPDATE ai_filter_results
+                SET status = 'deprecated', deprecated_at = ?
+                WHERE tag_id IN ({placeholders}) AND status = 'active'
+            """, [now_str] + tag_ids)
+
+            conn.commit()
+            return tag_count
+        except Exception as e:
+            print(f"[AI筛选] 废弃指定标签失败: {e}")
+            return 0
+
+    def _update_tags_hash_impl(
+        self, date: Optional[str], interests_file: str, new_hash: str
+    ) -> int:
+        """Update the prompt_hash of all active tags for the given interests file (used for incremental updates)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                UPDATE ai_filter_tags
+                SET prompt_hash = ?
+                WHERE interests_file = ? AND status = 'active'
+            """, (new_hash, interests_file))
+            count = cursor.rowcount
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 更新标签 hash 失败: {e}")
+            return 0
+
+    # ========================================
+    # AI smart filtering - classification result management
+    # ========================================
+
+    def _update_tag_descriptions_impl(
+        self, date: Optional[str], tag_updates: List[Dict],
+        interests_file: str = "ai_interests.txt"
+    ) -> int:
+        """Update the description field of active tags, matched by tag name"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            count = 0
+            for t in tag_updates:
+                tag_name = t.get("tag", "")
+                description = t.get("description", "")
+                if not tag_name:
+                    continue
+                cursor.execute("""
+                    UPDATE ai_filter_tags
+                    SET description = ?
+                    WHERE tag = ? AND interests_file = ? AND status = 'active'
+                """, (description, tag_name, interests_file))
+                count += cursor.rowcount
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 更新标签描述失败: {e}")
+            return 0
+
+    def _update_tag_priorities_impl(
+        self, date: Optional[str], tag_priorities: List[Dict],
+        interests_file: str = "ai_interests.txt"
+    ) -> int:
+        """Update the priority field of active tags, matched by tag name"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            count = 0
+            for t in tag_priorities:
+                tag_name = t.get("tag", "")
+                priority = t.get("priority")
+                if not tag_name:
+                    continue
+                try:
+                    priority = int(priority)
+                except (TypeError, ValueError):
+                    continue
+                cursor.execute("""
+                    UPDATE ai_filter_tags
+                    SET priority = ?
+                    WHERE tag = ? AND interests_file = ? AND status = 'active'
+                """, (priority, tag_name, interests_file))
+                count += cursor.rowcount
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 更新标签优先级失败: {e}")
+            return 0
+
+    # ========================================
+    # AI smart filtering - analyzed news tracking
+    # ========================================
+
+    def _save_analyzed_news_impl(
+        self, date: Optional[str], news_ids: List[int], source_type: str,
+        interests_file: str, prompt_hash: str, matched_ids: set
+    ) -> int:
+        """Record analyzed news items in bulk (both matched and unmatched)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+            now_str = self._get_configured_time().strftime("%Y-%m-%d %H:%M:%S")
+
+            count = 0
+            for nid in news_ids:
+                try:
+                    cursor.execute("""
+                        INSERT OR REPLACE INTO ai_filter_analyzed_news
+                        (news_item_id, source_type, interests_file, prompt_hash, matched, created_at)
+                        VALUES (?, ?, ?, ?, ?, ?)
+                    """, (
+                        nid, source_type, interests_file, prompt_hash,
+                        1 if nid in matched_ids else 0,
+                        now_str,
+                    ))
+                    count += 1
+                except Exception:
+                    pass
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 保存已分析记录失败: {e}")
+            return 0
+
+    def _get_analyzed_news_ids_impl(
+        self, date: Optional[str] = None, source_type: str = "hotlist",
+        interests_file: str = "ai_interests.txt"
+    ) -> set:
+        """Return the set of already-analyzed news IDs (used for deduplication)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT news_item_id FROM ai_filter_analyzed_news
+                WHERE source_type = ? AND interests_file = ?
+            """, (source_type, interests_file))
+
+            return {row[0] for row in cursor.fetchall()}
+        except Exception as e:
+            print(f"[AI筛选] 获取已分析ID失败: {e}")
+            return set()
+
+    def _clear_analyzed_news_impl(
+        self, date: Optional[str] = None, interests_file: str = "ai_interests.txt"
+    ) -> int:
+        """Clear all analyzed records for the given interests file (used for full reclassification)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                DELETE FROM ai_filter_analyzed_news
+                WHERE interests_file = ?
+            """, (interests_file,))
+
+            count = cursor.rowcount
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 清除已分析记录失败: {e}")
+            return 0
+
+    def _clear_unmatched_analyzed_news_impl(
+        self, date: Optional[str] = None, interests_file: str = "ai_interests.txt"
+    ) -> int:
+        """Clear unmatched analyzed records so those news items can be re-analyzed against new tags"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                DELETE FROM ai_filter_analyzed_news
+                WHERE interests_file = ? AND matched = 0
+            """, (interests_file,))
+
+            count = cursor.rowcount
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 清除不匹配记录失败: {e}")
+            return 0
+
+    # ========================================
+    # AI smart filtering - classification result management (original)
+    # ========================================
+
+    def _save_filter_results_impl(
+        self, date: Optional[str], results: List[Dict]
+    ) -> int:
+        """Save classification results in bulk"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+            now_str = self._get_configured_time().strftime("%Y-%m-%d %H:%M:%S")
+
+            count = 0
+            for r in results:
+                try:
+                    cursor.execute("""
+                        INSERT INTO ai_filter_results
+                        (news_item_id, source_type, tag_id, relevance_score, created_at)
+                        VALUES (?, ?, ?, ?, ?)
+                    """, (
+                        r["news_item_id"],
+                        r.get("source_type", "hotlist"),
+                        r["tag_id"],
+                        r.get("relevance_score", 0.0),
+                        now_str,
+                    ))
+                    count += 1
+                except sqlite3.IntegrityError:
+                    pass  # duplicate record, skip
+
+            conn.commit()
+            return count
+        except Exception as e:
+            print(f"[AI筛选] 保存分类结果失败: {e}")
+            return 0
+
+    def _get_active_filter_results_impl(self, date: Optional[str] = None, interests_file: str = "ai_interests.txt") -> List[Dict[str, Any]]:
+        """Return the active classification results for the given interests file, JOINed with news_items for news details"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            # Hotlist results
+            cursor.execute("""
+                SELECT
+                    r.news_item_id, r.source_type, r.tag_id, r.relevance_score,
+                    t.tag, t.description as tag_description, t.priority,
+                    n.title, n.platform_id as source_id, p.name as source_name,
+                    n.url, n.mobile_url, n.rank,
+                    n.first_crawl_time, n.last_crawl_time, n.crawl_count
+                FROM ai_filter_results r
+                JOIN ai_filter_tags t ON r.tag_id = t.id
+                JOIN news_items n ON r.news_item_id = n.id
+                LEFT JOIN platforms p ON n.platform_id = p.id
+                WHERE r.status = 'active' AND r.source_type = 'hotlist'
+                    AND t.status = 'active' AND t.interests_file = ?
+                ORDER BY t.priority ASC, t.id ASC, r.relevance_score DESC
+            """, (interests_file,))
+
+            results = []
+            hotlist_news_ids = []
+            for row in cursor.fetchall():
+                results.append({
+                    "news_item_id": row[0], "source_type": row[1],
+                    "tag_id": row[2], "relevance_score": row[3],
+                    "tag": row[4], "tag_description": row[5], "tag_priority": row[6],
+                    "title": row[7], "source_id": row[8],
+                    "source_name": row[9] or row[8],
+                    "url": row[10] or "", "mobile_url": row[11] or "",
+                    "rank": row[12],
+                    "first_time": row[13], "last_time": row[14],
+                    "count": row[15],
+                })
+                hotlist_news_ids.append(row[0])
+
+            # Fetch rank history in bulk (hotlist)
+            ranks_map: Dict[int, List[int]] = {}
+            if hotlist_news_ids:
+                unique_ids = list(set(hotlist_news_ids))
+                placeholders = ",".join("?" * len(unique_ids))
+                cursor.execute(f"""
+                    SELECT news_item_id, rank FROM rank_history
+                    WHERE news_item_id IN ({placeholders}) AND rank != 0
+                """, unique_ids)
+                for rh_row in cursor.fetchall():
+                    nid, rank = rh_row[0], rh_row[1]
+                    if nid not in ranks_map:
+                        ranks_map[nid] = []
+                    if rank not in ranks_map[nid]:
+                        ranks_map[nid].append(rank)
+
+            for item in results:
+                item["ranks"] = ranks_map.get(item["news_item_id"], [item["rank"]])
+
+            # RSS results (if an rss database exists)
+            try:
+                rss_conn = self._get_connection(date, db_type="rss")
+                rss_cursor = rss_conn.cursor()
+
+                # Fetch the IDs of rss-type classification results from the news database
+                cursor.execute("""
+                    SELECT r.news_item_id, r.tag_id, r.relevance_score,
+                           t.tag, t.description, t.priority
+                    FROM ai_filter_results r
+                    JOIN ai_filter_tags t ON r.tag_id = t.id
+                    WHERE r.status = 'active' AND r.source_type = 'rss'
+                        AND t.status = 'active' AND t.interests_file = ?
+                    ORDER BY t.priority ASC, t.id ASC, r.relevance_score DESC
+                """, (interests_file,))
+
+                rss_filter_rows = cursor.fetchall()
+                if rss_filter_rows:
+                    rss_ids = [row[0] for row in rss_filter_rows]
+                    placeholders = ",".join("?" * len(rss_ids))
+                    rss_cursor.execute(f"""
+                        SELECT i.id, i.title, i.feed_id, f.name as feed_name,
+                               i.url, i.published_at
+                        FROM rss_items i
+                        LEFT JOIN rss_feeds f ON i.feed_id = f.id
+                        WHERE i.id IN ({placeholders})
+                    """, rss_ids)
+
+                    rss_info = {row[0]: row for row in rss_cursor.fetchall()}
+
+                    for fr_row in rss_filter_rows:
+                        rss_id = fr_row[0]
+                        info = rss_info.get(rss_id)
+                        if info:
+                            results.append({
+                                "news_item_id": rss_id,
+                                "source_type": "rss",
+                                "tag_id": fr_row[1],
+                                "relevance_score": fr_row[2],
+                                "tag": fr_row[3],
+                                "tag_description": fr_row[4],
+                                "tag_priority": fr_row[5],
+                                "title": info[1],
+                                "source_id": info[2],
+                                "source_name": info[3] or info[2],
+                                "url": info[4] or "",
+                                "mobile_url": "",
+                                "rank": 0,
+                                "ranks": [],
+                                "first_time": info[5] or "",
+                                "last_time": info[5] or "",
+                                "count": 1,
+                            })
+            except Exception:
+                pass  # skip silently when the RSS database does not exist
+
+            return results
+        except Exception as e:
+            print(f"[AI筛选] 获取分类结果失败: {e}")
+            return []
+
+    def _get_all_news_ids_impl(self, date: Optional[str] = None) -> List[Dict]:
+        """Return the id and title of all news items for the day (used for AI filter classification)"""
+        try:
+            conn = self._get_connection(date)
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT n.id, n.title, n.platform_id, p.name as platform_name
+                FROM news_items n
+                LEFT JOIN platforms p ON n.platform_id = p.id
+                ORDER BY n.id
+            """)
+
+            return [
+                {
+                    "id": row[0], "title": row[1],
+                    "source_id": row[2], "source_name": row[3] or row[2],
+                }
+                for row in cursor.fetchall()
+            ]
+        except Exception as e:
+            print(f"[AI筛选] 获取新闻列表失败: {e}")
+            return []
+
+    def _get_all_rss_ids_impl(self, date: Optional[str] = None) -> List[Dict]:
+        """Return the id and title of all RSS items for the day (used for AI filter classification)"""
+        try:
+            conn = self._get_connection(date, db_type="rss")
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT i.id, i.title, i.feed_id, f.name as feed_name, i.published_at
+                FROM rss_items i
+                LEFT JOIN rss_feeds f ON i.feed_id = f.id
+                ORDER BY i.id
+            """)
+
+            return [
+                {
+                    "id": row[0], "title": row[1],
+                    "source_id": row[2], "source_name": row[3] or row[2],
+                    "published_at": row[4] or "",
+                }
+                for row in cursor.fetchall()
+            ]
+        except Exception as e:
+            print(f"[AI筛选] 获取 RSS 列表失败: {e}")
+            return []

+ 1 - 1
version

@@ -1 +1 @@
-6.0.0
+6.5.0

+ 4 - 3
version_configs

@@ -1,5 +1,6 @@
-config.yaml=2.0.0
-timeline.yaml=1.0.0
+config.yaml=2.2.0
+timeline.yaml=1.2.0
 frequency_words.txt=1.1.0
+ai_interests.txt=1.0.0
 ai_analysis_prompt.txt=2.0.0
-ai_translation_prompt.txt=1.1.0
+ai_translation_prompt.txt=1.2.0