v4.7.0: Add more precise news-matching syntax to frequency_words.txt and fix several issues

sansan, 4 months ago
parent
commit
10c87f4111

+ 117 - 4
README-EN.md

@@ -13,8 +13,8 @@
 [![GitHub Stars](https://img.shields.io/github/stars/sansan0/TrendRadar?style=flat-square&logo=github&color=yellow)](https://github.com/sansan0/TrendRadar/stargazers)
 [![GitHub Forks](https://img.shields.io/github/forks/sansan0/TrendRadar?style=flat-square&logo=github&color=blue)](https://github.com/sansan0/TrendRadar/network/members)
 [![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg?style=flat-square)](LICENSE)
-[![Version](https://img.shields.io/badge/version-v4.6.0-blue.svg)](https://github.com/sansan0/TrendRadar)
-[![MCP](https://img.shields.io/badge/MCP-v2.0.0-green.svg)](https://github.com/sansan0/TrendRadar)
+[![Version](https://img.shields.io/badge/version-v4.7.0-blue.svg)](https://github.com/sansan0/TrendRadar)
+[![MCP](https://img.shields.io/badge/MCP-v2.0.1-green.svg)](https://github.com/sansan0/TrendRadar)
 [![RSS](https://img.shields.io/badge/RSS-Feed_Support-orange.svg?style=flat-square&logo=rss&logoColor=white)](https://github.com/sansan0/TrendRadar)
 
 [![WeWork](https://img.shields.io/badge/WeWork-Notification-00D4AA?style=flat-square)](https://work.weixin.qq.com/)
@@ -135,6 +135,14 @@ After communication, the author indicated no concerns about server pressure, but
 >**📌 Check Latest Updates**: **[Original Repository Changelog](https://github.com/sansan0/TrendRadar?tab=readme-ov-file#-changelog)**:
 - **Tip**: Check [Changelog] to understand specific [Features]
 
+### 2026/01/02 - v4.7.0
+
+- **Fix RSS HTML Display**: Fixed RSS data format mismatch causing rendering issues, now displays correctly grouped by keyword
+- **New Regex Syntax**: Keyword config supports `/pattern/` regex syntax, solves English substring mismatch issues (e.g., `ai` matching `training`) [📖 View Syntax Details](#keyword-basic-syntax)
+- **New Display Name Syntax**: Use `=> alias` to give complex regex a friendly name, cleaner push notifications (e.g., `/\bai\b/ => AI Related`)
+- **Can't Write Regex?** README now includes AI prompt guide - just tell ChatGPT/Claude/DeepSeek what you want to match
+
+
 ### 2026/01/01 - v4.6.0
 
 - **Fix RSS HTML Display**: Merged RSS content into trending HTML page, grouped by source
@@ -740,12 +748,14 @@ rss:
 
 Set personal keywords (e.g., AI, BYD, Education Policy) to receive only relevant trending news, filtering out noise.
 
-**Basic Syntax** (5 types):
+**Basic Syntax** (7 types):
 - Normal words: Basic matching
 - Required words `+`: Narrow scope
 - Filter words `!`: Exclude noise
 - Count limit `@`: Control display count (v3.2.0 new)
 - Global filter `[GLOBAL_FILTER]`: Globally exclude specified content (v3.5.0 new)
+- Regex `/pattern/`: Precise pattern matching (v4.7.0 new)
+- Display name `=> alias`: Custom display text (v4.7.0 new)
 
 **Advanced Features** (v3.2.0 new):
 - 🔢 **Keyword Sorting Control**: Sort by popularity or config order
@@ -1190,6 +1200,7 @@ Method 1 discovered and suggested by **ziventian**, thanks to them. Default is p
 | **189 Mail** | 189.cn | smtp.189.cn | 465 | SSL |
 | **Aliyun Mail** | aliyun.com | smtp.aliyun.com | 465 | TLS |
 | **Yandex Mail** | yandex.com | smtp.yandex.com | 465 | TLS |
+| **iCloud Mail** | icloud.com | smtp.mail.me.com | 587 | SSL |
 
 > **Auto-detect**: When using above emails, no need to manually configure `EMAIL_SMTP_SERVER` and `EMAIL_SMTP_PORT`, system auto-detects.
 >
@@ -1201,6 +1212,7 @@ Method 1 discovered and suggested by **ziventian**, thanks to them. Default is p
 > - Thanks to [@DYZYD](https://github.com/DYZYD) for contributing 189 Mail (189.cn) configuration and completing self-send-receive testing ([#291](https://github.com/sansan0/TrendRadar/issues/291))
 > - Thanks to [@longzhenren](https://github.com/longzhenren) for contributing Aliyun Mail (aliyun.com) configuration and completing testing ([#344](https://github.com/sansan0/TrendRadar/issues/344))
 > - Thanks to [@ACANX](https://github.com/ACANX) for contributing Yandex Mail (yandex.com) configuration and completing testing ([#663](https://github.com/sansan0/TrendRadar/issues/663))
+> - Thanks to [@Sleepy-Tianhao](https://github.com/Sleepy-Tianhao) for contributing iCloud Mail (icloud.com) configuration and completing testing ([#728](https://github.com/sansan0/TrendRadar/issues/728))
 
 **Common Email Settings:**
 
@@ -1697,7 +1709,7 @@ platforms:
 
 **Configuration Location:** `config/frequency_words.txt`
 
-Configure monitoring keywords in `frequency_words.txt` with five syntax types, region markers, and grouping features.
+Configure monitoring keywords in `frequency_words.txt` with seven syntax types, region markers, and grouping features.
 
 | Syntax Type | Symbol | Purpose | Example | Matching Logic |
 |------------|--------|---------|---------|----------------|
@@ -1706,6 +1718,8 @@ Configure monitoring keywords in `frequency_words.txt` with five syntax types, r
 | **Filter** | `!` | Noise exclusion | `!ad` | Exclude if included |
 | **Count Limit** | `@` | Control display count | `@10` | Max 10 news (v3.2.0 new) |
 | **Global Filter** | `[GLOBAL_FILTER]` | Globally exclude content | See example below | Filter under any circumstances (v3.5.0 new) |
+| **Regex** | `/pattern/` | Precise matching | `/\bai\b/` | Match using regex (v4.7.0 new) |
+| **Display Name** | `=> alias` | Custom display text | `/\bai\b/ => AI Related` | Show alias in push/HTML (v4.7.0 new) |
 
 #### 2.1 Basic Syntax
 
@@ -1799,6 +1813,105 @@ AI
 - Recommended to keep global filter words under 5-15
 - For group-specific filtering, prioritize using group filter words (`!` prefix)
 
+##### 6. **Regex** `/pattern/` - Precise Matching (v4.7.0 new)
+
+Normal keywords use substring matching, which is convenient for Chinese but may cause false matches in English. For example, `ai` would match the `ai` in `training`.
+
+Use regex syntax `/pattern/` to achieve precise matching:
+
+```txt
+/(?<![a-z])ai(?![a-z])/
+artificial intelligence
+```
+
+**Effect:** Match using regular expressions, supports all Python regex syntax
+
+**Common Regex Patterns:**
+
+| Need | Regex | Description |
+|------|-------|-------------|
+| Word boundary | `/\bword\b/` | Match standalone word, e.g., `/\bai\b/` matches "AI" but not "training" |
+| Non-letter boundary | `/(?<![a-z])ai(?![a-z])/` | Looser boundary, suitable for mixed Chinese-English |
+| Start match | `/^breaking/` | Only match titles starting with "breaking" |
+| End match | `/release$/` | Only match titles ending with "release" |
+| Multiple options | `/apple\|huawei\|xiaomi/` | Match any one (note escaped `\|`) |
+
+**Matching Examples:**
+```txt
+# Config
+/(?<![a-z])ai(?![a-z])/
+artificial intelligence
+```
+
+- ✅ "AI is the future" ← Matches standalone "AI"
+- ✅ "Hello ai here" ← Non-letter boundaries, matches "ai"
+- ✅ "Artificial intelligence grows rapidly" ← Matches "artificial intelligence"
+- ❌ "Resistance training is important" ← "ai" in "training" doesn't match
+- ❌ "The maid cleaned the room" ← "ai" in "maid" doesn't match
+
+**Combined Usage:**
+```txt
+# Regex + Normal + Filter
+/\bai\b/
+artificial intelligence
+machine learning
+!advertisement
+```
+
+**Notes:**
+- Regex automatically enables case-insensitive matching (`re.IGNORECASE`)
+- Supports JavaScript-style `/pattern/i` syntax (flags are ignored since case-insensitive is always enabled)
+- Invalid regex syntax will be treated as normal words
+- Regex can be used for normal words, required words(`+`), and filter words(`!`)
+
+**💡 Can't Write Regex? Let AI Help!**
+
+If you're not familiar with regular expressions, just ask ChatGPT / Claude / DeepSeek to generate one:
+
+> I need a Python regex to match the word "ai" but not match "ai" in "training".
+> Please give me the regex in `/pattern/` format without extra explanation.
+
+AI will give you something like: `/(?<![a-zA-Z])ai(?![a-zA-Z])/`
+
+##### 7. **Display Name** `=> alias` - Custom Display Text (v4.7.0 new)
+
+Regex patterns can look unfriendly in push notifications and HTML pages. Use `=> alias` syntax to set a display name:
+
+```txt
+/(?<![a-zA-Z])ai(?![a-zA-Z])/ => AI Related
+artificial intelligence
+```
+
+**Effect:** Push notifications and HTML pages show "AI Related" instead of the complex regex
+
+**Syntax Format:**
+```txt
+# Regex + Display Name
+/pattern/ => Display Name
+/pattern/i => Display Name    # Supports flags syntax (flags are ignored)
+/pattern/=>Display Name       # Spaces around => are optional
+
+# Normal Word + Display Name
+deepseek => DeepSeek News
+```
+
+**Example:**
+```txt
+# Config
+/(?<![a-zA-Z])ai(?![a-zA-Z])/ => AI Related
+artificial intelligence
+```
+
+| Original Config | Push/HTML Display |
+|----------------|-------------------|
+| `/(?<![a-z])ai(?![a-z])/` + `artificial intelligence` | `(?<![a-z])ai(?![a-z]) artificial intelligence` |
+| `/(?<![a-z])ai(?![a-z])/ => AI Related` + `artificial intelligence` | **`AI Related`** |
+
+**Notes:**
+- Display name only needs to be set on the first word of a group
+- If multiple words have display names, the first one is used
+- Without display name, all words in the group are concatenated
+
 ---
 
 #### 🔗 Group Feature - Importance of Empty Lines
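
The README section above contrasts plain substring matching with the two regex forms it recommends. A minimal sketch of that comparison, using Python's standard `re` module and the example titles from the README. The helper names `substring_match`, `regex_match`, and `compare` are hypothetical and are not taken from TrendRadar's code.

```python
import re

# Example titles taken from the README text in this diff.
TITLES = [
    "AI is the future",
    "Hello ai here",
    "你好ai这里",
    "Resistance training is important",
    "The maid cleaned the room",
]


def substring_match(word: str, title: str) -> bool:
    """Plain keyword behaviour: case-insensitive substring test."""
    return word.lower() in title.lower()


def regex_match(pattern: str, title: str) -> bool:
    """Regex keyword behaviour: search with re.IGNORECASE always enabled."""
    return re.search(pattern, title, re.IGNORECASE) is not None


def compare(title: str) -> tuple:
    """Return (substring, word-boundary regex, non-letter-boundary regex)."""
    return (
        substring_match("ai", title),
        regex_match(r"\bai\b", title),
        regex_match(r"(?<![a-z])ai(?![a-z])", title),
    )


for title in TITLES:
    print(f"{title!r:40} {compare(title)}")
```

In Python 3, `\b` treats CJK characters as word characters, so `/\bai\b/` does not match `你好ai这里` while the lookaround form does. That is why the README calls the lookaround form more suitable for mixed Chinese-English titles.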

+ 130 - 15
README.md

@@ -13,8 +13,8 @@
 [![GitHub Stars](https://img.shields.io/github/stars/sansan0/TrendRadar?style=flat-square&logo=github&color=yellow)](https://github.com/sansan0/TrendRadar/stargazers)
 [![GitHub Forks](https://img.shields.io/github/forks/sansan0/TrendRadar?style=flat-square&logo=github&color=blue)](https://github.com/sansan0/TrendRadar/network/members)
 [![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg?style=flat-square)](LICENSE)
-[![Version](https://img.shields.io/badge/version-v4.6.0-blue.svg)](https://github.com/sansan0/TrendRadar)
-[![MCP](https://img.shields.io/badge/MCP-v2.0.0-green.svg)](https://github.com/sansan0/TrendRadar)
+[![Version](https://img.shields.io/badge/version-v4.7.0-blue.svg)](https://github.com/sansan0/TrendRadar)
+[![MCP](https://img.shields.io/badge/MCP-v2.0.1-green.svg)](https://github.com/sansan0/TrendRadar)
 [![RSS](https://img.shields.io/badge/RSS-订阅源支持-orange.svg?style=flat-square&logo=rss&logoColor=white)](https://github.com/sansan0/TrendRadar)
 
 [![企业微信通知](https://img.shields.io/badge/企业微信-通知-00D4AA?style=flat-square)](https://work.weixin.qq.com/)
@@ -184,6 +184,14 @@
 > **📌 查看最新更新**:**[原仓库更新日志](https://github.com/sansan0/TrendRadar?tab=readme-ov-file#-更新日志)** :
 - **提示**:建议查看【历史更新】,明确具体的【功能内容】
 
+### 2026/01/02 - v4.7.0
+
+- **修复 RSS HTML 显示**:修复 RSS 数据格式不匹配导致的渲染问题,现在按关键词分组正确显示
+- **新增正则表达式语法**:关键词配置支持 `/pattern/` 正则语法,解决英文子字符串误匹配问题(如 `ai` 匹配 `training`)[📖 查看语法详解](#关键词基础语法)
+- **新增显示名称语法**:使用 `=> 备注` 给复杂的正则表达式起个好记的名字,推送消息显示更清晰(如 `/\bai\b/ => AI相关`)
+- **不会写正则?** README 新增 AI 生成正则的引导,告诉 ChatGPT/Claude/DeepSeek 你想匹配什么,让 AI 帮你写
+
+
 ### 2026/01/01 - v4.6.0
 
 - **修复 RSS HTML 显示**:将 RSS 内容合并到热榜 HTML 页面,按源分组显示
@@ -777,12 +785,14 @@ rss:
 
 设置个人关键词(如:AI、比亚迪、教育政策),只推送相关热点,过滤无关信息
 
-**基础语法**(5种):
+**基础语法**(7种):
 - 普通词:基础匹配
 - 必须词 `+`:限定范围
 - 过滤词 `!`:排除干扰
 - 数量限制 `@`:控制显示数量(v3.2.0 新增)
 - 全局过滤 `[GLOBAL_FILTER]`:全局排除指定内容(v3.5.0 新增)
+- 正则表达式 `/pattern/`:精确匹配模式(v4.7.0 新增)
+- 显示名称 `=> 备注`:自定义显示文本(v4.7.0 新增)
 
 **高级功能**(v3.2.0 新增):
 - 🔢 **关键词排序控制**:按热度优先 or 配置顺序优先
@@ -1234,6 +1244,7 @@ GitHub 一键 Fork 即可使用,无需编程基础。
    | **天翼邮箱** | 189.cn | smtp.189.cn | 465 | SSL |
    | **阿里云邮箱** | aliyun.com | smtp.aliyun.com | 465 | TLS |
    | **Yandex邮箱** | yandex.com | smtp.yandex.com | 465 | TLS |
+   | **iCloud邮箱** | icloud.com | smtp.mail.me.com | 587 | SSL |
 
    > **自动识别**:使用以上邮箱时,无需手动配置 `EMAIL_SMTP_SERVER` 和 `EMAIL_SMTP_PORT`,系统会自动识别。
    >
@@ -1245,6 +1256,7 @@ GitHub 一键 Fork 即可使用,无需编程基础。
    > - 感谢 [@DYZYD](https://github.com/DYZYD) 贡献天翼邮箱(189.cn)配置并完成自发自收测试 ([#291](https://github.com/sansan0/TrendRadar/issues/291))
    > - 感谢 [@longzhenren](https://github.com/longzhenren) 贡献阿里云邮箱(aliyun.com)配置并完成测试 ([#344](https://github.com/sansan0/TrendRadar/issues/344))
    > - 感谢 [@ACANX](https://github.com/ACANX) 贡献 Yandex 邮箱(yandex.com)配置并完成测试 ([#663](https://github.com/sansan0/TrendRadar/issues/663))
+   > - 感谢 [@Sleepy-Tianhao](https://github.com/Sleepy-Tianhao) 贡献 iCloud 邮箱(icloud.com)配置并完成测试 ([#728](https://github.com/sansan0/TrendRadar/issues/728))
 
    **常见邮箱设置:**
 
@@ -1750,7 +1762,7 @@ platforms:
 
 ### 2. 关键词配置
 
-在 `frequency_words.txt` 文件中配置监控的关键词,支持五种语法、区域标记和词组功能。
+在 `frequency_words.txt` 文件中配置监控的关键词,支持七种语法、区域标记和词组功能。
 
 | 语法类型 | 符号 | 作用 | 示例 | 匹配逻辑 |
 |---------|------|------|------|---------|
@@ -1759,6 +1771,8 @@ platforms:
 | **过滤词** | `!` | 排除干扰 | `!广告` | 包含则直接排除 |
 | **数量限制** | `@` | 控制显示数量 | `@10` | 最多显示10条新闻(v3.2.0新增) |
 | **全局过滤** | `[GLOBAL_FILTER]` | 全局排除指定内容 | 见下方示例 | 任何情况下都过滤(v3.5.0新增) |
+| **正则表达式** | `/pattern/` | 精确匹配模式 | `/\bai\b/` | 使用正则表达式匹配(v4.7.0新增) |
+| **显示名称** | `=> 备注` | 自定义显示文本 | `/\bai\b/ => AI相关` | 推送和HTML显示备注名称(v4.7.0新增) |
 
 #### 2.1 基础语法
 
@@ -1854,6 +1868,105 @@ AI
 - 建议全局过滤词控制在 5-15 个以内
 - 对于特定词组的过滤,优先使用词组内过滤词(`!` 前缀)
 
+##### 6. **正则表达式** `/pattern/` - 精确匹配模式(v4.7.0 新增)
+
+普通关键词使用子字符串匹配,这在中文环境下很方便,但在英文环境可能会产生误匹配。例如 `ai` 会匹配到 `training` 中的 `ai`。
+
+使用正则表达式语法 `/pattern/` 可以实现精确匹配:
+
+```txt
+/(?<![a-z])ai(?![a-z])/
+人工智能
+```
+
+**作用:** 使用正则表达式进行匹配,支持所有 Python 正则语法
+
+**常用正则模式:**
+
+| 需求 | 正则写法 | 说明 |
+|------|---------|------|
+| 英文单词边界 | `/\bword\b/` | 匹配独立单词,如 `/\bai\b/` 匹配 "AI" 但不匹配 "training" |
+| 前后非字母 | `/(?<![a-z])ai(?![a-z])/` | 更宽松的边界,适合中英混合场景 |
+| 开头匹配 | `/^breaking/` | 只匹配以 "breaking" 开头的标题 |
+| 结尾匹配 | `/发布$/` | 只匹配以 "发布" 结尾的标题 |
+| 多选一 | `/苹果\|华为\|小米/` | 匹配其中任意一个(注意转义 `\|`) |
+
+**匹配示例:**
+```txt
+# 配置
+/(?<![a-z])ai(?![a-z])/
+人工智能
+```
+
+- ✅ "AI is the future" ← 匹配独立的 "AI"
+- ✅ "你好ai这里" ← 前后是中文,匹配 "ai"
+- ✅ "人工智能发展迅速" ← 匹配 "人工智能"
+- ❌ "Resistance training is important" ← "training" 中的 "ai" 不匹配
+- ❌ "The maid cleaned the room" ← "maid" 中的 "ai" 不匹配
+
+**组合使用:**
+```txt
+# 正则 + 普通词 + 过滤词
+/\bai\b/
+人工智能
+机器学习
+!广告
+```
+
+**注意事项:**
+- 正则表达式自动启用大小写不敏感匹配(`re.IGNORECASE`)
+- 支持 `/pattern/i` 等 JavaScript 风格写法(flags 会被忽略,因为默认已启用忽略大小写)
+- 无效的正则语法会被当作普通词处理
+- 正则可用于普通词、必须词(`+`)、过滤词(`!`)
+
+**💡 不会写正则?让 AI 帮你生成!**
+
+如果你不熟悉正则表达式,可以直接让 ChatGPT / Claude / DeepSeek 帮你生成。只需告诉 AI:
+
+> 我需要一个 Python 正则表达式,用于匹配英文单词 "ai",但不匹配 "training" 中的 "ai"。
+> 请直接给出正则表达式,格式为 `/pattern/`,不需要额外解释。
+
+AI 会给你类似这样的结果:`/(?<![a-zA-Z])ai(?![a-zA-Z])/`
+
+##### 7. **显示名称** `=> 备注` - 自定义显示文本(v4.7.0 新增)
+
+正则表达式在推送消息和 HTML 页面显示时可能不太友好。使用 `=> 备注` 语法可以设置显示名称:
+
+```txt
+/(?<![a-zA-Z])ai(?![a-zA-Z])/ => AI 相关
+人工智能
+```
+
+**作用:** 推送消息和 HTML 页面显示 "AI 相关" 而不是复杂的正则表达式
+
+**语法格式:**
+```txt
+# 正则 + 显示名称
+/pattern/ => 显示名称
+/pattern/i => 显示名称    # 支持 flags 写法(flags 被忽略)
+/pattern/=>显示名称       # => 两边空格可选
+
+# 普通词 + 显示名称
+deepseek => DeepSeek 动态
+```
+
+**匹配示例:**
+```txt
+# 配置
+/(?<![a-zA-Z])ai(?![a-zA-Z])/ => AI 相关
+人工智能
+```
+
+| 原始配置 | 推送/HTML 显示 |
+|---------|---------------|
+| `/(?<![a-z])ai(?![a-z])/` + `人工智能` | `(?<![a-z])ai(?![a-z]) 人工智能` |
+| `/(?<![a-z])ai(?![a-z])/ => AI 相关` + `人工智能` | **`AI 相关`** |
+
+**注意事项:**
+- 显示名称只需写在词组的第一个词上
+- 如果词组中多个词都有显示名称,使用第一个
+- 不设置显示名称时,自动使用词组内所有词拼接
+
 ---
 
 #### 🔗 词组功能 - 空行分隔的重要作用
@@ -2261,31 +2374,33 @@ TrendRadar 提供两个独立的 Docker 镜像,可根据需求选择部署:
 ```
 
 2. **配置文件说明**:
-   - `config/config.yaml` - 应用主配置(报告模式、推送设置等)
-   - `config/frequency_words.txt` - 关键词配置(设置你关心的热点词汇)
-   - `.env` - 环境变量配置(webhook URLs 和定时任务)
+
+   **配置分工原则(v4.6.0 优化)**:
+   - `config/config.yaml` - **功能配置**(报告模式、推送设置、存储格式、推送窗口等)
+   - `config/frequency_words.txt` - **关键词配置**(设置你关心的热点词汇)
+   - `docker/.env` - **敏感信息 + Docker 特有配置**(webhook URLs、S3 密钥、定时任务)
+
+   > 💡 **配置修改生效**:修改 `config.yaml` 后,执行 `docker compose up -d` 重启容器即可生效
 
    **⚙️ 环境变量覆盖机制(v3.0.5+)**
 
-   如果你在 NAS 或其他 Docker 环境中遇到**修改 `config.yaml` 后配置不生效**的问题,可以通过环境变量直接覆盖配置:
+   `.env` 文件中的环境变量会覆盖 `config.yaml` 中的对应配置:
 
    | 环境变量 | 对应配置 | 示例值 | 说明 |
    |---------|---------|-------|------|
    | `ENABLE_CRAWLER` | `advanced.crawler.enabled` | `true` / `false` | 是否启用爬虫 |
    | `ENABLE_NOTIFICATION` | `notification.enabled` | `true` / `false` | 是否启用通知 |
    | `REPORT_MODE` | `report.mode` | `daily` / `incremental` / `current`| 报告模式 |
-   | `MAX_ACCOUNTS_PER_CHANNEL` | `advanced.max_accounts_per_channel` | `3` | 每个渠道最大账号数 |
-   | `PUSH_WINDOW_ENABLED` | `notification.push_window.enabled` | `true` / `false` | 推送时间窗口开关 |
-   | `PUSH_WINDOW_START` | `notification.push_window.start` | `08:00` | 推送开始时间 |
-   | `PUSH_WINDOW_END` | `notification.push_window.end` | `22:00` | 推送结束时间 |
+   | `DISPLAY_MODE` | `report.display_mode` | `keyword` / `platform` | 显示模式 |
    | `ENABLE_WEBSERVER` | - | `true` / `false` | 是否自动启动 Web 服务器 |
-   | `WEBSERVER_PORT` | - | `8080` | Web 服务器端口(默认 8080) |
-   | `FEISHU_WEBHOOK_URL` | `notification.channels.feishu.webhook_url` | `https://...` | 飞书 Webhook(支持多账号,用 `;` 分隔) |
+   | `WEBSERVER_PORT` | - | `8080` | Web 服务器端口 |
+   | `FEISHU_WEBHOOK_URL` | `notification.channels.feishu.webhook_url` | `https://...` | 飞书 Webhook(多账号用 `;` 分隔) |
+   | `S3_*` | `storage.remote.*` | - | 远程存储配置(5 个参数) |
 
    **配置优先级**:环境变量 > config.yaml
 
    **使用方法**:
-   - 修改 `.env` 文件,取消注释并填写需要的配置
+   - 修改 `.env` 文件,填写需要的配置
    - 或在 NAS/群晖 Docker 管理界面的"环境变量"中直接添加
    - 重启容器后生效:`docker compose up -d`
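
The priority rule stated above (environment variable > `config.yaml`) can be sketched as follows. This is an illustration only: the helper `resolve_setting` is hypothetical, and treating an empty variable (such as `REPORT_MODE=` in `.env`) as "not set" is an assumption about the application, not something this diff confirms. The variable names and config paths come from the table above.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    env_name: str,
    config: Mapping[str, Any],
    config_path: str,
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the env value when set and non-empty, else the config value."""
    env = os.environ if env is None else env
    raw = env.get(env_name, "").strip()
    if raw:
        # Assumption: a non-empty environment variable always wins.
        return raw
    # Walk a dotted path such as "report.mode" through the nested config.
    node: Any = config
    for key in config_path.split("."):
        node = node[key]
    return node


# Stand-ins for a parsed config.yaml and the container environment.
config = {"report": {"mode": "daily", "display_mode": "keyword"}}
env = {"REPORT_MODE": "incremental", "DISPLAY_MODE": ""}

print(resolve_setting("REPORT_MODE", config, "report.mode", env))
print(resolve_setting("DISPLAY_MODE", config, "report.display_mode", env))
```

The sketch returns environment values as raw strings. Converting `true` / `false` to booleans is left out.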
 

+ 6 - 1
config/config.yaml

@@ -89,6 +89,11 @@ rss:
       name: "阮一峰的网络日志"
       url: "http://www.ruanyifeng.com/blog/atom.xml"
       # max_age_days: 7               # 示例:推送7天内的文章(更新较慢的博客)
+    
+    - id: "yahoo-finance"
+      name: "雅虎财经"
+      url: "https://finance.yahoo.com/news/rssindex"
+      enabled: false                  # 禁用
 
     # 自定义源示例
     # - id: "custom-feed"
@@ -212,7 +217,7 @@ storage:
   formats:
     sqlite: true                      # 主存储(必须启用)
     txt: false                        # 是否生成 TXT 快照
-    html: false                       # 是否生成 HTML 报告(⚠️ 邮件推送必须设为 true)
+    html: true                       # 是否生成 HTML 报告(⚠️ 邮件推送必须设为 true)
 
   # 本地存储配置
   local:

+ 23 - 63
config/frequency_words.txt

@@ -1,85 +1,50 @@
-胖东来
-于东来
+/胖东来|于东来/ => 胖东来
 
-DeepSeek
-梁文锋
+/DeepSeek|梁文锋/i => DeepSeek
 
-华为
-鸿蒙
-HarmonyOS
-任正非
+/华为|鸿蒙|HarmonyOS|任正非/i => 华为
 
-比亚迪
-王传福
+/比亚迪|王传福/ => 比亚迪
 
-大疆
-DJI
+/大疆|\bDJI\b/i => 大疆
 
-宇树
-王兴兴
+/宇树|王兴兴/ => 宇树机器人
 
-智元
-灵犀
-稚晖君
-彭志辉
+/智元|灵犀|稚晖君|彭志辉/ => 智元机器人
 
-黑神话
-冯骥
+/黑神话|冯骥/ => 黑神话悟空
 
-影之刃零
-梁其伟
+/影之刃零|梁其伟/ => 影之刃零
 
-哪吒
-饺子
-杨宇
+/哪吒|杨宇/ => 哪吒电影
 !车
 !餐
 
-三体
-流浪地球
-刘慈欣
-郭帆
+/三体|流浪地球|刘慈欣|郭帆/ => 三体/流浪地球
 
 申奥
 
-京东
-刘强东
+/京东|刘强东/ => 京东
 
-字节
-bytedance
-张一鸣
+/字节|bytedance|张一鸣/i => 字节跳动
 
-特斯拉
-马斯克
+/特斯拉|马斯克/ => 特斯拉
 
-微软
-Microsoft
+/微软|\bMicrosoft\b/i => 微软
 
-英伟达
-NVIDIA
-黄仁勋
+/英伟达|\bNVIDIA\b|黄仁勋/i => 英伟达
 
-AMD
+/\bAMD\b/i => AMD
 
-谷歌
-google
-gemini
-deepmind
+/谷歌|\bgoogle\b|\bgemini\b|\bdeepMind\b/i => 谷歌
 
-chatgpt
-openai
-sora
+/\bchatgpt\b|\bopenai\b|\bsora\b/i => OpenAI
 
-claude
-Anthropic
+/\bclaude\b|Anthropic/i => Claude
 
-iphone
-ipad
-mac
-ios
+/\biphone\b|\bipad\b|\bmac\b|\bios\b/i => 苹果产品
 
-ai
-!gai
+/(?<![a-zA-Z])ai(?![a-zA-Z])/i => AI 相关
 人工智能
 
 自动驾驶
@@ -105,9 +70,4 @@ ai
 
 新质生产力
 
-月球
-登月
-火星
-宇宙
-飞船
-航空
+/月球|登月|火星|宇宙|飞船|航空/ => 航天航空
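
The rewritten keyword lines above combine the `/pattern/` form, an optional trailing flag, and a `=> alias` display name. A minimal sketch of how one such line could be split into its parts, following the rules stated in the README. `Keyword` and `parse_keyword_line` are hypothetical names, this is not the parser in `trendradar.core.frequency`, and the `+`, `!`, and `@` prefixes are not handled.

```python
import re
from typing import NamedTuple, Optional


class Keyword(NamedTuple):
    text: str                    # regex source or plain word
    is_regex: bool
    display_name: Optional[str]  # alias after "=>", if any


# "/pattern/" with optional trailing letters such as "i" (flags are ignored).
_REGEX_FORM = re.compile(r"^/(.+)/[a-z]*$")


def parse_keyword_line(line: str) -> Keyword:
    """Split one config line into keyword text, regex flag, and alias."""
    # Assumption: the first "=>" separates keyword and alias, so a pattern
    # that itself contains "=>" is not supported by this sketch.
    body, sep, alias = line.partition("=>")
    body = body.strip()
    alias = alias.strip()
    display_name = alias if sep and alias else None

    match = _REGEX_FORM.match(body)
    if match:
        try:
            re.compile(match.group(1))
        except re.error:
            # Invalid regex: fall back to treating the line as a plain word.
            return Keyword(body, False, display_name)
        return Keyword(match.group(1), True, display_name)
    return Keyword(body, False, display_name)


print(parse_keyword_line("/三体|流浪地球|刘慈欣|郭帆/ => 三体/流浪地球"))
print(parse_keyword_line("/DeepSeek|梁文锋/i => DeepSeek"))
print(parse_keyword_line("申奥"))
```

Splitting on `=>` before looking for the slashes keeps an alias that contains `/` (as in `三体/流浪地球`) from being mistaken for part of the pattern.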

+ 6 - 52
docker/.env

@@ -6,14 +6,10 @@
 ENABLE_CRAWLER=
 # 是否启用通知 (true/false)
 ENABLE_NOTIFICATION=
-# 报告模式(daily|incremental|current)
+# 报告模式 (daily|incremental|current)
 REPORT_MODE=
-# 排序优先级 (true=先按配置位置排序,false=先按热点条数排序)
-SORT_BY_POSITION_FIRST=
-# 每个关键词最大显示数量 (0=不限制,>0=限制数量)
-MAX_NEWS_PER_KEYWORD=
-# 内容顺序:false=热点词汇统计在前,true=新增热点新闻在前
-REVERSE_CONTENT_ORDER=
+# 显示模式 (keyword|platform)
+DISPLAY_MODE=
 
 # ============================================
 # Web 服务器配置
@@ -28,26 +24,6 @@ ENABLE_WEBSERVER=false
 # 注意:修改后需要重启容器生效
 WEBSERVER_PORT=8080
 
-# ============================================
-# 推送时间窗口配置
-# ============================================
-
-# 是否启用推送时间窗口 (true/false)
-PUSH_WINDOW_ENABLED=
-# 推送开始时间 (HH:MM 格式,如 08:00)
-PUSH_WINDOW_START=
-# 推送结束时间 (HH:MM 格式,如 22:00)
-PUSH_WINDOW_END=
-# 每天只推送一次 (true/false)
-PUSH_WINDOW_ONCE_PER_DAY=
-
-# ============================================
-# 多账号配置
-# ============================================
-
-# 每个渠道最大账号数量(建议不超过 3,避免fork用户触发账号风险)
-MAX_ACCOUNTS_PER_CHANNEL=
-
 # ============================================
 # 通知渠道配置(多账号用 ; 分隔)
 # ============================================
@@ -73,6 +49,7 @@ EMAIL_SMTP_SERVER=
 EMAIL_SMTP_PORT=
 
 # ntfy 推送配置(多账号用 ; 分隔,topic 和 token 数量需一致)
+# ntfy 服务器地址(可改为自托管)
 NTFY_SERVER_URL=https://ntfy.sh
 # ntfy主题名称(多账号用 ; 分隔)
 NTFY_TOPIC=
@@ -86,38 +63,15 @@ BARK_URL=
 SLACK_WEBHOOK_URL=
 
 # ============================================
-# 存储配置
+# 远程存储配置(S3 兼容协议,支持 R2/OSS/COS/S3 等)
 # ============================================
 
-# 存储后端选择 (local/remote/auto)
-# - local: 本地 SQLite + TXT/HTML 文件
-# - remote: 远程云存储(S3 兼容协议)
-# - auto: 自动选择(GitHub Actions 用 remote,其他用 local)
-STORAGE_BACKEND=auto
-
-# 本地数据保留天数(0 = 无限制,不清理历史数据)
-LOCAL_RETENTION_DAYS=0
-
-# 远程数据保留天数(0 = 无限制,不清理历史数据)
-REMOTE_RETENTION_DAYS=0
-
-# 是否生成 TXT 快照 (true/false)
-STORAGE_TXT_ENABLED=
-
-# 是否生成 HTML 报告 (true/false)
-STORAGE_HTML_ENABLED=
-
-# 远程存储配置(S3 兼容协议,支持 R2/OSS/COS/S3 等)
 S3_ENDPOINT_URL=
 S3_BUCKET_NAME=
 S3_ACCESS_KEY_ID=
 S3_SECRET_ACCESS_KEY=
 S3_REGION=
 
-# 数据拉取配置(从远程同步到本地)
-PULL_ENABLED=false
-PULL_DAYS=7
-
 # ============================================
 # 运行配置
 # ============================================
@@ -127,4 +81,4 @@ CRON_SCHEDULE=*/30 * * * *
 # 运行模式:cron/once
 RUN_MODE=cron
 # 启动时立即执行一次
-IMMEDIATE_RUN=true
+IMMEDIATE_RUN=true

+ 2 - 20
docker/docker-compose-build.yml

@@ -19,19 +19,10 @@ services:
       - ENABLE_CRAWLER=${ENABLE_CRAWLER:-}
       - ENABLE_NOTIFICATION=${ENABLE_NOTIFICATION:-}
       - REPORT_MODE=${REPORT_MODE:-}
-      - SORT_BY_POSITION_FIRST=${SORT_BY_POSITION_FIRST:-}
-      - MAX_NEWS_PER_KEYWORD=${MAX_NEWS_PER_KEYWORD:-}
-      - REVERSE_CONTENT_ORDER=${REVERSE_CONTENT_ORDER:-}
+      - DISPLAY_MODE=${DISPLAY_MODE:-}
       # Web 服务器
       - ENABLE_WEBSERVER=${ENABLE_WEBSERVER:-false}
       - WEBSERVER_PORT=${WEBSERVER_PORT:-8080}
-      # 多账号配置
-      - MAX_ACCOUNTS_PER_CHANNEL=${MAX_ACCOUNTS_PER_CHANNEL:-}
-      # 推送时间窗口
-      - PUSH_WINDOW_ENABLED=${PUSH_WINDOW_ENABLED:-}
-      - PUSH_WINDOW_START=${PUSH_WINDOW_START:-}
-      - PUSH_WINDOW_END=${PUSH_WINDOW_END:-}
-      - PUSH_WINDOW_ONCE_PER_DAY=${PUSH_WINDOW_ONCE_PER_DAY:-}
       # 通知渠道
       - FEISHU_WEBHOOK_URL=${FEISHU_WEBHOOK_URL:-}
       - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}
@@ -53,23 +44,14 @@ services:
       - BARK_URL=${BARK_URL:-}
       # Slack配置
       - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL:-}
-      # 存储配置
-      - STORAGE_BACKEND=${STORAGE_BACKEND:-auto}
-      - LOCAL_RETENTION_DAYS=${LOCAL_RETENTION_DAYS:-0}
-      - REMOTE_RETENTION_DAYS=${REMOTE_RETENTION_DAYS:-0}
-      - STORAGE_TXT_ENABLED=${STORAGE_TXT_ENABLED:-true}
-      - STORAGE_HTML_ENABLED=${STORAGE_HTML_ENABLED:-true}
       # 远程存储配置(S3 兼容协议)
       - S3_ENDPOINT_URL=${S3_ENDPOINT_URL:-}
       - S3_BUCKET_NAME=${S3_BUCKET_NAME:-}
       - S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID:-}
       - S3_SECRET_ACCESS_KEY=${S3_SECRET_ACCESS_KEY:-}
       - S3_REGION=${S3_REGION:-}
-      # 数据拉取配置
-      - PULL_ENABLED=${PULL_ENABLED:-false}
-      - PULL_DAYS=${PULL_DAYS:-7}
       # 运行模式
-      - CRON_SCHEDULE=${CRON_SCHEDULE:-*/5 * * * *}
+      - CRON_SCHEDULE=${CRON_SCHEDULE:-*/30 * * * *}
       - RUN_MODE=${RUN_MODE:-cron}
       - IMMEDIATE_RUN=${IMMEDIATE_RUN:-true}
 

+ 2 - 20
docker/docker-compose.yml

@@ -17,19 +17,10 @@ services:
       - ENABLE_CRAWLER=${ENABLE_CRAWLER:-}
       - ENABLE_NOTIFICATION=${ENABLE_NOTIFICATION:-}
       - REPORT_MODE=${REPORT_MODE:-}
-      - SORT_BY_POSITION_FIRST=${SORT_BY_POSITION_FIRST:-}
-      - MAX_NEWS_PER_KEYWORD=${MAX_NEWS_PER_KEYWORD:-}
-      - REVERSE_CONTENT_ORDER=${REVERSE_CONTENT_ORDER:-}
+      - DISPLAY_MODE=${DISPLAY_MODE:-}
       # Web 服务器
       - ENABLE_WEBSERVER=${ENABLE_WEBSERVER:-false}
       - WEBSERVER_PORT=${WEBSERVER_PORT:-8080}
-      # 多账号配置
-      - MAX_ACCOUNTS_PER_CHANNEL=${MAX_ACCOUNTS_PER_CHANNEL:-}
-      # 推送时间窗口
-      - PUSH_WINDOW_ENABLED=${PUSH_WINDOW_ENABLED:-}
-      - PUSH_WINDOW_START=${PUSH_WINDOW_START:-}
-      - PUSH_WINDOW_END=${PUSH_WINDOW_END:-}
-      - PUSH_WINDOW_ONCE_PER_DAY=${PUSH_WINDOW_ONCE_PER_DAY:-}
       # 通知渠道
       - FEISHU_WEBHOOK_URL=${FEISHU_WEBHOOK_URL:-}
       - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}
@@ -51,23 +42,14 @@ services:
       - BARK_URL=${BARK_URL:-}
       # Slack配置
       - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL:-}
-      # 存储配置
-      - STORAGE_BACKEND=${STORAGE_BACKEND:-auto}
-      - LOCAL_RETENTION_DAYS=${LOCAL_RETENTION_DAYS:-0}
-      - REMOTE_RETENTION_DAYS=${REMOTE_RETENTION_DAYS:-0}
-      - STORAGE_TXT_ENABLED=${STORAGE_TXT_ENABLED:-true}
-      - STORAGE_HTML_ENABLED=${STORAGE_HTML_ENABLED:-true}
       # 远程存储配置(S3 兼容协议)
       - S3_ENDPOINT_URL=${S3_ENDPOINT_URL:-}
       - S3_BUCKET_NAME=${S3_BUCKET_NAME:-}
       - S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID:-}
       - S3_SECRET_ACCESS_KEY=${S3_SECRET_ACCESS_KEY:-}
       - S3_REGION=${S3_REGION:-}
-      # 数据拉取配置
-      - PULL_ENABLED=${PULL_ENABLED:-false}
-      - PULL_DAYS=${PULL_DAYS:-7}
       # 运行模式
-      - CRON_SCHEDULE=${CRON_SCHEDULE:-*/5 * * * *}
+      - CRON_SCHEDULE=${CRON_SCHEDULE:-*/30 * * * *}
       - RUN_MODE=${RUN_MODE:-cron}
       - IMMEDIATE_RUN=${IMMEDIATE_RUN:-true}
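
Both compose files rely on the `${NAME:-default}` interpolation form, which substitutes the default when the variable is unset or empty. A small Python emulation of that rule, to show why an empty entry such as `REPORT_MODE=` in `.env` reaches the container as an empty string while a missing `CRON_SCHEDULE` becomes `*/30 * * * *`. The function `compose_default` and the sample `dotenv` mapping are illustrative only.

```python
from typing import Mapping


def compose_default(env: Mapping[str, str], name: str, default: str = "") -> str:
    """Emulate ${NAME:-default}: use default when NAME is unset or empty."""
    value = env.get(name, "")
    return value if value != "" else default


# Hypothetical .env content: one value set, one left empty, one missing.
dotenv = {"IMMEDIATE_RUN": "true", "REPORT_MODE": ""}

print(compose_default(dotenv, "CRON_SCHEDULE", "*/30 * * * *"))
print(compose_default(dotenv, "IMMEDIATE_RUN", "true"))
print(repr(compose_default(dotenv, "REPORT_MODE")))
```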
 

+ 14 - 10
docker/manage.py

@@ -275,28 +275,32 @@ def show_config():
     print("⚙️ 当前配置:")
 
     env_vars = [
+        # 运行配置
         "CRON_SCHEDULE",
         "RUN_MODE",
         "IMMEDIATE_RUN",
+        # 核心配置
+        "ENABLE_CRAWLER",
+        "ENABLE_NOTIFICATION",
+        "REPORT_MODE",
+        "DISPLAY_MODE",
+        # 通知渠道
         "FEISHU_WEBHOOK_URL",
         "DINGTALK_WEBHOOK_URL",
         "WEWORK_WEBHOOK_URL",
+        "WEWORK_MSG_TYPE",
         "TELEGRAM_BOT_TOKEN",
         "TELEGRAM_CHAT_ID",
-        "CONFIG_PATH",
-        "FREQUENCY_WORDS_PATH",
-        # 存储配置
-        "STORAGE_BACKEND",
-        "LOCAL_RETENTION_DAYS",
-        "REMOTE_RETENTION_DAYS",
-        "STORAGE_TXT_ENABLED",
-        "STORAGE_HTML_ENABLED",
+        "NTFY_SERVER_URL",
+        "NTFY_TOPIC",
+        "NTFY_TOKEN",
+        "BARK_URL",
+        "SLACK_WEBHOOK_URL",
+        # 远程存储配置
         "S3_BUCKET_NAME",
         "S3_ACCESS_KEY_ID",
         "S3_ENDPOINT_URL",
         "S3_REGION",
-        "PULL_ENABLED",
-        "PULL_DAYS",
     ]
 
     for var in env_vars:

+ 1 - 1
mcp_server/__init__.py

@@ -5,4 +5,4 @@ TrendRadar MCP Server
 
 """
 
-__version__ = "2.0.0"
+__version__ = "2.0.1"

+ 16 - 44
mcp_server/services/parser_service.py

@@ -373,6 +373,13 @@ class ParserService:
         """
         解析关键词配置文件
 
+        复用 trendradar.core.frequency 的解析逻辑,支持:
+        - 空行分隔词组
+        - +前缀必须词、!前缀过滤词、@数量限制
+        - /pattern/ 正则表达式语法
+        - => 备注 显示名称语法
+        - [GLOBAL_FILTER] 全局过滤区域
+
         Args:
             words_file: 关键词文件路径,默认为 config/frequency_words.txt
 
@@ -382,55 +389,20 @@ class ParserService:
         Raises:
             FileParseError: 文件解析错误
         """
+        from trendradar.core.frequency import load_frequency_words
+
         if words_file is None:
-            words_file = self.project_root / "config" / "frequency_words.txt"
+            words_file = str(self.project_root / "config" / "frequency_words.txt")
         else:
-            words_file = Path(words_file)
-
-        if not words_file.exists():
-            return []
-
-        word_groups = []
+            words_file = str(words_file)
 
         try:
-            with open(words_file, "r", encoding="utf-8") as f:
-                for line in f:
-                    line = line.strip()
-                    if not line or line.startswith("#"):
-                        continue
-
-                    parts = [p.strip() for p in line.split("|")]
-                    if not parts:
-                        continue
-
-                    group = {
-                        "required": [],
-                        "normal": [],
-                        "filter_words": []
-                    }
-
-                    for part in parts:
-                        if not part:
-                            continue
-
-                        words = [w.strip() for w in part.split(",")]
-                        for word in words:
-                            if not word:
-                                continue
-                            if word.endswith("+"):
-                                group["required"].append(word[:-1])
-                            elif word.endswith("!"):
-                                group["filter_words"].append(word[:-1])
-                            else:
-                                group["normal"].append(word)
-
-                    if group["required"] or group["normal"]:
-                        word_groups.append(group)
-
+            word_groups, filter_words, global_filters = load_frequency_words(words_file)
+            return word_groups
+        except FileNotFoundError:
+            return []
         except Exception as e:
-            raise FileParseError(str(words_file), str(e))
-
-        return word_groups
+            raise FileParseError(words_file, str(e))
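
The group semantics that the docstring lists (required words, normal words, filter words) can be sketched as below. The dictionary shape with `required`, `normal`, and `filter_words` keys follows the parser code removed in this hunk. Whether `load_frequency_words` returns the same shape is an assumption, and `title_matches_group` is a hypothetical helper that uses plain substring tests only.

```python
from typing import Dict, List


def title_matches_group(title: str, group: Dict[str, List[str]]) -> bool:
    """Apply filter (none), required (all), and normal (any) word rules."""
    title_lower = title.lower()

    def contains(word: str) -> bool:
        return word.lower() in title_lower

    # Filter words: exclude the title as soon as one of them appears.
    if any(contains(w) for w in group.get("filter_words", [])):
        return False
    # Required words: every one of them must appear.
    required = group.get("required", [])
    if required and not all(contains(w) for w in required):
        return False
    # Normal words: at least one must appear, when any are configured.
    normal = group.get("normal", [])
    if normal and not any(contains(w) for w in normal):
        return False
    return True


group = {
    "required": ["release"],
    "normal": ["huawei", "harmonyos"],
    "filter_words": ["advertisement"],
}
print(title_matches_group("Huawei release event", group))
print(title_matches_group("Huawei release advertisement", group))
```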
 
     def get_available_dates(self, db_type: str = "news") -> List[str]:
         """

+ 6 - 1
mcp_server/tools/data_query.py

@@ -14,7 +14,8 @@ from ..utils.validators import (
     validate_date_range,
     validate_top_n,
     validate_mode,
-    validate_date_query
+    validate_date_query,
+    normalize_date_range
 )
 from ..utils.errors import MCPError
 
@@ -263,6 +264,10 @@ class DataQueryTools:
             # 参数验证 - 默认今天
             if date_range is None:
                 date_range = "今天"
+
+            # 规范化 date_range(处理 JSON 字符串序列化问题)
+            date_range = normalize_date_range(date_range)
+
             # 处理 date_range:支持字符串或对象
             if isinstance(date_range, dict):
                 # 范围对象,取 start 日期

+ 5 - 2
mcp_server/tools/search_tools.py

@@ -11,7 +11,7 @@ from difflib import SequenceMatcher
 from typing import Dict, List, Optional, Tuple, Union
 
 from ..services.data_service import DataService
-from ..utils.validators import validate_keyword, validate_limit, validate_threshold
+from ..utils.validators import validate_keyword, validate_limit, validate_threshold, normalize_date_range
 from ..utils.errors import MCPError, InvalidParameterError, DataNotFoundError
 
 
@@ -780,7 +780,10 @@ class SearchTools:
 
             # 确定日期范围
             today = datetime.now()
-            
+
+            # 规范化 date_range(处理 JSON 字符串序列化问题)
+            date_range = normalize_date_range(date_range)
+
             if date_range is None or date_range == "today":
                 # 只查询今天
                 search_dates = [today]

+ 47 - 0
mcp_server/utils/validators.py

@@ -295,6 +295,53 @@ def validate_date(date_str: str) -> datetime:
         )
 
 
+def normalize_date_range(date_range: Optional[Union[dict, str]]) -> Optional[Union[dict, str]]:
+    """
+    规范化 date_range 参数
+
+    某些 MCP 客户端(特别是 HTTP 方式)会将 JSON 对象序列化为字符串传入。
+    此函数尝试将 JSON 字符串解析为 dict,如果不是 JSON 格式则保持原样。
+
+    Args:
+        date_range: 日期范围,可能是:
+            - dict: {"start": "2025-01-01", "end": "2025-01-07"}
+            - JSON 字符串: '{"start": "2025-01-01", "end": "2025-01-07"}'
+            - 普通字符串: "今天", "昨天", "2025-01-01"
+            - None
+
+    Returns:
+        规范化后的 date_range(dict 或普通字符串)
+
+    Examples:
+        >>> normalize_date_range('{"start":"2025-01-01","end":"2025-01-07"}')
+        {"start": "2025-01-01", "end": "2025-01-07"}
+        >>> normalize_date_range("今天")
+        "今天"
+        >>> normalize_date_range({"start": "2025-01-01", "end": "2025-01-07"})
+        {"start": "2025-01-01", "end": "2025-01-07"}
+    """
+    if date_range is None:
+        return None
+
+    # Already a dict: return it unchanged
+    if isinstance(date_range, dict):
+        return date_range
+
+    # If it is a string, try to parse it as JSON
+    if isinstance(date_range, str):
+        # Only attempt parsing when it looks like a JSON object
+        stripped = date_range.strip()
+        if stripped.startswith('{') and stripped.endswith('}'):
+            try:
+                parsed = json.loads(stripped)
+                if isinstance(parsed, dict):
+                    return parsed
+            except json.JSONDecodeError:
+                pass  # Parsing failed; treat it as a plain string
+
+    return date_range
+
+
 def validate_date_range(date_range: Optional[Union[dict, str]]) -> Optional[tuple]:
     """
     Validate the date range
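
A minimal standalone sketch of the normalization behavior described in the hunk above, for readers who want to try it outside the MCP server. It re-implements the helper with an explicit `import json`, because the diff does not show whether `validators.py` already imports that module; the sample inputs are illustrative only.

```python
import json
from typing import Optional, Union


def normalize_date_range(
    date_range: Optional[Union[dict, str]]
) -> Optional[Union[dict, str]]:
    """Illustrative copy: parse JSON-object strings, pass everything else through."""
    if isinstance(date_range, str):
        stripped = date_range.strip()
        # Only strings that look like a JSON object are handed to the parser.
        if stripped.startswith("{") and stripped.endswith("}"):
            try:
                parsed = json.loads(stripped)
            except json.JSONDecodeError:
                return date_range  # not valid JSON: keep the original string
            if isinstance(parsed, dict):
                return parsed
    # None, dicts, and plain strings are returned unchanged.
    return date_range


as_dict = normalize_date_range('{"start": "2025-01-01", "end": "2025-01-07"}')
plain = normalize_date_range("today")
broken = normalize_date_range("{not json}")
```

Malformed input such as `"{not json}"` is deliberately returned untouched, so any later validation step still sees the original value and can report it.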

+ 1 - 1
trendradar/__init__.py

@@ -9,5 +9,5 @@ TrendRadar - trending news aggregation and analysis tool
 
 from trendradar.context import AppContext
 
-__version__ = "4.6.0"
+__version__ = "4.7.0"
 __all__ = ["AppContext", "__version__"]

+ 27 - 15
trendradar/core/analyzer.py

@@ -10,7 +10,7 @@
 
 from typing import Dict, List, Tuple, Optional, Callable
 
-from trendradar.core.frequency import matches_word_groups
+from trendradar.core.frequency import matches_word_groups, _word_matches
 
 
 def calculate_news_weight(
@@ -262,19 +262,19 @@ def count_word_frequency(
                     if source_id not in word_stats[group_key]["titles"]:
                         word_stats[group_key]["titles"][source_id] = []
                 else:
-                    # Original matching logic
+                    # Original matching logic (now supports regex syntax)
                     if required_words:
                         all_required_present = all(
-                            req_word.lower() in title_lower
-                            for req_word in required_words
+                            _word_matches(req_item, title_lower)
+                            for req_item in required_words
                         )
                         if not all_required_present:
                             continue
 
                     if normal_words:
                         any_normal_present = any(
-                            normal_word.lower() in title_lower
-                            for normal_word in normal_words
+                            _word_matches(normal_item, title_lower)
+                            for normal_item in normal_words
                         )
                         if not any_normal_present:
                             continue
@@ -415,13 +415,16 @@ def count_word_frequency(
                 )
 
     stats = []
-    # Map group_key to its position and max count
+    # Map group_key to its position, max count, and display name
     group_key_to_position = {
         group["group_key"]: idx for idx, group in enumerate(word_groups)
     }
     group_key_to_max_count = {
         group["group_key"]: group.get("max_count", 0) for group in word_groups
     }
+    group_key_to_display_name = {
+        group["group_key"]: group.get("display_name") for group in word_groups
+    }
 
     for group_key, data in word_stats.items():
         all_titles = []
@@ -447,9 +450,12 @@ def count_word_frequency(
         if group_max_count > 0:
             sorted_titles = sorted_titles[:group_max_count]
 
+        # Prefer display_name; fall back to group_key
+        display_word = group_key_to_display_name.get(group_key) or group_key
+
         stats.append(
             {
-                "word": group_key,
+                "word": display_word,
                 "count": data["count"],
                 "position": group_key_to_position.get(group_key, 999),
                 "titles": sorted_titles,
@@ -596,20 +602,20 @@ def count_rss_frequency(
             if len(word_groups) == 1 and word_groups[0]["group_key"] == "全部 RSS":
                 matched = True
             else:
-                # Check required words
+                # Check required words (supports regex syntax)
                 if required_words:
                     all_required_present = all(
-                        req_word.lower() in title_lower
-                        for req_word in required_words
+                        _word_matches(req_item, title_lower)
+                        for req_item in required_words
                     )
                     if not all_required_present:
                         continue
 
-                # Check normal words
+                # Check normal words (supports regex syntax)
                 if normal_words:
                     any_normal_present = any(
-                        normal_word.lower() in title_lower
-                        for normal_word in normal_words
+                        _word_matches(normal_item, title_lower)
+                        for normal_item in normal_words
                     )
                     if not any_normal_present:
                         continue
@@ -651,6 +657,9 @@ def count_rss_frequency(
     group_key_to_max_count = {
         group["group_key"]: group.get("max_count", 0) for group in word_groups
     }
+    group_key_to_display_name = {
+        group["group_key"]: group.get("display_name") for group in word_groups
+    }
 
     for group_key, data in word_stats.items():
         if data["count"] == 0:
@@ -669,8 +678,11 @@ def count_rss_frequency(
         if group_max_count > 0:
             sorted_titles = sorted_titles[:group_max_count]
 
+        # Prefer display_name; fall back to group_key
+        display_word = group_key_to_display_name.get(group_key) or group_key
+
         stats.append({
-            "word": group_key,
+            "word": display_word,
             "count": data["count"],
             "position": group_key_to_position.get(group_key, 999),
             "titles": sorted_titles,
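
The display-name fallback used in both `count_word_frequency` and `count_rss_frequency` can be sketched in isolation as follows. The helper name `resolve_display_word` and the sample groups are hypothetical and only illustrate the `display_name or group_key` rule from the hunks above.

```python
from typing import Dict, List, Optional


def resolve_display_word(
    group_key: str, display_names: Dict[str, Optional[str]]
) -> str:
    """Return the alias for a group if one is set, otherwise the group key."""
    # `or` also covers an empty string, not just None or a missing key.
    return display_names.get(group_key) or group_key


# Hypothetical word groups shaped like the dicts built by load_frequency_words.
word_groups: List[Dict] = [
    {"group_key": r"\bai\b", "display_name": "AI Related"},
    {"group_key": "python", "display_name": None},
]

group_key_to_display_name = {
    group["group_key"]: group.get("display_name") for group in word_groups
}

labels = [
    resolve_display_word(group["group_key"], group_key_to_display_name)
    for group in word_groups
]
```

Here `labels` ends up as `["AI Related", "python"]`: the regex group is shown under its alias, while the plain group keeps its key.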

+ 102 - 14
trendradar/core/frequency.py

@@ -8,11 +8,84 @@
 - Filter words (! prefix)
 - Global filter words ([GLOBAL_FILTER] section)
 - Maximum display count (@ prefix)
+- Regular expressions (/pattern/ syntax)
+- Display names (=> alias syntax)
 """
 
 import os
+import re
 from pathlib import Path
-from typing import Dict, List, Tuple, Optional
+from typing import Dict, List, Tuple, Optional, Union
+
+
+def _parse_word(word: str) -> Dict:
+    """
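+    Parse a single word, detecting regex syntax and an optional display name.
+
+    Syntax:
+    - Plain word: word
+    - Regular expression: /pattern/ or /pattern/i (flags are ignored; case-insensitive matching is always on)
+    - With a display name: word => display name, or word=>display name (spaces around => are optional)
+    - Regex with a display name: /pattern/ => display name
+
+    Args:
+        word: The raw word
+
+    Returns:
+        {"word": str, "is_regex": bool, "pattern": Optional[re.Pattern], "display_name": Optional[str]}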
+    """
+    display_name = None
+
+    # Parse the "=> display name" syntax (spaces around => are optional)
+    # A regex is used so that surrounding whitespace is consumed with the arrow
+    display_match = re.search(r'\s*=>\s*', word)
+    if display_match:
+        parts = re.split(r'\s*=>\s*', word, maxsplit=1)
+        word = parts[0].strip()
+        display_name = parts[1].strip() if len(parts) > 1 and parts[1].strip() else None
+
+    # Parse regex syntax: /pattern/ or /pattern/flags (e.g. /pattern/i)
+    # Flags are ignored because IGNORECASE is always enabled
+    regex_match = re.match(r'^/(.+)/([gimsux]*)$', word)
+    if regex_match:
+        pattern_str = regex_match.group(1)
+        # The flags group is discarded; IGNORECASE is applied uniformly
+        try:
+            pattern = re.compile(pattern_str, re.IGNORECASE)
+            return {
+                "word": pattern_str,
+                "is_regex": True,
+                "pattern": pattern,
+                "display_name": display_name,
+            }
+        except re.error:
+            # Invalid regex: fall through and treat it as a plain word
+            pass
+
+    return {"word": word, "is_regex": False, "pattern": None, "display_name": display_name}
+
+
+def _word_matches(word_config: Union[str, Dict], title_lower: str) -> bool:
+    """
+    Check whether a word matches the title.
+
+    Args:
+        word_config: Word config (a plain string or a parsed dict)
+        title_lower: The lowercased title
+
+    Returns:
+        True if the word matches
+    """
+    if isinstance(word_config, str):
+        # Backward compatibility: plain string
+        return word_config.lower() in title_lower
+
+    if word_config.get("is_regex") and word_config.get("pattern"):
+        # Regex match
+        return bool(word_config["pattern"].search(title_lower))
+    else:
+        # Substring match
+        return word_config["word"].lower() in title_lower
 
 
 def load_frequency_words(
@@ -104,24 +177,38 @@ def load_frequency_words(
                 except (ValueError, IndexError):
                     pass  # 忽略无效的@数字格式
             elif word.startswith("!"):
-                filter_words.append(word[1:])
-                group_filter_words.append(word[1:])
+                # Filter word (supports regex syntax)
+                filter_word = word[1:]
+                parsed = _parse_word(filter_word)
+                filter_words.append(parsed)
+                group_filter_words.append(parsed)
             elif word.startswith("+"):
-                group_required_words.append(word[1:])
+                # Required word (supports regex syntax)
+                req_word = word[1:]
+                group_required_words.append(_parse_word(req_word))
             else:
-                group_normal_words.append(word)
+                # Normal word (supports regex syntax)
+                group_normal_words.append(_parse_word(word))
 
         if group_required_words or group_normal_words:
             if group_normal_words:
-                group_key = " ".join(group_normal_words)
+                group_key = " ".join(w["word"] for w in group_normal_words)
             else:
-                group_key = " ".join(group_required_words)
+                group_key = " ".join(w["word"] for w in group_required_words)
+
+            # Pick the display name: use the first word that defines one
+            display_name = None
+            for w in group_normal_words + group_required_words:
+                if w.get("display_name"):
+                    display_name = w["display_name"]
+                    break
 
             processed_groups.append(
                 {
                     "required": group_required_words,
                     "normal": group_normal_words,
                     "group_key": group_key,
+                    "display_name": display_name,  # may be None
                     "max_count": group_max_count,
                 }
             )
@@ -132,7 +219,7 @@ def load_frequency_words(
 def matches_word_groups(
     title: str,
     word_groups: List[Dict],
-    filter_words: List[str],
+    filter_words: List[Union[str, Dict]],
     global_filters: Optional[List[str]] = None
 ) -> bool:
     """
@@ -141,7 +228,7 @@ def matches_word_groups(
     Args:
         title: 标题文本
         word_groups: 词组列表
-        filter_words: 过滤词列表
+        filter_words: 过滤词列表(可以是字符串列表或字典列表)
         global_filters: 全局过滤词列表
 
     Returns:
@@ -164,9 +251,10 @@ def matches_word_groups(
     if not word_groups:
         return True
 
-    # Filter-word check
-    if any(filter_word.lower() in title_lower for filter_word in filter_words):
-        return False
+    # Filter-word check (compatible with both the old and new formats)
+    for filter_item in filter_words:
+        if _word_matches(filter_item, title_lower):
+            return False
 
     # Word-group matching check
     for group in word_groups:
@@ -176,7 +264,7 @@ def matches_word_groups(
         # Required-word check
         if required_words:
             all_required_present = all(
-                req_word.lower() in title_lower for req_word in required_words
+                _word_matches(req_item, title_lower) for req_item in required_words
             )
             if not all_required_present:
                 continue
@@ -184,7 +272,7 @@ def matches_word_groups(
         # Normal-word check
         if normal_words:
             any_normal_present = any(
-                normal_word.lower() in title_lower for normal_word in normal_words
+                _word_matches(normal_item, title_lower) for normal_item in normal_words
             )
             if not any_normal_present:
                 continue
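
To illustrate why the `/pattern/` syntax matters, the sketch below contrasts plain substring matching with a word-boundary regex, and shows how the `=> alias` suffix is split off. The functions `parse_word` and `word_matches` are simplified, hypothetical stand-ins for `_parse_word` and `_word_matches` from the diff, not the project's exact code.

```python
import re
from typing import Dict, Optional


def parse_word(raw: str) -> Dict:
    """Split off an optional '=> alias' and detect /pattern/ regex syntax."""
    display_name: Optional[str] = None
    parts = re.split(r"\s*=>\s*", raw, maxsplit=1)
    word = parts[0].strip()
    if len(parts) > 1 and parts[1].strip():
        display_name = parts[1].strip()

    regex_match = re.match(r"^/(.+)/([gimsux]*)$", word)
    if regex_match:
        try:
            pattern = re.compile(regex_match.group(1), re.IGNORECASE)
            return {"word": regex_match.group(1), "is_regex": True,
                    "pattern": pattern, "display_name": display_name}
        except re.error:
            pass  # invalid regex: fall back to substring matching
    return {"word": word, "is_regex": False,
            "pattern": None, "display_name": display_name}


def word_matches(config: Dict, title_lower: str) -> bool:
    """Regex search for regex words, substring test for plain words."""
    if config["is_regex"] and config["pattern"] is not None:
        return bool(config["pattern"].search(title_lower))
    return config["word"].lower() in title_lower


plain = parse_word("ai")
bounded = parse_word(r"/\bai\b/ => AI Related")

# Substring matching fires on "training"; the bounded regex does not.
plain_hit = word_matches(plain, "training schedule update")
bounded_hit = word_matches(bounded, "training schedule update")
bounded_real_hit = word_matches(bounded, "new ai model released")
```

With these inputs the plain word `ai` matches the unrelated title through `training`, while the `\bai\b` pattern only matches the title that contains `ai` as a standalone word, and `bounded["display_name"]` carries the alias `AI Related`.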

+ 2 - 0
trendradar/notification/senders.py

@@ -55,6 +55,8 @@ SMTP_CONFIGS = {
     "aliyun.com": {"server": "smtp.aliyun.com", "port": 465, "encryption": "TLS"},
     # Yandex Mail (uses TLS)
     "yandex.com": {"server": "smtp.yandex.com", "port": 465, "encryption": "TLS"},
+    # iCloud Mail (uses SSL)
+    "icloud.com": {"server": "smtp.mail.me.com", "port": 587, "encryption": "SSL"},
 }
 
 

+ 55 - 40
trendradar/report/html.py

@@ -844,66 +844,81 @@ def render_html_content(
                 </div>"""
 
     # Generate the RSS stats content
-    def render_rss_stats_html(items: List[Dict], title: str = "RSS 订阅更新") -> str:
-        if not items:
+    def render_rss_stats_html(stats: List[Dict], title: str = "RSS 订阅更新") -> str:
+        """Render the HTML for the RSS stats section.
+
+        Args:
+            stats: List of grouped RSS stats, in the same format as the trending list:
+                [
+                    {
+                        "word": "keyword",
+                        "count": 5,
+                        "titles": [
+                            {
+                                "title": "Title",
+                                "source_name": "Feed name",
+                                "time_display": "12-29 08:20",
+                                "url": "...",
+                                "is_new": True/False
+                            }
+                        ]
+                    }
+                ]
+            title: Section title
+
+        Returns:
+            The rendered HTML string
+        """
+        if not stats:
             return ""
 
-        rss_html = ""
-        rss_count = len(items)
-        rss_html += f"""
+        # Compute the total number of items
+        total_count = sum(stat.get("count", 0) for stat in stats)
+        if total_count == 0:
+            return ""
+
+        rss_html = f"""
                 <div class="rss-section">
                     <div class="rss-section-header">
                         <div class="rss-section-title">{title}</div>
-                        <div class="rss-section-count">{rss_count} 条</div>
+                        <div class="rss-section-count">{total_count} 条</div>
                     </div>"""
 
-        # Group by feed_id
-        feeds_grouped = {}
-        for item in items:
-            feed_id = item.get("feed_id", "unknown")
-            if feed_id not in feeds_grouped:
-                feeds_grouped[feed_id] = {
-                    "name": item.get("feed_name", feed_id),
-                    "items": []
-                }
-            feeds_grouped[feed_id]["items"].append(item)
+        # Render grouped by keyword (same format as the trending list)
+        for stat in stats:
+            keyword = stat.get("word", "")
+            titles = stat.get("titles", [])
+            if not titles:
+                continue
 
-        # Render each feed group
-        for feed_id, feed_data in feeds_grouped.items():
-            feed_name = feed_data["name"]
-            feed_items = feed_data["items"]
-            feed_item_count = len(feed_items)
+            keyword_count = len(titles)
 
             rss_html += f"""
                     <div class="feed-group">
                         <div class="feed-header">
-                            <div class="feed-name">{html_escape(feed_name)}</div>
-                            <div class="feed-count">{feed_item_count} 条</div>
+                            <div class="feed-name">{html_escape(keyword)}</div>
+                            <div class="feed-count">{keyword_count} 条</div>
                         </div>"""
 
-            for item in feed_items:
-                item_title = item.get("title", "")
-                url = item.get("url", "")
-                published_at = item.get("published_at", "")
-                author = item.get("author", "")
-
-                # Format the publish time
-                time_str = ""
-                if published_at:
-                    if isinstance(published_at, datetime):
-                        time_str = published_at.strftime("%m-%d %H:%M")
-                    else:
-                        time_str = str(published_at)[:16] if len(str(published_at)) > 16 else str(published_at)
+            for title_data in titles:
+                item_title = title_data.get("title", "")
+                url = title_data.get("url", "")
+                time_display = title_data.get("time_display", "")
+                source_name = title_data.get("source_name", "")
+                is_new = title_data.get("is_new", False)
 
                 rss_html += """
                         <div class="rss-item">
                             <div class="rss-meta">"""
 
-                if time_str:
-                    rss_html += f'<span class="rss-time">{html_escape(time_str)}</span>'
+                if time_display:
+                    rss_html += f'<span class="rss-time">{html_escape(time_display)}</span>'
+
+                if source_name:
+                    rss_html += f'<span class="rss-author">{html_escape(source_name)}</span>'
 
-                if author:
-                    rss_html += f'<span class="rss-author">by {html_escape(author)}</span>'
+                if is_new:
+                    rss_html += '<span class="rss-author" style="color: #dc2626;">NEW</span>'
 
                 rss_html += """
                             </div>
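
The new `render_rss_stats_html` takes keyword-grouped stats instead of a flat item list. The sketch below, with a hypothetical helper `summarize_rss_stats` and made-up sample data, shows how the two counters in the hunk above are derived: the section header sums each group's `count`, while each keyword header uses `len(titles)`. Assuming `titles` may already have been truncated upstream (for example by a `max_count` limit, as the analyzer hunks suggest), the two numbers can differ.

```python
from typing import Dict, List


def summarize_rss_stats(stats: List[Dict]) -> Dict:
    """Compute the section total and per-keyword counts the same way the renderer does."""
    # Section header: sum of the "count" field of every group.
    total_count = sum(stat.get("count", 0) for stat in stats)
    # Keyword headers: number of titles actually present; empty groups are skipped.
    per_keyword = {
        stat.get("word", ""): len(stat.get("titles", []))
        for stat in stats
        if stat.get("titles")
    }
    return {"total_count": total_count, "per_keyword": per_keyword}


# Made-up sample: the first group matched 3 items but only 2 titles survived truncation.
sample_stats = [
    {"word": "AI Related", "count": 3,
     "titles": [{"title": "A", "is_new": True}, {"title": "B", "is_new": False}]},
    {"word": "python", "count": 1, "titles": [{"title": "C", "is_new": False}]},
    {"word": "empty", "count": 0, "titles": []},
]

summary = summarize_rss_stats(sample_stats)
```

For this sample the section total is 4 while the keyword headers add up to 3, which is worth keeping in mind when comparing the header badge with the number of rendered rows.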

+ 1 - 1
version

@@ -1 +1 @@
-4.6.0
+4.7.0