Huginn - An Information Aggregation Tool
January 19, 2019

Overview

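Huginn is a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn is a great fit for heavy RSS "users" like me.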

Deployment

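  1. Initialize
#!/bin/bash
# Database container (PostgreSQL 11.1). /path/of/dir is a placeholder for a
# host directory; bind-mounting it keeps the data outside the container.
docker run --name huginn_db \
    -v /path/of/dir:/var/lib/postgresql/data \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_USER=huginn \
    -d postgres:11.1

# Huginn web container, linked to the database under the alias "postgres"
# and started in the background (-d) like the database.
docker run --name huginn_web \
    --link huginn_db:postgres \
    -p 3000:3000 \
    -e HUGINN_DATABASE_USERNAME=huginn \
    -e HUGINN_DATABASE_PASSWORD=mysecretpassword \
    -e HUGINN_DATABASE_ADAPTER=postgresql \
    -d huginn/huginn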

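Huginn and the database run in two separate containers, and the database files are kept on the host system, outside Docker (the directory mounted with -v).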

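Parameter                     Meaning
-p 3000:3000                  Maps port 3000 in the container to port 3000 on the host
-e HUGINN_TIMEZONE=Beijing    Sets Huginn's time zone to Beijing (the value Shanghai does not work); optional, add it to the huginn_web command if needed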
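The time-zone flag from the table is not part of the step 1 script. One way to add it only when it is wanted is to collect the docker run options in a bash array and append the flag conditionally. This is a sketch under assumptions: the function name start_huginn_web and the variable TZ_NAME are hypothetical (TZ_NAME rather than TZ, so the system-wide TZ variable is left alone), and the huginn_db container from step 1 is assumed to be running already.

```bash
#!/bin/bash
# Hypothetical helper: start_huginn_web and TZ_NAME are names made up for
# this sketch. The flags mirror the huginn_web command from step 1.
start_huginn_web() {
  local args=(
    --name huginn_web
    --link huginn_db:postgres
    -p 3000:3000
    -e HUGINN_DATABASE_USERNAME=huginn
    -e HUGINN_DATABASE_PASSWORD=mysecretpassword
    -e HUGINN_DATABASE_ADAPTER=postgresql
  )
  # Optional time zone: use "Beijing"; per the table, "Shanghai" does not work
  if [[ -n "${TZ_NAME:-}" ]]; then
    args+=(-e "HUGINN_TIMEZONE=${TZ_NAME}")
  fi
  # -d runs the container in the background; the image name goes last
  docker run -d "${args[@]}" huginn/huginn
}

# Usage: TZ_NAME=Beijing start_huginn_web
```

Keeping the options in an array means each flag stays a separate word even if a value contains spaces, and optional flags can be appended without editing a long backslash-continued command.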
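  2. Run
# Start (database first, then the web app)
docker start huginn_db && docker start huginn_web

# Stop (web app first, so it does not lose its database while still running)
docker stop huginn_web && docker stop huginn_db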
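Because the two containers are always started and stopped together, the commands can be wrapped in one small function. This is a hypothetical convenience helper (huginnctl is not part of Huginn or Docker); it starts the database before the web app and stops the web app before the database.

```bash
#!/bin/bash
# Hypothetical wrapper around the start/stop commands for the two containers.
huginnctl() {
  case "$1" in
    start)   docker start huginn_db && docker start huginn_web ;;   # database first
    stop)    docker stop huginn_web && docker stop huginn_db ;;     # web app first
    restart) huginnctl stop && huginnctl start ;;
    *)
      echo "usage: huginnctl {start|stop|restart}" >&2
      return 2
      ;;
  esac
}

# Usage: huginnctl start
```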
  3. Access (open port 3000 of the host in a browser; replace example.com with your server's address)
google-chrome example.com:3000
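Huginn can take a while after docker start before the page loads. A minimal polling sketch with curl, assuming curl is installed on the machine you check from; the helper name wait_for_huginn, its default URL http://localhost:3000, and the retry settings are assumptions to adapt.

```bash
#!/bin/bash
# Hypothetical helper: poll Huginn's web UI until it answers.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_huginn() {
  local url="${1:-http://localhost:3000}"
  local tries="${2:-60}"
  local delay="${3:-5}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # -f: fail on HTTP errors (4xx/5xx), -s: no progress output
    if curl -fs -o /dev/null "$url"; then
      echo "Huginn is up at $url"
      return 0
    fi
    # Wait before the next attempt, but not after the last one
    if ((i < tries)); then
      sleep "$delay"
    fi
  done
  echo "Huginn did not respond at $url after $tries attempts" >&2
  return 1
}

# Usage: wait_for_huginn http://example.com:3000 && google-chrome example.com:3000
```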

Configuration

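Suppose I want an RSS feed of the articles on site A:

  • Huginn uses a scenario as the unit that groups everything needed for a goal like this; inside a scenario the work is split into several agents, each performing a single operation, and together they produce the result.
  • Logical flow: fetch site A's article list -> fetch each article's body -> output RSS.
  • Three agents are enough for this; each one passes the information the next one needs as fields of the events it emits (for example url, title, content).
  • Pick the agent type for each step: the first two steps use the Website Agent, the last one uses the Data Output Agent.
  • The mode option decides what gets passed on: an agent in on_change mode generates an event only when the extracted content differs from what it produced before, and passes it to the next agent; an agent in all mode generates an event every time and passes it on unconditionally.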
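To make the difference between the two modes concrete, here is a simplified bash model, not Huginn's actual implementation: each "run" reads extracted items on stdin, one per line, and prints the events it would emit. A plain text file stands in for the events Huginn has already stored for the agent (the function emit_events and the seen file are hypothetical). In the scenario, the list agent runs in all mode and passes every link on each run, while the body agent in on_change mode only lets new or changed articles through to the RSS output.

```bash
#!/bin/bash
# Simplified model of the WebsiteAgent modes, NOT Huginn's implementation.
# Reads one extracted item per line on stdin and prints the events that
# would be emitted. seen_file is a hypothetical stand-in for the events
# Huginn has already stored for the agent.
emit_events() {
  local mode="$1" seen_file="$2" item
  touch "$seen_file"
  while IFS= read -r item; do
    case "$mode" in
      all)
        # all: every extracted item becomes an event
        printf '%s\n' "$item"
        ;;
      on_change)
        # on_change: only items not seen before become events
        if ! grep -Fxq -- "$item" "$seen_file"; then
          printf '%s\n' "$item" >> "$seen_file"
          printf '%s\n' "$item"
        fi
        ;;
      *)
        echo "unknown mode: $mode" >&2
        return 2
        ;;
    esac
  done
}

# Usage:
#   seen=$(mktemp)
#   printf '%s\n' post-1 post-2 | emit_events on_change "$seen"  # post-1, post-2
#   printf '%s\n' post-2 post-3 | emit_events on_change "$seen"  # post-3
#   printf '%s\n' post-2 post-3 | emit_events all "$seen"        # post-2, post-3
```

The state lives in a file rather than a shell variable so that it survives between runs even when emit_events sits at the end of a pipeline, which bash executes in a subshell.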

The scenario source file below can be imported directly into Huginn:

{
  "schema_version": 1,
  "name": "Scenario name",
  "description": "Scenario description",
  "source_url": false,
  "guid": "1021ada18740884e0e4c9ad2cf563dfd",
  "tag_fg_color": "#ffffff",
  "tag_bg_color": "#0061ff",
  "icon": "gear",
  "exported_at": "2019-01-25T16:38:28Z",
  "agents": [
    {
      "type": "Agents::WebsiteAgent",
      "name": "Fetch article body",
      "disabled": false,
      "guid": "440e1095b040e6b572d399c15e9f5667",
      "options": {
        "expected_update_period_in_days": "2",
        "url": "{{url}}",
        "type": "html",
        "mode": "on_change",
        "extract": {
          "title": {
            "xpath": "//div[@class=\"title\"]",
            "value": "text()"
          },
          "content": {
            "xpath": "//div[@class=\"content\"]",
            "value": "text()"
          }
        }
      },
      "schedule": "every_1m",
      "keep_events_for": 3600,
      "propagate_immediately": true
    },
    {
      "type": "Agents::WebsiteAgent",
      "name": "Fetch article list",
      "disabled": false,
      "guid": "70be1060f0a97a8519dce02792f076b4",
      "options": {
        "expected_update_period_in_days": "2",
        "url": "https://www.a.com/",
        "type": "html",
        "mode": "all",
        "extract": {
          "url": {
            "xpath": "//div[@class='list']//a",
            "value": "@href"
          },
          "title": {
            "xpath": "//div[@class='list']//a",
            "value": "text()"
          }
        }
      },
      "schedule": "every_10m",
      "keep_events_for": 3600,
      "propagate_immediately": true
    },
    {
      "type": "Agents::DataOutputAgent",
      "name": "Generate RSS",
      "disabled": false,
      "guid": "86a178062c1c0685197cdcc3d87677b1",
      "options": {
        "secrets": [
          "site-a-news-output"
        ],
        "expected_receive_period_in_days": 2,
        "template": {
          "title": "Site A news",
          "description": "Site A news",
          "item": {
            "title": "{{title}}",
            "description": "{{content}}",
            "link": "{{url}}"
          },
          "events_order": "[[\"{{pubDate}}\", \"number\", true]]"
        },
        "ns_media": "true"
      },
      "propagate_immediately": true
    }
  ],
  "links": [
    {
      "source": 0,
      "receiver": 2
    },
    {
      "source": 1,
      "receiver": 0
    }
  ],
  "control_links": [

  ]
}
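The order of the agents array (article body first, article list second) is not the order events flow in; the links section stitches the agents together, where "source" and "receiver" are 0-based positions in the agents array. As a quick check, the sketch below resolves the two links into the pipeline from the bullet list. The agent labels are English glosses of the names in the JSON, the values are copied by hand rather than parsed, and print_chain is a hypothetical helper that assumes a simple linear chain.

```bash
#!/bin/bash
# Sketch: turn the scenario's "links" into the order events flow in.
# "source" and "receiver" are 0-based positions in the "agents" array.
# Values copied by hand from the JSON above (labels are English glosses);
# a real script would read them with a JSON parser.
agents=("Fetch article body" "Fetch article list" "Generate RSS")
links=("0 2" "1 0")   # "source receiver" pairs

# Assumes a simple chain: one starting agent, at most one receiver per
# agent, and no cycles.
print_chain() {
  local -A next=()
  local -A has_input=()
  local pair src dst i start=""
  for pair in "${links[@]}"; do
    read -r src dst <<< "$pair"
    next[$src]=$dst
    has_input[$dst]=1
  done
  # The chain starts at the agent that receives no events
  for i in "${!agents[@]}"; do
    if [[ -z "${has_input[$i]+x}" ]]; then
      start=$i
      break
    fi
  done
  [[ -n "$start" ]] || return 1
  i=$start
  local chain="${agents[$i]}"
  # Follow each agent's receiver until the last agent in the chain
  while [[ -n "${next[$i]+x}" ]]; do
    i=${next[$i]}
    chain+=" -> ${agents[$i]}"
  done
  echo "$chain"
}

# Usage: print_chain
#   Fetch article list -> Fetch article body -> Generate RSS
```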

Useful links

Container image
https://huginn.cn/blog/huginn/huginn-agent-list