Add hadoop input plugin to categraf #1137

Merged on Jan 20, 2025 (3 commits)

JiaLiangC
Contributor


feat: Add Hadoop input plugin to Categraf


PR Description

New feature

This PR adds a Hadoop monitoring plugin that collects metrics over the JMX interface from the following components of a Hadoop cluster:

  • Yarn ResourceManager
  • Yarn NodeManager
  • Hadoop NameNode
  • Hadoop DataNode

Configuration

The plugin's configuration file is located at conf/input.hadoop/hadoop.toml and supports the following options:

Common settings

[common]
useSASL = false
saslUsername = "HTTP/_HOST"
saslDisablePAFXFast = true
saslMechanism = "gssapi"
kerberosAuthType = "keytabAuth"
keyTabPath = "/path/to/keytab"
kerberosConfigPath = "/path/to/krb5.conf"
realm = "EXAMPLE.COM"

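Component settings

Each component is configured in its own [[components]] block, which supports the following fields:

  • name: component name (for example, YarnResourceManager).
  • port: JMX port.
  • processName: process name, used to decide dynamically whether this component's metrics should be collected.
  • allowRecursiveParse: whether to recursively parse the JSON returned by JMX.
  • allowMetricsWhiteList: whether to enable the whitelist.
  • jmxUrlSuffix: JMX URL suffix.
  • white_list: list of metric names to collect.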
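As a rough illustration of how these fields fit together, the sketch below models one [[components]] block as a Go struct and derives the JMX endpoint from port and jmxUrlSuffix. The Component type, its field names, the JMXURL method, and the use of plain HTTP are all assumptions made for this example; the PR description does not show the plugin's actual types or how it builds the URL.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// Component mirrors the fields of one [[components]] block.
// Hypothetical type for illustration; not taken from the plugin source.
type Component struct {
	Name                  string
	Port                  int
	ProcessName           string
	AllowRecursiveParse   bool
	AllowMetricsWhiteList bool
	JmxUrlSuffix          string
	WhiteList             []string
}

// JMXURL joins host, port, and jmxUrlSuffix into an endpoint URL.
// Assumption: plain HTTP is used; the PR does not state the scheme.
func (c Component) JMXURL(host string) string {
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(host, strconv.Itoa(c.Port)),
		Path:   c.JmxUrlSuffix,
	}
	return u.String()
}

func main() {
	rm := Component{Name: "YarnResourceManager", Port: 8088, JmxUrlSuffix: "/jmx"}
	fmt.Println(rm.JMXURL("localhost")) // prints http://localhost:8088/jmx
}
```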

Example configuration:

[[components]]
name = "YarnResourceManager"
port = 8088
processName = "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "NumActiveNMs", # number of active NodeManagers
    "NumUnhealthyNMs", # number of unhealthy NodeManagers
    "NumLostNMs", # number of NodeManagers that lost their connection
]

[[components]]
name = "YarnNodeManager"
port = 8042
processName = "Dproc_nodemanager"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "ContainersLaunched",        # total containers launched
    "ContainersCompleted",       # total containers completed
    "ContainersFailed",          # total containers failed
]

[[components]]
name = "HadoopNameNode"
port = 50070
processName = "org.apache.hadoop.hdfs.server.namenode.NameNode"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "FSState", # NameNode file system state (Operational, SafeMode, etc.)
    "HAState", # HA state (active/standby)
    "State", # NameNode state
]

[[components]]
name = "HadoopDataNode"
port = 1022
processName = "Dproc_datanode"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "SystemCpuLoad",              # system CPU load
    "ProcessCpuLoad",             # CPU load of the DataNode process
    "HeapMemoryUsage",            # JVM heap memory usage
]
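The processName values in the example above (a main class name, or a JVM flag such as Dproc_nodemanager) suggest that the plugin looks for a matching process on the local machine before collecting. The sketch below shows one way such a check could work, assuming a simple substring match against process command lines read from /proc on Linux. The helpers listCmdlines and hasProcess are hypothetical names, and the actual matching logic in the plugin may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// listCmdlines returns the command lines of running processes.
// Linux-only assumption: reads /proc/<pid>/cmdline, where arguments are NUL-separated.
// On systems without /proc it simply returns an empty slice.
func listCmdlines() []string {
	paths, _ := filepath.Glob("/proc/[0-9]*/cmdline")
	var out []string
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil || len(data) == 0 {
			continue // process exited or has no command line (kernel thread)
		}
		out = append(out, strings.ReplaceAll(string(data), "\x00", " "))
	}
	return out
}

// hasProcess reports whether any command line contains processName.
// Assumption: substring matching; the PR does not describe the exact rule.
func hasProcess(cmdlines []string, processName string) bool {
	for _, cmd := range cmdlines {
		if strings.Contains(cmd, processName) {
			return true
		}
	}
	return false
}

func main() {
	cmds := listCmdlines()
	for _, name := range []string{"Dproc_nodemanager", "Dproc_datanode"} {
		fmt.Printf("%s running: %v\n", name, hasProcess(cmds, name))
	}
}
```

With a check like this, one configuration file listing all four components can be deployed to every node, and each node only collects from the components it actually runs.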

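Role of the whitelist

  • Whitelist: white_list specifies the names of the metrics to collect. The plugin extracts the matching values from the JMX interface based on these names.
  • Dynamic collection: the plugin uses processName to check whether the process is running on the current machine; if it is, the whitelisted metrics are collected automatically.
  • Recursive parsing: when allowRecursiveParse is enabled, the plugin recursively parses the JSON returned by JMX and collects the whitelisted metrics.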
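The interaction between the whitelist and recursive parsing can be sketched as follows. This is a minimal illustration, not the plugin's code: it assumes the standard Hadoop /jmx response shape (a top-level "beans" array), keeps only numeric values, and flattens nested objects such as HeapMemoryUsage into names joined with an underscore. The function names Extract and walk and the naming scheme are hypothetical, and string-valued entries such as FSState are simply skipped here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jmxResponse models the top level of a Hadoop /jmx reply.
type jmxResponse struct {
	Beans []map[string]interface{} `json:"beans"`
}

// walk visits a decoded JSON value. A numeric field is kept when its key is
// whitelisted, or when it sits inside an object whose key was whitelisted.
// Nested objects and arrays are only entered when recursive is true.
func walk(node interface{}, prefix string, inherited bool, wl map[string]bool, recursive bool, out map[string]float64) {
	switch v := node.(type) {
	case map[string]interface{}:
		for key, child := range v {
			keep := inherited || wl[key]
			name := key
			if inherited {
				name = prefix + "_" + key // hypothetical naming for nested fields
			}
			switch c := child.(type) {
			case float64:
				if keep {
					out[name] = c
				}
			case map[string]interface{}, []interface{}:
				if recursive {
					walk(c, name, keep, wl, recursive, out)
				}
			}
		}
	case []interface{}:
		for _, item := range v {
			walk(item, prefix, inherited, wl, recursive, out)
		}
	}
}

// Extract parses a /jmx body and returns the whitelisted numeric metrics.
func Extract(body []byte, whiteList []string, recursive bool) (map[string]float64, error) {
	var resp jmxResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	wl := make(map[string]bool, len(whiteList))
	for _, n := range whiteList {
		wl[n] = true
	}
	out := make(map[string]float64)
	for _, bean := range resp.Beans {
		walk(bean, "", false, wl, recursive, out)
	}
	return out, nil
}

func main() {
	body := []byte(`{"beans":[
		{"name":"Hadoop:service=ResourceManager,name=ClusterMetrics","NumActiveNMs":3,"NumRebootedNMs":1},
		{"name":"java.lang:type=Memory","HeapMemoryUsage":{"used":1024,"max":4096}}
	]}`)
	metrics, err := Extract(body, []string{"NumActiveNMs", "HeapMemoryUsage"}, true)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(metrics)
}
```

In this sketch, NumRebootedNMs is dropped because it is not whitelisted, and with recursive set to false the nested HeapMemoryUsage object would not be entered at all.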

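Testing

The following tests have passed:

  1. Deployed Categraf in a Hadoop cluster and verified that the plugin correctly collects metrics from Yarn ResourceManager, Yarn NodeManager, Hadoop NameNode, and Hadoop DataNode.
  2. Verified the whitelist feature, confirming that only whitelisted metrics are collected.
  3. Verified recursive parsing, confirming that nested JSON data is parsed correctly.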

Related issue

#1136


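Code changes

New files

  1. plugins/inputs/hadoop/hadoop.go: core implementation of the Hadoop plugin.
  2. conf/input.hadoop/hadoop.toml: configuration file template for the Hadoop plugin.
  3. plugins/inputs/hadoop/README.md: usage documentation for the Hadoop plugin.

Modified files

  1. plugins/inputs/inputs.go: registers the Hadoop plugin.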

JiaLiangC changed the title from "add hadoop input plugin to categraf" to "Add hadoop input plugin to categraf" on Jan 17, 2025
Review thread on inputs/hadoop/hadoop.go (outdated, resolved)
Review thread on go.mod (resolved)
@kongfei605
Collaborator

Thank you @JiaLiangC

kongfei605 merged commit cc550db into flashcatcloud:main on Jan 20, 2025
2 of 3 checks passed
@JiaLiangC
Contributor Author

Thanks for helping review.
