
Namespace

Namespace = QCE/CKAFKA

Monitoring Metrics

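| Metric name | Display name | Description | Unit | Dimensions | Statistics rule [period, statType] |
| --- | --- | --- | --- | --- | --- |
| BConsumeLocalTime95thTime | 95th consume local time | 95th-percentile local processing time of consume requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeLocalTime999thTime | 999th consume local time | 99.9th-percentile local processing time of consume requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeRemoteTime95thTime | 95th consume ack=all replication wait time | 95th-percentile remote time of consume requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeRemoteTime999thTime | 999th consume ack=all replication wait time | 99.9th-percentile remote time of consume requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeRequestQueueTime95thTime | 95th consume request queue wait time | 95th-percentile time consume requests wait in the request queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeRequestQueueTime999thTime | 999th consume request queue wait time | 99.9th-percentile time consume requests wait in the request queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeResponseQueueTime95thTime | 95th consume response queue wait time | 95th-percentile time consume responses wait in the response queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeResponseQueueTime999thTime | 999th consume response queue wait time | 99.9th-percentile time consume responses wait in the response queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeThrottleTime95thTime | 95th consume delayed-response time | 95th-percentile time by which consume responses are delayed | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeThrottleTime999thTime | 999th consume delayed-response time | 99.9th-percentile time by which consume responses are delayed | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeTotalTime95thTime | 95th consume total time | Total time of consume requests, made up of request queue time, local time, and the other components above. Note: at any given point in time, the total does not equal the sum of the five 95th-percentile components above, because each metric is aggregated independently. | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BConsumeTotalTime999thTime | 999th consume total time | Total time of consume requests, made up of request queue time, local time, and the other components above. Note: at any given point in time, the total does not equal the sum of the five 99.9th-percentile components above, because each metric is aggregated independently. | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BIsrExpand | ISR expansion count | Number of Kafka ISR expansions: when an out-of-sync replica catches up with the leader, it rejoins the ISR and this count increases by 1 | Count | broker_ip | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| BIsrShrink | ISR shrink count | Number of Kafka ISR shrinks, which occur when, for example, a broker goes down or reconnects to ZooKeeper | Count | broker_ip | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| BNetworkProcessorAvgIdlePercent | Network busyness | Measures the current processing capacity of the instance's network threads; the closer the value is to 1, the more idle the threads are | % | broker_ip | [60s, first] [300s, avg] [3600s, avg] [86400s, avg] |
| BProduceLocalTime95thTime | 95th produce local time | 95th-percentile local processing time of produce requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceLocalTime999thTime | 999th produce local time | 99.9th-percentile local processing time of produce requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceRemoteTime95thTime | 95th produce ack=all replication wait time | 95th-percentile remote time of produce requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceRemoteTime999thTime | 999th produce ack=all replication wait time | 99.9th-percentile remote time of produce requests | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceRequestQueueTime95thTime | 95th produce request queue wait time | 95th-percentile time produce requests wait in the request queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceRequestQueueTime999thTime | 999th produce request queue wait time | 99.9th-percentile time produce requests wait in the request queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceResponseQueueTime95thTime | 95th produce response queue wait time | 95th-percentile time produce responses wait in the response queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceResponseQueueTime999thTime | 999th produce response queue wait time | 99.9th-percentile time produce responses wait in the response queue | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceThrottleTime95thTime | 95th produce delayed-response time | 95th-percentile time by which produce responses are delayed | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceThrottleTime999thTime | 999th produce delayed-response time | 99.9th-percentile time by which produce responses are delayed | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceTotalTime95thTime | 95th produce total time | Total time of produce requests, made up of request queue time, local time, delayed-response time, and the other components above. Note: at any given point in time, the total does not equal the sum of the five components above, because each metric is aggregated independently. | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BProduceTotalTime999thTime | 999th produce total time | Total time of produce requests, made up of request queue time, local time, delayed-response time, and the other components above. Note: at any given point in time, the total does not equal the sum of the five components above, because each metric is aggregated independently. | ms | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BUnderAr | Partitions below AR | Number of out-of-sync replicas in the cluster. If the instance has out-of-sync replicas, the cluster's health may be impaired. | None | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| BZookeeperDisConnectsCount | ZooKeeper disconnect count | Number of times the long-lived connection between the broker and ZooKeeper was dropped and re-established. Network jitter or high cluster load can cause disconnects and reconnects, and a leader switch occurs when this happens. The value is cumulative: after the broker starts it increases by 1 per disconnect, and it resets to 0 only when the broker restarts. | None | broker_ip | [60s, first] [300s, max] [3600s, max] [86400s, max] |
| CgroupMaxOffset | Current partition max offset | Max offset for the consumer group | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| CpartitionConsumerSpeed | Consumption speed (per minute) | Partition consumption speed | count/min | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, avg] [3600s, avg] [86400s, avg] |
| CpartitionMaxOffset | Current partition max offset | Partition max offset | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, 0] |
| CpartitionOffset | Current consumer offset | Current consumer offset of the partition | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, 0] |
| CpartitionUnconsume | Unconsumed message count | Number of messages in the partition not yet consumed | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, 0] |
| CpuUsage | CPU utilization | CPU utilization | % | instanceid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| CtopicConCount | Topic consumed message count | The consumer-group consumption speed is computed by the broker from consumer offsets, whereas the consumed message count of a topic or instance is computed from Fetch response packets | count | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicConFlow | Topic consumption traffic | Topic consumption traffic (excluding replica traffic), summed over the selected time granularity | MB | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicConReqCount | Topic-level consume request count | Topic-level consume request count, summed over the selected time granularity | count | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicConsumerSpeed | Consumption speed (per minute) | Topic consumption speed | count/min | consumerGroup, instanceId, topicId, topicName | [60s, sum] [300s, avg] [3600s, avg] [86400s, avg] |
| CtopicMsgCount | Total messages persisted for the topic | Total number of messages persisted to disk for the topic (excluding replicas); takes the latest value within the selected time granularity | count | instanceId, topicId | [60s, sum] [300s, last] [3600s, last] [86400s, last] |
| CtopicMsgHeap | Topic disk usage by messages | Total size of the topic's messages on disk (excluding replicas); takes the latest value within the selected time granularity | MB | instanceId, topicId | [60s, sum] [300s, last] [3600s, last] [86400s, last] |
| CtopicMsgOffset | Current consumer offset | Topic-level consumer group offset | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| CtopicProCount | Topic produced message count | Number of messages produced to the topic, summed over the selected time granularity | count | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicProFlow | Topic production traffic | Topic production traffic (excluding replica traffic), summed over the selected time granularity | MB | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicProReqCount | Topic-level produce request count | Topic-level produce request count, summed over the selected time granularity | count | instanceId, topicId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| CtopicUnconsumeMsgCount | Unconsumed message count | Topic-level number of unconsumed messages | count | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| CtopicUnconsumeMsgOffset | Unconsumed message backlog | Topic-level unconsumed message offset | MB | consumerGroup, instanceId, partition, topicId, topicName | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| DetectStatus | Broker probe status | Broker instance status | None | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| DiskUsage | Disk utilization | Actual disk usage on each CVM. Because of the physical limits of attached HDD/SSD disks and the need to reserve extra space for bursts, the sum of disk utilization across all hosts does not necessarily match the disk utilization of the CKafka instance. | % | instanceid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceConCount | Topic consumed message count | Number of messages consumed from the instance, summed over the selected time granularity | count | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceConFlow | Topic consumption traffic | Instance consumption traffic (excluding replica traffic), summed over the selected time granularity | MB | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceConnectCount | Instance connection count | Number of connections between clients and the server | count | instanceId | [60s, first] [300s, last] |
| InstanceConnectPercentage | Connection percentage | Client-server connections of the instance as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceConReqCount | Topic-level consume request count | Instance-level consume request count, summed over the selected time granularity | count | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceConsumeBandwidthPercentage | Consumption bandwidth percentage | Instance consumption bandwidth as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceConsumeGroupNum | Consumer group count | Number of consumer groups in the instance | None | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceConsumeGroupPercentage | Consumer group percentage | Number of consumer groups in the instance as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceConsumeThrottle | Consume throttling count | Number of times consumption on the instance was throttled | Count | instanceId | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceDiskUsage | instance_disk_usage | Current disk usage as a percentage of the total disk capacity of the instance specification | % | instanceId | [60s, expr] [300s, max] |
| InstanceMaxConFlow | Max consumption traffic | Peak consumption bandwidth of the instance (replicas do not apply to consumption) | MBytes/s | instanceId | [60s, first] [300s, max] |
| InstanceMaxProFlow | Instance max production traffic (excluding replicas) | Peak production bandwidth of the instance (excluding replica bandwidth) | MBytes/s | instanceId | [60s, first] [300s, max] |
| InstanceMsgCount | Total messages persisted for the topic | Total number of messages persisted to disk for the instance (excluding replicas); takes the latest value within the selected time granularity | count | instanceId | [60s, sum] [300s, last] [3600s, last] [86400s, last] |
| InstanceMsgHeap | Instance message backlog | Instance disk usage (including replicas); takes the latest value within the selected time granularity | MB | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstancePartitionNum | Partition count | Number of partitions in the instance | None | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstancePartitionPercentage | Partition percentage | Number of partitions in the instance as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceProCount | Topic produced message count | Number of messages produced to the instance, summed over the selected time granularity | count | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceProduceBandwidthPercentage | Production bandwidth percentage | Instance production bandwidth as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceProduceThrottle | Produce throttling count | Number of times production on the instance was throttled | Count | instanceId | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceProFlow | Topic production traffic | Instance production traffic (excluding replica traffic), summed over the selected time granularity | MB | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceProReqCount | Topic-level produce request count | Instance-level produce request count, summed over the selected time granularity | count | instanceId | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceReplicaProduceFlow | Total production traffic (including replicas) | Peak production bandwidth of the instance (including replica bandwidth) | MBytes/s | instanceId | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| InstanceTopicNum | Topic count | Number of topics in the instance | None | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| InstanceTopicPercentage | Topic percentage | Number of topics in the instance as a percentage of the quota | % | instanceId | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| Intraffic | Public network inbound bandwidth | Public network inbound bandwidth | Bit/s | instanceid | [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| LanIntraffic | Private network inbound bandwidth | Private network inbound bandwidth, summed over the selected time granularity | MBytes | instanceid | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| LanOuttraffic | Private network outbound bandwidth | Private network outbound bandwidth, summed over the selected time granularity | MBytes | instanceid | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| LastOldGcCount | Broker Full GC count | Number of Full GCs on the broker | Count | brokerip | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| LastYoungGcCount | Broker Young GC count | Number of Young GCs on the broker | Count | brokerip | [60s, first] [300s, sum] [3600s, sum] [86400s, sum] |
| MaxOffsetTopic | Current partition max offset | Max offset of the current topic for the consumer group | count | consumerGroup, instanceId, topicId, topicName | [60s, max] [300s, last] [3600s, last] [86400s, last] |
| MemUsage | Memory utilization | Memory utilization | % | instanceid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| OffsetTopic | Current consumer offset | Current consumer offset of the consumer group | count | consumerGroup, instanceId, topicId, topicName | [60s, max] [300s, max] [3600s, max] [86400s, max] |
| Outtraffic | Public network outbound bandwidth | Public network outbound bandwidth | Bit/s | instanceid | [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| PartitionConCount | Topic consumed message count | Number of messages consumed from the partition, summed over the selected time granularity | Count | instanceid, partition, topicid | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| PartitionConFlow | Topic consumption traffic | Partition consumption traffic (excluding replica traffic), summed over the selected time granularity | MBytes | instanceid, partition, topicid | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| PartitionMsgCount | Total messages persisted for the topic | Total number of messages persisted to disk for the partition (excluding replicas); takes the latest value within the selected time granularity | Count | instanceid, partition, topicid | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| PartitionMsgHeap | Topic disk usage by messages | Total size of the partition's messages on disk (excluding replicas); takes the latest value within the selected time granularity | MBytes | instanceid, partition, topicid | [60s, sum] [300s, max] [3600s, max] [86400s, max] |
| PartitionProCount | Topic produced message count | Number of messages produced to the partition, summed over the selected time granularity | Count | instanceid, partition, topicid | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| PartitionProFlow | Topic production traffic | Partition production traffic (excluding replica traffic), summed over the selected time granularity | MBytes | instanceid, partition, topicid | [60s, sum] [300s, sum] [3600s, sum] [86400s, sum] |
| QueueSize | Queue depth | Request queue depth, i.e. the number of produce requests not yet processed. A large value may indicate a burst of concurrent requests, high CPU load, or a disk I/O bottleneck. | None | broker_ip | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetBatchSizeAvg | Average batch data size | Average number of records processed per batch | Count | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetBatchSizeMax | Max batch data size | Maximum number of records processed in a batch | Count | setid | [60s, first] [300s, max] [3600s, max] [86400s, max] |
| SetMaxLag | Records not yet pulled from the source (set aggregate) | Number of records not yet pulled from the source | Count | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetPollBatchAvgTimeMs | Average batch pull time | Average processing time per batch | ms | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetPollBatchMaxTimeMs | Max batch pull time | Maximum batch pull time | ms | setid | [60s, max] [300s, max] [3600s, max] [86400s, max] |
| SetRecordSendTotal | Cumulative records written while messages are being synchronized (set aggregate) | Number of records written to the target | Count | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetRecordsLead | Records pulled from the source (set aggregate) | Number of records pulled from the source | Count | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SetSourceRecordActiveCount | Records generated by this task but not yet fully written to Kafka (set aggregate) | Number of records not yet written to the target | Count | setid | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SourceRecordPollTotal | Total records produced/polled (before transformation) by the tasks of the specified source connector | Number of source records pulled | Count | connectorname, task | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| SourceRecordWriteTotal | Records output from the transformations and written to Kafka for this task since it was last restarted | Number of source records written | Count | connectorname, task | [60s, first] [300s, last] [3600s, last] [86400s, last] |
| TMaxConsumeFlow | Topic max consumption traffic | Topic max consumption traffic | MBytes/s | instanceId, topicId | [60s, first] [300s, max] [3600s, max] [86400s, max] |
| TMaxProduceFlow | Max production traffic | Topic max production traffic (excluding replica traffic) | MBytes/s | instanceId, topicId | [60s, first] [300s, max] [3600s, max] [86400s, max] |
| TTopicConsumeThrottle | Topic consume throttling count | Number of times consumption on the topic was throttled | Count/s | instanceId, topicId | [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| TTopicProduceThrottle | Topic produce throttling count | Number of times production on the topic was throttled | Count/s | instanceId, topicId | [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| UnconsumeSizeTopic | Unconsumed message backlog | Size of messages not yet consumed by the consumer group | MB | consumerGroup, instanceId, topicId, topicName | [60s, sum] [300s, last] [3600s, last] [86400s, last] |
| UnconsumeTopic | Unconsumed message count | Number of messages not yet consumed by the consumer group | count | consumerGroup, instanceId, topicId, topicName | [60s, sum] [300s, last] [3600s, last] [86400s, last] |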
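The notes on BConsumeTotalTime* and BProduceTotalTime* say the total is not the sum of its components. The short sketch below illustrates why with made-up latencies and a simple nearest-rank percentile (an assumption; the percentile method the service actually uses is not documented here): when slow queueing and slow local processing hit different requests, each component's 95th percentile stays low while the 95th percentile of the per-request total does not.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the ceil(p * n / 100)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 20 hypothetical requests, each split into (request queue ms, local ms).
# The one slow queue wait and the one slow local step hit different requests.
requests = [(1, 1)] * 18 + [(50, 1), (1, 50)]
queue_ms = [q for q, _ in requests]
local_ms = [loc for _, loc in requests]
total_ms = [q + loc for q, loc in requests]

sum_of_p95 = percentile(queue_ms, 95) + percentile(local_ms, 95)
p95_of_total = percentile(total_ms, 95)
print(sum_of_p95, p95_of_total)  # prints: 2 51
```

Percentiles are not additive in general, so a total-time percentile can be larger or smaller than the sum of the component percentiles.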

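One way to read the Statistics rule column, offered as an interpretation rather than a description of the service internals: for each period, consecutive 60s data points are combined with the named statType (for example [300s, sum] adds five 60s points). The hypothetical `rollup` helper below sketches that for first, last, sum, avg and max; entries such as [300s, 0] or [60s, expr] are not modeled, and the service's real bucket alignment may differ.

```python
from statistics import mean

# Hypothetical mapping from the statType names in the table to aggregations.
AGGREGATORS = {
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "sum": sum,
    "avg": mean,
    "max": max,
}


def rollup(points, period_s, stat_type, base_s=60):
    """Aggregate consecutive base_s-second points into period_s-second buckets.

    A trailing bucket with fewer than period_s // base_s points is dropped.
    """
    if period_s % base_s != 0:
        raise ValueError("period_s must be a multiple of base_s")
    size = period_s // base_s
    agg = AGGREGATORS[stat_type]
    return [agg(points[i:i + size]) for i in range(0, len(points) - size + 1, size)]


# Ten hypothetical 60s points for a counter such as BIsrShrink ([300s, sum]).
isr_shrinks = [0, 1, 0, 0, 2, 0, 0, 0, 1, 0]
print(rollup(isr_shrinks, 300, "sum"))  # prints: [3, 1]

# Ten hypothetical 60s points for a gauge such as BUnderAr ([300s, last]).
under_ar = [0, 0, 1, 1, 0, 0, 0, 0, 0, 2]
print(rollup(under_ar, 300, "last"))  # prints: [0, 2]
```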
Parameters for each dimension

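| Parameter | Dimension name | Description | Format |
| --- | --- | --- | --- |
| Instances.N.Dimensions.0.Name | instanceId | Dimension name for the CKafka instance ID | String dimension name: instanceId |
| Instances.N.Dimensions.0.Value | instanceId | ID of a specific CKafka instance | Instance ID, e.g. ckafka-test |
| Instances.N.Dimensions.0.Name | instanceid | Dimension name used for CKafka Pro Edition instances | String dimension name: instanceid |
| Instances.N.Dimensions.0.Value | instanceid | Broker IP under a CKafka Pro Edition instance | Specific value, e.g. brokerip |
| Instances.N.Dimensions.1.Name | topicId | Dimension name for the ID of a topic in the instance | String dimension name: topicId |
| Instances.N.Dimensions.1.Value | topicId | ID of a specific topic in the instance | Topic ID, e.g. topic-test |
| Instances.N.Dimensions.0.Name | consumerGroup | Dimension name for the consumer group | String dimension name: consumerGroup |
| Instances.N.Dimensions.0.Value | consumerGroup | Specific consumer group | Consumer group to query, e.g. perf-consumer-8330 |
| Instances.N.Dimensions.3.Name | partition | Dimension name for the partition | String dimension name: partition |
| Instances.N.Dimensions.3.Value | partition | Specific partition | Topic partition, e.g. 0 |
| Instances.N.Dimensions.4.Name | topicName | Dimension name for the topic name | String dimension name: topicName |
| Instances.N.Dimensions.4.Value | topicName | Specific topic name | Name of the consumed topic, e.g. test |
| Instances.N.Dimensions.5.Name | broker_ip | Dimension name for the broker IP | String dimension name: broker_ip |
| Instances.N.Dimensions.5.Value | broker_ip | IP of a specific broker | IP of the broker to query |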
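Because each metric accepts only its own dimension set (see the Dimensions column of the metrics table), a client can check a query before sending it. The sketch below copies the dimensions of a few metrics from that table into a hypothetical `METRIC_DIMENSIONS` map and reports missing or unexpected names. It compares names case-sensitively, an assumption based on the table listing instanceId and instanceid as separate dimensions.

```python
# Hypothetical excerpt: dimension sets copied from a few rows of the metrics table.
METRIC_DIMENSIONS = {
    "BIsrShrink": {"broker_ip"},
    "CpuUsage": {"instanceid"},
    "CtopicProFlow": {"instanceId", "topicId"},
    "UnconsumeTopic": {"consumerGroup", "instanceId", "topicId", "topicName"},
    "CpartitionUnconsume": {"consumerGroup", "instanceId", "partition", "topicId", "topicName"},
}


def check_dimensions(metric, dimensions):
    """Return (missing, unexpected) dimension names for a query, each sorted.

    Names are compared case-sensitively, so instanceId and instanceid differ.
    """
    expected = METRIC_DIMENSIONS[metric]
    supplied = set(dimensions)
    return sorted(expected - supplied), sorted(supplied - expected)


print(check_dimensions("CtopicProFlow", {"instanceId": "ckafka-test", "topicId": "topic-test"}))
# prints: ([], [])
print(check_dimensions("CpuUsage", {"instanceId": "ckafka-test"}))
# prints: (['instanceid'], ['instanceId'])
```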

Input parameters

To query QCE/CKAFKA monitoring data, set the input parameters as follows:

&Namespace=QCE/CKAFKA
&Instances.N.Dimensions.0.Name=consumerGroup
&Instances.N.Dimensions.0.Value=<consumer group>
&Instances.N.Dimensions.1.Name=instanceId
&Instances.N.Dimensions.1.Value=<instance ID>
&Instances.N.Dimensions.2.Name=topicId
&Instances.N.Dimensions.2.Value=<topic ID>
&Instances.N.Dimensions.3.Name=partition
&Instances.N.Dimensions.3.Value=<partition>
&Instances.N.Dimensions.4.Name=topicName
&Instances.N.Dimensions.4.Value=<topic name>
&Instances.N.Dimensions.5.Name=broker_ip
&Instances.N.Dimensions.5.Value=<broker IP>

Note: Different metrics use different dimensions. Fill in the input parameters according to the dimensions of the metric you are querying.
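The flattened Instances.N.Dimensions.M.Name/Value keys above can be generated from plain dictionaries. The sketch below is a hypothetical `build_query` helper; it assumes, as in the Cloud Monitor GetMonitorData API, that the request also carries a MetricName parameter, and it leaves out request signing and other parameters such as Period.

```python
from urllib.parse import urlencode


def build_query(metric_name, instances, namespace="QCE/CKAFKA"):
    """Flatten per-object dimension dicts into Instances.N.Dimensions.M.* keys.

    Hypothetical helper: `instances` holds one dict per monitored object,
    mapping dimension name -> dimension value.
    """
    # MetricName is an assumption taken from the GetMonitorData API.
    params = {"Namespace": namespace, "MetricName": metric_name}
    for n, dims in enumerate(instances):
        # Dimension index M follows the dict's insertion order.
        for m, (name, value) in enumerate(dims.items()):
            params[f"Instances.{n}.Dimensions.{m}.Name"] = name
            params[f"Instances.{n}.Dimensions.{m}.Value"] = str(value)
    return params


params = build_query("CtopicProFlow", [{"instanceId": "ckafka-test", "topicId": "topic-test"}])
print(urlencode(params))
```

Here the topic-level metric CtopicProFlow takes only instanceId and topicId, so only Dimensions.0 and Dimensions.1 are emitted; the index M is simply the position of each dimension in the dictionary.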