
Rollup

Published: 2021-05-17 00:00


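Author: 杨景江

Rollup jobs are tasks that run periodically. A rollup job aggregates the data in selected indices on a schedule, using aggregations you define, and writes the aggregated results to a new index. The whole process is called Rollup.

Use cases:

Summarizing historical data:

Historical data is large and expensive to keep on disk, while the business usually needs raw documents only for the most recent few days. For older data it cares about a fixed set of statistics, not the raw documents. To cut cost, a Rollup job can summarize the historical data into a new index, after which the original historical indices can be deleted (for example with ILM), saving a large amount of storage.

Moving aggregation to a better time:

When data volume or hardware limits make real-time aggregation queries slow, Rollup can run at night or in near real time, summarizing the previous day's index or the data from a few minutes ago into a new index (for example, turning millisecond-level data into second-level or even minute-level data). Users then query the rolled-up index, which improves query performance.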
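
To make the idea of turning fine-grained data into minute-level data concrete, here is a minimal pure-Python sketch of what a rollup conceptually does: it truncates each timestamp to its minute and counts documents per (minute, term) pair. The function name `rollup_by_minute` and the sample documents are hypothetical illustrations; in practice the work happens inside Elasticsearch, not in client code.

```python
from collections import Counter
from datetime import datetime, timezone


def rollup_by_minute(docs, time_field="timestamp_local", term_field="cluster"):
    """Count documents per (UTC minute bucket, term value)."""
    buckets = Counter()
    for doc in docs:
        # Parse the ISO timestamp and normalize it to UTC.
        ts = datetime.fromisoformat(doc[time_field]).astimezone(timezone.utc)
        # Truncate to the start of the minute: this is the bucket key.
        minute = ts.replace(second=0, microsecond=0)
        buckets[(minute.isoformat(), doc[term_field])] += 1
    return dict(buckets)


# Hypothetical sample documents, shaped like the slow-log example in this article.
docs = [
    {"timestamp_local": "2021-04-21T14:03:06.896+08:00", "cluster": "clustername-demo"},
    {"timestamp_local": "2021-04-21T14:03:41.120+08:00", "cluster": "clustername-demo"},
    {"timestamp_local": "2021-04-21T14:04:02.500+08:00", "cluster": "clustername-demo"},
]
summary = rollup_by_minute(docs)
for key, count in sorted(summary.items()):
    print(key, count)
```

Three raw documents collapse into two summary rows here; the saving grows with the number of documents per bucket.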

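Rollup limitations:

Rollup only allows fields to be grouped with the following aggregations:

Date Histogram aggregation
Histogram aggregation
Terms aggregation (the most commonly used)

Numeric fields can only use the following metric aggregations:

Min aggregation
Max aggregation
Sum aggregation
Average aggregation
Value Count aggregation

Use each feature only where the business scenario calls for it. Do not design around a feature just for the sake of using it.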
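
These limits can be checked before a job is submitted. The following is a small sketch, assuming the job body is available as a Python dict in the shape used later in this article; `find_unsupported` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Groupings and metrics allowed by rollup, as listed above.
ALLOWED_GROUPS = {"date_histogram", "histogram", "terms"}
ALLOWED_METRICS = {"min", "max", "sum", "avg", "value_count"}


def find_unsupported(job):
    """Return a list of problems found in a rollup job body (empty if none)."""
    problems = []
    for group in job.get("groups", {}):
        if group not in ALLOWED_GROUPS:
            problems.append(f"unsupported grouping: {group}")
    for entry in job.get("metrics", []):
        for metric in entry.get("metrics", []):
            if metric not in ALLOWED_METRICS:
                problems.append(
                    f"unsupported metric on {entry.get('field')}: {metric}"
                )
    return problems


# Hypothetical job body: "percentiles" is not one of the allowed metrics.
job = {
    "groups": {
        "date_histogram": {"field": "timestamp_local", "calendar_interval": "1m"},
        "terms": {"fields": ["cluster"]},
    },
    "metrics": [{"field": "event.duration", "metrics": ["avg", "percentiles"]}],
}
problems = find_unsupported(job)
print(problems)
```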

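API overview

The walkthrough below uses statistics over raw Elasticsearch slow-query logs as the example (sensitive information has been replaced).

Data preparation

Index mapping: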

PUT es-slowlog-2021-04-21
{
  "mappings": {
    "_field_names": {
      "enabled": false
    },
    "dynamic_templates": [
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "ignore_above": 512,
            "type": "keyword"
          }
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "cluster": {
        "type": "keyword",
        "ignore_above": 512
      },
      "host": {
        "properties": {
          "name": {
            "type": "keyword",
            "ignore_above": 512
          }
        }
      },
      "elasticsearch": {
        "properties": {
          "index": {
            "properties": {
              "name": {
                "type": "keyword",
                "ignore_above": 512
              }
            }
          }
        }
      },
      "timestamp_local": {
        "type": "date"
      }
    }
  }
}

A sample document (matching the mapping above):

POST es-slowlog-2021-04-21/_doc
{
  "cluster": "clustername-demo",
  "offset": 0,
  "log": {
    "level": "WARN"
  },
  "prospector": {
    "type": "log"
  },
  "source": "/home/elasticsearch/clustername-demo_index_search_slowlog.log",
  "message": "[2021-04-21T14:03:06,896][WARN ][i.s.s.query ] [host_name-demo] [basiclog-slowlog_2021-04-02][2] took[2.3s], took_millis[2307], total_hits[23129 hits], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[4], source[{\"size\":0,\"query\":{\"bool\":{\"filter\":[{\"match_all\":{\"boost\":1.0}},{\"match_phrase\":{\"logtype.keyword\":{\"query\":\"server\",\"slop\":0,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"@timestamp\":{\"from\":\"2021-04-02T15:48:04.138Z\",\"to\":\"2021-04-02T16:03:04.138Z\",\"include_lower\":true,\"include_upper\":true,\"format\":\"strict_date_optional_time\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[],\"excludes\":[]},\"stored_fields\":\"*\",\"docvalue_fields\":[{\"field\":\"@timestamp\",\"format\":\"date_time\"},{\"field\":\"time\",\"format\":\"date_time\"}],\"script_fields\":{},\"track_total_hits\":2147483647,\"aggregations\":{\"2\":{\"terms\":{\"field\":\"cluster.keyword\",\"size\":20,\"min_doc_count\":1,\"shard_min_doc_count\":0,\"show_term_doc_count_error\":false,\"order\":[{\"_count\":\"desc\"},{\"_key\":\"asc\"}]}}}}], id[],",
  "input": {
    "type": "log"
  },
  "logtype": "slowlog",
  "log_type": "basic-slowlog",
  "timestamp_local": "2021-04-21T14:03:06.896+08:00",
  "@timestamp": "2021-04-21T14:03:06.896Z",
  "elasticsearch": {
    "node": {
      "name": "host_name-demo"
    },
    "slowlog": {
      "took": "2.3s",
      "logger": "i.s.s.query "
    },
    "index": {
      "name": "basiclog-slowlog_2021-04-02"
    },
    "shard": {
      "id": "2"
    }
  },
  "host": {
    "name": "host_name-demo"
  },
  "beat": {
    "hostname": "beathostname-demo",
    "name": "beathostname-demo",
    "version": "6.5.4"
  },
  "@version": "1",
  "event": {
    "duration": 2307000000,
    "created": "2021-04-21T06:59:11.934Z",
    "kind": "event",
    "category": "database",
    "type": "info"
  }
}

Configure Index Patterns in Kibana

Note: for the latest version of the API, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/xpack-rollup.html

Basic API

Create a rollup job:

Request: PUT _rollup/job/<job_id>

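Parameter | Required | Type | Description
index_pattern | Yes | string | Index pattern to roll up.
rollup_index | Yes | string | Target index. Some versions require the index name to start with rollup.
cron | Yes | string | Schedule on which the job runs. It is independent of the time interval of the rolled-up data.
page_size | Yes | integer | Number of bucket results processed in each iteration of the rollup indexer. Larger values run faster but need more memory during processing.
groups | Yes | object | Defines the groupings (date histogram, terms, histogram) for the rollup job.
-date_histogram | Yes | object | Defines the date histogram aggregation.
--calendar_interval | Yes | time units | Size of each time bucket; 1m means one bucket per minute.
--field | Yes | string | Date field the aggregation is based on.
--time_zone | No | string | Time zone. Default: UTC.
--delay | No | time units | Rollup delay: how old data must be before it is rolled up. Some writes arrive late, and all data must be written and searchable before it is rolled up.
-terms | No | object | Fields to group by.
--fields | Yes | array of strings | The set of terms fields. The fields can be keyword or numeric, and their order does not matter.
-histogram | No | object | Aggregates one or more numeric fields into numeric histogram intervals.
--fields | Yes | array | Fields to build the histogram on; they must be numeric.
--interval | Yes | integer | Interval of the histogram buckets generated when rolling up.
metrics | No | object | Defines how the data is summarized.
-field | Yes | string | Field to collect metrics from.
-metrics | Yes | array | Aggregation operators to apply. For example, sum computes the sum of the field. Only min, max, sum, avg, and value_count are supported.
timeout | No | string | Request timeout.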
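PUT _rollup/job/es-slowlog-agg-id
{
  "index_pattern": "es-slowlog*", // index pattern to roll up
  "rollup_index": "rollup-es-slowlog-agg", // target index; must be specified explicitly, here prefixed with rollup-
  "cron": "0 * * * * ?", // schedule of the job; independent of the time interval of the rolled-up data
  "groups": {
    "date_histogram": { // defines the date histogram aggregation
      "calendar_interval": "1m", // bucket size: one bucket per minute
      "field": "timestamp_local", // date field to aggregate on
      "delay": "1m", // how old data must be before it is rolled up; late writes must be indexed and searchable first
      "time_zone": "UTC" // time zone, e.g. GMT+8
    },
    "terms": {
      "fields": [ // fields to group by
        "cluster", // cluster name
        "elasticsearch.index.name", // index name
        "host.name" // host name
      ]
    }
  },
  "metrics": [], // only document counts by default; min, max, sum, avg, and value_count can be specified
  "timeout": "20s", // timeout
  "page_size": 10000 // page size; larger values roll up faster but use more memory
}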
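
When the job is created from a script rather than from the Kibana console, the `//` comments above have to go, since they are not valid JSON. The sketch below builds the same body as a Python dict and serializes it. `build_rollup_job` is a hypothetical helper with assumed parameter names; sending the payload (a PUT to `_rollup/job/<job_id>` with any HTTP client) is deliberately left out.

```python
import json


def build_rollup_job(index_pattern, rollup_index, time_field, term_fields,
                     interval="1m", delay="1m", cron="0 * * * * ?",
                     page_size=10000, timeout="20s"):
    """Build a rollup job body equivalent to the console example above."""
    return {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": cron,  # when the job runs, not the bucket size
        "groups": {
            "date_histogram": {
                "calendar_interval": interval,  # bucket size
                "field": time_field,
                "delay": delay,
                "time_zone": "UTC",
            },
            "terms": {"fields": list(term_fields)},
        },
        "metrics": [],  # no extra metrics: only document counts are kept
        "timeout": timeout,
        "page_size": page_size,
    }


body = build_rollup_job(
    "es-slowlog*",
    "rollup-es-slowlog-agg",
    "timestamp_local",
    ["cluster", "elasticsearch.index.name", "host.name"],
)
payload = json.dumps(body, indent=2)
print(payload)
```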

List all rollup jobs:

GET _rollup/job/*

Get the details of a single rollup job:

Request: GET _rollup/job/<job_id>

GET _rollup/job/es-slowlog-agg-id
// response
{
  "jobs": [
    {
      "config": {
        "id": "es-slowlog-agg-id",
        "index_pattern": "es-slowlog*",
        "rollup_index": "rollup-es-slowlog-agg",
        "cron": "0 * * * * ?",
        "groups": {
          "date_histogram": {
            "calendar_interval": "1m",
            "field": "timestamp_local",
            "delay": "1m",
            "time_zone": "UTC"
          },
          "terms": {
            "fields": [
              "cluster",
              "elasticsearch.index.name",
              "host.name"
            ]
          }
        },
        "metrics": [],
        "timeout": "20s",
        "page_size": 10000
      },
      "status": {
        "job_state": "stopped",
        "upgraded_doc_id": true
      },
      "stats": {
        "pages_processed": 0,
        "documents_processed": 0,
        "rollups_indexed": 0,
        "trigger_count": 0,
        "index_time_in_ms": 0,
        "index_total": 0,
        "index_failures": 0,
        "search_time_in_ms": 0,
        "search_total": 0,
        "search_failures": 0,
        "processing_time_in_ms": 0,
        "processing_total": 0
      }
    }
  ]
}

Start a rollup job:

Request: POST _rollup/job/<job_id>/_start

POST _rollup/job/es-slowlog-agg-id/_start
// After starting the job, fetch its current state and check the status and stats sections
GET _rollup/job/es-slowlog-agg-id
// response
{
  "jobs": [
    {
      "config": {
        "id": "es-slowlog-agg-id",
        "index_pattern": "es-slowlog*",
        "rollup_index": "rollup-es-slowlog-agg",
        "cron": "0 * * * * ?",
        "groups": {
          "date_histogram": {
            "calendar_interval": "1m",
            "field": "timestamp_local",
            "delay": "1m",
            "time_zone": "UTC"
          },
          "terms": {
            "fields": [
              "cluster",
              "elasticsearch.index.name",
              "host.name"
            ]
          }
        },
        "metrics": [],
        "timeout": "20s",
        "page_size": 10000
      },
      "status": {
        "job_state": "started", // shows stopped if the job has been stopped
        "current_position": { // current position of the rollup job, including the terms values
          "cluster.terms": "clustername-demo",
          "elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
          "host.name.terms": "host_name-demo",
          "timestamp_local.date_histogram": 1618984980000
        },
        "upgraded_doc_id": true
      },
      "stats": { // execution statistics
        "pages_processed": 2,
        "documents_processed": 1,
        "rollups_indexed": 1,
        "trigger_count": 1,
        "index_time_in_ms": 103,
        "index_total": 1,
        "index_failures": 0,
        "search_time_in_ms": 6,
        "search_total": 2,
        "search_failures": 0,
        "processing_time_in_ms": 0,
        "processing_total": 2
      }
    }
  ]
}

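Values of status.job_state:

stopped

The job is paused.

started

The job is running but is not actively rolling up data. When the cron interval triggers, the job starts processing data.

indexing

Data is being processed and new rollup documents are being created. In this state, any subsequent cron triggers are ignored because the job is already active from a previous trigger.

abort

A transient state that users usually do not see. It is used when the job has to be shut down for some reason (the job was deleted, an unrecoverable error occurred, and so on). Shortly after entering the abort state, the job removes itself from the cluster.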
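
The state descriptions above boil down to one question: does a cron trigger arriving in a given state start a new rollup run? The sketch below encodes that reading. `handles_cron_trigger` is a hypothetical helper, and treating `stopped` and `abort` as "trigger has no effect" is an interpretation of the descriptions (a paused or shutting-down job does not process data), not something returned by the API.

```python
def handles_cron_trigger(job_state):
    """Return True if a cron trigger arriving in this state starts a rollup run."""
    if job_state == "started":
        # Idle but armed: the next trigger starts processing.
        return True
    if job_state == "indexing":
        # Already busy from an earlier trigger: later triggers are ignored.
        return False
    if job_state in ("stopped", "abort"):
        # Assumption: a paused or shutting-down job does not react to triggers.
        return False
    raise ValueError(f"unknown job_state: {job_state}")


for state in ("stopped", "started", "indexing", "abort"):
    print(state, handles_cron_trigger(state))
```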

Stop a rollup job:

Request: POST _rollup/job/<job_id>/_stop

POST _rollup/job/es-slowlog-agg-id/_stop

Delete a rollup job:

Request: DELETE _rollup/job/<job_id>

Use delete with caution.
DELETE /_rollup/job/es-slowlog-agg-id

Querying with _rollup_search

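Raw documents and rollup documents have different structures. Rollup search therefore rewrites a standard query DSL request into a form that matches the rollup documents, then takes the response and rewrites it back into the form the client expects.

Usage:

GET <target>/_rollup_search

Rules for the target parameter (required, string):

An index or wildcard expression must be specified.
Multiple non-rollup indices may be specified.
Only one rollup index may be specified. Providing more than one causes an exception.
Wildcard expressions may be used, but if they match more than one rollup index an exception occurs.

e.g. es-slowlog*,rollup-es-slowlog-agg1/_rollup_search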
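
The target rules can be checked locally before sending a request. The sketch below is only an approximation: it uses Python's `fnmatchcase` to stand in for Elasticsearch wildcard resolution, and both the helper name `resolve_rollup_search_target` and the idea of passing in the known index names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def resolve_rollup_search_target(target, existing_indices, rollup_indices):
    """Resolve a comma-separated target and reject more than one rollup index."""
    expressions = [part.strip() for part in target.split(",") if part.strip()]
    if not expressions:
        raise ValueError("target must name an index or wildcard expression")
    # Local stand-in for wildcard resolution on the cluster.
    matched = {
        name for name in existing_indices
        if any(fnmatchcase(name, expr) for expr in expressions)
    }
    rollups = sorted(matched & set(rollup_indices))
    if len(rollups) > 1:
        raise ValueError(f"target resolves to more than one rollup index: {rollups}")
    return sorted(matched)


# Index names taken from this article's examples.
existing = ["es-slowlog-2021-04-21", "rollup-es-slowlog-agg", "rollup-es-slowlog-agg1"]
rollups = {"rollup-es-slowlog-agg", "rollup-es-slowlog-agg1"}

resolved = resolve_rollup_search_target(
    "es-slowlog*,rollup-es-slowlog-agg1", existing, rollups
)
print(resolved)
```

A target such as `rollup-es-slowlog-agg*` matches both rollup indices in this setup and is rejected with a `ValueError`.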

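The request body supports a subset of the regular Search API. It supports:

query: specifies a DSL query, subject to some restrictions. See:

Rollup search limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-search-limitations.html

Rollup aggregation limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-agg-limitations.html

aggregations: specifies the aggregations.

Features that are not available:

size: raw documents cannot be returned. To fetch the stored documents, run _search against the rollup index.
highlighter, suggestors, post_filter, profile, explain: not allowed.

How querying raw data and rollup indices together works: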

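When a _rollup_search covers both raw and rolled-up data, Elasticsearch receives the two sets of responses, rewrites the rollup response, and merges the two. During merging, if any buckets overlap between the two responses, the buckets from the non-rollup index are used.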
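
The overlap rule can be illustrated with plain dictionaries. In this sketch each response is reduced to a mapping from bucket key (epoch milliseconds) to document count; `merge_buckets` and the sample numbers are hypothetical, and the real merge happens inside Elasticsearch on full aggregation trees.

```python
def merge_buckets(live_buckets, rollup_buckets):
    """Merge two bucket maps; on overlap the non-rollup (live) bucket wins."""
    merged = dict(rollup_buckets)
    merged.update(live_buckets)  # live buckets overwrite overlapping rollup buckets
    return dict(sorted(merged.items()))


# Hypothetical bucket counts keyed by the start of each one-minute bucket.
rollup_buckets = {1618984920000: 5, 1618984980000: 2}
live_buckets = {1618984980000: 3, 1618985040000: 1}

merged = merge_buckets(live_buckets, rollup_buckets)
print(merged)
```

The bucket at 1618984980000 exists in both inputs, so the live count of 3 is kept and the rollup count of 2 is dropped.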

Example:

Create a new, more complex job. Its details are as follows:

// A more complex job that rolls up several metrics; job details as returned by the get job API
{
  "config": {
    "id": "es-slowlog-agg-id1",
    "index_pattern": "es-slowlog*",
    "rollup_index": "rollup-es-slowlog-agg1",
    "cron": "0 * * * * ?",
    "groups": {
      "date_histogram": {
        "calendar_interval": "1m",
        "field": "timestamp_local",
        "delay": "1m",
        "time_zone": "UTC"
      },
      "histogram": {
        "interval": 8,
        "fields": [
          "event.duration"
        ]
      },
      "terms": {
        "fields": [
          "cluster",
          "elasticsearch.index.name",
          "host.name"
        ]
      }
    },
    "metrics": [
      {
        "field": "event.duration",
        "metrics": [
          "avg",
          "max",
          "min",
          "sum",
          "value_count"
        ]
      }
    ],
    "timeout": "20s",
    "page_size": 10000
  },
  "status": {
    "job_state": "started",
    "current_position": {
      "cluster.terms": "clustername-demo",
      "elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
      "event.duration.histogram": 2307000000,
      "host.name.terms": "host_name-demo",
      "timestamp_local.date_histogram": 1618984980000
    },
    "upgraded_doc_id": true
  },
  "stats": {
    "pages_processed": 6,
    "documents_processed": 1,
    "rollups_indexed": 1,
    "trigger_count": 5,
    "index_time_in_ms": 115,
    "index_total": 1,
    "index_failures": 0,
    "search_time_in_ms": 21,
    "search_total": 6,
    "search_failures": 0,
    "processing_time_in_ms": 0,
    "processing_total": 6
  }
}

Querying the stored documents in the rollup target index with _search:

GET rollup-es-slowlog-agg1/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
// response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "rollup-es-slowlog-agg1",
        "_type": "_doc",
        "_id": "es-slowlog-agg-id1$5uzfGmyS2uAb3XRznkZBgA",
        "_score": 1,
        "_source": {
          "cluster.terms.value": "bj-ali-xueyan-oa-es-cluster",
          "event.duration.avg._count": 1,
          "event.duration.max.value": 2377000000,
          "event.duration.histogram.value": 2377000000,
          "timestamp_local.date_histogram.time_zone": "UTC",
          "elasticsearch.index.name.terms.value": "basiclog-slowlog_2400-2021-04-02",
          "host.name.terms._count": 1,
          "cluster.terms._count": 1,
          "host.name.terms.value": "bj-sjhl-university-es-online-99-62",
          "event.duration.avg.value": 2377000000,
          "elasticsearch.index.name.terms._count": 1,
          "event.duration.histogram.interval": 8,
          "timestamp_local.date_histogram._count": 1,
          "timestamp_local.date_histogram.timestamp": 1618995780000,
          "_rollup.version": 2,
          "event.duration.histogram._count": 1,
          "timestamp_local.date_histogram.interval": "1m",
          "event.duration.sum.value": 2377000000,
          "event.duration.min.value": 2377000000,
          "event.duration.value_count.value": 1,
          "_rollup.id": "es-slowlog-agg-id1"
        }
      }
    ]
  }
}

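The flat field names in the rollup document's _source follow a visible pattern: source field, then aggregation name, then a property such as value or _count. The sketch below regroups such a document by that pattern so it is easier to read. It relies only on the naming seen in the sample above, which is an observation rather than a documented contract, and `group_rollup_fields` is a hypothetical helper.

```python
def group_rollup_fields(source):
    """Regroup flat '<field>.<agg>.<prop>' keys into nested dicts."""
    grouped, meta = {}, {}
    for key, value in source.items():
        parts = key.rsplit(".", 2)
        if len(parts) < 3:
            # Keys such as "_rollup.id" carry job metadata, not aggregations.
            meta[key] = value
            continue
        field, agg, prop = parts
        grouped.setdefault(field, {}).setdefault(agg, {})[prop] = value
    return grouped, meta


# A subset of the _source shown in the sample response.
source = {
    "cluster.terms.value": "bj-ali-xueyan-oa-es-cluster",
    "cluster.terms._count": 1,
    "event.duration.max.value": 2377000000,
    "event.duration.min.value": 2377000000,
    "timestamp_local.date_histogram.timestamp": 1618995780000,
    "timestamp_local.date_histogram.interval": "1m",
    "_rollup.version": 2,
    "_rollup.id": "es-slowlog-agg-id1",
}
grouped, meta = group_rollup_fields(source)
print(grouped["event.duration"])
print(meta)
```
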
This article was reposted from the web. Original link: https://developer.aliyun.com/article/784094