
Search with Kibana: Elastic Stack Practical Handbook

Published: 2021-06-03 00:00


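Check whether the document with ID 1 exists.

When you only need to know whether a document exists, use HEAD. It returns less information and performs better, which suits some specific business scenarios.

HEAD /my_goods/_doc/1

Response:

200 - OK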
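Multi get

Elasticsearch can also fetch several documents in a single request with the _mget API. The following request retrieves the documents with IDs 1 and 2:

GET /my_goods/_mget
{
  "docs": [
    { "_id": "1" },
    { "_id": "2" }
  ]
}

Response: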

{
  "docs" : [
    {
      "_index" : "my_goods",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 7,
      "_seq_no" : 8,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "goodsName" : "苹果 51英寸 4K超高清",
        "skuCode" : "skuCode1",
        "brandName" : "苹果",
        "closeUserCode" : [ ... ],
        "channelType" : "cloudPlatform",
        "shopCode" : "sc00001",
        "publicPrice" : "8188.88",
        "groupPrice" : null,
        "boxPrice" : null,
        "boostValue" : 1.8,
        "shopName" : "张三店铺"
      }
    },
    {
      "_index" : "my_goods",
      "_type" : "_doc",
      "_id" : "2",
      "found" : false
    }
  ]
}
Query DSL

Query DSL covers full-text queries, compound queries, term-level (structured) queries, and more.

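Query vs. filter

Query clauses and filter clauses behave differently:

Query

Answers whether a document matches and also how well it matches, returning a relevance _score that can be used to rank the results.

Filter

Only answers whether a document matches, without computing how well it matches; that is, no scoring is done. Filters are often especially useful in aggregations. Because no scores are calculated, Elasticsearch processes them faster and overall response time improves. Elasticsearch can also cache frequently used filters, whereas scoring queries are not cached in this way.

When to use which

Use query for full-text search and anything that depends on relevance scoring. For other scenarios, filter is recommended.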

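Compound queries

Boolean query

A bool query can contain must, filter, should, and must_not clauses:

- must: the clause must match, and it contributes to the score.
- filter: the clause must match, but scoring is ignored.
- should: similar to OR in a database query. In the special case where a bool query contains only should clauses, a document must match at least one of them to be returned.
- must_not: the clause must not match, similar to "not equal".

Example: find goods with shop code sc00001, channel channelType = cloudPlatform, and a publicPrice outside the range 8288-8888, with "brand contains 果" as a should clause. First, all of the following conditions must be met:

- shop code = sc00001
- channel channelType = cloudPlatform
- publicPrice not between 8288 and 8888

The should clause adds one more condition:

- brand contains "果"

A document that matches the should clause receives a higher score and ranks higher. This request also sets minimum_should_match to 1, which makes the should clause mandatory here. As a result, only documents that meet all of the conditions above and whose brand contains "果" are returned. Without that setting, the should clause would only raise the scores of documents that already meet the required conditions. A bool query never returns the union of the two sets.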

POST /my_goods/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "shopCode": "sc00001"
        }
      },
      "filter": {
        "term": {
          "channelType": "cloudPlatform"
        }
      },
      "must_not": [
        {
          "range": {
            "publicPrice": {
              "gte": 8288,
              "lte": 8888
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

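minimum_should_match is the minimum number of should clauses a document must match. If the bool query contains at least one should clause and no must or filter clauses, the default is 1; otherwise, the default is 0. For example:

POST /my_goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brandName": {
              "value": "东"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}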

This query requires brandName to match at least one of "东" and "果". The results are:

"hits" : [
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.5678144,
    "_source" : {
      "shopCode" : "sc00001",
      "brandName" : "山东苹果",
      "closeUserCode" : [ "uc001", "uc002", "uc003" ],
      "skuCode_brandName" : "skuCode4山东苹果",
      "channelType" : "cloudPlatform",
      "publicPrice" : 16977.76,
      "goodsName_length" : 31,
      "groupPrice" : [
        { "level" : "level1", "boxLevelPrice" : "2488.88" },
        { "level" : "level2", "boxLevelPrice" : "3488.88" }
      ],
      "boxPrice" : [
        { "boxType" : "box1", "boxUserCode" : [ "uc004", "uc005", "uc006", "uc001" ], "boxPriceDetail" : 4488.88 },
        { "boxType" : "box2", "boxUserCode" : [ "htd007", "htd008", "htd009", "uc0010" ], "boxPriceDetail" : 5488.88 }
      ],
      "boostValue" : 1.2,
      "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
      "skuCode" : "skuCode4"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.2792403,
    "_source" : {
      "shopCode" : "sc00002",
      "brandName" : "苹果",
      "closeUserCode" : [ ... ],
      "skuCode_brandName" : "skuCode2苹果",
      "channelType" : "cloudPlatform",
      "publicPrice" : 12377.76,
      "goodsName_length" : 13,
      "groupPrice" : null,
      "boxPrice" : null,
      "boostValue" : 1.0,
      "goodsName" : "苹果 55英寸 3K超高清",
      "skuCode" : "skuCode2"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.2792403,
    "_source" : {
      "shopCode" : "sc00001",
      "brandName" : "苹果",
      "closeUserCode" : [ ... ],
      "skuCode_brandName" : "skuCode1苹果",
      "channelType" : "cloudPlatform",
      "publicPrice" : 32755.52,
      "goodsName_length" : 13,
      "groupPrice" : null,
      "boxPrice" : null,
      "boostValue" : 1.8,
      "goodsName" : "苹果 51英寸 4K超高清",
      "skuCode" : "skuCode1"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 0.21222264,
    "_source" : {
      "shopCode" : "sc00001",
      "brandName" : "美国苹果",
      "closeUserCode" : [ ... ],
      "skuCode_brandName" : "skuCode3美国苹果",
      "channelType" : "cloudPlatform",
      "publicPrice" : 16777.76,
      "goodsName_length" : 26,
      "groupPrice" : null,
      "boxPrice" : [
        { "boxType" : "box1", "boxUserCode" : [ "htd003", "uc004" ], "boxPriceDetail" : 4388.88 },
        { "boxType" : "box2", "boxUserCode" : [ "uc005", "uc0010" ], "boxPriceDetail" : 5388.88 }
      ],
      "boostValue" : 1.2,
      "goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
      "skuCode" : "skuCode3"
    }
  }
]

Now set minimum_should_match to 2 and look at the results:

POST /my_goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brandName": {
              "value": "东"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.5678144,
    "_source" : {
      "shopCode" : "sc00001",
      "brandName" : "山东苹果",
      "closeUserCode" : [ "uc001", "uc002", "uc003" ],
      "skuCode_brandName" : "skuCode4山东苹果",
      "channelType" : "cloudPlatform",
      "publicPrice" : 16977.76,
      "goodsName_length" : 31,
      "groupPrice" : [
        { "level" : "level1", "boxLevelPrice" : "2488.88" },
        { "level" : "level2", "boxLevelPrice" : "3488.88" }
      ],
      "boxPrice" : [
        { "boxType" : "box1", "boxUserCode" : [ "uc004", "uc005", "uc006", "uc001" ], "boxPriceDetail" : 4488.88 },
        { "boxType" : "box2", "boxUserCode" : [ "htd007", "htd008", "htd009", "uc0010" ], "boxPriceDetail" : 5488.88 }
      ],
      "boostValue" : 1.2,
      "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
      "skuCode" : "skuCode4"
    }
  }
]

As you can see, only the document whose brandName contains both "东" and "果" is returned.
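The matching rules above, including the default for minimum_should_match, can be modeled in a few lines of Python. This is a toy sketch, not how Elasticsearch evaluates queries. Each clause is a plain predicate, a substring test stands in for analyzed term matching, and scoring is ignored. The helper names `bool_matches` and `contains` are hypothetical.

```python
from typing import Callable, Optional, Sequence

Predicate = Callable[[dict], bool]


def contains(field: str, text: str) -> Predicate:
    """Hypothetical stand-in for a term query: a substring test on one field."""
    return lambda doc: text in doc.get(field, "")


def bool_matches(doc: dict,
                 must: Sequence[Predicate] = (),
                 filter_: Sequence[Predicate] = (),
                 should: Sequence[Predicate] = (),
                 must_not: Sequence[Predicate] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    """Decide whether doc matches, following the bool-query rules (no scoring)."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter clauses, else 0.
        minimum_should_match = 1 if should and not must and not filter_ else 0
    if not all(p(doc) for p in must) or not all(p(doc) for p in filter_):
        return False
    if any(p(doc) for p in must_not):
        return False
    return sum(p(doc) for p in should) >= minimum_should_match


# brandName values copied from the responses above
docs = {"4": {"brandName": "山东苹果"}, "2": {"brandName": "苹果"},
        "1": {"brandName": "苹果"}, "3": {"brandName": "美国苹果"}}
should = [contains("brandName", "东"), contains("brandName", "果")]

for msm in (1, 2):
    print(msm, [i for i, d in docs.items()
                if bool_matches(d, should=should, minimum_should_match=msm)])
```

If a must clause is present and minimum_should_match is left unset, the should clauses become optional and only affect ranking. That is why the first bool example sets the parameter explicitly.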

Boosting query

The boosting query adjusts relevance. It returns the documents that match the positive query and lowers the scores of those that also match the negative query.

Without the negative clause, the two documents that match the positive clause (IDs 1 and 6, both with skuCode1) have the same score: "_score" : 1.3862942.

With negative_boost set to 0.2, the response is as follows (some irrelevant fields omitted):

POST /my_goods/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "skuCode": {
            "value": "skuCode1"
          }
        }
      },
      "negative": {
        "term": {
          "goodsName": {
            "value": "三星"
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.3862942,
    "_source" : {
      "goodsName" : "苹果 51英寸 4K超高清",
      "skuCode" : "skuCode1",
      "brandName" : "苹果",
      "closeUserCode" : [ ... ],
      "channelType" : "cloudPlatform",
      "shopCode" : "sc00001",
      "publicPrice" : "8188.88",
      "groupPrice" : null,
      "boxPrice" : null,
      "boostValue" : 1.8,
      "shopName" : "张三店铺"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "6",
    "_score" : 0.27725884,
    "_source" : {
      "goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
      "skuCode" : "skuCode1",
      "brandName" : "三星",
      "closeUserCode" : [ ... ],
      "channelType" : "cmccPlatform",
      "shopCode" : "sc00001",
      "publicPrice" : "8188.88",
      "groupPrice" : null,
      "boxPrice" : null,
      "boostValue" : 1.2
    }
  }
]

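The score of document 6 drops to _score : 0.27725884 because it also matches the negative query. Its score is multiplied by negative_boost: 1.3862942 × 0.2 = 0.27725884. negative_boost must be a value between 0 and 1.0, so a boosting query can only lower the scores of documents that match the negative query; it cannot raise them.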
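The arithmetic can be checked with a minimal Python sketch. It assumes the documented behavior: a document that matches the negative query has its positive score multiplied by negative_boost. The function name is hypothetical, and the input score is copied from the response above.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Final score of a document that matched the positive query (sketch)."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # Only documents that also match the negative query are demoted.
    return positive_score * negative_boost if matches_negative else positive_score


doc_1 = boosting_score(1.3862942, matches_negative=False, negative_boost=0.2)
doc_6 = boosting_score(1.3862942, matches_negative=True, negative_boost=0.2)
print(doc_1, round(doc_6, 8))
```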

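Constant score query

When term frequency (TF) should not influence scoring, use a constant_score query. It wraps a filter, and every matching document receives the same score, equal to boost (1.0 by default).

POST /my_goods/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "goodsName": "苹果"
        }
      },
      "boost": 1.2
    }
  }
}

Response (some irrelevant fields omitted):

{
  "_index" : "my_goods",
  "_type" : "_doc",
  "_id" : "3",
  "_score" : 1.2,
  "_source" : {
    "goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清"
  }
},
{
  "_index" : "my_goods",
  "_type" : "_doc",
  "_id" : "4",
  "_score" : 1.2,
  "_source" : {
    "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清"
  }
}

Documents 3 and 4 get the same score, even though document 4 is the closer match ("苹果" appears twice in its goodsName). This is because term frequency is ignored.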

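Disjunction max query

The dis_max (disjunction max) query returns documents that match any of its subqueries. However, it uses only the score of the best-matching subquery as the document's score.

For example, suppose we search for goods whose goods name or brand name contains "苹果". If a document's brand score is higher than its goods-name score, the brand score is used as the total score (ignoring the tie_breaker adjustment).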
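The combination rule can be written out as a short Python sketch. It assumes the documented dis_max formula: the best clause score, plus tie_breaker times the sum of the other matching clauses' scores, all multiplied by the query boost. The function name and the per-field scores in the usage lines are made up for illustration, not taken from Elasticsearch output.

```python
from typing import Optional, Sequence


def dis_max_score(clause_scores: Sequence[float], tie_breaker: float = 0.0,
                  boost: float = 1.0) -> Optional[float]:
    """Combine the scores of the subqueries that matched one document (sketch).

    Returns None when no subquery matched, i.e. the document is not returned.
    """
    if not clause_scores:
        return None
    best = max(clause_scores)
    others = sum(clause_scores) - best  # every other matching clause
    return (best + tie_breaker * others) * boost


# Hypothetical per-field scores: brandName = 2.0, goodsName = 1.0
print(dis_max_score([2.0, 1.0]))                              # best field only
print(dis_max_score([2.0, 1.0], tie_breaker=0.7))             # 2.0 + 0.7 * 1.0
print(dis_max_score([2.0, 1.0], tie_breaker=0.7, boost=1.2))  # (2.0 + 0.7) * 1.2
```

With tie_breaker at 0, only the best field counts. With tie_breaker at 1, the scores of all matching fields are simply added.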

GET /my_goods/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "term": {
            "goodsName": {
              "value": "苹果"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "苹果"
            }
          }
        }
      ]
    }
  }
}

Response (irrelevant fields omitted):

"max_score" : 3.0150018,
"hits" : [
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 3.0150018,
    "_source" : {
      "goodsName" : "苹果 51英寸 4K超高清",
      "brandName" : "苹果"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "5",
    "_score" : 1.3465583,
    "_source" : {
      "goodsName" : "苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清",
      "brandName" : "三星苹果"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.2337791,
    "_source" : {
      "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
      "brandName" : "山东苹果"
    }
  }
]

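Analysis:

- Document 1 ranks first. Its brand is exactly "苹果" and nothing else, which Elasticsearch treats as a better match: a shorter field containing the term scores higher.
- Documents 5 and 4 both contain "苹果" in brands of the same length, so the deciding factor is how often "苹果" appears in goodsName. It appears 3 times in document 5's goodsName but only twice in document 4's. As a result, document 4 scores lower, which is the expected result.
- What does tie_breaker do? It acts as a buffer, with values from 0 to 1. dis_max uses the score of the best-matching field as the score of the whole document, so the other fields have no influence at all, which is somewhat extreme. tie_breaker lets the other matching fields contribute. The total score is the best field's score plus tie_breaker times the scores of the other matching fields. For example, when brandName is the best match, the total is the brandName score + the goodsName score × tie_breaker.

Function score query

function_score lets you modify the scores of the documents returned by a query; it is the ultimate tool for controlling the scoring process. The most efficient way to use it is to apply different functions to subsets of the results selected by filters. This benefits from filter caching while still controlling how scores are computed.

Suppose we want Shandong apples ("山东") to appear before US apples ("美国"). The query searches for goods whose name contains "苹果". It applies a weight of 2 when the brand contains "美国" and a weight of 40 when the brand contains "山东". Note that the "美国" function in the request also includes random_score. Its contribution is therefore a random value between 0 and 1, multiplied by the weight of 2.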

GET /my_goods/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "goodsName": {
            "value": "苹果"
          }
        }
      },
      "boost": 2,
      "functions": [
        {
          "filter": {
            "match": {
              "brandName": "美国"
            }
          },
          "random_score": {},
          "weight": 2
        },
        {
          "filter": {
            "match": {
              "brandName": "山东"
            }
          },
          "weight": 40
        }
      ],
      "max_boost": 60,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 2
    }
  }
}

Main parts of the response:

"max_score" : 2.2442641,
"hits" : [
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 2.0562985,
    "_source" : {
      "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
      "brandName" : "山东苹果"
    }
  },
  {
    "_index" : "my_goods",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.7582327,
    "_source" : {
      "goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
      "brandName" : "美国苹果"
    }
  }
]

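Some parameters explained:

score_mode determines how the scores of the individual functions are combined:

- multiply: the scores are multiplied (default)
- sum: the scores are summed
- avg: the scores are averaged
- first: the score of the first function with a matching filter is used
- max: the maximum score is used
- min: the minimum score is used

For avg, the scores are combined as a weighted average. For example, suppose two functions return scores of 1 and 2, and their weights are 3 and 4. The combined score is then (1*3+2*4)/(3+4), not (1*3+2*4)/2.

boost_mode (multiply by default) determines how the combined function score is merged with the query score. max_boost caps the function score, and min_score excludes documents whose score falls below the threshold.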
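The following small Python sketch shows how these pieces fit together. It assumes the documented semantics:

- Each function's score is multiplied by its weight.
- score_mode combines the functions whose filters matched, with avg being a weighted average.
- max_boost caps the result.
- boost_mode merges the result with the query score.

The function names are hypothetical, and Elasticsearch's internal implementation differs.

```python
import math
from typing import Sequence


def combine_functions(scores: Sequence[float], weights: Sequence[float],
                      score_mode: str = "multiply") -> float:
    """Combine the functions whose filters matched, in declaration order."""
    weighted = [s * w for s, w in zip(scores, weights)]
    if score_mode == "multiply":
        return math.prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        return sum(weighted) / sum(weights)  # weighted average
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


def apply_boost_mode(query_score: float, function_score: float,
                     boost_mode: str = "multiply",
                     max_boost: float = math.inf) -> float:
    """Merge the (capped) function score with the query score."""
    function_score = min(function_score, max_boost)
    modes = {
        "multiply": query_score * function_score,
        "replace": function_score,
        "sum": query_score + function_score,
        "avg": (query_score + function_score) / 2,
        "max": max(query_score, function_score),
        "min": min(query_score, function_score),
    }
    return modes[boost_mode]


print(combine_functions([1, 2], [3, 4], "avg"))  # (1*3 + 2*4) / (3 + 4)
print(apply_boost_mode(1.5, combine_functions([1], [40], "max"), max_boost=60))
```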

For details on the other score functions, see the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.10/query-dsl-function-score-query.html#score-functions

Full text queries

Match query

The match query is the standard full-text query. For example:

# Highlight the matched terms in the results with highlight
GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": "苹果 高清 英寸"
    }
  },
  "highlight": {
    "fields": {
      "goodsName": {
        "pre_tags": [
          "<strong>"
        ],
        "post_tags": [
          "</strong>"
        ]
      }
    }
  }
}

The match query is a boolean query under the hood. The query text is analyzed into terms, and operator controls how those terms are combined: and, or or (the default).

GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": {
        "query": "苹果 高清 英寸",
        "operator": "and"
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

There are zero hits because no goods name contains all of the terms. With and, the query text "苹果 高清 英寸" is analyzed into terms, and a document must contain every one of them to be returned. Here is the same query with or:

GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": {
        "query": "苹果 高清 英寸",
        "operator": "or"
      }
    }
  }
}

Response (excerpt):

{
  "_index" : "my_goods",
  "_type" : "_doc",
  "_id" : "4",
  "_score" : 1.836855,
  "_source" : {
    "shopCode" : "sc00001",
    "brandName" : "山东苹果",
    "closeUserCode" : [ "uc001", "uc002", "uc003" ],
    "skuCode_brandName" : "skuCode4山东苹果",
    "channelType" : "cloudPlatform",
    "publicPrice" : 16977.76,
    "goodsName_length" : 31,
    "groupPrice" : [
      { "level" : "level1", "boxLevelPrice" : "2488.88" },
      { "level" : "level2", "boxLevelPrice" : "3488.88" }
    ],
    "boxPrice" : [
      { "boxType" : "box1", "boxUserCode" : [ "uc004", "uc005", "uc006", "uc001" ], "boxPriceDetail" : 4488.88 },
      { "boxType" : "box2", "boxUserCode" : [ "htd007", "htd008", "htd009", "uc0010" ], "boxPriceDetail" : 5488.88 }
    ],
    "boostValue" : 1.2,
    "goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
    "skuCode" : "skuCode4"
  }
},
{
  "_index" : "my_goods",
  "_type" : "_doc",
  "_id" : "10",
  "_score" : 0.9227071,
  "_source" : {
    "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
    "skuCode" : "skuCode10",
    "brandName" : "三星",
    "closeUserCode" : [ "uc0022" ],
    "channelType" : "cloudPlatform",
    "shopCode" : "sc00001",
    "publicPrice" : "8288.88",
    "groupPrice" : null,
    "boxPrice" : [
      { "boxType" : "box1", "boxUserCode" : [ "uc0022" ], "boxPriceDetail" : 4288.88 }
    ],
    "boostValue" : 1.8,
    "city" : "cloudPlatform1"
  }
}

The goods "三星UA55RU7520JXXZ 52英寸 4K超高清" is returned because it contains "高清". With or, matching any one of the terms is enough.
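The difference between and and or can be sketched in Python. This toy version assumes a hypothetical analyzer that only lowercases and splits on whitespace, and it uses made-up English titles. Real analyzers, especially for Chinese text, tokenize differently, so the sketch shows only the boolean logic, not Elasticsearch's tokenization or scoring.

```python
def tokenize(text: str) -> list:
    """Hypothetical analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_query(doc_text: str, query_text: str, operator: str = "or") -> bool:
    doc_terms = set(tokenize(doc_text))
    hits = [term in doc_terms for term in tokenize(query_text)]
    # and: every query term must be present; or: any one term is enough
    return all(hits) if operator == "and" else any(hits)


titles = {"a": "Apple 4K HD TV", "b": "Samsung 52 inch HD TV"}
for op in ("and", "or"):
    print(op, [k for k, t in titles.items() if match_query(t, "apple hd inch", op)])
```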

Match phrase query

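Checks whether the field contains the query text as a phrase.

GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "apple"
    }
  }
}

match_phrase vs. match

match_phrase

Treats the query text as a whole. For "goods t" below, goods must come first, with t immediately after it.

match

Analyzes the query text into terms and then searches for each term.

# No results: no document contains the phrase 'goods t'
GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "goods t"
    }
  }
}

# Returns results: the terms goods and t are matched individually
GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": "goods t"
    }
  }
}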

In match_phrase, slop controls how many positions apart the terms may be (default 0). For example:

GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": {
        "query": "apple test",
        "slop": 1
      }
    }
  }
}

Response:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.08089,
    "hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "21",
        "_score" : 3.08089,
        "_source" : {
          "goodsName" : "apple goods test",
          "skuCode" : "skuCode3",
          "brandName" : "美国苹果",
          "closeUserCode" : [ "0" ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8388.88",
          "groupPrice" : null,
          "boxPrice" : [
            { "boxType" : "box1", "boxUserCode" : [ "htd003", "uc004" ], "boxPriceDetail" : 4388.88 },
            { "boxType" : "box2", "boxUserCode" : [ "uc005", "uc0010" ], "boxPriceDetail" : 5388.88 }
          ],
          "boostValue" : 1.2
        }
      }
    ]
  }
}

With slop set to 1, one term may sit between apple and test. "apple goods test" has exactly one term (goods) between them, so the document is returned.
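Below is a simplified Python sketch of phrase matching with slop. It assumes whitespace tokenization and handles only terms that appear in query order. In this sketch, slop is the total number of extra positions allowed between the terms. Lucene's real slop is a positional edit distance that can also accept reordered terms at higher slop values. The function name is hypothetical.

```python
from itertools import product


def phrase_matches(doc_text: str, query_text: str, slop: int = 0) -> bool:
    doc = doc_text.lower().split()
    query = query_text.lower().split()
    if not query:
        return False
    # Positions at which each query term occurs in the document.
    positions = [[i for i, tok in enumerate(doc) if tok == term] for term in query]
    for combo in product(*positions):
        in_order = all(a < b for a, b in zip(combo, combo[1:]))
        gaps = combo[-1] - combo[0] - (len(combo) - 1)  # extra positions in between
        if in_order and gaps <= slop:
            return True
    return False


print(phrase_matches("apple goods test", "apple test", slop=0))
print(phrase_matches("apple goods test", "apple test", slop=1))
```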

Match phrase prefix query

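Returns documents that contain the terms of the query text in the same order, with the last term matched as a prefix. For example, the query text "apple goods t" matches the goods name "apple goods test".

Add test data: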

POST my_goods/_bulk
{"index":{"_id":13}}
{"goodsName":"apple and goods product ","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":21}}
{"goodsName":"apple goods test","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
# Returns only the document whose goodsName is apple goods test
GET /my_goods/_search
{
  "query": {
    "match_phrase_prefix": {
      "goodsName": "apple goods t"
    }
  }
}
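The matching rule can be sketched in Python. Every query term except the last must appear consecutively and in order, and the last term only has to be a prefix of the next document term. This is a toy with stated assumptions: whitespace tokenization, no slop, and no max_expansions limit on how many terms the prefix expands to. The function name is hypothetical.

```python
def phrase_prefix_matches(doc_text: str, query_text: str) -> bool:
    doc = doc_text.lower().split()
    *head, last = query_text.lower().split()  # assumes a non-empty query
    n = len(head)
    for start in range(len(doc) - n):
        # head terms must line up exactly; the following term must start with `last`
        if doc[start:start + n] == head and doc[start + n].startswith(last):
            return True
    return False


for title in ("apple and goods product ", "apple goods test"):
    print(repr(title), phrase_prefix_matches(title, "apple goods t"))
```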

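Comparison of the four match queries:

| Query | Description |
| --- | --- |
| match | Returns documents that match the query text; the query text is analyzed before matching. |
| match_bool_prefix | A bool query built from the analyzed terms: each term except the last is used in a term query, and the last term is used in a prefix query. |
| match_phrase | Analyzes the query text and matches the resulting terms as a phrase: in the same order and at adjacent positions, unless slop allows gaps. |
| match_phrase_prefix | Like match_phrase, but the last term is matched as a prefix; slop controls the allowed position difference between terms. |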

Multi-match

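Multi-field matching searches several fields for the query text. The type parameter controls how the fields are matched and scored.

# Find documents whose goods name or brand name contains "苹果"
POST /my_goods/_search
{
  "query": {
    "multi_match": {
      "query": "苹果",
      "type": "best_fields",
      "fields": ["goodsName", "brandName"],
      "tie_breaker": 0.3
    }
  }
}

Values of the type parameter:

- best_fields: the default. Finds documents that match any field and uses the score of the best-matching field as the score of the whole query.
- most_fields: finds documents that match any field and combines the scores of all matching fields.
- cross_fields: treats fields that share the same analyzer as one big field and looks for each query term in any of those fields.
- phrase: runs a match_phrase query on each field and uses the score of the best field.
- phrase_prefix: runs a match_phrase_prefix query on each field and uses the score of the best field.
- bool_prefix: runs a match_bool_prefix query on each field and combines the scores from each field; see the bool_prefix documentation for details.

The following example walks through cross_fields in practice.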
# Create the index and insert test data
PUT my_shop
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text"
      },
      "last_name": {
        "type": "text"
      }
    }
  }
}

POST my_shop/_bulk
{"index":{"_id":1}}
{"first_name":"Will","last_name":"Smith","age":25}
{"index":{"_id":2}}
{"first_name":"Smith","last_name":"hello","age":21}
{"index":{"_id":3}}
{"first_name":"Will","last_name":"hello","age":20}

# Find the person named Will Smith
GET /my_shop/_search
{
  "query": {
    "multi_match" : {
      "query": "Will Smith",
      "type": "cross_fields",
      "fields": [ "first_name^2", "last_name" ],
      "operator": "and"
    }
  }
}

Response (excerpt):

"max_score" : 1.9208363,
"hits" : [
  {
    "_index" : "my_shop",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.9208363,
    "_source" : {
      "first_name" : "Will",
      "last_name" : "Smith",
      "age" : 25
    }
  }
]

Also note that first_name^2 doubles the weight of the first_name field (the default field boost is 1).
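A toy Python model illustrates why cross_fields is the right type here. It assumes lowercase whitespace tokenization and ignores scoring and the ^2 boost, and both function names are hypothetical. With best_fields (and most_fields), operator and is applied per field, so all terms must occur in one field. cross_fields instead lets each term be found in any of the fields.

```python
def _terms(value) -> set:
    return set(str(value).lower().split())


def best_fields_and(doc: dict, fields: list, query: str) -> bool:
    """Field-centric: all query terms must appear in the same field."""
    wanted = _terms(query)
    return any(wanted <= _terms(doc.get(f, "")) for f in fields)


def cross_fields_and(doc: dict, fields: list, query: str) -> bool:
    """Term-centric: each query term may appear in any of the fields."""
    per_field = [_terms(doc.get(f, "")) for f in fields]
    return all(any(t in terms for terms in per_field) for t in _terms(query))


people = {
    "1": {"first_name": "Will", "last_name": "Smith"},
    "2": {"first_name": "Smith", "last_name": "hello"},
    "3": {"first_name": "Will", "last_name": "hello"},
}
fields = ["first_name", "last_name"]
print([p for p, d in people.items() if cross_fields_and(d, fields, "Will Smith")])
print([p for p, d in people.items() if best_fields_and(d, fields, "Will Smith")])
```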

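Term-level queries

Term-level queries search structured data such as date ranges, IP addresses, and prices. The following sections show how each one is used in real business scenarios.

Exists query

Returns documents that contain an indexed value for a field.

# Return documents that have an indexed value for goodsName
GET /my_goods/_search
{
  "query": {
    "exists": {
      "field": "goodsName"
    }
  }
}

Fuzzy query

Returns documents containing terms similar to the search term, which can be used for spelling correction.

Similarity is measured by edit distance, also known as the Levenshtein distance. It is the minimum number of single-character edits needed to turn one string into another.

Reference: https://en.wikipedia.org/wiki/Levenshtein_distance

Some queries and APIs support inexact (fuzzy) matching through the fuzziness parameter:

- 0, 1, 2: the maximum allowed edit distance.
- AUTO: generates the edit distance based on the length of the term. You can optionally pass low and high distance arguments as AUTO:[low],[high]. If they are not specified, the defaults are 3 and 6, which means:
  - 0..2: terms of 0 to 2 characters must match exactly
  - 3..5: terms of 3 to 5 characters allow one edit
  - >5: terms longer than 5 characters allow two edits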
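The edit distance and the AUTO rule can be sketched in Python. The sketch uses the classic dynamic-programming Levenshtein distance. By default, Elasticsearch's fuzzy query also counts a transposition of two adjacent characters as a single edit (transpositions: true), which this sketch does not. The function names are hypothetical, and the words come from the official example that follows.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """AUTO:[low],[high]: exact below low, one edit below high, else two."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_matches(query: str, indexed: str, fuzziness="AUTO",
                  prefix_length: int = 0) -> bool:
    max_edits = auto_fuzziness(query) if fuzziness == "AUTO" else int(fuzziness)
    if query[:prefix_length] != indexed[:prefix_length]:
        return False  # the first prefix_length characters must match exactly
    return levenshtein(query, indexed) <= max_edits


terms = ["surprise", "surprising", "surprised"]
for word in terms:
    print(word, levenshtein("surprize", word))
print([t for t in terms if fuzzy_matches("surprize", t, prefix_length=1)])
print([t for t in terms if fuzzy_matches("surprize", t, "1", prefix_length=1)])
```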
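# Using the example from the official documentation
POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "prefix_length": 1
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "my_type",
    "_id" : "1",
    "_score" : 0.9559981,
    "_source" : {
      "text" : "Surprise me!"
    }
  },
  {
    "_index" : "my_index",
    "_type" : "my_type",
    "_id" : "3",
    "_score" : 0.69983494,
    "_source" : {
      "text" : "I wasn't surprised."
    }
  }
]

If prefix_length is not set, it defaults to 0. Here, prefix_length 1 requires the first character to match exactly.

surprising is not returned. With the default AUTO, a term longer than 5 characters, such as surprize (8 characters), allows at most two edits. surprising is four edits away from surprize, so it cannot be corrected. surprise differs from surprize in a single position (s vs. z), and surprised is two edits away, so both fall within the two-edit limit.

To improve performance, you can set max_expansions to limit the number of term variations the fuzzy query creates. prefix_length also narrows the variations examined, but typos within the first prefix_length characters can then no longer be corrected. Allowing too many edits also tends to return results users do not expect.

The fuzzy query's fuzziness defaults to AUTO, which allows two edits here. In the following query, fuzziness is set to 1, so only one result is returned: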

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "fuzziness": "1",
        "prefix_length": 1
      }
    }
  }
}

Response:

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9559981,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "1",
        "_score" : 0.9559981,
        "_source" : {
          "text" : "Surprise me!"
        }
      }
    ]
  }
}

Ids query

Returns documents based on their IDs.

GET /my_goods/_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "5"]
    }
  }
}
Prefix query

Returns documents that contain a specific prefix in a provided field.

PUT my_shop_test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "shopName": {
        "type": "text"
      },
      "shopCode": {
        "type": "text"
      }
    }
  }
}

# Add test data
POST my_shop_test/_bulk
{"index":{"_id":1}}
{"shopName":"box","shopCode":"Smith"}
{"index":{"_id":2}}
{"shopName":"black","shopCode":"jack"}
{"index":{"_id":3}}
{"shopName":"fox","shopCode":"act"}
{"index":{"_id":4}}
{"shopName":"booex","shopCode":"act"}

# Find shops whose shopName contains a term starting with "bo"
GET /my_shop_test/_search
{
  "query": {
    "prefix": {
      "shopName": {
        "value": "bo"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_shop_test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "shopName" : "box",
      "shopCode" : "Smith"
    }
  },
  {
    "_index" : "my_shop_test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.0,
    "_source" : {
      "shopName" : "booex",
      "shopCode" : "act"
    }
  }
]
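Here is a toy Python equivalent of this prefix query. It assumes the standard analyzer indexed each shopName as a single lowercase term; the prefix itself is not analyzed. The helper name is hypothetical.

```python
def prefix_query(docs: dict, field: str, prefix: str) -> list:
    """IDs of documents with at least one term in `field` starting with `prefix`."""
    return [doc_id for doc_id, doc in docs.items()
            if any(term.startswith(prefix) for term in doc[field].lower().split())]


shops = {"1": {"shopName": "box"}, "2": {"shopName": "black"},
         "3": {"shopName": "fox"}, "4": {"shopName": "booex"}}
print(prefix_query(shops, "shopName", "bo"))
```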
Range query

The range query works like greater-than / less-than range conditions in a database.
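The range bounds gt, gte, lt, and lte map directly onto comparison operators. The following minimal Python sketch uses a hypothetical helper, in_range, and a few publicPrice values from the sample data:

```python
import operator

_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value: float, **bounds: float) -> bool:
    """True if value satisfies every bound, e.g. in_range(x, gte=2000, lte=8488)."""
    return all(_OPS[name](value, bound) for name, bound in bounds.items())


prices = [8188.88, 12377.76, 8288.88]
print([p for p in prices if in_range(p, gte=2000, lte=8488)])
```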

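GET my_goods/_search
{
  "query": {
    "range": {
      "publicPrice": {
        "gte": 2000,
        "lte": 8488
      }
    }
  }
}

- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to

Regexp query

A regular expression query. The following request finds documents whose shop code starts with 's', continues with any number of arbitrary characters, and ends with '1':

GET my_goods/_search
{
  "query": {
    "regexp": {
      "shopCode": {
        "value": "s.*1",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}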
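Elasticsearch regular expressions are anchored: the pattern has to match the entire term rather than a substring. Python's re.fullmatch behaves the same way, as this sketch shows. Lucene's regex syntax is not identical to Python's in general, but s.*1 means the same thing in both. regexp_matches is a hypothetical helper.

```python
import re


def regexp_matches(term: str, pattern: str, case_insensitive: bool = False) -> bool:
    flags = re.IGNORECASE if case_insensitive else 0
    # fullmatch, not search: the pattern must cover the whole term
    return re.fullmatch(pattern, term, flags) is not None


for code in ("sc00001", "sc00002", "SC00001", "xsc00001"):
    print(code, regexp_matches(code, "s.*1"), regexp_matches(code, "s.*1", True))
```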
Term query

# Returns documents that contain an exact term; avoid using term queries on text fields
GET my_goods/_search
{
  "query": {
    "term": {
      "brandName": {
        "value": "三星",
        "boost": 1.0
      }
    }
  }
}

Terms query

Returns documents that contain one or more exact terms.

GET /my_goods/_search
{
  "query": {
    "terms": {
      "brandName": [ "美国", "三星" ],
      "boost": 1.0
    }
  }
}

Terms set query

Returns documents that contain a minimum number of exact terms. terms_set works like terms, except that it also defines how many of the terms must match.

# Define a new goods index
PUT /my_goods_info
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "sale_property": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

# Add 3 goods test documents
# Sale properties: white, 64G, standard
PUT /my_goods_info/_doc/1?refresh
{
  "name": "apple",
  "sale_property": [ "white", "64", "standard" ],
  "required_matches": 2
}

# Black, 32G, non-standard
PUT /my_goods_info/_doc/2?refresh
{
  "name": "apple",
  "sale_property": [ "black", "32", "no standard" ],
  "required_matches": 2
}

# Black, 64G, non-standard
PUT /my_goods_info/_doc/3?refresh
{
  "name": "apple",
  "sale_property": [ "black", "64", "no standard" ],
  "required_matches": 2
}

GET /my_goods_info/_search
{
  "query": {
    "terms_set": {
      "sale_property": {
        "terms": [ "white", "64" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_goods_info",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.1149836,
    "_source" : {
      "name" : "apple",
      "sale_property" : [
        "white",
        "64",
        "standard"
      ],
      "required_matches" : 2
    }
  }
]
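The terms_set rule boils down to counting. A document matches when the number of query terms found in the field reaches the value of its minimum_should_match_field. Here is a Python sketch using the three test documents above; terms_set_matches is a hypothetical helper.

```python
def terms_set_matches(values: list, query_terms: list, required: int) -> bool:
    """True if at least `required` distinct query terms occur in `values`."""
    return len(set(values) & set(query_terms)) >= required


goods = {
    "1": {"sale_property": ["white", "64", "standard"], "required_matches": 2},
    "2": {"sale_property": ["black", "32", "no standard"], "required_matches": 2},
    "3": {"sale_property": ["black", "64", "no standard"], "required_matches": 2},
}
for gid, g in goods.items():
    print(gid, terms_set_matches(g["sale_property"], ["white", "64"], g["required_matches"]))
```

Document 3 contains only "64", so it would match only if its required_matches were 1.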
Wildcard query

Returns documents that contain terms matching a wildcard pattern.

GET /my_goods/_search
{
  "query": {
    "wildcard": {
      "shopCode": {
        "value": "sc*1",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
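Wildcard patterns use * for any sequence of characters (including none) and ? for exactly one character. Like regexp, they must match the whole term. The Elasticsearch documentation also advises against starting a pattern with * or ?, because that slows searches. The sketch below translates such a pattern into a Python regular expression. It ignores Lucene's backslash escaping, and wildcard_matches is a hypothetical helper.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate * and ? into regex; escape every other character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_matches(term: str, pattern: str) -> bool:
    return re.fullmatch(wildcard_to_regex(pattern), term) is not None


print(wildcard_to_regex("sc*1"))
print([c for c in ("sc00001", "sc00002", "sc1") if wildcard_matches(c, "sc*1")])
```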
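Geo queries

Elasticsearch supports two kinds of geo data:

- geo_point: latitude/longitude pairs
- geo_shape: points, lines, circles, polygons, and other complex shapes

Geo_point

Used to find all points within a given distance of another point, or to compute the distance between two points for sorting, scoring, aggregations, and similar operations.

Geo-shapes

Commonly used for filtering, for example to check whether two shapes overlap or whether one shape contains another.

There are four types of geo queries:

- geo_bounding_box: finds documents with geo-points that fall into the specified rectangle
- geo_distance: finds documents with geo-points within the specified distance of a central point
- geo_polygon: finds documents with geo-points within the specified polygon
- geo_shape: finds documents with geo-shapes that intersect, are contained by, or do not intersect the specified geo-shape, as well as documents with geo-points that intersect the specified geo-shape

A geo filter has to check, for each candidate document, whether its coordinates fall within the given area. This makes geo filters relatively expensive.

The best practice is to first filter out as many documents as possible with simple, cheap filters, and only then let the geo filter process the remaining data.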

Geo-bounding box query

Define the shop information index:

PUT /my_shop_info
{
  "mappings": {
    "properties": {
      "pin": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

# Add 2 test documents
PUT /my_shop_info/_doc/1
{
  "pin": {
    "location": {
      "lat": 40.12,
      "lon": -71.34
    }
  }
}

PUT /my_shop_info/_doc/2
{
  "pin": {
    "location": {
      "lat": 50.12,
      "lon": -61.34
    }
  }
}

# Query documents within the specified rectangle
GET my_shop_info/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "pin.location": {
            "top_left": {
              "lat": 40.73,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.01,
              "lon": -71.12
            }
          }
        }
      }
    }
  }
}

Response:

"hits" : {
  "total" : {
    "value" : 1,
    "relation" : "eq"
  },
  "max_score" : 1.0,
  "hits" : [
    {
      "_index" : "my_shop_info",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "pin" : {
          "location" : {
            "lat" : 40.12,
            "lon" : -71.34
          }
        }
      }
    }
  ]
}

Geo-distance query

Matches only documents whose geo-points are within a given distance of a geo-point. The following query finds shops within 200 km of the point (40, -70):

# Still using my_shop_info
GET /my_shop_info/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "pin.location": {
            "lat": 40,
            "lon": -70
          }
        }
      }
    }
  }
}
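The geometry behind both geo queries can be reproduced in Python as a sanity check. The sketch assumes a bounding box that does not cross the 180th meridian. It computes great-circle distance with the haversine formula and a mean Earth radius of 6371.0088 km; Elasticsearch's own distance computation may differ slightly. The function names are hypothetical. By this calculation, shop 1 is roughly 115 km from (40, -70) and shop 2 is more than 1,000 km away, so only shop 1 falls inside both the rectangle and the 200 km radius.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption for this sketch)


def in_bounding_box(lat, lon, top_left, bottom_right):
    """top_left/bottom_right are (lat, lon); no dateline crossing assumed."""
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


shops = {"1": (40.12, -71.34), "2": (50.12, -61.34)}
in_box = [s for s, (lat, lon) in shops.items()
          if in_bounding_box(lat, lon, (40.73, -74.1), (40.01, -71.12))]
distances = {s: haversine_km(40, -70, lat, lon) for s, (lat, lon) in shops.items()}
within_200km = [s for s, d in distances.items() if d <= 200]
print(in_box, {s: round(d, 1) for s, d in distances.items()}, within_200km)
```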
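About the author:
Li Zengsheng is an Elastic Certified Engineer and a PMP-certified project manager. He currently works at Huitongda Network Co., Ltd. (汇通达网络股份有限公司) as technical manager for the transaction domain of its industrial trading platform, where he is responsible for development and management in microservice architecture and search architecture. Technical interests: e-commerce, the industrial internet, and related fields.
Blog: https://www.jianshu.com/u/59dceda66b57
This article is reposted from the web; original link: https://developer.aliyun.com/article/784407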
