众所周知,产品数据监控体系中,用户留存是一个非常重要的指标,细化到业务场景中,留存其实可以拆解出来多个类别:包括新增用户留存、活跃用户留存、付费用户留存、回流用户留存、新增付费用户留存、回流付费用户留存等等,以下主要针对活跃用户留存设计ClickHouse查询语句。
表结构
time | user_id |
---|---|
January 1, 2021, 08:33 AM | 112233abc |
January 1, 2021, 09:00 AM | 221133bac |
January 1, 2021, 09:20 AM | 113322acb |
January 2, 2021, 10:12 AM | 113322acb |
January 2, 2021, 11:01 AM | 221133bac |
January 3, 2021, 11:45 AM | 112233abc |
…… | …… |
查询需求:在表 User_login 中,按指定时间区间给出该区间内每日的活跃次留、三留、七留、十四留、三十留。
PS:涉及多重嵌套查询
select
uid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(select
distinct
date(time) base_date,
user_id uid
from
User_login
where
-- 需要计算留存的日期区间
base_date between '2021-03-01' and '2021-03-31'
group by
uid, base_date) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(select
distinct
date(time) retention_date,
user_id uid
from
User_login
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= '2021-03-01'
group by
uid, retention_date) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
uid, base_date
结果如下(demo数据):
select
uid,
base_date,
retention_date,
retention_date - base_date datediff
from
/* 上一步的查询
(select
uid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(select
distinct
date(time) base_date,
user_id uid
from
User_login
where
-- 需要计算留存的日期区间
base_date between '2021-03-01' and '2021-03-31'
group by
uid, base_date) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(select
distinct
date(time) retention_date,
user_id uid
from
User_login
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= '2021-03-01'
group by
uid, retention_date) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
uid, base_date) c
*/
order by
uid, base_date
结果如下(demo数据):
select
base_date,
sum(case when datediff = 0 then 1 else 0 end) day_0,
sum(case when datediff = 1 then 1 else 0 end) day_1,
sum(case when datediff = 2 then 1 else 0 end) day_2,
sum(case when datediff = 6 then 1 else 0 end) day_6,
sum(case when datediff = 13 then 1 else 0 end) day_13,
sum(case when datediff = 29 then 1 else 0 end) day_29
from
/* 上一步的查询
(select
uid,
base_date,
retention_date,
retention_date - base_date datediff
from
(select
uid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(select
distinct
date(time) base_date,
user_id uid
from
User_login
where
-- 需要计算留存的日期区间
base_date between '2021-03-01' and '2021-03-31'
group by
uid, base_date) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(select
distinct
date(time) retention_date,
user_id uid
from
User_login
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= '2021-03-01'
group by
uid, retention_date) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
uid, base_date) c
order by
uid, base_date) d
*/
group by base_date
order by base_date
结果如下(demo数据,一共查询出31天每天的各个留存情况,其中day_0是当日的登录):
select
base_date,
-- 保留4位小数,用于百分比展示
round(day_1/day_0, 4) retention_2,
round(day_2/day_0, 4) retention_3,
round(day_6/day_0, 4) retention_7,
round(day_13/day_0, 4) retention_14,
round(day_29/day_0, 4) retention_30
from
/* 上一步的查询
(select
base_date,
sum(case when datediff = 0 then 1 else 0 end) day_0,
sum(case when datediff = 1 then 1 else 0 end) day_1,
sum(case when datediff = 2 then 1 else 0 end) day_2,
sum(case when datediff = 6 then 1 else 0 end) day_6,
sum(case when datediff = 13 then 1 else 0 end) day_13,
sum(case when datediff = 29 then 1 else 0 end) day_29
from
(select
uid,
base_date,
retention_date,
retention_date - base_date datediff
from
(select
uid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(select
distinct
date(time) base_date,
user_id uid
from
User_login
where
-- 需要计算留存的日期区间
base_date between '2021-03-01' and '2021-03-31'
group by
uid, base_date) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(select
distinct
date(time) retention_date,
user_id uid
from
User_login
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= '2021-03-01'
group by
uid, retention_date) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
uid, base_date) c
order by
uid, base_date) d
group by base_date
order by base_date)
*/
结果如下(demo数据):
到这里所有活跃留存就查询完毕了,百分比的显示只需在metabase设置中调整一下即可,效果如下:
emmm…总感觉还是差点啥,是的没错!还有一环!!
这也是最后一步了,如果在编写查询语句时就把查询时间写死了,当业务方想在平台上查看留存时,还得专门找数据小伙伴来调整时间,这不仅给数据侧带来更多工作量,还降低了业务方工作效率;
如果不限制这个时间,则每次查询都要把整个数据表都过一遍,这就不仅影响到查询效率了,当数据量日渐增多,这查询语句也会给数据库带来越来越大的压力;
因此可以使用时间筛选交互框的方式,让业务方自己选择想要查看的日期区间对应的留存情况,操作如下:(在日期筛选时把条件设为 {{date}} 形式,再在设置里将交互框调整为日期类型即可)
select
base_date,
-- 保留4位小数,用于百分比展示
round(day_1/day_0, 4) retention_2,
round(day_2/day_0, 4) retention_3,
round(day_6/day_0, 4) retention_7,
round(day_13/day_0, 4) retention_14,
round(day_29/day_0, 4) retention_30
from
(select
base_date,
sum(case when datediff = 0 then 1 else 0 end) day_0,
sum(case when datediff = 1 then 1 else 0 end) day_1,
sum(case when datediff = 2 then 1 else 0 end) day_2,
sum(case when datediff = 6 then 1 else 0 end) day_6,
sum(case when datediff = 13 then 1 else 0 end) day_13,
sum(case when datediff = 29 then 1 else 0 end) day_29
from
(select
uid,
base_date,
retention_date,
retention_date - base_date datediff
from
(select
uid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(select
distinct
date(time) base_date,
user_id uid
from
User_login
where
-- 需要计算留存的日期区间
-- 利用时间选择交互,手动选择时间区间后再运行
base_date between {{start_date}} and {{end_date}}
group by
uid, base_date) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(select
distinct
date(time) retention_date,
user_id uid
from
User_login
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= {{start_date}}
group by
uid, retention_date) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
uid, base_date) c
order by
uid, base_date) d
group by base_date
order by base_date)
效果如下:
大
功
告
成
伪静态:即网站本身是动态网页如.php、.asp、.aspx等格式,而这类网页还带“”加...
了解类似的获取字符位置的方法: charAt() 获取字符串指定位置的字符 用法:strO...
前言 最近在阅读某开源框架源码的时候,发现作者在其中运用了很多 Action委托 和...
分享背景 由于一直在虚拟机的状态下开发PHP,尝试一下mac本地搭建环境.mac本身是...
HTML常用meta 文章目录 HTML常用meta Meta(元数据) 提示以下是本篇文章正文内容...
问题 如果打算为项目选择一款免费、开源的 数据库 ,那么你可能会在 MySQL 与Pos...
分析可能是: asp的一些保留字段导致,具体的大家可以参考这篇文章: ASP常见的...
1.如果你连追求自己喜欢的事物与人的勇气都没有,你注定是个失败者。 2.不要因...
注入获取 注入获取通过IConfiguration直接获取的方法官方文档里就有,可以直接看...
非ES6代码实现继承的主流方式主要可以分为: 构造继承、原型链继承、构造继承+原...