SQL窗口分析函数
发布日期:2022-02-14 23:02:57
浏览次数:33
分类:技术文章
本文共 4586 字,大约阅读时间需要 15 分钟。
Reference
- Hive
- Oracle
- MySQL
概述
- 代码准备(hive)
-- ALTER TABLE order RENAME TO order_infoscreate table order_infos(name varchar(20), order_date date, cost int)insert into order_infosselect 'jack',cast('2015-01-01' as date),10union all select 'tony',cast('2015-01-02' as date),15union all select 'jack',cast('2015-02-03' as date),23union all select 'tony',cast('2015-01-04' as date),29union all select 'jack',cast('2015-01-05' as date),46union all select 'jack',cast('2015-04-06' as date),42union all select 'tony',cast('2015-01-07' as date),50union all select 'jack',cast('2015-01-08' as date),55union all select 'mart',cast('2015-04-08' as date),62union all select 'mart',cast('2015-04-09' as date),68union all select 'neil',cast('2015-05-10' as date),12union all select 'mart',cast('2015-04-11' as date),75union all select 'neil',cast('2015-06-12' as date),80union all select 'mart',cast('2015-04-13' as date),94
- 当同一个
select
查询中存在多个窗口函数时,他们相互之间是没有影响的.每个窗口函数应用自己的规则. - window子句
PRECEDING
:往前FOLLOWING
:往后CURRENT ROW
:当前行UNBOUNDED
:起点UNBOUNDED PRECEDING
:从前面的起点UNBOUNDED FOLLOWING
:到后面的终点
- 如果只使用
partition by
子句,未指定order by
的话,我们的聚合是分组内的聚合 - 使用了
order by
子句,未使用window
子句的情况下,默认从起点到当前行
select name,order_date,cost,sum(cost) over() as sample1, --在每一行后面增加 所有行相加sum(cost) over(partition by name) as sample2,--在每一行后面增加 按name分组,组内数据 相加sum(cost) over(partition by name order by order_date) as sample3,--在每一行后面增加 按name分组,组内数据 累加sum(cost) over(partition by name order by order_date rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一样,由起点到当前行的聚合sum(cost) over(partition by name order by order_date rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合sum(cost) over(partition by name order by order_date rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行sum(cost) over(partition by name order by order_date rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行from testtest.order_infos;
- 排序
row_number()
:从1开始,按照顺序,生成分组内记录的序列,row_number()
的值不会存在重复,当排序的值相同时,按照表中记录的顺序进行排列RANK()
: 生成数据项在分组中的排名,排名相等会在名次中留下空位 (排序相同时不连续)。RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn
DENSE_RANK()
: 生成数据项在分组中的排名,排名相等会在名次中不会留下空位(排序连续)
- LAG(向上,向前)和LEAD(向下,向后)函数:可以返回上下数据行的数据
select name,order_date,cost,lag(order_date,1,'1900-01-01') over(partition by name order by order_date ) as time1, -- 购买时间lag(order_date,2) over (partition by name order by order_date) as time2 -- 购买时间对应的上一次的购买时间from testtest.order_infos
- first_value取分组内排序后,截止到当前行,第一个值
- last_value取分组内排序后,截止到当前行,最后一个值
select name,order_date,cost,first_value(order_date) over(partition by name order by order_date) as time1,last_value(order_date) over(partition by name order by order_date) as time2from testtest.order_infos
- MySQL
- 说明:MySQL从8.0开始支持窗口函数
- 常见窗口函数
row_number()
rank()
dense_rank()
percent_rank()
:当前记录排序的(rank() - 1)
除以 (分组总记录数 - 1)的百分比cume_dist()
:计算当前记录排序rank()
的总记录数除以分组总记录数的百分比lag(col,N)
:按照分组排序,显示该记录的前N
个的col
值lead(col,N)
first_value(col)
:按照分组排序,显示排序第一的col
值last_value(col)
nth_value(col,N)
:按照分组排序,截止到当前记录第N
排序的col
值nfile(N)
:按照分组排序,将所有记录分为N
份
-- row_number()-- 8.0select row_number() over w as row_num,class_num,user_id,score from score where subject_name='Math' window w as ( partition by class_num order by score desc) ;-- 8.0之前select `row_number`,class_num,user_id,score from (select if(@class=class_num,@r:=@r+1,@r:=1) as `row_number`,@class:=class_num,class_num,user_id,score from score,(select @r:=0,@class=NULL) temp order by class_num,score desc) tmp2;-- dense_rank()-- 8.0select dense_rank() over w as `dense_rank`,class_num,user_id,score from score where subject_name='Math' window w as ( partition by class_num order by score desc) ;-- 8.0之前select `dense_rank`,class_num,user_id,score from (select if(@class=class_num,case when @s = score then @r when @s := score then @r := @r + 1 end,@r:=1) as `dense_rank`,@class:=class_num,@s:=score,class_num,user_id,score FROM score, ( SELECT @r := 0, @s = NULL,@class=NULL ) temp where subject_name = 'Math' order by class_num,score desc) temp2;-- rank()-- 8.0select rank() over w as `rank`,class_num,user_id,score from score where subject_name='Math' window w as ( order by score desc) ;-- 8.0之前select `rank`,class_num,user_id,score from (SELECT @r:=if(@s = score,@r,@c) AS `rank`,@s:=score,@c:=@c+1,class_num,user_id,score from score, ( select @r := 0, @s = NULL,@c:=1 ) r where subject_name = 'Math' order by score desc) temp;
转载地址:https://blog.csdn.net/fish2009122/article/details/113110312 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!
发表评论
最新留言
关注你微信了!
[***.104.42.241]2024年04月07日 11时09分28秒
关于作者
喝酒易醉,品茶养心,人生如梦,品茶悟道,何以解忧?唯有杜康!
-- 愿君每日到此一游!
推荐文章
浅谈代码覆盖率
2019-04-27
Java代码覆盖率历史发展轨迹
2019-04-27
【防止重复下单】分布式系统接口幂等性实现方案
2019-04-27
一图秒懂开源许可证协议-GPL、BSD、MIT、Mozilla、Apache,LGPL
2019-04-27
websocket 项目启示录
2019-04-27
性能测试
2019-04-27
Java电商系统商品详情页存储方案设计
2019-04-27
Jacoco探针源码解析(0.8.5 版本)
2019-04-27
Java的Instrumentation类原理分析
2019-04-27
"org.jacoco.agent.rt" 在 maven 中找不到
2019-04-27
计算机中的dump到底是什么意思?
2019-04-27
JaCoCo探针策略原理及案例总结
2019-04-27
阿里三面:说说线程封闭与ThreadLocal的关系
2019-04-27
看完让你吊打面试官-@Autowired注解到底怎么实现的?
2019-04-27
MySQL的行锁、表锁、间隙锁详解
2019-04-27
和阿里面试官扯了半小时ArrayBlockingQueue源码
2019-04-27
远离996,PDMan开源免费的国产数据库建模工具!
2019-04-27
现代操作系统的存储器结构
2019-04-27
深度揭秘年薪60W的阿里P7简历制作过程!
2019-04-27
可能是全网最全的SpringBoot启动流程源码分析(基于 2.1.5 版本)
2019-04-27