SQL窗口分析函数
发布日期:2022-02-14 23:02:57 浏览次数:33 分类:技术文章

本文共 4586 字,大约阅读时间需要 15 分钟。

Reference

  • Hive
  • Oracle
  • MySQL

概述

  • 代码准备(hive)
-- ALTER TABLE order RENAME TO order_infoscreate table order_infos(name varchar(20), order_date date, cost int)insert into order_infosselect 'jack',cast('2015-01-01' as date),10union all                select 'tony',cast('2015-01-02' as date),15union all                select 'jack',cast('2015-02-03' as date),23union all                select 'tony',cast('2015-01-04' as date),29union all                select 'jack',cast('2015-01-05' as date),46union all                select 'jack',cast('2015-04-06' as date),42union all                select 'tony',cast('2015-01-07' as date),50union all                select 'jack',cast('2015-01-08' as date),55union all                select 'mart',cast('2015-04-08' as date),62union all                select 'mart',cast('2015-04-09' as date),68union all                select 'neil',cast('2015-05-10' as date),12union all                select 'mart',cast('2015-04-11' as date),75union all                select 'neil',cast('2015-06-12' as date),80union all                select 'mart',cast('2015-04-13' as date),94
  • 当同一个select查询中存在多个窗口函数时,他们相互之间是没有影响的.每个窗口函数应用自己的规则.
  • window子句
    • PRECEDING:往前
    • FOLLOWING:往后
    • CURRENT ROW:当前行
    • UNBOUNDED:起点
      • UNBOUNDED PRECEDING:从前面的起点
      • UNBOUNDED FOLLOWING:到后面的终点
  • 如果只使用partition by子句,未指定order by的话,我们的聚合是分组内的聚合
  • 使用了order by子句,未使用window子句的情况下,默认从起点到当前行
select name,order_date,cost,sum(cost) over() as sample1, --在每一行后面增加 所有行相加sum(cost) over(partition by name) as sample2,--在每一行后面增加 按name分组,组内数据 相加sum(cost) over(partition by name order by order_date) as sample3,--在每一行后面增加 按name分组,组内数据 累加sum(cost) over(partition by name order by order_date rows between UNBOUNDED PRECEDING and current row )  as sample4 ,--和sample3一样,由起点到当前行的聚合sum(cost) over(partition by name order by order_date rows between 1 PRECEDING   and current row) as sample5, --当前行和前面一行做聚合sum(cost) over(partition by name order by order_date rows between 1 PRECEDING   AND 1 FOLLOWING  ) as sample6,--当前行和前边一行及后面一行sum(cost) over(partition by name order by order_date rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行from testtest.order_infos;
  • 排序
    • row_number():从1开始,按照顺序,生成分组内记录的序列,row_number()的值不会存在重复,当排序的值相同时,按照表中记录的顺序进行排列
    • RANK(): 生成数据项在分组中的排名,排名相等会在名次中留下空位 (排序相同时不连续)。RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn
    • DENSE_RANK(): 生成数据项在分组中的排名,排名相等会在名次中不会留下空位(排序连续)
  • LAG(向上,向前)和LEAD(向下,向后)函数:可以返回上下数据行的数据
select name,order_date,cost,lag(order_date,1,'1900-01-01') over(partition by name order by order_date ) as time1, -- 购买时间lag(order_date,2) over (partition by name order by order_date) as time2 -- 购买时间对应的上一次的购买时间from testtest.order_infos
  • first_value取分组内排序后,截止到当前行,第一个值
  • last_value取分组内排序后,截止到当前行,最后一个值
select name,order_date,cost,first_value(order_date) over(partition by name order by order_date) as time1,last_value(order_date) over(partition by name order by order_date) as time2from testtest.order_infos
  • MySQL
    • 说明:MySQL从8.0开始支持窗口函数
    • 常见窗口函数
      • row_number()
      • rank()
      • dense_rank()
      • percent_rank():当前记录排序的(rank() - 1) 除以 (分组总记录数 - 1)的百分比
      • cume_dist():计算当前记录排序rank()的总记录数除以分组总记录数的百分比
      • lag(col,N):按照分组排序,显示该记录的前N个的col
      • lead(col,N)
      • first_value(col):按照分组排序,显示排序第一的col
      • last_value(col)
      • nth_value(col,N):按照分组排序,截止到当前记录第N排序的col
      • nfile(N):按照分组排序,将所有记录分为N
-- row_number()-- 8.0select row_number() over w as row_num,class_num,user_id,score from score where subject_name='Math' window  w as ( partition by class_num order by score desc) ;-- 8.0之前select `row_number`,class_num,user_id,score from (select if(@class=class_num,@r:=@r+1,@r:=1) as `row_number`,@class:=class_num,class_num,user_id,score from score,(select @r:=0,@class=NULL) temp order by class_num,score desc) tmp2;-- dense_rank()-- 8.0select dense_rank() over w as `dense_rank`,class_num,user_id,score from score where subject_name='Math' window  w as ( partition by class_num order by score desc) ;-- 8.0之前select `dense_rank`,class_num,user_id,score from (select if(@class=class_num,case when  @s = score then @r when @s := score then @r := @r + 1 end,@r:=1) as `dense_rank`,@class:=class_num,@s:=score,class_num,user_id,score FROM score, ( SELECT @r := 0, @s = NULL,@class=NULL ) temp where subject_name = 'Math' order by  class_num,score desc) temp2;-- rank()-- 8.0select rank() over w as `rank`,class_num,user_id,score from score where subject_name='Math' window  w as ( order by score desc) ;-- 8.0之前select `rank`,class_num,user_id,score from (SELECT  @r:=if(@s = score,@r,@c)  AS `rank`,@s:=score,@c:=@c+1,class_num,user_id,score from score, ( select @r := 0, @s = NULL,@c:=1 ) r where subject_name = 'Math' order by score desc) temp;

转载地址:https://blog.csdn.net/fish2009122/article/details/113110312 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!

上一篇:建模之常见优化器(Keras)
下一篇:docker-compose资源限制&docker资源监控

发表评论

最新留言

关注你微信了!
[***.104.42.241]2024年04月07日 11时09分28秒