厦门专业网站设计微信卖货小程序怎么做
5个小案例说清楚-窗口函数(开窗函数)
结论先行:
首先,写开窗函数一定要先理解开窗的定义和为什么开窗,开窗分几种场景;
写sql,需要心平气和,一步一步进行推导,这样可以找到写sql的逻辑,最终达到万变不离其宗。
over(): 就是为每条数据都开启一个窗口. 窗口的大小默认为当前数据集的大小.
over(partition by…): 会按照分区字段将数据分到不同的区中.
会为每个分区中的每条数据都开启一个窗口.
窗口的大小默认为当前分区数据集的大小
over(order by …): 会为每条数据都开启一个窗口,窗口的大小默认为起点到当前行
over(rows between … and …) :划分窗口大小
over(partition by … order by … ) :
会按照分区字段将数据分到不同的区中.
会为每个分区中的每条数据都开启一个窗口.
窗口的大小默认为当前分区数据集中起点到当前行
1.相关函数说明
over():指定分析函数工作的数据窗口大小,也就是可操作的数据集大小,这个数据窗口大小可能会随着行的改变而变化;
current row:当前行
n preceding:往前n行数据
n following:往后n行数据
unbounded:起点
unbounded preceding:表示到前面的起点
unbounded following:表示到后面的重点
lag(col, n, default_val):往前第n行数据
lead(col, n, default_val):往后第n行数据
ntile(n):把有序窗口的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,ntile返回此行所属的组的编号。注意:n必须为int类型
2.准备表和数据:
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
--创建表
create table business(
name string,
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
--导入数据
load data local inpath "/opt/module/hive-3.1.2/datas/business.txt" into table business;
需求一:
查询在2017年4月份购买过的顾客及总人数
分步分析得出:
a)查询2017年4月份的数据
b)查询购买过的顾客名称
c)求总人数这里用到了开窗函数over()表示在所有的当前数据集中。
注意:要一步一步分析,切记上来囫囵吞枣,没有搞清楚需求,就开始写
--a)查询2017年4月份的数据
selectname,orderdate,cost
frombusiness
where orderdate like '2017-04%'; =>t1
+-------+-------------+-------+
| name | orderdate | cost |
+-------+-------------+-------+
| jack | 2017-04-06 | 42 |
| mart | 2017-04-08 | 62 |
| mart | 2017-04-09 | 68 |
| mart | 2017-04-11 | 75 |
| mart | 2017-04-13 | 94 |
+-------+-------------+-------+
-- b)查询购买过的顾客名称
selectdistinct t1.name
fromt1 ; => t2
-- c)求总人数
selectt2.name,count(t2.name) over() total_num
fromt2 ;
-- d)组装
selectt2.name,count(t2.name) over() total_num
from(
selectdistinct t1.name
from(
selectname,orderdate,cost
frombusiness
where orderdate like '2017-04%')t1 )t2 ; --结果可得
+----------+------------+
| t2.name | total_num |
+----------+------------+
| mart | 2 |
| jack | 2 |
+----------+------------+
需求二:
查询顾客的购买明细及月购买总额
分析:
每行的窗口数据为该月份本人的购买次数
所以需要按照月份进行分区
需要用到 over(partition by monthNum)此题可以扩展成两个题;
扩展1
查询顾客的购买明细及所有顾客月购买总额
selectname,orderdate,cost,sum(cost) over(partition by month(orderdate)) total_cost
frombusiness ;
--结果可得
+-------+-------------+-------+-------------+
| name | orderdate | cost | total_cost |
+-------+-------------+-------+-------------+
| jack | 2017-01-01 | 10 | 205 |
| jack | 2017-01-08 | 55 | 205 |
| tony | 2017-01-07 | 50 | 205 |
| jack | 2017-01-05 | 46 | 205 |
| tony | 2017-01-04 | 29 | 205 |
| tony | 2017-01-02 | 15 | 205 |
| jack | 2017-02-03 | 23 | 23 |
| mart | 2017-04-13 | 94 | 341 |
| jack | 2017-04-06 | 42 | 341 |
| mart | 2017-04-11 | 75 | 341 |
| mart | 2017-04-09 | 68 | 341 |
| mart | 2017-04-08 | 62 | 341 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 80 |
+-------+-------------+-------+-------------+
扩展1
查询顾客的购买明细及每个顾客的月购买总额
selectname,orderdate,cost,sum(cost) over(partition by name, month(orderdate)) total_cost
frombusiness ;
--查询结果
+-------+-------------+-------+-------------+
| name | orderdate | cost | total_cost |
+-------+-------------+-------+-------------+
| jack | 2017-01-05 | 46 | 111 |
| jack | 2017-01-08 | 55 | 111 |
| jack | 2017-01-01 | 10 | 111 |
| jack | 2017-02-03 | 23 | 23 |
| jack | 2017-04-06 | 42 | 42 |
| mart | 2017-04-13 | 94 | 299 |
| mart | 2017-04-11 | 75 | 299 |
| mart | 2017-04-09 | 68 | 299 |
| mart | 2017-04-08 | 62 | 299 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 80 |
| tony | 2017-01-04 | 29 | 94 |
| tony | 2017-01-02 | 15 | 94 |
| tony | 2017-01-07 | 50 | 94 |
+-------+-------------+-------+-------------+
14 rows selected (20.778 seconds)
需求三:
求每个顾客的购买明细及将每个顾客的cost按照日期进行累加分析:窗口函数用到了order by 和 rows between unbounded preceding and current row
selectname,orderdate,cost,sum(cost) over(order by orderdate rows between unbounded preceding and current row)
frombusiness ;
-- 查询结果
+-------+-------------+-------+---------------+
| name | orderdate | cost | sum_window_0 |
+-------+-------------+-------+---------------+
| jack | 2017-01-01 | 10 | 10 |
| tony | 2017-01-02 | 15 | 25 |
| tony | 2017-01-04 | 29 | 54 |
| jack | 2017-01-05 | 46 | 100 |
| tony | 2017-01-07 | 50 | 150 |
| jack | 2017-01-08 | 55 | 205 |
| jack | 2017-02-03 | 23 | 228 |
| jack | 2017-04-06 | 42 | 270 |
| mart | 2017-04-08 | 62 | 332 |
| mart | 2017-04-09 | 68 | 400 |
| mart | 2017-04-11 | 75 | 475 |
| mart | 2017-04-13 | 94 | 569 |
| neil | 2017-05-10 | 12 | 581 |
| neil | 2017-06-12 | 80 | 661 |
+-------+-------------+-------+---------------+
扩展需求:
求每个顾客的购买明细及
起点到当前行的累加
上一行到当前行的累加
当前行到下一行的累加
上一行到下一行的累加
当前行到终点的累加
selectname,orderdate,cost,sum(cost) over(order by orderdate rows between unbounded preceding and current row) up_c,sum(cost) over(order by orderdate rows between 1 preceding and current row) 1p_c,sum(cost) over(order by orderdate rows between current row and 1 following) c_1f,sum(cost) over(order by orderdate rows between 1 preceding and 1 following) 1p_1f,sum(cost) over(order by orderdate rows between current row and unbounded following) c_uf
frombusiness;
--执行结果
+-------+-------------+-------+-------+-------+-------+--------+-------+
| name | orderdate | cost | up_c | 1p_c | c_1f | 1p_1f | c_uf |
+-------+-------------+-------+-------+-------+-------+--------+-------+
| jack | 2017-01-01 | 10 | 10 | 10 | 25 | 25 | 661 |
| tony | 2017-01-02 | 15 | 25 | 25 | 44 | 54 | 651 |
| tony | 2017-01-04 | 29 | 54 | 44 | 75 | 90 | 636 |
| jack | 2017-01-05 | 46 | 100 | 75 | 96 | 125 | 607 |
| tony | 2017-01-07 | 50 | 150 | 96 | 105 | 151 | 561 |
| jack | 2017-01-08 | 55 | 205 | 105 | 78 | 128 | 511 |
| jack | 2017-02-03 | 23 | 228 | 78 | 65 | 120 | 456 |
| jack | 2017-04-06 | 42 | 270 | 65 | 104 | 127 | 433 |
| mart | 2017-04-08 | 62 | 332 | 104 | 130 | 172 | 391 |
| mart | 2017-04-09 | 68 | 400 | 130 | 143 | 205 | 329 |
| mart | 2017-04-11 | 75 | 475 | 143 | 169 | 237 | 261 |
| mart | 2017-04-13 | 94 | 569 | 169 | 106 | 181 | 186 |
| neil | 2017-05-10 | 12 | 581 | 106 | 92 | 186 | 92 |
| neil | 2017-06-12 | 80 | 661 | 92 | 80 | 92 | 80 |
+-------+-------------+-------+-------+-------+-------+--------+-------+
14 rows selected (17.403 seconds)
需求四:
查询每个顾客上次 和 下次 的购买时间
分析:
lag(col, n, default_val):往前第n行数据
lead(col, n, default_val):往后第n行数据
selectname,orderdate,cost,lag(orderdate, 1, '1977-01-01') over(partition by name order by orderdate) pre_ord,lead(orderdate, 1, '9999-01-01') over(partition by name order by orderdate) nex_ord
frombusiness;
--结果
+-------+-------------+-------+-------------+-------------+
| name | orderdate | cost | pre_ord | nex_ord |
+-------+-------------+-------+-------------+-------------+
| jack | 2017-01-01 | 10 | 1977-01-01 | 2017-01-05 |
| jack | 2017-01-05 | 46 | 2017-01-01 | 2017-01-08 |
| jack | 2017-01-08 | 55 | 2017-01-05 | 2017-02-03 |
| jack | 2017-02-03 | 23 | 2017-01-08 | 2017-04-06 |
| jack | 2017-04-06 | 42 | 2017-02-03 | 9999-01-01 |
| mart | 2017-04-08 | 62 | 1977-01-01 | 2017-04-09 |
| mart | 2017-04-09 | 68 | 2017-04-08 | 2017-04-11 |
| mart | 2017-04-11 | 75 | 2017-04-09 | 2017-04-13 |
| mart | 2017-04-13 | 94 | 2017-04-11 | 9999-01-01 |
| neil | 2017-05-10 | 12 | 1977-01-01 | 2017-06-12 |
| neil | 2017-06-12 | 80 | 2017-05-10 | 9999-01-01 |
| tony | 2017-01-02 | 15 | 1977-01-01 | 2017-01-04 |
| tony | 2017-01-04 | 29 | 2017-01-02 | 2017-01-07 |
| tony | 2017-01-07 | 50 | 2017-01-04 | 9999-01-01 |
+-------+-------------+-------+-------------+-------------+
需求五:
需求五: 查询前20%时间的订单信息分析:ntile(n):把有序窗口的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,ntile返回此行所属的组的编号。注意:n必须为int类型
--将数据分成5组
selectname,orderdate,cost,ntile(5) over(order by orderdate) gid
frombusiness ; =>t1
--取第一组的数据
selectt1.name,t1.orderdate,t1.cost
fromt1
where t1.gid = 1;
--组合
selectt1.name,t1.orderdate,t1.cost
from(
selectname,orderdate,cost,ntile(5) over(order by orderdate) gid
frombusiness )t1 where t1.gid = 1 ;
-- 结果
+----------+---------------+----------+
| t1.name | t1.orderdate | t1.cost |
+----------+---------------+----------+
| jack | 2017-01-01 | 10 |
| tony | 2017-01-02 | 15 |
| tony | 2017-01-04 | 29 |
+----------+---------------+----------+