W3Cschool
恭喜您成為首批注冊(cè)用戶
獲得88經(jīng)驗(yàn)值獎(jiǎng)勵(lì)
分析函數(shù)(某些數(shù)據(jù)庫下也叫做窗口函數(shù))與聚合函數(shù)類似,計(jì)算總是基于一組行的集合,不同的是,聚合函數(shù)一組只能返回一行,而分析函數(shù)每組可以返回多行,組內(nèi)每一行都是基于窗口的邏輯計(jì)算的結(jié)果。分析函數(shù)可以顯著優(yōu)化需要 self-join 的查詢。
“窗口”也稱為 FRAME,OceanBase 數(shù)據(jù)庫同時(shí)支持 ROWS 與 RANGE 兩種 FRAME 語義,前者是基于物理行偏移的窗口,后者則是基于邏輯值偏移的窗口。
分析函數(shù)語法如下:
analytic_function:
analytic_function([ arguments ]) OVER (analytic_clause)
analytic_clause:
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
query_partition_clause:
PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }
order_by_clause:
ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...
windowing_clause:
{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr {
PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr {
PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr
PRECEDING}}
聲明
SUM 的語法為:SUM([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
MIN 的語法為:MIN([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
MAX 的語法為:MAX([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
COUNT 的語法為:COUNT({ * | [ DISTINCT | ALL ] expr }) [ OVER (analytic_clause) ]
AVG 的語法為:AVG([ DISTINCT | ALL ] expr) [ OVER(analytic_clause) ]
說明
以上分析函數(shù)都有對(duì)應(yīng)的聚合函數(shù),其中,SUM
返回 expr
的和,MIN
/MAX
返回 expr
的最小值/最大值,COUNT
返回窗口中查詢的行數(shù),AVG
返回 expr
的平均值。
對(duì)于 COUNT
函數(shù),如果指定了 expr
,即返回 expr
不為 NULL 的統(tǒng)計(jì)個(gè)數(shù),如果指定 COUNT(*)
返回所有行的統(tǒng)計(jì)數(shù)目。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.17 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.03 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient>select last_name, sum(salary) over(partition by job_id) totol_s, min(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s, count(*) over(partition by job_id) count_s from exployees;
+-----------+---------+-------+-------+---------+
| last_name | totol_s | min_s | max_s | count_s |
+-----------+---------+-------+-------+---------+
| jim | 2000 | 2000 | 2000 | 1 |
| mike | 36000 | 11000 | 13000 | 3 |
| lily | 36000 | 11000 | 13000 | 3 |
| tom | 36000 | 11000 | 13000 | 3 |
+-----------+---------+-------+-------+---------+
4 rows in set (0.01 sec)
聲明
NTH_VALUE 的語法為:NTH_VALUE (measure_expr, n) [ FROM { FIRST | LAST } ] [ { RESPECT | IGNORE } NULLS ] OVER (analytic_clause)
FIRST_VALUE 的語法為:FIRST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)
LAST_VALUE 的語法為:LAST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)
說明
NTH_VALUE 函數(shù)表示第幾個(gè)值,方向由 [ FROM { FIRST | LAST } ]
確定,默認(rèn)為 FROM FIRST
,含有是否忽略 NULL 值的標(biāo)志。其窗口為統(tǒng)一的 analytic_clause
。這里 n
應(yīng)該是正數(shù),如果 n
是 NULL,函數(shù)將返回錯(cuò)誤;如果 n
大于窗口內(nèi)所有的行數(shù),此函數(shù)將返回 NULL。
FIRST_VALUE 和 LAST_VALUE 表示從第一個(gè)開始計(jì)數(shù)或者是從最后一個(gè)開始計(jì)數(shù)。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> select last_name, first_value(salary) over(partition by job_id) totol_s, last_value(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s from exployees;
+-----------+---------+-------+-------+
| last_name | totol_s | min_s | max_s |
+-----------+---------+-------+-------+
| jim | 2000 | 2000 | 2000 |
| mike | 12000 | 11000 | 13000 |
| lily | 12000 | 11000 | 13000 |
| tom | 12000 | 11000 | 13000 |
+-----------+---------+-------+-------+
4 rows in set (0.01 sec)
聲明
LEAD 的語法為:LEAD { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)
LAG 的語法為:LAG { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)
說明
LEAD 和 LAG 含義為可以在一次查詢中取出當(dāng)前行的同一個(gè)字段的前面或后面第 N 行的數(shù)據(jù),這種操作可以使用相同表的自連接來實(shí)現(xiàn),但 LEAD/LAG 窗口函數(shù)有更高的效率。
其中,value_expr
是要做比對(duì)的字段,offset
是 value_expr
的偏移量,default
參數(shù)的默認(rèn)值為 NULL,即如果在 LEAD/LAG 沒有顯示的設(shè)置 default
值的情況下,返回值為 NULL。例如:對(duì) LAG 來說,當(dāng)前行為 4,offset
值為 6,這時(shí)候所要找的數(shù)據(jù)就是第 -2 行,不存在此行即返回 default
的值。
[ { RESPECT | IGNORE } NULLS ]
的語法為是否考慮 NULL 值,默認(rèn)為 RESPECT
,考慮 NULL 值。
注意 LEAD/LAG 兩個(gè)函數(shù)后必須有 order_by_clause
,數(shù)據(jù)應(yīng)該在一個(gè)列上排序之后才能有前多少行后多少行的概念。query_partition_clause
是可選的,如果沒有 query_partition_clause
,就是全局的數(shù)據(jù)。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, lead(salary) over(order by salary) lead, lag(salary) over(order by salary) lag from exployees;
+-----------+-------+-------+
| last_name | lead | lag |
+-----------+-------+-------+
| jim | 11000 | NULL |
| tom | 12000 | 2000 |
| mike | 13000 | 11000 |
| lily | NULL | 12000 |
+-----------+-------+-------+
4 rows in set (0.01 sec)
聲明
VARIANCE 的語法為:VARIANCE([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
STDDEV 的語法為:STDDEV([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
STDDEV_SAMP 的語法為:STDDEV_SAMP(expr) [ OVER (analytic_clause) ]
STDDEV_POP 的語法為:STDDEV_POP(expr) [ OVER (analytic_clause) ]
說明
VARIANCE 返回的是 expr
的方差,expr
可能是數(shù)值類型或者可以轉(zhuǎn)換成數(shù)值類型的類型,方差的類型和輸入的值的類型相同。
STDDEV 返回的是 expr
的標(biāo)準(zhǔn)差,參數(shù)類型方面和 VARIANCE 的相同。
STDDEV_SAMP 返回的是樣本標(biāo)準(zhǔn)差。
STDDEV_POP 返回的是總體標(biāo)準(zhǔn)差。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, stddev(salary) over(order by salary) std, variance(salary) over(order by salary) var, stddev_pop(salary) over() std_pop, stddev_samp(salary) over() from exployees;
+-----------+-------------------+--------------------+-------------------+----------------------------+
| last_name | std | var | std_pop | stddev_samp(salary) over() |
+-----------+-------------------+--------------------+-------------------+----------------------------+
| jim | 0 | 0 | 4387.482193696061 | 5066.228051190222 |
| tom | 4500 | 20250000 | 4387.482193696061 | 5066.228051190222 |
| mike | 4496.912521077347 | 20222222.222222224 | 4387.482193696061 | 5066.228051190222 |
| lily | 4387.482193696061 | 19250000 | 4387.482193696061 | 5066.228051190222 |
+-----------+-------------------+--------------------+-------------------+----------------------------+
4 rows in set (0.00 sec)
聲明
NTILE(expr) OVER ([ query_partition_clause ] order_by_clause)
說明
NTILE 函數(shù)將分區(qū)中已經(jīng)排序的行劃分為大小盡可能相同的指定數(shù)量的分組,并返回給每行組號(hào)。expr
如果是 NULL,則返回 NULL。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, ntile(10) over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl |
+-----------+------+
| jim | 1 |
| tom | 1 |
| mike | 2 |
| lily | 3 |
+-----------+------+
4 rows in set (0.01 sec)
聲明
ROW_NUMBER( ) OVER ([ query_partition_clause ] order_by_clause)
說明
ROW_NUMBER 函數(shù)按照 order_by_clause
子句中指定的行的順序,為每一行分配一個(gè)編號(hào)。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, row_number() over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl |
+-----------+------+
| jim | 1 |
| tom | 1 |
| mike | 2 |
| lily | 3 |
+-----------+------+
4 rows in set (0.00 sec)
聲明
RANK 的語法為:RANK( ) OVER ([ query_partition_clause ] order_by_clause)
DENSE_RANK 的語法為:DENSE_RANK( ) OVER([ query_partition_clause ] order_by_clause)
PERCENT_RANK 的語法為:PERCENT_RANK( ) OVER ([ query_partition_clause ] order_by_clause)
說明
RANK 計(jì)算每一行數(shù)據(jù)在某列上的排序,該列由 order_by_clause
中的列決定。例如,按照 salary 排序可以看出員工的收入排名。
DENSE_RANK 的語義基本和 RANK 函數(shù)相同,但是 RANK 的排序中間會(huì)有‘跳過’,但是 DENSE_RANK 中不會(huì)有。
PERCENT_RANK 的語義基本和 RANK 函數(shù)相同,但是 PERCENT_RANK 排序的結(jié)果是百分比,計(jì)算的是給定行的百分比。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, rank() over(partition by job_id order by salary) rank, dense_rank() over(partition by job_id order by salary) dense_rank, percent_rank() over(partition by job_id order by salary) percent_rank from exployees;
+-----------+------+------------+----------------------------------+
| last_name | rank | dense_rank | percent_rank |
+-----------+------+------------+----------------------------------+
| jim | 1 | 1 | 0.000000000000000000000000000000 |
| tom | 1 | 1 | 0.000000000000000000000000000000 |
| mike | 2 | 2 | 0.500000000000000000000000000000 |
| lily | 3 | 3 | 1.000000000000000000000000000000 |
+-----------+------+------------+----------------------------------+
4 rows in set (0.01 sec)
聲明
CUME_DIST() OVER ([ query_partition_clause ] order_by_clause)
說明
該函數(shù)計(jì)算一個(gè)值的分布,返回值為大于 0 小于等于 1 的值。作為一個(gè)分析函數(shù),CUME_DIST 在升序情況下計(jì)算比當(dāng)前行的特定列小的數(shù)據(jù)的占比。例如如下例子中,按 job_id 分組并在薪水排序的情況下,每行數(shù)據(jù)在窗口內(nèi)的排序列上的占比。
例子
obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)
obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)
obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)
obclient> select last_name, cume_dist() over(partition by job_id order by salary) cume_dist from exployees;
+-----------+----------------------------------+
| last_name | cume_dist |
+-----------+----------------------------------+
| jim | 1.000000000000000000000000000000 |
| tom | 0.333333333333333333333333333333 |
| mike | 0.666666666666666666666666666667 |
| lily | 1.000000000000000000000000000000 |
+-----------+----------------------------------+
4 rows in set (0.01 sec)
Copyright©2021 w3cschool編程獅|閩ICP備15016281號(hào)-3|閩公網(wǎng)安備35020302033924號(hào)
違法和不良信息舉報(bào)電話:173-0602-2364|舉報(bào)郵箱:jubao@eeedong.com
掃描二維碼
下載編程獅App
編程獅公眾號(hào)
聯(lián)系方式:
更多建議: