本文共 8195 字,大约阅读时间需要 27 分钟。
海量数据性能优化的一个基本的原则就是“分区”也有叫“分片”的)。分区思想其实就是日常工作生活中的抽屉原理:我们把自己的物品按照某种逻辑归置到多个小抽屉中,一般会比混在一个大抽屉中好找;但是小抽屉太多了、或者逻辑混乱了,也可能效果适得其反。
Teradata的分区语法较为简洁,其中常用的是按时间分区,如下例只要添加到create table语句末尾就可以实现2013年全年一天一个分区了为了省事,可以一次分5-10年):
PARTITION BY RANGE_N( Rcd_Dt BETWEEN DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY, NO RANGE );
另外一个常用但是不容易掌握)的是按字符串取值分区。在上述按时间分区中我们可以看到RANGE_N关键字。按值分区采用CASE_N关键字,如下例所示:
PARTITION BY CASE_N( (CASE WHEN (my_field='A') THEN (1) ELSE (0) END)=1, (CASE WHEN (my_field='B') THEN (2) ELSE (0) END)=2, (CASE WHEN (my_field='C') THEN (3) ELSE (0) END)=3, NO CASE OR UNKNOWN);
更进一步,其中如下面的语法元素:
my_field='A'
可以修改为类似于这样的形式:
SUBSTR(my_field,1,1) IN ('E','F','G')
在现实中,因为访问数据从全表扫描变成了分区扫描的原因,某些步骤可以达成10-100倍的性能提升。对于复杂的耗时较长的大作业,也总是能够缩短一半以上的运行时间。非常有意思的现象是,即使是经验丰富的开发人员,对数据分区的掌握也不一定很好。数据分区理念是超越具体数据库的,无论是Teradata还是别的什么数据库,在我过去将近十年的职业生涯中,大多数性能问题都可以通过数据分区得以妥善解决。
建立ppi的表为了提高删除效率,我们可以使用分区删除数据,但是需要注意下面几个关于分区的问题:
1、未分区时,所有记录的partition号为0,不能使用分区删除数据; 2、分区中设立no range或no case时,不满足分区条件的记录都会分到no range或no case分区,在添加分区时,如果no range或no case分区中的数据满足新分区的条件,则满足条件的记录被移到新分区中,其余仍然不满足分区条件的数据在no range或no case分区。 3、只能增加现有分区中最小分区以前和最大分区以后的分区,可以不连续,但是不能增加中间跳过的分区: 4、定义了no range或no case分区的情况下,删除分区时分区被删除,分区的数据并未被删除,而是移动到no range或no case分区。 5、当不定义no range或no case分区时,删除分区才是真正的删除掉数据,且只能删除头和尾的分区,中间的分区不允许删除。1、未分区时,所有记录的partition号为0;
例:建立表SCOTT.EMP,不建分区: CREATE MULTISET TABLE SCOTT.EMP ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( EMPNO INTEGER, ENAME VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, JOB VARCHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, MGR INTEGER, HIREDATE DATE FORMAT 'YYYYMMDD', SAL DECIMAL(7,2), COMM DECIMAL(7,2), DEPTNO INTEGER) PRIMARY INDEX ( EMPNO );向表中插入15条数据,结果如下:
select * from scott.emp; 7788 SCOTT ANALYST 7566 1987-04-19 3000.00 ? 20 7782 CLARK MANAGER 7839 1981-06-09 2450.00 ? 10 7521 FAN SALESMAN 7698 1981-02-22 1250.00 500.00 30 7698 BLAKE MANAGER 7839 1981-05-01 2850.00 ? 30 7902 FORD ANALYST 7566 1981-12-03 3000.00 ? 20 7839 KING PRESIDENT ? 1981-11-17 5000.00 ? 10 7900 JAMES CLERK 7698 1981-12-03 950.00 ? 30 7654 MARTIN SALESMAN 7698 1981-09-28 1250.00 1400.00 30 7955 fan CLERK 7782 1987-05-23 100.00 ? ? 7566 JONES MANAGER 7839 1981-04-02 2975.00 ? 20 7934 MILLER CLERK 7782 1982-01-23 1300.00 ? 10 7499 ALLEN SALESMAN 7698 1981-02-20 1600.00 300.00 30 7844 TURNER SALESMAN 7698 1981-09-08 1500.00 .00 30 7369 SMITH CLERK 7902 1980-12-17 800.00 ? 20 7876 ADAMS CLERK 7788 1987-05-23 1100.00 ? 20查看所有记录的分区,因为未分区,所以分区号都为0:
select partition,count(*) from scott.emp group by 1; 0 152、分区中设立no range时,不满足分区条件的记录都会分到no range分区,在添加分区时,如果no range分区中的数据满足新分区的条件,
则满足条件的记录被移到新分区中,其余如果仍然不满足分区条件仍然在no range分区。 例:建立分区表 CREATE MULTISET TABLE SCOTT.EMP_PPI ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( EMPNO INTEGER, ENAME VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, JOB VARCHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, MGR INTEGER, HIREDATE DATE FORMAT 'YYYYMMDD', SAL DECIMAL(7,2), COMM DECIMAL(7,2), DEPTNO INTEGER) PRIMARY INDEX ( EMPNO ) partition by range_N( HIREDATE between date'1980-01-01' and date'1986-12-31' each interval '1' year,no range);插入样本数据:
insert into SCOTT.EMP_PPI select * from scott.emp; 查看每个分区记录数,87年的3条记录不满足条件,存在no range分区中,下例为8号分区: select partition,HIREDATE from scott.emp_ppi order by 1; 1 1980-12-17 2 1981-09-08 2 1981-12-03 2 1981-02-22 2 1981-12-03 2 1981-06-09 2 1981-09-28 2 1981-04-02 2 1981-11-17 2 1981-02-20 2 1981-05-01 3 1982-01-23 8 1987-05-23 8 1987-04-19 8 1987-05-23如果再插入一条1989年的记录,则继续插入no range分区:
insert into scott.emp_ppi values(7132,'ADAMS','CLERK',7788,cast('19890523' as date format 'YYYYMMDD'),1100.00,0,20); select partition,HIREDATE from scott.emp_ppi order by 1; 1 1980-12-17 2 1981-12-03 2 1981-12-03 2 1981-09-08 2 1981-02-22 2 1981-06-09 2 1981-09-28 2 1981-04-02 2 1981-11-17 2 1981-02-20 2 1981-05-01 3 1982-01-23 8 1987-05-23 8 1989-05-23 8 1987-04-19 8 1987-05-23增加1987年分区:
ALTER TABLE SCOTT.EMP_PPI MODIFY PRIMARY INDEX ADD RANGE BETWEEN DATE '1987-01-01' AND DATE '1987-12-31' EACH INTERVAL '1' year; 原来no range分区内的3条记录因为都满足新分区,所以现在8号分区现在为1987年分区: select partition,HIREDATE from scott.emp_ppi order by 1; 1 1980-12-17 2 1981-12-03 2 1981-09-08 2 1981-12-03 2 1981-02-22 2 1981-05-01 2 1981-04-02 2 1981-02-20 2 1981-09-28 2 1981-11-17 2 1981-06-09 3 1982-01-23 8 1987-04-19 8 1987-05-23 8 1987-05-23 9 1989-05-233、只能增加现有分区中最小分区以前和最大分区以后的分区,可以不连续,但是不能增加中间跳过的分区:
如下增加1989年分区: ALTER TABLE SCOTT.EMP_PPI MODIFY PRIMARY INDEX ADD RANGE BETWEEN DATE '1989-01-01' AND DATE '1989-12-31' EACH INTERVAL '1' year; 查看表定义: show table SCOTT.EMP_PPI; CREATE MULTISET TABLE SCOTT.EMP_PPI ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( EMPNO INTEGER, ENAME VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, JOB VARCHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, MGR INTEGER, HIREDATE DATE FORMAT 'YYYYMMDD', SAL DECIMAL(7,2), COMM DECIMAL(7,2), DEPTNO INTEGER) PRIMARY INDEX ( EMPNO ) PARTITION BY RANGE_N(HIREDATE BETWEEN DATE '1980-01-01' AND DATE '1987-12-31' EACH INTERVAL '1' YEAR , DATE '1989-01-01' AND DATE '1989-12-31' EACH INTERVAL '1' YEAR , NO RANGE);不能增加中间跳过的分区,如增加1988分区会报错: ALTER TABLE SCOTT.EMP_PPI MODIFY PRIMARY INDEX add RANGE BETWEEN DATE '1988-01-01' AND DATE '1988-12-31' EACH INTERVAL '1' year; 报错如下: The facility of altering the partition of a non-empty table in this case has not been implemented yet; 增加1990年分区成功: ALTER TABLE SCOTT.EMP_PPI MODIFY PRIMARY INDEX add RANGE BETWEEN DATE '1990-01-01' AND DATE '1990-12-31' EACH INTERVAL '1' year; 查看表定义: show table SCOTT.EMP_PPI; CREATE MULTISET TABLE SCOTT.EMP_PPI ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( EMPNO INTEGER, ENAME VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, JOB VARCHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, MGR INTEGER, HIREDATE DATE FORMAT 'YYYYMMDD', SAL DECIMAL(7,2), COMM DECIMAL(7,2), DEPTNO INTEGER) PRIMARY INDEX ( EMPNO ) PARTITION BY RANGE_N(HIREDATE BETWEEN DATE '1980-01-01' AND DATE '1987-12-31' EACH INTERVAL '1' YEAR , DATE '1989-01-01' AND DATE '1990-12-31' EACH INTERVAL '1' YEAR , NO RANGE);
4、定义了no range分区的情况下,删除分区时分区被删除,分区的数据并未被删除,而是移动到no range分区。
下面删除1980年分区,1980年的一条数据并未删除,而是移到了10号no range分区中: ALTER TABLE SCOTT.EMP_PPI MODIFY PRIMARY INDEX drop RANGE BETWEEN DATE '1980-01-01' AND DATE '1980-12-31' EACH INTERVAL '1' year with delete;select partition,HIREDATE from scott.emp_ppi order by 1;
1 1981-12-03 1 1981-02-22 1 1981-09-08 1 1981-12-03 1 1981-05-01 1 1981-09-28 1 1981-04-02 1 1981-02-20 1 1981-11-17 1 1981-06-09 2 1982-01-23 7 1987-05-23 7 1987-04-19 7 1987-05-23 8 1989-05-23 10 1980-12-175、当不定义no range分区时,删除分区才是真正的删除掉数据,且只能删除头和尾的分区,中间的分区不允许删除。 创建不定义no range分区的表: CREATE MULTISET TABLE SCOTT.EMP_PPI_NORANGE ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( EMPNO INTEGER, ENAME VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, JOB VARCHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, MGR INTEGER, HIREDATE DATE FORMAT 'YYYYMMDD', SAL DECIMAL(7,2), COMM DECIMAL(7,2), DEPTNO INTEGER) PRIMARY INDEX ( EMPNO ) partition by range_N( HIREDATE between date'1980-01-01' and date'1989-12-31' each interval '1' year); 插入样本数据: insert into SCOTT.EMP_PPI_NORANGE select * from scott.emp; select partition,hiredate from scott.EMP_PPI_NORANGE order by 1; 1 1980-12-17 2 1981-12-03 2 1981-12-03 2 1981-02-22 2 1981-09-08 2 1981-05-01 2 1981-04-02 2 1981-02-20 2 1981-09-28 2 1981-11-17 2 1981-06-09 3 1982-01-23 8 1987-05-23 8 1987-04-19 8 1987-05-23
删除1980分区:
ALTER TABLE SCOTT.EMP_PPI_NORANGE MODIFY PRIMARY INDEX drop RANGE BETWEEN DATE '1980-01-01' AND DATE '1980-12-31' EACH INTERVAL '1' year with delete; select partition,hiredate from scott.EMP_PPI_NORANGE order by 1; 1 1981-12-03 1 1981-12-03 1 1981-02-22 1 1981-09-08 1 1981-06-09 1 1981-04-02 1 1981-02-20 1 1981-11-17 1 1981-09-28 1 1981-05-01 2 1982-01-23 7 1987-05-23 7 1987-04-19 7 1987-05-23删除中间分区: ALTER TABLE SCOTT.EMP_PPI_NORANGE MODIFY PRIMARY INDEX drop RANGE BETWEEN DATE '1982-01-01' AND DATE '1982-12-31' EACH INTERVAL '1' year with delete; 报错: The facility of altering the partition of a non-empty table in this case has not been implemented yet;
转载地址:https://blog.csdn.net/thy822/article/details/49448457 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!