Teradata SQL Tuning
Published: 2019-06-24



1. Tuning process: locate bottlenecks by run time, data volume, and complexity; then review the SQL execution plan (EXPLAIN) and judge whether it is reasonable.

Performance monitoring ==> target selection ==> performance analysis ==> process optimization ==> run tracking (back to performance monitoring)

Note: each phase produces its required documentation.

2. Performance analysis:

• Review PDM

-- table definitions -- choice of primary index (PI) -- row counts and space usage

• Review SQL

-- tables joined -- complexity of the processing logic -- overall logic -- redundant processing

• Test runs

-- response time

• Review EXPLAIN

-- bottleneck location (see the EXPLAIN sketch below)
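
To inspect a plan, prefix the query with the EXPLAIN modifier. A minimal sketch, reusing the table and columns from the examples later in this post as placeholders:

EXPLAIN
SELECT Party_Id, Party_Name      -- columns borrowed from the examples below
FROM ${TARGETDB}.DES             -- placeholder target table
WHERE Party_Id = '-1' ;

The output lists each AMP step together with its spool usage, estimated row count, and confidence level ("no confidence", "low confidence", "high confidence"); steps with large spools and "no confidence" estimates are the usual bottleneck suspects.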

3. Process optimization:

• Understanding of the business rules

-- choose a reasonable data access path

• PDM design

-- adjust the PDM

• SQL written suboptimally, ignoring Teradata's mechanisms and characteristics

-- adjust the SQL

• The Teradata optimizer has not been given enough statistics

-- Collect Statistics (a sketch follows this list)
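
A minimal COLLECT STATISTICS sketch, again treating the table and column names from the examples below as placeholders:

COLLECT STATISTICS ON ${TARGETDB}.DES COLUMN (Party_Id) ;  -- column-level statistics
COLLECT STATISTICS ON ${TARGETDB}.DES INDEX (Party_Id) ;   -- index-level statistics

Statistics should be recollected after significant data-volume changes so that the optimizer's row-count estimates keep a usable confidence level.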

 

4. Multiple Insert/Select --> Multi-Statement Insert/Select

* Parallel inserts into an empty table are not logged in the Transient Journal

* Take full advantage of Teradata's fast inserts into empty tables and its parallel execution, e.g.:

• Before:

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1 ;

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC2 ;

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;

Note: executed serially, as multiple transactions.

• After:

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1
;INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC2
;INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;

Note: executed in parallel, as a single transaction.
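
The semicolon placement is what makes the difference: when such a script is run through BTEQ, a semicolon at the start of a line attaches the following statement to the current request, so all three INSERT/SELECTs are parsed, optimized, and committed as one multi-statement request, and only the trailing semicolon on the last line ends it. Because the request is a single transaction into a table that starts out empty, the Transient Journal logging noted above is avoided.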

5. Insert/Select with UNION/UNION ALL --> Multi-Statement Insert/Select

* UNION must eliminate duplicate rows; UNION ALL does not, but both consume a large amount of spool space and force the data to be reorganized

E.g., before:

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1
UNION ALL SELECT … FROM SRC2
UNION ALL SELECT … FROM SRC3 ;

After:

INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1
;INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC2
;INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;

6. Eliminating duplicate records

* Use the ROW_NUMBER function to de-duplicate records within a single table

* Wrapping the ROW_NUMBER call in a subquery adds an extra query layer

* and adds a large amount of data-redistribution time

Before:

……

INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )
SELECT COALESCE(AA.Party_Id,'-1') ,
       COALESCE(TRIM(AA.Party_Name),'') ...
FROM
( SELECT party_id, party_name, … ,
         ROW_NUMBER() OVER (PARTITION BY Party_Id ORDER BY Party_Name ) AS rownum
  FROM ${TEMPDB}.T01_INDIV b1 … ) AA
WHERE AA.rownum = 1 ……

Recommended:

INSERT INTO ${TEMPDB}.T01_INDIV …

INSERT INTO ${TEMPDB}.T01_INDIV …

……

INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )
SELECT party_id, party_name, …
FROM ${TEMPDB}.T01_INDIV b1
QUALIFY ROW_NUMBER() OVER (PARTITION BY Party_Id ORDER BY Party_Name ) = 1 ;

• Uses QUALIFY + the ROW_NUMBER function (see the note below)

• The SQL is concise and clear

• Avoids the subquery
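
QUALIFY is Teradata's filtering clause for ordered analytical (window) functions, playing the same role for them that HAVING plays for aggregates: it is evaluated after the window function, so the rownum = 1 condition can be applied directly without wrapping the query in a derived table.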

EXPLAIN before optimization:

……

4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.

5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 16.01 seconds.

6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of ("ROWNUMBER = 1") into Spool 8 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 8 by row hash. The result spool file will not be cached in memory. The size of Spool 8 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.

7) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 8 (Last Use).

EXPLAIN after optimization:

……

4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.

5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a condition of ("Field_10 = 1") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.

6) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 1 (Last Use).
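
Comparing the two plans: the optimized version applies the row-number filter (shown as "Field_10 = 1") in the same step that reads Spool 3, eliminating the separate all-AMPs RETRIEVE step of the original plan (step 5, estimated at 16.01 seconds) and one full pass over a large spool file that would not have been cached in memory.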

Reposted from: https://my.oschina.net/u/2381604/blog/741652
