1. Optimization process: locate the bottleneck by run time, data volume, and complexity; review the SQL execution plan and judge whether it is reasonable.
Performance monitoring ==> Target selection ==> Performance analysis ==> Process optimization ==> Run tracking (performance monitoring)
Note: each phase produces the required documentation
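For target selection, candidate queries can be pulled from the DBQL query log, assuming query logging is enabled on the system; the view DBC.QryLogV and its columns are standard, while the CPU threshold below is only illustrative:

SELECT UserName ,
       QueryID ,
       StartTime ,
       AMPCPUTime ,
       TotalIOCount ,
       QueryText
FROM   DBC.QryLogV
WHERE  CAST(StartTime AS DATE) = CURRENT_DATE    /* today's queries only */
  AND  AMPCPUTime > 1000                         /* illustrative threshold, tune per system */
ORDER BY AMPCPUTime DESC;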
2. Performance analysis:
• Review PDM
--Table definitions --Choice of PI --Table row counts and space usage
• Review SQL
--Tables joined --Complexity of the processing logic --Overall logic --Redundant processing
• Test run
--Response time
• Review EXPLAIN (see the sketch below)
--Bottleneck location
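Prefixing a statement with EXPLAIN returns the optimizer's plan without executing it; a minimal sketch, reusing the table and column names from the later examples purely as placeholders:

EXPLAIN
SELECT Party_Id ,
       Party_Name
FROM   ${TARGETDB}.DES
WHERE  Party_Id = '12345' ;    /* hypothetical filter value */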
3. Process optimization:
• Understanding of the business rules
--Choose a reasonable data-access path
• PDM design
--Adjust the PDM
• SQL written sub-optimally, ignoring Teradata's mechanisms and characteristics
--Adjust the SQL
• The Teradata optimizer lacks sufficient statistics
--Collect Statistics (see the sketch below)
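A minimal sketch of Collect Statistics, reusing the table and columns of the later examples purely for illustration; in practice the statistics should cover the join, selection, and PI columns the optimizer relies on:

COLLECT STATISTICS ON ${TARGETDB}.DES COLUMN (Party_Id) ;    /* e.g. a join / PI column */
COLLECT STATISTICS ON ${TARGETDB}.DES COLUMN (Party_Name) ;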
4. Multiple Insert/Select --> Multi-Statement Insert/Select
* Parallel inserts into an empty table are not logged in the Transient Journal
* Take full advantage of Teradata's faster inserts into empty tables and of its parallel execution, e.g.:
• Current: INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1 ;
INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC2 ;
INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;
Note: serial execution, multiple transactions
• Optimized: INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1
;INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC2
;INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;
Note: parallel execution, a single transaction
5. Insert/Select with Union/Union all --> Multi-Statement Insert/Select
* UNION has to eliminate duplicate rows; UNION ALL skips the deduplication, but both consume a large amount of Spool space and both require the data to be reorganized
e.g. Current:
INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC1
UNION ALL SELECT … FROM SRC2
UNION ALL SELECT … FROM SRC3 ;
…
Adjusted:
INSERT INTO ${TARGETDB}.DES (Party_Id ,Party_Name ... )
SELECT … FROM SRC1
;INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC2
;INSERT INTO ${TARGETDB}.DES
(Party_Id ,Party_Name ... )
SELECT … FROM SRC3 ;
6. Eliminating duplicate records
* Duplicate records within a single table are removed with the ROW_NUMBER function
* The current approach adds an extra level of subquery for the deduplication
* It adds a large amount of data-redistribution time
Current:
……
INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )
SELECT COALESCE(AA.Party_Id,'-1') ,
COALESCE(TRIM(AA.Party_name),'') ...
FROM
( select party_id, party_name, … ,
ROW_NUMBER() OVER
(PARTITION BY Party_Id ORDER BY Party_Name )
as rownum from ${TEMPDB}.T01_INDIV b1 … ) AA
where AA.rownum = 1 ……
Recommended approach:
INSERT INTO ${TEMPDB}.T01_INDIV …
INSERT INTO ${TEMPDB}.T01_INDIV …
……
INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id ,Party_Name ... )
SELECT party_id, party_name, …
From ${TEMPDB}.T01_INDIV b1
Qualify ROW_NUMBER() OVER
(PARTITION BY Party_Id ORDER BY Party_Name ) = 1
• Use Qualify + the ROW_NUMBER function
• The SQL statement is concise and clear
• Avoids the subquery
EXPLAIN before optimization:
……
4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.
5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 16.01 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of ("ROWNUMBER = 1") into Spool 8 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 8 by row hash. The result spool file will not be cached in memory. The size of Spool 8 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.
7) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 8 (Last Use).
EXPLAIN after optimization:
……
4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs.
5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a condition of ("Field_10 = 1") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute.
6) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 1 (Last Use).