Thursday, September 15, 2011

Performance Considerations for SSIS

1.Packet size in connection should be equal to 32767
2.Consider using NOLOCK in source table
3.Select only columns you need in source query
4.Use Shared lookup cache in lookup transfer
5.Consider is the transfermation fall under syn/asynchronous type
6.Data types as narrow as possible forless memory usage
7.Do not perform excessive casting
8.sort:puch to source queries when possible-use sort transform for sorting cross database joins
9.Use merge instead of SCD
10.Use group by instead of aggregation
11.insert into instead of a data flow on a single sql instance
12.unnecessary delta detection vs. reload
13.use sql server destination
14.commit size 0 == fastest
15.drop some indexes based on load growth %
 clustered indexes:+inf%(dont drop)
 single nonclustered index: >=~100%
 multiple nonclusted index: >=~10%(varies)
16.truncate,not delete
17. In “Table or view” access mode, the OLE DB source adapter calls OpenRowset to get column metadata at Validate phase. The data returned from OpenRowset include more than just column metadata. Thus the adapter issues “SET ROWCOUNT 1” statement to retrieve column metadata information. “SET ROWCOUNT 1” causes an inefficient execution plan (i.e. Nested Loop) to be cached and later used in the subsequent Execute phase.
In “SQL command” access mode, the OLE DB source adapter calls “sp_prepare” to get column metadata at Validate phase, and “sp_execute” at Execute phase. The execution plan used at Execute phase is Hash Match which is more efficient than Nested Loop.
18. Avoid many small buffers. Tweak the values for DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records into a buffer as possible. To optimize the Pipeline, the goal is to pass as many records as possible through the fewest number of buffers, especially when dealing with large data volumes.
19. DefaultMaxBufferRows and DefaultMaxBufferSize are configured separately for each Data Flow task. When integrating data from multiple data sources using a single Data Flow task, these two settings will only impact the data source components and transformations in that task.
20. Do not increase buffer size to the point where paging to disk starts to occur.
21. Design the package in such a way that it does a full pull of data only in the beginning or on-demand, next time onward it should do the incremental pull, this will greatly reduce the volume of data load operations, especially when volumes are likely to increase over the lifecycle of an application. For this purpose, use upstream enabled CDC (Change Data Capture) feature of SQL Server 2008; for previous versions of SQL Server incremental pull logic.
22. It is recommended to set two of the data flow task properties viz. BufferTempStoragePath and BLOBTempStoragePath instead of using default values. The location provided should be of some fast drives. It is also recommended that BufferTempStoragePath and BLOBTempStoragePath should point to drives which are on separate spindles in order to maximize I/O throughput

2 comments:

  1. This post is probably where I got the most useful information for my research. Thanks for posting, maybe we can see more on this.
    Are you aware of any other websites on this
    msbi


    ReplyDelete