- Operations – To identify potential tuning opportunities, first break down your integration solution into specific tasks or units of work and then fully define all data operations that are part of each task. Some of these operations may be easily identifiable, but some of the best performance tuning opportunities may be less obvious.
- Volume – With the operations fully defined, identify the anticipated data volumes at each stage of the process and think about how the volumes impact the performance of data operations. This is also a great opportunity to consider how data volumes are expected to change over time.
- Application – Given the operations and volume specifications, select the SQL Server application and/or technology that best fits the job. There is always more than one way to accomplish a task; the key is to identify which application provides you the right amount of functionality with the least amount of overhead.
- A single data source that is a file.
- A single destination that is SQL Server
- No data transformation requirements such as direct load from source to destination.
- No workflow management. Note that workflow management is not really applicable to BULK INSERT functionality, because its sole job is to load data into SQL Server.
- Multiple heterogeneous data sources and destinations.
- Data transformation requirements: aggregations, lookups, and new columns.
- Incorporation of other management tasks such as e-mail and File Transfer Protocol (FTP).
- Workflow management to control the order of many tasks and transformations
- Location – The final factor is location, which refers to the run-time environment for the data integration operations. Location is last because the parameters for the other performance factors will influence how you optimally configure the run-time environment.
The run-time engine
The data flow engine
Influencing buffer sizing
Other Performance considerations
18. Avoid many small buffers. Tweak the values for DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records into a buffer as possible. To optimize the Pipeline, the goal is to pass as many records as possible through the fewest number of buffers, especially when dealing with large data volumes.
19. DefaultMaxBufferRows and DefaultMaxBufferSize are configured separately for each Data Flow task. When integrating data from multiple data sources using a single Data Flow task, these two settings will only impact the data source components and transformations in that task.
20. Do not increase buffer size to the point where paging to disk starts to occur.
21. Design the package in such a way that it does a full pull of data only in the beginning or on-demand, next time onward it should do the incremental pull, this will greatly reduce the volume of data load operations, especially when volumes are likely to increase over the lifecycle of an application. For this purpose, use upstream enabled CDC (Change Data Capture) feature of SQL Server 2008; for previous versions of SQL Server incremental pull logic.
22. It is recommended to set two of the data flow task properties viz. BufferTempStoragePath and BLOBTempStoragePath instead of using default values. The location provided should be of some fast drives. It is also recommended that BufferTempStoragePath and BLOBTempStoragePath should point to drives which are on separate spindles in order to maximize I/O throughput