1. SSIS is an in-memory pipeline, so ensure that all transformations occur in memory.
The purpose of having Integration Services within SQL Server is to provide a flexible, robust pipeline that can efficiently perform row-by-row calculations and parse data entirely in memory.
While the extract and load phases of the pipeline will touch disk (read and write respectively), the transformation itself should process in memory. If transformations spill to disk (for example with large sort operations), you will see a big performance degradation. Construct your packages to partition and filter data so that all transformations fit in memory.
A great way to check if your packages are staying within memory is to review the SSIS performance counter Buffers spooled, which has an initial value of 0; above 0 is an indication that the engine has started swapping to disk.
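As a rough sketch, you can sample this counter from the command line with typeperf while a package runs (the counter object name varies by version; "SQLServer:SSIS Pipeline 11.0" is assumed here for SQL Server 2012; check Performance Monitor for the exact name on your machine):

    rem Sample "Buffers spooled" every 5 seconds; any value above 0 means the
    rem data flow engine has started swapping buffers to disk
    typeperf "\SQLServer:SSIS Pipeline 11.0\Buffers spooled" -si 5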
2. Plan for capacity by understanding resource utilization.
SQL Server Integration Services is designed to process large
amounts of data row by row in memory with high speed. Because of this, it is
important to understand resource utilization, i.e., the CPU, memory, I/O, and
network utilization of your packages.

CPU Bound

Seek to understand how much CPU is being used by Integration Services and how much CPU is being used overall by SQL Server while Integration Services is running. This latter point is especially important if you have SQL Server and SSIS on the same box, because when the two contend for resources it is SQL Server that will typically win, resulting in disk spilling from Integration Services that slows transformation speed. The perfmon counter of primary interest is Process / % Processor Time (Total); measure it for both sqlservr.exe and dtexec.exe. If SSIS is not able to drive close to 100% CPU load, this may be indicative of contention from other applications, a hardware bottleneck such as slow disk I/O or insufficient memory, or a package design that does not exploit parallelism.

Network Bound

SSIS moves data as fast as your network is able to handle it. Because of this, it is important to understand your network topology and ensure that the path between your source and target has both low latency and high throughput. Network perfmon counters such as Network Interface / Current Bandwidth and Network Interface / Bytes Total/sec can help you tune your topology.

I/O Bound

If you ensure that Integration Services is minimally writing to disk, SSIS will only hit the disk when it reads from the source and writes to the target. But if your I/O is slow, reading and especially writing can create a bottleneck.

Memory Bound

A very important question that you need to answer when using Integration Services is: "How much memory does my package use?" Key counters to watch are Process / Private Bytes and Process / Working Set for the dtexec.exe process, SQL Server: Memory Manager / Total Server Memory, and Memory / Page Reads/sec.
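As a sketch, the counters called out above can be watched together with typeperf during a package run (process instance names drop the .exe extension; the 5-second interval is arbitrary):

    rem CPU for SQL Server and the package host, memory for DTEXEC, and
    rem network traffic across all interfaces, sampled every 5 seconds
    typeperf -si 5 ^
      "\Process(sqlservr)\% Processor Time" ^
      "\Process(DTEXEC)\% Processor Time" ^
      "\Process(DTEXEC)\Private Bytes" ^
      "\Process(DTEXEC)\Working Set" ^
      "\Network Interface(*)\Bytes Total/sec"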
3. Baseline source system extract speed.
Understand your source system and how
fast you extract from it. After all, Integration Services cannot be tuned
beyond the speed of your source – i.e., you cannot transform data faster than
you can read it.
Measure the speed of the source system by creating a very simple package that reads data from your source into a "Row Count" destination.
Execute the package from the command line (DTEXEC) and measure the time it took to complete. Use the Integration Services log output to get an accurate calculation of the time. You want to calculate rows per second:

Rows / sec = Row Count / Time
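For example, a hypothetical baseline run (the package path and row count are made up for illustration):

    rem Run the baseline package with verbose reporting and note the duration
    dtexec /F "C:\SSIS\BaselineSource.dtsx" /Rep V
    rem If the log shows 10,000,000 rows read in 100 seconds:
    rem Rows / sec = 10,000,000 / 100 = 100,000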
Based on this value, you now know the maximum number of rows per second you can read from the source; this is also the ceiling on how fast you can transform your data. To increase this Rows / sec value, you can do the following:
• Improve drivers and driver configurations: Make sure you are using the most up-to-date drivers and driver configurations for your network, data source, and disk I/O. Often the default network drivers on your server are not configured optimally for the network stack, which results in performance degradation under a high volume of throughput requests. Note that on 64-bit systems you may be loading 32-bit drivers at design time; ensure that at run time you are using 64-bit drivers.
• Start multiple connections: To overcome limitations of drivers, you can try starting multiple connections to your data source. As long as the source can handle many concurrent connections, you may see an increase in throughput if you start several extracts at once. If concurrency causes locking or blocking issues, consider partitioning the source and having your packages read from different partitions to more evenly distribute the load (see the sketch after this list).
• Use multiple NIC cards: If the network is your bottleneck and you've already ensured that you're using gigabit network cards and routers, then a potential solution is to use multiple NIC cards per server. Note that you will have to be careful when you configure multiple NIC environments; otherwise you will have network conflicts.
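As a rough sketch of the parallel-extract idea, assuming two partition-specific packages already exist (the package names and paths are hypothetical):

    rem Launch two extract packages in parallel, each reading a different
    rem source partition (package paths are hypothetical)
    start "extract partition 1" dtexec /F "C:\SSIS\Extract_Partition1.dtsx"
    start "extract partition 2" dtexec /F "C:\SSIS\Extract_Partition2.dtsx"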
4. Tune your network.
A key network property is the packet size of your connection. By default this value is set to 4,096 bytes, which means a new network packet must be assembled for every 4 KB of data. As noted in the SqlConnection.PacketSize property documentation in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
If your system is transactional in nature, with many small reads and writes, lowering the value will improve performance.

Since Integration Services is all about moving large amounts of data, you want to minimize the network overhead. This means that a value of 32K (32767 bytes, the maximum) is the fastest option. While it is possible to configure the network packet size at the server level using sp_configure, you should not do this; the database administrator may have reasons to use a different server setting than 32K. Instead, override the server setting in the package's connection manager, for example by adding Packet Size=32767 to the connection string, as sketched below.
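A minimal sketch of that override at execution time, using DTEXEC's /Conn option to supply a connection string with Packet Size set (the server, database, provider, and connection manager names are assumptions):

    rem Override the "SourceDB" connection manager's connection string at run
    rem time, adding Packet Size=32767 (all names here are hypothetical)
    dtexec /F "C:\SSIS\LoadWarehouse.dtsx" ^
      /Conn SourceDB;"Data Source=SRV01;Initial Catalog=Sales;Provider=SQLNCLI11.1;Integrated Security=SSPI;Packet Size=32767;"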
Another network tuning technique is to use network affinity at the operating system level; at high throughputs, this can sometimes improve performance.

For the network itself, you may want to work with your network specialists to enable jumbo frames, increasing the default payload from 1,500 bytes to 9,000 bytes. Enabling jumbo frames further decreases the number of network operations required to move large data sets.
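One way to verify that jumbo frames work end to end is a don't-fragment ping sized just under the 9,000-byte MTU (the host name below is a placeholder):

    rem 8972 bytes of payload + 28 bytes of IP/ICMP headers = one 9000-byte
    rem frame; -f forbids fragmentation, so a failure means some hop on the
    rem path does not support jumbo frames (host name is hypothetical)
    ping -f -l 8972 target-server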
SSIS and Memory
One thing you're bound to run into when working with SSIS is memory issues: buffer overflows, virtual memory running out of space.
Although I'm no expert, I can offer a few tips from issues I've run into.

1) If you're running out of physical memory, increase the page file size (Windows 7):
• Right-click "My Computer" --> "Properties"
• Select the "Advanced System Settings" option on the left-hand side
• On the "Advanced" tab, under the "Performance" group box, click [Settings]
• Click on the "Advanced" tab
• In the "Virtual memory" group box, click [Change...]
• Uncheck the "Automatically manage paging file size for all drives" option, since the automatically managed page file tends to be smaller than you need
• Under "Custom size", enter the initial and maximum size. Windows shows a recommended amount; I'd go 10-50% over it. Note: make sure you have enough disk space for the file.
• Click [Set], then [OK]
• Restart your system

2) SQL Server likes to eat up memory. It's smart to set the maximum amount it can consume, because it will keep taking memory until it reaches its max, and that memory is only released when the service is restarted or the machine is rebooted.
• Open SSMS
• Right-click on [Server Name] --> Properties
• Click on the "Memory" tab
• Set max server memory to something manageable. If you have 8 GB in your system, you don't want max server memory to be 8 GB, since the OS needs memory too, and the resulting paging will cause slowdowns. I set it to 50% of total memory; once SQL Server reaches that allocation, it keeps it whether or not it is actively using all of it.
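If you'd rather script the max server memory change than click through SSMS, here is a minimal sketch with sqlcmd (4096 MB is just an example value; pick roughly half of your RAM):

    rem Cap SQL Server's memory at 4096 MB on the local default instance
    rem ("." and -E assume a local server with Windows authentication)
    sqlcmd -S . -E -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'max server memory (MB)', 4096; RECONFIGURE;"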
Some related questions worth digging into:
• What are the different properties that control the number of rows stored in the data buffers?
• How is BLOB data stored in the data buffer, and what happens if SSIS does not find enough buffers to hold it?
• When and where are data buffers spooled to disk, and how can this be avoided?
• What is parallel processing in SSIS, and how do you control it?