Thursday, June 27, 2013

Basics on SSAS DW


When should you use a Star and when should you use Snowflake Schema ?

 

Star Schema :  If the performance is the priority Than go for Star Schema, Since here dimension tables are denormalized.

 

Snowflake Schema : If memory space is the priority than go for Snowflake Schema, since here dimension tables are normalized

What are the Advantage and disadvantage of Snowflake Schema ?

 

Advantage of Snowflake Schema

 

·         The main advantage of Snowflake Schema is the improvement of query performance due to minimized disk storage requirements and joining smaller lookup tables.

·         It is easier to maintain.

·         Increase flexibility.

 

 

Disadvantage  of Snowflake Schema

 

·         The main disadvantage of the Snowflake Schema is the additional maintenance efforts needed to the increase number of lookup tables.

·         Makes the queries much more difficult to create because more tables need to be joined.

 

What is the difference between a data source and data source view ?

 

A Data Source represents a connection to a particular data source using one of the supplied providers. A Data Source View is an abstraction layer that you design using a data source. The latter also contains expressions that create calculated fields

What is the difference between online and offline mode in SSAS ?

Online mode means that you are directly connected to a particular SSAS database instance. Any changes you make are implemented on save. Offline mode means that you are creating metadata files that can be deployed to the SSAS server when you want. An important consideration for offline mode is that BIDS includes no granular conflict resolution

What aggregation types are available for measures in BIDS ?

The default aggregation is SUM. Other available aggregation types include the following: MIN, MAX, COUNT, DISTINCT COUNT, AVERAGE OF CHILDREN, FIRST CHILD, LAST CHILD, FIRST NON EMPTY, LAST NON EMPTY. You can also design custom aggregations using MDX

What is star schema in SSAS ?

A star schema is a type of modeling used as a basis for  building an OLAP cube. It consists of at least one fact table and many dimension tables. The rows in the dimension tables have a 1:M relationship with the rows in the fact table.

In a star schema every dimension table is related directly to the fact table

What is Cube ?

A cube is a set of related measures and dimensions that is used to analyze data. A Cube is the primary storage mechanism for SSAS data. It consists of at least one fact table and many dimension tables. The rows in these tables have a key relationship to each other.

 

What is fact table ?

A fact table is the center table surrounded by a dimension tables. All the dimension tables directly or indirectly connected with the fact table.

A fact table contains foreign keys to relate the rows in the fact table to the dimension table rows and facts. Facts are key performance indicators, for example net sales dollar, net sales unit, etc. Facts are usually numeric and additive.

 

What is a dimension table ?

 

Dimension table consists of one or more source tables. Dimension tables have three types of columns. The first is the original primary key for each row. The second is a new primary key (Or surrogate key) for each row. This key is used to guarantee uniqueness, because the original data could come from multiple sources. The third is a column for each attribute that describes each dimensional value.

What is the difference between SSAS 2005 and SSAS2008?

1.  In 2005 its not possible to create an empty cube but in 2008 we can create an empty cube.

2.  A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer . to implement attribute relationship is complex in ssas 2005

3.  we can create ONLY 2000 partitions per Measure Group in ssas 2005 and the same limit of partitions is removed in ssas 2008

What is datawarehouse in short DWH?

The datawarehouse is an informational environment that

§ Provides an integrated and total view of the enterprise

§ Makes the enterprise’s current and historical information easily available for decision making

§ Makes decision-support transactions possible without hindering operational systems

§ Renders the organization’s information consistent

§ Presents a flexible and interactive source of strategic information

OR a warehouse is a

§ Subject oriented

§ Integrated

§ Time variant

§ Non volatile for doing decision support

What is data mart?

A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.

Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.

They are 3 types of data mart they are

1.   Dependent

2.   Independent

3.   Logical data mart

Have you ever worked on performance tuning, if yes what are the steps involved in it?

We need to identify the bottlenecks to tune the performance, to overcome the bottleneck we need to follow the following.

1.   Avoid named queries

2.   Unnecessary relationships between tables

3.   Proper attribute relationships to be given

4.   Proper aggregation design

5.   Proper partitioning of data

6.   Proper dimension usage design

7.   Avoid unnecessary many to many relationships

8.   Avoid unnecessary measures

9.   Set AttributeHierarchyEnabled = FALSE to Attributes that is not required

10.   Won’t take even single measure which is not necessary

What is named calculation?

A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves as a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.

Named calculation is used to create a new column in the DSV using hard coded values or by using existing columns or even with both.

What is named query?

Named query in DSV is similar to View in Database. This is used to create Virtual table in DSV which will not impact the underlying database. Named query is mainly used to merge the two or more table in the datasource view or to filter columns of a table.

What is dimension table?

A dimension table contains hierarchical data by which you’d like to summarize. A dimension table contains specific business information, a dimension table that contains the specific name of each member of the dimension. The name of the dimension member is called an “attribute”

The key attribute in the dimension must contain a unique value for each member of the dimension. This key attribute is called “primary key column”

The primary key column of each dimension table corresponding to the one of the key column  in any related fact table.

What is fact table?

A fact table contains the basic information that you wish to summarize. The table that stores the detailed value for measure is called fact table. In simple and best we can define as “The table which contains METRICS” that are used to analyse the business.

It consists of 2 sections

1) Foregine key to the dimesion

2) measures/facts(a numerical value that used to monitor business activity)

What is Factless fact table?

This is very important interview question. The “Factless Fact Table” is a table which is similar to Fact Table except for having any measure; I mean that this table just has the links to the dimensions. These tables enable you to track events; indeed they are for recording events.

Factless fact tables are used for tracking a process or collecting stats. They are called so because, the fact table does not have aggregatable numeric values or information. They are mere key values with reference to the dimensions from which the stats can be collected

§ What is attribute relationships, why we need it?

Attribute relationships are the way of telling the analysis service engine that how the attributes are related with each other. It will help to relate two or more  attributes to each other.Processing time will be decreased if proper relationships are given. This increases the Cube Processing performance and MDX query performance too.

In Microsoft SQL Server Analysis Services, attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. When you define a dimension based on a snowflake schema, which is where dimension attributes are derived from multiple related tables, an attribute relationship is automatically defined as follows:

§ Between the key attribute and each non-key attribute bound to columns in the main dimension table.

§ Between the key attribute and the attribute bound to the foreign key in the secondary table that links the underlying dimension tables.

§ Between the attribute bound to foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.

§ How many types of attribute relationships are there?

They are 2 types of attribute relationships they are

1.   Rigid

2.   Flexible

Rigid: In Rigid relationships  where the relationship between the attributes is fixed, attributes will not change levels or their respective attribute relationships.

Example: The time dimension. We know that month “January 2009″ will ONLY belong to Year “2009″ and it wont be moved to any other year.

Flexible :   In Flexible relationship between the attributes is changed.

Example: An employee and department. An employee can be in accounts department today but it is possible that the employee will be in Marketing department tomorrow.

How many types of dimensions are there and what are they?

They are 3 types of dimensions:

1.   confirm dimension

2.   junk dimension

3.   degenerate attribute

What are confirmed dimensions, junk dimension and degenerated dimensions?

Confirm dimension: It is the dimension which is sharable across the multiple facts or data model. This is also called as Role Playing Dimensions.

junk dimension: A number of very small dimensions might be lumped (a small irregularly shaped) together to form a single dimension, a junk dimension – the attributes are not closely related. Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.

Degenerated dimension: In this degenerate dimension contains their values in fact table and the dimension id not available in dimension table. Degenerated Dimension is a dimension key without corresponding dimension.

Example: In the PointOfSale Transaction Fact table, we have:

Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FP), and POS Transaction Number

Date Dimension corresponds to Date Key, Production Dimension corresponds to Production Key. In a traditional parent-child database, POS Transactional Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimension. Therefore, POS Transaction Number looks like a dimension key in the fact table but does not have the corresponding dimension table.

How will you hide an attribute?

We can hide the attribute by selecting “AttributeHierarchyVisible = False” in properties of the attribute.

 How will you make an attribute not process?

By selecting  “ AttributeHierarchyEnabled = False”, we can make an  attribute not in process.

What is use of IsAggregatable property?

In Analysis Service we generally see all dimension has All member. This is because of IsAggregatable property of the attribute. You can set its value to false, so that it will not show All member. Its default member for that attribute. If you hide this member than you will have to set other attribute value to default member else it will pick some value as default and this will create confusion in browsing data if someone is not known to change in default member.

What are key, name and value columns of an attribute?

Key column of any attribute: Contains the column or columns that represent the key for the attribute, which is the column in the underlying relational table in the data source view to which the attribute is bound. The value of this column for each member is displayed to users unless a value is specified for the NameColumn property.

Name  column of an attribute: Identifies the column that provides the name of the attribute that is displayed to users, instead of the value in the key column for the attribute. This column is used when the key column value for an attribute member is cryptic or not otherwise useful to the user, or when the key column is based on a composite key. The NameColumn property is not used in parent-child hierarchies; instead, the NameColumn property for child members is used as the member names in a parent-child hierarchy.

Value columns of an attribute: Identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element of the attribute is not specified and the KeyColumns collection of the attribute contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example if I set the property value as “Test” to all the Attributes of a dimension then a folder with the name “Test” will be created and all the Attributes will be placed into the same.

AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. Using NotOptimized is useful if the attribute hierarchy is used for purposes other than querying, because no additional indexes are built for the attribute. Other uses for an attribute hierarchy can be helping to order another attribute.

AttributeHierarchyOrdered: Determines whether the associated attribute hierarchy is ordered. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.

 

 

Thursday, June 6, 2013

Interview Questions in SSAS


What is Impersonation? What are the different impersonation options available in SSAS?

  • Impersonation allows to assume the identity/security context of the client application. it perform the server side data operations like data access, processing etc. As part of impersonation, the following options are available in SSAS:
    • Use a specific Windows user name and password: This option specify Windows account credentials which will be used by SSAS to perform operations like source data access, processing etc.
    • Use the service account: When this option is selected, SSAS uses the credentials of the service account under which the Analysis Services service is configured/running for source data access, processing etc.
    • Use the credentials of the current user: When this option is set, SSAS uses the credentials of the current user for performing operations like DMX Open Queries, Local cubes etc. This option cannot be used for performing server side operations like source data access, processing etc.
    • Inherit: This option in the SSAS server decide which impersonation mode is suitable for each type of operation. When this option is set, by default SSAS will use the service account for operations like processing and the credentials of the current user for operations like Local cubes, querying the data mining models, etc...

What is a Named Calculation?

  • A Named Calculation is a new column added to a Table in DSV and is based on an expression. This capability allows you to add an extra column into your DSV which is based on one or more columns from underlying data source Table(s)/View(s) combined using an expression without requiring the addition of a physical column in the underlying database Table(s)/View(s).

What is a Named Query?

  • A Named Query is a SQL query/expression in your DSV which acts as a Table. It is used to combine data from one or more Table(s)/View(s) from the underlying data source without requiring any schematic changes to the underlying data source Table(s)/View(s).

What are the pros and cons of using Tables and Named Queries in DSV?

 

Tables
Named Queries
Named Calculations can be added to Tables in DSV.
Named Calculations cannot be added to Named Queries in DSV.
Keys and Relationships are automatically set (by the wizard) based on the Keys and Relationships in the underlying database schema.
Keys and Relationships have to be set explicitly in the DSV.
Only one Table/View from the underlying data source can be referenced in DSV.
More than one Table/View from the underlying data source can be referenced using a SQL Expression in the DSV.
Any filter/limiting conditions cannot be applied on a table in DSV.
Filter/limiting conditions can be applied as part of the SQL expression in the Named Query in the DSV.

 

What is the purpose of setting Logical Keys and Relationships in DSV?

  • By default, the Data Source View Wizard detects the Physical Primary Keys and Relationships between the tables in the underlying source database and applies the same Keys and Relationships in DSV layer. However, Logical Keys and Relationships need to be defined explicitly in the following scenarios:
    • If the DSV table is referring to an underlying database View.
    • If the DSV table is created as a Named Query.
    • If any additional relationships need to be defined in the DSV layer apart from the ones that are physically defined in the underlying source database.

Is it possible to combine data from multiple data sources in SSAS?

  • SSAS allows combining data from multiple underlying data sources into a single DSV. To be able to add Table(s)/View(s) from multiple data sources, first you need to create a DSV using your first source and this source acts as the primary data source. Now after the initial DSV is created, you can add one or more data sources into DSV which will act as secondary data sources and you can choose additional Table(s)/View(s) from the secondary data sources which you want to include in your DSV.
  • The key thing while combining data from multiple data sources is that the Primary Data Source must support OPENROWSET queries. Hence in most cases, SQL Server is used as the Primary Data Source.
 
Q: What are the components of SSAS?
§     An OLAP Engine is used for enabling fast ad hoc  queries by end users. A user can interactively explore data by drilling, slicing or pivoting.
§     Drilling refers to the process of exploring details of the data.
§     Slicing refers to the process of placing data in rows and columns.
§     Pivoting refers to switching categories of data between rows and columns.
§     In OLAP, we will be using what are called as Dimensional Databases.
Q: What is FASMI ?
A database is called a OLAP Database if the database satisfies the FASMI  rules :
§     Fast Analysis– is defined in the OLAP scenario in five seconds or less.
§     Shared – Must support access  to data by many users in  the factors of Sensitivity and Write Backs.
§     Multidimensional – The data inside the OLAP Database must be multidimensional in structure.
§     Information – The OLAP database Must support large volumes of data..
Q: What languages are used in SSAS ?
§     Structured Query Language (SQL)
§     Multidimensional Expressions (MDX) - an industry standard query language orientated towards analysis
§     Data Mining Extensions (DMX) - an industry standard query language oriented toward data mining.
§     Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database objects.
Q: How Cubes are implemented in SSAS ?
§     Cubes are multidimensional models that store data from one or more sources.
§     Cubes can also store aggregations
§     SSAS Cubes are created using the Cube Wizard.
§     We also build Dimensions when creating Cubes.
Cubes can see only the DSV( logical View).
Q: What is the difference between a derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when partition also stores aggregations, the pre calculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated member in a cube what is the use of property
called non-empty behavior?
Nonempty behavior is important property for ratio calculations. If the denominator Is empty, an MDX expression will return an error just as it would if the denominator Were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we are establishing a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then The expression is also treated as empty and no error is returned.
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the different ways of creating Aggregations?   
We can create aggregations for faster MDX statements using Aggregation Wizard or thru UBO – Usage Based Optimizations. Always, prefer UBO method in realtime performance troubleshooting.
Q: What is WriteBack? What are the pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
Q: How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?

A: Simply using bottomcount will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
Q: How in MDX query can I get top 3 sales years based on order quantity?

By default Analysis Services returns members in an order specified during attribute design. Attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Lets say we want to see order counts for each year. In Adventure Works MDX query would be:

SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];

Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
Q: How do you extract first tuple from the set?

Use could usefunction Set.Item(0)
Example:

SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Q: How can I setup default dimension member in Calculation script?

You can use ALTER CUBE statement. Syntax:
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER=''