Thursday, June 27, 2013

Basics on SSAS DW

When should you use a Star and when should you use Snowflake Schema ?


Star Schema :  If the performance is the priority Than go for Star Schema, Since here dimension tables are denormalized.


Snowflake Schema : If memory space is the priority than go for Snowflake Schema, since here dimension tables are normalized

What are the Advantage and disadvantage of Snowflake Schema ?


Advantage of Snowflake Schema


·         The main advantage of Snowflake Schema is the improvement of query performance due to minimized disk storage requirements and joining smaller lookup tables.

·         It is easier to maintain.

·         Increase flexibility.



Disadvantage  of Snowflake Schema


·         The main disadvantage of the Snowflake Schema is the additional maintenance efforts needed to the increase number of lookup tables.

·         Makes the queries much more difficult to create because more tables need to be joined.


What is the difference between a data source and data source view ?


A Data Source represents a connection to a particular data source using one of the supplied providers. A Data Source View is an abstraction layer that you design using a data source. The latter also contains expressions that create calculated fields

What is the difference between online and offline mode in SSAS ?

Online mode means that you are directly connected to a particular SSAS database instance. Any changes you make are implemented on save. Offline mode means that you are creating metadata files that can be deployed to the SSAS server when you want. An important consideration for offline mode is that BIDS includes no granular conflict resolution

What aggregation types are available for measures in BIDS ?

The default aggregation is SUM. Other available aggregation types include the following: MIN, MAX, COUNT, DISTINCT COUNT, AVERAGE OF CHILDREN, FIRST CHILD, LAST CHILD, FIRST NON EMPTY, LAST NON EMPTY. You can also design custom aggregations using MDX

What is star schema in SSAS ?

A star schema is a type of modeling used as a basis for  building an OLAP cube. It consists of at least one fact table and many dimension tables. The rows in the dimension tables have a 1:M relationship with the rows in the fact table.

In a star schema every dimension table is related directly to the fact table

What is Cube ?

A cube is a set of related measures and dimensions that is used to analyze data. A Cube is the primary storage mechanism for SSAS data. It consists of at least one fact table and many dimension tables. The rows in these tables have a key relationship to each other.


What is fact table ?

A fact table is the center table surrounded by a dimension tables. All the dimension tables directly or indirectly connected with the fact table.

A fact table contains foreign keys to relate the rows in the fact table to the dimension table rows and facts. Facts are key performance indicators, for example net sales dollar, net sales unit, etc. Facts are usually numeric and additive.


What is a dimension table ?


Dimension table consists of one or more source tables. Dimension tables have three types of columns. The first is the original primary key for each row. The second is a new primary key (Or surrogate key) for each row. This key is used to guarantee uniqueness, because the original data could come from multiple sources. The third is a column for each attribute that describes each dimensional value.

What is the difference between SSAS 2005 and SSAS2008?

1.  In 2005 its not possible to create an empty cube but in 2008 we can create an empty cube.

2.  A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer . to implement attribute relationship is complex in ssas 2005

3.  we can create ONLY 2000 partitions per Measure Group in ssas 2005 and the same limit of partitions is removed in ssas 2008

What is datawarehouse in short DWH?

The datawarehouse is an informational environment that

§ Provides an integrated and total view of the enterprise

§ Makes the enterprise’s current and historical information easily available for decision making

§ Makes decision-support transactions possible without hindering operational systems

§ Renders the organization’s information consistent

§ Presents a flexible and interactive source of strategic information

OR a warehouse is a

§ Subject oriented

§ Integrated

§ Time variant

§ Non volatile for doing decision support

What is data mart?

A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.

Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.

They are 3 types of data mart they are

1.   Dependent

2.   Independent

3.   Logical data mart

Have you ever worked on performance tuning, if yes what are the steps involved in it?

We need to identify the bottlenecks to tune the performance, to overcome the bottleneck we need to follow the following.

1.   Avoid named queries

2.   Unnecessary relationships between tables

3.   Proper attribute relationships to be given

4.   Proper aggregation design

5.   Proper partitioning of data

6.   Proper dimension usage design

7.   Avoid unnecessary many to many relationships

8.   Avoid unnecessary measures

9.   Set AttributeHierarchyEnabled = FALSE to Attributes that is not required

10.   Won’t take even single measure which is not necessary

What is named calculation?

A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves as a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.

Named calculation is used to create a new column in the DSV using hard coded values or by using existing columns or even with both.

What is named query?

Named query in DSV is similar to View in Database. This is used to create Virtual table in DSV which will not impact the underlying database. Named query is mainly used to merge the two or more table in the datasource view or to filter columns of a table.

What is dimension table?

A dimension table contains hierarchical data by which you’d like to summarize. A dimension table contains specific business information, a dimension table that contains the specific name of each member of the dimension. The name of the dimension member is called an “attribute”

The key attribute in the dimension must contain a unique value for each member of the dimension. This key attribute is called “primary key column”

The primary key column of each dimension table corresponding to the one of the key column  in any related fact table.

What is fact table?

A fact table contains the basic information that you wish to summarize. The table that stores the detailed value for measure is called fact table. In simple and best we can define as “The table which contains METRICS” that are used to analyse the business.

It consists of 2 sections

1) Foregine key to the dimesion

2) measures/facts(a numerical value that used to monitor business activity)

What is Factless fact table?

This is very important interview question. The “Factless Fact Table” is a table which is similar to Fact Table except for having any measure; I mean that this table just has the links to the dimensions. These tables enable you to track events; indeed they are for recording events.

Factless fact tables are used for tracking a process or collecting stats. They are called so because, the fact table does not have aggregatable numeric values or information. They are mere key values with reference to the dimensions from which the stats can be collected

§ What is attribute relationships, why we need it?

Attribute relationships are the way of telling the analysis service engine that how the attributes are related with each other. It will help to relate two or more  attributes to each other.Processing time will be decreased if proper relationships are given. This increases the Cube Processing performance and MDX query performance too.

In Microsoft SQL Server Analysis Services, attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. When you define a dimension based on a snowflake schema, which is where dimension attributes are derived from multiple related tables, an attribute relationship is automatically defined as follows:

§ Between the key attribute and each non-key attribute bound to columns in the main dimension table.

§ Between the key attribute and the attribute bound to the foreign key in the secondary table that links the underlying dimension tables.

§ Between the attribute bound to foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.

§ How many types of attribute relationships are there?

They are 2 types of attribute relationships they are

1.   Rigid

2.   Flexible

Rigid: In Rigid relationships  where the relationship between the attributes is fixed, attributes will not change levels or their respective attribute relationships.

Example: The time dimension. We know that month “January 2009″ will ONLY belong to Year “2009″ and it wont be moved to any other year.

Flexible :   In Flexible relationship between the attributes is changed.

Example: An employee and department. An employee can be in accounts department today but it is possible that the employee will be in Marketing department tomorrow.

How many types of dimensions are there and what are they?

They are 3 types of dimensions:

1.   confirm dimension

2.   junk dimension

3.   degenerate attribute

What are confirmed dimensions, junk dimension and degenerated dimensions?

Confirm dimension: It is the dimension which is sharable across the multiple facts or data model. This is also called as Role Playing Dimensions.

junk dimension: A number of very small dimensions might be lumped (a small irregularly shaped) together to form a single dimension, a junk dimension – the attributes are not closely related. Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.

Degenerated dimension: In this degenerate dimension contains their values in fact table and the dimension id not available in dimension table. Degenerated Dimension is a dimension key without corresponding dimension.

Example: In the PointOfSale Transaction Fact table, we have:

Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FP), and POS Transaction Number

Date Dimension corresponds to Date Key, Production Dimension corresponds to Production Key. In a traditional parent-child database, POS Transactional Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimension. Therefore, POS Transaction Number looks like a dimension key in the fact table but does not have the corresponding dimension table.

How will you hide an attribute?

We can hide the attribute by selecting “AttributeHierarchyVisible = False” in properties of the attribute.

 How will you make an attribute not process?

By selecting  “ AttributeHierarchyEnabled = False”, we can make an  attribute not in process.

What is use of IsAggregatable property?

In Analysis Service we generally see all dimension has All member. This is because of IsAggregatable property of the attribute. You can set its value to false, so that it will not show All member. Its default member for that attribute. If you hide this member than you will have to set other attribute value to default member else it will pick some value as default and this will create confusion in browsing data if someone is not known to change in default member.

What are key, name and value columns of an attribute?

Key column of any attribute: Contains the column or columns that represent the key for the attribute, which is the column in the underlying relational table in the data source view to which the attribute is bound. The value of this column for each member is displayed to users unless a value is specified for the NameColumn property.

Name  column of an attribute: Identifies the column that provides the name of the attribute that is displayed to users, instead of the value in the key column for the attribute. This column is used when the key column value for an attribute member is cryptic or not otherwise useful to the user, or when the key column is based on a composite key. The NameColumn property is not used in parent-child hierarchies; instead, the NameColumn property for child members is used as the member names in a parent-child hierarchy.

Value columns of an attribute: Identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element of the attribute is not specified and the KeyColumns collection of the attribute contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example if I set the property value as “Test” to all the Attributes of a dimension then a folder with the name “Test” will be created and all the Attributes will be placed into the same.

AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. Using NotOptimized is useful if the attribute hierarchy is used for purposes other than querying, because no additional indexes are built for the attribute. Other uses for an attribute hierarchy can be helping to order another attribute.

AttributeHierarchyOrdered: Determines whether the associated attribute hierarchy is ordered. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.



No comments:

Post a Comment