When should you use a
Star and when should you use Snowflake Schema ?
Star Schema : If the performance is the priority Than
go for Star Schema, Since here dimension tables are denormalized.
Snowflake Schema : If memory space is the priority than go for Snowflake Schema,
since here dimension tables are normalized
What are the Advantage
and disadvantage of Snowflake Schema ?
Advantage of Snowflake
Schema
·
The main advantage of
Snowflake Schema is the improvement of query performance due to minimized disk
storage requirements and joining smaller lookup tables.
·
It is easier to
maintain.
·
Increase flexibility.
Disadvantage of
Snowflake Schema
·
The main disadvantage
of the Snowflake Schema is the additional maintenance efforts needed to the
increase number of lookup tables.
·
Makes the queries much
more difficult to create because more tables need to be joined.
What is the difference
between a data source and data source view ?
A Data Source represents a connection to a particular data
source using one of the supplied providers. A Data Source View is an
abstraction layer that you design using a data source. The latter also contains
expressions that create calculated fields
What is the difference
between online and offline mode in SSAS ?
Online
mode means that you are directly connected to a particular SSAS database
instance. Any changes you make are implemented on save. Offline mode means that
you are creating metadata files that can be deployed to the SSAS server when
you want. An important consideration for offline mode is that BIDS includes no
granular conflict resolution
What aggregation types are available for measures in BIDS ?
The
default aggregation is SUM. Other available aggregation types include the
following: MIN, MAX, COUNT, DISTINCT COUNT, AVERAGE OF CHILDREN, FIRST CHILD,
LAST CHILD, FIRST NON EMPTY, LAST NON EMPTY. You can also design custom
aggregations using MDX
What is star schema in
SSAS ?
A star schema is a type of modeling used as a basis for
building an OLAP cube. It consists of at least one fact table and many
dimension tables. The rows in the dimension tables have a 1:M
relationship with the rows in the fact table.
In a
star schema every dimension table is related directly to the fact table
What is Cube ?
A cube is a set of related measures and
dimensions that is used to analyze data. A Cube is the primary storage
mechanism for SSAS data. It consists of at least one fact table and many
dimension tables. The rows in these tables have a key relationship to each
other.
What is fact table ?
A fact table is the center table surrounded by
a dimension tables. All the dimension tables directly or indirectly connected
with the fact table.
A fact table contains foreign keys to relate
the rows in the fact table to the dimension table rows and facts. Facts are key
performance indicators, for example net sales dollar, net sales unit, etc.
Facts are usually numeric and additive.
What is a dimension
table ?
Dimension table consists of one or more source tables. Dimension
tables have three types of columns. The first is the original primary key for
each row. The second is a new primary key (Or surrogate key) for each row. This
key is used to guarantee uniqueness, because the original data could come from
multiple sources. The third is a column for each attribute that describes each
dimensional value.
What is the difference between SSAS 2005 and SSAS2008?
1. In 2005 its not possible to create an empty
cube but in 2008 we can create an empty cube.
2. A new feature in Analysis Services 2008 is the
Attribute Relationships tab in the Dimension Designer . to implement attribute
relationship is complex in ssas 2005
3. we can create ONLY
2000 partitions per Measure Group in ssas 2005 and the same limit of
partitions is removed in ssas 2008
What is datawarehouse in short DWH?
The
datawarehouse is an informational environment that
§ Provides an integrated and total
view of the enterprise
§ Makes the enterprise’s current
and historical information easily available for decision making
§ Makes decision-support
transactions possible without hindering operational systems
§ Renders the organization’s
information consistent
§ Presents a flexible and
interactive source of strategic information
OR
a warehouse is a
§ Subject oriented
§ Integrated
§ Time variant
§ Non volatile for doing decision
support
What is data mart?
A data mart is
a subset of an organizational data store, usually oriented to a specific
purpose or major data subject that may be distributed to support business
needs. Data marts are analytical data stores designed to focus on specific
business functions for a specific community within an organization.
Data marts are
often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the
data warehouse is created from the union of organizational data marts.
They are 3
types of data mart they are
1. Dependent
2. Independent
3. Logical data mart
Have you ever worked on performance tuning, if yes
what are the steps involved in it?
We need to
identify the bottlenecks to tune the performance, to overcome the bottleneck
we need to follow the following.
1. Avoid named queries
2. Unnecessary relationships between
tables
3. Proper attribute relationships to
be given
4. Proper aggregation design
5. Proper partitioning of data
6. Proper dimension usage design
7. Avoid unnecessary many to many
relationships
8. Avoid unnecessary measures
9. Set AttributeHierarchyEnabled =
FALSE to Attributes that is not required
10. Won’t take even single measure
which is not necessary
What is named calculation?
A named
calculation is a SQL expression represented as a calculated column. This
expression appears and behaves as a column in the table. A named calculation
lets you extend the relational schema of existing tables or views in a data
source view without modifying the tables or views in the underlying data
source.
Named
calculation is used to create a new column in the DSV using hard coded values
or by using existing columns or even with both.
What is named query?
Named query in
DSV is similar to View in Database. This is used to create Virtual table in DSV
which will not impact the underlying database. Named query is mainly used to
merge the two or more table in the datasource view or to filter columns of a
table.
What is dimension table?
A dimension
table contains hierarchical data by which you’d like to summarize. A dimension
table contains specific business information, a dimension table that contains
the specific name of each member of the dimension. The name of the dimension
member is called an “attribute”
The key
attribute in the dimension must contain a unique value for each member of the
dimension. This key attribute is called “primary key column”
The primary key
column of each dimension table corresponding to the one of the key column
in any related fact table.
What is fact table?
A fact table
contains the basic information that you wish to summarize. The table that
stores the detailed value for measure is called fact table. In simple and best
we can define as “The table which contains METRICS” that are used to analyse
the business.
It consists of
2 sections
1) Foregine key
to the dimesion
2)
measures/facts(a numerical value that used to monitor business activity)
What is Factless fact table?
This is very
important interview question. The “Factless Fact Table” is a table which is
similar to Fact Table except for having any measure; I mean that this table
just has the links to the dimensions. These tables enable you to track
events; indeed they are for recording events.
Factless fact
tables are used for tracking a process or collecting stats. They are called so
because, the fact table does not have aggregatable numeric values or
information. They are mere key values with reference to the dimensions from
which the stats can be collected
§ What is attribute relationships,
why we need it?
Attribute
relationships are the way of telling the analysis service engine that how the
attributes are related with each other. It will help to relate two or
more attributes to each other.Processing time will be decreased if proper
relationships are given. This increases the Cube Processing performance and MDX
query performance too.
In
Microsoft SQL Server Analysis Services, attributes within a dimension
are always related either directly or indirectly to the key attribute. When you
define a dimension based on a star schema, which is where all dimension
attributes are derived from the same relational table, an attribute
relationship is automatically defined between the key attribute and each
non-key attribute of the dimension. When you define a dimension based on a
snowflake schema, which is where dimension attributes are derived from multiple
related tables, an attribute relationship is automatically defined as follows:
§ Between the key attribute and
each non-key attribute bound to columns in the main dimension table.
§ Between the key attribute and the
attribute bound to the foreign key in the secondary table that links the
underlying dimension tables.
§ Between the attribute bound to
foreign key in the secondary table and each non-key attribute bound to columns
from the secondary table.
§ How many types of attribute
relationships are there?
They are 2
types of attribute relationships they are
1. Rigid
2. Flexible
Rigid: In Rigid relationships
where the relationship between the attributes is fixed, attributes will not
change levels or their respective attribute relationships.
Example: The
time dimension. We know that month “January 2009″ will ONLY belong to Year
“2009″ and it wont be moved to any other year.
Flexible
: In Flexible relationship between the attributes is changed.
Example: An
employee and department. An employee can be in accounts department today but it
is possible that the employee will be in Marketing department tomorrow.
How many types of dimensions are there and what are
they?
They are 3
types of dimensions:
1. confirm dimension
2. junk dimension
3. degenerate attribute
What are confirmed dimensions, junk dimension and
degenerated dimensions?
Confirm
dimension: It is the dimension which is sharable across the multiple
facts or data model. This is also called as Role Playing Dimensions.
junk dimension:
A number of very small dimensions might be lumped (a small irregularly shaped)
together to form a single dimension, a junk dimension – the attributes are not
closely related. Grouping of Random flags and text Attributes in a dimension
and moving them to a separate sub dimension is known as junk dimension.
Degenerated
dimension: In this degenerate dimension contains their values in fact
table and the dimension id not available
in dimension table. Degenerated Dimension is a dimension key
without corresponding dimension.
Example: In the
PointOfSale Transaction Fact table, we have:
Date Key (FK),
Product Key (FK), Store Key (FK), Promotion Key (FP), and POS
Transaction Number
Date Dimension
corresponds to Date Key, Production Dimension corresponds to Production Key. In
a traditional parent-child database, POS Transactional Number would be the
key to the transaction header record that contains all the info valid for the
transaction as a whole, such as the transaction date and
store identifier. But in this dimensional model, we have already
extracted this info into other dimension. Therefore, POS Transaction
Number looks like a dimension key in the fact table but does not have the
corresponding dimension table.
How will you hide an attribute?
We can hide the
attribute by selecting “AttributeHierarchyVisible = False” in properties of the
attribute.
How will you make an attribute not process?
By
selecting “ AttributeHierarchyEnabled = False”, we can make an
attribute not in process.
What is use of IsAggregatable property?
In Analysis
Service we generally see all dimension has All member. This is because of
IsAggregatable property of the attribute. You can set its value to false, so
that it will not show All member. Its default member for that attribute.
If you hide this member than you will have to set other attribute value to
default member else it will pick some value as default and this will create
confusion in browsing data if someone is not known to change in default member.
What are key, name and value columns of an attribute?
Key column of
any attribute: Contains the column or columns that represent the key for the
attribute, which is the column in the underlying relational table in the data
source view to which the attribute is bound. The value of this column for each
member is displayed to users unless a value is specified for the NameColumn
property.
Name
column of an attribute: Identifies the column that provides the name of the
attribute that is displayed to users, instead of the value in the key column
for the attribute. This column is used when the key column value for an
attribute member is cryptic or not otherwise useful to the user, or when the
key column is based on a composite key. The NameColumn property is not used in
parent-child hierarchies; instead, the NameColumn property for child members is
used as the member names in a parent-child hierarchy.
Value columns
of an attribute: Identifies the column that provides the value of the
attribute. If the NameColumn element of the attribute is specified, the same
DataItem values are used as default values for the ValueColumn element. If the
NameColumn element of the attribute is not specified and the KeyColumns
collection of the attribute contains a single KeyColumn element representing a
key column with a string data type, the same DataItem values are used as
default values for the ValueColumn element.
AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated
attribute hierarchy to end users. For example if I set the property value as
“Test” to all the Attributes of a dimension then a folder with the name “Test”
will be created and all the Attributes will be placed into the same.
AttributeHierarchyOptimizedState: Determines the level of
optimization applied to the attribute hierarchy. By default, an attribute
hierarchy is FullyOptimized, which means that Analysis Services builds indexes
for the attribute hierarchy to improve query performance. The other option,
NotOptimized, means that no indexes are built for the attribute hierarchy.
Using NotOptimized is useful if the attribute hierarchy is used for
purposes other than querying, because no additional indexes are built for the
attribute. Other uses for an attribute hierarchy can be helping to order
another attribute.
AttributeHierarchyOrdered: Determines whether the associated
attribute hierarchy is ordered. The default value is True. However, if an
attribute hierarchy will not be used for querying, you can save processing time
by changing the value of this property to False.