Saturday, 16 May 2015

Data Warehouse Concepts - Part 2



Type 3: Very limited history (finite number of changes) is maintained (generally current and

previous). The original record is modified to reflect the change.


Customer Key
Customer ID
Name
State
Prev_State
Last_Eff_Dt
101
2345
John
California
Illinois
01-Jan-2003
In this approach the number of histories that are to be maintained has to be determined at the very

beginning of the warehouse or dimension design. Based on that, if there has to 2 histories/changes allowed (1 current + 2 history) then you would add 2 additional columns (current, prev1, prev2) in the dimension table so that each change is tracked in one column.

Major 4 Steps in Dimensional Design Process:

1.   Select the business process to model - Example for business processes include raw materials purchasing, orders, shipments, invoicing, inventory, and general ledger.

2.   Declare the grain of the business process - Declaring the grain means specifying exactly what an individual fact table row represents. Examples for grain include

a.    An individual boarding pass to get on a flight


b.    A daily snapshot of the inventory levels for each product in a warehouse c.     A monthly snapshot for each bank account
3.   Choose the dimensions that apply to each fact table row - Examples of common dimensions

include date, product, customer, transaction type, and status.


4.   Identify the numeric facts that will populate each fact table row - Examples include quantity ordered, dollar cost amount etc


 




Approaches for loading Dimension and Fact tables:

     Dimensions are loaded following their corresponding SCD type whether the Dimension adopts Type 1 or 2 or 3 strategy.

        •   Fact tables load approach is depicted in the below diagram

                                               





Below picture depicts a relatively complex data warehouse environment (ETL + Reporting):




Topics you may be further interested in:


Types of Dimensions: Conformed Dimension:
A conformed dimension is a dimension that means the same thing with every possible fact table to

which it can be joined.


Generally this means that a conformed dimension is identically the same dimension in each data mart. Examples of obvious conformed dimensions include customer, product, and location, deal
(promotion), and calendar (time).


Date dimension is regarded as mandatory dimension in any business process (fact table/star schema) and it would generally be a conformed dimension shared by several fact tables/star schemas.




 
Degenerated Dimension:


The natural key of the parent (from OLTP source) is generally left over as an orphan in the design process. It also means it is not mandatory that ORDER_ID (or SALE_ID, INVENTORY_ID) which is a primary key in thetransaction table at the source (also called natural key) to load into fact table in warehouse. However, though it is not mandatory, ORDER_ID is also loaded into the fact table along with all other dimension related foreign keys and measures into ORDERS fact table and here we call
ORDER_ID as degenerate (or empty) dimension.



In other simple words, degenerated dimension is a dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table and so are stored directly in fact table.



Here Order Number is neither a dimension attribute (represented by FKs) nor it is a measure

(represented by fact).


Factless Fact Table:


Transactions often are modeled as a fact table containing a series of keys, each representing a participating dimension in the transaction. These fact tables sometimes have no obvious numeric facts associated with them (like quantity sold or ordered, billed amount, payed amount etc...) and hence are called factless fact tables. See the data model below...


The only peculiarity in this example is that we don’t have a numeric fact tied to this registration data. As such, analyses of this data will be based largely on counts (not sum, average, min, max etc...)

Even though this fact table does not contain any measures, it can answer few interesting questions like...

   Which students registered for which courses?


   How many declared engineering majors are taking an out-of-major finance course?


    How many students have registered for a given faculty member’s courses during the last three years?

   How many students have registered for more than one course from a given faculty member?


Few Reporting Terms:


Drill across: It is used to analyse data across multiple hierarchies. In reports you do this by requesting similarly labeled data from two or more fact tables. This almost always involves separate queries which are merged together in a second pass by matching row headers.

Drill down: By drilling down you would go from summary data to detailed data within the same hierarchy. In reports, you do this by adding or replacing a row header to break down the rows of the answer set more finely.

Drill up (or Roll up): By drilling down you would go from detailed data to summary data within the same hierarchy. In reports you do this by removing or replacing a row header in a report to summarize the rows of the answer set.


All the Best ... !!!




1 comment:

  1. I feel this is the best way of providing some useful and more aspects of Informatica in order to make it more usable for the IT world.

    informatica read rest api

    ReplyDelete