Tuesday, January 21, 2014

Scaling the heights of BI, one grain at a time.

My first step into the world of Business Intelligence, begins with the grain. No amount of emphasis can be made on the importance of this aspect. The Grain, (The Holy Grain as I would like to call it), is the single basic definition of the rows in a fact table, and by extension, the dimensions and the entire star schema.

For example when considering a purchase made at a vendors store, a decision needs to be made, whether the grain, and by extension, the fact table, needs to model the transaction, or the individual purchase of the item. Now, if we were to go with modelling the transaction as a whole, our fact table entries would contain all the specifics regarding the particular transaction including the bill amount, date, no of items purchased etc. On the other hand, if we were to model the individual purchase, we have on our hands, a whole bunch of fact tables which would need to be consolidated for every individual item that is purchased.

The above example is clear proof of the fact that, the choice of the grain can affect, not only how we design the fact tables and the dimensions, but also can alter the entire definition of the system.

Now that we have the grain and its atomicity established, let us discuss facts and dimensions. On a high level, a fact is nothing but real world data that has been recorded, in a business transaction or a business process. Each and every row in the fact table corresponds to the data that can be measured and related to that process. Generally, the data in the fact table is used to make calculations and manipulations in order to generate reports.

A dimension can be thought of as an extension to the fact table in that they contain all the information corresponding to and which describes the particular fact that is related to it. A very common dimension in most of the star schemas is the date.

Now that the concepts have been drilled into my head, I feel confident enough to tackle all the other challenges thrown at me. Now that the first steps have been taken, there is no turning back.


References : The Data Warehouse Toolkit 3rd Edition
                    Wikipedia
                    Class Slides