After settling down into a routine (classes and quizzes), its now time to dig a little deeper into some concepts. Similar to what was dealt with in class, two of the main concepts that I feel need some discussion are keys and dimensions ( having considerably spoken about facts in my previous post ). I will list some of the types of keys and dimensions that we may come across with examples.
1. Junk dimensions: We will frequently encounter business processes that may involve a number of miscellaneous attributes and indicators (yes/no) along with some flags. While we would normally like to create separate dimensions for each of these, it makes it easier to manage if a single dimension is created and all the data is entered into it.
For example in a transaction involving a item delivery, we might have several indicators like packaged, shipped, delivered, received, refunded etc. All of these attributes can be put into a junk dimension.
2. Outrigger dimensions: Very simply put, this is nothing but a dimension which has a reference to another dimension. For example car models dimensions can reference a separate dimension which represents a specific type of car. We should normally restrict the use of these dimensions and both of these should be in the fact table as separate foreign keys.
3. Conformed dimensions: When the attributes in separate dimensions have the same column names and the same meaning. A conformed dimension can refer to multiple tables. Two tables are said to be conformed if they are identical, or if one is the subset of the other. For example, in most organizations, the date dimension is the most common conformed dimension.
4. Shrunken dimensions: Shrunken dimensions are essentially conformed dimensions which contain a subset of rows and/or columns of a base dimension. This normally occurs when two dimensions are at the same level of detail but one contains only a small portion (subset) of the rows of the other table. For example, a month dimension could be a shrunken dimension of a date dimension.
Now having spoken about some of the dimensions that we would encounter, let us discuss some keys.
1. Degenerate dimension keys: We may face circumstances where a dimension key has no dimension table associated with it. This is most commonly seen when a fact tables grain is a single transaction. For example, if we have an invoice with no content associated with it, but we still need to model a particular invoice no as a key for the fact table.
2. Dimension surrogate keys: In essence, surrogate keys are those keys that join the dimension tables with the facts. However, sometimes, a single key can be used by multiple instances of the same entity. Therefore this key cannot be used since, over a period of time, there will be multiple rows in the dimension table that use this single key. Therefore, an integer system is used to keep track of keys, starting from 1, every time a new key is assigned. For example, when we model companies who are trying to integrate information from different sources, or after mergers and acquisitions, we need to use surrogate keys because we might have conflicting keys across dimensions.
3. Natural keys: A natural key is a key that exists in the real world. Where this is different from surrogate keys discussed above is that the surrogate keys have no meaning outside the database environments, while a natural key does. For example, a persons SSN could be a natural key.
4. Durable keys/ Durable Supernatural keys: Time to introduce a new concept (dimension ?) in data warehousing, time. A durable key is one that uniquely identifies a record over time. When we integrate data from multiple sources, the natural key might change. The way to tackle this is having keys in a format that is totally independent of the process model. For example, these could be simple integers whose values begin from 1.
I believe with these concepts we will gain a foothold in the vast data warehousing universe. in the words of Ronald Case, a Nobel Prize Laureate, torture the data, and it will confess to anything.
Reference : Wikipedia
http://www.kimballgroup.com/
1. Junk dimensions: We will frequently encounter business processes that may involve a number of miscellaneous attributes and indicators (yes/no) along with some flags. While we would normally like to create separate dimensions for each of these, it makes it easier to manage if a single dimension is created and all the data is entered into it.
For example in a transaction involving a item delivery, we might have several indicators like packaged, shipped, delivered, received, refunded etc. All of these attributes can be put into a junk dimension.
2. Outrigger dimensions: Very simply put, this is nothing but a dimension which has a reference to another dimension. For example car models dimensions can reference a separate dimension which represents a specific type of car. We should normally restrict the use of these dimensions and both of these should be in the fact table as separate foreign keys.
3. Conformed dimensions: When the attributes in separate dimensions have the same column names and the same meaning. A conformed dimension can refer to multiple tables. Two tables are said to be conformed if they are identical, or if one is the subset of the other. For example, in most organizations, the date dimension is the most common conformed dimension.
4. Shrunken dimensions: Shrunken dimensions are essentially conformed dimensions which contain a subset of rows and/or columns of a base dimension. This normally occurs when two dimensions are at the same level of detail but one contains only a small portion (subset) of the rows of the other table. For example, a month dimension could be a shrunken dimension of a date dimension.
Now having spoken about some of the dimensions that we would encounter, let us discuss some keys.
1. Degenerate dimension keys: We may face circumstances where a dimension key has no dimension table associated with it. This is most commonly seen when a fact tables grain is a single transaction. For example, if we have an invoice with no content associated with it, but we still need to model a particular invoice no as a key for the fact table.
2. Dimension surrogate keys: In essence, surrogate keys are those keys that join the dimension tables with the facts. However, sometimes, a single key can be used by multiple instances of the same entity. Therefore this key cannot be used since, over a period of time, there will be multiple rows in the dimension table that use this single key. Therefore, an integer system is used to keep track of keys, starting from 1, every time a new key is assigned. For example, when we model companies who are trying to integrate information from different sources, or after mergers and acquisitions, we need to use surrogate keys because we might have conflicting keys across dimensions.
3. Natural keys: A natural key is a key that exists in the real world. Where this is different from surrogate keys discussed above is that the surrogate keys have no meaning outside the database environments, while a natural key does. For example, a persons SSN could be a natural key.
4. Durable keys/ Durable Supernatural keys: Time to introduce a new concept (dimension ?) in data warehousing, time. A durable key is one that uniquely identifies a record over time. When we integrate data from multiple sources, the natural key might change. The way to tackle this is having keys in a format that is totally independent of the process model. For example, these could be simple integers whose values begin from 1.
I believe with these concepts we will gain a foothold in the vast data warehousing universe. in the words of Ronald Case, a Nobel Prize Laureate, torture the data, and it will confess to anything.
Reference : Wikipedia
http://www.kimballgroup.com/
No comments:
Post a Comment