Friday, September 15, 2006

The Kimball Method

A dimensional (denormalized) data warehouse is the product of a series of translations.

1. Data Sources are analyzed.
2. A normalized model called the Operational Data Store is created to represent the enterprise.
3. A denormalized Dimensional Data Store is created based on the normalized Operational Data Store.
4. ETL routines are written to move data from the Source Systems to the Operational Data Store
5. ETL routines are written to move data from the Operational Data Store to the Dimensional Data Store
6. Data is then extracted transformed and loaded from the Data Sources into the Operational Data Store
7. Data is then extracted transformed and loaded from the Operational Data Store into the Dimensional Data Store
8. Analytical tools are used against the Dimensional Data Store for reporting
9. The two step ETL process (6, 7 & 8) is repeated on a schedule as frequently as every day

This is the Kimball method.

Similar processes are used to create subsets that function as Data Marts.

I can't comment on Inmon's strategy except that he is in favor of normalized models (Operational Data Stores or even properly normalized Source Systems) being used without modification as processing power increases and makes the performance barriers of multiple joins go down. Kimball's rebuttal is that a denormalized dimensional model is easier to grasp by the end user than a complex normalized data model.
__________________

Wednesday, September 06, 2006

Borland Together

Monday, September 04, 2006

Logic's Role in Analysis

1. Deductive: What applies to all cases applies to all observed cases. This is possibility. Philosophy.
2. Inductive: What applies to all observed cases applies to all cases. This is probability. Science.
3. Didactic: What applies to one case applies to all observed cases. This is morality. Fable.
4. Anecdotal: What applies to one observed case applies to all cases. This is biography. History.

All of these forms of logic come into play when you are designing a database. Preference has to be given to inductive logic because it is using repeated, measured experience to guide design.

Deductive reasoning often happens when the governing user dictates the business rules without consulting with the frontline users.

Didactic reasoning is encountered when scenarios are invented to deal with edge cases that have never happened.

Anecdotal reasoning is encountered when scenarios that only occurred once are factored into the design.

Deductive, Didactic and Anecdotal reasoning bogs down an otherwise clean analysis and design. Recognize them for what they are and work them out during analysis if possible and definitely during design.
A categorization of data models

Note that all of the following are Data Models.
  1. An Entity Relationship Diagram (ERD) only shows entities and relationships between entities. It is the simplest logical model.
  2. A Key-Based Model shows the entities, their primary keys, alternate keys, foreign keys and the relationships between entities. It is the intermediate logical model.
  3. A Fully Attributed Model shows entities, keys, non-key attributes and the relationships between entities. It is the final logical model.
  4. A Physical Model converts entities to tables, attributes to columns and applies the constraints of the host RDBMS to the Logical Model.
  5. The Database Schema is the implementation of the Physical Model on the host RDBMS.