Skip to content

Data Warehouse

Data Warehouse

It is a system (including processes, technologies & data representations) that enables us to support analytical processes.

  • It represents an abstracted picture of the business organized by subject area.
  • It is highly transformed and structured.
  • Data is not loaded to the data warehouse until the use for it has been defined.
  • It generally follows an established methodology.

When a data warehouse is developed, a significant amount of effort occurs during the initial stages to analyze data sources and understand business processes. Decisions are made regarding what data to include and exclude from the warehouse.

A Business Perspective

You are in charge of a retailer’s data infrastructure. Let’s look at some business activities.

  • Customers should be able to find goods & make orders
  • Inventory Staff should be able to stock, retrieve, and re-order goods
  • Delivery Staff should be able to pick up & deliver goods
  • HR should be able to assess the performance of sales staff
  • Marketing should be able to see the effect of different sales channels
  • Management should be able to monitor sales growth

Ask yourself:

  • Can I build a database to support these activities?
  • Are all of the above questions of the same nature?

Let's take a closer look at details that may affect your data infrastructure.

  • Retailer has a nation-wide presence → Scale?
  • Acquired smaller retailers, brick & mortar shops, online store → Single database? Complexity?
  • Has support call center & social media accounts → Tabular data?
  • Customers, Inventory Staff and Delivery staff expect the system to be fast & stable → Performance
  • HR, Marketing & Sales Reports want a lot information but have not decided yet on everything they need → Clear Requirements?

A Technical Perspective

A data warehouse is a copy of transaction data specifically structured for query and analysis.

A data warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions.

A data warehouse is a system that retrieves and consolidates data periodically from source systems into a dimensional or normalized data store. It usually keeps years of history and is queried for business intelligence or other analytical activities. It is typically updated in batches, not every time a transaction happens in the source system.

Data Warehouse Goals

  • Simple to understand
  • Performant
  • Quality Assured
  • Handles new questions well
  • Secure

A data warehouse only includes data that is processed (structured) and only the data that is necessary to use for reporting or to answer specific business questions.

Facts & Dimenstions

If you are unsure if a column is a fact or dimension, the simplest rule is that a fact is usually: Numeric & Additive

Fact tables:

  • Record business events, like an order, a phone call, a book review
  • Fact tables columns record events recorded in quantifiable metrics like quantity of an item, duration of a call, a book rating.

Dimension tables:

  • Record the context of the business events, e.g: who, what, where, why etc.
  • Dimension tables columns contain attributes like the store at which an item is purchased, or the customer who make the call, etc.