Abdulrahman AlQallaf

Decluttering my mind into the web ...









Chapter 3

Descriptive Analytics 2: business intelligence and data warehousing.





1. Defining a Data Warehouse



What is a Data Warehouse (DW)?

  • a repository of current and historical data …
  • … of potential interest to employees throughout the organization …
  • … with the main purpose of supporting decision making.


The historical road leading to the development of a DW:

figure 3.2



Characteristics of DWs:

  • Primary
    • nonvolatile
    • integrated data (providing synergy)
    • time variant (time is the one important dimension that all DWs must support)
  • Secondary / Additional
    • support for web based applications
    • real time support
    • metadata support

Metadata: data about data

  • in terms of usage
    • technical
    • business
  • in terms of pattern
    • syntactic (describing syntax)
    • structural (describing structure)
    • semantic (describing meaning)


Three main types of DWs:

  • Data Mart (DM)
    • focus on a particular subject/department.
    • if it is dependent on the DW, then consistency (with regards to how things are defined) should be provided.
  • Operational Data Store (ODS)
    • used as an interim staging area for a data warehouse.
    • continously updated (similar to a short term memory).
    • consolidates data from multiple sources.
  • Enterprise Data Warehouse (EDW)
    • a large scale DW that is used across the enterprise for decision support.



2. Building a Data Warehouse - Business / High Level Perspective



Business benefits of a DW:

  • reduced infrastructure cost.
  • increased infrastructure standardization.
  • data synergy, consistency, and faster reporting/analytics.
  • data timeliness.
  • relieve processing from production systems.


The TDWI maturity model:

https__dataonwheels_files_wordpress_com_2013_01_image1.png



The two views on how to build a DW:

  1. the EDW approach (top down, Inmon)
  2. the DM approach (bottom up, Kimball)

table 3.3
table 3.4



Issues while building a DW:

  • Main
    • most DW “projects” fail, treat it as a “process” instead.
    • besides, a DW should be in a countinous state of development, this is why using the word “process” is more suitable here (quantity and data sources are increased all the time).
  • Others
    • starting with the wrong sponsorship chain.
    • setting expectations that you cannot meet.
    • engaging in politically naive behavior.
      • don’t say that it will help tem make better decisions (implies that they are currently making bad decisions).
      • instead say that they will get the information they need when they need it to aid them in decision making.
    • loading the DW with data just because it is available.
    • choosing a DW manager who is technology oriented rather than user oriented.
    • focusing only on traditional / internal data, and not supplementing it with non-traditional / external data.
    • no consistency with regards to data defnitions.
    • believing that your problems are over when the data warehouse is up and running.
    • focusing on ad hoc data mining and periodic reporting instead of alerts.

In many organizations, a DW will be successful only if there is strong senior management support for its development and if there is a project champion who is high up in the organization chart.


Managing a DW:

  • need for technical staff that possess a solid buisness understanding.
  • need for employees across the data science pyramid (data analysts as well as data scientists) that possess a solid buisness understanding.
  • need for people to handle security, privacy, and data governance.



3. Building a Data Warehouse - Technical / Medium Level Perspective



Generic DW layout:

figure 3.4

Notes:

  • Normally an ODS layer is added before the EDW.
  • A DW architecture can be broken down into the following:
    • backend / data acquisition.
    • the data warehouse itself.
    • frontend / client software.


Alternative DW architectures and their assessment:

figure 3.8

table 3.1



4. Building a Data Warehouse - ETL Technologies and the Star Schema



Data integration technologies:

  • enterprise application integration (EAI)
  • service oriented architecture (SOA)
  • enterprise information integration (EII)
  • extract, transform, load (ETL)

Important points:

  • When data are managed correctly as an enterprise asset, ETL efforts are significantly reduced, and redundant data are completely eliminated.
  • This leads to huge savings in maintenance and greater effeciency in new development while also improving data quality.


The star schema is designed to provide fast query-reponse time, simplicity, and ease of maintenance for read-only database structures.

figure 3.10



5. Data Lakes



Data Lake (DL)

“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

Important remarks:

  • A DL is not a replacement for a DW, they are complementary.
  • A DL is not DW 2.0.
  • Use the best option for the job at hand.


A comparison between a DW and a DL:

table 3.6





  • web, social media, and big data
  • open source software
  • cloud computing
  • columnar databases
  • real-time DW
  • DW appliances
  • the need for a master data management
  • in-database processing
  • in-memory storage



7. Business performance Management (BPM)



Business performance Management (BPM) refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance.

figure 3.12



Other notable remarks:

  • All measurement is about comparisons, raw numbers are rarely of value.
  • Key Performance Indicators (KPIs) can be of two types:
    • lagging indicators – measure the output of past activity (often financial in nature, such as revenue)
    • leading indicators – measure activities that have a significant impact on the lagging indicators (such as the number of sales leads)



8. Performance Measurement Systems - Balance Scorecards (BSC)



figure 3.13


Causal chain (as mentioned in the book):

learning and growth --> internal business process --> customer outcomes --> financial objective

In BSC, the term “balanced” arises because the combined set of measures is supposed to encompass indicators that are both:

  • financial and nonfinancial
  • leading and lagging
  • internal and external
  • quantitative and qualitative
  • short term and long term



9. Performance Measurement Systems - Six Sigma



The DMAIC performance model:

  1. Define
  2. Measure
  3. Analyze
  4. Improve
  5. Control



10. BSC vs. Six Sigma



table 3.7