Chapter 3
Descriptive Analytics 2: business intelligence and data warehousing.
1. Defining a Data Warehouse
What is a Data Warehouse (DW)?
- a repository of current and historical data …
- … of potential interest to employees throughout the organization …
- … with the main purpose of supporting decision making.
The historical road leading to the development of a DW:
Characteristics of DWs:
- Primary
- nonvolatile
- integrated data (providing synergy)
- time variant (time is the one important dimension that all DWs must support)
- Secondary / Additional
- support for web based applications
- real time support
- metadata support
Metadata: data about data
- in terms of usage
- in terms of pattern
- syntactic (describing syntax)
- structural (describing structure)
- semantic (describing meaning)
Three main types of DWs:
- Data Mart (DM)
- focus on a particular subject/department.
- if it is dependent on the DW, then consistency (with regards to how things are defined) should be provided.
- Operational Data Store (ODS)
- used as an interim staging area for a data warehouse.
- continously updated (similar to a short term memory).
- consolidates data from multiple sources.
- Enterprise Data Warehouse (EDW)
- a large scale DW that is used across the enterprise for decision support.
2. Building a Data Warehouse - Business / High Level Perspective
Business benefits of a DW:
- reduced infrastructure cost.
- increased infrastructure standardization.
- data synergy, consistency, and faster reporting/analytics.
- data timeliness.
- relieve processing from production systems.
The TDWI maturity model:
The two views on how to build a DW:
- the EDW approach (top down, Inmon)
- the DM approach (bottom up, Kimball)
Issues while building a DW:
- Main
- most DW “projects” fail, treat it as a “process” instead.
- besides, a DW should be in a countinous state of development, this is why using the word “process” is more suitable here (quantity and data sources are increased all the time).
- Others
- starting with the wrong sponsorship chain.
- setting expectations that you cannot meet.
- engaging in politically naive behavior.
- don’t say that it will help tem make better decisions (implies that they are currently making bad decisions).
- instead say that they will get the information they need when they need it to aid them in decision making.
- loading the DW with data just because it is available.
- choosing a DW manager who is technology oriented rather than user oriented.
- focusing only on traditional / internal data, and not supplementing it with non-traditional / external data.
- no consistency with regards to data defnitions.
- believing that your problems are over when the data warehouse is up and running.
- focusing on ad hoc data mining and periodic reporting instead of alerts.
In many organizations, a DW will be successful only if there is strong senior management support for its development and if there is a project champion who is high up in the organization chart.
Managing a DW:
- need for technical staff that possess a solid buisness understanding.
- need for employees across the data science pyramid (data analysts as well as data scientists) that possess a solid buisness understanding.
- need for people to handle security, privacy, and data governance.
3. Building a Data Warehouse - Technical / Medium Level Perspective
Generic DW layout:
Notes:
- Normally an ODS layer is added before the EDW.
- A DW architecture can be broken down into the following:
- backend / data acquisition.
- the data warehouse itself.
- frontend / client software.
Alternative DW architectures and their assessment:
4. Building a Data Warehouse - ETL Technologies and the Star Schema
Data integration technologies:
- enterprise application integration (EAI)
- service oriented architecture (SOA)
- enterprise information integration (EII)
- extract, transform, load (ETL)
Important points:
- When data are managed correctly as an enterprise asset, ETL efforts are significantly reduced, and redundant data are completely eliminated.
- This leads to huge savings in maintenance and greater effeciency in new development while also improving data quality.
The star schema is designed to provide fast query-reponse time, simplicity, and ease of maintenance for read-only database structures.
5. Data Lakes
Data Lake (DL)
“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
Important remarks:
- A DL is not a replacement for a DW, they are complementary.
- A DL is not DW 2.0.
- Use the best option for the job at hand.
A comparison between a DW and a DL:
6. New / Future Technologies and Trends
- web, social media, and big data
- open source software
- cloud computing
- columnar databases
- real-time DW
- DW appliances
- the need for a master data management
- in-database processing
- in-memory storage
Business performance Management (BPM) refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance.
Other notable remarks:
- All measurement is about comparisons, raw numbers are rarely of value.
- Key Performance Indicators (KPIs) can be of two types:
- lagging indicators – measure the output of past activity (often financial in nature, such as revenue)
- leading indicators – measure activities that have a significant impact on the lagging indicators (such as the number of sales leads)
Causal chain (as mentioned in the book):
learning and growth --> internal business process --> customer outcomes --> financial objective
In BSC, the term “balanced” arises because the combined set of measures is supposed to encompass indicators that are both:
- financial and nonfinancial
- leading and lagging
- internal and external
- quantitative and qualitative
- short term and long term
The DMAIC performance model:
- Define
- Measure
- Analyze
- Improve
- Control
10. BSC vs. Six Sigma