Comprehensive datawarehouse project report for Horizon Telecom

Data Warehouse Report for Horizon
Contents Data WareHouse Report for Horizon Introduction Need for data Warehouse Business Objectives Architectural components of the Data warehouse Source Data Data Staging Data Storage Information Delivery Metadata Dimensional Model Business Processes: Facts and Dimensions: Dimensional Model: Customer Churn Analysis: Rate Plan Analysis: Sales Representative Analysis: Channel Analysis: OLAP Reports Rate Plan Analysis Sales Rep Analysis Channel Performance Analysis Customer Churn analysis Appendices Infrastructure Meta Data Size of Data warehouse Peer Review Forms References 
 

Introduction

 

Need for data Warehouse

Horizon, a leading telecommunications giant was looking for ways to improve their data storage to answers myriad of business questions that can help improve upon their businesses. They decided to implement a data warehouse to store millions of records collected every day to analyze their billing revenue, rate plans, sales representative’s performance, etc. A data warehouse would open new realms of information from the peta bytes of data stored daily. Horizon approached Analytrix, a leading name in the field of data science to implement a data warehouse in multiple phases to answer a set of business questions.
 
Every day Horizon generates millions of records from each service line. A customer may have one or more service lines and hence, his bill may include multiple service lines’ details. Each service line incorporates multitude of details like customer personal details, demographics, rate plan details, taxes, number of minutes used, monthly data usage, etc. All this data can be used for Business Intelligence. It can be stored at a central location to support analysis functions. It can be used to make informed decisions by quickly accessing large amount from central data source. Data from multiple sources is cleaned and transformed into a common data set to store data in a standard format in a warehouse. It also stores large amount of historical data which makes it easier to study trends and patterns of customer usage, rate plan analysis, sales representative performance analysis, etc. It is very easy to create reports for answering business questions. Users can access data through GUI and create custom reports. Since Time is an important dimension in every data warehouse, users can drill down or drill up into details. They can slice and dice the data to look for more meaningful data in the context of the reports. All these reasons make data warehouse an ideal investment for Horizon.
 

Business Objectives

 
Horizon wants to implement a data warehouse in several iterations, with the first phase being that of its billing data. This data is very rich and diverse. The business wants Analytrix to review the data and on the higher level perform the following analysis,
  1. Revenue generated
  2. Monthly call and data usage
  3. Rate Plan Analysis
  4. Sales Representative Analysis
  5. Channel Performance Analysis
  6. Customer retention and churn
Within each analysis, the business wants Analytrix to analyze several trends hidden in the data and answer the following business questions,
  1. Rate Plan Analysis
Horizon has had 5 rate plans called Loved Ones, Night Calling, International, Family Plan, and Gossip Plan. For these plans the BI questions are,
  1. What is the trend in call/data usage from different rate plans in the past 4 years?
  2. What is the increase/decrease in revenue generated from different rate plans?
  3. Usage pattern based on marital status?
  4. Usage pattern based on network called?
  5. Average call duration per hour for different networks called?
 
2. Sales Representative Analysis
3. Channel analysis
4. Customer Churn analysis
In addition to these requirements, we went a step ahead and performed Customer Churn Analysis. The competition in the telecommunications market is so fierce today that customers have the privilege of choosing the best fit for them and enjoying a smooth transition to another service provider. It is becoming increasingly difficult to retain customers. With this in mind, Analytix sought out to gain an understanding on customer usage pattern of those that have changed their service providers from Horizon to a new one.
 

Architectural components of the Data warehouse

 
The architecture which we have implemented for our data warehouse is a centralized data warehouse. The diagram below gives the summary of the architecture and then the various building blocks of the warehouse architecture are explained.
 
 
The major building blocks of our data warehouse are listed below:-
  • Source Data
  • Data staging
  • Data storage
  • Information delivery
  • Metadata
 

Source Data

Source data component deals with the different data sources from which data is collected and integrated to be loaded into the data warehouse. We had the following external data sources:-
 
  • Mockaroo: Mockaroo is online realistic data generator. Most of the data for the dimensions are generated from this tool and saved either in a excel file or as a SQL script.
 
 
  • Adventure Works dataset: The data for the ‘date’, ‘customer’ and ‘geography’ is borrowed from the Adventure Works dataset. The data is in available in excel sheets.
  • BigML website: The customer churn data was collected from BigML site which has a repository for large data sets. The data was made available in .XLS format.
 
 

Data Staging

 
Data staging is the component where data from different sources are integrated. Certain data transformations have to be performed to make sure the data types are same across multiple tables that are integrated.
For the project we had 4 XLS files which had to be cleaned and transformed to be loaded into the data warehouse. SSIS tool of the SQL server 2012 was used to perform the extraction, transformation and loading.
 
The SSIS packages are shown in the screenshots below.
 
 
For all the .XLS files, the data is stored in string format. Before loading the data into the warehouse, data conversions had to be done. An example of conversions done for the customer .XLS file is shown in the screenshot below.
 
 

Data Storage

 
The architecture that we have followed is the centralized data-warehouse architecture. Hence we have one centralized data warehouse and reports and OLAP cubes are generated from the warehouse.
 
We created a data warehouse called “Horizon_Telecom_Database” which resides on the server “infodata.tamu.edu”.
 
It has the following dimensions and fact tables. The dimensional model is explained in the next section.
 
 

Information Delivery

 
This component of the data warehouse deals with how the information or strategic data is disseminated amongst the management or employees needing the information. There can be multiple ways such as OLAP cubes, Queries, Online portal, reports etc.
 
For our vendor we have the following ways of disseminating strategic information.
 
  • OLAP cubes: An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications. An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes [3].
 
Cubes offers a holistic view of the strategic data. The measures can be viewed and analyzed across various dimensions. A glimpse of one of the various cubes that have been generated for our project is given below:
 
  • Excel Reports/dashboards: These are reports that are made in excel via the Cubes. Power pivot is used to display data and dashboards are built.
 
 

Metadata

 
This component contains information about the datatypes, tables, facts, dimensions, various conversions that have been done, the movement files etc. For more information please refer the APPENDIXB.
 
 

Dimensional Model

 

Business Processes:

The business processes and respective Fact Tables for Horizon Telecom are as follows:
Business Process
Fact Tables
Churn Analysis
CustomerChurnPrediction, Churn
Rate Plan Analysis
RatePlanAnalysis
Sales Representative Analysis
SalesRepAnalysis
Channel Analysis
ChannelCampaign
 
Each fact table measures different metrics. We will discuss each fact table and their measure in detail in later sections.

Facts and Dimensions:

The following table lists fact tables and dimensions it is associated with.
Fact Table
Dimensions
Churn
Network, Customer, RatePlan, Date
CustomerChurnPrediction
Network, Customer, RatePlan, Date
RatePlanAnalysis
Network, Customer, RatePlan, Date
SalesRepAnalysis
SalesRepresentative, Channel Customer, RatePlan, Date
ChannelCampaign
Campaign, Channel, Date
 

Dimensional Model:

The diagram shows complete multidimensional model for Horizon including all Fact tables and relevant Dimensions.
 
From the above diagram, we can see that each fact table is associated with different dimensions.
Now that we have an idea of overall dimensional model, let’s discuss each process and their measures in detail:

Customer Churn Analysis:

The following diagram shows Fact Churn and associated Dimensions.
 
Measures

Rate Plan Analysis:

The following diagram shows Fact RatePlanAnalysis and associated Dimensions.
 
 
 
Calculated Measures
Revenue – A fact measure calculated from call duration and costPerMinute of the Rate Plan Dimension
 

Sales Representative Analysis:

The following diagram shows Fact SalesRepAnalysis and associated Dimensions.
 
 

Channel Analysis:

The following diagram shows Fact ChannelCampaign and associated Dimensions.
 
 
 
 

OLAP Reports

The BI questions can best be visualized using Pivot Charts. They help identify the patterns and trends easily with the help of visual aids. The reports are followed by analysis gathered from studying the trend displayed in the report.
 

Rate Plan Analysis

  1. What is the trend in call/data usage from different rate plans in the past 4 years?
 
The report studies the data usage and call durations of two rate plans called Loved Ones and Night Calling over a period of 2001 – 2004. The bars shown in the graph below indicate data usage pattern. Although it shows a steady increase in the usage from 2001-2002 but since 2002, it has been on a decline. Usage dropped heavily from 2003 to 2004. Similarly for the call duration patterns, the Night Calling has shown a steep decline from 2002-2004 while the Loved Ones plan was stable after a great increase from 2001-2003. 2003 onwards it has reduced tremendously. The same analysis can be easily used to gather data about all the rate plans.
 
 
b. What is the increase/decrease in revenue generated from different rate plans?
The doughnut pivot chart allows to showcase the revenue generated from each plan over a period of four years from 2001 to 2004. The chart is self-explanatory. It displays the percentage of profit generated from each rate plan for the innermost year, 2001 going out progressively.
 
c. Usage pattern based on marital status?
The gender based usage patterns make for an interesting study. It is very useful for designing new promotional campaigns, customer retention policies, and designing new rate plans. The below reports studies the international call duration pattern of single individuals to specific networks.
 
d. Average call duration per hour for different networks called?
The below report displays a trend of the number of calls made per hour based per network called. It can be seen that the most number of calls per hour are made to the rival, AT&T network. Horizon network is the second most called network. This could lead us to many potential suggestions for improvement of calls per hour to our own network.
 
 

Sales Rep Analysis

a. How does percentage of complaints resolved compare with complaints/sales rep globally during 2001-2004?
The below report displays the number of complaints resolved in a graph along with the number of complaints received during the period of 2001 – 2004 from France. It can be drilled up to obtain a global picture.
 
b. What is the increase in number of lines sold in 2003 for different rate plans over the previous year?
The report talks compares the sales of the year 2003 to that of 2001 and 2002. It helps the business analyze the steps it had taken during 2003. This report helps them study the effects of those measures and whether they should continue them in the future or not.
 
c. Compare the performance using revenue generated per sales representative for France region during 2002-2004.
The business is interested in knowing about the revenue generated by sales representatives based on gender in France. They have a new plan coming up for the France region and want to analyze the sales by various sales representatives based on gender.
 
d. Compare the attrition rate among male/female sales representatives for France during 2004
As the business has new plans for France, they decided to understand the attrition rate of their sales representatives based on gender. This will help them in understanding their employees and employ those with lower attrition rate in the future.
 

Channel Performance Analysis

Channel refers to the various methods by which a company reached to their potential customers. Here we are considering two types of channels:
  1. Online channels : includes web and mobile channels
  2. Offline channels : includes onshore and off-store channels
 
Channel analysis gives a great insight into how the promotional money is being spent on the different channels and how it is performing. This data can provide an overview to the management as to where to spend more money to generate better revenues for the company. In addition to the efficient channel optimization we also wanted to know where and why we are leaking revenues. Our task was to analyze the possibility of generating new business stream by analyzing channels performance data. In this analysis we would address such questions. Business Intelligence questions that we asked are listed below.
a. What is the relation between cost of promotional activities and revenue?
The revenue and cost has been consistent till year 2003 and sees a sudden shift from 2003-2004 for offline promotional activities
 
b. Where are the regions of revenue leakage and why
For this we have done profit analysis across years. As evident from first point we are losing profit from 2003-2004 period. There could be various reasons for this revenue leakage so we needed to further drill down to find out the precise reasons for such an occurrence.
 
We drilled down and found that since we are getting less leads for period 2003-2004 compared to 2000-2003 thus our profits are down.
 
Further analysis suggests that In France we have lowest number of leads and customers. Thus this analysis gives us a geographic location to target for promotional activities.
 
c. Recommendations for future marketing for effective targeting
In order to understand what could be an effective future marketing strategy we focused on the results from above analysis. We closely analyzed consumer patterns in France and came to a conclusion that Male users were using the plans lesser compared to the Female users. This gives a clear idea that we should target to French Male user to gain more market share in our future marketing campaigns.
 

Customer Churn analysis

Churn is a term used for customers that have changed a network. They were once loyal customers of Horizon but due to certain circumstances, they have switched over to another network. Customer churn analysis provides a view into how many customers have churned across the dimensions of time, geography, rate plan etc. The various OLAP reports for customer churn are given below:
  1. Customer churn for the different rate plans across the different locations where Horizon is operational?
 
Using the following OLAP report, the management can view how many customers have churned in the various regions where Horizon is operational and across rate plans. We have implemented the customer location hierarchy and hence the management can drill down the locations. They can view data for a particular country, state or city.
They can also filter data based on various rate plans.
 
b. Customer churn trend for different genders
Using this report, the management can view the trend of customer churn during a period of time across the genders. The report shows an increase or decrease in male and female churners. This is very important to consider when building a marketing campaign. As the report shows, male churners are increasing, hence the management should take drill down to find the reasons for an increase in the male churners.
 
c. Relation between the number of services calls made by customers and customer churn – (can include other measures as well)
 
This report can be used to find correlation between the number of customer services calls made by the customer and the churn. Usually it can be assumed that if the number of customer service calls go up, then the customers are unhappy. Hence it could be said that they are directly proportional. But as it can be seen from the graph, even though males have less number of customer service calls, the churn is more.  
 
d. Customer churn across different customer occupations
 
This report provides an insight into the churn that is happening across the different professions of the customers. From the report it can be clearly seen that the “professionals” and the “skilled manual” customers are switching over to a new network.
 

Appendices

Infrastructure

Data warehouse infrastructure includes all the foundational elements that enable the architecture to be implemented. The infrastructure includes several elements such as server hardware, operating system, network software, database software, the LAN and WAN, vendor tools for every architectural component, people, procedures, and training. [4]
The project infrastructure is divided into 2 categories:
Operational infrastructure: Operational infrastructure includes:-
    • People: The entire team was involved in various aspects of the data warehouse implementation.
    • Procedure: A team contract and was formulated listing down some ground rules and procedures for going about building the data warehouse.
Physical infrastructure
This involves the computing platform, the hardware, software, and the various tools used during building the centralized data warehouse for Horizon.
The hardware specifications for the warehouse is given below:
Unit
Requirements
Server – infodata.tamu.edu
Linux, 1TB, 16GB RAM
Client
Microsoft windows 8, SQL Server 2012 package installed, 500GB, 16GB RAM
The various tools and software used during the implementation of the warehouse are given below:
Source data
Tools used
Description
Mockaroo
Online realistic data generator
BIGML
Online repository for huge datasets
Microsoft Excel
Data stored in XLS files.
 
Data Staging
For the staging process, Microsoft SSIS was used. Using the server data tools software, SSIS packages can be made to extract the data from various flat files and then loaded into the warehouse after performing data transformations and conversions.
A snippet of the tool and the package is given below:
 
Data storage
The ware-house was implemented on SQL Server 2012. Using the Microsoft Server management Studio the tables were made. A snippet for the tool is given below:
 
Information Delivery
For information delivery, the SSAS tool of the Server data tools was used. Cubes can be made by connecting to the database which is hosted on the “infodata.tamu.edu” server. We have to select the database corresponding to the “Horizon” data warehouse. Then the fact tables and their measures are selected and a cube is made from them. After deploying the cube, the user can browse the cube. A snippet of the SSAS tool is given below:

Meta Data

 
Table_Name
Type
Description
Fact_Churn
Fact
A fact table to measure the customer churn and various other measures that affect customer churn
Fact_RatePlanAnalysis
Fact
A time based collection of all the records that make up the Fact table for Rate Plan Analysis
Fact_SalesRepAnalysis
Fact
A fact table to measure performance metrics for sales representatives
Fact_ChannelCampaign
Fact
A fact table that collects all the necessary data for channel analysis through Campaign and promotion
Dim_Date
Dimension
Contains details of Date and its hierarchy
Dim_Branch
Dimension
Contains details of Organization’s Branch
Dim_RatePlan
Dimension
Contains details of all the rate plans rolled out by Horizon
Dim_Customer
Dimension
Records customer details including personal information
Dim_Sales_Representative
Dimension
Contains details of sales Representatives
Dim_Geography
Dimension
This table records different levels of a location
Dim_Network
Dimension
This table contains details of Horizon Network
Dim_Channel
Dimension
This dimensional table has details for channels we use for promotional activities
Dim_Campaign
Dimension
This dimension table stores details related to the each campaign

Fact_Churn
Field
Data_type
Description
Customer_ID
int
Foreign key referring to the primary key in the customer table
Time_ID
int
To track the time when customers are churning
RatePlan_ID
int
The rate plan associated with the customer
Network_ID
int
The network to which the calls have been made
Day_Mins
int
A measure to accommodate the number of mins called during day
Day_calls
int
A measure to accommodate the number of calls made during the day
Day_charge
int
A measure to accommodate the cost
Eve_Mins
int
A measure to accommodate the number of mins called during eve
Eve_Calls
int
A measure to accommodate the number of calls made during the night
Eve_Charge
int
A measure to accommodate the cost
Night_Mins
int
A measure to accommodate the number of mins called during night
Night_Calls
int
A measure to accommodate the number of calls made during the night
Night_Charge
int
A measure to accommodate the cost
Int_Mins
int
A measure to accommodate the number of mins called to an international number
Int_Calls
int
A measure to accommodate the number of calls made to an international number
Int_Charge
int
A measure to accommodate the cost
Number_text_msgs
int
A measure to count the number of messages
AverageBillAmount
int
A measure to track the average bill amount
CustomerServiceCalls
int
A measure to track the number of service calls made
NumberOfCallsDropped
int
A measure to rack the number of calls dropped
Churn
int
Customer churn
 
Fact_RatePlanAnalysis
Field
Data_type
Description
RatePlanKey
int
Surrogate key referencing Rate Plan Key from the Rate Plan dimension
CustomerKey
int
Foreign key referencing Customer ID from the Customer dimension
TimeKey
int
Foreign key referencing Time ID from the Time dimension
Number_of_calls_per_hour
int
A fact measure to indicate the number of calls made per hour
Call_Duration
int
A fact measure to indicate duration of calls made by each customer per rate plan
Evening_Call_Duration
int
A fact measure to indicate duration of calls made by each customer per rate plan, in the evening hours (4pm to 8pm)
Morning_Call_Duration
int
A fact measure to indicate duration of calls made by each customer per rate plan, in the morning hours (6am to 11am)
Night_Call_Duration
int
A fact measure to indicate duration of calls made by each customer per rate plan, in the night (8pm to 5am)
Called_Network
int
A fact measure to identify the network called
International_Call_Duration
int
A fact measure to call duration of all international calls
Data_Usage
decimal
A fact measure to capture the data usage of customers
Calculated Measure: Revenue
Decimal
A fact measure calculated from call duration and costPerMinute of the Rate Plan Dimension
 
Fact_SalesRepAnalysis
Field
Data_type
Description
Time_Id
int
To track the time when customers are churning
Customer_Id
int
Foreign key referencing Customer ID from the Customer dimension
SalesRep_Id
int
Foreign key referencing SalesRep ID from the SalesRep dimension
RatePlan_Id
int
Foreign key referencing RatePlan ID from the RatePlandimension
Channel_Id
int
Foreign key referencing Channel ID from the Channel dimension
Manager_Id
int
Id of the SalesRep Manager
NumOfLinesSold
int
Number of service lines sold
NumOfActiveLines
int
Number of active lines
NumOfComplaints
int
Number of complaints received by sales representatives
NumOfComplaintsResolved
int
Number of complaints resolved by sales representatives
Revenue
money
Revenue generated by sales representatives
Cost
money
Cost incurred by sales representatives
 
Fact_ChannelCampaign
Field
Data_type
Description
id
int
Unique ID for the table
channel_id
int
Unique ID for the channel
start_date
int
Time ID for campaign start date
end_date
int
Time ID for campaign end date
campaign_id
int
Campaign ID for this channel
cost
decimal
Total cost of the promotion
sales
decimal
Total sales due to that promotion
type
varchar
type of promotion
no_of_leads
int
No of people registered
no_of_conversions
int
No of paying customers
customer_id
int
Unique ID of customers
 
campaign_manager
varchar
ID of the person assigned to the campaign
campaign_manager_contact
varchar
Contact of the above person
sales_rep
int
Sales Representative ID
Comments
varchar
Comments associated with channel campaign
location
int
Geographic location of the campaign
 
Dim_Date
Field
Data_type
Description
TimeKey
int
Primary key
FullDateAlternateKey
date
Surrogate Key
DayNumberOfWeek
nvarchar
Day Number of Week
EnglishDayNameOfWeek
nvarchar
Day Name of Week
DayNumberOfMonth
nvarchar
Day Number of Month
DayNumberOfYear
nvarchar
Day Number of Year
WeekNumberOfYear
nvarchar
Week Number in Year
EnglishMonthName
nvarchar
Month Name
MonthNumberOfYear
nvarchar
Month Number
CalendarQuarter
nvarchar
Quarter Number
CalendarYear
nvarchar
Year value
CalendarSemester
nvarchar
Semester Number
FiscalQuarter
nvarchar
Quarter number in fiscal year
FiscalYear
nvarchar
Fiscal year
FiscalSemester
nvarchar
Semester Number in fiscal year
hour_of_day
int
Hour of Day
 
Dim_Branch
 
Field
Data_type
Description
Branch_Id
int
Unique key for branch
Location_Id
int
Location id foreign key from Geography dimension
Manager_Id
int
Unique id of branch manager
Branch_Name
varchar
Name of the branch
Branch_Type
varchar
Type of the branch
PhoneNumber
varchar
Branch phone number
 
Dim_RatePlan
Field
Data_type
Description
RatePlanID
int
Source primary key
Name
varchar
Name of the rate plan
Description
varchar
Description of the rate plan
Peak_hour_Start
time
Indicates the start of the peak hour for that particular rate plan
Peak_hour_End
time
Indicates the end of the peak hour for that particular rate plan
CostPerMinute
decimal
Indicates the charges levied per minute for a call of that rate plan
RatePlanKey
int
Surrogate key to reference in the Fact table
 
Dim_Customer
Field
Data_type
Description
CustomerKey
int
Unique id of customer
GeographyKey
int
Foreign key referencing Location ID from the Geography dimension
CustomerAlternateKey
nvarchar
Alternate customer key
Title
nvarchar
Title of the customer
FirstName
nvarchar
First name of the customer
MiddleName
nvarchar
Middle name of the customer
LastName
nvarchar
Last name of the customer
NameStyle
nvarchar
NameStyle of the customer
MaritalStatus
nvarchar
‘M’ if married, else ‘U’
Suffix
nvarchar
Suffix
Gender
nvarchar
Gender
EmailAddress
nvarchar
Email Id
YearlyIncome
int
Customer annual income
TotalChildren
int
Number of customer children
NumberChildrenAtHome
int
Number of children at home
EnglishEducation
nvarchar
Education level of the customer
EnglishOccupation
nvarchar
Occupation of the customer
HouseOwnerFlag
int
‘Y’ if house owner, else ‘N’
NumberCarsOwned
int
Number of cars owned
AddressLine1
nvarchar
Street address
AddressLine2
nvarchar
City, State, Country
Phone
nvarchar
Phone number of customer
 
Dim_Sales_Representative
Field
Data_type
Description
SalesRep_Id
int
Unique key for sales representative
FirstName
varchar
First name of sales representative
MiddleName
varchar
Middle name of sales representative
LastName
varchar
Last name of sales representative
Branch_Id
int
Foreign Key id of associated branch
Title
varchar
Title of sales representative
Manager_Id
int
Id of manager assigned to sales representative
Email
varchar
Email Id of sales representative
PhoneNumber
varchar
Phone number of sales representative
Start_Date
date
Date on which sales representative joined
End_Date
varchar
Date on which sales representative ended tenure
Gender
varchar
Gender of the sales representative
 
Dim_Geography
Field
Data_type
Description
GeographyKey
int
Unique key of location
City
nvarchar
Name of the CIty
StateProvinceCode
nvarchar
State Code
StateProvinceName
nvarchar
State Name
CountryRegionCode
nvarchar
Country Code
EnglishCountryRegionName
nvarchar
Coountry Name
PostalCode
nvarchar
Zip Code
SalesTerritoryKey
int
Territory key assigned by Horizon
 
Dim_Network
 
Field
Data_type
Description
Network_ID
int
Unique network id
Name
varchar
Name of the network
Country
varchar
Country name in which it is active
Network_Key
int
Key associated with network
 
Dim_Channel
Field
Data_type
Description
id
int
Unique ID for channel
type
varchar
type of channel , online , offline
comments
varchar
Comments associated with Channel
 
Dim_Campaign
Field
Data_type
Description
id
int
Unique ID for campaign
type
varchar
type of campaign
description
text
Description of the Campaign
 

Size of Data warehouse

 

References

  1. Paulraj pooniah – Data warehousing: A complete guide for IT professionals

Leave a Reply

Your email address will not be published / Required fields are marked *