Just Analytics Blog | Performance Management News, Views and Op-ed

Migrating the Data warehouse to the cloud does not need to take years

Written by Dang Trung Tin | Oct 30, 2017 10:47:08 AM

 

A few weeks ago, we had a chance meeting with the CIO of one big company in the Asia region. He shared that his team is having some performance issues with the existing data warehouse system and it was taking very long time for the data to be ready for end users every day and asked for our help.

 

We took this opportunity to address his issues and trying to "Sharpen the Saw" around our large volume Data Platform activities, we built a very quick PoC to demonstrate how the Azure Platform (specifically SQL Data Warehouse and Analysis Services) would resolve their pain points, and also align with the best practices in term of designing data warehouse model, applying ETL framework to control the data quality & capture audit information and some analytical stories in their industry.

 

As this is not an atypical use case, I thought it would be useful to share our journey and findings. Below is what we proposed and helped them to achieve.

 

Before Architecture

 

Customer is using Power BI service to connect directly to their DW system sit on premises and the data is imported from DW to Power BI every day using Power BI gateway. The data volume of the existing DW system is around 100 GB and customer has a road map to extend the business requirement to different subject areas and expected data size to be 10 TB within the next 3 years.

 

Problem

 

With the before architecture, customer is facing some issues which are listed below.

  • The limited dataset refreshment (1GB) and time consumption for data available in Power BI Cloud.
  • Power BI Data set take 4 hours to complete.
  • Unable to meet new business requirements due to exceeding Power BI 1GB limit.
  • Current workaround of splitting the few PBI files, but new issues on some existing reports and overall timing is more than 4 hours.

 

Customer Expectation

 

Below is what customer expected as outcome of the project.

  • Propose the good Data Warehouse (DW) and Business Intelligence (BI) solution architecture with the best resources sizing, costing plan for the propose architecture.
  • Show the real data quality control cases with dimension/hierarchy and transactional data.
  • The proof of data availability in the shorted time, before 7AM daily.
  • Minimum data period is 15 months.
  • Interaction time on dashboards/reports is less than 10 seconds.
  • Dashboards/Reports loading time is less than 10 seconds.

 

Proposed Architecture

 

While in the long run we want the customer to migrate their ETL processing to the cloud, we wanted to have some quick wins. With that in mind we proposed the architecture below which will help them to resolve all the issues and achieve good performance including data refresh and query performance in a very short period.

 

 

The data will be copied incrementally every day from on premises Data Warehouse to Azure SQL Data Warehouse using Azure Data Factory with Poly-Base feature. After that, it will be incrementally imported to Azure Analysis which is the in-memory engine and the centralized model.

 

Power BI connects to Azure Analysis Services (in-memory engine) using Live Connection mode.

 

Why Proposed Architecture

Approach

Azure SQL Data Warehouse and Analysis Services

Cloud Approach

PaaS -  Fully Managed

Data Architecture

MPP + In Memory Cache

Scalability

SQL Warehouse - Unlimited

Analysis Services - Multiple instances

Data Model

Multiple AS models by Subject Area

Uptime

SQL Warehouse - During batch time

Analysis Services - Always On

Focus

Performance (Batch & Runtime)

Expected Monthly Consumption

USD 1-3k

 

Results

 

Below are the results we helped the customer achieve in a matter of weeks. More than that, we also helped them finalize their road map to move their existing data warehousing system to cloud-based with Azure SQL DW.

 

 

Conclusion

 

With Azure SQL Data Warehouse, the system can quickly run complex queries across petabytes of data. Integrating with Azure Data Factory with Poly-Base and using power of Massively Parallel Processing (MPP) you can build high performance data pipelines to meet any kind of analytics on the cloud.

 

This combined with Azure Analysis Services, a BI professional can create a semantic model over the raw data and share it with business users so that all they need to do is connect to the model from any BI tool and immediately explore the data and gain insights. Azure Analysis Services uses a highly optimized in-memory engine to provide responses to user queries at the speed of thought.

 

Reach out to us if you want to know more about this or other projects we have sucessfully migrated to Azure.