We want to thank Vincent McBurney from MLC Life Insurance and Certus for their guest blog.
In this blog post, guest author Vincent McBurney from MLC Life Insurance shares how to implement Data Vault efficiently to save cost.
Choosing Data Vault as your data warehousing methodology is a smart business decision. However, it’s not a guarantee for success and cost savings.
Done wrong, Data Vault can be costly, while those who get it right from the get-go stand to save $250K-750K up front and $50K-200K per month for an active Data Vault that is continuously adding new data sources.
In some extreme instances, it saves the entire cost of the Enterprise Data Warehouse (EDW), which can be millions, by preventing it from failing within 2 years. And failure is more common than you might think. The head of Data Warehousing at Gartner finds that a lot of Data Vault projects across the USA were failing within 18 months due to poor return on investment.
Therefore, for anyone considering the Data Vault methodology – or any EDW method for that matter – is advised to plan the execution carefully.
There are things that mark your Data Vault for certain death, including:
- Pitching your Data Vault to the executive team and stakeholders as faster, easier and cheaper than traditional EDW, and then spending 6 months on architecture/design/load frameworks, which means delivering your first reports ends up costing as much as a traditional approach to building your EDW.
- Giving your Data Vault build to a team who doesn’t understand the core concepts of Data Warehouse automation or the fundamental principles of Data Vault (Raw Vault versus Business Vault and other principles). They will end up refactoring their entire build several times as they learn through their mistakes, taking as long or longer than a traditional EDW.
- Trying to do this in a Data Lake as your first Data Lake and/or Data Vault project. Data Lake experts are java programmers who don’t understand DW modelling. Information Management experts are SQL experts and data modellers who don’t understand NoSQL. Pick a team who know how to pull it off.
- Trying to build a Data Vault with tools that have no metadata exchange – modelling, mapping and loading.
- Persisting to hold on to old practices and approaches – Data Vault is a fundamental shift on how to approach your information design. Trying to do Data Vault with dimensional modelling lenses on or trying to abstract your Data Vault to a logical ideal state will result in continuous refactoring, negating the benefits Data Vault brings to the table.
Let’s assume you manage to avoid these certain-death mistakes, how can you ensure your Data Vault execution ends up saving you money – both upfront and monthly?
The upfront saving comes from getting an out-of-the-box load framework that lets the team start populating a Data Vault in the first sprint taking out the following costs:
- Defining an architecture, approach and design of Data Vault loads. ($50K)
- Building data modelling automation into a modelling tool ($50K). Or don’t and increase your monthly manual modelling costs.
- Build a Data Vault load framework in your ELT/ETL tool ($150-750K). It’s not unusual to discover after a year that your DIY framework doesn’t scale well or use set based SQL loads effectively and needs to be rebuilt.
On a per-month basis, you can achieve savings of $50K-200K by avoiding these inefficiencies:
- The Data Modeller has to manually build a Data Vault model by dragging and dropping columns and adding fields. This can be time consuming and error prone.
- Someone has to populate mapping specifications or mapping metadata for those Data Vault models because most modelling tools do not store mappings. Thus, the manual Data Vault design work performed by the modeller has to be repeated manually by a mapper (usually in a spreadsheet or database table).
- Someone has to load those mappings into the ETL/ELT framework and add the bespoke coding steps (hard rules) such as type conversion. Most ETL/ELT tools are not good at receiving mapping data so this has to be manually done repeating the process already done by the modeller and the mapper.
That cost scales up. If you have a larger EDW you duplicate these three processes and incur additional costs. Larger organisations tend to have poorly balanced teams or teams reporting to different managers or in the worst cases, each team is in a different organisation working under a different contract. For those cases, building a Data Vault can be incurring monthly costs above $200K for the work that a single person with a Data Vault automation tool could do. Instead of one person adding 50 tables per month you have three teams with duplicated overheads of consulting firms, consisting of PMs, partners, architects, BAs and the like.
What does proper Data Vault Automation do?
A properly configured Data Vault automation solution like WhereScape, VaultSpeed, or ErWin Edge lets one person make Data Vault design decisions (business keys, links, satellite splitting) with input from SMEs and BAs. It then does everything else – it generates the data model, data model DDL file, mapping from Landing to Staging to Data Vault, SQL scripts or generated ETL/ELT jobs to load the data.
We’ve got over 100 tables in our EDW after 2-3 months and I’ve had just one person doing the entire Data Vault layer. The rest of the team are free to build business vault, info marts and reports and this is where the business value is realised.
The easiest way to show the value
We created a project cost estimate based on our existing tools and created line items for building the framework. I then showed the monthly run costs where every Data Vault table had to be delivered with an equal portion of time by a modeller, BA (mapper) and data engineer ETL). I then cut the same plan with these three people replaced by one person with DV automation. It became easy to get approval.
That’s just the savings on the Raw Data Vault. Automation tools don’t have a big saving when it comes to Business Vault or Information Marts because those layers tend to require a lot of bespoke design and data transformation code and benefit less from pattern-based code generation. There are cost savings of Business Vault over traditional EDW but that’s a different discussion.
Recent Comments