Data Flexibility is surely the goal. I can’t believe that it has taken me so long to write a post on this! How much time have we spent ‘dragging and dropping’ only to create rigid patterns that snap within months? Happily, we now have options!
Data Flexibility is the ability for a solution to grow, shrink or change to meet a revised set of data needs or requirements. Change is so prevalent that we must ensure the data we manage can cope.
This post explains flexibility in terms of cloud services, integration, storage, data modelling and rapid ingestion.
At the end of the day, how do we avoid rigidity?
There is a simple, one-word answer: Cloud. AWS, Azure and GCP have come to the rescue. The following are some common scenarios where cloud service providers give us the ability to:
- Add new services at a click. An example is where an Azure client needs to discover and understand their data sources more completely. Within hours they can add Azure Data Catalog to facilitate this.
- Automatically scale both horizontally and vertically. An example is an AWS client that has periodic peak loads that require them to initiate EC2 Auto Scaling to meet performance expectations.
- Create temporary environments to carry out periodic activities. For example, a GCP client needs to test their Disaster Recovery (DR) procedure and therefore spins up replica Compute Engine instances so they can validate the recovery procedure.
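The second scenario, automatic scaling, boils down to a simple decision: compare a load metric against a target and adjust the fleet size. The sketch below is purely illustrative (it is not the EC2 Auto Scaling API, and the thresholds are made up), but it shows the kind of target-tracking logic such services apply for you:

```python
def desired_capacity(current: int, cpu_utilisation: float,
                     target: float = 50.0,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Toy target-tracking policy: size the fleet so that average
    CPU utilisation moves toward the target percentage."""
    if cpu_utilisation <= 0:
        return minimum
    # Proportional scaling: double the load implies double the fleet.
    raw = current * (cpu_utilisation / target)
    return max(minimum, min(maximum, round(raw)))

# Peak load: 4 instances running at 90% CPU -> scale out to 7
print(desired_capacity(4, 90.0))  # 7  (4 * 90/50 = 7.2, rounded)
# Quiet period: 4 instances at 25% CPU -> scale in to 2
print(desired_capacity(4, 25.0))  # 2
```

The point is that none of this logic needs to be hand-built any more; the cloud provider evaluates it continuously and adjusts capacity within the bounds you set.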
I will be honest; this is my favourite! For far too long we have been creating rigid, point-to-point integration patterns that snap if there is a change. Hand-coded Extract, Transform and Load (ETL) patterns were the default option.
Integration flexibility is now provided by:
- Rapidly loading data structures that do not require complex business rules (e.g. Landing, Staging, Raw Data Vault or Data Lake), with minimal hand-coding. A metadata-driven framework or tool is a must.
- Using a Pub/Sub pattern if the integration needs ‘many’ system integration points, multiple protocols (e.g. FTP, HTTP, etc.) and message routing. This will help avoid:
- Integration issues where either Publisher or Subscriber are periodically off-line.
- Performance bottlenecks due to a subscriber process that cannot acknowledge messages as rapidly as the publisher can send them.
- Using a microservice integration pattern to deliver Service-Oriented Architecture (SOA) in a targeted, flexible and incremental manner. This means the client can avoid the traditional monolithic approach where all the eggs are in one basket.
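The decoupling that makes Pub/Sub flexible can be shown in a few lines. The broker below is a deliberately minimal, in-memory sketch (a real system would be Kafka, Azure Service Bus, Google Pub/Sub, etc.); what matters is that the publisher never talks to subscribers directly, so a slow or temporarily offline subscriber neither blocks the publisher nor loses messages:

```python
from collections import defaultdict, deque

class Broker:
    """Minimal in-memory pub/sub broker, for illustration only."""
    def __init__(self):
        # topic -> {subscriber name -> buffered messages}
        self.queues = defaultdict(dict)

    def subscribe(self, topic, name):
        self.queues[topic][name] = deque()

    def publish(self, topic, message):
        # The broker buffers per subscriber: publishing never waits
        # on any consumer, avoiding the bottlenecks described above.
        for queue in self.queues[topic].values():
            queue.append(message)

    def poll(self, topic, name):
        queue = self.queues[topic][name]
        return queue.popleft() if queue else None

broker = Broker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "warehouse")
broker.publish("orders", {"id": 1})
broker.publish("orders", {"id": 2})

# "billing" consumes immediately; "warehouse" was offline and catches up later
print(broker.poll("orders", "billing"))    # {'id': 1}
print(broker.poll("orders", "warehouse"))  # {'id': 1}
print(broker.poll("orders", "warehouse"))  # {'id': 2}
```

Adding a third subscriber is one `subscribe` call; the publisher and the other subscribers are untouched, which is exactly the flexibility point-to-point integration lacks.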
Data Flexibility would be impossible without a recent change in storage options. Previously, a schema had to be defined before data could be stored, which made ingestion costly and slow due to the modelling overhead. We can now choose from a range of schema-on-read approaches, where the schema only becomes relevant when the data is read. Examples include Amazon S3 and Azure Blob Storage.
This shift now means that we have the flexibility to store structured, semi-structured or unstructured data when we want.
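The schema-on-read idea can be sketched in a few lines. Here a `StringIO` buffer stands in for an object-store blob (S3, Blob Storage); the records and column names are hypothetical. Nothing about the records' shape is declared at write time, and structure is imposed only at read time:

```python
import io
import json

# Stand-in for an object-store blob; ingestion just appends raw JSON lines.
raw_store = io.StringIO()

# Ingest: no schema required, mixed record shapes are fine.
for record in [{"id": 1, "name": "Ada"}, {"id": 2, "email": "x@y.z"}]:
    raw_store.write(json.dumps(record) + "\n")

# Read: apply whatever projection today's question needs.
raw_store.seek(0)
schema = ("id", "name")  # chosen at read time, not at ingestion
rows = [tuple(json.loads(line).get(col) for col in schema)
        for line in raw_store]
print(rows)  # [(1, 'Ada'), (2, None)]
```

The cost of modelling is deferred (not eliminated), which is what makes ingestion fast and flexible.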
Data modelling, and the number of data layers used, were simpler choices in the past. Now we have Lake, Vault, Lake House, Warehouse, Mart, ODS, etc. Each of these approaches has its own merits and can deliver Data Flexibility to a greater or lesser extent. Regardless of your modelling preference, flexibility is supported by:
- Raw layers (e.g. Land, Stage, Raw Vault, Lake, etc.) that are simple and extremely quick to build and change
- A conformed Business layer that shares dimensions
- A minimum number of data layers. Each one adds rigidity and cost.
- A standard load pattern. Variation impedes rapid change.
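The last two points, few layers and a standard load pattern, are what a metadata-driven framework delivers: one generic load routine, with the per-source detail held as data rather than as hand-written pipelines. A minimal sketch (the table and column names are hypothetical):

```python
# Per-feed detail lives in metadata, not in code.
LOAD_METADATA = {
    "stg_customer": {"source": "crm_customers",
                     "columns": {"cust_id": "id", "cust_name": "name"}},
    "stg_order":    {"source": "shop_orders",
                     "columns": {"order_id": "id", "amount": "total"}},
}

def load(target: str, source_rows: list) -> list:
    """The single, standard load pattern: select and rename columns
    according to the metadata for the target table."""
    mapping = LOAD_METADATA[target]["columns"]
    return [{tgt: row.get(src) for tgt, src in mapping.items()}
            for row in source_rows]

print(load("stg_customer", [{"id": 7, "name": "Ada", "extra": "ignored"}]))
# [{'cust_id': 7, 'cust_name': 'Ada'}]
```

Onboarding a new feed means adding a metadata entry, not writing and testing a new pipeline, which is why variation in load patterns is so costly by comparison.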
We now have rapid data ingestion enabled by Data Lakes, Cloud Storage and metadata-driven integration. The addition of self-service data visualisation tools means that business users have the flexibility to access the raw data straight after ingestion and prior to modelling.
Data Flexibility in Conclusion
Delivering an architecture for Data Flexibility comes with some risks that we also need to consider. For example:
- We can now store any data we want, however, is there value in doing so? ‘Just in case’ storage is not a good enough reason.
- How data is architected is one of the areas where flexibility can easily be lost. I am not saying that the items below are without merit; however, they will create Data Rigidity:
- A large number of layers (e.g. Lake, Stage, Operational Data Store, Data Vault, Archive, Presentation, etc.)
- Changes in modelling paradigm (e.g. from 3NF to Raw Vault to Business Vault to Dimensional)
- Business rules:
- Implementation consistency (e.g. are some in Staging, some in Presentation and some in the Data Visualisation tool?)
- Method of creation (e.g. hand-coding vs metadata-driven)
- On the surface, letting users access raw data sounds like a perfect solution; however, it presents a unique set of governance, privacy and budget challenges, and can make it very hard to justify spending money to model the data ‘properly’ later.
- Just because a cloud service provider enables us to add additional services at the click of a button, does the solution warrant it? The temptation to add them can be very difficult to fight, especially if you are risk-averse or prone to future-proofing.