Data Flexibility is surely the goal. I can’t believe that it has taken me so long to write a post on this! How much time have we spent ‘dragging and dropping’ only to create rigid patterns that snap within months? Happily, we now have options!
Data Flexibility is the ability for a solution to grow, shrink or change to meet a revised set of data needs or requirements. Change is so prevalent that we must ensure the data we manage can cope.
This post explains flexibility in terms of cloud services, integration, storage, data modelling and rapid ingestion.
At the end of the day, how do we avoid rigidity?
There is a simple, one-word answer: Cloud. AWS, Azure and GCP have come to the rescue. The following are some common scenarios where cloud service providers give us the ability to:
- Add new services at a click. An example is where an Azure client needs to discover and understand their data sources more completely. Within hours they can add Azure Data Catalog to facilitate this.
- Automatically scale both horizontally and vertically. An example is an AWS client that has periodic peak loads that require them to initiate EC2 Auto Scaling to meet performance expectations.
- Create temporary environments to carry out periodic activities. For example, a GCP client needs to test their Disaster Recovery (DR) procedure and therefore spins up replica Compute Engine instances so they can validate the recovery procedure.
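The second scenario, automatic scaling, boils down to a simple decision: compare a load metric against a target and adjust the fleet size. The sketch below is purely illustrative (it is not the EC2 Auto Scaling API, and the thresholds are made up), but it shows the kind of target-tracking logic such services apply for you:

```python
def desired_capacity(current: int, cpu_utilisation: float,
                     target: float = 50.0,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Toy target-tracking policy: size the fleet so that average
    CPU utilisation moves toward the target percentage."""
    if cpu_utilisation <= 0:
        return minimum
    # Proportional scaling: double the load implies double the fleet.
    raw = current * (cpu_utilisation / target)
    return max(minimum, min(maximum, round(raw)))

# Peak load: 4 instances running at 90% CPU -> scale out to 7
print(desired_capacity(4, 90.0))  # 7  (4 * 90/50 = 7.2, rounded)
# Quiet period: 4 instances at 25% CPU -> scale in to 2
print(desired_capacity(4, 25.0))  # 2
```

The point is that none of this logic needs to be hand-built any more; the cloud provider evaluates it continuously and adjusts capacity within the bounds you set.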
I will be honest; this is my favourite! For far too long we have been creating rigid, point-to-point integration patterns that snap if there is a change. Hand-coded Extract, Transform and Load (ETL) patterns were the default option.
Integration flexibility is now provided by:
- Rapidly loading data structures that do not require complex business rules (e.g. Landing, Staging, Raw Data Vault or Data Lake), with minimal hand-coding. A metadata-driven framework or tool is a must.
- Using a Pub/Sub pattern if the integration needs ‘many’ system integration points, multiple protocols (e.g. FTP, HTTP, etc.) and message routing. This will help avoid:
- Integration issues where either Publisher or Subscriber are periodically off-line.
- Performance bottlenecks due to a subscriber process that cannot acknowledge messages as rapidly as the publisher can send them.
- Using a microservice integration pattern to deliver Service-Oriented Architecture (SOA) in a targeted, flexible and incremental manner. This means the client can avoid the traditional monolithic approach where all the eggs are in one basket.
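The decoupling that makes Pub/Sub flexible can be shown in a few lines. The broker below is a deliberately minimal, in-memory sketch (a real system would be Kafka, Azure Service Bus, Google Pub/Sub, etc.); what matters is that the publisher never talks to subscribers directly, so a slow or temporarily offline subscriber neither blocks the publisher nor loses messages:

```python
from collections import defaultdict, deque

class Broker:
    """Minimal in-memory pub/sub broker, for illustration only."""
    def __init__(self):
        # topic -> {subscriber name -> buffered messages}
        self.queues = defaultdict(dict)

    def subscribe(self, topic, name):
        self.queues[topic][name] = deque()

    def publish(self, topic, message):
        # The broker buffers per subscriber: publishing never waits
        # on any consumer, avoiding the bottlenecks described above.
        for queue in self.queues[topic].values():
            queue.append(message)

    def poll(self, topic, name):
        queue = self.queues[topic][name]
        return queue.popleft() if queue else None

broker = Broker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "warehouse")
broker.publish("orders", {"id": 1})
broker.publish("orders", {"id": 2})

# "billing" consumes immediately; "warehouse" was offline and catches up later
print(broker.poll("orders", "billing"))    # {'id': 1}
print(broker.poll("orders", "warehouse"))  # {'id': 1}
print(broker.poll("orders", "warehouse"))  # {'id': 2}
```

Adding a third subscriber is one `subscribe` call; the publisher and the other subscribers are untouched, which is exactly the flexibility point-to-point integration lacks.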
Data Flexibility would be impossible without a recent change in storage options. Previously, a schema had to be defined before data could be stored, which made ingestion costly and slow due to the modelling overhead. We can now choose from a range of schema-on-read approaches, where the schema only becomes relevant when the data is read. Examples include Amazon S3 and Azure Blob Storage.
This shift now means that we have the flexibility to store structured, semi-structured or unstructured data when we want.
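The schema-on-read idea can be sketched in a few lines. Here a `StringIO` buffer stands in for an object-store blob (S3, Blob Storage); the records and column names are hypothetical. Nothing about the records' shape is declared at write time, and structure is imposed only at read time:

```python
import io
import json

# Stand-in for an object-store blob; ingestion just appends raw JSON lines.
raw_store = io.StringIO()

# Ingest: no schema required, mixed record shapes are fine.
for record in [{"id": 1, "name": "Ada"}, {"id": 2, "email": "x@y.z"}]:
    raw_store.write(json.dumps(record) + "\n")

# Read: apply whatever projection today's question needs.
raw_store.seek(0)
schema = ("id", "name")  # chosen at read time, not at ingestion
rows = [tuple(json.loads(line).get(col) for col in schema)
        for line in raw_store]
print(rows)  # [(1, 'Ada'), (2, None)]
```

The cost of modelling is deferred (not eliminated), which is what makes ingestion fast and flexible.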
Data modelling, and the number of data layers used, were simpler choices in the past. Now we have Lake, Vault, Lake House, Warehouse, Mart, ODS, etc. Each of these approaches has its own merits and can deliver Data Flexibility to a greater or lesser extent. Regardless of your modelling preference, flexibility is supported by:
- Raw layers (e.g. Land, Stage, Raw Vault, Lake, etc.) that are simple and extremely quick to build and change
- A conformed Business layer that shares dimensions
- A minimum number of data layers. Each one adds rigidity and cost.
- A standard load pattern. Variation impedes rapid change.
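The last two points, few layers and a standard load pattern, are what a metadata-driven framework delivers: one generic load routine, with the per-source detail held as data rather than as hand-written pipelines. A minimal sketch (the table and column names are hypothetical):

```python
# Per-feed detail lives in metadata, not in code.
LOAD_METADATA = {
    "stg_customer": {"source": "crm_customers",
                     "columns": {"cust_id": "id", "cust_name": "name"}},
    "stg_order":    {"source": "shop_orders",
                     "columns": {"order_id": "id", "amount": "total"}},
}

def load(target: str, source_rows: list) -> list:
    """The single, standard load pattern: select and rename columns
    according to the metadata for the target table."""
    mapping = LOAD_METADATA[target]["columns"]
    return [{tgt: row.get(src) for tgt, src in mapping.items()}
            for row in source_rows]

print(load("stg_customer", [{"id": 7, "name": "Ada", "extra": "ignored"}]))
# [{'cust_id': 7, 'cust_name': 'Ada'}]
```

Onboarding a new feed means adding a metadata entry, not writing and testing a new pipeline, which is why variation in load patterns is so costly by comparison.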
We now have rapid data ingestion enabled by Data Lakes, Cloud Storage and metadata-driven integration. The addition of self-service data visualisation tools means that business users have the flexibility to access the raw data straight after ingestion and prior to modelling.
Data Flexibility in Conclusion
Delivering an architecture for Data Flexibility comes with some risks that we also need to consider. For example:
- We can now store any data we want, however, is there value in doing so? ‘Just in case’ storage is not a good enough reason.
- How data is architected is one of the areas where flexibility can easily be lost. I am not saying that the items below are without merit; however, they will create Data Rigidity:
- A large number of layers (e.g. Lake, Stage, Operational Data Store, Data Vault, Archive, Presentation, etc.)
- Changes in modelling paradigm (e.g. from 3NF to Raw Vault to Business Vault to Dimensional)
- Business rules:
- Implementation consistency (e.g. are some in Staging, some in Presentation and some in the Data Visualisation tool?)
- Method of creation (e.g. hand-coding vs metadata-driven)
- On the surface, letting users access raw data sounds like a perfect solution; however, it presents a unique set of governance, privacy and budget challenges, and can make it very hard to justify spending money to model the data ‘properly’ later.
- Just because a cloud service provider enables us to add additional services at the click of a button, does the solution warrant it? The temptation to add them can be very difficult to fight, especially if you are risk-averse or prone to future-proofing.