In my career I have used and built many ‘frameworks’. Typically they have focused on Data Warehouse delivery involving the Extraction, Transformation and Loading (ETL) of data; however, some have also been .Net based. This post covers why frameworks are a good idea and what to avoid if you want to successfully create and use one.
Firstly, if someone asks you to build or use a framework, I would assume that they are talking about:
Ways of working, standards and possibly even technologies that enable a group of technical resources to deliver a solution in an efficient and consistent manner.
If you would like an alternative definition, check out Wikipedia.
When I start using a framework on a data-related project, I expect to see the following:
- Onboarding (e.g. what a new resource needs to know to become effective with the framework)
- Architectural guidance (e.g. what layers exist, when to use them, etc.)
- Environment details (server addresses, database names, network shares, etc.)
- Data Handling (e.g. when to MERGE, when to UPDATE/INSERT, etc.)
- Code Management (e.g. branching strategy, etc.)
- Code Examples (e.g. how to ingest data that is real-time, micro-batch, batch, etc.)
- A mechanism to capture operational metadata, where the underlying technology does not already do so to the desired level.
- A mechanism to orchestrate the framework's operation, where the underlying technology does not already do so to the desired level.
- Functionality to deliver metadata-driven integration/ETL/ELT. Any framework that relies only on hand coding has missed the opportunity to improve development efficiency.
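To make the metadata-driven point concrete, here is a minimal sketch, assuming a framework whose configuration stores, for each load, a source table, a target table, its business key columns and its other columns. The `TableConfig` class, the `build_merge_sql` function and the table names are hypothetical, and the output is T-SQL style MERGE text rather than anything a particular product generates. Deletes and history handling are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """One row of (hypothetical) framework metadata describing a table load."""

    source: str               # staging table, e.g. "stg.customer"
    target: str               # warehouse table, e.g. "dw.customer"
    keys: tuple[str, ...]     # business key columns used to match rows
    columns: tuple[str, ...]  # non-key columns to update or insert


def build_merge_sql(cfg: TableConfig) -> str:
    """Generate a T-SQL style MERGE (upsert) statement from one metadata row."""
    on_clause = " AND ".join(f"tgt.{k} = src.{k}" for k in cfg.keys)
    all_cols = cfg.keys + cfg.columns
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"src.{c}" for c in all_cols)
    # Only emit the UPDATE branch when there are non-key columns to update.
    matched = ""
    if cfg.columns:
        set_clause = ", ".join(f"tgt.{c} = src.{c}" for c in cfg.columns)
        matched = f"WHEN MATCHED THEN\n    UPDATE SET {set_clause}\n"
    return (
        f"MERGE INTO {cfg.target} AS tgt\n"
        f"USING {cfg.source} AS src\n"
        f"    ON {on_clause}\n"
        f"{matched}"
        f"WHEN NOT MATCHED BY TARGET THEN\n"
        f"    INSERT ({insert_cols})\n"
        f"    VALUES ({insert_vals});"
    )


# Hypothetical metadata; in a real framework this would live in a configuration table.
METADATA = [
    TableConfig("stg.customer", "dw.customer", ("customer_id",), ("name", "email")),
    TableConfig("stg.product", "dw.product", ("product_code",), ("description", "price")),
]

if __name__ == "__main__":
    for cfg in METADATA:
        print(build_merge_sql(cfg))
        print()
```

Onboarding another table then means adding one metadata row rather than hand coding another MERGE statement.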
The advantages are numerous, and I would never dissuade anyone from using a framework. An appropriate framework will:
- Increase developer productivity
- Drive standardization across the development team
- Enable CI/CD
- Deliver a solution that:
  - Is of high quality, because the framework gives new developers a safety net that shields them from common mistakes.
  - Is easier to support, because there are fewer variations in approach between projects.
  - Can be rapidly rebuilt or extended if a core business rule changes (e.g. adding a new column to 50 staging tables and populating it with a historical load).
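The "new column on 50 staging tables" example is where a framework's metadata pays off. As a minimal sketch, assuming the framework can list its staging tables from its own metadata, the change can be planned as generated statements instead of 50 hand edits. The `NewColumn` class, `plan_column_rollout` function and table names are hypothetical, and the backfill here is a simple UPDATE standing in for a proper historical reload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewColumn:
    """A hypothetical column to roll out to every staging table."""

    name: str           # e.g. "source_system"
    sql_type: str       # e.g. "VARCHAR(50)"
    backfill_expr: str  # SQL expression used to populate existing rows


def plan_column_rollout(tables: list[str], col: NewColumn) -> list[tuple[str, str, str]]:
    """Return (table, ddl, backfill) statements for each table listed in metadata."""
    plan = []
    for table in tables:
        ddl = f"ALTER TABLE {table} ADD {col.name} {col.sql_type} NULL;"
        # Stand-in for a historical load: populate rows that existed before the change.
        backfill = (
            f"UPDATE {table} SET {col.name} = {col.backfill_expr} "
            f"WHERE {col.name} IS NULL;"
        )
        plan.append((table, ddl, backfill))
    return plan


# Hypothetical: the 50 staging table names would normally come from the framework's metadata.
STAGING_TABLES = [f"stg.table_{i:02d}" for i in range(1, 51)]

if __name__ == "__main__":
    change = NewColumn("source_system", "VARCHAR(50)", "'LEGACY'")
    for table, ddl, backfill in plan_column_rollout(STAGING_TABLES, change):
        # Execute the DDL and the backfill as separate batches so the new
        # column exists before the UPDATE that references it is compiled.
        print(ddl)
        print(backfill)
```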
Risks and Pitfalls
- Framework architects often focus on keeping the schema and supporting applications in sync between environments (DEV, SIT, UAT, etc.). It is easy for them to overlook the configuration data (aka metadata) that drives the framework. Differences in this data between environments are harder to detect, and it can be more difficult to migrate unless the framework includes a feature for configuration data migration.
- Frameworks are typically based on a core technology (e.g. .Net, SSIS, Informatica, Matillion, etc.). If the creator(s) of the framework have chosen to develop their own versions of features that are already available in the base technology, they are making a mistake. If a feature is available in the base technology, don’t rebuild your own version in the framework just because you can. This adds cost, risk and confusion.
- Frameworks don’t stand still; they evolve and mature as each project uses them to deliver functionality. However, extending the framework is often not catered for (i.e. funded) by the projects that use it. In addition, changing a framework inevitably involves a large amount of regression testing to make sure that it still supports projects that have already been released.
- A project starts using the framework before it is finished and thoroughly tested. The consequence is that when defects are found, it is a painful process to resolve and test them and then release the fixes to the in-flight projects that use the framework.
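Configuration drift between environments can at least be detected automatically. As a minimal sketch, assuming each environment's framework metadata can be read into a dictionary keyed by load name, a comparison like the one below reports what is missing, extra or different. The `diff_config` function, the key names and the `DEV`/`UAT` rows are hypothetical; a real framework would read them from its configuration tables in each environment.

```python
Config = dict[str, dict[str, object]]


def diff_config(source: Config, target: Config) -> dict[str, list[str]]:
    """Compare framework configuration rows between two environments."""
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "extra_in_target": sorted(target.keys() - source.keys()),
        "changed": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical metadata rows keyed by load name.
DEV = {
    "load_customer": {"source": "stg.customer", "target": "dw.customer", "enabled": True},
    "load_product": {"source": "stg.product", "target": "dw.product", "enabled": True},
    "load_orders": {"source": "stg.orders", "target": "dw.orders", "enabled": True},
}
UAT = {
    "load_customer": {"source": "stg.customer", "target": "dw.customer", "enabled": True},
    "load_product": {"source": "stg.product", "target": "dw.product", "enabled": False},
}

if __name__ == "__main__":
    for category, keys in diff_config(DEV, UAT).items():
        print(f"{category}: {keys}")
```

Run as a step in the release pipeline, a non-empty result flags metadata that has not been migrated alongside the framework code.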
Suggestions
- Build the framework to improve development efficiency. This needs to be the number one goal.
- Leave the base technology to do what it is designed to do; just because you may not like it doesn’t mean you should rebuild its core functionality.
- Focus on maintaining and migrating configuration data/metadata as much as you do the framework code.
- Make sure you are not left hand coding anything that can reasonably be achieved through a metadata-driven ETL process or similar.
- Acknowledge that there will still be areas that have to be hand coded due to the level of complexity in the data or the need to apply complex business rules.
- Take people on the journey with you. If developers and operational staff don’t understand the framework, it will quickly be accused of being too complex and therefore slow to use.
Hopefully these warnings and suggestions will help you when you realise that
You need a framework…