This blog describes a template-driven Business Intelligence development model using a classical data warehouse architecture. This model is built on the iterative and incremental development methodology. The model is driven by a pre-defined data warehouse architecture which includes high-level architecture definition, templates and standards for technical architectural elements, and coding templates for physical designs and actual documentations. The following considerations must be well apprehended in order to go deeper into the model.
- Iterative and incremental development
This BI development model is created based on an understanding of being agile and deliver in iterative processes. Therefor all team members must understand that the project is running on an iterative mode in order to build and improve the solution through different increments. It is important that project find out a practical way of scoping, monitoring and controlling (and budgeting) the project in such a process.
- Freedom of selecting activities, deciding phase milestones and sprint autonomy
This model provides a list of “best practice” activities which are typically useful in a data warehouse implementation project. A project have the full freedom to decide which activities and processes should be used by the project or not. Furthermore, the project have the freedom to decide if any check points or milestones are necessary before one iteration or increment is completed. Within an iteration of a project, the technical members should have the autonomy to work on the tasks that have been agreed upon at the start of the increment.
- Build and improve based on prototypes and fast deliveries
One of the key features of this BI development model is to start with prototypes. Both the analysis and specification phases are using double iterations to strengthen the importance of making up the product prototypes as well as first version products. In the analysis phase, the prototype helps both the business end-users and the project members to find a clear definition of the deliverables. In the specification phase, the first version BI solution utilizes the “FastTrack” concept to ensure the time-to-market delivery. These prototypes and first-version products keep the business stakeholders close to the project team, which ensures the success of the final delivery.
- Design with pre-defined templates
The architecture, the data model, the ETL jobs, the database designs and the documentations in the BI development model are all based on pre-defined templates. This makes the project development simple. 80 percent of the design decisions have already been made according to the templates.
- Test driven development
Test data and test cases are essential in this BI development model. A well-functioning test tools or test system should be used as part of the project. In all phases of the project, the iterations and sprints are all driven by tests.
The following text gives a high-level description the model.
Overview of the model
Traditionally, the development of a data warehouse system is like the other software processes. Models like the waterfall also apply to a data warehouse project. With the popularity of the agile concept, the data warehousing and BI world has also adapted the concepts of iterative and incremental development.
As illustrated below, the data warehouse project starts with an activity of business discovery, which finds out the business case for running the project and defines the high-level expectations. Understanding about the data warehouse and BI platform should be communicated and agreed through the activity “Architecture Alignment”, which defines the overall agreement of how this platform should be like and what kind of technology, infrastructure, logical layers of architecture, data modelling methodologies, and so on. One should of course understand that architecture, is just a model with different layers of software components. An architecture includes different perspectives, the process, the functionality, the data, the user experience, the people, the infrastructure, the operation, the security and so on. When the architecture alignment is completed, the project goes on with the resource planning phase, which finds out the people, the hardware and software, and the concrete execution plan.
These three activities, “Business Discovery,” “Architecture Alignment,” and “Resource Planning” can be seen as a “pre-phase” of the development of a data warehouse system. After such a phase, the data warehouse project enters into an iterative process, which includes “Analysis and Prototyping,” “Specification and Physical Design,” “Implementation and Acceptance Test” and “Deployment and Operation Handover” phases.
Analysis and Prototyping
In a traditional software project, the analysis phase typically involves the software architect, the business developer and the business end-users to have multiple rounds of escalations on the use cases, the user stories and documentations of various requirements. However, a BI/DW project is not like this. With a quite clear architecture template, the BI project do not have lots of process or functionality requirement. Even the design of user experiences is limited to a pre-defined the BI toolset.
As illustrated below, most activities in the “Analysis and Prototyping” phase are about defining the data requirements and building the data models and mappings. To be specific, the business end-users must first define the “information requirement” which is a description of the general business process and business data that must be captured in the data warehouse. Mock-ups and data requirements are the next level specification of the requirement which gives a more clear input of the data and reporting needs. With these input, the project can start digging source data, finding out the gap between the source data and requirement, and the relevant security needs on the data.
With the initial data requirement completed, the project goes on to define the data delivery from the source systems, design the logical data models, and finding out the grouping and mapping of the data. At the end of this phase, the physical data model (which is the database design) and the initial mapping document are ready for the ETL development. The last step of this phase is to ensure how the mapping design and the database design comply with template ETL architecture according to the data patterns defined in this phase.
It must be noticed that this phase also has an internal iteration called “prototyping circle.” This is a separate task iteration to creating a prototype BI front-end for the end-users. This inner circle can be executed in parallel to the outer circle. The purpose of this inner circle are, first, to come closer to the user requirements on the BI front-end, second, to keep a close interaction with the business users.
Specification and Physical Design
With the output from the Analysis phase, the Specification and Physical Design phase is to develop the key assets such as the database and the ETL jobs. As illustrated below, this phase starts with a refinement of the physical data model and mapping documents. Then the project goes not to develop the ETL jobs. When jobs at different layers are completed, the next task loads the sample data throughout the architectural layers. The findings in this task helps with the refinement and optimization of database and ETL designs. The last task of this phase is the data load acceptance test, which loads sample data through all ETL jobs and database layers to check against all the test cases. When all the test runs are completed with success, this phase is completed.
Like the Analysis phase, this phase also includes an inner circle, which actually develops the first version of the BI Front-end solution. In this first version, the data for the front-end solution does not come from the data warehouse (because it is still being built right now), but comes from a FastTrack database. “FastTrack database” means it is just a copy of source system data at a separate database. It is similar to the concept of Operational Data Store (ODS). The purpose of this inner circle is to deliver the BI front-end as fast as possible. When both the inner and outer circle of this phase is completed, the project has completed with all relevant ETL flows and has delivered the first version of the BI front-end to the end-users.
Implementation and Acceptance Test
The specification phase actually has completed most of the development activities at the “Development” environment, including both the front-end and back-ends according to the Data Warehouse architecture. The “Implementation and Acceptance Test” is more about to implement and deploy the code and designs to the TEST and UAT (User Acceptance Test) environments and make sure the all the elements run smoothly and passes all the relevant tests.
As illustrated below, this phase starts with test of the data load to front-end as well as an “end-to-end” test which starts the data load from source to the rest of the architecture until the front-end. The next tasks are about to deploy to the TEST and UAT environments and commit different types of test in order to get ready for the final deployment. There are two steps that involve the reconciliation of data and design. These tasks involve the business end-users since the decision about data is “correct” or “correct according to the requirement” should be made by the end-users.
Unlike the previous two phases, this phase does not have an inner circle. The project team actually is focused on assembling all pieces of code and designs. The first version of the BI front-end operates as it was delivered in the previous phase.
Deployment and Operation Handover
When the Implementation phase completes, the project has an almost “product-ready” sets of code and designs to be deployed and executed in the PRODUCTION environment. So the focus of this phase is, first, to make sure the code executes in the PRODUCTION environment smoothly, second, the make sure relevant documentations and handover activities are completed safely.
The picture below shows the tasks at this phase. The first sets of tasks are focused on deploying and testing in Pre-PRODUCTION environment. With the activity of deployment to PRODUCTION, the project is getting ready for documentations and organization implementations. After the PRODUCTION delivery is officially started, the project will close down the FastTrack solution (which is the first-version front-end solution), and start with the user training as well as the handover of documentations and test cases. Relevant management procedures are established for the data warehouse. And this phase ends with the review of project process.
The four phases described above are executed in iterations. In other words, they can be put into different releases according to the roadmap of a data warehouse project program. In a large enterprise, building a data warehouse is a multi-year project program. It is not possible to tackle the whole elephant down in one struggle. Executing the 4 phases does not mean to take all these activities one by one again and again. One should pick and utilize the relevant activities as needed for each release.