I recently attended the Gartner BI Summit in Los Angeles. At the conference, Stephen Brobst, CTO of Teradata, led a great presentation on Enterprise Data Warehousing at Target. During the presentation, Mr. Brobst talked about employing a phased approach to implementing data warehouses. He stressed the importance of completing a logical data model for the data warehouse prior to implementation. He warned that without such a model, 20% of the effort expended in each phase would be rework of the existing implementation.

I was struck by this metric, but not because I found it surprising or excessive. In fact, it sounds like an entirely reasonable percentage to me, and I readily accept it. I was more struck by the suggestion that this level of rework should be considered a bad thing. Does this mean that the project wasted 20% of its effort in each phase, and that if the team had completed the analysis and definition of the logical data model, this number would be zero?

Not necessarily. If a team plans to adopt any sort of phased implementation model, they should expect some level of rework. And this is a good thing! By choosing to deliver a data warehouse solution incrementally, the development team has, accidentally or intentionally, created an opportunity to generate feedback. Once software is put into their users’ hands, the team will have a chance to determine whether the requirements they expressed and the designs they produced are accurate. Invariably, they discover that while some features meet the users’ expectations, others do not. Some reports that were requested might turn out to be incomplete, requiring additional information to meet the business needs. Other reports or analytics might prove to be redundant. A new, automated alert or dashboard might eliminate the need for a certain report.
And regardless of the efforts made to verify requirements prior to or during implementation, these discoveries are often made only once the solution is in the users’ hands.

Fortunately, because the development team chose a phased approach, they have an opportunity to act on these discoveries. The resulting changes can (and should) be introduced into subsequent phases of the project. Therefore, the team should assume some level of rework associated with each phase of the project. And while this may limit their ability to deliver new functionality in subsequent phases, it also enables them to deliver a higher quality product to their user community.

So let’s assume that some portion of the 20% was dedicated to work that resulted from user feedback, and not purely from incomplete analysis and design. The second point I would like to consider is the cost to complete the logical data model. If the team failed to fully complete the model initially, I must assume that some analysis and design work was left on the table. Whether the team consciously decided to defer some work or merely missed some detail, the effort they expended on the initial analysis and design was less than it would have been had they completed the model. Therefore, that rework effort most likely represents some level of deferred, rather than wasted, effort. And in many cases, deferring work to the last responsible moment will do more to reduce waste than to generate it.

For example, let’s say that your development team is currently working on the topic of Products within an enterprise data warehouse for a manufacturing company. The team’s objective is to represent the cost of manufacturing products. The cost calculation may be highly complex, involving the current market price of raw materials at the time of manufacture.
Instead of performing the required analysis and modeling up front, the team could decide to initially employ a standard cost for raw materials, not only making the model simpler but also deferring efforts to acquire and transform market pricing. By making this decision, they recognize they are deferring some analysis, design and implementation work that will need to be addressed later. At the same time, this decision might enable the team to more rapidly deliver a solution that is satisfactory for most use cases. The cost model can then be addressed at a later point, when it is a higher priority for the product owner. Therefore, the rework incurred at that point is not really additional effort; it is largely the work that was deferred. And by properly planning, and employing Agile testing and refactoring practices, the team can minimize the additional effort involved in implementing this feature later in the project.

Many still equate the word ‘rework’ with a failure to properly plan and design up front. But rework is not synonymous with waste. Martin Fowler once told a team of developers, “We call it software because it is soft. It is pliable and can be changed without tremendous cost.” Some level of up-front design is required to avoid excessive rework and waste, but if we work too hard to avoid rework, we can often generate more waste.
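To make the standard-cost example above a little more concrete, here is a minimal sketch of how the deferral might look in code. Everything in it is an assumption for illustration: the names `PriceSource`, `StandardCost`, `MarketPrice` and `product_cost`, the materials, and the prices are all hypothetical and do not come from any real project. The idea is simply that the first phase ships with a fixed standard cost, and a later phase swaps in market pricing without touching the cost calculation itself.

```python
from dataclasses import dataclass
from typing import Protocol


class PriceSource(Protocol):
    """Anything that can price one unit of a raw material (hypothetical interface)."""

    def unit_price(self, material: str) -> float: ...


@dataclass
class StandardCost:
    """Early phase: a fixed standard cost per material (made-up figures)."""

    prices: dict[str, float]

    def unit_price(self, material: str) -> float:
        return self.prices[material]


@dataclass
class MarketPrice:
    """Later phase: use a market quote when one exists, else the fallback source."""

    quotes: dict[str, float]
    fallback: PriceSource

    def unit_price(self, material: str) -> float:
        if material in self.quotes:
            return self.quotes[material]
        return self.fallback.unit_price(material)


def product_cost(bill_of_materials: dict[str, float], source: PriceSource) -> float:
    """Sum quantity * unit price over a product's raw materials."""
    return sum(qty * source.unit_price(m) for m, qty in bill_of_materials.items())


# Hypothetical bill of materials: quantity of each raw material per product.
bom = {"steel": 2.0, "resin": 0.5}

# Early phase: standard costs only.
standard = StandardCost({"steel": 10.0, "resin": 4.0})
print(product_cost(bom, standard))  # 22.0

# Later phase: the deferred work arrives as a new price source.
market = MarketPrice({"steel": 12.0}, fallback=standard)
print(product_cost(bom, market))  # 26.0
```

In this sketch the later "rework" is confined to adding a new price source, which is the kind of change that testing and refactoring practices are meant to keep cheap.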
I recently heard something at a BI conference that made me shudder. A speaker advocating for Big Data solutions used the phrase “Build it and they will come.” This is a slightly modified quote from the movie “Field of Dreams” (1989). In the movie, the ghost of “Shoeless Joe Jackson”, a famous but disgraced baseball player, convinces Kevin Costner to convert his cornfield into a baseball field as a way to connect with his dead father. The message of the movie, “Ignore the naysayers. Pursue your dreams and they will come true”, was enthusiastically embraced by the emerging data warehousing industry in the 1990s. Vendors repeated this mantra over and over again to new practitioners, seeking to convince IT organizations that, if you just build an enterprise data warehouse and fill it with all of your organization’s data, your business users will flock to it. Establishing this foundation would enable users not only to meet current requirements, but also to address future requirements. The enterprise data warehouse would become an informational platform and a strategic asset to the business.

While a number of very successful projects emerged, this approach largely led to failure. One of the primary flaws in this thinking is the assumption that all of the organization’s data is of equal value to the business, and that investing more time to consolidate more data would naturally provide more value. This has proven repeatedly to be a very false and very dangerous assumption. Not all data is created equal. Some informational assets can have a profound impact on the business, others a negligible one. Furthermore, the cost of acquiring data does not necessarily bear any direct relationship to its value. By ignoring these two principles, IT organizations have invested millions of dollars acquiring and processing low-value data in the spirit of building complete, detailed representations of their business.
I have personally observed numerous cases where business stakeholders, data analysts, modelers and ETL developers have wasted countless hours seeking to achieve a highly complex and detailed representation of their business model, where a simpler (and much cheaper) representation would have delivered the same value.

In the time since, practitioners have learned the value of a phased approach that focuses on delivering the highest value information as quickly as possible. This principle is ultimately realized in the application of Agile practices to support continuous delivery of analytics solutions. However, hearing this comment made me think that, with the excitement around Big Data technologies and opportunities, this lesson might need to be relearned.
Once upon a time, I was an active data warehousing practitioner, building data warehouses and data marts to enable clients to gain information and knowledge from the data they accumulated. Then, in 2003, I encountered ThoughtWorks and Agile Software Development and passionately immersed myself in that world. However, I never lost my passion for data, and spent the next nine years trying to convince Roy Singham, founder of ThoughtWorks, that data warehousing and business intelligence were areas in which ThoughtWorks should become adept. Finally, Roy has relented: we’ve kicked off an Analytics practice (with Ken Collier in the lead), and I can return to the world of data. I am documenting my attempt to catch up with the industry, and our efforts to bring the application of Agile practices into the mainstream of the Business Intelligence and Analytics communities.