THE DYNAMIC DATA WAREHOUSE
Written by Stephen Lahanas   

For nearly twenty years, our industry has viewed data architecture as a fractured set of separate yet related components. At the heart of this collection of capabilities we often find what is referred to as an ‘Enterprise Data Warehouse’ (EDW). For the last decade though, the EDW has been evolving into something more – a more flexible yet comprehensive approach to the complete data layer. Making this transition involves not just technical improvements but also a philosophical re-evaluation of how data management, discovery and exploitation are viewed. This new philosophy and solution approach can be referred to as the Dynamic Data Warehouse.

Introduction

The average enterprise is a fairly complex place, hosting a variety of solutions at various stages of their overall lifecycle – each one perhaps representing the philosophical and technical approaches that were popular when it was instantiated. However, all of these disparate pieces are expected to work together, somehow. These solutions also go through cycles of centralization and de-centralization, and these competing philosophical approaches to information management have seemingly yet to be reconciled. This is sometimes characterized as a battle between more comprehensive solution governance and emerging capability realization. Disruptive technologies often appear to foil the central plan and solution, and the enterprise begins growing again in a heterogeneous fashion.

The Enterprise Data Warehouse emerged in the mid-1990s in response to the explosion of distributed systems that began proliferating across enterprise environments a few years prior. The promise of inexpensive, immediate capability was quite appealing at first, but people soon realized that there were many unanticipated costs involved. The question that plagued IT managers was “how do we control the flow of data between all of these emerging systems and how do we ensure quality in the midst of this chaos?” Out of this problem space, the premise of the Data Warehouse was born. The EDW represented a swing back towards more comprehensive IT control, through centralization of all core business data within one system. Promising order from chaos and “The Single Version of Truth,” the EDW philosophically and technically tackled the most pervasive problem in IT at the time. Unfortunately, it was then and still remains largely out of reach for most potential adopters. Why is that?

The Philosophy of EDW

The Data Warehouse concept is built atop the notion that all data related to the enterprise can be captured and centrally or holistically managed. This is a powerful idea, yet there is more than one way to achieve that goal. The traditional view of the EDW attacked the problem from a very DBMS-centric perspective. This is primarily why EDW projects become so expensive, difficult and ultimately hard to adopt. The typical EDW approach attempted to gather all of the data related to the enterprise and place it into one massive repository structure. Whether this approach was attempted in chunks or as a “Big Bang” assault made little difference in the long run, as the byproducts of the practice were the same. Those byproducts included:

  • A more bureaucratic management approach to the data layer in general.
  • An added degree of separation between the data owners and the data developers.
  • A certain level of inflexibility in regards to how data was updated, corrected or otherwise transformed.
  • An added degree of separation between database developers and data exploitation developers.
  • An added degree of separation between database developers and application developers.
  • An inability to quickly respond to major changes in the business.
  • Dependence upon a sub-set of industry experts and equipment that is more expensive than the industry norm.
  • A higher cost associated with scalability in general.

It is worth examining some of the core concepts associated with Data Warehousing a little more closely to understand why these outcomes tend to result from the traditional EDW approach.

DBMS Focus – At the time when EDWs became popular, other areas of data architecture were only just beginning to blossom. Today’s Business Intelligence platforms represent much more than mere reporting engines. Metadata management was only just beginning to be understood in the mid-1990s, and focus on Semantic technologies was virtually non-existent. The world according to DBMS in 1995 had a relational management system in the middle, with ETL feeding into it and reports coming out. This might be thought of as a three-layer, stove-piped, database-systems view of the data architecture.

The Enterprise Single Instance – While consolidating like capabilities into marts or stores or some other ‘functional single instance’ approach has achieved quite a bit of success over the past two decades, attempting to manage all data in one structure has proven much more difficult. This is why the notion of Massively Parallel Processing (MPP) was needed to make it viable back in the 1990s. MPP was expensive though and perhaps failed to recognize the power of networked processors on inexpensive hardware (i.e. the Google scalability model). The other key consideration here was the added steps that were needed in order to make such a system perform within reasonable parameters. So, the single instance enterprise faced and still faces major hurdles in terms of costs, manageability and performance. 

EDW Fallacies

If we were to directly challenge the core EDW assumptions and illustrate the fallacies associated with the philosophy, our list would resemble the following:

  • The Business will remain static over a relatively long period of time.
  • The Enterprise will remain static over a relatively long period of time.
  • That source data and data exploitation should not be managed synergistically, in other words that Decision Support or Business Intelligence solutions built on top of EDW source data should be viewed as separate, albeit related efforts.
  • That the data layer and the application layer can or should be viewed or designed separately.
  • That computer Hardware would not catch up to the processing load – i.e. that the data layer would always require Massively Parallel Processing (MPP) in order to manage very large quantities of data. Furthermore, this assumption also implied the data would remain in a single instance data source.
  • That network architecture, data architecture, application / SOA architecture and enterprise architecture are separate.
  • That the Internet (Cloud) would not represent a viable mechanism for connecting to distributed data sources.
  • That unstructured data was not as valid as structured data (mainly because no mechanism existed to incorporate it into the traditional database management approaches).
  • That most major transformations need to occur before data is placed into the primary storage / management entity (i.e. DBMS, warehouse).
  • That there is a single version of the truth, period. This is perhaps the biggest fallacy behind all data warehouse, MDM and governance solutions. Data can be managed, but it is dynamic and always will be. Viewing data as incontrovertible, orthodox truth immediately eliminates much of the value that data otherwise provides. Situations change, and every stakeholder views the whole from a unique perspective. Yet, there can still be order in a relativistic environment (much as there is in the real world).

Enter, The Dynamic Data Warehouse

The Dynamic Data Warehouse (DDW) represents more than the evolution of the EDW approach. The DDW is a paradigm shift for data architecture, but this shift is made possible by an evolution in our thinking. Many in our industry now realize that the data layer is more than the sum of its pieces; it is a continuum of capabilities managed within a single unified lifecycle. Data resources can be logically integrated and holistically managed, and the primary activities related to data are intricately inter-related. Data management, discovery and exploitation are part of the same process and the same lifecycle, and they support the same business goals.

The Philosophy behind the DDW begins with the following assumptions:

  1. That distributed capability can be managed centrally.
  2. That all aspects of data architecture belong within a single, unified Lifecycle framework.
  3. That data stored without regard to anticipated value through exploitation is useless.
  4. That data architecture must be user-centric, responsive and Agile.
  5. That data transformations can occur anywhere within the architecture, as long as they are understood and managed through policy.
  6. That performance and usability always outweigh perceived solution manageability (if the customer doesn’t use it, nothing else matters, period).
  7. That change is inevitable and attempting to create a static, perfectly defined enterprise will end in failure.
  8. That the Right solution is the one that works for the customer, not the one that comes closest to adhering to prescriptive industry definitions.
  9. Recognition that there are now two clouds to consider, the one inside the enterprise and the one outside.
  10. That all parts of data architecture represent a pool of data resources or services. This pool comes closer than the original concept of EDW to representing the true single data management framework. The DDW is the logical counterpart to EDW without the restrictions and with added interoperability across elements that had previously been managed separately. The DDW is a single solution – but is also the complete solution needed to ensure enterprise success.
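Several of these assumptions (central management of distributed capability, transformations governed by policy, and a shared pool of data resources) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not a description of any product or of the author's implementation: the names `DataSource`, `TransformPolicy` and `Catalog`, the stage labels, and the sample "finance" source are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass
class DataSource:
    """A distributed data resource (hypothetical) that stays where it lives."""
    name: str
    location: str                       # e.g. "internal" or "external" cloud
    fetch: Callable[[], Iterable[Row]]  # how rows are read from the source


@dataclass
class TransformPolicy:
    """A named transformation, recorded so it is understood and managed."""
    name: str
    stage: str                          # where it runs: "ingest", "store", "exploit"
    apply: Callable[[Row], Row]


@dataclass
class Catalog:
    """One central registry over many sources: the pool of data resources."""
    sources: Dict[str, DataSource] = field(default_factory=dict)
    policies: Dict[str, List[TransformPolicy]] = field(default_factory=dict)

    def register(self, source: DataSource,
                 policies: Iterable[TransformPolicy] = ()) -> None:
        self.sources[source.name] = source
        self.policies[source.name] = list(policies)

    def query(self, name: str) -> List[Row]:
        """Read from the source in place, applying its registered policies in order."""
        rows = []
        for row in self.sources[name].fetch():
            for policy in self.policies[name]:
                row = policy.apply(row)
            rows.append(row)
        return rows

    def lineage(self, name: str) -> List[Tuple[str, str]]:
        """Report which transformations apply to a source, and at which stage."""
        return [(p.stage, p.name) for p in self.policies[name]]


# Usage with made-up sample data: amounts arrive as text and are converted
# at exploitation time rather than before storage.
catalog = Catalog()
catalog.register(
    DataSource("finance", "internal", lambda: [{"amount": "12.50"}]),
    [TransformPolicy("amount_to_float", "exploit",
                     lambda r: {**r, "amount": float(r["amount"])})],
)
result = catalog.query("finance")
```

The point of the sketch is that the data is never copied into one repository; only the knowledge of where it lives and how it is transformed is centralized, which is one way to read assumptions 1, 5 and 10 together.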

Although no major COTS vendors offer all aspects of the DDW within a single stack of off-the-shelf products just yet, some companies are coming very close. The industry is moving in this direction because it is hard for people to see the big picture within the myriad of complex and partially related efforts they now have to support (ETL, MDM, DBMSs or warehouse, BI etc.). Without that ability, it is difficult to gain value from any one piece of the puzzle, no matter how well it may be carried out. Separating data architecture efforts due to an arbitrary focus on specific components prevents organizations from understanding the full implications of the eventual synergistic combination of these pieces.

Systems Integrators have understood this for some time and have managed projects within a unified framework successfully in many instances. For these providers a data warehouse project has always involved the full spectrum of related capabilities, from preliminary transformations through deployment of sophisticated BI or Performance Management platforms. The exact nature of each installation or architecture is largely dependent on the client and their needs, which is exactly as it should be. In these cases, the advent of a new technology doesn’t inject changes into the management paradigm without first being examined within the context of the unified plan. The new technology must be assessed and blended into the overall approach. This all happens not in the span of a single project, but across the lifespan of the enterprise. The DDW is not a build-and-leave-it type of approach. It is Dynamic largely because of the implicit understanding that the solution will change on a continual basis. New requirements will be added, data perspectives will change, and disruptive technologies will show up. The DDW puts all of this into a single solution approach that can be managed and tracked holistically yet still remain flexible.

The DDW is a philosophical approach, a comprehensive architectural framework for the data layer, a combination of related products and customer domains – and it is a solutions integration paradigm that allows the enterprise once and for all to manage its data within a unified fusion of governance processes.

I first realized that this approach was not only possible, but preferable to the traditional EDW approach while working in support of various USAF projects several years ago. I had the opportunity to evaluate a number of different approaches in action and gauge the results of those practices. In each case where the systems integrators followed the DDW approach, the USAF achieved unparalleled return on investment. The best example of this occurred, and is still occurring, with the Commanders Resource Integration System (CRIS), a solution managed by the Teksouth Corporation. Teksouth applied Master Data Management, revolutionary caching technology and a federated data approach to produce an agile, cost-effective and extremely popular system. It still remains the most effective data warehouse solution in the DoD both in terms of performance and value. More importantly, their philosophical and architectural approach makes it scalable enough to tackle Financial Management Analytics for the entire DoD. Other vendors and integrators have been trying to comply with federal mandates delivered in 2001 to make the DoD’s financial systems fully accountable, yet for all of the money spent, no system has come close to what CRIS accomplished within two years of being launched.

Conclusion

Maybe the DDW is merely the common sense approach dictated by our years of experience in the industry and the technology available to us right now. We understand what works and what doesn’t through that experience, and we’ve seen all too clearly that the either / or dichotomy of centralize vs. distribute isn’t as black and white as it once appeared to be. We can have the best of both worlds as long as we remember that whatever we build must meet customer expectations before satisfying our own sense of aesthetics.

Stephen Lahanas
About the author:
 Mr. Lahanas is principal consultant and co-founder of Semantech Inc. Mr. Lahanas has served as a CIO and as a Chief Engineer in two USAF Decision Support Program Management Offices (PMOs), and has also served as a lead Enterprise Architect for projects at US Army NETCOM, USAF Cyber Command, the FAA and the Department of Homeland Security. He has been working with Semantic technology since 1999, when he served on the Cisco E-Learning Architecture team, and has been applying aspects of it to enterprise-scale projects ever since.
© 2010 Data Strategy Journal