THE DYNAMIC DATA WAREHOUSE
Written by Stephen Lahanas
For nearly twenty years, our industry has viewed data architecture as a fractured set of separate yet related components. At the heart of this collection of capabilities we often find what is referred to as an ‘Enterprise Data Warehouse’ (EDW). For the last decade, though, the EDW has been evolving into something more: a more flexible yet comprehensive approach to the complete data layer. Making this transition involves not just technical improvements but also a philosophical re-evaluation of how data management, discovery and exploitation are viewed. This new philosophy and solution approach can be referred to as the Dynamic Data Warehouse.
Introduction
The average enterprise is a fairly complex place, hosting a variety of
solutions at various stages of their overall lifecycle – each one
perhaps representing differing philosophical and technical approaches
popular when they were instantiated. However, all of these disparate
pieces are expected to work together, somehow. These solutions also go
through cycles of centralization and de-centralization and seemingly
these competing philosophical approaches to information management have
yet to be reconciled. This is sometimes characterized as a battle
between comprehensive solution governance and the realization of
emerging capabilities. Disruptive technologies often appear to foil the
central plan, and the enterprise begins growing again in a
heterogeneous fashion.
The Enterprise Data Warehouse emerged in the mid-1990s in response to
the explosion of distributed systems that began proliferating across
enterprise environments a few years prior. The promise of inexpensive,
immediate capability was quite appealing at first, but people soon
realized that there were many other unanticipated costs involved. The
question that plagued IT managers was “How do we control the flow of
data between all of these emerging systems, and how do we ensure quality
in the midst of this chaos?” Out of this problem space, the premise of
the Data Warehouse was born. The EDW represented a swing back towards
more comprehensive IT control, through centralization of all core
business data within one system. Promising order from chaos and “The
Single Version of Truth,” the EDW philosophically and technically
tackled the most pervasive problem in IT at the time. Unfortunately, it
was then, and still remains, largely out of reach for most potential
adopters. Why is that?
The Philosophy of EDW
The Data Warehouse concept is built atop the notion that all data
related to the enterprise can be captured and centrally or holistically
managed. This is a powerful idea, yet there is more than one way to
achieve that goal. The traditional view of the EDW attacked the problem
from a very DBMS-centric perspective. This is primarily why EDW
projects become so expensive, difficult and ultimately hard to adopt.
The typical EDW approach attempted to gather all of the data related to
the enterprise and place it into one massive repository structure.
Whether this approach was attempted in chunks or as a “Big Bang”
assault made little difference in the long run, as the byproducts of
the practice were the same. Those byproducts included:
- A more bureaucratic management approach to the data layer in general.
- An added degree of separation between the data owners and the data developers.
- A certain level of inflexibility with regard to how data was updated, corrected or otherwise transformed.
- An added degree of separation between database developers and data exploitation developers.
- An added degree of separation between database developers and application developers.
- An inability to quickly respond to major changes in the business.
- Dependence upon a sub-set of industry experts and equipment that is more expensive than the industry norm.
- A higher cost associated with scalability in general.
It is worth examining some of the core concepts associated with Data
Warehousing a little more closely to help understand why these outcomes
tend to result from the traditional EDW approach.
DBMS Focus – At the time when EDWs became popular, other areas of data
architecture were only just beginning to blossom. Today’s Business
Intelligence platforms represent much more than mere reporting engines.
Metadata management was only just beginning to be understood in the
mid-1990s and focus on Semantic technologies was virtually
non-existent. The world according to DBMS in 1995 had a relational
management system in the middle with ETL feeding into it and reports
coming out. This might be thought of as a three-layer, stove-piped
database systems view of the data architecture.
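The three-layer view described above can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `sales` table, the sample rows and the function names are hypothetical, and an in-memory SQLite database stands in for the relational management system in the middle.

```python
import sqlite3

# Hypothetical source records; names and values are illustrative only.
SOURCE_ROWS = [
    {"region": " east ", "amount": "100.50"},
    {"region": "WEST", "amount": "20"},
    {"region": "East", "amount": "9.50"},
]


def transform(rows):
    """Layer 1 (ETL): clean and type the data before it reaches the store."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Layer 2 (relational store): the single repository in the middle."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def report(conn):
    """Layer 3 (reports): a fixed query over whatever was loaded."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(SOURCE_ROWS))
totals = report(conn)
print(totals)
```

Each layer only talks to its neighbor: every transformation happens before loading, and the report can only see what the store holds. That rigidity is what makes the view "stove-piped."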
The Enterprise Single Instance – While consolidating like capabilities
into marts or stores or some other ‘functional single instance’
approach has achieved quite a bit of success over the past two decades,
attempting to manage all data in one structure has proven much more
difficult. This is why the notion of Massively Parallel Processing
(MPP) was needed to make it viable back in the 1990s. MPP was
expensive, though, and the approach perhaps failed to recognize the
power of networked processors on inexpensive hardware (i.e., the Google
scalability model). The other key consideration here was the set of
added steps needed to make such a system perform within reasonable
parameters. So, the single instance enterprise faced, and still faces,
major hurdles in terms of cost, manageability and performance.
EDW Fallacies
If we were to directly challenge the core EDW assumptions and
illustrate the fallacies associated with the philosophy, our list would
resemble the following:
- The Business will remain static over a relatively long period of time.
- The Enterprise will remain static over a relatively long period of time.
- That source data and data exploitation should not be managed
  synergistically; in other words, that Decision Support or Business
  Intelligence solutions built on top of EDW source data should be viewed
  as separate, albeit related, efforts.
- That the data layer and the application layer can or should be viewed or designed separately.
- That computer hardware would not catch up to the processing load, i.e.
  that the data layer would always require Massively Parallel Processing
  (MPP) in order to manage very large quantities of data. Furthermore,
  this assumption also implied the data would remain in a single instance
  data source.
- That network architecture, data architecture, application / SOA architecture and enterprise architecture are separate.
- That the Internet (Cloud) would not represent a viable mechanism for connecting to distributed data sources.
- That unstructured data was not as valid as structured data (mainly
  because no mechanism existed to incorporate it into the traditional
  database management approaches).
- That most major transformations need to occur before data is placed
  into the primary storage / management entity (i.e. DBMS, warehouse).
- That there is a single version of the truth, period. This is perhaps
  the biggest fallacy behind all data warehouse, MDM and governance
  solutions. Data can be managed, but it is dynamic and always will
  be. Viewing data as incontrovertible, orthodox truth immediately
  eliminates much of the value that data otherwise provides. Situations
  change, and every stakeholder views the whole from their own unique
  perspective. Yet, there can still be order in a relativistic
  environment (much as there is in the real world).
Enter, The Dynamic Data Warehouse
The Dynamic Data Warehouse (DDW) represents more than the evolution of
the EDW approach. The DDW is a paradigm shift for data architecture,
but this shift is made possible by an evolution in our thinking. Many
in our industry now realize that the data layer is more than the sum of its
pieces; it is a continuum of capabilities managed within a single
unified lifecycle. Data resources can be logically integrated and
holistically managed and the primary activities related to data are
intricately inter-related. Data management, discovery and exploitation
are part of the same process, same lifecycle and support the same
business goals.
The Philosophy behind the DDW begins with the following assumptions:
- That distributed capability can be managed centrally.
- That all aspects of data architecture belong within a single, unified Lifecycle framework.
- That data stored without regard to anticipated value through exploitation is useless.
- That data architecture must be user-centric, responsive and Agile.
- That data transformations can occur anywhere within the architecture,
  as long as they are understood and managed through policy.
- That performance and usability always outweigh perceived solution
  manageability (if the customer doesn’t use it, nothing else matters,
  period).
- That change is inevitable and attempting to create a static, perfectly defined enterprise will end in failure.
- That the Right solution is the one that works for the customer, not the
  one that comes closest to adhering to prescriptive industry
  definitions.
- That there are now two clouds to consider: the one inside the enterprise and the one outside.
- That all parts of data architecture represent a pool of data resources
  or services. This pool comes closer than the original concept of EDW to
  representing the true single data management framework. The DDW is the
  logical counterpart to EDW without the restrictions and with added
  interoperability across elements that had previously been managed
  separately. The DDW is a single solution, but it is also the complete
  solution needed to ensure enterprise success.
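Two of these assumptions, that distributed capability can be managed centrally and that transformations can occur anywhere as long as they are managed through policy, can be sketched in a few lines. This is a hypothetical illustration and not a description of any product or of CRIS: the `Catalog` and `DataResource` names, the stage labels and the sample rows are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class DataResource:
    """One distributed source exposed as a service (hypothetical)."""
    name: str
    fetch: Callable[[], List[Row]]


@dataclass
class Catalog:
    """Central management point for a pool of distributed resources."""
    resources: Dict[str, DataResource] = field(default_factory=dict)
    # Policy (assumed): the stages at which a transformation may run.
    allowed_stages: set = field(
        default_factory=lambda: {"source", "store", "query"}
    )
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def register(self, resource: DataResource) -> None:
        self.resources[resource.name] = resource

    def query(self, names: List[str]) -> List[Row]:
        """Logically integrate rows from several sources without copying
        them into one physical repository."""
        rows: List[Row] = []
        for name in names:
            rows.extend(self.resources[name].fetch())
        return rows

    def apply(self, stage: str, label: str,
              fn: Callable[[Row], Row], rows: List[Row]) -> List[Row]:
        """Run a transformation at any permitted stage and record it."""
        if stage not in self.allowed_stages:
            raise ValueError(f"stage {stage!r} is not permitted by policy")
        self.audit_log.append((stage, label))
        return [fn(r) for r in rows]


catalog = Catalog()
catalog.register(DataResource("finance", lambda: [{"unit": "A", "cost": 10}]))
catalog.register(DataResource("logistics", lambda: [{"unit": "B", "cost": 5}]))

rows = catalog.query(["finance", "logistics"])
rows = catalog.apply("query", "add_currency",
                     lambda r: {**r, "currency": "USD"}, rows)
```

The design choice worth noting is that the catalog owns the policy and the audit trail, not the data. The sources stay where they are, and a transformation applied at query time is as legitimate as one applied at load time, provided it is recorded.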
Although there aren’t any major COTS vendors who are offering all
aspects of the DDW within a single stack of off-the-shelf products just
yet, some companies are coming very close. The industry is moving in
this direction because it is hard to see the big picture within the
myriad complex and partially related efforts that organizations now
have to support (ETL, MDM, DBMSs or warehouses, BI, etc.). Without
that ability, it is difficult to gain value from any one piece of the
puzzle no matter how well it may be carried out. Separating data
architecture efforts due to arbitrary focus on specific components
prevents organizations from understanding the full implications of the
eventual synergistic combination of these pieces.
Systems Integrators have understood this for some time and managed
projects within a unified framework successfully in many instances. For
these providers a data warehouse project has always involved the full
spectrum of related capabilities, from preliminary transformations
through deployment of sophisticated BI or Performance Management
platforms. The exact nature of each installation or architecture is
largely dependent on the client and their needs, which is exactly as it
should be. In these cases, the advent of a new technology doesn’t
introduce changes into the management paradigm without first being
examined within the context of the unified plan. The new technology
must be assessed and blended into the overall approach. This all
happens not in the span of a single project, but across the lifespan of
the enterprise. The DDW is not a build-and-leave-it type of approach.
It is Dynamic largely because of the implicit understanding that the
solution will change on a continual basis. New requirements will be
added, data perspectives will change and disruptive technologies will
show up. The DDW puts all of this into a single solution approach that
can be managed and tracked holistically yet still remain flexible.
The DDW is a philosophical approach, a comprehensive architectural
framework for the data layer, a combination of related products and
customer domains – and it is a solutions integration paradigm that
allows the enterprise once and for all to manage its data within a
unified fusion of governance processes.
I first realized that this approach was not only possible, but
preferable to the traditional EDW approach while working in support of
various USAF projects several years ago. I had the opportunity to
evaluate a number of different approaches in action and gauge the
results of those practices. In each case, where the systems integrators
followed the DDW approach, the USAF achieved unparalleled return on
investment. The best example of this is the Commanders Resource
Integration System (CRIS), a solution managed by the Teksouth
Corporation. Teksouth applied Master Data Management, revolutionary
caching technology and a federated data approach to produce an agile,
cost-effective and extremely popular system. It remains the most
effective data warehouse solution in the DoD both in terms of
performance and value. More importantly, their philosophical and
architectural approach makes it scalable enough to tackle Financial
Management Analytics for the entire DoD. Other vendors and integrators
have been trying to comply with federal mandates delivered in 2001 to
make the DoD’s financial systems fully accountable, yet for all of the
money spent, no system has come close to what CRIS accomplished within
two years of being launched.
Conclusion
Maybe the DDW is merely the common sense approach dictated by our years
of experience in the industry and the technology available to us right
now. We understand what works and what doesn’t through that experience
and we’ve seen all too clearly that the either / or dichotomy of
centralize vs. distribute isn’t as black and white as it once appeared
to be. We can have the best of both worlds as long as we remember that
whatever we build must meet customer expectations before satisfying our
own sense of aesthetics.
Stephen Lahanas
About the author:
Mr. Lahanas is principal consultant and co-founder of Semantech Inc. Mr. Lahanas has served as a CIO, a Chief Engineer in two USAF Decision Support Program Management Offices (PMOs) and also served as a lead Enterprise Architect for projects at US Army NETCOM, USAF Cyber Command, the FAA and The Department of Homeland Security. He has been working with Semantic technology since 1999 when he served on the Cisco E-Learning Architecture team and has been applying aspects of it to enterprise-scale projects ever since.