Friday, March 14, 2008

Canonical Best Practices

“Canonical” is a typical IT industry buzzword: an overloaded term with multiple meanings and no clear agreement on its definition. There appear to be at least three uses for canonical modeling:

-- Canonical Data Modeling
-- Canonical Interchange Modeling
-- Canonical Physical Formats

One of the challenges at most large corporations is to achieve efficient information exchanges in a heterogeneous environment. The typical large enterprise has hundreds of applications which serve as systems of record for information and were developed independently based on incompatible data models, yet they must share information efficiently and accurately in order to effectively support the business and create positive customer experiences.

The key issue is one of scale and complexity and is not evident in small to medium-sized organizations. The problem arises when there are a large number of application interactions in a constantly changing application portfolio. If these interactions are not designed and managed effectively, they can result in production outages, poor performance, high maintenance costs, and a lack of business flexibility.

Canonical Data Modeling is a technique for developing and maintaining a logical model of the data required to support the needs of the business for a subject area. Some models may be relevant to an industry supply chain, the enterprise as a whole, or a specific line of business or organizational unit. The intent of this technique is to direct development and maintenance efforts such that the internal data structures of application systems conform to the canonical model as closely as possible. In essence, this technique seeks to eliminate heterogeneity by aligning the internal data representation of applications with a common shared model. In an ideal scenario, there would be no need to perform any transformations at all when moving data from one component to another, but for practical reasons this is virtually impossible to achieve at an enterprise scale. Newly built components are easier to align with the common models, but legacy applications may also be aligned with the common model over time as enhancements and maintenance activities are carried out.
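
As a concrete illustration, the sketch below shows one way a newly built component might align its internal structures with a shared canonical model. The “Customer” subject area, class names, and fields are hypothetical and shown only to make the idea tangible.

```java
// A minimal sketch of a shared canonical entity, assuming a hypothetical
// "Customer" subject area; the class and field names are illustrative only.
public final class CanonicalCustomer {
    private final String customerId;   // enterprise-wide identifier
    private final String legalName;    // name as agreed across the business
    private final String countryCode;  // e.g. an ISO 3166-1 alpha-2 code

    public CanonicalCustomer(String customerId, String legalName, String countryCode) {
        this.customerId = customerId;
        this.legalName = legalName;
        this.countryCode = countryCode;
    }

    public String getCustomerId()  { return customerId; }
    public String getLegalName()   { return legalName; }
    public String getCountryCode() { return countryCode; }
}

// A newly built component aligns its internal representation with the
// canonical model directly, so no transformation is needed when it shares
// customer data with other conforming components.
class BillingAccount {
    private final CanonicalCustomer owner;  // reuses the canonical entity as-is
    private final String accountNumber;

    BillingAccount(CanonicalCustomer owner, String accountNumber) {
        this.owner = owner;
        this.accountNumber = accountNumber;
    }
}
```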

Canonical Interchange Modeling is a technique for analyzing and designing information exchanges between services that have incompatible underlying data models. It is particularly useful for modeling interactions between heterogeneous applications in a many-to-many scenario. The intent of this technique is to make data mapping and transformation transparent at build time: data from each component is mapped to a common Canonical Data Model, which facilitates rapid mapping of data between individual components since they all share a common reference model.
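
Continuing the hypothetical example above, the sketch below shows one way such interchange mapping might be organized: each application supplies a single mapping to or from the canonical model rather than mapping directly to every other application. The CRM and ERP structures and field names are assumptions made for illustration, and the CanonicalCustomer type is reused from the previous sketch.

```java
// A minimal sketch of interchange mapping through a canonical model.
interface ToCanonical<S>   { CanonicalCustomer toCanonical(S source); }
interface FromCanonical<T> { T fromCanonical(CanonicalCustomer canonical); }

// Each application's native representation of the same business concept.
class CrmCustomer        { String id; String fullName; String country; }
class ErpBusinessPartner { String partnerNo; String name1; String countryKey; }

// One mapping per application, written against the canonical model only.
class CrmMapper implements ToCanonical<CrmCustomer> {
    public CanonicalCustomer toCanonical(CrmCustomer c) {
        return new CanonicalCustomer(c.id, c.fullName, c.country);
    }
}

class ErpMapper implements FromCanonical<ErpBusinessPartner> {
    public ErpBusinessPartner fromCanonical(CanonicalCustomer c) {
        ErpBusinessPartner p = new ErpBusinessPartner();
        p.partnerNo  = c.getCustomerId();
        p.name1      = c.getLegalName();
        p.countryKey = c.getCountryCode();
        return p;
    }
}

// Any CRM-to-ERP exchange then composes the two independent mappings:
//   ErpBusinessPartner target =
//       new ErpMapper().fromCanonical(new CrmMapper().toCanonical(source));
```

Adding another application to the exchange then requires only that application's own mapping to or from the canonical model, not a new mapping to every existing participant.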

Canonical Physical Format prescribes a specific runtime data format and structure for exchanging information. The prescribed generic format may be derived from the Canonical Data Model, or it may simply be a standard message format that all applications are required to use for certain types of information. The formats are typically independent of both the source and the target system, and require that all applications in a given interaction transform data from their internal format to the generic format. The intent of this technique is to eliminate heterogeneity for data in motion by using standard data structures at run time for all information exchanges.
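
The sketch below illustrates the idea using the hypothetical canonical model from above: a generic structure that every application emits at run time. The element names and namespace are assumptions; a real solution would use a schema-aware XML library and escape the content.

```java
// A minimal sketch of a canonical physical format: a generic structure,
// derived from the hypothetical canonical model, emitted by every sender.
class CanonicalCustomerMessage {
    static String toXml(CanonicalCustomer c) {
        return "<Customer xmlns=\"urn:example:canonical:customer:v1\">"
             + "<CustomerId>"  + c.getCustomerId()  + "</CustomerId>"
             + "<LegalName>"   + c.getLegalName()   + "</LegalName>"
             + "<CountryCode>" + c.getCountryCode() + "</CountryCode>"
             + "</Customer>";
    }
}
// Each sender transforms its internal format into this generic structure
// before publishing it; each receiver parses the same structure, so data in
// motion has a single shape regardless of source or target.
```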

Canonical techniques are valuable when used appropriately in the right circumstances. Some recommendations are:

-- Use canonical data models in business domains where there is a strong emphasis on “building” rather than “buying” application systems.
-- Use canonical interchange modeling at build-time to analyze and define information exchanges in a heterogeneous application environment.
-- Use canonical physical formats at run-time in many-to-many or publish/subscribe integration patterns, particularly in the context of a business event architecture (a minimal sketch follows this list).
-- Plan for appropriate tools to support analysts and developers. Excel is not sufficient for any but the simplest canonical models.
-- Develop a plan to maintain and evolve the canonical models as discrete enterprise components. The ongoing costs to maintain the canonical models can be significant and should be budgeted accordingly.
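
To illustrate the publish/subscribe recommendation above, here is a minimal, in-memory sketch of a canonical event being delivered to several subscribers. The event bus and message content are simplified stand-ins for real messaging middleware, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A toy event bus: every subscriber receives the same canonical message
// shape, regardless of which application published it.
public class CanonicalEventBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    public void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Publishers are required to hand over the canonical format only.
    public void publish(String canonicalXml) {
        for (Consumer<String> s : subscribers) {
            s.accept(canonicalXml);
        }
    }

    public static void main(String[] args) {
        CanonicalEventBus bus = new CanonicalEventBus();
        bus.subscribe(xml -> System.out.println("Billing received: " + xml));
        bus.subscribe(xml -> System.out.println("Analytics received: " + xml));
        // The publisher transforms its internal format to the canonical shape
        // once; neither subscriber needs to know the publisher's internals.
        bus.publish("<Customer xmlns=\"urn:example:canonical:customer:v1\">...</Customer>");
    }
}
```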

How can we minimize dependencies when integrating applications that use different data formats? Conventional thinking suggests that a Canonical Data Model, one that is independent from any specific application, is a best practice. By requiring each application to produce and consume messages in this common format, components in a SOA are more loosely coupled.

Here is what Gregor Hohpe has to say in his book Enterprise Integration Patterns:

“The Canonical Data Model provides an additional level of indirection between applications' individual data formats. If a new application is added to the integration solution only transformation between the Canonical Data Model has to be created, independent from the number of applications that already participate.”

The promised benefits of canonical best practices generally include increased independence between components, so that one can change without affecting others, and simplified interactions, because all applications use common definitions. As a result, solutions are expected to be lower cost to develop, easier to maintain, higher quality in operation, and quicker to adapt to changing business needs.
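
A back-of-the-envelope comparison makes the indirection argument concrete; the figure of 20 applications below is an arbitrary example, not from the original post.

```java
// Rough comparison of mapping counts with and without a canonical model.
public class MappingCount {
    public static void main(String[] args) {
        int n = 20;                       // applications in the integration solution
        int pointToPoint = n * (n - 1);   // one mapping per ordered pair of applications
        int viaCanonical = 2 * n;         // one mapping to and one from the canonical model per app
        System.out.printf("point-to-point: %d mappings, via canonical model: %d mappings%n",
                          pointToPoint, viaCanonical);
    }
}
```

Each additional application adds just two mappings in the canonical case, versus a new mapping to and from every existing participant in the point-to-point case.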

While canonical best practices can indeed provide benefits, there is also a cost that must be addressed. The canonical model and interchanges are themselves incremental components that must be managed and maintained. Canonical data models and interchanges a) require effort to define, b) introduce a middleware layer (at build time or run time, depending on which techniques are used), and c) incur ongoing maintenance costs. These costs can exceed the benefits that the canonical best practices provide unless care is taken to use the techniques in the right circumstances.

For a large scale system-of-systems in a distributed computing environment, the most desirable scenario is to achieve loose coupling and high cohesion resulting in a solution that is highly reliable, efficient, easy to maintain, and quick to adapt to changing business needs. Canonical techniques can play a significant role in achieving this ideal state.

The three canonical best practices are generally not used in isolation; they are typically used in conjunction with other methods as part of an overall solutions methodology. As a result, the “sweet spot” can expand, shrink, or move depending on how they are combined with other methods. This posting does not address the full spectrum of dependencies with other methods and their resulting implications, but it does attempt to identify some common anti-patterns to be avoided.

Anti-Patterns:

Peanut Butter: One anti-pattern that is pertinent to all three canonical best practices is the “Peanut Butter” pattern, which involves spreading the methods across all situations regardless of fit. To cite a common adage, “to a hammer everything looks like a nail.” It is certainly possible to drive a screw with a hammer, but the result is not pretty and not ideal. It is also possible to make a hole with a hammer, but it might have some very jagged edges.

When, and exactly how, to apply the canonical best practices should be a conscious, well-considered decision based on a keen understanding of the resulting implications.

Canonical Data Model Best Practice

The sweet spot for a Canonical Data Model (or logical data model) is in directing the development of data structures for custom application development. This can apply to new applications built from scratch or to modifications of purchased applications. The discipline dates back to the early 1990s, when the practice of enterprise data modeling came into the mainstream. It continues to be a widely practiced and effective method for specifying data definitions, structures, and relationships for large, complex enterprise solutions. Vendors like SAP and Oracle use this method for their ERP solutions.

This technique, when applied effectively, tends to result in a high level of cohesion between system components, since attributes are grouped into entities of related information. It also tends to result in tightly coupled components (the components are very often highly interdependent), which are able to support very high performance and throughput requirements.

Anti-patterns:

Data model bottleneck: A Canonical Data Model is a centralization strategy that requires an adequate level of ongoing support to maintain and evolve it. If the central support team is not staffed adequately, it will become a bottleneck for changes which could severely impact agility.

1 comment:

Jack van Hoof said...

Hi Renee,

You might be interested in these posts on canonical modelling:

Canonical Data Model is the incarnation of Loose Coupling

And This one

-Jack