DDML: An Object-Oriented Approach to Dynamic Data Modeling
Luis A. Ramos
Object Design, Inc.
Burlington, MA
ramos@odi.com
1. Introduction
Among object-oriented programming languages, C++ [Stroustrup 1991] and Java [Gosling et al. 1996] are in wide use today. A search in one of the popular portals on the web gives a good indication of their popularity, as shown in the graph below. {footnote: We used Yahoo's advanced search feature to find web sites that contain the programming language and the two keywords "program" and "object", for example, "Java program object". It is interesting to note that object-oriented COBOL returned more hits than CLOS.} Applications that are developed in C++ and Java mostly involve "static" (as opposed to dynamic) data modeling, where developers employ object-oriented features and practices including inheritance, encapsulation, and polymorphism to achieve highly extensible and reusable software. Consequently, one can realize the benefits of reduced time to market and decreased maintenance costs.
Dynamic data modeling has emerged in response to the growing need to adapt to rapidly changing business requirements. Wouldn't it be great if one could apply the same object-oriented techniques to dynamic models as to static models, so that dynamic applications can be developed in a more scalable and maintainable way?
This paper describes a toolkit that enables an object-oriented approach to modeling dynamic data. The product, the Dynamic Data Modeling Library (DDML), is commercially available in C++ and Java. The following key requirements drove its design: modeling flexibility, extensibility of dynamic types and behavior, reusability of dynamic types, dynamic schema integrity, performance, and interoperability with existing code. It is built on ObjectStore [Lamb et al. 1991], an object database from Object Design, Inc.
Since as early as 1996, Object Design Professional Services has been developing applications with dynamic models for customers. Some examples include Earthlink's personalized start page and a personalization application for NBC. In mid-1997, Object Design productized some of this work to develop a more general-purpose dynamic modeling toolkit. The initial release, R1.0, was launched in late 1997. Two minor releases, R1.1 and R1.2, followed in 1998. The next release, R6.0, will be available in September 1999 and will be packaged with ObjectStore 6.0, Object Design's flagship database product. DDML will be offered as part of the database's core technology that any ObjectStore customer can use. To date, DDML has provided key dynamic modeling technology to several deployed customer applications including Knight-Ridder Real Classifieds, Cable & Wireless, Nippon Telegraph & Telephone Corporation (NTT), Kerry Corporation, Sotheby's, Enron, LiquidMarket, and many others.
In this paper, we present an object-oriented approach to dynamic modeling. We begin by introducing the basic data modeling features of DDML in Section 2. This background is helpful in understanding the patterns that are presented later. Then, in Section 3, we present patterns that offer an object-oriented approach to dynamic modeling and show how DDML directly supports them. Whenever appropriate, we elaborate on patterns that were used to design relevant DDML features. Patterns are presented in a relatively informal format, but at a minimum each describes the motivation or problem, related design patterns, and implementation. Finally, in Section 4, we draw our conclusions. The paper also contains an appendix that concentrates on design patterns related to database performance and scalability, which DDML directly supports. They are included here for the interested reader.
2. DDML Background
DDML employs a metalevel architecture to realize dynamic schema. To create a dynamic class, one allocates a class object (this term is taken from [Riehle & Mätzel 1998]) of type DynamicClass. Attributes are defined by invoking the method addAttribute(). To create a dynamic instance, one allocates an instance of DynamicInstance and attaches it to a DynamicClass. Once it is attached, one can get and set values for the attributes that are defined in the dynamic class. The semantics of an attachment is that the object is a dynamic instance of the dynamic class. For the remainder of this section, we highlight features of DDML and their details.
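The workflow just described can be pictured with a minimal C++ sketch. It is an illustration under stated assumptions, not DDML's actual interface: only the names DynamicClass, DynamicInstance, and addAttribute() come from the text, while attach(), hasAttribute(), setIntValue(), and the use of integer values only are assumed here for brevity.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative stand-ins, not the real DDML classes.
class DynamicClass {
public:
    explicit DynamicClass(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void addAttribute(const std::string& attr) { attributes_.insert(attr); }
    bool hasAttribute(const std::string& attr) const {
        return attributes_.count(attr) != 0;
    }
private:
    std::string name_;
    std::set<std::string> attributes_;
};

class DynamicInstance {
public:
    // Attaching makes this object a dynamic instance of cls (assumed name).
    void attach(const DynamicClass& cls) { cls_ = &cls; }

    void setIntValue(const std::string& attr, int value) {
        requireAttribute(attr);
        values_[attr] = value;
    }
    int getIntValue(const std::string& attr) const {
        requireAttribute(attr);
        return values_.at(attr);  // throws std::out_of_range if never set
    }
private:
    // Values are only legal for attributes defined in the attached class.
    void requireAttribute(const std::string& attr) const {
        if (cls_ == nullptr || !cls_->hasAttribute(attr))
            throw std::invalid_argument("undefined attribute: " + attr);
    }
    const DynamicClass* cls_ = nullptr;
    std::map<std::string, int> values_;
};

// Usage: define a class and an attribute at run time, then use them.
int doorsOfNewCar() {
    DynamicClass car("Car");
    car.addAttribute("Doors");
    assert(car.hasAttribute("Doors"));
    DynamicInstance myCar;
    myCar.attach(car);
    myCar.setIntValue("Doors", 2);
    return myCar.getIntValue("Doors");
}
```

In this sketch, setting a value for an attribute that the attached class does not define is rejected, which is one simple way to keep instance data consistent with the dynamic schema.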
Dynamic Attributes. Attributes can be added and removed on the fly. This resembles the DECORATOR PATTERN. Attributes are always added to the dynamic class and not directly to the dynamic object. Even though attaching properties directly to objects would provide greater flexibility, this restriction exists in order to preserve the integrity of the dynamic schema. Furthermore, attaching properties directly to objects is not object-oriented.
Attribute types. Basic types such as integer, string, float, and boolean are supported. Most of these types can be constrained. Moreover, new attribute types can be created. In addition, attributes can be multivalued. With these latter two features, relationships can be defined dynamically, including one-to-one, one-to-many, and many-to-many relationships. Finally, attribute display names are directly supported.
Reflection. The class DynamicClass provides functionality to reflect its dynamic schema. Methods that retrieve all attributes or a specific attribute are available. In addition, methods to retrieve a given dynamic class or all of the dynamic classes in the database are provided.
Extending C++/Java classes with dynamic attributes. A dominant requirement that drove the design of DDML was the ability to augment your C++/Java classes with dynamic attributes. This is achieved by inheritance. The DynamicInstance class consists of a "dictionary," which holds dynamic data (also known as name-value pairs or property lists), and methods to access and manipulate them. Existing classes become dynamic-capable by inheriting from this class. Consequently, during the design and analysis stage, data modelers have the flexibility to define object properties as either static (data members) or dynamic attributes. Why not define all properties as dynamic? The reason is related to performance. A dynamic attribute will certainly incur more space and time overhead than a regular data member. Consequently, if a property is not expected to change (for example, by being removed or having its type changed), then we suggest modeling it as a data member.
Virtual Methods. All public methods in the DynamicClass and DynamicInstance classes are virtual to enable you to specialize their behavior. Any code (handlers) that uses these APIs is reused through polymorphism. For example, you can create a Category class which inherits from DynamicClass and then specialize inherited methods and add new methods. In another example, you can create a Product class which inherits from DynamicInstance and then specialize its attribute accessor methods. This is equivalent to binding PRE and POST hook functions.
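The Product example can be sketched as follows. This is a hedged illustration rather than DDML code: the DynamicInstance shown is a simplified stand-in, and setIntValue(), changeLog(), restock(), and the "Stock" attribute are hypothetical names. It shows a class that mixes a regular data member with dynamic attributes and specializes an accessor to add PRE and POST behavior.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for DynamicInstance: a dictionary plus virtual accessors.
class DynamicInstance {
public:
    virtual ~DynamicInstance() = default;
    virtual void setIntValue(const std::string& attr, int value) {
        values_[attr] = value;
    }
    virtual int getIntValue(const std::string& attr) const {
        return values_.at(attr);
    }
private:
    std::map<std::string, int> values_;
};

// An application class becomes dynamic-capable by inheritance.
class Product : public DynamicInstance {
public:
    explicit Product(std::string sku) : sku_(std::move(sku)) {}
    const std::string& sku() const { return sku_; }  // regular data member

    void setIntValue(const std::string& attr, int value) override {
        if (value < 0) value = 0;                    // PRE hook: clamp input
        DynamicInstance::setIntValue(attr, value);   // inherited behavior
        log_.push_back(attr);                        // POST hook: record change
    }
    const std::vector<std::string>& changeLog() const { return log_; }
private:
    std::string sku_;
    std::vector<std::string> log_;
};

// Handler code written once against the base interface; the specialized
// behavior is picked up through virtual dispatch.
void restock(DynamicInstance& item, int quantity) {
    assert(quantity > -1000000);  // sanity bound for the sketch only
    item.setIntValue("Stock", quantity);
}
```

The handler restock() never mentions Product, yet a Product passed to it gets the clamping and logging behavior, which is the reuse through polymorphism described above.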
Query Support. In DDML, dynamic attributes can be queried and indexed just like regular data members. Whenever a dynamic attribute is created, DDML automatically creates an accessor function which can be used in a query string to query the attribute. These functions have scope within the query string only. Internally, DDML creates an invocation object (this term is taken from [Riehle & Mätzel 1998]) for each dynamic attribute, which is used during the execution of queries and the maintenance of indexes. The same pattern is used by ObjectStore's query engine to support regular C++/Java accessor methods in query strings. Consequently, dynamic attributes are queried and indexed in exactly the same way as regular data members and data member accessors, so queries on regular data members and dynamic attributes can be mixed in the same query string.
For example, suppose that a database has car objects with a data member Price and a dynamic attribute Doors for the number of doors. To find cars that cost less than $20,000 and have 2 doors, the query string would be formulated as "(Price < 20000) && (getDoors() == 2)". When this string is parsed, the query compiler encounters the query function "getDoors()", which it looks up in the dictionary. If it names a valid attribute, the corresponding invocation object supplies the behavior needed to access the attribute value. Now, if the class had a C++ method called getPrice(), then the method could be registered as a query function, so an alternative query string could be formulated as "(getPrice() < 20000) && (getDoors() == 2)".
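The role of invocation objects in such a query can be sketched in C++ as below. This is not ObjectStore's or DDML's query engine: there is no query string parser here, and QueryFunctionTable, registerDynamicAttribute(), registerMember(), and countMatches() are hypothetical names. The sketch only shows how a name such as "getDoors()" can be bound to an object that knows how to fetch the value, so that member accessors and dynamic attributes are evaluated through one mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Car {
    int price;                                 // regular data member
    std::map<std::string, int> dynamicValues;  // dynamic attributes
};

// An "invocation object": a callable that fetches one value from a Car.
using Accessor = std::function<int(const Car&)>;

class QueryFunctionTable {
public:
    // Done when a dynamic attribute is created: registers "get<Name>()".
    void registerDynamicAttribute(const std::string& attr) {
        table_["get" + attr + "()"] = [attr](const Car& c) {
            return c.dynamicValues.at(attr);  // throws if the value is unset
        };
    }
    // Registers an ordinary member accessor under a query function name.
    void registerMember(const std::string& fn, Accessor accessor) {
        table_[fn] = std::move(accessor);
    }
    // What a query compiler would do on meeting a function name.
    const Accessor& lookup(const std::string& fn) const {
        return table_.at(fn);
    }
private:
    std::map<std::string, Accessor> table_;
};

// Evaluates the equivalent of
// "(getPrice() < maxPrice) && (getDoors() == doors)" over a collection.
std::size_t countMatches(const std::vector<Car>& cars,
                         const QueryFunctionTable& fns,
                         int maxPrice, int doors) {
    assert(maxPrice >= 0);
    const Accessor& price = fns.lookup("getPrice()");
    const Accessor& doorCount = fns.lookup("getDoors()");
    std::size_t matches = 0;
    for (const Car& c : cars)
        if (price(c) < maxPrice && doorCount(c) == doors) ++matches;
    return matches;
}
```

Because both functions are looked up in the same table, the evaluation loop does not need to know which one reads a data member and which one reads a dynamic attribute.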
3. Design Patterns Used to Develop DDML
3.1 Dynamic Inheritance Pattern
Reuse and encapsulate dynamic attributes across multiple dynamic classes.
Consider the following small subset of categories for an on-line classified advertising application as shown in the figure below. Both Residential and Transportation categories have common attributes Contact Name and Phone Number. When the dynamic model is changed so that an attribute is added to each category (or a subtree), then metadata must be created and attached to each one. The problem with this approach is that multiple copies of the metadata will exist, so keeping them in sync could become difficult as soon as an attribute is modified. One could employ a FLYWEIGHT PATTERN by creating a single instance of the metadata that all categories can share. But how is this managed? Where should the metadata be attached? Place it in a "base class" and inherit it.
In DDML, the DynamicClass maintains a dictionary of attributes. Furthermore, it has two collections: one for its parents (dynamic base classes) and another for its children (dynamic derived classes). When an attribute is added to a dynamic class, it is semantically inherited by all its descendants. With these structures, one can grow and prune inheritance trees dynamically.
A dynamic class can directly inherit from one or more dynamic classes (multiple inheritance). Except for cyclic inheritance, there are no restrictions as to the inheritance tree structure. Base classes can have a common dynamic base class since DDML can handle an analogous virtual base class problem. The class DynamicClass provides methods that reflect the dynamic inheritance hierarchy, including the retrieval of all its attributes (both directly defined and inherited), base classes, ancestor classes, derived classes, and descendant classes.
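The structures described above (an attribute dictionary plus parent and child collections) can be sketched in C++ as follows. This is a minimal illustration under assumptions, not DDML's implementation: addBase(), inheritsFrom(), allAttributes(), and derivedCount() are hypothetical names, and attributes are reduced to their names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

class DynamicClass {
public:
    explicit DynamicClass(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void addAttribute(const std::string& attr) { own_.insert(attr); }

    // Adds a dynamic base class; refuses links that would create a cycle.
    bool addBase(DynamicClass& base) {
        if (&base == this || base.inheritsFrom(*this)) return false;
        parents_.push_back(&base);
        base.children_.push_back(this);
        return true;
    }
    // True if other is a direct or indirect base of this class.
    bool inheritsFrom(const DynamicClass& other) const {
        for (const DynamicClass* p : parents_)
            if (p == &other || p->inheritsFrom(other)) return true;
        return false;
    }
    // Directly defined plus inherited attributes. Using a set removes the
    // duplicates that arise when two bases share a common ancestor.
    std::set<std::string> allAttributes() const {
        std::set<std::string> result(own_);
        for (const DynamicClass* p : parents_) {
            const std::set<std::string> inherited = p->allAttributes();
            result.insert(inherited.begin(), inherited.end());
        }
        assert(result.size() >= own_.size());
        return result;
    }
    std::size_t derivedCount() const { return children_.size(); }
private:
    std::string name_;
    std::set<std::string> own_;
    std::vector<DynamicClass*> parents_;   // dynamic base classes
    std::vector<DynamicClass*> children_;  // dynamic derived classes
};
```

Since inherited attributes are computed by walking up to the bases rather than copied down, an attribute added to a base class later is immediately visible in every descendant, and only one copy of the metadata exists.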
So far, we have seen three ways in which DDML lets you extend your data model. First, you can use standard C++ or Java inheritance to add regular data members to your classes. DDML does not inhibit you from creating or reusing your own "static" data models. Second, you can dynamically add attributes on the fly. One can take an existing class and derive it from the DynamicInstance class, which encapsulates a dynamic set of name-value pairs. And third, you can reuse attributes that are defined in other categories through DDML's dynamic inheritance.
How about schema contractions (the reverse of extension)? How is this handled in DDML? If a dynamic instance is attached to a dynamic class, its attribute values are dependent on the class' schema. If an attribute is dropped, then all instances that have a value set for it must remove that attribute value. DDML automatically maintains this for all instances that belong to the dynamic class or to any of its descendants. Furthermore, any index that uses the removed attribute as a key is automatically dropped. On the other hand, when an attribute is added to a dynamic class, adjustments to the instances are unnecessary. Values are created and added lazily to the instance's dictionary at the time that the attribute value is set. When a dynamic class A disinherits another dynamic class B, the attributes of A are no longer inherited by B. In this case, all instances that belong to the dynamic class B and its descendants undergo attribute cleansing. There is indirect support for other schema changes such as changes to the attribute type, generalization (moving a common attribute up to a common base class), specialization (adding unique attributes to a derived class), and instance reclassification (when a new derived class is defined and instances of the base class may need to be "reinstantiated").
The Gang of Four [Gamma et al. 1995] remark that the formulation of a pattern is determined by what can and cannot be implemented easily based on what is directly supported by the programming language. Furthermore, they state that if one were to utilize a procedural language such as C, then object-oriented features such as "Inheritance" could be considered a pattern. By the same argument, since dynamic inheritance is not built into C++ or Java, we propose dynamic inheritance as a design pattern for C++/Java augmented by the DDML toolkit.
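The asymmetry between adding and dropping an attribute can be sketched as follows. This is an assumption-laden illustration, not DDML code: removeAttribute(), addDerived(), attach(), and the public values dictionary are hypothetical, and index maintenance is not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct DynamicInstance {
    // Holds only the attributes whose values were actually set (lazy).
    std::map<std::string, int> values;
};

class DynamicClass {
public:
    // Extension: no instance needs to be touched.
    void addAttribute(const std::string& attr) { attrs_.insert(attr); }
    void addDerived(DynamicClass& child) { children_.push_back(&child); }
    void attach(DynamicInstance& inst) { instances_.push_back(&inst); }

    // Contraction: drop the attribute, then cleanse this class's instances
    // and those of all descendants.
    void removeAttribute(const std::string& attr) {
        if (attrs_.erase(attr) != 0) cleanse(attr);
        assert(attrs_.count(attr) == 0);
    }
private:
    void cleanse(const std::string& attr) {
        for (DynamicInstance* inst : instances_) inst->values.erase(attr);
        for (DynamicClass* child : children_) child->cleanse(attr);
    }
    std::set<std::string> attrs_;
    std::vector<DynamicClass*> children_;
    std::vector<DynamicInstance*> instances_;
};
```

Erasing a value that was never set is a no-op, so the cleansing pass is safe to run over instances that never used the dropped attribute.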
3.2 Smart Attribute Pattern
Define a hierarchy of concrete classes, which implement an abstract class, to encapsulate data and behavior that is used to augment dynamic attributes.
A toolkit that provides dynamic modeling capability comes with built-in support for data types (e.g., integer and string) and behavior. If a developer requires a type or behavior that is not natively supported, then the developer may have to jump through all sorts of hoops, devise workarounds, or roll their own toolkit.
In DDML, an attribute's type and behavior can be extended (at compile time) using a STRATEGY PATTERN. The developer can attach a user-defined annotation object to the attribute, which encapsulates the additional information and behavior. Examples of additional data could be the display string (there could be several of these for different languages, such as Japanese, German, and French), a flag to indicate that the attribute is queryable, and display status (e.g., hidden, display in a summary, display all details). An abstract class can be created for these annotation objects so that the same handler code can be reused for all annotations. For example, a display method can be implemented for each attribute so that a common handler can manipulate all attributes through the same interface. Furthermore, annotations can be used to realize the Environmental Acquisition abstraction mechanism [Gil & Lorenz 1996] in order to display the attribute in various languages, depending on the web user's profile (i.e., geographic location).
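A minimal C++ sketch of such annotation objects is given below. The class and function names (AttributeAnnotation, LocalizedLabel, Attribute, render) are hypothetical and do not come from DDML; the sketch only illustrates an abstract annotation interface, one concrete annotation that carries display strings per language, and a single handler that works for every attribute.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Abstract annotation: the one interface that handler code depends on.
class AttributeAnnotation {
public:
    virtual ~AttributeAnnotation() = default;
    virtual std::string displayName(const std::string& language) const = 0;
    virtual bool queryable() const { return true; }
};

// Concrete annotation carrying one display string per language.
class LocalizedLabel : public AttributeAnnotation {
public:
    explicit LocalizedLabel(std::string fallback)
        : fallback_(std::move(fallback)) {}
    void addLabel(const std::string& language, const std::string& text) {
        labels_[language] = text;
    }
    std::string displayName(const std::string& language) const override {
        auto it = labels_.find(language);
        return it != labels_.end() ? it->second : fallback_;
    }
private:
    std::string fallback_;
    std::map<std::string, std::string> labels_;
};

// Attribute metadata with an optional user-defined annotation attached.
struct Attribute {
    std::string name;
    std::shared_ptr<AttributeAnnotation> annotation;
};

// One handler serves all attributes, whatever their annotation class.
std::string render(const Attribute& attr, const std::string& language) {
    assert(!attr.name.empty());
    return attr.annotation ? attr.annotation->displayName(language)
                           : attr.name;
}
```

New annotation classes can be added later without changing render(), because the handler depends only on the abstract interface.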
The SMART ATTRIBUTE PATTERN is similar to the SMART VARIABLES PATTERN [Foote & Yoder 1998]. The difference is that the "smartness" is applied to attributes rather than to variables. Furthermore, data is augmented in addition to the behavior. When this pattern is applied in conjunction with dynamic inheritance, interesting features become possible. For instance, a single instance of a default value can be attached to an attribute in order to reduce the database footprint. This is a direct application of the FLYWEIGHT PATTERN. If the attribute is defined for a dynamic class, then all of its descendant classes inherit this default value. The default value can be changed at one of its descendants, and the change applies only to that descendant's subtree. In DDML, default values are directly supported this way. The same technique can be used for other data and methods.
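The inherited default value can be sketched in C++ as follows. The sketch is illustrative only: setDefault() and findDefault() are hypothetical names, values are restricted to integers, and single inheritance is used for brevity even though the text allows multiple dynamic base classes.

```cpp
#include <cassert>
#include <map>
#include <string>

class DynamicClass {
public:
    explicit DynamicClass(const DynamicClass* base = nullptr) : base_(base) {}

    // One shared default per class, rather than one copy per instance.
    void setDefault(const std::string& attr, int value) {
        defaults_[attr] = value;
    }
    // Nearest definition wins: this class first, then up the base chain.
    bool findDefault(const std::string& attr, int& out) const {
        auto it = defaults_.find(attr);
        if (it != defaults_.end()) {
            out = it->second;
            return true;
        }
        return base_ != nullptr && base_->findDefault(attr, out);
    }
private:
    const DynamicClass* base_;
    std::map<std::string, int> defaults_;
};

class DynamicInstance {
public:
    explicit DynamicInstance(const DynamicClass& cls) : cls_(&cls) {}
    void setIntValue(const std::string& attr, int value) {
        values_[attr] = value;
    }
    // Own value if set, else the inherited default, else the fallback.
    int getIntValue(const std::string& attr, int fallback) const {
        assert(cls_ != nullptr);
        auto it = values_.find(attr);
        if (it != values_.end()) return it->second;
        int result = fallback;
        cls_->findDefault(attr, result);
        return result;
    }
private:
    const DynamicClass* cls_;
    std::map<std::string, int> values_;
};
```

An instance that never sets the attribute stores nothing for it, so the footprint saving grows with the number of instances that rely on the default.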
3.3 Dynamic Method Pattern
Encapsulate functions with an object and associate it with a dynamic class.
So far, we have discussed patterns for extending and contracting dynamic data. But how about behavior? Once an application is compiled, byte code generated, and an executable linked, how does one extend behavior at run-time without shutting down the application? With Java, one can dynamically load class files into, and unload them from, a running program. Once loaded, byte code can be executed. This is an approach used by eXcelon, an XML data server from Object Design, Inc. In eXcelon, Java server extensions are loaded and registered, and their behavior then becomes part of the server's suite of services. Similarly for C++, shared libraries can be loaded and unloaded dynamically and their functions executed. However, C++ object code is not as portable as Java byte code.
With DDML, we plan to employ the same approach but enhanced to enforce object-oriented rigor. Functions that are loaded and registered will be associated with a dynamic class and subject to the same inheritance semantics as dynamic attributes. In this way, we can achieve the same benefits from inheritance and encapsulation. The pattern described in this subsection is a proposed design that has not yet been implemented.
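Since this pattern is only a proposal, the following C++ sketch is necessarily speculative. It assumes that registered functions are held as callable objects keyed by name and that lookup follows the base class chain in the same way as attributes; registerMethod(), findMethod(), and invoke() are hypothetical names. The functions are supplied here as lambdas for simplicity; in the approach described above they would instead come from a dynamically loaded shared library or class file.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

struct DynamicInstance {
    std::map<std::string, int> values;
};

// A dynamic method encapsulated as an object.
using DynamicMethod = std::function<int(const DynamicInstance&)>;

class DynamicClass {
public:
    explicit DynamicClass(const DynamicClass* base = nullptr) : base_(base) {}

    // Associates a loaded function with this dynamic class.
    void registerMethod(const std::string& name, DynamicMethod method) {
        assert(method != nullptr);
        methods_[name] = std::move(method);
    }
    // This class first, then its bases, so derived classes can override.
    const DynamicMethod& findMethod(const std::string& name) const {
        auto it = methods_.find(name);
        if (it != methods_.end()) return it->second;
        if (base_ == nullptr)
            throw std::out_of_range("no dynamic method: " + name);
        return base_->findMethod(name);
    }
    int invoke(const std::string& name, const DynamicInstance& self) const {
        return findMethod(name)(self);
    }
private:
    const DynamicClass* base_;
    std::map<std::string, DynamicMethod> methods_;
};
```

A method registered on a base class is available to every descendant, and a descendant can replace it for its own subtree without affecting the base.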
Using C++/Java to implement dynamic behavior is a double-edged sword. On one hand, it compels domain experts, who are most knowledgeable about their domain's business rules, to write programs (OK, maybe it's not as bad in Java). On the other hand, it could be an advantage because we avoid inventing or learning yet another scripting language.
3.4 Dynamic Polymorphism Pattern
Organize dynamic methods under an abstract class.
Once dynamic methods are realized using the DYNAMIC METHOD PATTERN, these methods can be organized under a common interface to enhance the dynamic code's reusability. The pattern described in this subsection is a proposed design that has not yet been implemented in DDML. The elements involved in this pattern are sketched as follows:
3.5 Dynamic Instantiation Pattern
Detach a dynamic instance from its dynamic class and reattach it to another. Maintain the dynamic schema integrity of the dynamic instance.
In E-Commerce applications, it is sometimes necessary to reclassify an object from one category to another (product reclassification). We refer to this as dynamic reinstantiation. When this occurs, the object's dynamic data must conform to the new category's dynamic schema.
With DDML, an instance can be detached from one dynamic class and then reattached to another. When it is detached, its attribute values are preserved in the object. When the object is attached to a dynamic class (which could be an ancestor, a descendant, a sibling, or the same class), its attribute values are validated against the target class' dynamic schema.
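Detaching and reattaching can be sketched in C++ as below. The text does not say what validation does with values that the target schema does not define; this sketch assumes they are dropped and reported to the caller, which is only one possible policy. The names detach(), attach(), hasValue(), and setIntValue() are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

class DynamicClass {
public:
    void addAttribute(const std::string& attr) { attrs_.insert(attr); }
    bool hasAttribute(const std::string& attr) const {
        return attrs_.count(attr) != 0;
    }
private:
    std::set<std::string> attrs_;
};

class DynamicInstance {
public:
    // Detaching keeps the attribute values in the object.
    void detach() { cls_ = nullptr; }

    // Attaching validates the preserved values against the target schema.
    // Assumed policy: nonconforming values are dropped and their names
    // are returned.
    std::vector<std::string> attach(const DynamicClass& target) {
        std::vector<std::string> dropped;
        for (auto it = values_.begin(); it != values_.end();) {
            if (target.hasAttribute(it->first)) {
                ++it;
            } else {
                dropped.push_back(it->first);
                it = values_.erase(it);
            }
        }
        cls_ = &target;
        assert(cls_ != nullptr);
        return dropped;
    }
    bool setIntValue(const std::string& attr, int value) {
        if (cls_ == nullptr || !cls_->hasAttribute(attr)) return false;
        values_[attr] = value;
        return true;
    }
    bool hasValue(const std::string& attr) const {
        return values_.count(attr) != 0;
    }
private:
    const DynamicClass* cls_ = nullptr;
    std::map<std::string, int> values_;
};
```

Values for attributes that both categories define, such as a price, survive the reclassification, so the application does not have to copy them by hand.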
3.6 Multiple Instantiation Pattern
Attach a dynamic instance to more than one dynamic class.
In E-Commerce applications, products can realistically belong to more than one category. For example, the HP OfficeJet R40 can function as a printer, scanner, and copier, so it could be classified under three separate categories. One way to model this is to apply a FACADE PATTERN by creating a single category which inherits from all three categories. The product can then be attached to that category. Unfortunately, this approach can potentially result in an explosion in the number of dynamic classes due to the large number of possible combinations between categories.
A simpler approach is to allow a dynamic instance to belong to more than one category. This has no direct analog in C++ or Java since an object cannot be an instance of more than one class at the same time, except with multiple inheritance.
DDML provides direct support for the MULTIPLE INSTANTIATION PATTERN. It provides functionality to retrieve a dynamic instance's dynamic classes through an iterator. This applies the ITERATOR PATTERN to encapsulate access to the dynamic classes.
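A minimal C++ sketch of an instance attached to several dynamic classes follows. It is not DDML's interface: classesBegin(), classesEnd(), ClassIterator, and the rule that an attribute is visible if any attached class defines it are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

class DynamicClass {
public:
    explicit DynamicClass(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void addAttribute(const std::string& attr) { attrs_.insert(attr); }
    bool hasAttribute(const std::string& attr) const {
        return attrs_.count(attr) != 0;
    }
private:
    std::string name_;
    std::set<std::string> attrs_;
};

class DynamicInstance {
public:
    using ClassIterator = std::vector<const DynamicClass*>::const_iterator;

    // An instance may be attached to any number of dynamic classes.
    void attach(const DynamicClass& cls) {
        if (std::find(classes_.begin(), classes_.end(), &cls) ==
            classes_.end())
            classes_.push_back(&cls);
        assert(!classes_.empty());
    }
    // Iterator access hides how the classes are stored.
    ClassIterator classesBegin() const { return classes_.begin(); }
    ClassIterator classesEnd() const { return classes_.end(); }

    // Assumed rule: an attribute is usable if any attached class has it.
    bool hasAttribute(const std::string& attr) const {
        for (ClassIterator it = classesBegin(); it != classesEnd(); ++it)
            if ((*it)->hasAttribute(attr)) return true;
        return false;
    }
private:
    std::vector<const DynamicClass*> classes_;
};
```

With three categories, the product is attached three times and no combined "printer-scanner-copier" class has to be created.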
3.7 Common API to access Dynamic Attributes and Data Members
Provide a common API to access dynamic attributes and data members. This allows a common style for accessing them and encapsulates the actual implementation of the data.
With the introduction of dynamic attributes, developers are faced with the burden of two sets of APIs to access dynamic attributes and data members. For instance, if Age were a data member, one might use an accessor method getAge(). On the other hand, if Age were a dynamic attribute, then one might utilize the DynamicInstance method getIntValue("Age"). One way to unify these APIs is to encapsulate the getIntValue("Age") in the implementation of accessor method getAge(). This way, both dynamic attributes and data members can be accessed through the same API. This pattern appears to be a specialized variation of the FACADE PATTERN.
int MyClass::getAge() { return getIntValue("Age"); }
Alternatively, getAge() can be encapsulated in getIntValue("Age") by using database functionality to retrieve a data member value given the data member name. For example, ObjectStore has a Meta-Object Protocol (MOP) which supports this.
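Both directions can be sketched together in C++. The sketch below does not use ObjectStore's Meta-Object Protocol; a hand-written name check stands in for the metaobject lookup, and Person, Height, and setIntValue() are hypothetical names added for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for DynamicInstance.
class DynamicInstance {
public:
    virtual ~DynamicInstance() = default;
    void setIntValue(const std::string& name, int value) {
        values_[name] = value;
    }
    virtual int getIntValue(const std::string& name) const {
        return values_.at(name);
    }
private:
    std::map<std::string, int> values_;
};

class Person : public DynamicInstance {
public:
    explicit Person(int height) : height_(height) {}

    // Accessor style for everything: callers cannot tell that Height is
    // a data member while Age is a dynamic attribute.
    int getHeight() const { return height_; }
    int getAge() const { return getIntValue("Age"); }

    // Name-based style for everything: data members are reachable by
    // name too. A real implementation might consult database metadata
    // here instead of this hand-written check.
    int getIntValue(const std::string& name) const override {
        assert(!name.empty());
        if (name == "Height") return height_;
        return DynamicInstance::getIntValue(name);
    }
private:
    int height_;  // regular data member
};
```

Either style lets a property migrate between data member and dynamic attribute later without changing the calling code.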
4. Summary and Conclusions
This paper presented several design patterns which are key elements to an object-oriented approach to dynamic modeling. We described a dynamic modeling toolkit DDML which was designed to directly support this approach. We believe that an object-oriented approach can improve the productivity of dynamic modelers and help develop dynamic applications in a more scalable and maintainable way.
5. Acknowledgements
Special thanks go to Matt BenDaniel for discussions in the area of dynamic behavior; Francois Forster, Tetsuya Tohdo, Alan Santos, Deborah Vlock, and especially Kacper Nowicki for reviewing this paper and offering wonderful feedback and ideas; Satish Maripuri for his support, which was crucial in making this paper a reality; George Feinberg for discussions on ObjectStore eXcelon server extensions and how they can be adapted to implement dynamic polymorphism; and John Blais for developing many key features of this product.
6. References
[Foote & Yoder 1998] Brian Foote and Joseph Yoder. Metadata and Active Object Model. OOPSLA '98 Workshop on Metadata and Dynamic Object-Model Pattern Mining. Vancouver, BC.
[Gil & Lorenz 1996] Joseph Gil, David H. Lorenz. Environmental Acquisition - A New Inheritance-Like Abstraction Mechanism. OOPSLA 1996, pages 214-231.
[Gosling et al. 1996] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, Reading, MA, 1996.
[Lamb et al. 1991] Charles Lamb, Gordon Landis, Jack Orenstein, Dan Weinreb. The ObjectStore database system. Communications of the ACM Vol. 34, No. 10 (Oct. 1991), pages 50-63.
[Riehle & Mätzel 1998] Dirk Riehle and Kai-Uwe Mätzel. Using Reflection to Support System Evolution. OOPSLA '98 Workshop on Modeling Dynamic/Emergent Distributed Object Systems. Vancouver, BC.
[Stroustrup 1991] Bjarne Stroustrup. The C++ Programming Language. Second Edition. Addison-Wesley, Reading, MA, 1991.
APPENDIX A: Database Related Patterns
This section presents design patterns that specifically relate to the use of the underlying database. DDML was designed to enable these patterns.
A.1 Data Partitioning
Divide the objects into mutually exclusive sets.
When objects are organized under a common structure such as a table, collection, or index, as shown in the following figure, these central structures can be subject to lock contention, so the application cannot scale easily as the workload increases. However, if the data is partitioned into mutually exclusive sets, then contention can be reduced. In application server architectures, a server would be dedicated to a subset of partitions. In this way, servers never contend for the same data. As workload increases, additional servers can be launched (possibly on additional hardware) and the partitions are distributed further. Furthermore, the duplication of cached data at the client is reduced, so overall memory usage is conserved.
In DDML, partitions can be created by allocating a series of attributeless dynamic classes which all derive from a common dynamic class that holds the dynamic schema. All child classes share a common inherited schema from the parent class. Recall that a dynamic class holds a reference to all of its dynamic instances. The instances are attached and distributed across these child dynamic classes using an appropriate distribution algorithm, for example by geography (e.g., each state has a partition), by time (e.g., each month/year has a partition), by type (e.g., each type of purchase order has a partition), or by name (e.g., all names starting with A reside in one partition, B in another partition, etc.). Any changes to the schema are directed to the parent class (or any of its ancestors).
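The partitioning scheme can be sketched structurally in C++. The sketch shows only the class layout (a parent holding the schema, attributeless children holding the instances, distribution by first letter of a name); it does not model database locking or servers. NamePartitions, partitionFor(), and instanceCount() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

struct DynamicInstance {
    std::string name;
};

class DynamicClass {
public:
    explicit DynamicClass(const DynamicClass* base = nullptr) : base_(base) {}
    void addAttribute(const std::string& attr) { attrs_.insert(attr); }
    // Own attributes plus those inherited from the base class.
    bool hasAttribute(const std::string& attr) const {
        return attrs_.count(attr) != 0 ||
               (base_ != nullptr && base_->hasAttribute(attr));
    }
    // Each dynamic class keeps references to its own instances.
    void attach(DynamicInstance& inst) { instances_.push_back(&inst); }
    std::size_t instanceCount() const { return instances_.size(); }
private:
    const DynamicClass* base_;
    std::set<std::string> attrs_;
    std::vector<DynamicInstance*> instances_;
};

// Distributes instances over attributeless child classes, one per
// first letter of the instance name.
class NamePartitions {
public:
    explicit NamePartitions(const DynamicClass& schema) : schema_(schema) {}

    DynamicClass& partitionFor(const std::string& name) {
        const char key = name.empty() ? '?' : name[0];
        std::unique_ptr<DynamicClass>& slot = partitions_[key];
        if (!slot) slot.reset(new DynamicClass(&schema_));
        assert(slot != nullptr);
        return *slot;
    }
    void attach(DynamicInstance& inst) {
        partitionFor(inst.name).attach(inst);
    }
private:
    const DynamicClass& schema_;
    std::map<char, std::unique_ptr<DynamicClass>> partitions_;
};
```

Because the partitions define no attributes of their own, a schema change made once on the parent class is seen by every partition.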
A.2 Attribute Clustering
Cluster attributes that tend to be accessed together.
Typically, when an E-Commerce web site is searched, the full details of the matching products are not displayed immediately. Instead, a table of matching objects is presented that shows a subset of information (summary attributes) such as the product name, price, SKU, and inventory status. This is certainly a common use case pattern. If the user is interested in a particular item, one click on the item displays its full details.
If the summary attributes are not clustered, the application cannot realize its full performance potential for two reasons. First, the matching products might be scattered all over the database. And second, the attributes that are not needed could be retrieved together with those that are, incurring more disk and network bandwidth than is necessary. By clustering the desired attributes, a smaller working set could be achieved.
DDML provides a feature to cluster at the attribute level. By default, an attribute value is clustered with the dynamic instance. Through this feature, one can specify a common area in the database where the values of an attribute will be stored.