Monday, December 10, 2007

Defining the Model

2.3 Defining the Model
Let's step back and take a closer look at what we're really describing with an EMF model. We've seen in Section 2.1 that our conceptual model could be defined in several different ways, that is, in Java, UML, or XML Schema. But what exactly are the common concepts we're talking about when describing a model? Let's look at our purchase order example again. Recall that our simple model included the following:

PurchaseOrder and Item, which in UML and Java map to class definitions, but in XML Schema map to complex type definitions.

shipTo, billTo, productName, quantity, and price, which map to attributes in UML, get()/set() method pairs (or Bean properties, if you want to look at it that way) in Java, and nested element declarations in XML Schema.

items, which is a UML association end or reference, a get() method in Java, and in XML Schema, a nested element declaration of another complex type.

As you can see, a model is described using concepts that are at a higher level than simple classes and methods. Attributes, for example, represent pairs of methods, and as you'll see when we look deeper into the EMF implementation, they also have the ability to notify observers (such as UI views, for example) and be saved to, and loaded from, persistent storage. References are more powerful yet, because they can be bidirectional, in which case referential integrity is maintained. References can also be persisted across multiple resources (documents), where demand-load and proxy resolution come into play.

To define a model using these kinds of "model parts" we need a common terminology for describing them. More importantly, to implement the EMF tools and generator, we also need a model for the information. We need a model for describing EMF models, that is, a metamodel.

2.3.1 The Ecore (Meta) Model
The model used to represent models in EMF is called Ecore. Ecore is itself an EMF model, and thus is its own metamodel. You could say that makes it also a meta-metamodel. People often get confused when talking about meta-metamodels (metamodels in general, for that matter), but the concept is actually quite simple. A metamodel is simply the model of a model, and if that model is itself a metamodel, then the metamodel is in fact a meta-metamodel.[4] Got it? If not, I wouldn't worry about it, since it's really just an academic issue anyway.

[4] This concept can recurse into meta-meta-metamodels, and so on, but we won't go there.

A simplified subset of the Ecore model is shown in Figure 2.3. This diagram shows only the parts of Ecore needed to describe our purchase order example, and we've taken the liberty of simplifying it a bit to avoid showing base classes. For example, in the real Ecore model the classes EClass, EAttribute, and EReference share a common base class, ENamedElement, which defines the name attribute that we've shown explicitly in each class here.

Figure 2.3. A simplified subset of the Ecore model.


As you can see, there are four Ecore classes needed to represent our model:

EClass is used to represent a modeled class. It has a name, zero or more attributes, and zero or more references.

EAttribute is used to represent a modeled attribute. Attributes have a name and a type.

EReference is used to represent one end of an association between classes. It has a name, a boolean flag to indicate if it represents containment, and a reference (target) type, which is another class.

EDataType is used to represent the type of an attribute. A data type can be a primitive type like int or float or an object type like java.util.Date.

Notice that the names of the classes correspond most closely to the UML terms. This is not surprising since UML stands for Unified Modeling Language. In fact, you might be wondering why UML isn't "the" EMF model. Why does EMF need its own model? Well, the answer is quite simply that Ecore is a small and simplified subset of full UML. Full UML supports much more ambitious modeling than the core support in EMF. UML, for example, allows you to model the behavior of an application, as well as its class structure. We'll talk more about the relationship of EMF to UML and other standards in Section 2.6.

We can now use instances of the classes defined in Ecore to describe the class structure of our application models. For example, we describe the purchase order class as an instance of EClass named "PurchaseOrder". It contains two attributes (instances of EAttribute that are accessed via eAttributes) named "shipTo" and "billTo", and one reference (an instance of EReference that is accessed via eReferences) named "items", for which eReferenceType (its target type) is equal to another EClass instance named "Item". These instances are shown in Figure 2.4.
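To make "instances of Ecore classes describe our model" concrete, here is a small, self-contained sketch. In EMF itself these objects would be created through the real Ecore API (for example, EcoreFactory.eINSTANCE.createEClass()); to keep the example runnable without EMF on the classpath, it uses hypothetical Mini* classes that mirror only the four simplified classes of Figure 2.3. It builds the PurchaseOrder and Item instances, using the attribute types that appear in the model's XMI serialization (String, int, float).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the four Ecore classes in Figure 2.3.
// These are NOT EMF's org.eclipse.emf.ecore types.
class MiniEDataType {
    final String name;             // e.g. "EString"
    final Class<?> instanceClass;  // the Java type it represents

    MiniEDataType(String name, Class<?> instanceClass) {
        this.name = name;
        this.instanceClass = instanceClass;
    }
}

class MiniEAttribute {
    final String name;
    final MiniEDataType eAttributeType;

    MiniEAttribute(String name, MiniEDataType eAttributeType) {
        this.name = name;
        this.eAttributeType = eAttributeType;
    }
}

class MiniEClass {
    final String name;
    final List<MiniEAttribute> eAttributes = new ArrayList<>();
    final List<MiniEReference> eReferences = new ArrayList<>();

    MiniEClass(String name) {
        this.name = name;
    }
}

class MiniEReference {
    final String name;
    final boolean containment;
    final MiniEClass eReferenceType;  // the target type of the reference

    MiniEReference(String name, boolean containment, MiniEClass eReferenceType) {
        this.name = name;
        this.containment = containment;
        this.eReferenceType = eReferenceType;
    }
}

class PurchaseOrderModel {
    static final MiniEDataType E_STRING = new MiniEDataType("EString", String.class);
    static final MiniEDataType E_INT = new MiniEDataType("EInt", int.class);
    static final MiniEDataType E_FLOAT = new MiniEDataType("EFloat", float.class);

    // Builds the PurchaseOrder and Item "core model" instances (compare Figure 2.4).
    static MiniEClass buildPurchaseOrder() {
        MiniEClass item = new MiniEClass("Item");
        item.eAttributes.add(new MiniEAttribute("productName", E_STRING));
        item.eAttributes.add(new MiniEAttribute("quantity", E_INT));
        item.eAttributes.add(new MiniEAttribute("price", E_FLOAT));

        MiniEClass purchaseOrder = new MiniEClass("PurchaseOrder");
        purchaseOrder.eAttributes.add(new MiniEAttribute("shipTo", E_STRING));
        purchaseOrder.eAttributes.add(new MiniEAttribute("billTo", E_STRING));
        // "items" is a containment reference whose target type is the Item EClass.
        purchaseOrder.eReferences.add(new MiniEReference("items", true, item));
        return purchaseOrder;
    }
}
```

Note that nothing here is specific to purchase orders: the same four metaclasses can describe any class structure, which is exactly the role Ecore plays.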

Figure 2.4. The purchase order Ecore instances.


When we instantiate the classes defined in Ecore to define the model for our own application, we are creating what we call a core model.

2.3.2 Creating and Editing the Model
Now that we have these Ecore objects to represent a model in memory, the EMF framework can read from them to, among other things, generate implementation code. You might be wondering, though, how do we create the model in the first place? The answer is that we need to build it from whatever input form you start with. If you start with Java interfaces, the EMF generator will introspect them and build the core model. If, instead, you start with an XML Schema, then the model will be built from that. If you start with UML, there are three possibilities:

1. Direct Ecore Editing: EMF's simple tree-based sample Ecore editor and Omondo's (free) EclipseUML graphical editor[5] are examples.

[5] You can download EclipseUML from the Omondo Web site at www.omondo.com.


2. Import from UML: The EMF Project Wizard provides this option for Rational Rose (.mdl files) only. Rose has this special status because it's the tool that we used to "bootstrap" the implementation of EMF itself.

3. Export from UML: This is essentially the same as option 2, except that the conversion is invoked from the UML tool instead of from the EMF Project Wizard.

As you might imagine, option 1 is the most desirable. With it, there is no import or export step in the development process. You simply edit the model and then generate. Also, unlike the other options, you don't need to worry about the core model being out of sync with the tool's own native model. The other two approaches require an explicit reimport or reexport step whenever the UML model changes.

The advantage of options 2 and 3 is that you can use the UML tool to do more than just your EMF modeling. You can use the full power of UML and whatever fancy features the particular tool has to offer. If it supports its own code generation, for example, you can use the tool to define your core model, and also to both define and generate other parts of your application. As long as the tool is able to export a serialized core model, it can also be used as an input source for the EMF framework/generator.[6]

[6] For option 3, the UML tool in question needs to support exporting to "Ecore XMI," not simply to XMI. This issue is discussed in Section 2.6.3.

2.3.3 XMI Serialization
By now you might be wondering what the serialized form of a core model is. Previously, we've observed that the "conceptual" model is represented in at least three physical places: Java code, XML Schema, or a UML diagram. Should there be just one form that we use as the primary, or standard, representation? If so, which one should it be?

Believe it or not, we actually have yet another (that is, a fourth) persistent form that we use as the canonical representation: XMI (XML Metadata Interchange). Why did we need another one? We weren't exactly short of ways to represent the model persistently.

We use XMI because it is a standard for serializing metadata, which is exactly what Ecore is. Also, except for the Java code, the other forms are all optional. If we were to use Java code to represent the model, we would need to introspect the whole set of Java files every time we wanted to reproduce the model. That is definitely not a very concise form of the model.

So, XMI does seem like a reasonable choice for the canonical form of Ecore. It's actually the closest to choosing door number 3 (UML) anyway. The problem is that every UML tool has its own persistent model format. An Ecore XMI file is a "Standard" XML serialization of the exact metadata that EMF uses.

Serialized as XMI, our purchase order model looks something like this:


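<ecore:EPackage xmi:version="2.0"
    xmlns:xmi="http://www.omg.org/XMI"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore"
    name="po" nsURI="http:///com/example/po.ecore"
    nsPrefix="com.example.po">
  <eClassifiers xsi:type="ecore:EClass" name="PurchaseOrder">
    <eStructuralFeatures xsi:type="ecore:EReference" name="items"
        eType="#//Item" upperBound="-1" containment="true"/>
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="shipTo"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EString"/>
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="billTo"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EString"/>
  </eClassifiers>
  <eClassifiers xsi:type="ecore:EClass" name="Item">
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="productName"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EString"/>
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="quantity"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EInt"/>
    <eStructuralFeatures xsi:type="ecore:EAttribute" name="price"
        eType="ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//EFloat"/>
  </eClassifiers>
</ecore:EPackage>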

Notice that the XML elements correspond directly to the Ecore instances back in Figure 2.4, which makes perfect sense seeing that this is a serialization of exactly those objects.
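Because an .ecore file is ordinary XML, this correspondence can be checked with nothing more than the JDK's DOM parser. The sketch below is only an illustration: EMF itself loads .ecore files through its own resource and XMI machinery, not with code like this, and the names EcoreXmiPeek and summarize are hypothetical. It walks each eClassifiers element (one per EClass) and its eStructuralFeatures children (one per EAttribute or EReference), using the xsi:type attribute to tell references from attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EcoreXmiPeek {
    private static final String ECORE_TYPES =
        "ecore:EDataType http://www.eclipse.org/emf/2002/Ecore#//";

    // The purchase order model from the listing above, as a single string.
    static final String PO_ECORE =
        "<ecore:EPackage xmi:version=\"2.0\""
        + " xmlns:xmi=\"http://www.omg.org/XMI\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xmlns:ecore=\"http://www.eclipse.org/emf/2002/Ecore\""
        + " name=\"po\" nsURI=\"http:///com/example/po.ecore\" nsPrefix=\"com.example.po\">"
        + "<eClassifiers xsi:type=\"ecore:EClass\" name=\"PurchaseOrder\">"
        + "<eStructuralFeatures xsi:type=\"ecore:EReference\" name=\"items\""
        + " eType=\"#//Item\" upperBound=\"-1\" containment=\"true\"/>"
        + "<eStructuralFeatures xsi:type=\"ecore:EAttribute\" name=\"shipTo\" eType=\"" + ECORE_TYPES + "EString\"/>"
        + "<eStructuralFeatures xsi:type=\"ecore:EAttribute\" name=\"billTo\" eType=\"" + ECORE_TYPES + "EString\"/>"
        + "</eClassifiers>"
        + "<eClassifiers xsi:type=\"ecore:EClass\" name=\"Item\">"
        + "<eStructuralFeatures xsi:type=\"ecore:EAttribute\" name=\"productName\" eType=\"" + ECORE_TYPES + "EString\"/>"
        + "<eStructuralFeatures xsi:type=\"ecore:EAttribute\" name=\"quantity\" eType=\"" + ECORE_TYPES + "EInt\"/>"
        + "<eStructuralFeatures xsi:type=\"ecore:EAttribute\" name=\"price\" eType=\"" + ECORE_TYPES + "EFloat\"/>"
        + "</eClassifiers>"
        + "</ecore:EPackage>";

    // Returns one line per EClass, e.g. "Item: productName quantity price";
    // references are marked with "(ref)".
    static List<String> summarize(String xmi) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xmi)));
            List<String> lines = new ArrayList<>();
            NodeList classifiers = doc.getElementsByTagName("eClassifiers");
            for (int i = 0; i < classifiers.getLength(); i++) {
                Element eClass = (Element) classifiers.item(i);
                StringBuilder line = new StringBuilder(eClass.getAttribute("name")).append(':');
                NodeList features = eClass.getElementsByTagName("eStructuralFeatures");
                for (int j = 0; j < features.getLength(); j++) {
                    Element feature = (Element) features.item(j);
                    boolean isReference = feature.getAttribute("xsi:type").endsWith("EReference");
                    line.append(' ').append(feature.getAttribute("name"));
                    if (isReference) {
                        line.append("(ref)");
                    }
                }
                lines.add(line.toString());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMI", e);
        }
    }
}
```

For the listing above, summarize(PO_ECORE) yields "PurchaseOrder: items(ref) shipTo billTo" and "Item: productName quantity price", one line per EClass instance in Figure 2.4.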

2.3.4 Java Annotations
Let's revisit the issue of defining a core model using Java interfaces. Previously we implied that when provided with ordinary Java interfaces, EMF "would" introspect them and deduce the model properties. That's not exactly the case. The truth is that given interfaces containing standard get() methods,[7] EMF "could" deduce the model attributes and references. EMF does not, however, blindly assume that every interface and method in it is part of the model. The reason for this is that the EMF generator is a code-merging generator. It generates code that not only is capable of being merged with user-written code, it's expected to be.

[7] EMF uses a subset of the JavaBean simple property accessor naming patterns [3].

Because of this, our PurchaseOrder interface isn't quite right for use as a model definition. First of all, the parts of the interface that correspond to model elements whose implementation should be generated need to be indicated. Unless explicitly marked with an @model annotation in the Javadoc comment, a method is not considered to be part of the model definition. For example, interface PurchaseOrder needs the following annotations:

/**
 * @model
 */
public interface PurchaseOrder
{
  /**
   * @model
   */
  String getShipTo();

  /**
   * @model
   */
  String getBillTo();

  /**
   * @model type="Item" containment="true"
   */
  List getItems();
}

Here, the @model tags identify PurchaseOrder as a modeled class, with two attributes, shipTo and billTo, and a single reference, items. Notice that both attributes, shipTo and billTo, have all their model information available through Java introspection; that is, they are simple attributes of type String. No additional model information appears after their @model tags, because only information that differs from the default needs to be specified.

There is some non-default model information needed for the items reference. Because the reference is multiplicity-many, indicated by the fact that it returns a List, we need to specify the target type of the reference. We also need to specify containment="true" to indicate that we want purchase orders to be a container for their items and serialize them as children.

Notice that the setShipTo() and setBillTo() methods are not required in the annotated interface. With the annotations present on the get() methods, we don't need to include them: once we've identified the attributes (which are settable by default), the set() methods will be generated and merged into the interface if they're not already there. Whether or not the set() method is generated in the interface, both get and set implementation methods will be generated.
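EMF reads @model tags from the Javadoc in the source code, which is not available at run time. To illustrate the deduction rules with plain reflection, the sketch below substitutes a hypothetical runtime annotation, @Model, for the Javadoc tag; the interface and class names are also made up for the example. It shows the three ideas from this section: only tagged getters become features, feature names follow the JavaBean getXxx() pattern, and extra information (type, containment) is supplied only when the default deduced from the return type isn't enough.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical runtime stand-in for the Javadoc @model tag (EMF uses Javadoc, not this).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Model {
    String type() default "";          // needed when the return type doesn't say it
    boolean containment() default false;
}

@Model
interface AnnotatedPurchaseOrder {
    @Model String getShipTo();
    @Model String getBillTo();
    @Model(type = "Item", containment = true) List<?> getItems();
    String getInternalNote();  // no @Model tag: not part of the model
}

class FeatureDeducer {
    // Maps each modeled feature name to a short description of its type.
    static Map<String, String> deduce(Class<?> iface) {
        Map<String, String> features = new TreeMap<>();
        if (!iface.isAnnotationPresent(Model.class)) {
            return features;  // the interface itself isn't a modeled class
        }
        for (Method m : iface.getMethods()) {
            Model tag = m.getAnnotation(Model.class);
            String name = m.getName();
            if (tag == null || !name.startsWith("get") || name.length() <= 3
                    || m.getParameterCount() != 0) {
                continue;  // untagged or not a simple getter
            }
            // getShipTo -> shipTo (JavaBean-style decapitalization)
            String feature = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            boolean many = List.class.isAssignableFrom(m.getReturnType());
            String type = tag.type().isEmpty() ? m.getReturnType().getSimpleName() : tag.type();
            features.put(feature, type + (many ? "[*]" : "")
                    + (tag.containment() ? " containment" : ""));
        }
        return features;
    }
}
```

Here deduce(AnnotatedPurchaseOrder.class) finds billTo and shipTo as String and items as a multi-valued "Item" containment feature, while getInternalNote() is ignored because it carries no tag.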

2.3.5 The Ecore "Big Picture"
Let's recap what we've covered so far.

Ecore, and its XMI serialization, is the center of the EMF world.

A core model can be created from any of at least three sources: a UML model, an XML Schema, or annotated Java interfaces.

Java implementation code and, optionally, other forms of the model can be generated from a core model.

We haven't talked about it yet, but there is one important advantage to using XML Schema to define a model: given the schema, instances of the model can be serialized to conform to it. Not surprisingly, in addition to simply defining the model, the XML Schema approach also specifies something about the persistent form of the model.

One question that comes to mind is whether there are other persistent model forms possible. Couldn't we, for example, provide a relational database (RDB) Schema and produce a core model from it? Couldn't this RDB Schema also be used to specify the persistent format, similar to the way XML Schema does? The answer is, quite simply, yes. This is one type of function that EMF is intended to support, and certainly not the only kind. The "big picture" is shown in Figure 2.5.

Figure 2.5. The core model and its sources.


2.4 Generating Code
The most important benefit of EMF, as with modeling in general, is the boost in productivity that results from automatic code generation. Let's say that you've defined a model, for example the purchase order core model shown in Section 2.3.3, and are ready to turn it into Java code. What do you do now? In Chapter 4, we'll walk through this scenario, and others where you start with other forms of the model (for example, Java interfaces). For now, suffice it to say that it involves only a few mouse clicks. All you need to do is create a project using the EMF Project Wizard, which automatically launches the generator, and select Generate Model Code from a menu.

2.4.1 Generated Model Classes
So what kind of code does EMF generate? The first thing to notice is that an Ecore class (that is, an EClass) actually corresponds to two things in Java: an interface and a corresponding implementation class. For example, the EClass for PurchaseOrder maps to a Java interface:

public interface PurchaseOrder ...

and a corresponding implementation class:

public class PurchaseOrderImpl extends ... implements PurchaseOrder {

This interface/implementation separation is a design choice imposed by EMF. Why do we require this? The reason is simply that we believe it's a pattern that any good model-like API would follow. For example, DOM [4] is like this and so is much of Eclipse. It's also a necessary pattern to support multiple inheritance in Java.

The next thing to notice about each generated interface is that it extends directly or indirectly from the base interface EObject like this:

public interface PurchaseOrder extends EObject {

EObject is the EMF equivalent of java.lang.Object, that is, it's the base of all modeled objects. Extending from EObject introduces three main behaviors:

eClass() returns the object's metaobject (an EClass).

eContainer() and eResource() return the object's containing object and resource.

eGet(), eSet(), eIsSet(), and eUnset() provide an API for accessing the objects reflectively.
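The reflective methods can be pictured as a dispatch from a feature identifier to the type-safe accessors. EMF's public eGet()/eSet() take an EStructuralFeature argument; to stay self-contained, this hypothetical sketch (ReflectivePurchaseOrder is not an EMF class) uses int feature IDs instead, similar in spirit to constants such as POPackage.PURCHASE_ORDER__SHIP_TO. It also assumes null is the default value, so "set" simply means "non-null" here.

```java
// Hypothetical sketch of reflective access; not EMF's EObject implementation.
class ReflectivePurchaseOrder {
    // Feature IDs (assumed values, for illustration only)
    static final int SHIP_TO = 0;
    static final int BILL_TO = 1;

    private String shipTo;
    private String billTo;

    // Type-safe accessors
    public String getShipTo() { return shipTo; }
    public void setShipTo(String value) { shipTo = value; }
    public String getBillTo() { return billTo; }
    public void setBillTo(String value) { billTo = value; }

    // Reflective getter: dispatches on the feature ID to the typed accessor.
    public Object eGet(int featureId) {
        switch (featureId) {
            case SHIP_TO: return getShipTo();
            case BILL_TO: return getBillTo();
            default: throw new IllegalArgumentException("Unknown feature " + featureId);
        }
    }

    // Reflective setter: casts the untyped value and calls the typed setter.
    public void eSet(int featureId, Object value) {
        switch (featureId) {
            case SHIP_TO: setShipTo((String) value); return;
            case BILL_TO: setBillTo((String) value); return;
            default: throw new IllegalArgumentException("Unknown feature " + featureId);
        }
    }

    // "Set" means different from the default value, which is assumed to be null here.
    public boolean eIsSet(int featureId) { return eGet(featureId) != null; }

    // Restores the default value.
    public void eUnset(int featureId) { eSet(featureId, null); }
}
```

Generic code (an editor, a serializer) can then read and write any feature without compile-time knowledge of PurchaseOrder, while ordinary client code keeps using getShipTo() and setShipTo().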

The first and third items are interesting only if you want to generically access the objects instead of, or in addition to, using the type-safe generated accessors. We'll look at how this works in Sections 2.5.3 and 2.5.4. The second item is an integral part of the persistence API that we will describe in Section 2.5.2.

Other than that, EObject has only a few convenience methods. However, there is one more important thing to notice: EObject extends yet another interface:

public interface EObject extends Notifier {

The Notifier interface is also quite small, but it introduces an important characteristic to every modeled object: model change notification, as in the Observer Design Pattern [5]. Like object persistence, notification is an important feature of an EMF object. We'll look at EMF notification in more detail in Section 2.5.1.

Let's move on to the generated methods. The exact pattern that is used for any given feature (that is, attribute or reference) implementation depends on the type and other user-settable properties. In general, the features are implemented as you'd expect. For example, the get() method for the shipTo attribute simply returns an instance variable like this:

public String getShipTo() {
  return shipTo;
}

The corresponding set() method sets the same variable, but it also sends a notification to any observers that may be interested in the state change:

public void setShipTo(String newShipTo) {
  String oldShipTo = shipTo;
  shipTo = newShipTo;
  if (eNotificationRequired())
    eNotify(new ENotificationImpl(this,
                                  Notification.SET,
                                  POPackage.PURCHASE_ORDER__SHIP_TO,
                                  oldShipTo, shipTo));
}

Notice that to make this method more efficient when the object has no observers, the relatively expensive call to eNotify() is avoided by the eNotificationRequired() guard.
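The observer side of this pattern can be sketched without EMF. The Mini* types and ObservablePurchaseOrder below are hypothetical stand-ins for EMF's Notification, Adapter, and Notifier (the real Notifier exposes its observers through eAdapters(), and a real Adapter receives notifyChanged() calls). The point of the sketch is the guard: when no adapter is attached, eNotificationRequired() returns false and no notification object is ever allocated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for EMF's Notification, Adapter, and Notifier types.
class MiniNotification {
    static final int SET = 1;

    final Object notifier;
    final int eventType;
    final int featureId;
    final Object oldValue;
    final Object newValue;

    MiniNotification(Object notifier, int eventType, int featureId,
                     Object oldValue, Object newValue) {
        this.notifier = notifier;
        this.eventType = eventType;
        this.featureId = featureId;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

// An observer; a single method, so a lambda or method reference can implement it.
interface MiniAdapter {
    void notifyChanged(MiniNotification notification);
}

class MiniNotifier {
    private final List<MiniAdapter> adapters = new ArrayList<>();

    List<MiniAdapter> eAdapters() {
        return adapters;
    }

    // Cheap check that lets setters skip building a notification nobody will receive.
    protected boolean eNotificationRequired() {
        return !adapters.isEmpty();
    }

    protected void eNotify(MiniNotification notification) {
        // Iterate over a copy so an adapter may detach itself while being notified.
        for (MiniAdapter adapter : new ArrayList<>(adapters)) {
            adapter.notifyChanged(notification);
        }
    }
}

class ObservablePurchaseOrder extends MiniNotifier {
    static final int SHIP_TO = 0;  // hypothetical feature ID

    private String shipTo;

    String getShipTo() {
        return shipTo;
    }

    // Same shape as the generated setShipTo() above.
    void setShipTo(String newShipTo) {
        String oldShipTo = shipTo;
        shipTo = newShipTo;
        if (eNotificationRequired())
            eNotify(new MiniNotification(this, MiniNotification.SET, SHIP_TO,
                                         oldShipTo, shipTo));
    }
}
```

A UI view, for instance, would attach itself with eAdapters().add(...) and refresh whenever notifyChanged() reports a SET on the feature it displays.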

More complicated patterns are generated for other types of features, especially bidirectional references where referential integrity is maintained. In all cases, however, the code is generally as efficient as possible, given the intended semantics. We'll cover the complete set of generator patterns in Chapter 9.

The main message to take away is that the generated code is clean, simple, and efficient. EMF does not pull in large base classes or generate inefficient code. The EMF framework is lightweight, as are the objects generated for your model. The idea is that the generated code should look pretty much like what you would have written had you done it by hand. But because it's generated, you know it's correct. It's a big time saver, especially for some of the more complicated reference handshaking code, which might otherwise be fairly difficult to get right.

Before moving on, we should mention two other important classes that are generated for a model: a factory and a package. The generated factory (for example, POFactory) includes a create method for each class in the model. The EMF programming model strongly encourages, but doesn't require, the use of factories for creating objects. Instead of simply using the new operator to create a purchase order, you should do this:

PurchaseOrder aPurchaseOrder =
    POFactory.eINSTANCE.createPurchaseOrder();
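The interface/implementation split and the factory work together: client code names only the interface, and the factory is the one place that knows which implementation class to instantiate. The sketch below illustrates that arrangement with hypothetical Simple* types (not the generated EMF classes); only the eINSTANCE singleton naming follows the text.

```java
// Hypothetical miniature of a generated interface, implementation, and factory.
interface SimplePurchaseOrder {
    String getShipTo();
    void setShipTo(String value);
}

class SimplePurchaseOrderImpl implements SimplePurchaseOrder {
    private String shipTo;

    // Protected: outside this package, only the factory or a subclass creates instances.
    protected SimplePurchaseOrderImpl() {
    }

    public String getShipTo() { return shipTo; }
    public void setShipTo(String value) { shipTo = value; }
}

interface SimplePOFactory {
    // Singleton factory instance, named after POFactory.eINSTANCE in the text.
    SimplePOFactory eINSTANCE = new SimplePOFactoryImpl();

    SimplePurchaseOrder createPurchaseOrder();
}

class SimplePOFactoryImpl implements SimplePOFactory {
    public SimplePurchaseOrder createPurchaseOrder() {
        return new SimplePurchaseOrderImpl();
    }
}
```

Because callers write SimplePOFactory.eINSTANCE.createPurchaseOrder() rather than new SimplePurchaseOrderImpl(), the implementation class can change without touching client code.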

The generated package (for example, POPackage) provides convenient accessors for all the Ecore metadata for the model. You may already have noticed, in the setShipTo() code fragment earlier in this section, the use of POPackage.PURCHASE_ORDER__SHIP_TO, a static int constant representing the shipTo attribute. The generated package also includes convenient accessors for the EClasses, EAttributes, and EReferences. We'll look at the use of these accessors in Section 2.5.3.

2.4.2 Other Generated "Stuff"
In addition to the interfaces and classes described in the previous section, the EMF generator can optionally generate the following:

A skeleton adapter factory[8] class (for example, POAdapterFactory) for the model. This convenient base class can be used to implement adapter factories that need to create type-specific adapters: for example, a PurchaseOrderAdapter for PurchaseOrders, an ItemAdapter for Items, and so on.

[8] Adapters and adapter factories are described in Section 2.5.1.

A convenience switch class (for example, POSwitch) that implements a "switch statement"-like callback mechanism for dispatching based on an object's type (that is, its EClass). The adapter factory class, as just described, uses this switch class in its implementation.

A plug-in manifest file, so that the model can be used as an Eclipse plug-in.

An XML Schema for the model.
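The "switch statement"-like callback mechanism of the generated switch class can be sketched as follows. MiniPOSwitch and the Sw* interfaces are hypothetical; in particular, the generated class dispatches on the object's EClass, as the text says, whereas this self-contained sketch uses instanceof. As in that style of callback, a case method returning null means "not handled here", and dispatch falls through to defaultCase().

```java
// Hypothetical model interfaces for the sketch.
interface SwPurchaseOrder {
}

interface SwItem {
}

// Hypothetical switch class: override only the case methods you care about.
class MiniPOSwitch<T> {
    // Dispatches to the case method matching the object's type.
    public T doSwitch(Object object) {
        if (object instanceof SwPurchaseOrder) {
            T result = casePurchaseOrder((SwPurchaseOrder) object);
            if (result != null) {
                return result;
            }
        }
        if (object instanceof SwItem) {
            T result = caseItem((SwItem) object);
            if (result != null) {
                return result;
            }
        }
        return defaultCase(object);
    }

    // Returning null means "not handled"; subclasses override to handle a type.
    public T casePurchaseOrder(SwPurchaseOrder object) {
        return null;
    }

    public T caseItem(SwItem object) {
        return null;
    }

    // Called when no case method produced a result.
    public T defaultCase(Object object) {
        return null;
    }
}
```

An adapter factory can use such a switch by returning a new PurchaseOrderAdapter from casePurchaseOrder(), a new ItemAdapter from caseItem(), and so on, which is the relationship between the two generated classes described above.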

If all you're interested in is generating a model, then this is the end of the story. However, as we'll see in Chapters 3 and 4, the EMF generator can, using the EMF.Edit extensions to the base EMF framework, generate adapter classes that enable viewing and command-based, undoable editing of a model. It can even generate a working editor for your model. We will talk more about EMF.Edit and its capabilities in the following chapter. For now, we'll stick to the basic modeling framework itself.

2.4.3 Regeneration and Merge
The EMF generator produces files that are intended to be a combination of generated pieces and hand-written pieces. You are expected to edit the generated classes to add methods and instance variables. You can always regenerate from the model as needed and your additions will be preserved during the regeneration.

EMF uses @generated markers in the Javadoc comments of generated interfaces, classes, methods, and fields to identify the generated parts. For example, getShipTo() actually looks like this:

/**
 * @generated
 */
public String getShipTo() { ...

Any method that doesn't have this @generated tag (that is, anything you add by hand) will be left alone during regeneration. If you already have a method in a class that conflicts with a generated method, then your version will take precedence and the generated one will be discarded. You can, however, redirect a generated method if you want to override it but still call the generated version. If, for example, you rename the getShipTo() method with a Gen suffix:

/**
 * @generated
 */
public String getShipToGen() { ...

Then if you add your own getShipTo() method without an @generated tag, the generator will, upon detecting the conflict, check for the corresponding Gen version and, if it finds one, redirect the generated method body there.
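To make the Gen-suffix redirection concrete, here is a sketch of what an implementation class might look like after the generator has placed the generated body in getShipToGen(). The class name PurchaseOrderImplSketch and the fallback value "(no address)" are invented for illustration, and no EMF types are used.

```java
class PurchaseOrderImplSketch {
    protected String shipTo;

    /**
     * @generated
     */
    public String getShipToGen() {
        // Generated body: the generator keeps this method up to date.
        return shipTo;
    }

    // Hand-written (no @generated tag), so regeneration leaves it alone.
    // Hypothetical customization: supply a placeholder when no address is set.
    public String getShipTo() {
        String value = getShipToGen();
        return value != null ? value : "(no address)";
    }
}
```

The hand-written method overrides the behavior, yet it still delegates to the generated logic, so later changes to the model's generated code flow through getShipToGen().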

The merge behavior for other things is generally fairly reasonable. For example, you can add extra interfaces to the extends clause of a generated interface (or the implements clause of a generated class) and they will be retained during regeneration. The single extends class of a generated class, however, would be overwritten by the model's choice. We'll look at code merging in more detail in Chapter 9.

2.4.4 The Generator Model
Most of the data needed by the EMF generator is stored in the core model. As we've seen in Section 2.3.1, the classes to be generated and their names, attributes, and references are all there. There is, however, more information that needs to be provided to the generator, such as where to put the generated code and what prefix to use for the generated factory and package class names, that isn't stored in the core model. All this user-settable data also needs to be saved somewhere so that it will be available if we regenerate the model in the future.

The EMF code generator uses a generator model to store this information. Like Ecore, the generator model is itself an EMF model. Actually, a generator model provides access to all of the data needed for generation, including the Ecore part, by wrapping the corresponding core model. That is, generator model classes are Decorators [5] of Ecore classes. For example, GenClass decorates EClass, GenFeature decorates EAttribute and EReference, and so on.
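The Decorator relationship can be sketched in a few lines. CoreEClass, SketchGenClass, and SketchGenPackage are hypothetical stand-ins, not EMF's EClass, GenClass, or GenPackage; they only show the division of labor: model data is delegated to the wrapped core object, while generation-only settings (here an implementation-class suffix, a name prefix for the factory and package interfaces, and a target Java package) live solely in the wrapper.

```java
// Hypothetical stand-in for an Ecore class: model information only.
class CoreEClass {
    final String name;

    CoreEClass(String name) {
        this.name = name;
    }
}

// Hypothetical generator-model class decorating a CoreEClass.
class SketchGenClass {
    final CoreEClass ecoreClass;  // the decorated core-model object
    String implSuffix = "Impl";   // generator-only setting, never stored in Ecore

    SketchGenClass(CoreEClass ecoreClass) {
        this.ecoreClass = ecoreClass;
    }

    // Model data is delegated to the wrapped class...
    String getInterfaceName() {
        return ecoreClass.name;
    }

    // ...combined with generator data that exists only here.
    String getClassName() {
        return ecoreClass.name + implSuffix;
    }
}

// Hypothetical package-level generator settings.
class SketchGenPackage {
    final String prefix;       // e.g. "PO", used for factory and package names
    final String basePackage;  // where to put the generated code

    SketchGenPackage(String prefix, String basePackage) {
        this.prefix = prefix;
        this.basePackage = basePackage;
    }

    String getFactoryInterfaceName() {
        return prefix + "Factory";
    }

    String getQualifiedFactoryName() {
        return basePackage + "." + getFactoryInterfaceName();
    }
}
```

Changing implSuffix or prefix changes what gets generated without touching the core model, which is the separation the next paragraphs rely on.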

The significance of all this is that the EMF generator runs off of a generator model instead of a core model; it's actually a generator model editor.[9] When you use the generator, you'll be editing a generator model, which in turn indirectly accesses the core model from which you're generating. As you'll see in Chapter 4 when we walk through an example of using the generator, there are two model resources (files) in the project: a .ecore file and a .genmodel file. The .ecore file is an XMI serialization of the core model, as we saw in Section 2.3.3. The .genmodel file is a serialized generator model with cross-document references to the .ecore file. Figure 2.6 shows the conceptual picture.

[9] It is, in fact, an editor generated by EMF, like the ones we'll be looking at in Chapter 4 and later in the book.

Figure 2.6. The .genmodel and .ecore files.


Separating the generator model from the core model like this has the advantage that the actual Ecore metamodel can remain pure and independent of any information that is only relevant for code generation. The disadvantage of not storing all the information right in the core model is that a generator model may get out of sync if the referenced core model changes. To handle this, the generator model classes include methods to reconcile a generator model with changes to its corresponding core model. Using these methods, the two files are kept synchronized automatically by the framework and generator. Users don't need to worry about it.
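The text doesn't describe how the real reconciliation methods work, so the following is only a hypothetical illustration of the idea. ReconcileSketch keys generator entries by the name of the EClass they decorate: settings for classes that still exist are kept, entries for deleted classes are dropped, and new classes get default settings. Matching by name is an assumption of this sketch; with it, a renamed class would lose its settings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReconcileSketch {
    // Generator-only settings for one class (hypothetical).
    static final class GenSettings {
        String implSuffix = "Impl";
    }

    // Keyed by the name of the EClass each entry decorates.
    final Map<String, GenSettings> genClasses = new LinkedHashMap<>();

    // Brings the generator model back in line with the core model's class names.
    void reconcile(List<String> coreClassNames) {
        // Drop entries whose EClass no longer exists in the core model.
        genClasses.keySet().retainAll(coreClassNames);
        // Add default settings for EClasses that are new; keep existing settings.
        for (String name : coreClassNames) {
            genClasses.putIfAbsent(name, new GenSettings());
        }
    }
}
```

Running reconcile() after every change to the core model is what keeps a user's customized settings alive across edits, without the generator-specific data ever being written into the .ecore file.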
