Hibernate Collection-Mapping Collection

One of the main concerns of a good ORM, in addition to identity and scope handling, is collection mapping. Hibernate offers a lot of flexibility in this area. In this blog I’ll show a variety of collection mappings.

The collection we’ll map is very simple:

import java.util.Set;

class Child {
  String name;
}
class Parent {
  String name;
  Set<Child> children; // the collection to be mapped
}

That’s all there is to it. (Well, actually I left out quite a bit of code, but that’s the usual plumbing: ids, getters and setters. You can find it all in the attached zip file hibernate-test.zip.)

Many-to-many

We’ll start with an exotic mapping, but one that Hibernate prefers:

Hibernate will use three tables, Parent, Child and children, to store the data and relations. The children table is used to store the one-to-many association in a many-to-many fashion. Notice the unique="true" in the many-to-many element, signaling that this is not a “real” many-to-many.
We’ll run the same simple test scenario against all mappings:

1. Save a parent with two children.
2. Look it up in a new session.
3. Delete a child (by removing it from the parent’s set).

If we run this scenario against the mapping, we’ll see the following SQL (the ++ lines mark the boundaries between the three steps):

++
insert into Parent (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into children (id, elt) values (?, ?)
insert into children (id, elt) values (?, ?)
++
select parent0_.id as id1_, parent0_.name as name0_1_, children1_.id as id3_, child2_.id as elt3_, child2_.id as id0_, child2_.name as name2_0_ from Parent parent0_ left outer join children children1_ on parent0_.id=children1_.id left outer join Child child2_ on children1_.elt=child2_.id where parent0_.id=?
++
delete from children where id=? and elt=?
++

Inserting takes two statements for each child: one for the row in the Child table and one for the row in the children join table. This seems a little suboptimal.

One-to-many

Let’s try the following mapping, which uses only two tables:

This leads to a far more natural relational model, with a foreign key from the Child to the Parent table. Let’s look at the logging:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into Child (name, id) values (?, null)
update Child set parent=? where id=?
update Child set parent=? where id=?
++
select parent0_.id as id1_, parent0_.name as name3_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name4_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
update Child set parent=null where parent=? and id=?
++

Mmm, still four statements to insert the two children. There is an insert and an update statement on the same row for each child. This seems weird, but if you compare it to the logging from the other mapping, you’ll see that in essence the same steps are taken: first all entities are inserted and then the association is made.

One-to-many-not-null

It gets even worse if we change the mapping to:

Now the logging shows:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, parent, id) values (?, ?, null)
insert into Child (name, parent, id) values (?, ?, null)
update Child set parent=? where id=?
update Child set parent=? where id=?
++
select parent0_.id as id1_, parent0_.name as name5_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name6_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
++

Still four statements! If you switch on logging for org.hibernate.type, you’ll see that the two updates are really redundant, since the parent ids have already been filled in correctly by the insert statements.
Furthermore, the delete has no effect at all! We’ll come back to that later.
There are two solutions to make Hibernate use only one statement to insert the child and the relation:

1. Use components.
2. Map the relation as bidirectional and inverse.

Component

Mapping the children as components has consequences for how you can use your classes. The objects lose their identity as far as the relational model is concerned. This means no other object (except the parent) can hold a reference to the child: the child has no id column in the relational model, so no foreign key to such a column can be created. The child can have one-to-one or many-to-one associations to other entities, though. Furthermore, the life cycle of the children is tied to the life cycle of the parent.
The mapping now looks like this:

Let’s look at the logging:

++
insert into Parent (name, id) values (?, null)
insert into children (parent, name) values (?, ?)
insert into children (parent, name) values (?, ?)
++
select parent0_.id as id0_, parent0_.name as name7_0_, children1_.parent as parent2_, children1_.name as name2_ from Parent parent0_ left outer join children children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
delete from children where parent=? and name=?
++

Now there are only three statements to insert the Parent and the children. Notice that with this mapping, in contrast with the others, the child will actually be removed from the persistent storage when it is removed from the collection.
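
Because a component has no id column, the delete statement above has to identify the row by its values (parent and name). On the Java side this suggests treating Child as a value type. The following is only a minimal sketch, assuming name is the child's only property; the equals/hashCode implementation and the ComponentDemo class are illustrative additions and not part of the attached code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Child as a value type: no id, equality is defined by its state.
class Child {
    private String name;

    Child() {
        // no-argument constructor, which Hibernate needs to instantiate the class
    }

    Child(String name) {
        this.name = name;
    }

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Child)) return false;
        return Objects.equals(name, ((Child) other).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

// Illustrative only: shows the set semantics without involving Hibernate.
class ComponentDemo {
    public static void main(String[] args) {
        Set<Child> children = new HashSet<>();
        children.add(new Child("first"));
        children.add(new Child("second"));
        // A different instance with the same name identifies the same element.
        children.remove(new Child("first"));
        System.out.println(children.size()); // prints 1
    }
}
```

With value-based equality, two children with the same name under one parent collapse into a single set element, which matches the way the rows are addressed in the children table.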

Inverse

The restrictions on usage might be too much for most applications. If the child must be a full-blown entity, the relation can be mapped as bidirectional and inverse. This changes the domain model by adding an association from the Child to the Parent.

class Child {
  String name;
  Parent parent; // back-reference to the owning parent
}
class Parent {
  String name;
  Set<Child> children;
}

The mapping will now look as follows:

The property parent on the Child entity is now mapped as a many-to-one association and the children collection on the Parent entity is mapped with inverse="true". This configuration option is quite hard to understand. In principle it means that the association is mapped bidirectionally and that the other side of the association is leading: Hibernate looks only at the Child’s parent property when it writes the foreign key. In effect the association is now part of the child entity instead of the parent entity. The logging now comes out like this:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, parent, id) values (?, ?, null)
insert into Child (name, parent, id) values (?, ?, null)
++
select parent0_.id as id1_, parent0_.name as name9_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name10_0_, children1_.parent as parent10_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
update Child set name=?, parent=? where id=?
++

Since the association is now part of the child entity, it will be inserted along with the child entity and only one insert (or update) is needed.

All-Delete-Orphan

Now, none of the above mappings that mapped Child as a separate entity deleted the children when they were removed from the set. With the one-to-many-not-null mapping, deleting the child didn’t even result in a DML action. Why is this? The answer lies in the cascade setting of the association: cascade="save-update" only affects the children when the parent is saved or updated. Changing this to cascade="all-delete-orphan" will delete the children when the association is broken.
For the many-to-many mapping the logging will change to:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into children (id, elt) values (?, ?)
insert into children (id, elt) values (?, ?)
++
select parent0_.id as id1_, parent0_.name as name11_1_, children1_.id as id3_, child2_.id as elt3_, child2_.id as id0_, child2_.name as name13_0_ from Parent parent0_ left outer join children children1_ on parent0_.id=children1_.id left outer join Child child2_ on children1_.elt=child2_.id where parent0_.id=?
++
delete from children where id=? and elt=?
delete from Child where id=?
++

So, as expected, two deletes are used to delete the association and the child on removal from the set.
For the one-to-many mapping, the situation is hardly better.

++
insert into Parent (name, id) values (?, null)
insert into Child (name, id) values (?, null)
insert into Child (name, id) values (?, null)
update Child set parent=? where id=?
update Child set parent=? where id=?
++
select parent0_.id as id1_, parent0_.name as name14_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name15_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
update Child set parent=null where parent=? and id=?
delete from Child where id=?
++

The update (before the delete) is used to delete the association, which seems quite redundant, since the same row will be deleted with the next statement.
When the one-to-many-not-null mapping is used, the redundant update is removed:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, parent, id) values (?, ?, null)
insert into Child (name, parent, id) values (?, ?, null)
update Child set parent=? where id=?
update Child set parent=? where id=?
++
select parent0_.id as id1_, parent0_.name as name16_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name17_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
delete from Child where id=?
++

This is also the reason why, without all-delete-orphan, the association could not be deleted: the foreign-key column cannot be set to NULL because of the not-null constraint, so the association can only be removed by deleting the child as well.
The last mapping is the one-to-many-inverse mapping:

++
insert into Parent (name, id) values (?, null)
insert into Child (name, parent, id) values (?, ?, null)
insert into Child (name, parent, id) values (?, ?, null)
++
select parent0_.id as id1_, parent0_.name as name18_1_, children1_.parent as parent3_, children1_.id as id3_, children1_.id as id0_, children1_.name as name19_0_, children1_.parent as parent19_0_ from Parent parent0_ left outer join Child children1_ on parent0_.id=children1_.parent where parent0_.id=?
++
delete from Child where id=?
++

In terms of the statements issued, this mapping behaves exactly like the component mapping, but with the Child as an entity.

Conclusion

We’ve seen 9 different Hibernate mappings for a simple one-to-many association and focused on the consequences for the DML statements issued. There are some inconsistent configurations that can be harmful for an application (like the one-to-many-not-null association that silently ignores child removal).
I think, in general, components should be used as much as possible when using Hibernate. They remove the need for synthetic keys and columns, and both the DML statements and the session cache perform better, since the components are not managed (separately) by the session.
For situations where components cannot be used (because the synthetic keys are necessary for other relations), an inverse mapping is preferable. Be sure to synchronize the object model correctly, by setting the “parent” association to null when removing a child from the collection.
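
One way to keep both sides of the association in sync is to route all changes through helper methods on the parent. The sketch below is plain Java without any Hibernate plumbing (ids, no-argument constructors and collection setters are left out); the addChild and removeChild helpers and the InverseDemo class are hypothetical names introduced here, not taken from the attached code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Child {
    private final String name;
    private Parent parent; // the side that is leading with inverse="true"

    Child(String name) { this.name = name; }

    String getName() { return name; }
    Parent getParent() { return parent; }
    void setParent(Parent parent) { this.parent = parent; }
}

class Parent {
    private final String name;
    private final Set<Child> children = new HashSet<>();

    Parent(String name) { this.name = name; }

    String getName() { return name; }

    // Read-only view, so callers have to use the helpers below.
    Set<Child> getChildren() { return Collections.unmodifiableSet(children); }

    // Hypothetical helper: updates the collection and the back-reference together.
    void addChild(Child child) {
        children.add(child);
        child.setParent(this);
    }

    // Hypothetical helper: clears the back-reference when the child is removed.
    void removeChild(Child child) {
        if (children.remove(child)) {
            child.setParent(null);
        }
    }
}

// Illustrative only: exercises the helpers without involving Hibernate.
class InverseDemo {
    public static void main(String[] args) {
        Parent parent = new Parent("parent");
        Child child = new Child("child");
        parent.addChild(child);
        parent.removeChild(child);
        System.out.println(child.getParent()); // prints null
    }
}
```

Since the collection is the inverse side, forgetting to clear the parent reference would leave the foreign key untouched even though the child is no longer in the set.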
A special situation in which a join table (fake many-to-many association) should be considered is when a one-to-many mapping would result in a foreign-key constraint to the primary key on the same table. This happens when an entity holds a collection of entities of the same type.
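
As an illustration of such a self-referencing model, consider the following sketch. The Category class is a hypothetical example that does not appear in the attached code; with a plain one-to-many mapping, its table would need a foreign-key column pointing at its own primary key.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity that holds a collection of its own type.
class Category {
    private final String name;
    private final Set<Category> subcategories = new HashSet<>();

    Category(String name) { this.name = name; }

    String getName() { return name; }
    Set<Category> getSubcategories() { return subcategories; }
}
```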
Of course, there are situations where other mappings should be applied. When mapping to a legacy schema, everything that works is allowed.
In another blog I will focus on how components can be used.