We are currently working on a project that uses an Oracle 9 database, Hibernate as ORM tool and Spring for our dependency management. Our application processes millions of records per month which means that some tables will grow with approximately 15 million records each month. For maintenance and performance reasons it is a must to partition these tables. But does partitioning fit into Hibernate, and if so, how?
First we will explain a little about the basics of the application we are discussing. Then we will discuss which partitioning strategy we chose and what needed to change in our application. Finally we will point a bug we encountered in Hibernate while working on partitioning.
Application basics
Our tables have sequence values as primary keys as the functional keys are not always clear or consist of multiple columns with dependencies between them. The classes and their mapped tables are mostly parent – child relationships. Navigating from parent to child with Hibernate is simple. Just call parent.getChildren() and Hibernate will load the children on the join column which in our case is the sequence value:
select children0_.PARENT_ID as PARENT2_1_, children0_.ID as ID1_, children0_.ID as ID1_0_, children0_.PARENT_ID as PARENT2_1_0_ from CHILD children0_ where children0_.PARENT_ID=?
Partition strategy
There are multiple partitioning strategies for Oracle, the two most frequently used are RANGE partitioning and HASH partitioning. Together with the DBA we chose the RANGE partitioning strategy for several reasons:
- We logically process per month so the partitioning will be on year-month. This way Oracle almost always deals with one partition at a time, which means it only needs to load one partition into memory. If we chose HASH partitioning the data would be scattered over the different partitions.
- Because we logically process data per month, it can now also be backed up per month. After some years, it can be updated five years after the month it was inserted, that partition could be taken offline.
With range partitioning, the database will put data in the partition based on the value of a column in the table. In order to profit from partitioning you need to include the column on which you partition in your WHERE clause of your sql query. So this is where the problem lies: as our FOREIGN KEY always lies on the PRIMARY KEY of the PARENT table. In our case this is a sequence generated value. As we just stated we wish to partition on a date and not on a sequence value. In order to support this we added a column PARTITION_VALUE in all the tables that needs to be partitioned. This leads to the main point of this article:
How do we manipulate the Hibernate configuration of our application, to make Hibernate include the partition-value in all where clauses of queries on tables that are partitioned?
Most tables that need to be partitioned are grouped around a single table (let’s call this table A). The value used for partitioning doesn’t actually originate from this table, but most domain-logic queries (queries that are executed by the application directly to obtain data that needs to be processed), are executed on this table.
When executing a domain-logic query with Hibernate, adding the partition-value query is easy. A restriction on the partition-value property of the domain object can be added using criteria or HQL. Hibernate might need to fetch extra data as the application navigates the associations of the entities returned by the query. The queries that Hibernates executes is based on a foreign key relation of the dependent table (say B) to the table that the loaded entity is mapped to. The basic query is:
select … from B b0_ where b0_.A_ID=?
The column B.A_ID has a foreign key restriction on the column A.ID. To benefit from partitioning we need the query to look like:
select … from B b0_ where b0_.A_ID=? and b0_.PARTITION_VALUE = ?
To enforce this, we choose to add the column A.PARTITION_VALUE to the Hibernate Id of the entity mapped to table A. Hibernate will now use both columns to identify the associated rows in related tables, resulting in queries like the one above.
We now need a composite-id that consists of a sequence value and the PARTITION_VALUE column. For some reason Hibernate does not support a generator for generating composite-id’s at save time, forcing the application to set the id value of each entity before saving it to the Hibernate session (this is similar to the “assigned” generator strategy for single valued ids). We do not want to reinvent the wheel so we reused Hibernate’s SequenceHiLoGenerator (which reduces the number of sequence.nextval calls to Oracle). Next we created a Spring FactoryBean that uses the SequenceHiLoGenerator to generate the sequence values. In the DAO that is used to save the Parent the id property is set with the new sequence value retrieved from the SequenceHiLoGenerator. The Spring bean which creates an instance of the SequenceHiLoGenerator looks as follows:
public class SequenceFactory extends AbstractFactoryBean { private String dialectName; private Properties properties; protected Dialect dialect; public Class getObjectType() { return SequenceHiLoGenerator.class; } public void afterPropertiesSet() throws Exception { try { getDialectFromName(); } catch (ClassCastException e) { throw new BeanInitializationException("The given class '" + dialectName + "' does not extend 'org.hibernate.dialect.Dialect'"); } catch (Exception e) { throw new BeanInitializationException("The given class '" + dialectName + "' does not exist or cannot be instantiated",e); } super.afterPropertiesSet(); } protected Object createInstance() throws Exception { SequenceHiLoGenerator generator = createNewSequenceHiLoGenerator(); generator.configure(Hibernate.LONG, properties, getDialectFromName()); return generator; } private Dialect getDialectFromName() throws InstantiationException, IllegalAccessException, ClassNotFoundException { if (dialect == null) { dialect = (Dialect) Class.forName(dialectName).newInstance(); } return dialect; } /** * created for test purposes. */ protected SequenceHiLoGenerator createNewSequenceHiLoGenerator() { return new SequenceHiLoGenerator(); } public void setDialectName(String dialectName) { this.dialectName = dialectName; } public void setProperties(Properties properties) { this.properties = properties; } }
This bean is wired into the DAO as follows:
999 UKC_PARENT_SEQ
As the SequenceFactory extends the AbstractFactoryBean Spring will use the method createInstance to create instances of the SequenceHiLoGenerator. The instance of the SequenceHiLoGenerator is then injected into the dao and is used in the save method to generate the new sequence value. The Hiberante mapping now looks like this:
999 CHILD_SEQ
As you can see there is a class ParentId that holds the composite-id of Parent. The Child class now refers back to the Parent using the composite-id columns. If we look to the sql that Hibernate generates it takes the PARTITION_VALUE column in the where clause:
select children0_.PARENT_ID as PARENT2_1_, children0_.PARTITION_VALUE as PARTITION3_1_, children0_.ID as ID1_, children0_.ID as ID1_0_, children0_.PARENT_ID as PARENT2_1_0_, children0_.PARTITION_VALUE as PARTITION3_1_0_ from CHILD children0_ where children0_.PARENT_ID=? and children0_.PARTITION_VALUE=?
Here you have it! Partitioning in Oracle with Hibernate.
Bug in Hibernate
There is one more thing we want to discuss. In our application in one case we used HQL to retrieve some entities. We found that using HQL queries a restriction on a composite-id does not work. Given the previous Parent Child example, executing the following HQL: "from Child c where c.parent = :parent", Hibernate produces this SQL:
select child0_.ID as ID1_, child0_.PARENT_ID as PARENT2_1_, child0_.PARTITION_VALUE as PARTITION3_1_ from CHILD child0_ where ( child0_.PARENT_ID, child0_.PARTITION_VALUE )=?
This obviously does not work. I wonder if they say this is a feature .. 🙂
Maarten Winkels
Lars Vonk