Pentaho Kettle and Integration Testing
We recently started using Kettle for ETL on our project. Pentaho Kettle provides a UI-based design tool. Getting used to the Kettle UI takes quite some time at first, because it is hard to visualize how to orchestrate the available Kettle steps to solve a business problem. Once you know how to use it, it's all about dragging and dropping a step and configuring it through the UI. In our experience, about 90% of the work is easy to design, but the remaining 10% takes a lot of research and, in the end, some hacks we never liked.
As we created Kettle transformations and jobs, we were unsure how to test them. After some research we found that the BlackBoxTests class shipped with the Kettle distribution can be used for testing. The idea is simple: you provide some inputs and an expected output file, the Kettle transformation is executed to produce an actual output file, and BlackBoxTests asserts that the expected file matches the actual one. For example, if Sample.ktr is under test, BlackBoxTests expects Sample.expected.<txt/xml/csv> as the expected file and Sample.actual.<txt/xml/csv> as the actual file. It tests all transformations found under a folder and its subfolders.
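To make the naming convention concrete, here is a minimal Java sketch of the file-matching part only: it finds every .ktr file under a root folder (including subfolders) and checks whether the X.expected.<ext> and X.actual.<ext> files next to it have identical lines. BlackBoxConvention and its methods are hypothetical names; the sketch does not run the transformation, and the real BlackBoxTests may compare files differently (for example, not line by line).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the BlackBoxTests naming convention:
 * for every X.ktr under a root folder, compare X.expected.<ext> with X.actual.<ext>.
 * Executing the transformation itself is out of scope here.
 */
class BlackBoxConvention {

    /** "Sample.ktr" + "txt" -> "Sample.expected.txt" */
    static String expectedName(String ktrFileName, String ext) {
        return baseName(ktrFileName) + ".expected." + ext;
    }

    /** "Sample.ktr" + "txt" -> "Sample.actual.txt" */
    static String actualName(String ktrFileName, String ext) {
        return baseName(ktrFileName) + ".actual." + ext;
    }

    // Strips the ".ktr" suffix; rejects anything that is not a transformation file.
    private static String baseName(String ktrFileName) {
        if (!ktrFileName.endsWith(".ktr")) {
            throw new IllegalArgumentException("not a .ktr file: " + ktrFileName);
        }
        return ktrFileName.substring(0, ktrFileName.length() - ".ktr".length());
    }

    /** Collects every .ktr file under root, walking subfolders recursively. */
    static List<Path> findTransformations(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".ktr"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /**
     * True only if both the expected and the actual file exist next to the
     * .ktr and contain exactly the same lines (simplified comparison).
     */
    static boolean outputMatches(Path ktr, String ext) throws IOException {
        String name = ktr.getFileName().toString();
        Path dir = ktr.getParent();
        Path expected = dir.resolve(expectedName(name, ext));
        Path actual = dir.resolve(actualName(name, ext));
        if (!Files.exists(expected) || !Files.exists(actual)) {
            return false;
        }
        return Files.readAllLines(expected).equals(Files.readAllLines(actual));
    }
}
```

A missing expected or actual file counts as a failure here, which mirrors the idea that every transformation under the folder must ship its expected output.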
By default, Kettle reads kettle.properties (located in the $HOME/.kettle folder), which complicates testing: a test should be able to exercise a Kettle transformation in isolation. So instead of kettle.properties, we planned to use an application-specific property file and pass its values to the TransMeta class through its injectVariables() method. This partly worked, but we later found that Kettle still read kettle.properties even when we supplied a different property file.
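A minimal sketch of the loading side, assuming the application-specific file is a standard Java properties file: it reads the file into a Map<String, String>, the form we handed to injectVariables(). AppVariables is a hypothetical helper name, and the Kettle call is deliberately left out so the snippet compiles without Kettle on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical helper: turns an application-specific properties file
 * (used instead of $HOME/.kettle/kettle.properties) into a plain
 * name -> value map of variables.
 */
class AppVariables {

    static Map<String, String> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source); // standard key=value properties syntax
        Map<String, String> vars = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            vars.put(key, props.getProperty(key));
        }
        return vars;
    }
}
```

With Kettle available, the resulting map would then be passed to the TransMeta instance (for example transMeta.injectVariables(vars)) before the transformation runs; check the exact parameter type against the Kettle version you use.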
After a lot of debugging we found the culprit: BlackBoxTests calls EnvUtil.environmentInit(), which does all the magic. It loads kettle.properties by default and, to our horror, loads its entries into the JVM-wide system properties (java.lang.System).
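Because system properties are global to the JVM, whatever ends up there leaks into every later test in the same run. One possible guard (not necessarily how we solved it, and not part of Kettle) is to snapshot the system properties before a test and restore them afterwards. SystemPropertiesGuard is a hypothetical helper written for this sketch:

```java
import java.util.Properties;

/**
 * Hypothetical test guard: remembers the JVM system properties on creation
 * and restores them on close(), so entries copied into System during a test
 * (e.g. from kettle.properties) do not leak into later tests.
 */
final class SystemPropertiesGuard implements AutoCloseable {

    private final Properties snapshot = new Properties();

    SystemPropertiesGuard() {
        // Copy the current entries; System.getProperties() is the live object.
        snapshot.putAll(System.getProperties());
    }

    @Override
    public void close() {
        // Remove keys that were added after the snapshot was taken.
        for (String key : System.getProperties().stringPropertyNames()) {
            if (!snapshot.containsKey(key)) {
                System.clearProperty(key);
            }
        }
        // Put back original values, including ones that were overwritten.
        for (String key : snapshot.stringPropertyNames()) {
            System.setProperty(key, snapshot.getProperty(key));
        }
    }
}
```

Wrapping a test body in try (SystemPropertiesGuard guard = new SystemPropertiesGuard()) { ... } undoes both added and overwritten entries when the block exits. It only limits the damage, however: inside the block, the transformation still sees whatever was loaded into System.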
We quickly stopped using EnvUtil, only to find that passing the properties in from outside was still not enough. The variables worked for the current transformation, but Kettle did not pass them on to embedded sub-transformations. It had worked earlier only because EnvUtil.environmentInit() loads the properties into java.lang.System, where they are visible globally.
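The following is a deliberately simplified, hypothetical model of variable resolution, not Kettle's actual implementation. It illustrates why the sub-transformation problem stayed hidden: a variable is looked up in the space's own values, then in a linked parent, and finally in JVM system properties. If a sub-transformation's space is never linked to its parent, an injected variable resolves only when it also sits in System, which is what EnvUtil.environmentInit() had arranged. VarSpace, linkParent() and copyInto() are names invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified variable space (not Kettle's code).
 * Lookup order: own values -> linked parent -> JVM system properties.
 */
class VarSpace {
    private final Map<String, String> local = new HashMap<>();
    private VarSpace parent;

    void set(String name, String value) {
        local.put(name, value);
    }

    void linkParent(VarSpace parent) {
        this.parent = parent;
    }

    String resolve(String name) {
        if (local.containsKey(name)) {
            return local.get(name);
        }
        if (parent != null) {
            return parent.resolve(name);
        }
        return System.getProperty(name); // last resort: global JVM state
    }

    /** Explicitly copies every own variable down into a child space. */
    void copyInto(VarSpace child) {
        local.forEach(child::set);
    }
}
```

In this model, either linking the child to its parent or copying the variables down makes the sub-transformation independent of System, which is the isolation we were after.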
Overall, although we were finally able to test in isolation with BlackBoxTests after some hacks, we concluded that the Kettle code was not designed to be testable and can be called legacy code in Michael Feathers' terms.