Modern AI for the JVM – 0.0.2

18 Jul, 2023

Last month, our team introduced a new project, Xef. The project aims to offer the most convenient interface to modern AI techniques, including Large Language Models and image generation, so that developers can integrate them into larger systems and applications.

After the release of 0.0.1, we developed several proof-of-concept applications, both internally and for our clients. These experiences showed us what worked in the current libraries, what needed reorganizing, and what was missing to create powerful AI-based services. The result is the new 0.0.2 release of Xef, which you can get from Maven Central.

First-class support for Java

Initially, we targeted Kotlin and Scala, offering idiomatic interfaces for both. In this release, we take a step further and make Xef easily consumable from Java, which is undoubtedly the biggest language in the JVM space.

A library like Xef benefits greatly from concurrent work. Among the many concurrent and reactive abstractions available on the JVM, we've decided to use CompletableFuture. It has been part of the standard library since Java 8, and most libraries interoperate with it well (for example, kotlinx.coroutines and Cats Effect).
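To give a feel for that style, here is a minimal sketch of CompletableFuture chaining that uses only the JDK. The fakeModelCall method is a hypothetical stand-in for a slow remote call and is not part of Xef; the point is only how each step is attached to the previous one without blocking.

```java
import java.util.concurrent.CompletableFuture;

public class FutureChaining {
    // Hypothetical stand-in for a slow remote call; NOT an Xef API.
    static CompletableFuture<String> fakeModelCall(String question) {
        return CompletableFuture.supplyAsync(() -> "answer to: " + question);
    }

    // Each thenApply transforms the value once the previous stage completes.
    static CompletableFuture<Integer> answerLength(String question) {
        return fakeModelCall(question)
            .thenApply(String::trim)
            .thenApply(String::length);
    }

    public static void main(String[] args) {
        answerLength("books about Java")
            .thenAccept(n -> System.out.println("length = " + n)) // side effect at the end of the chain
            .join(); // block only here, so the program does not exit early
    }
}
```

Nothing blocks until the final join(); in a server application you would usually return the future to the caller instead.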

The entry point to Xef from Java is an AIScope, which initializes all the resources required to access AI services. Within that scope, you can call functions like prompt, which queries a Large Language Model. To link steps together, you use CompletableFuture's methods, such as thenAccept.

try (AIScope scope = new AIScope()) {
    scope.prompt("Give me a selection of books about " + topic, Book.class)
        .thenAccept(System.out::println).join(); // print the result, waiting before the scope closes
}

You may have noticed that in the prompt call, we indicate the type of result we want to obtain. If your class is a simple data transfer object, like the one below, you need no further annotations or preparations to use it.

public class Book {
    @NotNull public String title;
    @NotNull public String author;
    public int year; // a primitive can never be null, so no annotation is needed
    @NotNull public String genre;

    @Override
    public String toString() { ... }
}

Our roadmap includes developing Xef for Java, Scala, and Kotlin on an equal footing. We've ported all of our examples to the three languages; feel free to have a look if you are interested in how the functionality is consumed from each of them.

One for all, all for one

Our initial implementation relied on a Kotlin core consumed by Scala. This proved feasible but hard to maintain in the long run. Furthermore, providing an idiomatic Java API is hard when the main implementation uses Kotlin features such as extension functions and suspending functions, which have no direct equivalent in Java. For version 0.0.2, we've split our core functionality into a new xef-core library, designed with interoperability in mind.

The main outcome from the developer's point of view is a smaller dependency footprint. Xef no longer depends on the KotlinX serialization or coroutines libraries unless you use the Kotlin version. This change should make it easier to integrate Xef into a wider variety of applications.

As evidence of this new ability to collaborate across JVM languages, we are progressing quickly towards a version of Xef that integrates with Cats Effect. Meow! 😺

Local models

You're no longer limited to a cloud service (OpenAI or HuggingFace endpoints) if you want to use Xef. We've integrated GPT4All, a library that runs LLMs locally on your computer. The list of currently supported models already contains more than a dozen.

The following code downloads one of the available models and then uses it for prompting. Since models are quite big, typically several gigabytes, we require a stable path to save the file to, so that it is not downloaded again on every run.

val MODEL_URL = ""
val DOWNLOAD_PATH = "./ggml-gpt4all-j-v1.3-groovy.bin"

ai {
  GPT4All(MODEL_URL, DOWNLOAD_PATH).use { gpt4All ->
    gpt4All.promptMessage("what are the best known songs from The Beatles?")
  }
}
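
The "stable path" idea can be sketched independently of GPT4All. The Java snippet below is only an illustration of download-once caching, not how Xef implements it: ensureDownloaded and its parameters are hypothetical names, and the source of the bytes is passed in as a function so the logic can be tried without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;

public class ModelCache {
    /**
     * Fetches the model only if it is not already at the target path.
     * Returns true when a download happened, false when the cached file was reused.
     */
    static boolean ensureDownloaded(Path target, Callable<InputStream> open) throws Exception {
        if (Files.exists(target)) {
            return false; // already cached: skip the expensive download
        }
        // Write to a sibling ".part" file first so an interrupted download
        // never leaves a half-written file at the final path.
        Path partial = target.resolveSibling(target.getFileName() + ".part");
        try (InputStream in = open.call()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempDirectory("models").resolve("model.bin");
        // Fake source standing in for a multi-gigabyte download.
        Callable<InputStream> fake = () -> new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(ensureDownloaded(target, fake)); // first call fetches
        System.out.println(ensureDownloaded(target, fake)); // second call reuses the file
    }
}
```

With a real model, the open function would wrap an HTTP stream, for example `() -> URI.create(modelUrl).toURL().openStream()`.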

Communication with GPT4All goes through Java Native Access (JNA). At the time of writing, we've verified that everything works correctly on macOS and Linux.

The integration of GPT4All marks the first milestone for local models in Xef. Our roadmap includes looking at other technologies in this space, such as transformers, and figuring out the best processes for deploying and managing these models.

Tree of Thoughts (ToT)

Xef is only as useful as the pipelines you can build with it. As part of the development of 0.0.2, we've implemented a variant of Tree of Thoughts, a technique that improves results by guiding the LLM more deliberately. The example implementation in Kotlin applies this technique to checking a piece of code.

In this example, we use the LLM to reason much as a human would when attempting to solve a problem. We generate guidance and self-criticism for each proposed solution until an acceptable one is found, backtracking and trying different solutions up to a fixed maximum number of iterations. We plan to research and develop patterns similar to Tree of Thoughts and will make them available in Xef as they prove useful and mature.
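
The control flow described here (propose, criticize, backtrack, stop after a fixed number of iterations) can be sketched without any LLM. The Java code below is a toy under stated assumptions, not the Xef implementation: the search method and its parameters are hypothetical, and the "model" is replaced by pure functions over integers so the backtracking is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

public class TreeOfThoughtsSketch {
    // propose:  expands a partial solution into candidate next steps
    // critique: scores a candidate; scores <= 0 mean "rejected"
    // accept:   decides whether a candidate is a finished solution
    static <T> Optional<T> search(T root,
                                  Function<T, List<T>> propose,
                                  ToDoubleFunction<T> critique,
                                  Predicate<T> accept,
                                  int maxIterations) {
        Deque<T> pending = new ArrayDeque<>();
        pending.push(root);
        for (int i = 0; i < maxIterations && !pending.isEmpty(); i++) {
            T current = pending.pop();
            if (accept.test(current)) {
                return Optional.of(current);
            }
            List<T> candidates = new ArrayList<>(propose.apply(current));
            candidates.removeIf(c -> critique.applyAsDouble(c) <= 0.0);
            // Ascending sort: the best candidate is pushed last, so it is tried first.
            candidates.sort(Comparator.comparingDouble(critique));
            for (T c : candidates) {
                pending.push(c);
            }
            // If every candidate was rejected, the next pop() backtracks
            // to an alternative left over from an earlier step.
        }
        return Optional.empty(); // iteration budget exhausted or no options left
    }

    // Toy problem: reach `target` exactly by repeatedly adding 5 or 2.
    static Optional<Integer> reach(int target, int maxIterations) {
        return search(0,
            sum -> List.of(sum + 5, sum + 2),
            sum -> sum > target ? 0.0 : sum / (double) target, // overshooting is rejected
            sum -> sum == target,
            maxIterations);
    }

    public static void main(String[] args) {
        // Greedy "+5" first leads to a dead end (5 -> 10 or 7), so the search
        // backtracks to 2 and continues 2 -> 4 -> 6.
        System.out.println(reach(6, 10));
    }
}
```

In an LLM-backed version, propose and critique would each be a prompt, which is why the iteration cap matters: every loop iteration costs model calls.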

Keeping up with OpenAI

One of the interesting features of the first release of Xef was the integration of serialization to produce values of your own types directly. We then reported on our investigation of more complex techniques to improve this aspect, which we consider essential for tight integration between AI and the rest of your data sources.

A few weeks later, OpenAI announced the availability of its "function calls" mechanism. When set up correctly, it instructs OpenAI to respond following a particular JSON schema, which is exactly the functionality required for serialization! In the 0.0.2 release, we've integrated this feature, resulting in better prompt outcomes.
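
To make "a particular JSON schema" concrete, here is a rough sketch of how a schema could be derived from a plain Java class using reflection. This is an illustration only, under the assumption of a flat DTO with public fields; schemaFor and jsonType are hypothetical helpers, not Xef or OpenAI APIs, and real schema generation handles nesting, collections, and nullability.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

public class SchemaSketch {
    // Same shape as the Book DTO shown earlier, minus annotations.
    public static class Book {
        public String title;
        public String author;
        public int year;
        public String genre;
    }

    // Maps a handful of Java types to JSON Schema type names.
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == long.class
                || type == Integer.class || type == Long.class) return "integer";
        if (type == double.class || type == float.class
                || type == Double.class || type == Float.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        return "object";
    }

    // Builds a flat JSON Schema string from the public fields of a class.
    static String schemaFor(Class<?> type) {
        Field[] fields = type.getFields();
        // getFields() promises no particular order, so sort for stable output.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        String properties = Arrays.stream(fields)
            .map(f -> "\"" + f.getName() + "\":{\"type\":\"" + jsonType(f.getType()) + "\"}")
            .collect(Collectors.joining(","));
        String required = Arrays.stream(fields)
            .map(f -> "\"" + f.getName() + "\"")
            .collect(Collectors.joining(","));
        return "{\"type\":\"object\",\"properties\":{" + properties
            + "},\"required\":[" + required + "]}";
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(Book.class));
    }
}
```

A schema like this is what gets attached to the request, so the model's reply can be deserialized straight into the target type.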


A smaller feature that can have a big impact, depending on the scenario, is the ability to stream responses instead of being forced to consume them in full. For example, when talking to OpenAI, you can now receive the chat response in chunks and handle each one as it arrives, without blocking until the full response is available.
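
The difference between the two consumption styles can be sketched with plain JDK types. In the Java snippet below, streamReply is a hypothetical stand-in for a streaming chat endpoint (it is not an Xef or OpenAI API): it hands each chunk to a callback as soon as it is produced, while the returned future still completes with the full text for callers that want it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class StreamingSketch {
    // Hypothetical stand-in for a streaming endpoint: emits the reply in small pieces.
    static CompletableFuture<String> streamReply(String reply, int chunkSize,
                                                 Consumer<String> onChunk) {
        return CompletableFuture.supplyAsync(() -> {
            StringBuilder seen = new StringBuilder();
            for (int i = 0; i < reply.length(); i += chunkSize) {
                String chunk = reply.substring(i, Math.min(reply.length(), i + chunkSize));
                onChunk.accept(chunk); // the caller can render this immediately
                seen.append(chunk);
            }
            return seen.toString(); // full text, available only at the end
        });
    }

    public static void main(String[] args) {
        streamReply("Hey Jude, Let It Be, Yesterday", 8,
                chunk -> System.out.print("[" + chunk + "]"))
            .thenAccept(full -> System.out.println("\ncomplete: " + full))
            .join();
    }
}
```

For a chat UI, the callback is where partial text would be appended to the screen, which is what makes long answers feel responsive.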


Head over to the Xef website and @Xef_ai for the best source of documentation and news. For a broader overview of the project's goals, this presentation by Raúl Raja introduces the main features and some interesting use cases.
