
10 May 2019

Hijacking the IIB JVM to create long-running processes like QuickFixJ


Luke Staddon | Senior middleware developer

What do you actually do?

Ever since I first found myself labelled with the title “middleware developer”, I’ve noticed that people repeatedly ask me one question: What the heck’s a middleware developer?

 

It’s something that even comes from developers in other disciplines, as they send their messages and files off to a queue, API or directory, and then watch as the format magically changes and it appears where they’d like it to be, giving themselves a pat on the back for a job well done.

 

And so it is that the concept of middleware disappears into obscurity, only to surface when things fail. To paraphrase an episode of Futurama I once saw – when you do things right, people won’t be sure you’ve done anything at all.

 

In this blog, I hope to surface one of the more interesting technical challenges I’ve come across whilst on the job, and to allow the role of middleware developer to emerge from its dark backroom and into the light - if only for its 15 seconds.

 

The challenge: begin at the beginning and go on till you come to the end; then stop

The technical challenge at hand is to cater for FIX connections with our IIB-based ESB. Copious acronym usage aside, that sounds fairly innocuous.

 

Let’s unpick a few of the acronyms before we get rolling:

 

  • ESB (Enterprise Service Bus) 
    A design pattern based around decoupling source systems and
    their targets. Responsible for taking data from one place, transforming it in
    some way, and then sending that data on to another place
  • IIB (IBM Integration Bus)
    A tool that makes the process of standing up ESB programs
    or “flows” easier. Provides the ability to code Java, C#, ESQL and XSLT,
    amongst other things
  • FIX (Financial Information eXchange)
    A message format for describing trades.

 

With these terms defined, the challenge seems to be business as usual: take data in from various source systems, turn it into FIX using IIB, send it on, then do the reverse with the FIX responses. Piece of cake, right?

 

Wrong.

 

The conundrum lies in the way that both IIB and FIX engines work.

 

IIB is transaction-based and transitory in nature. A flow begins on receipt of the data, it runs to completion, and then disappears. No data is preserved between flows. FIX, however, requires that a socket connection be held open throughout the duration of trading hours.

 

Therefore, the riddle we must solve is how to create a long-running task in something which is fundamentally short-lived.

 

Starting the long running process – what road do I take?

The root of our issue is that IIB flows and their related data only exist for the lifetime of their transaction. So is there anything within IIB that we can tap into which runs for longer than the lifetime of a flow?

 

Fortunately for us there is: the JVM.

 

When standing up an IIB node, your applications are divided between several integration servers (more seasoned integrators may know these as execution groups). Each integration server comes bundled with its own set of allocated resources, including a JVM. This JVM persists for the lifetime of the integration server, so if you kill the integration server, the JVM goes with it.

 

The persistence of the JVM is key to solving our puzzle. In normal IIB operation any Java classes are cleaned up with everything else when the flow finishes running. We need to find a way to disconnect our Java class from the garbage collection of IIB.

 

Enter the factory pattern. Here’s an example of our IIB Java class, as attached to a Java Compute node. We’ll build this out as we progress through the article.

 

public class CALL_FIX_ENGINE_SF_CallFIX extends MbJavaComputeNode implements IFixCallback {

    //***
    //*** UDP variables.
    //***
    protected String engineId = (String) getUserDefinedAttribute("FixEngineId");

    //***
    //*** Main evaluation method, called as an entry point to node.
    //***
    public void evaluate(MbMessageAssembly inAssembly) throws MbException {
        // Set up the usual assemblies for our output message tree.
        MbOutputTerminal out = getOutputTerminal("out");
        MbMessage inMessage = inAssembly.getMessage();
        MbMessage outMessage = new MbMessage();
        MbMessageAssembly outAssembly = new MbMessageAssembly(inAssembly, outMessage);
        this.copyMessageHeaders(inMessage, outMessage);
        outMessage = this.setupOutputMessage(outMessage);

        // Obtain our engine from the engine factory.
        FixEngine engine = FixEngineFactory.getInst().getEngine(engineId, 60);
        if (engine == null) {
            // Throw exception here.
        }

        // Try to assemble our message class to be sent, based on the input tree.
        FixMessage fixMsg = null;
        try {
            fixMsg = new FixMessage(inMessage);
        } catch (FixException e) {
            // Throw exception here.
        }

        // Attempt to send a test message across the FIX connection.
        FixResponse resp = null;
        try {
            resp = engine.sendMessage(fixMsg, this);
        } catch (SessionNotFound e) {
            // Throw exception here.
        }

        // Assemble our response message.
        outMessage = this.populateResponse(outMessage, engine, resp, fixMsg);

        // Pass our message to the output terminal.
        out.propagate(outAssembly);
    }
}

This should look fairly familiar if you’ve ever dabbled in IIB Java. We’ve got the standard “extends MbJavaComputeNode” at the top, the “evaluate” method as called by IIB when the node is reached within the flow, and some manipulation of the input/output message trees.

 

The important part of this code snippet is the call to “FixEngineFactory”. This class is responsible for fetching a running instance of a FIX engine, based off of the popular QuickFixJ framework, and returning it to our IIB Java method. Note that the “engineId” parameter that we pass is used to determine which engine instance we grab.

 

Because the FixEngineFactory is based off of a singleton static instance, the second we step into the class, we are freed from IIB’s garbage collection. The implication of this is that when the flow finishes, this class and anything it references won’t be cleaned up.
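Stripped right down, the trick is nothing more than a statically held instance. Here’s an indicative skeleton (we’ll flesh the class out in the next section; the names and details are illustrative rather than the production code):

public class FixEngineFactory {

    // Held statically, so the instance lives for as long as the integration server's JVM does.
    private static final FixEngineFactory INSTANCE = new FixEngineFactory();

    private FixEngineFactory() {
        // Private constructor: the only way in is via getInst().
    }

    public static FixEngineFactory getInst() {
        return INSTANCE;
    }
}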

 

Thus, we have our long-running process.

 

Jumping between IIB flows – curiouser and curiouser

Our mystery FixEngineFactory class is somewhat of a black box at the moment, so let’s take a look at how it works under the covers.

 

When starting an engine up, a number of parameters are required by QuickFixJ which denote things like what to connect to, which FIX version to use, and what type of security to use. These values are stored within a configuration file which looks something like this (note that some fields have been given dummy values).

 

[DEFAULT]
ConnectionType=initiator
LogonTimeout=30
ReconnectInterval=30
ResetOnLogon=Y
FileStorePath=/var/esb/quickfix/config/
FileLogPath=/var/esb/quickfix/logs/
AppDataDictionary=/var/esb/quickfix/config/FIX50SP2.xml
TransportDataDictionary=/var/esb/quickfix/config/FIXT11.xml

[SESSION]
BeginString=FIXT.1.1
SenderCompID=SENDER_UAT
TargetCompID=TARGET_UAT
StartDay=sunday
EndDay=friday
StartTime=00:00:00
EndTime=23:59:59
HeartBtInt=30
CheckLatency=N
SocketConnectPort=8228
SocketConnectHost=128.0.0.1
UseDataDictionary=Y
DefaultApplVerID=FIX.5.0SP2
SocketUseSSL=Y
EnableProtocols=TLSv1.2
SocketKeyStore=/var/esb/quickfix/config/myjks.jks
SocketKeyStorePassword=SksPN0tR3aLh4yXUDZAPug
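If you’ve not used QuickFixJ before, here’s a rough sketch of how an engine wrapper might consume that file. SessionSettings, SocketInitiator and friends are standard QuickFixJ classes; the FixEngine wrapper itself (and how much it does) is an assumption for illustration only:

import quickfix.*;

public class FixEngine implements Application {

    private final SocketInitiator initiator;

    public FixEngine(String configPath) throws ConfigError {
        // Read the per-engine configuration file shown above.
        SessionSettings settings = new SessionSettings(configPath);
        MessageStoreFactory storeFactory = new FileStoreFactory(settings);
        LogFactory logFactory = new FileLogFactory(settings);
        MessageFactory messageFactory = new DefaultMessageFactory();

        // Open the socket and begin the logon sequence for the configured session.
        initiator = new SocketInitiator(this, storeFactory, settings, logFactory, messageFactory);
        initiator.start();
    }

    public void stop() {
        initiator.stop();
    }

    // quickfix.Application callbacks (bodies trimmed for brevity).
    public void onCreate(SessionID sessionId) { }
    public void onLogon(SessionID sessionId) { }
    public void onLogout(SessionID sessionId) { }
    public void toAdmin(Message message, SessionID sessionId) { }
    public void toApp(Message message, SessionID sessionId) throws DoNotSend { }
    public void fromAdmin(Message message, SessionID sessionId)
            throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, RejectLogon { }
    public void fromApp(Message message, SessionID sessionId)
            throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType { }
}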

 

Our FixEngineFactory provides a means of associating the location of this configuration file with the “engineId” that is passed to it. Due to the need for reliability, we store this association in a database, but you could just keep it as a hashmap in memory if need be.

 

Whenever a new engine is added, you simply need to create the new QuickFixJ configuration file, then add the link between the configuration file path and the engine ID to your FixEngineFactory class.

 

Getting hold of these configuration files is all well and good, but we don’t want each call to our interface to start a new FIX engine. This means that we also need a similar association between the engine ID and the Java class that wraps a running FIX engine - another hashmap.

 

The result of these look-ups is that whenever our FixEngineFactory “getEngine()” method is called, the factory checks for running engines and returns a reference to them if they exist. It then looks for a configuration entry and starts the engine up if it’s not currently running. Finally, it passes back a meaningful error if no configuration path exists for the given engine ID. Lovely. The IIB Java class can now directly interact with the engine to its heart’s content. 
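Pulling those pieces together, the factory might look something like the sketch below. Bear in mind this is illustrative: in reality the engine-ID-to-configuration association lives in a database, and waitForLogon() stands in for whatever start-up checking your engine wrapper needs.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import quickfix.ConfigError;

public class FixEngineFactory {

    private static final FixEngineFactory INSTANCE = new FixEngineFactory();

    // Engine ID -> path of that engine's QuickFixJ configuration file.
    private final Map<String, String> configPathsById = new ConcurrentHashMap<>();

    // Engine ID -> wrapper around an already-running engine.
    private final Map<String, FixEngine> runningEngines = new ConcurrentHashMap<>();

    private FixEngineFactory() { }

    public static FixEngineFactory getInst() {
        return INSTANCE;
    }

    // Called when a new engine is added: create the config file, then register the link here.
    public void registerEngine(String engineId, String configPath) {
        configPathsById.put(engineId, configPath);
    }

    public synchronized FixEngine getEngine(String engineId, int timeoutSeconds) {
        // 1. An engine is already running for this ID: hand back the existing reference.
        FixEngine engine = runningEngines.get(engineId);
        if (engine != null) {
            return engine;
        }

        // 2. Not running, but configured: start it up and cache the reference.
        String configPath = configPathsById.get(engineId);
        if (configPath == null) {
            // 3. Not configured at all: nothing to return, so the calling flow raises
            //    a meaningful error (see the null check in the Java Compute node above).
            return null;
        }

        try {
            engine = new FixEngine(configPath);     // starts the QuickFixJ initiator
            engine.waitForLogon(timeoutSeconds);    // hypothetical helper: block until the session is up
        } catch (ConfigError e) {
            return null;                            // the real factory records why start-up failed
        }

        runningEngines.put(engineId, engine);
        return engine;
    }
}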

 

Getting back into IIB – never once considering how in the world she was to get out again

So, we’ve freed ourselves from the shackles of IIB’s garbage collection, and it may now seem like we’re on the home straight. After all, we only need to get responses back and send them on their merry way to the target system.

 

Oh how wrong that is. FIX is a completely asynchronous protocol: you may get one response back, you may get none, or you may get many. Even the time delay on those responses isn’t guaranteed and can vary from setup to setup.

 

The impact of this is that the class which receives our FIX responses is completely disconnected from anything which relates to IIB. You can’t access the class, you can’t access the flows, and you can’t interact with the message trees. Even if we could, there would be no way of knowing which flow instance the response should relate to, as the engine deals with all requests. So, this begs the question: How in the world do we get back?

 
 

 

We actually have two solutions here, and which one is used is selected on a per-engine basis. The first approach caters for interfaces where we can guarantee one response per request and where the timeliness of the response matters. In such a scenario, we need to create a hook between the engine and the class which originally invoked it.

 

To do this, we create a simple interface called IFixCallback. It looks something like this:

 

public interface IFixCallback {

    // Called when a message is sent by our FIX engine.
    public void onMessageSend(int msgSeqNum, Message fixMsg);

    // Called when a message is received by our FIX engine.
    public void onMessageReceive(int msgSeqNum, Message fixMsg);
}

 

If we have our IIB Java class implement this interface, we can guarantee that a core set of methods is available once the class reaches the FIX engine. The more astute amongst you may have noticed that, in the original code snippet, the class “CALL_FIX_ENGINE_SF_CallFIX” passes itself to the “sendMessage()” call of the FIX engine.

 

By doing this, the FIX engine is able to cache the IFixCallback instance against a unique ID from within the FIX message that is being sent. When a response comes in, the FIX engine will check the same ID from the response against the set of cached IDs and retrieve the IFixCallback reference.

 

As IFixCallback exposes an “onMessageReceive()” method, the engine can now call back into the flow via this method, passing the contents of the response message that it received.
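Under the covers, the caching and dispatch boils down to something like the sketch below. It’s deliberately simplified (the real sendMessage() works with our FixMessage/FixResponse wrappers and returns a FixResponse), and the choice of ClOrdID as the correlation field is purely illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import quickfix.FieldNotFound;
import quickfix.Message;
import quickfix.Session;
import quickfix.SessionID;
import quickfix.SessionNotFound;
import quickfix.field.ClOrdID;
import quickfix.field.MsgSeqNum;

public class FixEngineCallbacks {

    // Correlation ID from the outbound message -> the flow instance to call back into.
    private final Map<String, IFixCallback> pendingCallbacks = new ConcurrentHashMap<>();

    private final SessionID sessionId;

    public FixEngineCallbacks(SessionID sessionId) {
        this.sessionId = sessionId;
    }

    // Called from the IIB Java class; caches the caller before sending.
    public void sendMessage(Message fixMsg, IFixCallback caller) throws SessionNotFound, FieldNotFound {
        // ClOrdID (tag 11) is used as the correlation ID purely for this sketch; the real
        // engine picks an appropriate unique field per message type.
        String correlationId = fixMsg.getString(ClOrdID.FIELD);
        pendingCallbacks.put(correlationId, caller);
        Session.sendToTarget(fixMsg, sessionId);
    }

    // Called from the QuickFixJ fromApp() callback when a response arrives.
    public void onResponse(Message fixMsg) throws FieldNotFound {
        String correlationId = fixMsg.getString(ClOrdID.FIELD);
        IFixCallback caller = pendingCallbacks.remove(correlationId);
        if (caller == null) {
            // Either the flow has timed out or the message doesn't correspond to a request;
            // the real engine surfaces this rather than silently dropping it.
            return;
        }
        int msgSeqNum = fixMsg.getHeader().getInt(MsgSeqNum.FIELD);
        caller.onMessageReceive(msgSeqNum, fixMsg);
    }
}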

 

There are also a few gotchas hidden in here. We don’t want flows hanging around indefinitely, so a flow can time out, causing its IFixCallback instance to be deleted; the engine must therefore also clean up any null references whenever it receives a response.

 

In addition, the FIX engine on the other side could send over a message which doesn’t correspond to a request (such as a disconnect message if something goes down). In such a case, the engine must surface the response as an exception when it cannot find a related unique ID in its cache.

 

 
Message received

 

Our second solution exists to cater for truly asynchronous, multi-response FIX interactions. In this scenario, we can no longer rely on the calling IIB Java class to be around once the response comes back.

 

Nor can we know to clean up a reference to that class just because we’ve received a response, as there may be more to come. Whilst not as elegant as the synchronous approach, in this setup we can simply use the MQ client bindings to put any responses straight on to an MQ queue and then have an entirely different flow read from that queue.

 

You could use a web API or drop folder here too, whichever is more appropriate for your setup. As a small aside, we also piggy-back off of the QuickFixJ configuration file and are able to inject our own parameters for things like details of the queue manager to be used; this means that all of our configuration can be held in a single place.
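As an indication of what that hand-off might look like with the IBM MQ classes for Java (the queue manager and queue names here are placeholders, and a real implementation would reuse connections rather than open one per response):

import com.ibm.mq.MQException;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQPutMessageOptions;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

public class FixResponsePublisher {

    // Puts the raw FIX response on a queue for a separate IIB flow to pick up.
    public void publish(String rawFixMessage) throws MQException, java.io.IOException {
        MQQueueManager qMgr = new MQQueueManager("ESB.QMGR");                    // placeholder queue manager
        try {
            int openOptions = CMQC.MQOO_OUTPUT | CMQC.MQOO_FAIL_IF_QUIESCING;
            MQQueue queue = qMgr.accessQueue("FIX.RESPONSE.IN", openOptions);    // placeholder queue
            try {
                MQMessage mqMsg = new MQMessage();
                mqMsg.format = CMQC.MQFMT_STRING;    // let the reading flow parse it as text
                mqMsg.writeString(rawFixMessage);
                queue.put(mqMsg, new MQPutMessageOptions());
            } finally {
                queue.close();
            }
        } finally {
            qMgr.disconnect();
        }
    }
}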

 

Using a few look-up tables, our response handling flow can now parse the response message and check it against a cache in order to determine where the response should go. Here are some example tables that we’re using to perform that look-up:

[Table 1: message type, sender ID and target ID mapped to the FIX tag used for correlation – e.g. message type “j” maps to tag 379.]
[Table 2: cached correlation values from each request, stored against an “ApplicationContext” routing target and a timestamp.]

When a response comes in we obtain its message type, target ID and sender ID. We use these to look up a “Tag” value in the first database. So, if we received a message type “j” we would obtain a tag value of “379”.

 

This tag value tells us which field to pull out of the response message. When the corresponding request message originally came in, we performed a similar set of steps, taking the sender ID, target ID and message type, then using those to obtain the value of a tag. For example, we could have obtained tag “571” for a message type of “AE”.

 

The value of field 571 from the request message was cached in the second database table upon arrival, along with details of where to send the response. In our case, we obtain this from something called “ApplicationContext”, but you could equally have an MQ queue or URL in here.

 

When the response comes back and we’ve obtained the value of the appropriate tag, we can query the second database table in search of a “CorrelationValue” which matches that tag value. When we find a match, we can return the “ApplicationContext” for that entry, which tells us where to route it.
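Expressed as a JDBC sketch, that two-step look-up reads something like this (the table and column names are illustrative, inferred from the description above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import quickfix.FieldNotFound;
import quickfix.Message;
import quickfix.field.MsgType;
import quickfix.field.SenderCompID;
import quickfix.field.TargetCompID;

public class FixResponseRouter {

    private final Connection db;

    public FixResponseRouter(Connection db) {
        this.db = db;
    }

    // Returns the ApplicationContext telling us where to route the response, or null if uncached.
    public String route(Message response) throws SQLException, FieldNotFound {
        String msgType = response.getHeader().getString(MsgType.FIELD);
        String senderId = response.getHeader().getString(SenderCompID.FIELD);
        String targetId = response.getHeader().getString(TargetCompID.FIELD);

        // 1. Which tag holds the correlation value for this message type and counterparty?
        int correlationTag;
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT correlation_tag FROM fix_tag_lookup WHERE msg_type = ? AND sender_id = ? AND target_id = ?")) {
            ps.setString(1, msgType);
            ps.setString(2, senderId);
            ps.setString(3, targetId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no look-up entry for this message type
                }
                correlationTag = rs.getInt(1);
            }
        }

        // 2. Pull that tag's value out of the response, e.g. tag 379 for message type "j".
        String correlationValue = response.getString(correlationTag);

        // 3. Find the cached request entry and return where the response should go.
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT application_context FROM fix_correlation_cache WHERE correlation_value = ?")) {
            ps.setString(1, correlationValue);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}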

 

As with the synchronous approach, there are a number of other things which need to be catered for here, such as cleaning out the database caches so that they don’t fill up (this is why we have the timestamps there) or dealing with responses that aren’t tied to cache entries. 
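For the housekeeping side, a small job along these lines, run from a timer flow or scheduler, is enough to keep the cache bounded (again, the table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

public class CorrelationCacheHousekeeper {

    // Deletes cache entries older than the retention period and returns how many were removed.
    public int purge(Connection db, int retentionDays) throws SQLException {
        Timestamp cutoff = Timestamp.from(Instant.now().minus(Duration.ofDays(retentionDays)));
        try (PreparedStatement ps = db.prepareStatement(
                "DELETE FROM fix_correlation_cache WHERE created_ts < ?")) {
            ps.setTimestamp(1, cutoff);
            return ps.executeUpdate();
        }
    }
}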

 

Extensions and addendums - what an idea. A crazy, mad, wonderful idea

There’s a whole heap of functionality that I’ve not been able to touch on during this article: the engine can start itself up after network outages; the addition of a new FIX engine takes a matter of minutes with very few changes; we can validate both data structure and content for any FIX version almost out of the box; trace can be toggled on the fly (even in production); and we can route copies of the messages at various points in the flow, which allows for easy generation of reports.

 

In the day-to-day operation of the bank here at Investec, our whole MiFID2 trade reporting framework is built off the back of this interface. To date nothing has gone wrong with it and it has managed to self-correct and continue running through several major network outages with no human intervention.

 

Truly it’s as if we never did anything at all – it just works. While this blog only really scratches the surface, I do hope that I’ve helped to shine a light on the kinds of challenges that we wrestle with as middleware developers.

 

Next time you kick off a call to another application and it all comes back seamlessly, spare a little thought for us middleware folk - we were probably involved at some point along the chain.
