Why you should always check your requirements

Many years ago I worked on a refresh of a simulation test tool. The tool presented a map display and you could place events on the map (e.g. a boat appears at location 50.1,97.3 at 13:00 and reaches 58.3,96.45 at 15:00) to construct test scenarios. The idea was that you would build a set of scenarios to confirm a system responded correctly to events (a system test tool).

The point of the refresh was to take the 'engine' of the tool (written in C++) and replace the ancient UI with an internal pluggable platform that embedded a map interface into the Eclipse Rich Client Platform (RCP). We ported the requirements from the original tool, and one of those was:
The tool can open Scenario XYZ

We obtained a copy of the scenario and added it to our weekly test pack runs, but we quickly ran into problems.

Getting Scenario XYZ To Run 

Scenario XYZ had thousands of events and the first few times we tried to open it the entire UI would crash.

Single Threaded Application


Each service was designed to be triggered from the UI thread, so map updates were locked to a single thread. However, the engine ran within its own process, so map updates could be triggered by either UI events or the engine.

While we weren't the first team to use multiple threads, the size of the scenario meant engine updates could take significant time, so UI events would often conflict with engine updates, causing the map display to destroy objects it was still using and crash.

I implemented the Active Object design pattern: all events were stored in a thread-safe list, and a scheduled service pulled all data from the list and then interfaced with the map. This ensured only a single thread was ever interacting with the map.
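A minimal sketch of that pattern, assuming a String event payload and hypothetical names (the real code worked with engine event objects):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Active Object sketch: producers on any thread enqueue events;
// a single scheduled consumer drains the queue and touches the map.
public class MapUpdateService {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Safe to call from the UI thread or an engine callback thread.
    public void submit(String event) {
        pending.add(event);
    }

    // Drain everything queued so far; always runs on one thread.
    public List<String> drain() {
        List<String> batch = new ArrayList<>();
        String event;
        while ((event = pending.poll()) != null) {
            batch.add(event);
        }
        return batch;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            for (String event : drain()) {
                // applyToMap(event); // only ever reached from this one thread
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
    }
}
```

Because only the scheduled task ever calls into the map, the map never sees concurrent access, regardless of how many threads submit events.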

Simplified Wrapper and Interface Generator

Now the application no longer crashed when we opened or ran scenarios. However, the UI became increasingly slow to respond as the size of the scenario increased. The issue was SWIG, which we used to map C++ objects held in the 'engine' into Java. Calling a C++ method from Java was incredibly slow; rendering the events on the map required several C++ method calls per event, and for Scenario XYZ this could take multiple seconds.


After weeks of performance optimisations, we decided to build a cache of the objects on each side of the interface. Each cache used plain objects in that side's language, containing fields for all the information that side used.

Events from either side would use the cache to build a Data Transfer Object (DTO), which could be used to correlate the event data with the object held in the opposing side's cache.
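On the Java side, the idea might look like the sketch below. The field names are illustrative assumptions, not the real schema; the point is that lookups hit a plain-Java cache rather than crossing the SWIG boundary:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java DTO mirroring the fields this side needs, so the SWIG
// boundary only has to be crossed once, when the DTO is first built.
public class EventDto {
    public final long id;     // correlates with the object in the other side's cache
    public final double lat;
    public final double lon;
    public final long timeMs;

    public EventDto(long id, double lat, double lon, long timeMs) {
        this.id = id;
        this.lat = lat;
        this.lon = lon;
        this.timeMs = timeMs;
    }
}

// One cache per side of the interface; reads by id never touch SWIG.
class DtoCache {
    private final Map<Long, EventDto> byId = new ConcurrentHashMap<>();

    public void put(EventDto dto) {
        byId.put(dto.id, dto);
    }

    public EventDto get(long id) {
        return byId.get(id);
    }
}
```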

This meant all interactions with the SWIG interface happened within a single service; admittedly, it also defeated the purpose of using SWIG over the plain Java Native Interface.

The end result was that no other part of the application needed to call through the SWIG interface, and the time for a simulation event to be processed by every service on the message bus was measured in milliseconds.

Multithreaded Map Access

Now every scenario could be opened and modified by the user without much issue. However, we had one last problem: when we played the scenario, the displayed picture of events would fall behind the scenario time (e.g. 15 minutes into the scenario you would see the picture at 10 minutes; at 1 hour the picture would show the view from 40 minutes, and so on).

One component of the architecture was titled "Map Interface". This was a Java Native Interface wrapper for a map-drawing library (think Leaflet, ESRI, etc.).

Back then mapping libraries were very basic, so to draw complex pictures we had to pass in bitmap images. Our scenario was large enough that transferring the binary data via the interface took longer than the interval between updates (e.g. we generated updates every second but it took 1.1 seconds to draw them).

The first 'fix' was to move from a list of updates to a map: rather than appending new updates and performing them in sequence, the map held only the latest update for each object. This helped the view snap to the situation at the current scenario run time.
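A sketch of that change, with a String standing in for whatever the real update payload was: keying by object id means a stale update is simply overwritten before a slow consumer ever sees it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Instead of a list replayed in order, keep only the newest update per
// object id, so a consumer that falls behind snaps to the current state.
public class LatestUpdateBuffer {
    private final Map<Long, String> latest = new LinkedHashMap<>();

    public synchronized void offer(long objectId, String update) {
        latest.put(objectId, update); // silently supersedes any stale update
    }

    // Hand the current snapshot to the drawing thread and reset.
    public synchronized Map<Long, String> takeAll() {
        Map<Long, String> snapshot = new LinkedHashMap<>(latest);
        latest.clear();
        return snapshot;
    }
}
```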

The second fix was recognising that our map library supported updating existing objects, so rather than purging everything and transferring bitmaps through the layer to be redrawn, we started updating objects in place within the map.

The last fix was recognising that each map layer was independent, so we could spawn a thread for each one. This meant that if one layer became blocked, the map would still be responsive and the other layers would continue to receive updates.
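A minimal sketch of per-layer threading, using one single-threaded executor per layer (layer names and the Runnable payload are illustrative assumptions):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One single-threaded executor per map layer: a blocked layer only
// stalls its own queue, while other layers keep receiving updates.
public class LayerDispatcher {
    private final Map<String, ExecutorService> layers = new ConcurrentHashMap<>();

    public void dispatch(String layerName, Runnable update) {
        layers.computeIfAbsent(layerName,
                name -> Executors.newSingleThreadExecutor())
              .submit(update);
    }

    // Stop all layer threads, waiting for queued updates to finish.
    public void shutdownAndWait() {
        layers.values().forEach(ExecutorService::shutdown);
        for (ExecutorService layer : layers.values()) {
            try {
                layer.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Within a layer, updates still run in order on one thread, so the earlier single-threaded-map guarantee is preserved per layer.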

The tool can open Scenario XYZ

At this point you might be asking what any of this has to do with checking your requirements and I'm going to take you back to the requirement:

The tool can open Scenario XYZ 

When we demonstrated the tool opening and running Scenario XYZ to the old tool's support team and the senior user, they were stunned! The scenario had been created because a customer had two requirements:

The tool can create scenarios of X objects

The tool can run scenarios of X/100 objects

So the old team created Scenario XYZ to show the tool could create and open scenarios containing X objects, but they never needed to demonstrate actually running that scenario, or even the tool being particularly usable with a scenario that large.

At this point one of them showed us the original tool opening Scenario XYZ: the entire computer slowed down, every UI event had a 2-5 second delay, and when they tried to run the scenario the application crashed.

We had surpassed this capability with our first step (Single Threaded Application), which took us 2-3 weeks to complete. I spent a further 3 months implementing the remaining steps.

You could argue our work made a much better product, but was it actually required?

NO
