Why the 'Virtual Cockpit' model risks IT turbulence
Are you in IT Ops or Application Support at an organization that runs highly consequential apps such as trading platforms in capital markets? If so, consider yourself the pilot. Read on to see why a ‘Virtual Cockpit’ can introduce unwanted points of failure throughout your flight.
Virtual Cockpit to manage complex IT businesses
Look into the cockpit of an aircraft and you will see what appears to be a bewildering array of dials and screens. It is not bewildering to the pilots, who are trained on the aircraft and know how to use the data to do their jobs.
Managing a large and complex IT infrastructure that is mission-critical to your business is similar to flying an aircraft. To do it safely, you need all the necessary data presented to you in the cockpit, with some help to show you what to focus on.
As the pilot, you need to know the status and health of the key elements of the aircraft like the jet engines, the landing gear, and the cabin pressure. You need to know your fuel levels and rate of fuel burn. You need to know ground speed (how long is it going to take to get to the destination) and airspeed (how fast is the air going over the wings). You need to know the direction and altitude and any significant changes in these. You need to know the terrain of the ground underneath you and what else is in the skies around you.
All this data comes from different sources. Most of it is captured on the plane, and some is sent to the aircraft from outside sources such as air traffic control or weather services. All of it is presented to the pilots to allow them to fly the plane safely.
Move to Virtual Cockpit
Over the last few years, most vendors of the instrumentation used to run large, complex IT estates have moved their monitoring tools to SaaS-only delivery models. The User Interface (UI) may look better and be easier to navigate, configure, and adjust to show what you want to see. The analysis of the data might flag potential problems or predict how metrics will behave in the future. This feels like a significant step forward. It demos well when you are buying.
Now, imagine that you are a pilot flying an aircraft and I tell you that all the telemetry will be sent to a data center somewhere in the USA, and that the virtual cockpit will be your view of the data. You should be concerned! What if the connectivity between the aircraft and the data center is disrupted? What if the data center is having problems and can’t collect, analyze, or present your data back to your cockpit? You’d be flying blind.
You can’t fly a plane that depends on the connectivity to the data center and on the performance of the data center itself. Doing so introduces risks that you cannot control. Each link in the chain may be very reliable on its own, but the cockpit works only when all of them work, so its combined availability is roughly the product of the individual availabilities and can be no better than that of the weakest link. The risk has gone up, and you don’t know by how much.
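To put a number on that compounding, here is a minimal sketch in Python. It assumes that the links fail independently and that every one of them must be up for the cockpit to work. The availability figures are hypothetical, chosen only for illustration, and are not measurements of any vendor or network.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Availability of a chain in which every link must be up.

    Assumes the links fail independently of each other.
    """
    return prod(availabilities)


# Hypothetical availability of each link (illustrative values only).
chain = {
    "on-board sensors and agents": 0.9999,
    "network link to the data center": 0.999,
    "SaaS data center": 0.999,
}

total = combined_availability(chain.values())
downtime_hours = (1 - total) * HOURS_PER_YEAR

print(f"combined availability: {total:.4%}")
print(f"expected time flying blind: {downtime_hours:.1f} hours per year")
```

With these made-up figures, three links that each look excellent on paper add up to roughly 18 hours a year without a working cockpit. In practice the picture is usually less clear, because the availability of the network path is rarely something you can measure or control.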
‘Real-time’ data
You sensibly ask: “Is the display real-time?” – you are told it is. So, you ask: “What’s the delay in data from the aircraft to the data center and back to the aircraft?” – you are told it varies and it is not under anybody’s direct control. Then you ask about the processing time of the data before the virtual cockpit is updated – you are told it varies depending on the rate at which you are sending data and the type of data you are sending. Logs take much longer to index and search than metrics or traces. How much longer? No one knows exactly, since the data center is shared with all the other aircraft using the virtual cockpit. So, you decide to try it and see for yourself – and you find that the delay is more than 30 minutes.
As a pilot, are you prepared to fly an aircraft where some of the data only updates every few minutes or the delay in other key data is 30 minutes or more? Can you land an aircraft with a 30-minute delay in the altimeter or a 1-minute update on the airspeed? … Get ready for a bumpy ride.
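One way to make the question concrete is to write down how stale each instrument is allowed to be and compare that with what you actually observe. The sketch below is illustrative only: the one-second freshness budgets are hypothetical, and the observed delays are simply the figures from the landing example above.

```python
# Hypothetical freshness budgets: the maximum acceptable age of a reading, in seconds.
FRESHNESS_BUDGET_S = {
    "airspeed": 1.0,
    "altimeter": 1.0,
}

# Delays taken from the landing example: a 1-minute update on the airspeed
# and a 30-minute delay in the altimeter.
observed_delay_s = {
    "airspeed": 60.0,
    "altimeter": 30 * 60.0,
}


def over_budget(observed, budget):
    """Return each instrument whose delay exceeds its budget.

    The value is how many times over budget the instrument is.
    """
    return {
        name: delay / budget[name]
        for name, delay in observed.items()
        if delay > budget[name]
    }


for name, factor in over_budget(observed_delay_s, FRESHNESS_BUDGET_S).items():
    print(f"{name}: {factor:.0f}x over its freshness budget")
```

The same comparison applies to an IT estate: agree a freshness budget for each class of signal (metrics, traces, logs), measure the real end-to-end delay yourself, and treat any signal that is over budget as an instrument you cannot fly by.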
Don’t lose control of the plane
The move to SaaS delivery of many software products is continuing at pace, as it removes complexity and costs. But sometimes, when availability and performance are mission-critical, it isn’t the right option.
The largest monitoring company in the world is Datadog, which only offers a SaaS delivery model. On 8th March 2023, it experienced a worldwide outage. Many services were down for 8 hours, and some took 24 hours to recover. If you use Datadog’s log management service to monitor your logs, and you are looking for errors to be reported in the logs, it can take over 1 hour from when the error is written into the log on your server until the console flags that the event has occurred.
Instead, you may need the option to run your instrumentation software on board your aircraft and to keep all the data within the plane. Such options are becoming scarcer, and few understand the consequences until it is too late.