The Piccadilly line is currently experiencing severe delays due to the non-availability of trains

Number of tube trains running on the Piccadilly Line between 21 November and 20 December 2016. Unfortunately, some data was lost between the 8th and 9th of December. The numbers on the x-axis are the day of the month and the hour.

It seems that the Piccadilly line is only just getting back to normal after a problem with flat spots on the wheels of 50% of the trains: [TfL Link].

Looking at the data for the number of trains running, the problem seems to stem from around Thursday 24th November, when the numbers started to drop off. Analysing this data is problematic because of the noise inherent in the data collection process and the need to take weekends into account. The launch of the Night Tube on the Piccadilly line on Friday 16th December complicates the picture further.

Plotting the total number of tubes running over a 24 hour period as a moving average makes things a bit simpler:


The data break is immediately evident on the 8th and 9th of December, but the numbers can be seen dropping from the 23rd to the 28th of November. The 5th, 6th and 7th of December (Mon to Wed), just before the data break, are particularly bad. It’s interesting to note that there were more tubes running on the 10th and 11th of December (Sat/Sun) than on the 5th and 6th (Mon/Tue), which seems to be the worst period for the Piccadilly Line.
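As a rough illustration of the smoothing described above, here is a minimal sketch of a trailing 24-hour moving average. It assumes the data has already been reduced to one train count per hour, with `None` standing in for hours where the collection failed; the function name `rolling_24h_mean` and that input layout are my assumptions for the example, not the code used to produce the plot.

```python
from collections import deque
from typing import Iterable, List, Optional


def rolling_24h_mean(
    hourly_counts: Iterable[Optional[float]], window: int = 24
) -> List[Optional[float]]:
    """Trailing moving average over hourly train counts.

    None marks an hour with no data. A result is only produced once the
    window is full and contains no missing hours, so gaps in the data
    show up as gaps in the smoothed series rather than as false dips.
    """
    buf: deque = deque(maxlen=window)  # holds the most recent `window` hours
    smoothed: List[Optional[float]] = []
    for count in hourly_counts:
        buf.append(count)
        valid = [v for v in buf if v is not None]
        if len(buf) == window and len(valid) == window:
            smoothed.append(sum(valid) / window)
        else:
            smoothed.append(None)
    return smoothed
```

Leaving the gaps as `None` is a deliberate choice here: averaging across the lost hours on the 8th and 9th would make that period look like a real drop in service.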

While this is quite an interesting exercise, the real value of this type of analysis lies in showing the effect on the commuter. Gaps of 15 minutes between tubes were reported, and sections of the line had no service at all. What I need to develop now is a way of generating these spatial analytics automatically from the data as we collect it.

MapTube is Back

Somebody managed to do a SQL injection attack on MapTube recently, so it hasn’t been working properly for a while. Now that the vulnerability has been identified and fixed though, it’s back to normal again.

Looking through the logs, they’ve spent the best part of a month trying to do this, so I wish I had seen it earlier. It’s also been flagged by the main firewall as malicious.

I’ve had this idea for a while: we should be doing some spatial analysis on where all these attacks are coming from. The attackers use groups of IP addresses which they change every day, but we now have years’ worth of data for a number of different web servers which could be analysed. The same applies to all the spam email that we’re filtering out. Just looking at the web server logs for this morning from midnight to 9am, there were 15 potential attacks, and there were 39 the day before, so there’s a lot of potential data there if we started putting it all together. It’s all just information theory.
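A first step towards that kind of analysis could be as simple as tallying suspicious requests per day and per source IP. The sketch below assumes the logs are in the Apache-style common log format and uses a deliberately crude pattern to flag likely SQL injection attempts; the regular expressions and the function name `count_potential_attacks` are illustrative assumptions, not the rules the firewall or MapTube actually use.

```python
import re
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import unquote_plus

# Assumption: common log format, e.g.
# 203.0.113.5 - - [20/Dec/2016:03:12:45 +0000] "GET /page?id=1 HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] "(?P<request>[^"]*)"'
)

# Assumption: a crude, far from complete, signature for SQL injection probes.
SUSPICIOUS = re.compile(
    r"(union\s+select|or\s+1\s*=\s*1|information_schema|sleep\s*\()",
    re.IGNORECASE,
)


def count_potential_attacks(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (attacks per day, attacks per source IP) for the given log lines."""
    per_day: Counter = Counter()
    per_ip: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        # Decode %20, + and similar so the pattern sees the real query text.
        request = unquote_plus(match.group("request"))
        if SUSPICIOUS.search(request):
            per_day[match.group("day")] += 1
            per_ip[match.group("ip")] += 1
    return per_day, per_ip
```

The per-IP counts are what would feed the spatial side of things: each address would still need to be mapped to a location using whatever IP geolocation data is available, which is not shown here.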