Night Tube Launches

NightTubes_20160822
Stacked area chart showing total number of tubes running on all lines from midnight on Thursday 18th August through to midnight on the following Monday morning.

The first night tubes started running last Friday evening, so I couldn’t help wondering what that does to the number of tubes running. The graph above shows the total number of tubes running from Thursday 18th through to Monday 22nd August. My first reaction was, “what night service?”, but then I read the TfL statement and realised that this is only the Central and Victoria lines, with the rest to follow in the autumn. The arrows on the diagram above show where the extra services show up in the statistics.

The following graph of only the Central and Victoria lines shows it a lot better:

NightTubes_20160822_CV
Stacked area chart showing the number of tubes running on the Central and Victoria lines only.

A total of around 20 tubes running through the night, each with a capacity of around 800 passengers (TfL Rolling Stock), adds significant extra capacity; it’s just that the peak rush-hour service is so much bigger by comparison. I also wonder whether they were testing the Central line on the Thursday evening, given the large number showing up there overnight. Normally we see only a small residual number of tubes moving during the shutdown period, which I’ve always put down to engineering services.
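As a quick sanity check on those numbers, the overnight capacity works out as follows (purely illustrative arithmetic, using the rough figures quoted above, not official TfL numbers):

```python
# Rough overnight capacity from the figures quoted above: ~20 trains
# running, ~800 passengers per train (TfL rolling stock data).
# Purely illustrative arithmetic, not official numbers.
night_trains = 20
capacity_per_train = 800

night_capacity = night_trains * capacity_per_train
print(night_capacity)  # 16000 passengers at any one time
```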

What is going to be interesting is to see how the service adapts to usage over time.

Now That’s What You Call a Tube Strike

Number of tubes running on the London Underground at 08:30 on 9th July 2015. The time scale runs from 16:00 on Wednesday 8th until 08:30 on Thursday 9th.

The graphic says it all really. The width of the stream graph shows the total number of tubes running, with a breakdown by each tube line displayed in the regular line colour (red=Central Line, green=District Line etc).

Basically, there’s nothing running apart from a “special service” on the Waterloo and City line. I’ve never seen it like this before; in previous strikes they’ve always managed to run about a 30% service.

Despite being told that everything was going to shut down completely by 6pm last night, it appears that the shutdown began around 6pm and wasn’t complete until just after 9pm, though I wouldn’t like to have been trying to get home during that time. From the pictures on the news last night it looked like complete chaos, which reinforces the fact that we need to establish a method for measuring how many people the tube network is carrying (i.e. the “crush factor”).

Bus Networks and Other Complicated Data

As part of my PhD I’ve been looking at a lot of real-time data about tubes, buses and trains. In fact, I probably started from the point where I already had a lot of data and was wondering what to do with it. While I would not class this as “Big Data”, the complex nature and real-time element make it difficult to analyse and visualise.

21,987 bus stops (blue cubes), 53,896 links (white lines) and the Greenwich Meridian (bright green line).

The image above shows the bus network displayed using my virtual Earth viewer. Having previously done a lot of work on the tube network, it only took about half a day’s work to get the buses into the system. One reason for this is that I’ve implemented an agent based modelling system (ABM) similar to NetLogo, so I just have to write the code to load agents and links from CSV files (easy!). The simulation is a bit harder to do, but not much.

Although I knew the bus data was about 10 times bigger than the tube data, what I hadn’t bargained for was the fact that there are 21,987 bus stops (agent nodes), 53,896 route points (links) and up to 7,000 live buses (moving agents). The other weird thing is that TfL seem to be missing 409 bus stops from their master list, as there are stops contained in the routes that I don’t have positions for. There are also a lot of invalid lines in the data that look as if there has been an error extracting the data from a database. I had a really interesting discussion with last Thursday’s visitor about this, because he couldn’t believe it. I think I’m right in saying that there is a theory about complexity along the lines of “any sufficiently complex data, analysed deeply enough, will always show inconsistencies”. In other words, we just have to deal with it.

Putting in some buildings gives a much better appreciation of just how big the network is:

GeoGLBus_20150511_205417
Some buildings, a river and lots of bus stops.

If you look closely, you can just see the bus stops in the river, which are pontoons for the boats. The coloured cubes representing the stops are 100m along each axis. It now all gets worse, because the graph containing 53,896 route points has to be fragmented using the road network and a routing algorithm to make the buses travel along the roads or rivers. I’ll implement this just as soon as I get the data displaying at a reasonable frame rate.
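As a sketch of the first step in that fragmentation, snapping each route point to its nearest road network node might look something like this (Python, brute force, with made-up coordinates; a real implementation would use a spatial index rather than a linear scan):

```python
import math

# Hypothetical first step in fragmenting the route graph: snap each
# bus route point to its nearest road network node so a routing
# algorithm can then run over the road graph. Brute force nearest
# neighbour for illustration; coordinates are invented.

def nearest_node(point, road_nodes):
    px, py = point
    return min(road_nodes, key=lambda n: math.hypot(n[0] - px, n[1] - py))

road = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(nearest_node((90.0, 10.0), road))  # (100.0, 0.0)
```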

To really put things into the correct scale, and with the Highways Agency’s UK-wide road network in mind (which is also on my list):

GeoGLBus_20150510_191155
You are here.

I just like the Winter Blue Marble image, which you don’t see very often. The Google Earth images are all the Blue Marble composite.

So, getting back to the PhD topic, which is about the algorithms that make all of this work: I obviously need to improve the graphics a lot, but most of the building blocks are now in place. I’m a graphics programmer, so the graphics engine is obviously hacked to pieces and I need to tidy it up. The numbers in the top left of the images are the frame rates, which should be a lot higher than about 4–6 frames per second. The geometry representing the bus routes (links) is a mesh with over 6 million points, and it’s taxing my graphics card a bit. Top of the range GPUs these days will do over a teraflop, which used to be supercomputing territory not long ago, but use them in 64-bit mode and the performance drops drastically. I still have some shader tricks to use which will improve the ABM performance a huge amount.

Finally, I have to answer the question, “what’s the point of it all?”. I wanted to analyse real-time dynamic data using a system that allowed me to explore the data visually in both time and space. Why is there a bus route from NW London going diagonally across to the SE in the first image? You can just see the white line going through the buildings, but it looks like an error in the data. Programming the model to simulate the buses allows you to explore the real-time element, but the aim is to have more in the way of analysis and data-mining than the simple widgets you get with NetLogo.

Now I have two networks, my first question is to look at how they compare to each other. I have the whole of 2014 to use for the analysis and a tool which (might) now let me do it.

Number of buses running.
Number of tubes running.

I’ve always wondered whether the peaks in the bus, tube and train numbers occur at the same times, and whether there is any spatial variation.

Just as an update, here’s a movie I uploaded which shows the bus network much better than any words can:

Ten Car Trains

I had a pleasant surprise on my commute home last night when I found myself on one of the new ten-car trains that South West Trains have bought. They’ve coupled two brand-new cars onto the front of their existing stock, so we could all see it was a new train as it approached the platform. Being able to get a seat was also a new experience.

Now, in transport modelling terms, that means they have potentially increased their passenger-carrying capacity by 25%: if they were running 8-car trains before and can now run 10-car trains, that’s a significant increase.
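The 25% figure is simple arithmetic, assuming the cars are otherwise identical:

```python
# Capacity increase from running 10-car instead of 8-car trains,
# assuming identical cars per train.
old_cars, new_cars = 8, 10
increase = (new_cars - old_cars) / old_cars * 100
print(increase)  # 25.0
```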

What I don’t know is how many of these trains they’ve got, or when and where they’re running them. I’ve looked at the Network Rail data feed, but it doesn’t give you the size of the train. I need to look into the data a bit more deeply, as there might be a physical train identifier that I’ve missed. Each train has a “leading car id”, but I can’t find this in the data feed. Even a map of all the stations that have been converted to take 10 cars would be interesting.

Happy Birthday MapTube!

Today is exactly 7 years since MapTube was launched at the “Digital Geography in a Web 2.0 World” conference at the Barbican in London as part of the GeoVUE project.

MapTubeBirthday7

To mark the event I’ve added a new feature to the homepage which should make it more dynamic. Now, if I blog about any maps, they will automatically appear on the MapTube front page with the text, images and map links extracted directly from the RSS feed. Along with the ‘topicality index’, which places maps for data which is currently in the media on the front page, this should keep the website up to date with the latest events. It’s also telling me what information we don’t currently have so we can gradually fill the gaps in our knowledge.
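The feed extraction can be sketched as follows (Python, with a made-up minimal RSS 2.0 item for illustration; the actual MapTube front-page code may work differently):

```python
import xml.etree.ElementTree as ET

# Sketch of pulling title, link and description out of an RSS 2.0
# feed, as used to populate a front page from blog posts. The feed
# content here is invented for illustration.

RSS = """<rss version="2.0"><channel>
<item><title>Bus Strike Map</title>
<link>http://example.org/post1</link>
<description>Heatmap of buses running.</description></item>
</channel></rss>"""

def extract_items(rss_text):
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in root.iter("item")
    ]

print(extract_items(RSS)[0]["title"])  # Bus Strike Map
```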

I’m hoping to follow this up in the next month with some real-time data feeds and more interactivity on the maps.

Links:

http://www.casa.ucl.ac.uk/barbican/presentation6.html

http://www.casa.ucl.ac.uk/barbican/

http://www.casa.ucl.ac.uk/news/newsStory.asp?ID=122

Bus Strike January 13th 2015 (Update)

Just for completeness, I’ve updated the two graphs of the numbers of buses running on 13th January with the complete set of data up to 23:59 that night.

Numbers of buses running on the 13th (strike) against the previous day
Ratio of number of buses running on the 13th (strike) to the previous day

The first graph shows the total number of buses running on Tuesday 13th (red) against the previous day (blue). The second graph shows the ratio red/blue × 100%, i.e. what percentage of a normal day’s buses were running. It levels off quite definitely at around 24% and never reaches the 33% which is the official TfL figure for the percentage of services running. The mean value from 7am to midnight works out as 23.7%, so either TfL have a different way of calculating the figure, or our data is wrong. This is something I’ve been wondering about for a while now, as we assume the data from the TfL Countdown API is accurate, but have no independent cross check. Coding errors can also lead to data issues, and we know that during the last tube strike lots of extra buses ran which didn’t have the “iBus” tracker on them and so didn’t show up in our data. Having said that, there is nothing to suggest a problem with the data.
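The ratio calculation itself is straightforward; here is a sketch with made-up illustrative hourly counts (not the real observations):

```python
# Percentage of a normal day's buses running, hour by hour, plus the
# mean over a window. The counts below are invented for illustration,
# not the real strike-day observations.
strike = {7: 1200, 8: 1500, 9: 1400}   # hour -> buses on strike day
normal = {7: 5000, 8: 6300, 9: 5900}   # hour -> buses previous day

ratios = {h: strike[h] / normal[h] * 100 for h in strike}
mean_ratio = sum(ratios.values()) / len(ratios)
print(round(mean_ratio, 1))  # 23.8
```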

One other thing I was wondering about was what effect the strike would have on tube overcrowding? Having seen a news report from Vauxhall bus garage the previous day, I realised the huge number of people this was going to affect. If you’re a commuter changing trains at Vauxhall, then your logic goes something like this:

1. “There is a bus strike, so everybody who normally catches a bus from there is going to try to get on the tube. The tube will be packed.”

2. “There is a bus strike, so the delivery of people by bus to the tube station will be much lower than normal. The tube will be empty.”

It’s all a question of numbers, but, at the moment, it’s not something we have the data to even attempt to answer. By collecting data about unusual events like this, though, we might gain insights into what happens on a normal day.

Bus Strike January 13th 2015

Another day, another major public transport failure. I didn’t think the bus strike was having much of an impact until I got into the office and had a look at the statistics.

Comparison of the number of buses running on the 12th January 2015 (blue) against the 13th (red)

The graph above shows the number of buses running on the two days using the same horizontal time axis, so 0915 is a quarter past nine on both days. The red curve shows how few buses were running during the strike: from the data, I can calculate this as about 24% of normal.

Ratio of number of buses running on each day

By plotting the ratio of the number of buses running on Tuesday (strike) divided by the number on Monday (no strike), the fall-off in numbers from around 4am this morning is visible. From around 7am until midday, this levels off at about 24%.

The numbers don’t tell the whole story, though:

12 January 2015 09:00am bus heatmap
BusStrike_20150113_strikemap_blue
13 January 2015 09:00am bus strike heatmap using the same blue colour scale as the normal day’s map

It looks as though there are more buses in London than in the suburbs, but it’s not showing the huge gaps we saw during the May 2012 strike which were caused by only selected unions striking.

Both these maps are online on MapTube at the following link:

http://www.maptube.org/map.aspx?s=DBDFGlF5TLCgQmUcE0fAqVwcCoAMChAME0jAoFwcCoAMChZt

Building and Signing SharpMap 1.1

I’ve been upgrading the MapTubeD tile renderer code to use the latest release of SharpMap and came across the problem of SharpMap not being a signed assembly. This is a real problem as MapTubeD is signed and so can’t reference an unsigned assembly. The rest of this post details how I built SharpMap from scratch and use the NuGet tools to sign all the referenced assemblies manually.

Before you start, make sure you have PowerShell 3.0 installed. I found that the previous version had an issue when doing a recursive dir command with a wildcard, i.e. “dir -rec *.dll” didn’t work.

First, download the latest code release of SharpMap from here [link] by clicking on the “download” link. The changeset I’ve been using is 105237, but I found the download to be very slow, even on a very fast Internet connection.

It is very important to right click on the zip file before you unpack it, select “Properties” from the menu and click the “Unblock” button on the “General” tab. This prevents the warning from Visual Studio of using an application from an untrusted source.

Now extract the zip file to a suitable location.

With the files unzipped, open Visual Studio 2010. If you don’t already have the NuGet package manager installed, follow the instructions to install it here: [NuGet]. Basically it’s just “Tools”, “Extension Manager”, “Online Gallery” and click on the download button for the “NuGet Package Manager”. Let the VSIX installer do its work and you should get a “NuGet Package Manager” option on your tools menu:

VS2010NuGetPackageManager

Open the SharpMap solution (Trunk\SharpMap*.sln) in Visual Studio so you have it open, but don’t compile anything yet. Ignore the Team Studio messages as you don’t need to log in. I also got an error regarding the SharpMap.Demo.WMS example project not being a supported project type, but I ignored this and continued.

Find the main SharpMap project in the “Solution Explorer” (it should be in bold), right click on it and select “Properties”. Click the “Signing” tab on the left, tick the “Sign the Assembly” box and click on the “Choose a strong name key file” drop down. Select “new” and enter “SharpMap.snk” in the “Key file name” box, but UNCHECK the “Protect my key file with a password” check box. Click save and close the dialogue.

Now do the same for the “GeoAPI.Extensions” project above the main SharpMap one which you just signed. This time I used “GeoAPIExtensions” for the key file name, but everything else is the same.

Now do a “Build” and “Rebuild Solution” to build everything. If you get an error saying “check ‘Allow NuGet to download missing packages during the build’ “, then follow the instructions to change the NuGet download settings. I found mine was already set to download, but ended up clicking the “Update” button from the “Tools”, “NuGet Package Manager”, “Manage NuGet Packages for Solution…” window. This wasn’t necessary on the first computer I tried this on.

At this point you will see build errors relating to strong names, i.e. “Assembly generation failed — Referenced assembly ‘GeoAPI’ does not have a strong name”. You can check this by right clicking the “GeoAPI.dll” in the “References” folder of the solution, selecting properties and seeing “Strong Name” and “false” in the properties window.

Now is the part where all the 3rd party assemblies need to be signed.

You need the Nivot Strong Naming package (full instructions [here]), so use the NuGet console to install it. Go to “Tools”, “NuGet Package Manager” and “Package Manager Console” and enter the following at the PM prompt:

PM> Install-Package Nivot.StrongNaming

This is the tool used to sign the package references that are included by SharpMap, namely BruTile, ProjNet, NetTopologySuite and GeoAPI.

The instructions on the Nivot FAQ page give examples of how to do this, but here is what I did:

PM> $root = join-path (split-path $dte.solution.filename) packages

This sets the directory containing the packages which are going to be signed, relative to the solution directory. All becomes clear when you examine what the variable root gets set to:

PM> echo $root

C:\richard\projects\VS-CS\sharpmap-105237\Trunk\packages

Then to load an unprotected key (no password):

PM> $key = Import-StrongNameKeyPair -keyfile SharpMap\sharpmap.snk

The response should be “Loaded SNK.” to show that the keyfile has been loaded. Now use this to sign the required assemblies:

PM> cd (join-path $root GeoAPI.1.7.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

You might get prompts asking about signing of sub assemblies, so just answer “Y”.

GeoAPIStrongName

Now, when you look at the properties for the GeoAPI DLL under the “SharpMap\Resources” folder in the Solution Explorer, you should see that the “Strong Name” property has changed to “True” (see image above).

Repeat these two commands for the following packages:

PM> cd (join-path $root ProjNet4GeoAPI.1.3.0.2)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root BruTile.0.7.4.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.1.13.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.IO.1.13.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.IO.SpatiaLite.1.13.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

Check the packages are signed by right clicking on the DLL under the “References” tab of the project and selecting “Properties”. The “StrongName” property should now show as “True” for all the packages above.

Rebuild all and SharpMap will now be a signed assembly with a strong name.

Analysis of the UK Airspace Closure on Friday

As anybody who was trying to fly last Friday will know, a software failure at NATS caused the closure of UK airspace for 36 minutes between 15:27 and 16:03. The delays at Heathrow continued until Saturday, as the system runs at 99% capacity and so has little slack to recover from a failure.

I have data from the arrivals and departures boards logged for Heathrow, Gatwick and City, but the Heathrow data was easier to handle, so I’ve plotted a graph of the effect of the shutdown below:

Heathrow_20141212_bar
Average delay minutes at Heathrow. The x-axis shows the hour of the day from 12:00 on 12th December 2014 until 16:00 on 13th December 2014.
Heathrow_20141212_line
Average delay minutes at Heathrow. The x-axis shows the hour of the day from 12:00 on 12th December 2014 until 16:00 on 13th December 2014.

Both graphs show the same data, with the x-axis divided into hours. The fault with the air traffic computer system occurred at 15:27, but unfortunately, the 16:00 data is missing. This might be attributable to high demand on the web server giving information to passengers.

The other interesting feature is that the first peak occurs at around 20:00 on the 12th; then it looks like things are getting back to normal until midnight on the 13th, when there is a sudden peak of over 500 minutes. Looking at the data, this is accounted for by a large number of transatlantic flights (about 110) all arriving several hours late at around the same time. This might be a sensible option: by delaying flights arriving from long distance, they can get the flights that didn’t leave earlier in the day out, then extend the airport operating hours to clear the backlog.

My main reason for showing this data is that I’ve attempted to analyse it before and failed. That time it was snow that closed the airport [link], but the type and complexity of the data is still preventing analysis. The graphs above don’t show the number of planes being handled, or the number being cancelled. The data is also very dirty, in the sense that it’s very difficult to analyse with a computer something that is a plain-text description designed for a human. The status text can be something like “LANDED”, “EXPECTED”, “CANCELLED”, “CALL AIRLINE”, “TAXIED”, “AIRBORNE” or “SCHEDULED”. Also, the delay minutes can be negative if a plane gets in early, which pulls the average down in the data above. I don’t think this is significant, though, as far too many aircraft have delays of hours for a few minutes early to make much difference. This does highlight the other problem, which we also see with the rail data: a delay of 10 minutes on a route that takes 40 minutes is more significant than a 10-minute delay on a train from Scotland to London. Similarly, with the airports data, transatlantic flights taking many hours are hard to analyse side by side with European flights taking 1 or 2 hours.
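One way of normalising the status text and handling the negative delay minutes might look like this (a Python sketch; the status vocabulary is taken from the post, but the clamping policy is one possible choice, not necessarily what was actually done):

```python
# Sketch of normalising the free-text flight status field and
# handling negative delay minutes (early arrivals). Clamping early
# arrivals to zero is an illustrative policy choice.

CANCELLED = {"CANCELLED", "CALL AIRLINE"}

def delay_for_average(status, delay_minutes, clamp_early=True):
    if status.strip().upper() in CANCELLED:
        return None  # exclude cancellations from the delay average
    if clamp_early and delay_minutes < 0:
        return 0     # treat early arrivals as zero delay
    return delay_minutes

samples = [("LANDED", -5), ("LANDED", 120), ("CANCELLED", 0), ("EXPECTED", 30)]
delays = [d for s, m in samples if (d := delay_for_average(s, m)) is not None]
print(sum(delays) / len(delays))  # 50.0
```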

Basically, I’m still struggling with how to handle this data and how to include it with all the other factors we have which show how well London is operating. I think that more work needs to be done on data mining the archive data to establish a reliable baseline and detection of significant features. Then the data fusion to bring the airports data in with all the other transport data can happen. It would be interesting to know whether the Victoria line problems were a knock-on effect of the Heathrow problem on Friday night.

Just to finish off, here’s another graph of the absolute number of delays and cancellations over the same period. It’s interesting how the peak in delays occurs at 18:00, two hours after the problem was cleared:

Absolute number of delayed and cancelled flights at Heathrow. The x-axis shows the hour of the day from 12:00 on 12th December 2014 until 16:00 on 13th December 2014.

Building Virtual Worlds, Part 5

Asynchronous loading is finally working…

After about a month, I’ve finally got the loading of 3D content into the data cache working asynchronously. It’s taken me a lot of work to get this right as it had to be implemented using double message queues to offload background workers to another thread while the main thread continues with the rendering. Once the 3D content is loaded into the main cache, a message gets pushed back to the rendering thread, signalling that the new block of buildings is ready for display.
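The double message queue pattern can be sketched in a few lines (a Python stand-in for illustration; the real engine is a native renderer, and the names here are invented):

```python
import queue
import threading

# Sketch of the double message queue pattern described above: the
# render thread posts load requests to a worker thread, which loads
# content and posts completed blocks back for display.

requests = queue.Queue()   # render thread -> loader thread
completed = queue.Queue()  # loader thread -> render thread

def loader():
    while True:
        block_id = requests.get()
        if block_id is None:
            break  # shutdown sentinel
        data = f"geometry for block {block_id}"  # stand-in for disk I/O
        completed.put((block_id, data))

threading.Thread(target=loader, daemon=True).start()
requests.put(42)

# In the render loop: poll with a timeout so rendering continues.
try:
    block_id, data = completed.get(timeout=1.0)
except queue.Empty:
    block_id, data = None, None
print(block_id)  # 42
```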

It’s a pity that the YouTube clip ends up so blurred, but it might be that it doesn’t like my 21:9 aspect ratio monitor. The original clip is 2560×1080 pixels, so it started out OK.

Now that I’ve got the message queues sorted out I’m going to go back and revisit some of the rendering. The Earth textures also need to load on demand as you can see how I’ve compromised the level of detail above. Also, the buildings don’t load into buffers on the background thread yet. The SSD disk on my machine is so fast that it looks like they’re loading from RAM when actually they’re coming from the disk. Running on my iMac at work shows what a big difference it makes.