MapTube is Back

Somebody managed to do a SQL injection attack on MapTube recently, so it hasn’t been working properly for a while. Now that the vulnerability has been identified and fixed though, it’s back to normal again.

Looking through the logs, they’ve spent the best part of a month trying to do this, so I wish I had seen it earlier. It’s also been flagged by the main firewall as malicious.

I’ve had this idea for a while, but it occurred to me that we should be doing some spatial analysis on where all these attacks are coming from. They use groups of IP addresses which they change every day, but we now have years’ worth of data for a number of different web servers which could be analysed. The same applies to all the spam email that we’re filtering out. Just looking at the web server logs for this morning from midnight to 9am, there were 15 potential attacks, and there were 39 the day before, so there’s a lot of potential data there if we started putting it all together. It’s all just information theory.
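As a starting point for that kind of analysis, the sketch below counts suspicious requests per client IP address per day. It assumes log lines in the common NCSA/Apache style and uses a deliberately crude, hypothetical pattern for "looks like SQL injection"; the real server logs and the firewall's own rules will differ. The names `LOG_LINE`, `SUSPICIOUS` and `count_attacks` are made up for illustration, and geolocating the addresses would be a further step not shown here.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumed log layout: ip ident user [dd/Mon/yyyy:time zone] "request" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "([^"]*)"')

# Hypothetical, very rough signature of an SQL injection attempt.
SUSPICIOUS = re.compile(
    r"(union\s+select|'\s*or\s+1\s*=\s*1|;\s*drop\s+table)", re.IGNORECASE
)


def count_attacks(lines):
    """Count suspicious requests, keyed by (day, ip)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        ip, day, request = match.groups()
        # Decode %27, + and similar before looking for the pattern.
        if SUSPICIOUS.search(unquote_plus(request)):
            counts[(day, ip)] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.5 - - [10/Oct/2016:06:12:01 +0000] "GET /map.aspx?s=1%27+OR+1=1 HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2016:06:12:09 +0000] "GET /map.aspx?s=1+UNION+SELECT+name+FROM+users HTTP/1.1" 500 0',
    '198.51.100.7 - - [10/Oct/2016:07:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
]
attack_counts = count_attacks(sample)
for (day, ip), n in sorted(attack_counts.items()):
    print(day, ip, n)
```

Keying the counts by day as well as by address matters here because the attackers change their IP groups daily, so a per-address total alone would hide the pattern.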

Night Tube Jubilee Line

Stacked area chart showing the number of tubes running on the Central, Jubilee and Victoria lines only. The data shows Thursday 6th October, Friday, Saturday and Sunday.

Just to follow up on the last post about the launch of the “Night Tube”, the service launched on the Jubilee Line last Friday. There are now close to 40 tubes running overnight at the weekend on the Central, Victoria and Jubilee lines. The chart above shows the number of tubes running from Thursday 6th October through to 23:57 on Sunday night. The morning and evening peak rush hours are evident for Thursday and Friday, then the first Jubilee Line night services can be seen in the trough between Friday and Saturday.

The interesting thing to do now would be to run a public transport accessibility analysis using the real-time running data to see which parts of the city are now more connected. As today is the second day of the Southern Rail strike, that might make another interesting subject. Using the Census travel to work data we could forecast the areas where people are going to be late for work because of transport failures. That could potentially give a measure of what effect any strikes, or even just “congestion” generally, is having on London.

On a technical level, one thing which is now becoming apparent is that the number of drop-outs from the API has increased. It used to be that there would be a few per day on the Northern Line (biggest data), but now they seem to be occurring randomly across all the lines. The CASA API has been collecting data since the London Olympics in 2012, so it’s long overdue for some maintenance.
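One way to put a number on the drop-outs is to look for gaps between consecutive samples that are longer than the expected polling interval. The sketch below assumes a list of sample timestamps for one line and a hypothetical three-minute interval; neither the interval nor the function name `find_dropouts` comes from the CASA API itself.

```python
from datetime import datetime, timedelta


def find_dropouts(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where the gap between consecutive
    samples exceeds tolerance * expected_interval."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Made-up samples every 3 minutes, with two samples missing after 08:06.
start = datetime(2016, 10, 6, 8, 0)
samples = [start + timedelta(minutes=3 * i) for i in (0, 1, 2, 5, 6)]
gaps = find_dropouts(samples, timedelta(minutes=3))
for gap_start, gap_end in gaps:
    print(gap_start, "->", gap_end)
```

Running the same check separately for each line would show whether the drop-outs really have spread from the Northern Line to the others.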

Night Tube Launches

Stacked area chart showing total number of tubes running on all lines from midnight on Thursday 18th August through to midnight on the following Monday morning.

The first night tubes started running last Friday evening, so I couldn’t help wondering what that does to the number of tubes running. The graph above shows the total number of tubes running from Thursday 18th to Sunday 21st August. My first reaction was “what night service?”, but then I read the TfL statement and realised that this is only the Central and Victoria lines, with the rest to follow in the Autumn. The arrows on the diagram above show where the extra services show up in the statistics.

The following graph of only the Central and Victoria lines shows it a lot better:

Stacked area chart showing the number of tubes running on the Central and Victoria lines only.

The total of around 20 tubes running through the night, each with a capacity of around 800 passengers (TfL Rolling Stock), is significant extra capacity. It’s just that the peak rush hour service is so much bigger by comparison. I also wonder whether they were testing the Central line on the Thursday evening, because of the large number showing up there overnight. Normally we get a small residual number of tubes moving during the shutdown period, which I’ve always put down to engineering services.

What is going to be interesting is to see how the service adapts to usage over time.

Now That’s What You Call a Tube Strike

Number of tubes running on the London Underground at 08:30 on 9th July 2015. The time scale runs from 4pm on Wednesday 8th until 08:30 on Thursday 9th.

The graphic says it all really. The width of the stream graph shows the total number of tubes running, with a breakdown by each tube line displayed in the regular line colour (red=Central Line, green=District Line etc).

Basically, there’s nothing running, apart from a “special service” on the Waterloo and City line. I’ve never seen it like this before, as, in the previous strikes, they’ve always managed to run about a 30% service.

Despite being told that everything was going to shut down completely by 6pm last night, it appears that the shutdown began around 6pm and wasn’t complete until just after 9pm, although I wouldn’t like to have been trying to get home during that time. From the pictures on the news last night it looked like complete chaos, which just goes to reinforce the fact that we need to establish a method for measuring how many people the tube network is carrying (i.e. the “crush factor”).

Bus Networks and Other Complicated Data

As part of my PhD I’ve been looking at a lot of real-time data about tubes, buses and trains. In fact, I probably started from the point where I already had a lot of data and was wondering what to do with it. While I would not class this as “Big Data”, the complex nature and real-time element make it difficult to analyse and visualise.

21,987 bus stops (blue cubes), 53,896 links (white lines) and the Greenwich Meridian (bright green line).

The image above shows the bus network displayed using my virtual Earth viewer. Having previously done a lot of work on the tube network, it only took about half a day’s work to get the buses into the system. One reason for this is that I’ve implemented an agent-based modelling (ABM) system similar to NetLogo, so I just have to write the code to load agents and links from CSV files (easy!). The simulation is a bit harder to do, but not much.

Although I knew the bus data was about 10 times bigger than the tube data, what I hadn’t bargained for was the fact that there are 21,987 bus stops (agent nodes), 53,896 route points (links) and up to 7,000 live buses (moving agents). The other weird thing is that TfL seem to be missing 409 bus stops from their master list as there are stops contained in the routes that I don’t have positions for. There are also a lot of invalid lines in the data that look as if there has been an error extracting the data from a database. I had a really interesting discussion with last Thursday’s visitor about that fact because he couldn’t believe it. I think I’m right in saying that there is a theory about complexity that goes along the lines of “any sufficiently complex data analysed deeply enough will always show inconsistencies”? In other words, we just have to deal with it.
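The missing-stop check is essentially a set difference between the stop codes referenced by the route links and the codes in the master list. A minimal sketch, assuming two CSV inputs with hypothetical column names (`stop_code`, `from_stop`, `to_stop`); the actual TfL file layout is different and is not reproduced here:

```python
import csv
import io


def find_missing_stops(stops_file, links_file):
    """Return stop codes used by route links but absent from the master list."""
    known = {row["stop_code"] for row in csv.DictReader(stops_file)}
    referenced = set()
    for row in csv.DictReader(links_file):
        referenced.add(row["from_stop"])
        referenced.add(row["to_stop"])
    return referenced - known


# Tiny made-up data: stop C appears on a route but has no position.
stops_csv = io.StringIO("stop_code,lat,lon\nA,51.50,-0.12\nB,51.51,-0.10\n")
links_csv = io.StringIO("route,from_stop,to_stop\n1,A,B\n1,B,C\n")
missing = find_missing_stops(stops_csv, links_csv)
print(sorted(missing))
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.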

Putting in some buildings gives a much better appreciation of just how big it is:

Some buildings, a river and lots of bus stops.

If you look closely, you can just see the bus stops in the river which are pontoons for the boats. The coloured cubes representing the stops are all 100m on each axis. It now all gets worse, because that graph containing 53,896 route points has to be fragmented using the road network and a routing algorithm to make the buses travel along the roads or rivers. I’ll have to implement this just as soon as I get the data displaying at a reasonable frame rate.

To really put things into the correct scale, and thinking of the Highways Agency’s UK-wide road network, which is on my list:

You are here.

I just like the Winter Blue Marble image, which you don’t see very often. The Google Earth images are all the Blue Marble composite.

So, getting back to the PhD topic, which is about the algorithms which make all of this work, I obviously need to improve the graphics a lot, but most of the building blocks are now in place. I’m a graphics programmer, so the graphics engine is obviously hacked to pieces and I need to tidy it up. The numbers in the top left of the images are the frame rates, which should be a lot higher than the current 4-6 frames per second. If you take the geometry representing the bus routes (links), it’s a mesh with over 6 million points, and it’s taxing my graphics card a bit. Top-of-the-range GPUs these days will do over a teraflop, which used to be supercomputing territory not long ago, but use them in 64-bit mode and the performance drops drastically. I still have some shader tricks to use which will improve the ABM performance a huge amount.

Finally, I have to answer the question, “what’s the point of it all?”. I wanted to analyse real-time dynamic data using a system that allowed me to explore the data visually in both time and space. Why is there a bus route from NW London going diagonally across to the SE in the first image? You can just see the white line going through the buildings, but it looks like an error in the data. Programming the model to simulate the buses allows you to explore the real-time element, but the aim is to have more in the way of analysis and data-mining than the simple widgets you get with NetLogo.

Now I have two networks, my first question is to look at how they compare to each other. I have the whole of 2014 to use for the analysis and a tool which (might) now let me do it.

Number of buses running.
Number of tubes running.

I’ve always wondered whether the peaks in the bus, tube and train numbers occur at the same times and whether there is any spatial variation?

Just as an update, here’s a movie I uploaded which shows the bus network much better than any words can:

 

Ten Car Trains

I had a pleasant surprise on my commute home last night when I found myself on one of the new ten car trains that South West Trains have bought. They’ve coupled two brand new cars onto the front of their existing stock, so we could all see it was a new train as it approached the platform. Being able to get a seat was also a new experience.

Now, in transport modelling terms, that means that they have potentially increased their people carrying capacity by 25%. If they were running 8 car trains before and can now run 10 car trains, that’s a significant increase.
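As a quick check of that figure, here is the arithmetic as a short sketch. It assumes every car carries the same number of passengers, which may not hold for the new cars, and the function name is purely illustrative.

```python
def capacity_increase(old_cars, new_cars):
    """Percentage increase in capacity, assuming identical cars."""
    return (new_cars - old_cars) / old_cars * 100.0


increase = capacity_increase(8, 10)
print(f"{increase:.0f}%")
```

The increase is measured against the old eight car train, which is why two extra cars give a quarter more capacity rather than a fifth.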

What I don’t know is how many of these trains they’ve got and when and where they’re running them. I’ve looked at the Network Rail data feed, but it doesn’t give you the size of the train. I need to look into the data a bit more deeply as there might be a physical train identifier that I’ve missed. They all have “leading car id”, but I can’t find this in the data feed. Even a map of all the stations that have been converted to take 10 cars would be interesting.

Happy Birthday MapTube!

Today is exactly 7 years since MapTube was launched at the “Digital Geography in a Web 2.0 World” conference at the Barbican in London as part of the GeoVUE project.

To mark the event I’ve added a new feature to the homepage which should make it more dynamic. Now, if I blog about any maps, they will automatically appear on the MapTube front page with the text, images and map links extracted directly from the RSS feed. Along with the ‘topicality index’, which places maps for data which is currently in the media on the front page, this should keep the website up to date with the latest events. It’s also telling me what information we don’t currently have so we can gradually fill the gaps in our knowledge.

I’m hoping to follow this up in the next month with some real-time data feeds and more interactivity on the maps.

Links:

http://www.casa.ucl.ac.uk/barbican/presentation6.html

http://www.casa.ucl.ac.uk/barbican/

http://www.casa.ucl.ac.uk/news/newsStory.asp?ID=122

Bus Strike January 13th 2015 (Update)

Just for completeness, I’ve updated the two graphs of the numbers of buses running on 13th January with the complete set of data up to 23:59 that night.

Numbers of buses running the 13th (strike) against the previous day
Ratio of number of buses running on the 13th (strike) to the previous day

The first graph shows the total number of buses running on Tuesday 13th (red) against the previous day (blue). The second graph shows the ratio red/blue*100%, i.e. what percentage of a normal day’s buses were running. It levels off quite definitely at around 24% and never reaches the 33% which is the official TfL figure for the percentage of services running. The mean value for 7am to midnight works out as 23.7%, so either TfL have a different way of calculating the figure, or our data is wrong. This is something I’ve been wondering about for a while now, as we assume the data from the TfL Countdown API is accurate, but have no independent cross check. Coding errors can also lead to data issues, and we know that during the last tube strike lots of extra buses ran which didn’t have the “iBus” tracker on them and so didn’t show up in our data. Having said that, there is nothing to suggest a problem with the data.
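For reference, the ratio and its mean over a time window can be sketched as follows. The counts are made-up numbers keyed by time of day, not the Countdown data, and `strike_ratio` and `mean_ratio` are hypothetical names used only for this illustration.

```python
from datetime import time


def strike_ratio(strike_counts, normal_counts):
    """Percentage of a normal day's buses running, per time slot.

    Both arguments map a datetime.time to the number of buses seen.
    Slots missing from the strike day, or with no buses on the
    normal day, are skipped.
    """
    ratios = {}
    for slot, normal in normal_counts.items():
        if normal > 0 and slot in strike_counts:
            ratios[slot] = 100.0 * strike_counts[slot] / normal
    return ratios


def mean_ratio(ratios, start, end):
    """Mean of the ratios for slots with start <= slot <= end."""
    window = [r for slot, r in ratios.items() if start <= slot <= end]
    return sum(window) / len(window) if window else None


# Made-up counts for three time slots on each day.
normal = {time(3, 0): 400, time(9, 0): 6000, time(18, 0): 5000}
strike = {time(3, 0): 380, time(9, 0): 1500, time(18, 0): 1000}

ratios = strike_ratio(strike, normal)
daytime_mean = mean_ratio(ratios, time(7, 0), time(23, 59))
print(daytime_mean)
```

Note that a mean of per-slot ratios is not the same as the ratio of the daily totals, which is one way a different official figure could arise from the same underlying counts.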

One other thing I was wondering about was what effect the strike would have on tube overcrowding? Having seen a news report from Vauxhall bus garage the previous day, I realised the huge number of people this was going to affect. If you’re a commuter changing trains at Vauxhall, then your logic goes something like this:

1. “There is a bus strike, so everybody who normally catches a bus from there is going to try to get on the tube. The tube will be packed.”

2. “There is a bus strike, so the delivery of people by bus to the tube station will be much lower than normal. The tube will be empty.”

It’s all a question of numbers but, at the moment, it’s not something which we have the data to even attempt to answer. By collecting data about unusual events like this, though, we might gain some insight into what happens on a normal day.

Bus Strike January 13th 2015

Another day, another major public transport failure. I didn’t think the bus strike was having much of an impact until I got into the office and had a look at the statistics.

Comparison of the number of buses running on the 12th January 2015 (blue) against the 13th (red)

The graph above shows the number of buses running on the two days using the same horizontal time axis, so 0915 is a quarter past nine on both days. The red graph shows the comparison of how few buses are running. From the data, I can calculate this as about 24%.

Ratio of number of buses running on each day

By plotting the ratio of the number of buses running on Tuesday (strike) divided by the number on Monday (no strike), the fall off in numbers from around 4am this morning is visible. From around 7am until 12pm, this levels off at about 24%.

The numbers don’t tell the whole story, though:

12 January 2015 09:00am bus heatmap
13 January 2015 09:00am bus strike heatmap using the same blue colour scale as the normal day’s map

It looks as though there are more buses in London than in the suburbs, but it’s not showing the huge gaps we saw during the May 2012 strike which were caused by only selected unions striking.

Both these maps are online on MapTube at the following link:

http://www.maptube.org/map.aspx?s=DBDFGlF5TLCgQmUcE0fAqVwcCoAMChAME0jAoFwcCoAMChZt
 

 

Building and Signing SharpMap 1.1

I’ve been upgrading the MapTubeD tile renderer code to use the latest release of SharpMap and came across the problem of SharpMap not being a signed assembly. This is a real problem as MapTubeD is signed and so can’t reference an unsigned assembly. The rest of this post details how I built SharpMap from scratch and used the NuGet tools to sign all the referenced assemblies manually.

Before you start, make sure you have PowerShell version 3.0 installed. I found that the previous version had an issue when doing a recursive dir command with a wildcard, i.e. “dir -rec *.dll” didn’t work.

First, download the latest code release of SharpMap from here [link] by clicking on the “download” link. The changeset I’ve been using is 105237, but I found the download to be very slow, even on a very fast Internet connection.

It is very important to right click on the zip file before you unpack it, select “Properties” from the menu and click the “Unblock” button on the “General” tab. This prevents the warning from Visual Studio of using an application from an untrusted source.

Now extract the zip file to a suitable location.

With the files unzipped, open Visual Studio 2010. If you don’t already have the NuGet package manager installed, follow the instructions to install it here: [NuGet]. Basically it’s just “Tools”, “Extension Manager”, “Online Gallery” and click on the download button for the “NuGet Package Manager”. Let the VSIX installer do its work and you should get a “NuGet Package Manager” option on your tools menu:

VS2010NuGetPackageManager

Open the SharpMap solution (Trunk\SharpMap*.sln) in Visual Studio so you have it open, but don’t compile anything yet. Ignore the Team Studio messages as you don’t need to log in. I also got an error regarding the SharpMap.Demo.WMS example project not being a supported project type, but I ignored this and continued.

Find the main SharpMap project in the “Solution Explorer” (it should be in bold), right click on it and select “Properties”. Click the “Signing” tab on the left, tick the “Sign the Assembly” box and click on the “Choose a strong name key file” drop down. Select “new” and enter “SharpMap.snk” in the “Key file name” box, but UNCHECK the “Protect my key file with a password” check box. Click save and close the dialogue.

Now do the same for the “GeoAPI.Extensions” project above the main SharpMap one which you just signed. This time I used “GeoAPIExtensions” for the key file name, but everything else is the same.

Now do a “Build” and “Rebuild Solution” to build everything. If you get an error saying “check ‘Allow NuGet to download missing packages during the build’ “, then follow the instructions to change the NuGet download settings. I found mine was already set to download, but ended up clicking the “Update” button from the “Tools”, “NuGet Package Manager”, “Manage NuGet Packages for Solution…” window. This wasn’t necessary on the first computer I tried this on.

At this point you will see build errors relating to Strong Names i.e. Assembly generation failed — Referenced assembly ‘GeoAPI’ does not have a strong name. You can check this by right clicking the “GeoAPI.dll” in the “References” folder of the solution, selecting properties and seeing “Strong Name” and “false” in the properties window.

Now is the part where all the 3rd party assemblies need to be signed.

You need the Nivot Strong Naming package (full instructions [here]), so use the NuGet console to install it. Go to “Tools”, “NuGet Package Manager” and “Package Manager Console” and enter the following at the PM prompt:

PM> Install-Package Nivot.StrongNaming

This is the tool used to sign the package references that are included by SharpMap, namely BruTile, ProjNet, NetTopologySuite and GeoAPI.

The instructions on the Nivot FAQ page give examples of how to do this, but here is what I did:

PM> $root = join-path (split-path $dte.solution.filename) packages

This sets the directory containing the packages which are going to be signed, relative to the solution directory. All becomes clear when you examine what the variable root gets set to:

PM> echo $root

C:\richard\projects\VS-CS\sharpmap-105237\Trunk\packages

Then to load an unprotected key (no password):

PM> $key = Import-StrongNameKeyPair -keyfile SharpMap\sharpmap.snk

The response should be “Loaded SNK.” to show that the keyfile has been loaded. Now use this to sign the required assemblies:

PM> cd (join-path $root GeoAPI.1.7.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

You might get prompts asking about signing of sub assemblies, so just answer “Y”.

GeoAPIStrongName

Now, when you look at the properties for the GeoAPI DLL under the “SharpMap\References” folder in the Solution Explorer, you should see that the “Strong Name” property has changed to “True” (see image above).

Repeat these two commands for the following packages:

PM> cd (join-path $root ProjNet4GeoAPI.1.3.0.2)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root BruTile.0.7.4.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.1.13.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.IO.1.13.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

PM> cd (join-path $root NetTopologySuite.IO.SpatiaLite.1.13.1.1)
PM> dir -rec *.dll | Set-StrongName -keypair $key -verbose

Check the packages are signed by right clicking on each DLL in the “References” folder of the project and selecting “Properties”. The “Strong Name” property should now show as “True” for all the packages above.

Rebuild all and SharpMap will now be a signed assembly with a strong name.