Opening up rail data

Published on August 13, 2018

As the DfT announces an initiative to make more of the rail industry’s data ‘open’, I thought I’d share my views on a few key gaps, and highlight some of the tools using the data already available.

Key Gaps

Fare availability data – for several years, the industry timetable and static fares data has been made widely available. However, this covers only the ‘static’ aspects of the fares data, not whether a given fare is actually available on a given train. The static data has its uses (e.g. season ticket fares or ‘walk-up’ fares), but as the industry makes increasing use of ‘quota controlled’ fares (e.g. Advance fares), the static data alone can’t tell you what a given journey will cost.

The data is probably needed in two forms:

  • (i) a simple ‘best fare’ query interface, to allow mobile apps and other simple applications to quote the best fare for a given journey;
  • (ii) a subscription feed, allowing more complex applications to maintain a comprehensive database of fare availability to power more complex searches, for example finding the cheapest routes and times to travel across the network.
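To make the difference between the two forms concrete, here is a minimal Python sketch, assuming a single in-memory store of fare availability that can both answer a ‘best fare’ query (form i) and push every change to subscribers who keep their own copy (form ii). The class and method names, the train ID ‘1A23’, the stations ‘A’ and ‘B’ and the prices are all hypothetical; this does not describe any existing industry system or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key: (train_id, origin, destination, fare_type)
FareKey = Tuple[str, str, str, str]
Subscriber = Callable[[FareKey, Optional[int]], None]


@dataclass
class FareAvailabilityStore:
    """Illustrative in-memory store of fare availability (all names hypothetical)."""
    prices_pence: Dict[FareKey, int] = field(default_factory=dict)
    subscribers: List[Subscriber] = field(default_factory=list)

    def subscribe(self, callback: Subscriber) -> None:
        """Form (ii): register to be told about every availability change."""
        self.subscribers.append(callback)

    def update(self, key: FareKey, price_pence: Optional[int]) -> None:
        """Set a fare's current price, or None once its quota has sold out."""
        if price_pence is None:
            self.prices_pence.pop(key, None)
        else:
            self.prices_pence[key] = price_pence
        # Push the change to every subscriber.
        for callback in self.subscribers:
            callback(key, price_pence)

    def best_fare(self, train_id: str, origin: str,
                  destination: str) -> Optional[Tuple[int, str]]:
        """Form (i): cheapest fare still available for one train and journey."""
        candidates = [
            (price, key[3])
            for key, price in self.prices_pence.items()
            if key[:3] == (train_id, origin, destination)
        ]
        return min(candidates) if candidates else None


# Usage with made-up data: a subscriber keeps its own mirror of availability.
mirror: Dict[FareKey, int] = {}


def keep_mirror(key: FareKey, price_pence: Optional[int]) -> None:
    if price_pence is None:
        mirror.pop(key, None)
    else:
        mirror[key] = price_pence


store = FareAvailabilityStore()
store.subscribe(keep_mirror)
store.update(("1A23", "A", "B", "Advance"), 2500)
store.update(("1A23", "A", "B", "Off-Peak"), 4800)
first_quote = store.best_fare("1A23", "A", "B")    # (2500, 'Advance')
store.update(("1A23", "A", "B", "Advance"), None)   # Advance quota sells out
second_quote = store.best_fare("1A23", "A", "B")   # (4800, 'Off-Peak')
```

The point of the sketch is that a simple app only ever calls the query, while a more complex application subscribes once and then searches its own mirror as often as it likes, which would also take the query load off the operators’ systems.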

It is reasonably well known that current systems make it difficult for operators to share this data (the systems simply can’t handle large volumes of queries). There may also be commercial concerns if, for example, airlines or coach companies systematically used this data to set their own pricing: air and coach passengers might end up paying more, and/or rail might be systematically undercut. However, this risk is probably overstated. Those air and coach companies can already access this data from public websites, and I believe services are available that ‘screen-scrape’ it for them.

Train formation and loading data – the existing timetable and real-time data feeds give surprisingly little information about a train’s formation (i.e. the number of carriages and the location of first class, the buffet, bike storage, etc.), let alone where the empty seats are. In recent years this information has started to appear on customer information screens, and it is exactly the sort of information that helps customers wait in the ‘right’ part of the platform, or even defer their travel by a few minutes to catch a slightly later but quieter train. This behaviour also helps train companies operationally, through quicker boarding.

As onboard systems for identifying empty seats become more sophisticated, this data should improve in quality.

Origin-Destination Matrix – the rail industry currently shares aggregated estimates of the number of entries and exits at each station on the rail network. Behind these aggregates sits a far more detailed model that estimates the number of journeys between every possible origin-destination pair on the network, which gives a much richer understanding of how the rail network is used. This data is often described as ‘highly commercially sensitive’, yet most of it is widely available within the industry.

The potential uses of this data are less about direct consumer-facing applications and more about helping those developing services and products for the rail industry and its passengers to understand the potential market. I was recently approached by a friend of a friend who was considering renting some vacant space at a rural station for a bike hire shop. Whilst he knew the station footfall, being able to tell whether it came from local commuters making a short hop or from inbound tourists was key to understanding the viability, but no one was allowed or willing to share this data. No doubt similar issues exist for others trying to plan services to support rail passengers. Even my local District Council’s ‘Local Plan’ seemed to have very little understanding of rail journeys in and out of the district, despite the Plan underpinning some pretty significant local investment decisions.
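The bike hire example shows what gets lost when an origin-destination matrix is collapsed into station footfall. The minimal Python sketch below assumes, as a simplification, that each journey counts once as an entry at its origin and once as an exit at its destination; the station names and journey counts are made up. Two very different travel patterns produce exactly the same footfall figure for the rural station.

```python
from collections import defaultdict
from typing import Dict, Tuple

ODMatrix = Dict[Tuple[str, str], int]  # (origin, destination) -> journeys


def entries_and_exits(od_matrix: ODMatrix) -> Dict[str, int]:
    """Collapse an origin-destination matrix into per-station footfall.

    Simplifying assumption: each journey is one entry at its origin
    and one exit at its destination.
    """
    totals: Dict[str, int] = defaultdict(int)
    for (origin, destination), journeys in od_matrix.items():
        totals[origin] += journeys       # entry at the origin
        totals[destination] += journeys  # exit at the destination
    return dict(totals)


# Hypothetical: local commuters hopping to the nearby town...
local_trips = {("Rural", "Town"): 500, ("Town", "Rural"): 500}
# ...versus visitors arriving from (and returning to) the city.
visitor_trips = {("City", "Rural"): 500, ("Rural", "City"): 500}

print(entries_and_exits(local_trips)["Rural"])    # 1000
print(entries_and_exits(visitor_trips)["Rural"])  # 1000
```

Only the full matrix distinguishes the two cases, which is exactly the information the bike hire shop needed.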

Existing tools built on existing open data

Whilst there are notable gaps, there is already a lot of ‘open’ data being used to provide tools/apps for passengers:

Planning a journey

Probably the most widespread use of open data is using timetable data for journey planning. There are too many examples to list, but they include major players such as Google, industry-owned portals such as Traveline, and independent sites.

How much will the fare cost?

Most websites and apps quoting fares are not using ‘open’ data, but have some form of data licence (usually a retail licence) which includes access to fares and availability. These include the industry’s own site, the train operating companies’ own websites, and third-party ticketing sites.

Some of these websites offer more ‘advanced’ search capabilities to hunt out anomalies and loopholes, for example searching for scenarios where splitting a journey into a combination of tickets is cheaper than a single end-to-end ticket (‘split ticketing’).
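A split-ticket search of this kind can be framed as a cheapest-path problem along a train’s calling points. The minimal Python sketch below uses a standard dynamic-programming approach. It assumes a hypothetical fare lookup for every pair of stops on a single train, and that the train calls at every point where the journey is split. It ignores ticket validity restrictions and is not a description of how any particular site works; the fares are made up.

```python
import math
from typing import Callable, List, Optional, Tuple

FareLookup = Callable[[str, str], Optional[int]]  # None = no such ticket


def cheapest_split(stops: List[str], fare: FareLookup
                   ) -> Optional[Tuple[int, List[Tuple[str, str]]]]:
    """Cheapest combination of tickets covering stops[0] -> stops[-1].

    stops: the train's calling points, in order.
    fare(a, b): hypothetical price of a ticket from a to b.
    """
    n = len(stops)
    best = [math.inf] * n                # best[j]: cheapest cost to reach stops[j]
    prev: List[Optional[int]] = [None] * n  # prev[j]: start of last ticket to j
    best[0] = 0
    for j in range(1, n):
        for i in range(j):
            price = fare(stops[i], stops[j])
            if price is not None and best[i] + price < best[j]:
                best[j] = best[i] + price
                prev[j] = i
    if math.isinf(best[-1]):
        return None  # no combination of tickets reaches the destination
    # Walk back through prev to recover the individual tickets.
    tickets: List[Tuple[str, str]] = []
    j = n - 1
    while j != 0:
        i = prev[j]
        tickets.append((stops[i], stops[j]))
        j = i
    return int(best[-1]), tickets[::-1]


# Hypothetical fares (in pounds) between calling points A, B and C.
fares = {("A", "B"): 20, ("B", "C"): 25, ("A", "C"): 50}
result = cheapest_split(["A", "B", "C"], lambda a, b: fares.get((a, b)))
print(result)  # (45, [('A', 'B'), ('B', 'C')])
```

Here two tickets (A to B, then B to C) cost 45 against 50 for the through ticket. Checking every pair of stops is quadratic in the number of calling points, which is cheap for a single train.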

Sticking to tools that use ‘open’ data, there are sites that show every published fare, regardless of current availability or time of travel (although this is a good example of why fare availability data is also needed).

Is it going to be on time?

Having worked out the times and fares, you may want to know whether the train is likely to run on time. There is a handy app for this which lets you check how your chosen train service has run over recent weeks (some services have a better punctuality record than others).
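A punctuality record like this is essentially a simple ratio over historic running. The minimal Python sketch below uses an illustrative 5-minute threshold; official industry punctuality measures have their own definitions. It also assumes that early arrivals count as on time and that cancelled runs, recorded as `None`, count as not on time. The delay figures are made up.

```python
from typing import List, Optional


def punctuality(delays_minutes: List[Optional[float]],
                threshold: float = 5) -> Optional[float]:
    """Share of recorded runs arriving no more than `threshold` minutes late.

    Assumptions: negative delays (early arrivals) count as on time;
    cancelled runs, recorded as None, count as not on time.
    """
    if not delays_minutes:
        return None  # no history, so no record to report
    on_time = sum(1 for d in delays_minutes if d is not None and d <= threshold)
    return on_time / len(delays_minutes)


# Made-up arrival delays (minutes) for two services over eight weekdays.
service_0712 = [0, 2, 7, -1, None, 4, 12, 3]
service_0742 = [1, 0, 0, 3, 2, 6, 1, 0]
print(punctuality(service_0712))  # 0.625
print(punctuality(service_0742))  # 0.875
```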

Is it on time today?

When it comes to the time of travel, there are numerous apps and websites, many running from the same data, that can tell you how your train is running.

If you want more granular information (exactly where is my train, and how many trains are in front of it?), then Open Train Times and Real Time Trains both expose a lot of information. Alternatively, RailDar will show you where your train is estimated to be on a standard Google map.

If you want a quick glance at the general state of the rail network, there are sites that will give you a quick headline status, and some of them also let you review historic performance.

Claim compensation

Several websites use historic running information to semi-automate the process of claiming compensation. A key concern here seems to be the extent to which these tools are used to facilitate fraudulent claims.

Hopefully there will be more tools and services to add to this list as the range of open data increases.