What methodology do we use at CheckIn.com? Having questioned data silos for years, we believe you have to understand in order to believe. And as Jürgen has frequently explained in his blog, the time of siloed data is over.
CheckIn.com uses the latest in database and map technology to provide highly automated, instantly available and affordable catchment area analyses. Until now, consultants have often charged in excess of € 10,000 for a simple cross-border isochrone analysis, often at district level to keep the work manageable, using outdated technology for mapping and drive time calculations. Our custom-built database allows us not only to offer basic isochrone analyses for free, but also to provide a completely new level of quality: catchment area analyses that take the impact of the neighboring airports into account. All at a fraction of the cost: € 1,200 for a single airport analysis, € 1,800 for the soon-to-be-offered route level analysis (airport pair)!
Research in December 2016 resulted in a case study that confirmed the unreliable quality of existing “simple” isochrone data from other sources.
This document describes our methodology in order to forestall questions or incorrect assumptions. We are confident that we provide the business case to any aviation network planner by adding previously unavailable or unjustifiably expensive data at exceptional quality with an affordable price tag. And without waiting: it is instantly available.
Customers and competitors alike have questioned our methodology, or doubted that there can be sound quality behind our data at such a low price. Yes, there is hard, pioneering work behind this; we worked for more than five years, with the usual setbacks, to come up with what we have today. We do doubt that there is a business case for a competitor to recreate the stunt and provide the same level of quality at a similar cost. And yes, we question any competitor that does not publish the methodology behind its analyses.
Braunschweig, January 2017
At this time, most airports and airlines are used to equating “isochrones” with “catchment areas”. But isochrones are not catchment areas. Furthermore, we learned that isochrones have an extremely bad reputation with airline network planners. Researching this in December 2016, based on the public “catchment area data” on ANNA.aero’s TheRouteShop, we found this reputation justified. While some airports had obviously done their homework, we questioned some far-off results with the airports concerned. We learned that they had used guesstimates “as the airline demanded the numbers”. One airport forwarded us a “competitive offer” for a cross-border isochrone analysis: one-off, no updates, at € 15,000. They felt they could not justify that price tag, hence their use of guesstimates… Those airports, though, did not understand that we live in the computer age today, and that those numbers are used for highly sophisticated analyses of route feasibility. If you use garbage as a base, all you can get as a result is garbage too. Garbage in. Garbage out.
We also checked with our friends at ANNA.aero, but they consider TheRouteShop a service to airports. All “facts” come from the airports and as such may be marketing-driven, to beautify the airport in the eyes of the airline.
Even for the better airports, we learned that the data is mostly not updated very often; airports use the same isochrone data for 10 years and longer. Even though official censuses are only conducted every 10 years, we found that about 5% of the base data from national statistics changes every year. Many of the existing “good” results are based on Microsoft MapPoint or something similar, software that Microsoft discontinued (for good reason) in 2013 and took off the market by the end of 2014. While that tool could calculate drive times, it did not take road speed limits into account but used defaults for five different street types (highway, federal streets, major and minor roads, ways). Ferries, if supported at all, had no time associated, so a six-hour ferry crossing resulted in zero drive time. And these tools failed to work “cross-border” on the maps. Garbage in. Garbage out.
Finally, one large airport operator published numbers that were sometimes spot on and sometimes totally off. Given that the same tool was used throughout, that simply does not make sense. Another airport operator used the same isochrones for all airports, irrespective of size. Garbage in. Garbage out.
Since our public launch in spring 2016, CheckIn.com has offered highly automated analyses of top quality and detail at € 1,200 for a single airport, a fraction of the above price, in order to make that data quality commonly available. The first airlines have started using our services, though airports remain reluctant to spend even that small amount, even when pointed to the errors in their data.
We currently have four types of base data: airports, population, drive times and maps. The challenge might not seem like much, but…
The main airport data we used was passenger numbers. Believe it or not, we have numerous examples where even that simple information is not consistent on an airport’s own website. We mainly used Wikipedia and the airports’ own websites as sources, but increasingly receive the statistical data from the airports directly. That usually lets us add more and more airports at a monthly data level, which in turn allows us to add seasonality analyses.
In planning the route level analysis, we use Eurostat and other publicly available data. In our first route level analysis, we will add route-level passengers, flights and seats offered. We would like to add fare level analyses, punctuality and similar data, and therefore plan to add interfaces to external providers of MIDT data to make more sophisticated analyses available. Given the business case.
In order to associate drive times with people, we needed common statistical data. In Europe, the municipality, as Local Administrative Unit Level 2 (LAU-2), is Eurostat’s smallest level, and the same holds for most of the national data we checked. So we decided to use that for Europe. But exceptions exist even in Europe: Scotland (as part of Britain), for example, uses “Data Zones” instead, and the “Ward level” supported by England, Wales and Northern Ireland is an “estimate” from the data zones.
Then, not even all countries in Europe are in Eurostat, the Balkan countries for example. And then we extended beyond Europe into Russia/CIS (and as I write this, we are still in the process of doing so).
Not all data is current as of the previous year; Germany and France, for example, usually lag behind. Russia does not provide “interim” data but only the data of the last official census, in this case 2010. Sometimes more recent data is available for individual cities, but we use the common numbers, as anything else would likely be misleading.
So when we researched the latest population data from Eurostat and the national statistics offices, guess what: the data is not compatible. Is it Aarhus or Århus? Hillerød, Hilleroed or Hillerod? Nürnberg, Nuremberg, Nuernberg or Nurnberg? Yes, we found them all… And unlike airports, which share a common code, each source uses “its own” codes; Eurostat, for example, fails to provide the national code that would make it easier for us to associate the records. So we maintain our own code tables. And we had to develop extensive filters to identify the entries in the different data sources and associate them properly with our own database.
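To illustrate the kind of filter this calls for, here is a minimal Python sketch that reduces the spelling variants above to one comparison key. It is an illustration only, not the production filters: the function name `match_key`, the transliteration tables and the alias table are assumptions made up for this example. The digraph folding is deliberately lossy, so such a key is only suitable for finding match candidates that still need a check against codes or population figures.

```python
import unicodedata

# Letters that Unicode decomposition leaves untouched (illustrative, not complete).
SPECIAL_LETTERS = {"ø": "o", "æ": "ae"}
# ASCII digraph spellings folded to the base letter (illustrative, lossy).
DIGRAPHS = {"aa": "a", "ae": "a", "oe": "o", "ue": "u"}
# Hypothetical manual alias table for exonyms that no rule can derive.
ALIASES = {"nuremberg": "nurnberg"}


def match_key(name: str) -> str:
    """Reduce a municipality name to a rough key for candidate matching."""
    text = name.casefold().strip()
    for letter, replacement in SPECIAL_LETTERS.items():
        text = text.replace(letter, replacement)
    # Split accented letters into base letter + combining mark, then drop the marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    for digraph, replacement in DIGRAPHS.items():
        text = text.replace(digraph, replacement)
    return ALIASES.get(text, text)


# Each group of variants collapses to a single key.
for variants in (["Aarhus", "Århus"],
                 ["Hillerød", "Hilleroed", "Hillerod"],
                 ["Nürnberg", "Nuremberg", "Nuernberg", "Nurnberg"]):
    print({match_key(v) for v in variants})
```

Rule-based folding handles the transliterations, while an exonym such as “Nuremberg” only matches because of the manually maintained alias entry.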
Then there are structural administrative reforms, merging or splitting municipalities. That is not just a little thing, it impacts about 5% of the data every year in Europe …
For the drive times, you need two geo-points, each a digital latitude/longitude reference: one at the airport and one in the municipality.
Now, most airports’ geo-point reference is at the center of the runway. Very helpful if you want to drive there. Aside from the fact that the nearest street from there might be a little service road on the far side from the terminal. So we initially researched all airports and assigned each a different, custom geo-reference at or close to the terminal on an accessible road, usually with a maximum offset below 30 seconds, which is statistically sound. And better than the runway center we found used by others.
Then for the municipality, the default is the geographic center of the municipality, which we call the “centroid”. Unfortunately, these centroids are very often in the middle of nowhere, far from the nearest street. Or on a forest service road, some 20-30 minutes away from the real “city center”. OpenStreetMap provides many “nodes as admin center”, a geo-location that is supposed to be at or near the town hall or a similar central point in the municipality. Unfortunately, we found that even those are very often simply set to the centroid. The same goes for the cadastre offices’ data. So we constantly use spare time to review all municipalities that are not yet set to a “city center”, but this is very tedious and time-consuming work. So far, it seems we have done most of the relevant ones, but we keep reviewing and prioritize when we learn of possible offsets. Still, there are areas, especially in the Alps or the Balkan mountains, that produce “weird” maps which are nevertheless statistically logical. If you do find anything weird, we are happy to check and either prioritize or explain.
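One way to build a review queue for this work is to flag every municipality whose admin center is missing or merely repeats the centroid. The following Python sketch shows the idea under stated assumptions: the 50 m tolerance, the function names and the coordinates are made up for illustration and are not taken from the actual review process.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def needs_review(centroid, admin_centre, tolerance_km=0.05):
    """True if there is no admin centre or it sits (almost) on the centroid."""
    if admin_centre is None:
        return True
    return haversine_km(*centroid, *admin_centre) <= tolerance_km


# Made-up coordinates for three hypothetical municipalities.
municipalities = {
    "A": {"centroid": (47.2000, 11.4000), "admin_centre": (47.2000, 11.4000)},
    "B": {"centroid": (47.2000, 11.4000), "admin_centre": (47.2600, 11.3900)},
    "C": {"centroid": (42.7000, 23.3000), "admin_centre": None},
}

review_queue = [name for name, m in municipalities.items()
                if needs_review(m["centroid"], m["admin_centre"])]
print(review_queue)
```

Municipality B is left alone because its admin center lies several kilometers from the centroid; A and C end up in the queue for manual review.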
Given the two geo-points, it takes about 45 minutes to calculate 2,500 drive times on Google, with a maximum of 1 million drive time requests no matter which package you buy from Google … and we currently have some 2.5 million drive times on file. Just for Europe. And increasing. So Google was not an option. We do run (manual) cross-checks against Google (and some 20 other systems) to make sure we are not too far off.
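A quick back-of-the-envelope calculation shows why. The sketch below uses only the figures quoted above and assumes, for simplicity, that all requests run one after another at the quoted rate.

```python
MINUTES_PER_BATCH = 45           # about 45 minutes per batch (from the text)
BATCH_SIZE = 2_500               # drive times per batch (from the text)
DRIVE_TIMES_ON_FILE = 2_500_000  # drive times currently on file (from the text)
REQUEST_CAP = 1_000_000          # maximum number of requests (from the text)

batches = DRIVE_TIMES_ON_FILE / BATCH_SIZE
total_hours = batches * MINUTES_PER_BATCH / 60
total_days = total_hours / 24
cap_ratio = DRIVE_TIMES_ON_FILE / REQUEST_CAP

print(f"{batches:.0f} batches, {total_hours:.0f} hours, about {total_days:.0f} days")
print(f"{cap_ratio:.1f} times the request cap")
```

That is 750 hours, or roughly a month of uninterrupted computing for a single full run, and 2.5 times the request cap.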
We use OpenStreetMap data today, with our own filters and settings in some areas. That covers about 95% of European street data, with major holes in Scandinavia and some trouble with ferries. For the remainder, we research manually, bringing drive time availability to more than 99%. The drive time calculations take into account individual street speed limits as well as national and regional peculiarities. The ferry data is also improving, thanks in part to our own adjustment work there. Ferries may not seem important, but there are islands with a substantial (and as such statistically relevant) population that can only be reached by ferry, many of which do not even allow cars, another area where we apply manual adjustments.
The offset between current online and offline navigation technology by Apple, Bing (Microsoft), Garmin, Google, TomTom, etc. is within a negligible window of +/- 3%, though in exceptional cases it can be something like 15 or 20% in “remoter” areas. After all updates we run spot checks on
We have several thousand individual overrides for known problems relevant to the catchment area, compensating for known errors in the drive times. But we do not cover temporary issues such as major road reconstruction that may affect the drive times between two airports.
But the main concern, and a time-consuming one at that, is the remaining centroids. We constantly work on them and have assigned substantially more than any other system we know of.
So now we have the drive times, but we want to map the data. And there is yet another hurdle: we also had trouble with the accuracy of administrative boundaries, which did not match the municipality data. Usually, the commercial maps lag behind on municipality changes, whereas OpenStreetMap is sometimes too accurate. So we basically started with the freely available OpenStreetMap boundaries and still use them to an extent. Which makes sense, as we use OpenStreetMap data for our underlying maps. But in most cases, municipality boundary data has become “open data”, so we received it from the national cadastre offices.
There is official EU administrative boundary data, but the license model is exceptionally expensive. With most data being available either in OpenStreetMap or as Open Data from the cadastre offices, we found it worthwhile to do it that way.
Problems arise from incorrect offsets between the cadastre data and the map data. So in some places the layers we color do not match the underlying maps (see Corfu and zoom in once for an example). Most of those offsets have little relevance at the usual scale of our maps, but they do affect some individual municipalities and some country borders. Other areas, for instance Kosovo, do not have recent data on the administrative boundaries, so we had to adjust the population data to the older map data. As that data is also incorrect in every other source we know, there is simply nothing we can do about it until better data becomes available. We review those cases at least once a year (or whenever we “hear” something).
In some areas, the available data is more granular; in others it is less accurate. We always use the most granular data for the calculations and, where needed, fall back to the less detailed level on the map. In other cases, maps differ politically; that is currently the case for Cyprus, Kosovo and Crimea. We have not addressed Cyprus yet. For Kosovo, Serbia no longer publishes population statistics, while Kosovo has made some changes to municipalities for which no map boundaries are offered. For Crimea, the boundaries are so far identical between Russia and Ukraine, but the population numbers from Ukraine and Russia differ. We try to follow (English) Wikipedia in such cases, which, for example, recognizes Kosovo data.
For Turkey (which also impacts Cyprus), we just learned that maps and population data are only available at “NUTS-3” level, which is something between state and district. Very large regions by our standards. But as there is nothing better, we have put this on the “to be added” list. Russia works with Raions (districts) in most cases and Oblasts (provinces/states) in some less populated areas. We are currently working on that. It is difficult, as most data is offered in Cyrillic (Yulia is the only one of us able to work on it) and the maps in Latin script, using very different transliterations that do not easily compute. So it is a lot of work to make sure we can not only import the data once, but also associate future updates properly.
Isochrones vs. Catchment Area
Because combining local population data with drive time calculations is an extremely tedious task, most offers for isochrone analyses in the past have lacked the quality necessary for sound analyses by aviation network planners. The comparison case study we ran at the end of 2016 to disprove the claims of those airline analysts did not do so; unfortunately, it confirmed the issue. When we addressed the often substantial offsets, airports actually admitted that they had used an “educated guess”, as they could not afford the usually very expensive analyses.
An airport actually confirmed to us a “competitive offer” for a cross-border isochrone analysis, one-off, no updates, at € 15,000 … at the same time that we started to give access to the same isochrones “for free”. We charge for the extended analysis, including detailed isochrone ring population, large scale maps and the competitive catchment area, at € 1,200. This is possible because we no longer run individual research, but rely on our existing database and a high degree of automation. As such, the analysis is not beautified in favor of one airport or another, and the data is strictly comparable at a European level.
Based on the isochrone data, we identified the main shortcoming of isochrones early on: isochrone data assumes that the population will naturally use the airport in question. But on average, any traveler has the choice between several airports, typically two or three in the central European region, and where smaller regional airports are involved, even more airports are relevant. As an example, in the German industrial center of “Rhein-Neckar”, containing Mannheim, Heidelberg and Ludwigshafen (SAP, BASF, etc.), travelers can easily reach the small Mannheim airport, FKB (Karlsruhe/Baden-Baden), Stuttgart or Saarbrücken, or head towards Frankfurt. They can also easily reach Nürnberg or Strasbourg in France.
Having compiled a database containing all relevant drive times from any European commercial airport to all municipalities (or similar units, as the smallest base for population data), identifying the drive times up to x minutes now turns out to be a simple database query. And having map layers for all the municipalities (or similar) in our database, the isochrones can be visualized quickly.
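The request described above can be sketched as follows, using Python’s built-in `sqlite3` module. This is an illustration only: the table and column names, the airport code `XXX` and all figures are made up for the example and do not describe the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE municipality (id TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE drive_time (airport TEXT, municipality_id TEXT, minutes REAL);
""")
# Made-up sample data: three municipalities around a hypothetical airport XXX.
conn.executemany("INSERT INTO municipality VALUES (?, ?)",
                 [("M1", 50_000), ("M2", 120_000), ("M3", 8_000)])
conn.executemany("INSERT INTO drive_time VALUES (?, ?, ?)",
                 [("XXX", "M1", 25.0), ("XXX", "M2", 55.0), ("XXX", "M3", 95.0)])


def isochrone_population(conn, airport, max_minutes):
    """Total population of all municipalities within max_minutes of the airport."""
    row = conn.execute(
        """SELECT COALESCE(SUM(m.population), 0)
           FROM drive_time AS d
           JOIN municipality AS m ON m.id = d.municipality_id
           WHERE d.airport = ? AND d.minutes <= ?""",
        (airport, max_minutes),
    ).fetchone()
    return row[0]


# Cumulative population per isochrone ring (30, 60, 90 and 120 minutes).
rings = {limit: isochrone_population(conn, "XXX", limit)
         for limit in (30, 60, 90, 120)}
print(rings)
```

The same `WHERE` clause also yields the list of municipalities to color on the map for each ring.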
While we emphasize the cutting-edge “catchment area” analysis below, these isochrone calculations are also valuable for aviation network planners and for new airports, as well as in combination with the catchment area, as an indicator of the population that new flights may draw upon.
Having the isochrones enabled us to address the need to calculate the impact of neighboring airports on a sound statistical level. We worked on this for almost two years, together with several of the world’s leading experts in the academic sector. We also worked with publicly available data on such research by MK Metric. Only to find that the full analyses exceeded the computing power available for an automated solution. We also found that academics added further complexities that we later found to be of little statistical relevance. We ended up with an expert in North America who was able to reduce all those complexities with a very smart algorithm, which today is the basis of our own calculations. Depending on the size of the airport (passengers) and the distance to the airport, we calculate for each municipality a factor for how much the surrounding airports are likely to draw from the population. In doing so, we turned the viewpoint around: the “center of the world” is not the airport, but the traveler.
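The detailed algorithm is not published. Purely to illustrate the principle of a per-municipality factor driven by airport size and distance, the Python sketch below uses a textbook gravity (Huff-type) model as a stand-in: each airport’s pull is its passenger count divided by a power of the drive time, and the shares are the normalized pulls. The function name, the decay exponent of 2 and all figures are assumptions made up for this example.

```python
def airport_shares(airports, decay=2.0):
    """Split one municipality's population between its reachable airports.

    airports: {code: (annual_passengers, drive_minutes)}, drive_minutes > 0.
    Returns {code: share}, with all shares summing to 1.
    """
    pull = {code: passengers / (minutes ** decay)
            for code, (passengers, minutes) in airports.items()}
    total = sum(pull.values())
    return {code: value / total for code, value in pull.items()}


# Made-up example: one municipality of 100,000 people and three hypothetical airports.
reachable = {
    "BIG": (20_000_000, 90.0),  # large hub, 90 minutes away
    "MID": (5_000_000, 45.0),   # mid-sized airport, 45 minutes away
    "SML": (500_000, 30.0),     # small regional airport, 30 minutes away
}
shares = airport_shares(reachable)
population = 100_000
catchment = {code: round(population * share) for code, share in shares.items()}
print(catchment)
```

In this made-up case, the hub at twice the drive time draws exactly as much as the mid-sized airport, because four times the passengers offset the squared distance penalty, while the nearby small airport still keeps a share of its own. Summing these per-municipality contributions over all municipalities gives an airport’s catchment population as seen from the traveler’s side.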
We add only very few exceptions, having found that further ones merely increase the computing time without having any real statistical relevance.
We are frequently asked to add special research results for custom client groups, be it the extensive findings of the German ADV, airline-specific frequent flyer data, etc. While we can do that, we have not done so yet, as we focus on global data that we can apply to the benefit of all users. If you want such specifics, available and beneficial only to a small group of airports, we consider this part of our commercial development. If you use our services and analyses, we can certainly discuss adding such custom analyses and the underlying calculations for you. We appreciate your understanding that this is a question of different priorities and needs and as such comes with a (reasonable) price tag. ADV data, for example, is only available for the ADV airports: neither for the German regional airports that are not covered by ADV, nor for the cross-border airports.
Priorities for Future Development
Currently, we are working on an interface to “MIDT” information (see “Route Level Analysis” below), but we have no intention whatsoever of replicating the available sources. We intend to focus on the catchment area and to improve understanding and analyses by taking MIDT(-like) information into account, but we have no intention of jumping into that shark pond with so many big fish in it, given the 20+ years of development advantage they have.
Beyond that immediate priority, our development plans are twofold:
- Regional: Our first airline customer has asked us to prioritize Russia, which we are currently working on. Next, we plan to add coverage of North America, and to add Turkey and Iceland.
- Depth: We also plan to add further detail to the data we have, such as purchasing power, as well as data having impact on the route level, such as commercial relations (im-/export) or ethnicity.
Summary on Base Data
So we use the latest population data we get our hands on and frequently scan national statistics websites and Eurostat for updates.
We use the latest in drive time technology, and we use the latest population statistics available to us at country level, at the most granular level: in Europe typically the municipality, in Russia/CIS the district (Raion). And we use our own boundary datasets, based on a mix of OpenStreetMap and national cadastre data. We use our own database of geographical data, also based on OpenStreetMap, but enhanced by constant work, especially on the geo-points used as end-points of the drive time calculations.
The value of the data lies not in the data itself, but in all the little tweaks we developed over the course of five years of initial R&D before we went live. And in the development of our own little database, fed to date with data from more than 50 different, usually not 100% compatible sources. Only that makes possible the high quality of the analyses we offer at a pan-European level. The other unique tool is our drive time algorithms, which build on the latest developments and take into account, for example, detailed speed limits and national and regional peculiarities.
We have plans to add additional detail relating to the data such as purchasing power.
Route Level Analysis
Having thus developed a unique database, we launched in March 2016 offering the latest in isochrones, as the first to offer the analysis instantly and, thanks to our high level of automation, at a fraction of the usual price. In September, we decided to even give away the isochrone maps, in the lower resolution of our dashboard, together with the total population for free.
Now the next step, and one vital to our business plan, is the proof of concept for analysis at route level. Initially, we will use available Eurostat data, but it is intentionally meant to prove that we can take into account more complex data from the existing providers of MIDT(-like) data, passenger, revenue, ticketing data, etc., where available. Such data traditionally comes from sources like BSP/ARC and GDS and is usually not reliably available for low cost airlines, or comes from them in very different quality and formats.
In order to decide the best approach, in summer 2016 we had a conference with our users and supporters and decided on the following initial approach.
- Identify the competitive airports. To do that we will take the location with the closest drive time and identify all airports that draw from that municipality. That will result in four types of airport pairs:
  - From the first “origin” airport to the to-be-analyzed second “destination” airport.
  - From the first airport to the airports in the second airport’s vicinity.
  - From the second airport to the airports in the first airport’s vicinity.
  - From the airports in the vicinity of the first airport to the airports in the vicinity of the second airport.
- Identify if travel data exists between both airports. Initially we focus on passengers, flights and seats offered (load factor), available from Eurostat.
- For the given airport pairs, we calculate the percentage impact of the existing routes on the airport catchment area. The larger the airport, the larger the catchment area, though the percentage impact should be roughly the same. The same percentage of the population is likely to use the route on the to-be-analyzed airport pair.
- We then apply the average percentage to the to-be-analyzed airport’s population.
That does not yet take into account factors such as flight times, average ticket prices, frequency or reputation (of airport and airline). A simple approach to get started. And we have been told by network planners that this is a tedious part of their work today, that they usually only apply spot checks, and that such an automated, full-range process would be a substantial quality improvement.
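The initial approach listed above can be sketched in a few lines of Python. This is one possible reading of the steps, not the implementation: the function names, the airport codes and all figures are made up, and the “percentage impact” is assumed here to be the route’s passengers divided by the catchment population of the pair’s first airport.

```python
from itertools import product
from statistics import mean


def airport_pairs(origin, destination, near_origin, near_destination):
    """Build the four types of airport pairs for one route to be analyzed."""
    return {
        "direct": [(origin, destination)],
        "origin_to_destination_vicinity": [(origin, d) for d in near_destination],
        "destination_to_origin_vicinity": [(destination, o) for o in near_origin],
        "vicinity_to_vicinity": list(product(near_origin, near_destination)),
    }


def estimate_route_passengers(pairs, route_passengers, catchment_population,
                              target_population):
    """Average the passenger share over all pairs with data and apply it."""
    ratios = [route_passengers[pair] / catchment_population[pair[0]]
              for group in pairs.values() for pair in group
              if pair in route_passengers]
    if not ratios:
        return None  # no comparable route with travel data
    return mean(ratios) * target_population


# Made-up example: route AAA-BBB with hypothetical neighboring airports.
pairs = airport_pairs("AAA", "BBB", near_origin=["AA1"],
                      near_destination=["BB1", "BB2"])
route_passengers = {("AAA", "BB1"): 40_000, ("AA1", "BB2"): 30_000}
catchment_population = {"AAA": 2_000_000, "AA1": 1_000_000}

estimate = estimate_route_passengers(pairs, route_passengers,
                                     catchment_population,
                                     target_population=2_000_000)
print(estimate)
```

In this example, only two of the six pairs have travel data; their shares of 2% and 3% average to 2.5%, which is then applied to the catchment population of the airport to be analyzed.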
Thanks to highly automated processes, CheckIn.com provides comparable, standardized and top quality analyses at a fraction of the prices offered by less specialized consultancies.
Further development is also subject to the business case! Our focus is on generic data. We understand that it is vital to interface our data results with the tools used by network planners and analysts. We also understand that users will want to apply other data to our analysis. But please keep in mind that we are not volunteers but experts with families of our own and financial needs. So we have our own development priorities for making our services more attractive to all customers. Implementing interfaces to other data, providing data through such interfaces or integrating the data from such interfaces into our analysis has to have a business case.
As is, we save any given network planner days of their own research. Our results are sound and our methodology is open (though not the detailed algorithms). Cross-checks with network planners at large airlines as well as analysts at major airports found our data to be sound.
As our research proved, the existing data published by airports, and as such used by aviation network planners, has been of very inconsistent quality. While some airports proved to have done their homework, others simply used guesstimates or intentionally tinkered with their data “to look good”, far from any realism. CheckIn.com data, in contrast, is unbiased and methodologically sound and adds a completely new level of understanding of the airport catchment area. As such, we see airline network planners as the natural users, as well as airports that want to provide their airlines with high-quality analysis results of proven quality.
CheckIn.com has been the first (and so far remains the only) provider of instant catchment area analyses under methodologically sound consideration of the neighboring airports, using highly sophisticated algorithms. The compilation of a sound database allowing for that analysis in top quality is an extremely tedious task. As such, airports and airlines often dismissed the business case for using the correspondingly expensive analyses in their daily work. A price tag of € 1,200 for a single airport or € 1,800 for the upcoming route level analysis is about what you pay a consultant for a day or two. Given the financial risk represented by any new route, we believe you now do have a business case.
Given revenue, we plan to invest in data improvement, as existing statistical data to date lacks the quality needed for fully automated use. For example, we cannot simply “link” into Eurostat data or national or aviation statistics, as we have to review the changes. Every year. And again. And adjust the map data. Not even airports provide consistent numbers for something as simple as annual passengers. For the use case of catchment area analysis, we have the most sophisticated database and the highest data quality in the market. For immediate access, analyses and results.
Just as MIDT changed the understanding of existing passenger flows in our industry in the 1990s (now 20+ years ago!), we did and do pioneering work to improve the understanding of the catchment area.