The importance of improving data quality at source

December 29, 2022

The UK can blame its bad immigration data on Hungary, one of the eight central and eastern European countries which joined the European Union in 2004. Unlike most existing EU members, the UK allowed citizens of these countries to move and work without restrictions, with the government expecting 5,000 to 13,000 people to arrive each year. But this was a massive underestimate, prompting accusations that immigration was out of control and arguably contributing to Britain’s exit from the EU.

Based on the results of the 2021 Census, the country which sent the most people to the UK was Poland, followed by Romania. But Hungary is the home of budget airline Wizz Air, which, to keep costs down, tends to use smaller airports such as Luton, Birmingham and Doncaster Sheffield.

Also to keep down costs, the International Passenger Survey run by the Office for National Statistics (ONS) at the time focused on Heathrow, Gatwick and Manchester. As a result, it didn’t notice increasing numbers of eastern Europeans using budget flights run by Wizz Air and others.

Georgina Sturge, a statistician for the House of Commons Library research service, highlights the episode in her new book, Bad Data, as an example of how data collection can go awry. The passenger survey had been set up in the 1960s, when far fewer people travelled internationally, more left the UK permanently than arrived, and most people required visas.

“People didn’t tend to travel in large droves from Poznań to Doncaster in the past,” says Sturge. “Unfortunately for the statisticians, who hadn’t even stationed anyone there to do the survey at the time, that was exactly what people started to do.”

Sturge says the UK has excellent official data in some areas, including health, traffic accident statistics and much of the ONS’s output. The Office for Statistics Regulation maintains a list of approved national statistics which she describes as the gold standard.

“But ultimately, if we’re asked a question or we need to produce some briefing material on something and there is any data out there which seems remotely reliable, we will pretty much end up using it,” she says of her work for MPs and their staff. “From our perspective, it’s about explaining the caveats.” This means thinking about where data comes from, how it is collected and for what purpose, considering the human processes involved rather than just the technical matter of getting hold of it.