Government departments create reams of data. Linked with other data sources it can provide powerful insights for individuals and communities. Steve Peters from the Department for Communities and Local Government (DCLG) discusses how they are developing smarter ways to link information together online.
Data is everywhere, underpinning almost all aspects of our everyday lives. We use it for major decisions such as choosing a new home, or seeking care for elderly family members. We use it for more mundane activities too – checking for the next bus, or getting the best price on that “must have” new DVD. We already draw on data from multiple organisations (local authorities, government, charities and businesses) when, for example, we search for the best local school, GP surgery or Chinese takeaway.
The rise of Open Data – free and available for anyone to re-use – will generate new potential for linking together all these related data sources. This data linking will be the fuel to power radical new opportunities inside government and across the digital economy. It depends on the quiet, hard slog of data work, and DCLG is at the forefront of efforts to stitch together data in government.
Data fuels our work at DCLG. We use it for key government priorities including housing and public service reform, and it is widely used by local authorities, charities and businesses. One example is Shelter’s online databank, which uses our data on housing tenure to tackle homelessness. Companies use our housing, planning and land-use statistics to support retail planning, or to better understand the changing demography of their target markets.
The trouble with data
However, much of that data is still locked away in separate silos, created for a time when a dataset served a single human need. We tend to release our data in separate files or documents, resulting in hundreds or thousands of spreadsheets on various websites, even if they are all brought together through the government’s data portal (www.data.gov.uk). Publishing in this disconnected, piecemeal way can create significant overheads. Spreadsheets are hard to find, and difficult to use when the user only requires certain columns from a particular file. The result is a user community that spends significant time and effort copying, pasting and reformatting data before they can use it.
These problems are exacerbated when users combine DCLG data with other third-party sources. For instance, imagine you are a homelessness advisor wanting to link together separate spreadsheets from DCLG, the Ministry of Justice and a local authority covering, say, homelessness and housing in Cornwall. How has each data publisher defined Cornwall in its own spreadsheet? Have they used names or codes, and are these consistent across all sources? We invariably find a lack of commonality or standards here, which again translates into unnecessary time and effort to prepare data for re-use.
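To make the Cornwall problem concrete, here is a minimal sketch in Python. It is not drawn from any real DCLG, Ministry of Justice or council dataset: the publishers, column names, area code "A001" and every figure are invented purely for illustration. It shows why joining on free-text names tends to fail, while joining on a shared code works.

```python
# Hypothetical rows from three publishers. Every name, code and figure
# below is invented for illustration and comes from no real dataset.
publisher_a = [{"area_name": "Cornwall", "area_code": "A001", "households": 120}]
publisher_b = [{"area_name": "Cornwall UA", "area_code": "A001", "cases": 45}]
publisher_c = [{"area_name": "CORNWALL COUNCIL", "area_code": "A001", "units": 30}]


def join_on(key, *tables):
    """Merge rows from several tables that share the same value for `key`."""
    merged = {}
    for table in tables:
        for row in table:
            # Rows with an identical key value are folded into one record.
            merged.setdefault(row[key], {}).update(row)
    return merged


# Joining on the name: three spellings produce three unlinked records.
by_name = join_on("area_name", publisher_a, publisher_b, publisher_c)

# Joining on a code all publishers share: one combined record.
by_code = join_on("area_code", publisher_a, publisher_b, publisher_c)

print(len(by_name), "records when joined on name")
print(len(by_code), "record when joined on code")
```

The join on codes only works, of course, if every publisher has agreed to use the same code for the same place, which is exactly the commonality that is usually missing.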
What is Linked Data?
Have you ever talked to someone for a few minutes before realising you were both using a common word or phrase but meaning different things? Humans are pretty adept at understanding the context of language, but computers are essentially dumb when interpreting context and meaning. The solution, Linked Data, is a set of standards and techniques for reliably and consistently linking together related data sources over the web, so that computers can ‘understand’ when different datasets are referring to the same thing. Organisations like the BBC have meticulously catalogued and linked their data so that when you search their news site – for Manchester United, for example – it automatically connects up and offers you all the BBC content on that person, place or organisation. Web pioneers such as Tim Berners-Lee have led the campaign for similar common identifiers across the open web, so that if you refer to Tower Bridge in your blog, the reference points to a commonly understood definition and URI, and opens up the potential to access a myriad of other data connected with the same entity. The linkability of data is captured in the 5-star ratings ascribed to data on data.gov.uk and other international open data portals.
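The idea of a shared identifier can be sketched in a few lines of Python. This is a simplified stand-in using plain tuples rather than a real RDF library, and every URI sits on the reserved example.org domain and is invented for illustration. Each statement is a (subject, predicate, object) triple; because two separate sources use the same URI for Tower Bridge, their statements can simply be pooled and queried together.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs are hypothetical and use the reserved example.org domain.
TOWER_BRIDGE = "http://example.org/id/place/tower-bridge"

# Source 1: a blog that mentions the bridge.
blog_triples = {
    ("http://example.org/blog/post-1", "http://example.org/def/mentions", TOWER_BRIDGE),
}

# Source 2: a separate publisher describing the same bridge.
places_triples = {
    (TOWER_BRIDGE, "http://example.org/def/label", "Tower Bridge"),
    (TOWER_BRIDGE, "http://example.org/def/locatedIn", "http://example.org/id/place/london"),
}


def describe(uri, *graphs):
    """Return every triple, from any source, where `uri` is subject or object."""
    merged = set().union(*graphs)
    return sorted(t for t in merged if uri in (t[0], t[2]))


facts = describe(TOWER_BRIDGE, blog_triples, places_triples)
for subject, predicate, obj in facts:
    print(subject, predicate, obj)
```

No matching on names is needed here: the two sources never coordinated beyond agreeing on the identifier, and that alone is enough to connect them.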
The prize: joined-up data, for joined-up problem solving
Our vision is that anyone should be able to discover and link together related sources over the web. These sources could be open data, linked over the public web, or could equally be private information shared in a more secure and protected environment. For example, DCLG wants to develop smarter ways of joining up disconnected data on housing, schools, parks and retail facilities, empowering people to make more informed choices about where they want to live. We are doing this by publishing our data as Linked Data.
Our front-end solution to this is OpenDataCommunities, an innovative online portal. It is a publishing platform for releasing DCLG datasets that can be quickly discovered and linked to other data. It comprises 154 Linked Data sources drawn from DCLG’s statistical and geographic data portfolio. These have been selected by working closely with data owners and end-users, ensuring that datasets can be published in new, open and linkable forms, and will be actively used. Alongside the data repository are low-cost, innovative visualisation tools to help people understand how they might use our data and create new tools and insights from it. Two examples are visualisations of new experimental statistics on the energy performance of buildings, and of statistics on loans completed under the Help-to-Buy scheme.
Making a difference
OpenDataCommunities has already had an impact. Our Linked Data has been used alongside third-party sources to better plan and target local services. One example is Lambeth-in-Numbers, a project by Lambeth Council to solve the problem of families being unable to afford good food, and so suffering poor diet and health. The council and residents used the site to co-design solutions based on evidence gained by combining deprivation data, health data (e.g. child obesity levels), council data (e.g. location of different types of food outlet, level of free school meals) and resident data (e.g. on the size and location of food banks). The data now underpins the work of the new Lambeth Food Partnership, which brings together residents, community organisations, businesses, local government and the NHS. It is supported by Lambeth Borough Council as part of their co-operative approach to working with local people. Its priorities for action are the school meals service in the borough, food poverty and access, and the local food economy.
Securing the prize: challenges and lessons learnt
However, we have faced challenges along the way. Firstly, creating Linked Data is hard. Data owners need to look afresh at how their data is organised and structured, to determine how and where it can be reliably and responsibly linked to external sources. So we are developing new technical skills in-house.
Secondly, publishers generally lack the tools, capacity and expertise to help them generate and maintain Linked Data from multiple databases. That’s why we have built tools that can “read” and convert these sources with minimal human intervention, and then publish them directly to OpenDataCommunities.
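As a rough illustration of what such a conversion involves (this is not DCLG's tooling, whose design is not described here), the Python sketch below turns spreadsheet rows into triples. The column names, the values and the example.org URI patterns are all assumptions made for the example.

```python
import csv
import io

# Hypothetical spreadsheet content; column names and values are invented.
CSV_TEXT = """area_code,tenure,households
A001,owner-occupied,100
A001,social-rented,40
"""

BASE = "http://example.org"  # placeholder domain, not a real publishing platform


def rows_to_triples(csv_text):
    """Turn each spreadsheet row into three triples about one observation."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        # Give every row its own URI so other data can refer to it.
        obs = f"{BASE}/data/tenure/obs/{i}"
        # Area and tenure become links to shared identifiers, not free text.
        triples.append((obs, f"{BASE}/def/area", f"{BASE}/id/area/{row['area_code']}"))
        triples.append((obs, f"{BASE}/def/tenure", f"{BASE}/def/tenure/{row['tenure']}"))
        # The measured value stays a plain number.
        triples.append((obs, f"{BASE}/def/households", int(row["households"])))
    return triples


triples = rows_to_triples(CSV_TEXT)
for triple in triples:
    print(triple)
```

The mechanical part is simple; the effort lies in deciding which columns should become links to shared identifiers, which is where the data owner's knowledge is needed.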
Thirdly, we have found that consuming and re-using Linked Data is not straightforward. Again, this is partly due to a lack of established tools and standards. Our solution here is to incorporate tools in OpenDataCommunities for non-technical users to quickly blend and retrieve data from multiple datasets. One example is our new Geography Selector, which helps users extract data for particular places of interest.
Finally, there is a need to improve awareness about the benefits and opportunities of open, linked data. This is not well understood amongst senior decision-makers, data publishers, end-users, and software intermediaries. We are tackling this issue by building up a portfolio of evidence about the benefits of Linked Data over more conventional publishing approaches such as spreadsheets.
In conclusion
Linked Data has the power to deliver a web of information, deeply interconnected and rich in context. It heralds a not-too-distant future where data is not just clumps of zeros and ones parked somewhere in a spreadsheet, but a sea of meaningful connections ready to be used - like the experience you already have when searching Google, but much more powerful. It’s similar to the game of Six Degrees of Separation where connections open up a cascade of definitions and relationships, revolutionising how we interact with our data. Applying its principles to DCLG’s data has simplified yet powerfully expanded what we and others can do.
But our data work at DCLG is only part of a much wider picture across government. Departments and agencies, and civil servants of all professions, are working together to create new services for the public using Linked Data. You can now check the price of your home, look at information about companies or check the quality of the water at your local beach – all using Linked Data. Data – whether Linked Data, Open Data or Data Science – is increasingly becoming a fundamental asset on which services can be built both inside and outside government. The American technology evangelist Tim O’Reilly argues that government should eventually become a platform which, instead of just providing services, actually enables and empowers citizens to innovate for themselves. We believe OpenDataCommunities is one of the first steps on the path to that future.
2 comments
Comment by TB posted on
Considering the comment that "Finally, there is a need to improve awareness about the benefits and opportunities of open, linked data. This is not well understood amongst senior decision-makers ..."
A strong and graphic example to show to senior decision makers of the benefits of using linked data is the recent work to reduce ivory poaching and elephant kills in Africa, linking to data on cargo trading routes, local militias, etc - Out of Africa: Mapping the Global Trade in Illicit Elephant Ivory - at https://www.palantir.com/2014/09/following-the-ivory/
Comment by IC posted on
In a Civil Service that is increasingly attempting to move toward 'joined up government', it is also critical that there are clearly defined differences between 'linked data' and 'data linkage'. There is a very real possibility that the benefits of linked data as a mechanism for joining disparate datasets to add value become wrapped up in a wider, more generic approach of using common identifiers as a linking tool.