Developing and testing Smart Data use cases with synthetic data

Our aim is to launch a Smart Data challenge prize in 2024, contingent on the Smart Data Discovery Challenge demonstrating:

  • A range of promising Smart Data-powered use case ideas that can improve services for consumers and small businesses across multiple sectors; and
  • That there is an active innovator base ready to seize these opportunities.

The Smart Data challenge prize would be a competition that offers incentives to those who can most effectively develop new solutions for how Smart Data could be used across sectors.

In addition to financial incentives, as part of this prize we would offer participants access to relevant synthetic datasets via a specially developed data sandbox, enabling them to prototype and test their cross-sector Smart Data use cases.

Innovators that take part in the Smart Data Discovery Challenge will join a community of cutting-edge and creative thinkers to help us to shape the direction of a potential Smart Data Prize in 2024. We want creative people from across industry, academia and civil society to join the conversation about opening up services using Smart Data to make people’s lives better, and enrich our national data infrastructure.

Why synthetic data?

Consumer Smart Data is personal, highly sensitive and subject to data protection law. Organisations holding such data on behalf of customers are obliged to protect it and generally have little appetite or incentive to share it, citing both GDPR and commercial privacy concerns. This can create a significant barrier to innovation, preventing data from being shared to uncover new insights and unlock value that can benefit individuals, organisations and the wider economy and society.

Using synthetic data is a widely adopted way to test innovative new ideas without the risk of breaching privacy laws or harming individuals.

Smart Data Foundry, one of the partners in the Smart Data Discovery Challenge, generates synthetic data. This data is based on simulations of millions of real-world events, but it contains no real personal information and requires no real data in the synthesis process, thereby removing all GDPR concerns. Recent deployments include supporting industry collaboration events (‘tech sprints’) hosted by the Financial Conduct Authority (FCA), the Payment Systems Regulator (PSR) and Fintech Scotland, as well as by individual banks and fintech organisations.

Smart Data Foundry’s data products extend from existing Smart Data schemes like Open Banking into personal identity, wider financial services (Open Finance) and telecommunications. By combining sector insight, publicly available data (from sources like ONS, Census, UK Finance) and statistical analyses of real datasets, Smart Data Foundry can create rich, realistic yet entirely synthetic data for networks of synthetic individuals and companies. Examples include monthly energy and water bills; phone call and text message metadata; bank transactions and payments; and company and director details.
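To make the idea concrete, the following is a minimal illustrative sketch of simulation-based synthesis using only the Python standard library. It is not Smart Data Foundry's method: the function names, the `person_id` field, the household sizes and every numeric parameter are assumptions made up for this example, and no real data is read at any point.

```python
import random
import uuid


def make_synthetic_people(n, rng):
    """Create n synthetic individuals; none corresponds to a real person."""
    return [
        {
            # Random identifier drawn from the seeded generator (hypothetical field).
            "person_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "household_size": rng.choice([1, 2, 3, 4, 5]),
        }
        for _ in range(n)
    ]


def make_energy_bills(people, rng, months=12):
    """Simulate one monthly energy bill per person per month."""
    bills = []
    for person in people:
        # Assumed, purely illustrative baseline in GBP per month.
        base = 40.0 + 15.0 * person["household_size"]
        for month in range(1, months + 1):
            # Assumed winter uplift for December, January and February.
            seasonal = 1.3 if month in (12, 1, 2) else 1.0
            amount = max(0.0, rng.gauss(base * seasonal, 8.0))
            bills.append(
                {
                    "person_id": person["person_id"],
                    "month": month,
                    "amount_gbp": round(amount, 2),
                }
            )
    return bills


rng = random.Random(42)  # fixed seed so the synthetic dataset is reproducible
people = make_synthetic_people(3, rng)
bills = make_energy_bills(people, rng)
```

In practice the distributions would be calibrated from sector insight and published statistics rather than the placeholder values used here.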

Crucially for the potential Smart Data challenge prize, which would focus on cross-sector Smart Data use cases, this approach makes it possible to provide data that spans industries and sectors while maintaining a link across these datasets to individual consumers and small businesses. Unique identifiers can be assigned to each synthetic consumer or business, allowing us to simulate the capabilities of a Digital ID scheme without the complexity of implementing one. This means participants in a potential challenge prize would be able to develop and test cross-sector Smart Data use cases.
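As a rough sketch of what a shared identifier makes possible, the Python example below groups synthetic records from three sectors under one key. The `person_id` field, the sample records and the `link_by_person` helper are all hypothetical and exist only for illustration; the actual schema of the sandbox datasets is not described here.

```python
from collections import defaultdict

# Hypothetical synthetic records; "person_id" is an assumed shared identifier.
bank_transactions = [
    {"person_id": "P001", "amount_gbp": -62.10, "payee": "Energy Co"},
    {"person_id": "P001", "amount_gbp": -25.00, "payee": "Telecom Co"},
    {"person_id": "P002", "amount_gbp": -48.75, "payee": "Energy Co"},
]
energy_bills = [
    {"person_id": "P001", "month": 1, "amount_gbp": 62.10},
    {"person_id": "P002", "month": 1, "amount_gbp": 48.75},
]
telecom_usage = [
    {"person_id": "P001", "month": 1, "calls": 42, "texts": 130},
]


def link_by_person(**sector_datasets):
    """Group records from every sector under the identifier they share."""
    linked = defaultdict(lambda: {name: [] for name in sector_datasets})
    for name, records in sector_datasets.items():
        for record in records:
            linked[record["person_id"]][name].append(record)
    return dict(linked)


profiles = link_by_person(
    banking=bank_transactions,
    energy=energy_bills,
    telecoms=telecom_usage,
)

# Each profile now holds one synthetic consumer's records across all sectors.
for person_id, sectors in sorted(profiles.items()):
    print(person_id, {name: len(rows) for name, rows in sectors.items()})
```

A use case such as checking whether bank payments match energy bills can then be prototyped against a single cross-sector profile, without any Digital ID infrastructure.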

During the Smart Data Discovery Challenge, we will work with participants to identify any additional datasets that need to be created, or changes needed to existing datasets, to support promising use case ideas proposed in the open call.

Then, to ensure that the synthetic data is representative of the real-world data it seeks to mimic, we will engage relevant sector experts, including data holders and other stakeholders in the industries emerging as candidates for Smart Data schemes and challenge prize use cases. These experts will provide industry expertise and insight to support the generation of synthetic data for the potential challenge prize.