Why synthetic data?
Consumer Smart Data is personal, highly sensitive and subject to data protection laws. Organisations holding such data on behalf of customers are obliged to protect it and generally have little appetite or incentive to share it, citing both GDPR and commercial confidentiality concerns. This can create a significant barrier to innovation, preventing the sharing of data that could uncover new insights and unlock value for individuals, organisations and the wider economy and society.
Synthetic data is widely used to test innovative new ideas without the risk of breaching privacy laws or harming individuals.
Smart Data Foundry, one of the partners in the Smart Data Discovery Challenge, generates synthetic data. This data is based on simulations of millions of real-world events, but it does not contain any real personal information and requires no real data to be used in the synthesis process, thereby removing GDPR concerns. Recent deployments include supporting industry collaboration events (‘tech sprints’) hosted by the Financial Conduct Authority (FCA), the Payment Systems Regulator (PSR) and Fintech Scotland, as well as individual banks and fintech organisations.
Smart Data Foundry’s data products extend from existing Smart Data schemes such as Open Banking into personal identity, wider financial services (Open Finance) and telecommunications. By combining sector insight, publicly available data (from sources such as ONS, the Census and UK Finance) and statistical analyses of real datasets, Smart Data Foundry can create rich, realistic yet entirely synthetic data for networks of synthetic individuals and companies. Examples include monthly energy and water bills, phone call and text message metadata, bank transactions and payments, and company and director details.
Crucially for the potential Smart Data challenge prize – which would focus on cross-sector Smart Data use cases – this approach makes it possible to provide data that spans industries and sectors while maintaining a link across these datasets to individual consumers and small businesses. Unique identifiers can be assigned to each synthetic individual or business, allowing us to simulate the capabilities of a Digital ID scheme without the complexity of implementing one. This means participants in a potential challenge prize would be able to develop and test cross-sectoral Smart Data use cases.
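The idea of linking datasets through a shared identifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the field names (`person_id`, `amount_gbp`, `month`) and the value ranges are hypothetical and do not describe Smart Data Foundry's actual generation process or data schemas.

```python
import random
import uuid
from collections import defaultdict


def make_people(n, rng):
    # Hypothetical: each synthetic person gets a random identifier
    # that is not derived from any real individual.
    return [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(n)]


def make_energy_bills(person_ids, rng):
    # Hypothetical energy dataset: three monthly bills per person.
    return [
        {"person_id": pid, "month": month, "amount_gbp": round(rng.uniform(40, 180), 2)}
        for pid in person_ids
        for month in range(1, 4)
    ]


def make_bank_transactions(person_ids, rng):
    # Hypothetical banking dataset: between one and five transactions per person.
    rows = []
    for pid in person_ids:
        for _ in range(rng.randint(1, 5)):
            rows.append({"person_id": pid, "amount_gbp": round(rng.uniform(1, 500), 2)})
    return rows


def link_by_person(**datasets):
    # Group the rows of every dataset under the identifier they share,
    # giving one cross-sector view per synthetic person.
    linked = defaultdict(lambda: {name: [] for name in datasets})
    for name, rows in datasets.items():
        for row in rows:
            linked[row["person_id"]][name].append(row)
    return dict(linked)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
people = make_people(3, rng)
energy = make_energy_bills(people, rng)
bank = make_bank_transactions(people, rng)
linked = link_by_person(energy=energy, bank=bank)
```

Here `linked[people[0]]` holds both the energy bills and the bank transactions for the first synthetic person, which is the kind of cross-sector view a participant might build a use case on. A real scheme would involve far richer data and consent controls; the sketch only shows the role the shared identifier plays.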
During the Smart Data Discovery Challenge, we will work with participants to identify additional datasets that need to be created, or changes needed to existing datasets, to support promising use case ideas proposed in the open call.
Then, to ensure that the synthetic data is representative of the real-world data it seeks to mimic, we will engage sector experts, including data holders and other stakeholders, across the industries emerging as candidates for Smart Data schemes and challenge prize use cases. These experts will provide industry expertise and insight to support the generation of synthetic data for the potential challenge prize.