Session Q&A
Does the metered energy data include both electricity and gas readings? And does the property data include information about the physical characteristics of the property, like type of building, size, etc?
Yes, the energy data includes both electricity and gas. The property data will include these characteristics, which to some extent influence the composition of the household and its energy requirements and bill.
Does the “People” data include disability, benefits claimant status, etc?
Currently, disability and benefits are not in the people table; however, that is something we can take away and explore. Participants are also able to adapt the synthetic data in their private workspace using their own sector expertise.
Is the dataset designed on the assumption that a single individual will appear in each of the industry-based scenarios? That is to say, is there at least one synthetic record that has all of consumer banking records, personal insurance, home energy, retail, property, conveyancing, a land registry record for their property, and their transport?
Each household will be ‘complete’ in the sense that it will have data in each of the sectors, although some people may only be present in some sectors.
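As an illustration only, a participant could check this kind of household completeness for themselves once the data is loaded. The sketch below assumes each sector's records have already been reduced to a set of household IDs; the sector names, the ID format and the function name are all hypothetical and do not reflect the sandbox schema.

```python
def incomplete_households(household_ids, sector_records):
    """Return {household_id: [sectors with no data]} for incomplete households.

    `sector_records` maps a sector name to the set of household IDs that
    appear in that sector's data. Both the layout and the names are
    assumptions made for this sketch.
    """
    missing = {}
    for hid in household_ids:
        absent = sorted(
            sector for sector, ids in sector_records.items() if hid not in ids
        )
        if absent:
            missing[hid] = absent
    return missing


# Hypothetical example data: household H2 has no energy records.
households = ["H1", "H2"]
sector_records = {
    "banking": {"H1", "H2"},
    "energy": {"H1"},
    "retail": {"H1", "H2"},
}
gaps = incomplete_households(households, sector_records)
```

An empty result would mean every household has data in every sector listed.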
Is the retail receipt data only for those with loyalty cards? Also, is it location specific so you can see where geographically the spend has occurred?
Only for those with loyalty cards, as we want to be able to link receipts back to people. Our simulation will be at a very local geography, but by default it will not be tied to specific stores. Of course, there is nothing to stop you as a data user appending those kinds of characteristics.
How will the metered energy data for gas and electricity be provided? As daily usage, or as usage every 30 minutes?
We are producing the data at the 30-minute reading level, for both gas and electricity.
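For anyone who would rather work with daily figures, half-hourly readings can be rolled up after the fact. The following is a minimal sketch, assuming each reading is a (timestamp, fuel, kWh) tuple; that layout, the kWh unit and the function name are assumptions for illustration, not the format the sandbox will use.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(readings):
    """Sum half-hourly readings into totals keyed by (date, fuel).

    `readings` is an iterable of (timestamp, fuel, kwh) tuples. This layout
    is assumed for the sketch and is not the sandbox schema.
    """
    totals = defaultdict(float)
    for timestamp, fuel, kwh in readings:
        totals[(timestamp.date(), fuel)] += kwh
    return dict(totals)


# Hypothetical example: one day of flat usage, 48 half-hour slots per fuel.
start = datetime(2024, 1, 1)
readings = []
for slot in range(48):
    timestamp = start + timedelta(minutes=30 * slot)
    readings.append((timestamp, "electricity", 0.25))
    readings.append((timestamp, "gas", 0.5))

totals = daily_totals(readings)
```

Going the other way is not possible, which is one reason the finer 30-minute level is the more flexible starting point.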
When it comes to loans, are you getting just the balance, or are you also getting additional information such as their rates?
For loans we will be showing lots of additional information such as the initial amount, amount remaining, the term and the interest rate.
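To show what those four fields make possible, here is a small sketch that derives a monthly repayment and the share of the loan already repaid. It assumes a fixed-rate, fully amortising loan with monthly payments and a nominal annual rate, and uses the standard annuity formula; the synthetic loans may not behave this way, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Field names are illustrative, not the sandbox schema.
    initial_amount: float
    amount_remaining: float
    term_months: int
    annual_rate: float  # nominal annual rate, e.g. 0.12 for 12%


def monthly_payment(loan):
    """Level monthly payment under the standard annuity formula (assumed)."""
    monthly_rate = loan.annual_rate / 12
    if monthly_rate == 0:
        return loan.initial_amount / loan.term_months
    factor = 1 - (1 + monthly_rate) ** -loan.term_months
    return loan.initial_amount * monthly_rate / factor


def fraction_repaid(loan):
    """Share of the original principal that has been paid down."""
    return 1 - loan.amount_remaining / loan.initial_amount


# Hypothetical loans used only to exercise the functions.
interest_free = Loan(12000.0, 6000.0, 24, 0.0)
one_year = Loan(10000.0, 10000.0, 12, 0.12)
```

With only a balance, none of this could be derived, so the extra fields matter for affordability-style use cases.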
Will industry providers who are part of the teams be able to upload their own synthetic data to the sandbox?
Yes, participants are able to bring in data they have the rights to use into the sandbox. In an application, it will be important to demonstrate how such data might work alongside the provided synthetic data.
How much time do finalists have to develop an MVP once they get access to the sandbox? Also, what would be the lead-in time from announcement to access to the sandbox?
We will be notifying successful applicants at the end of April and onboarding them through the first few weeks of May. Prototyping will run from late May until late September.
Can the workspace URL be accessed from outside the UK (as a team member is offshore)?
Yes, the platform should be accessible from outside the UK, although participants are normally asked to access the platform through work emails (so we can make sure everyone is in the same, correct team zone).
Will the JSON Schema be shared for smart data so it is validated?
At this early stage in exploring smart data in other sectors, there may be few or no well-defined schemas and specifications. The full details of the synthetic data will of course be shared with sandbox participants.
Why don’t you have consumer Transport Data?
Consumer Transport Data is a really interesting and rich area for us to create data for the challenge, with a potentially very wide scope when we consider public transport, private vehicles, smart cards, loyalty schemes and so on. With limited time, and other priority sectors in high demand, it is not in the current scope. However, if there proves to be sufficient demand, this is something that could be followed up at a later time.
Is the retail basket giving access to level 3 data such as item product code, item description?
We will be providing the individual products purchased and quantities in the basket.
How many individual personas will be provided in this challenge?
5000 individuals and 1000 SMEs
The “people” table shown in the diagram is truncated. There are 7 fields there, but clearly to get realistic generated data, you’d need many more attributes to drive the generator. Can you give me an idea of how many attributes (and a few examples of the sorts of attributes in the extended set) a person agent has in your table?
“Under the hood” there are thousands of parameters that influence each person’s behaviours. We will expose some of the attributes, such as age and name, which would be relevant and ‘discoverable’ in the real world.
Are the individual consumers based on pre-defined personas or are the personas made explicitly for the prize?
The attributes of the people are generated first, then households are created, and these together influence behaviours from there. For example, affluence, small or large families, and other characteristics all influence behaviours.
Can we import datasets relating to personal transport/driving, as transport data seem to be related to business users only?
Yes, you can import data you have the rights to use into the sandbox. In the discovery phase, business transport use cases came up, but personal travel not so much. This shaped decisions about which data to prioritise.
Can we access/utilise the data-set outside of the Naya One sandbox? i.e. export from sandbox and ingest to our tooling? And if not, can we connect to the sandbox via API to achieve the above?
You will be able to bring APIs into the sandbox; however, there are some restrictions on the volume you can export, in order to protect the IP of the datasets generated by Smart Data Foundry. Remember, we want to reach a strong, demonstrable prototype by the end of the prize. Basic API access to the synthetic data will be provided, to emulate open banking style access to data in other sectors. While you can access all the synthetic data in the sandbox, the number of records or rows you can extract will be limited to a degree.
Can we use version control tools like git to help manage software projects? Likewise if we are altering data, can we version control the data sets to manage them as well?
The notebook isn’t stateless, so work and data changes will be saved. In terms of repo access and so on, we can work with you on that, and on how to bring things in via API and other workspaces.