The Centre for Victorian Data Linkage (CVDL) has developed the Victorian Data Access Linkage Trust (VALT), a secure online platform for the release and analysis of linked, deidentified data for approved projects.
What is it and what does it do?
VALT is a Microsoft Azure-based system that uses virtual machines (VMs) to release data to users, giving each approved project secure access to its relevant linked, deidentified data. The VMs include standard analysis software such as R, Python and SQL (SAS is not available). Stata can be included on request; however, users will need to supply their own Stata licence.
Why is it being used?
The use of data, even when deidentified, carries risks of misuse and of participants being reidentified. Releasing data into a secure environment with strict controls on how the data can be handled helps to minimise the risk of participants being reidentified, or of data being removed and shared inappropriately.
The use of secure environments for data analysis is becoming standard practice in Australia and internationally. Implementing VALT, and its use by data users, will support a greater level of compliance with privacy legislation and information management principles than has previously been possible.
Maintaining stakeholder and public confidence that their data is stored and managed as securely as possible will help ensure that the enormous potential of these linked data assets continues to be realised.
What is a Virtual Machine (VM)?
A virtual machine works just like any other computer. It has a CPU, memory and disks to store your files. The main difference is that while a laptop or desktop is a physical machine, a virtual machine is exactly what the name suggests: a virtual computer that doesn’t physically exist, but uses software to act like a physical computer. Cloud-hosted virtual machines, such as those used by VALT, can be accessed remotely from anywhere at any time.
How secure is the platform?
The Microsoft Azure platform used by the CVDL is highly secure. The Azure cloud environment has been certified up to the “Protected” level under the Commonwealth’s Information Security Registered Assessors Program (IRAP). IRAP is an Australian Signals Directorate (ASD) initiative that provides high-quality information and communications technology (ICT) security assessment services to government in support of Australia's security.
Is the data backed up?
Yes, all data on the VMs is backed up within a Microsoft Azure data centre located in Australia.
How will I access the linked data?
Researchers with an approved linkage project need to apply for access to a VM by completing the Virtual Machine Application form and submitting the request to [email protected].
A separate virtual machine will be set up for each approved project. Registered users will be given login credentials to access their project data through their project’s virtual machine.
The linked data released to the virtual machine may be at unit record level once approved by data custodians. Unless negotiated with the CVDL and agreed by data custodians, CVDL’s standard deidentification and confidentialisation processes will apply to unit record data. These processes are described in the CVDL application form.
Standard software on the VM includes R, Python and SQL, as well as the Microsoft Office suite. Researchers may request installation of other preferred software, such as Stata, on a bring-your-own (BYO) licence basis.
Note that SAS software is not available due to licensing limitations.
Can we have multiple simultaneous users on a VM?
Multiple users can be registered; however, only two users can be logged in to each virtual machine at the same time.
Can data be removed/exported from the VM?
Any raw linked data must remain on the VM.
Outputs from the VM must be approved by the CVDL to ensure they are sufficiently aggregated to meet privacy and confidentiality requirements. This generally involves reviewing analysis, modelling, graphs and tables before they are released. As a guide, the CVDL requests that outputs avoid cells with counts below 5, either by combining categories or by suppressing those cells.
In exceptional cases, approval may be given to remove unit record outputs from the virtual machine. Specific confidentialisation and release requirements should be raised with the CVDL as early as possible in the application process, to allow time for discussion with data custodians.
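For researchers preparing tables in Python on the VM, the two approaches described above (suppressing small cells or combining categories) can be sketched as follows. This is a minimal illustration, not a CVDL tool: the function names, the use of `None` to mark a suppressed cell and the example category labels are all hypothetical. It applies only primary suppression to a simple one-way table; it does not perform secondary (complementary) suppression, so a suppressed cell could still be recovered by subtracting the other cells from a published total. Confirm the exact rules, including how zero counts are treated, with the CVDL before requesting output release.

```python
from typing import Dict, Optional

# Guide threshold from the CVDL text; confirm the rule for your project.
THRESHOLD = 5


def suppress_small_cells(
    table: Dict[str, int], threshold: int = THRESHOLD
) -> Dict[str, Optional[int]]:
    """Primary suppression only: replace counts below the threshold with None."""
    return {cat: (n if n >= threshold else None) for cat, n in table.items()}


def combine_small_categories(
    table: Dict[str, int], threshold: int = THRESHOLD, other_label: str = "Other"
) -> Dict[str, Optional[int]]:
    """Pool all categories below the threshold into one category.

    If the pooled total is itself still below the threshold, it is
    suppressed as well. Assumes other_label is not an existing category.
    """
    combined: Dict[str, int] = {}
    small = [n for n in table.values() if n < threshold]
    for cat, n in table.items():
        if n >= threshold:
            combined[cat] = n
    if small:
        combined[other_label] = sum(small)
    return suppress_small_cells(combined, threshold)


# Hypothetical counts, for illustration only.
example = {"Metropolitan": 120, "Regional": 48, "Rural": 3, "Remote": 2}
print(suppress_small_cells(example))
# {'Metropolitan': 120, 'Regional': 48, 'Rural': None, 'Remote': None}
print(combine_small_categories(example))
# {'Metropolitan': 120, 'Regional': 48, 'Other': 5}
```

Combining categories usually preserves more information than suppression, but only when the pooled group reaches the threshold; otherwise the sketch falls back to suppressing the pooled cell.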
Does a VM have internet access?
For security reasons, internet access from VMs is limited.
Is there a cost for using VMs?
Yes. The CVDL is charged by Microsoft for the cloud computing processing power used by the secure environment, so costs for supplying the VM environment are recovered from each project. Costs are calculated according to the power of the VM and the length of time it is used. Switching VMs off when they are not in use will reduce costs.
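As a rough illustration of how usage time drives cost, the sketch below multiplies an hourly rate by the number of hours a VM is switched on. The VM size names and rates are hypothetical placeholders, not CVDL charges, and the 45-hour business week (nine hours a day, five days a week) is an assumption; actual figures are in “The Centre for Victorian Data Linkage Virtual Machine access model” and the VALT usage charge calculator.

```python
# Hypothetical hourly rates in AUD, for illustration only.
# These are NOT CVDL charges; see the VM access model for real figures.
HOURLY_RATE = {"small": 0.50, "medium": 1.00, "large": 2.00}

ALWAYS_ON_HOURS_PER_WEEK = 24 * 7  # 168 hours: 24-hour access
BUSINESS_HOURS_PER_WEEK = 9 * 5  # 45 hours: assumed 9-hour weekdays


def estimate_cost(size: str, hours_per_week: float, weeks: int) -> float:
    """Estimate cost as hourly rate x hours switched on, rounded to cents."""
    return round(HOURLY_RATE[size] * hours_per_week * weeks, 2)


# Compare a year of 24-hour access with business hours only.
print(estimate_cost("medium", ALWAYS_ON_HOURS_PER_WEEK, 52))  # 8736.0
print(estimate_cost("medium", BUSINESS_HOURS_PER_WEEK, 52))  # 2340.0
```

Under these assumed rates, restricting the machine to business hours cuts the estimate by roughly 73%, because cost scales directly with the hours the VM is on.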
A complete listing of available VMs and charges is documented in “The Centre for Victorian Data Linkage Virtual Machine access model”. A VALT usage charge calculator is also available.
At the end of the project, notify the CVDL Technical Team that the VM is no longer required. The easiest way to do this is to email [email protected]. The CVDL will then send an invoice for the total cost of VM usage.
Researchers applying for project funding should consider the estimated cost for accessing the linked data in their grant applications.
Can I change the size/power of the VM I’m using or when it is switched on/off?
Yes. The easiest way to do this is to email the CVDL Technical Team at [email protected]. You can request a more or less powerful machine depending on your requirements. You can also change the hours the machine is available, for example switching from 24-hour access to business hours only.
Can I access my data at a later date?
The data will remain within the Azure data centre and be placed in long-term storage. Fees may apply depending on how long the data is held; these arrangements can be discussed with the CVDL. Syntax and programming code will be filed on specific CVDL VMs so that the research can be easily replicated in the future.
The CVDL will retain the research extract, and researchers can back up their analysis, so the data can be easily reproduced if required.