1. Data Ingestion
Data intended for the BioGrid system is generally loaded onto a Local Research Repository (LRR), a computer that either resides at your site or is shared with other BioGrid sites.
A copy of your database is created on the LRR by the BioGrid team in a standardised format, and a regular automatic process is set up to copy data from your database to the LRR. This data will contain patient identifying details as well as clinical information; the LRR is still within your site’s firewall and under the control of your site’s security arrangements.
BioGrid must assign a Unique Subject Identifier (USI) to each patient. This will be either an existing USI, if your patient already appears elsewhere in BioGrid (possibly at another site), or a new USI, if the patient does not yet have any information in BioGrid.
To make this determination, an automated process normally copies only the identifying information for each patient to the central BioGrid demographic repository, where it attempts to match each patient’s identifying details against those of patients already in BioGrid. Approval for copying your patients’ identifying details, and for their use in the matching process, will have been obtained as part of your site’s ethics application when you joined BioGrid.
This process is known as probabilistic matching. It results in either a new or an existing USI being written back, in encrypted form, alongside your data on your site’s LRR. This number is used to link patient data between databases at different sites; a researcher sees only the USI and never a patient’s identifying details.
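The matching algorithm BioGrid uses is not described here. As an illustration only, one widely used form of probabilistic record linkage is Fellegi-Sunter scoring: each identifying field that agrees adds a positive log-weight, each field that disagrees adds a negative one, and the total is compared with a threshold. The sketch below assumes that technique; the field names, the m/u probabilities, the threshold and the USI format are all hypothetical.

```python
import math
from typing import Callable

# Hypothetical per-field probabilities (illustrative assumptions only):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "date_of_birth": (0.98, 0.003),
    "sex": (0.99, 0.5),
}
MATCH_THRESHOLD = 10.0  # hypothetical cut-off on the total weight


def normalise_field(value: str) -> str:
    """Ignore case and surrounding whitespace when comparing fields."""
    return value.strip().upper()


def match_weight(a: dict, b: dict) -> float:
    """Sum Fellegi-Sunter log2 weights over the compared identifying fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if normalise_field(a[field]) == normalise_field(b[field]):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def assign_usi(incoming: dict, known: dict[str, dict], next_usi: Callable[[], str]) -> str:
    """Return the USI of the best-scoring known patient at or above the threshold, else a new USI."""
    best_usi, best_weight = None, MATCH_THRESHOLD
    for usi, record in known.items():
        weight = match_weight(incoming, record)
        if weight >= best_weight:
            best_usi, best_weight = usi, weight
    return best_usi if best_usi is not None else next_usi()
```

With these made-up weights, a record whose surname is misspelt ("Smyth" for "Smith") but whose other fields agree still scores about 10.5 and is linked to the existing USI, which is the point of matching probabilistically rather than exactly.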
A second matching method can be used where a site’s ethics requirements do not allow identifying details to be released as described above. This method, known as hashing, uses a non-reversible (one-way) algorithm to calculate a hash value from the identifying details of each of your patients and of each patient already in BioGrid. The hashes are then compared to determine whether a patient is new or already known, and a new or existing USI is assigned accordingly.
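The hashing algorithm is not specified in the source. A minimal sketch, assuming the identifying fields are normalised, joined and passed through HMAC-SHA-256 with a key shared by the participating parties (the key, the field list and the USI format are hypothetical). Unlike probabilistic matching, hashes agree only when the normalised inputs are identical, so a single misspelling yields a different hash and the patient would be treated as new.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical shared secret; a real deployment would manage keys securely.
SHARED_KEY = b"example-key-not-for-production"


def canonical_identifiers(surname: str, given_name: str, dob: str, sex: str) -> str:
    """Trim and upper-case each field so formatting differences do not alter the hash."""
    return "|".join(part.strip().upper() for part in (surname, given_name, dob, sex))


def patient_hash(surname: str, given_name: str, dob: str, sex: str) -> str:
    """One-way keyed hash (HMAC-SHA-256, hex) of the canonical identifiers."""
    message = canonical_identifiers(surname, given_name, dob, sex).encode("utf-8")
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


def assign_usi_by_hash(h: str, hash_to_usi: dict[str, str], next_usi: Callable[[], str]) -> str:
    """Reuse the USI recorded for a known hash; otherwise allocate and record a new one."""
    if h not in hash_to_usi:
        hash_to_usi[h] = next_usi()
    return hash_to_usi[h]
```

Only the hashes and USIs need to be compared; the identifying details themselves never leave the site in this scheme.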
In addition to the data itself, BioGrid will work with you to create a data dictionary for all items in your database, describing what kind of data the data set contains and who owns it. This metadata is freely available on the BioGrid web site.
Your data is now stored on your LRR with a Unique Subject Identifier that allows it to be linked to data from other sites. Researchers who wish to use the data can browse the BioGrid data dictionary to decide which data set they would like to access.
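The format of the BioGrid data dictionary is not given here. A minimal sketch of what an entry might hold, assuming fields for the data set, item name, description, data type and owner, plus a simple keyword search of the kind a researcher browsing the dictionary might use. All names and the example entries are hypothetical; entries describe the data and never contain patient values.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One hypothetical data-dictionary item: metadata only, no patient values."""
    dataset: str      # name of the contributing data set
    item: str         # field name in the source database
    description: str  # what the item records
    data_type: str    # e.g. "date", "integer", "text"
    owner: str        # site or custodian that owns the data


def search(entries: list[DictionaryEntry], term: str) -> list[DictionaryEntry]:
    """Case-insensitive keyword search over item names and descriptions."""
    needle = term.lower()
    return [e for e in entries if needle in e.item.lower() or needle in e.description.lower()]


def to_json(entries: list[DictionaryEntry]) -> str:
    """Serialise entries, e.g. for publication as metadata."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

A frozen dataclass keeps published entries immutable; serialising to JSON is one convenient way to expose the metadata, not necessarily the one BioGrid uses.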
To gain access to your data, researchers must first obtain permission from you and from the BioGrid Management Committee. If both grant permission, researchers are given access to patient data from different sites linked by the USI; they do not see any patient identifiers. Access is provided directly to the data stored on the LRR at your site. BioGrid is a 'federated database' rather than a central data repository: no clinical data is stored centrally in BioGrid.
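How BioGrid executes cross-site queries is not described here. As a conceptual illustration only, assuming each site's LRR returns rows keyed by USI and that identifying columns are withheld, linking data from several sites amounts to a join on the USI. The site names, column names and the list of identifying fields are hypothetical.

```python
# Hypothetical list of identifying columns that must never be returned to researchers.
IDENTIFYING_FIELDS = {"surname", "given_name", "date_of_birth", "address"}


def strip_identifiers(row: dict) -> dict:
    """Drop identifying columns, keeping the USI and clinical fields."""
    return {k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS}


def link_by_usi(sites: dict[str, list[dict]]) -> dict[str, dict]:
    """Inner-join de-identified rows from several sites on the 'usi' column.

    Columns are prefixed with the site name so same-named fields do not collide.
    """
    indexed = {
        name: {row["usi"]: strip_identifiers(row) for row in rows}
        for name, rows in sites.items()
    }
    if not indexed:
        return {}
    # Keep only USIs present at every site (inner join).
    common = set.intersection(*(set(rows) for rows in indexed.values()))
    linked = {}
    for usi in sorted(common):
        record = {"usi": usi}
        for name, rows in indexed.items():
            for column, value in rows[usi].items():
                if column != "usi":
                    record[f"{name}.{column}"] = value
        linked[usi] = record
    return linked
```

The inner join keeps only patients present at every site; an outer join would suit questions that need patients seen at any one site.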
2. Acquire a node
As mentioned above, adding your data will normally require a Local Research Repository (LRR) to be available at your site. If your data is being added to an existing BioGrid node, this LRR will already be present and can be used for your data as well as any existing data.
However, if your site is new to BioGrid, an LRR will have to be installed. This is normally handled by your Information Technology (IT) department. BioGrid will provide specifications for a standard server to act as the LRR, as well as information about the required software and communications.
Minimal specifications for a single-site LRR are:
- Processor: 3.0 GHz Xeon MP, 4 MB L3 cache
- Memory: 4 GB
- Hard disk: 3 × 300 GB SCSI, RAID 5 configuration
- Operating system: Microsoft Windows Server 2003 Standard Edition
- Database: Microsoft SQL Server Standard Edition, per-processor licensing
- Other: DVD-ROM drive, redundant power supply, 3-year warranty
3. Piggyback node
In some cases, it may be possible for a new site to 'piggyback' on an existing LRR. This is especially the case for smaller sites, or for sites where there is an existing arrangement for the sharing of IT infrastructure. If this is the case, BioGrid will explore the options with your IT department, and determine if the existing LRR needs to be upgraded to handle the additional workload.
Note that your full data set, including both identifying data and clinical data, must still be copied to the LRR. If the LRR is at another site, the ethics application at your site must clearly acknowledge and approve this process.
4. IT connection
In order to link an LRR to BioGrid, a Virtual Private Network (VPN) must be established between your site and BioGrid. Your IT department will handle this.
In addition, researchers wishing to access BioGrid must keep the following in mind:
- Internet Explorer 6 or later and ActiveX must be installed on all PCs that will be used to access BioGrid.
- The site’s firewall must allow access on port 3389 for IP address 126.96.36.199 (your IT department will know what this means).
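Port 3389 is the default port for Microsoft's Remote Desktop Protocol. Assuming connections are made outbound from researchers' PCs to the BioGrid address, your IT department could confirm the firewall rule with a simple TCP connection test such as this sketch, which uses only Python's standard socket module. It shows only that a connection can be opened, not that the remote service is working; the host is left for you to supply.

```python
import socket


def port_reachable(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `port_reachable(host)` with the address named in the firewall rule returns False if the firewall blocks the connection or nothing is listening.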
5. Research Tools
BioGrid holds an unlimited licence for the SAS statistical analysis system covering all member sites, and SAS Enterprise Guide is the standard tool for accessing the data. BioGrid also runs regular free SAS training courses for member sites.
For further information contact:
BioGrid Australia Technical Team
Phone: +61 3 9342 2690