Kinds of data
Remote testing involves several kinds of data that must be exchanged between participants and researchers. These typically include:
- Participant information, including personally identifiable information (PII) and protected health information (PHI) that may be subject to regulatory compliance constraints.
- Stimuli (e.g., audio, image and video data) and experiment parameters. Although often fixed across participants, these may be individualized, for instance when participants are randomly assigned to different "conditions", or if the measurements are adaptive.
- Response data, which will likely be the bulk of the data that is of interest to the experimenters. Access to single-trial response data may be needed during testing for progress monitoring and verification of task compliance, to provide feedback to participants, and/or to calculate summary performance metrics to make decisions about task flow. The full set of final responses will then need to be assembled for detailed analyses. Long-term archival and sharing with collaborators or the broader research community are also considerations that may apply.
In some cases PII may incidentally be linked to the response data, thus requiring special considerations. For instance, when verbal responses are recorded, or if there is live video interaction between the participant and experimenter, raw audio/video data will contain identifying information.
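As an illustration of the individualized experiment parameters mentioned above, the following is a minimal sketch of balanced random assignment of participants to conditions. The function name `assign_conditions`, the `seed` parameter, and the example IDs are hypothetical and not part of any particular platform. The sketch also assumes the full participant list is known in advance, which may not hold when participants enroll on a rolling basis.

```python
import random


def assign_conditions(participant_ids, conditions, seed=0):
    """Randomly assign participants to conditions in a balanced way.

    Hypothetical helper: shuffles the participants with a seeded generator,
    then deals them out to the conditions in turn, so group sizes differ
    by at most one.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[i % len(conditions)]
        for i, pid in enumerate(shuffled)
    }
```

For example, `assign_conditions(["p01", "p02", "p03", "p04"], ["quiet", "noise"])` would place two participants in each condition, and the same seed reproduces the same assignment.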
Server-side versus client-side data handling
One advantage of handling data primarily on the client side (i.e., on the participant device) is that internet access is not necessary except for the initial download of the app and task material, and the upload of data at the end. Furthermore, when computations are done on the client side with pre-loaded stimuli, better timing control can be achieved than when stimuli are loaded from the server on a trial-by-trial basis. Another advantage of client-side data handling is that some privacy/security issues may be avoided, as described in the next section. On the flip side, server-side handling of data typically allows for greater standardization, near real-time logging of progress and aggregation of data, and, perhaps most importantly, a simpler experience for participants, because their involvement beyond completing the task itself is minimal (e.g., they do not need to install the app or upload the data).
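The client-side pattern described above (record responses locally, upload once at the end) can be sketched as follows. This is only an illustration under stated assumptions: the class name `LocalTrialLog`, the JSON Lines file layout, and the trial fields are hypothetical, and the actual upload step is left out because it depends on the platform.

```python
import json
from pathlib import Path


class LocalTrialLog:
    """Hypothetical client-side log: one JSON object per line, one line per trial."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, trial):
        # Append immediately so that a crash mid-session loses at most one trial.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(trial) + "\n")

    def export_payload(self):
        # Assemble all recorded trials into one JSON array for a single upload.
        if not self.path.exists():
            return json.dumps([])
        with self.path.open(encoding="utf-8") as f:
            trials = [json.loads(line) for line in f if line.strip()]
        return json.dumps(trials)
```

With this layout, no network access is needed while the task runs. The string returned by `export_payload()` would be sent to the server in one request once the session is complete.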
Privacy and security
Another layer of security may be achieved by encrypting all stored data (i.e., encryption at rest). Major databases (e.g., PostgreSQL, MySQL, SQLite) and cloud computing service providers (e.g., AWS, GCP, Azure) offer options for encryption at rest, in some cases through extensions or encryption at the storage layer rather than in the database engine itself. However, it may be desirable to have public "clear-text" copies of the de-identified research data in the interest of open science. When sharing data in public repositories, it is good practice to use anonymous participant IDs that differ from those used during data collection. Finally, it is important that all communication between participant devices and servers is encrypted. This is especially the case for browser-based communication involving form fields where the participant can type in information. This can be achieved using TLS (the successor to SSL). TLS certificates may be obtained from certificate authorities. A popular free option is Let’s Encrypt.
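The practice of using different anonymous IDs for public sharing than for data collection can be sketched as below. The function names `make_public_ids` and `relabel`, the `participant_id` field, and the `P` prefix are all hypothetical choices for illustration. The sketch assumes that the mapping itself is either kept private or discarded, since anyone holding it could link the public IDs back to the collection IDs.

```python
import secrets


def make_public_ids(collection_ids, n_bytes=8):
    """Map each collection-time ID to a fresh random ID for public release.

    The new IDs are drawn at random, so they cannot be derived from the
    originals without access to the returned mapping.
    """
    mapping = {}
    used = set()
    for cid in collection_ids:
        public_id = "P" + secrets.token_hex(n_bytes)
        while public_id in used:  # extremely unlikely, but guarantees uniqueness
            public_id = "P" + secrets.token_hex(n_bytes)
        used.add(public_id)
        mapping[cid] = public_id
    return mapping


def relabel(records, mapping, id_field="participant_id"):
    """Return copies of the records with the ID field replaced; inputs are not modified."""
    return [{**record, id_field: mapping[record[id_field]]} for record in records]
```

Relabeling only replaces the ID field. Any other identifying content in the records (e.g., timestamps or free-text responses) would still need to be handled separately before release.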
Remote testing platforms vary in their support for automatic data backup. If setting up a custom platform, major databases (e.g., PostgreSQL, MySQL, SQLite) also support manual backup snapshots, which can be taken by scripts scheduled to run at specific times (e.g., cron jobs on Linux). Multiple replicas of the database may be used to reduce server downtime in the event of a database crash. Otherwise, the data backup considerations that apply to in-person studies apply here as well.
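As one possible illustration of a scripted backup snapshot, the sketch below uses the `Connection.backup` method of Python's standard `sqlite3` module to copy a SQLite database into a timestamped file. The function name `snapshot_sqlite` and the file naming scheme are assumptions made for this example. PostgreSQL and MySQL have their own backup tools, which are not shown here.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_sqlite(db_path, backup_dir):
    """Copy a SQLite database to a new timestamped file and return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{Path(db_path).stem}-{stamp}.sqlite3"

    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        # The backup API produces a consistent copy even if the source is in use.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

A script that calls `snapshot_sqlite` could then be scheduled with cron, for instance once per night, with older snapshots pruned according to the study's data retention plan.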