Identifying best practices

A long-term goal of the task force is to identify best practices or guidelines for remote testing. For the most part, however, the necessary evidence base (in terms of what works well and what does not) does not yet exist. As such evidence becomes available (see Examples), we expect to add to the list of "best practices" that can be identified from investigators' experiences. Here, we consider the motivations for, and challenges of, identifying best practices in the first place.

Why should we attempt to identify best practices?

Best practices can form the basis of formal or informal guidelines for research practice with remote testing. The many tradeoffs evident in remote testing demonstrate the potential for fundamental weaknesses if the remote-testing approach is not designed appropriately for the study. Identifying best practices can help investigators select the features that will best ensure the rigor and reproducibility of the research.

What issues stand in the way of establishing a single set of best approaches?
Candidate best practices that can be identified at this time:

Align strengths to research goals: Before conducting a remote-testing study, enumerate the specific tradeoffs associated with each candidate approach, and be certain to align its strengths with the goals of the specific research question. Familiarity with the questions raised on this Wiki (see Issues) and with feature comparisons across remote-testing Platforms can help.
Measure and document calibration: Incorporate the most accurate form of stimulus calibration that is achievable within the selected approach. In some cases (e.g., browser-based testing with participants' own computers and headphones) this may be very limited, but even a simple psychophysical validation using tone detection or binaural comparison could provide important verification of the stimulus setup, such as whether earphones were worn correctly or whether stimulus levels were appropriate for the test setting. More elaborate procedures involving acoustical measurement before, after, or during the tests might alleviate many concerns about testing outside of a controlled sound booth.
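A psychophysical setup check of the kind described above reduces, in practice, to scoring a short block of screening trials against a pass criterion fixed in advance. The sketch below illustrates this with a hypothetical forced-choice tone-detection check; the trial format and the 5-of-6 pass criterion are illustrative assumptions, not validated standards.

```python
# Sketch: scoring a simple stimulus-setup check (hypothetical data format).
# Each trial records whether the participant identified the target interval
# in a forced-choice tone-detection task. The pass criterion (5 of 6 trials
# correct) is an assumption chosen so that chance performance rarely passes.

def passes_setup_check(trial_results, min_correct=5):
    """Return True if the participant meets the predetermined pass criterion."""
    n_correct = sum(trial_results)  # True counts as 1, False as 0
    return n_correct >= min_correct

# Example: one participant's six screening trials (True = correct interval)
trials = [True, True, False, True, True, True]
print(passes_setup_check(trials))  # → True
```

A participant who fails such a check can be asked to re-seat their earphones or adjust the level before the main experiment begins, rather than being silently included with an invalid setup.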
Validation: If possible, include a replication or validation condition that matches, as closely as possible, an approach for which standard in-lab data exist or may easily be obtained. Close replication across in-lab and remote-testing procedures is one of the strongest available ways to establish the reliability and validity of new data (see Data Analysis). Unexpected results could indicate an unacceptable deviation from ideal conditions and could help to identify previously unanticipated limitations of the selected approach.
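One simple way to quantify how closely a remote validation condition replicates an in-lab benchmark is a standardized mean difference between the two samples. The sketch below computes Cohen's d from two sets of thresholds; the threshold values are fabricated for illustration, and a full analysis would of course add a formal statistical test.

```python
import statistics

def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5

def cohens_d(a, b):
    """Standardized mean difference (Cohen's d) between samples a and b."""
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd(a, b)

# Fabricated thresholds (dB) for an in-lab benchmark and a remote replication
in_lab = [12.0, 14.5, 13.2, 11.8, 12.9, 13.6]
remote = [13.1, 15.0, 12.7, 14.2, 13.8, 12.5]
print(f"effect size d = {cohens_d(in_lab, remote):.2f}")
```

A small effect size supports the claim that the remote procedure reproduces in-lab conditions; a large one flags a deviation worth investigating before trusting the novel conditions.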
Inclusion of independent measures and predetermined criteria for outlier removal: Incorporating additional measures such as cognitive screens, attention checks, and catch trials into the study procedures can provide important independent data for identifying non-compliant or poorly performing participants who contribute excessively to random error and thus should be removed from data analysis to preserve statistical power (see, e.g., McPherson & McDermott, 2020). A set of independent, predetermined criteria for data removal is required to avoid the experimental bias that could result from identifying "outliers" based on the study data themselves. Alternatively, screening measures can provide covariates that aid the interpretation of study data when all participants are retained in the final analysis. See Data Analysis.
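The key property of such exclusion criteria is that they are fixed before the primary outcome data are inspected and depend only on the independent measures. A minimal sketch of this idea, with hypothetical field names and cutoff values, might look like:

```python
# Sketch: applying predetermined exclusion criteria based on independent
# measures (catch-trial accuracy, attention check) before any analysis of
# the primary outcome data. Field names and cutoffs are hypothetical.

MIN_CATCH_TRIAL_ACCURACY = 0.8  # assumed minimum proportion of catch trials passed

def include_participant(record):
    """Return True if a participant record meets all predetermined criteria."""
    return (record["catch_trial_accuracy"] >= MIN_CATCH_TRIAL_ACCURACY
            and record["attention_check_passed"])

participants = [
    {"id": "p01", "catch_trial_accuracy": 0.95, "attention_check_passed": True},
    {"id": "p02", "catch_trial_accuracy": 0.50, "attention_check_passed": True},
    {"id": "p03", "catch_trial_accuracy": 0.90, "attention_check_passed": False},
]
retained = [p["id"] for p in participants if include_participant(p)]
print(retained)  # → ['p01']
```

Because the criteria reference only the screening measures, a reader can verify that no exclusion decision was influenced by how a participant's primary data looked.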