Guidance for Mitigating Fraud and Safeguarding Data Integrity in Online Research
Online research carried out by fraudulent participants or bots undermines research validity and imposes substantial resource demands on researchers. The purpose of this guidance document is to outline practical strategies available to researchers to mitigate fraud and safeguard data integrity in online research involving human participants. This document is not meant to provide directives. Researchers are responsible for evaluating the risks and benefits of the various strategies when designing a solution for their unique research context. Researchers are also responsible for adhering to the relevant policies and guidelines for research involving humans (e.g., Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS2 2022)).
Background & Definitions
Online research has become ubiquitous and offers many benefits. Conducting research online helps to overcome logistical barriers associated with in-person research and facilitates recruitment, including reaching participants from underrepresented or hard-to-reach populations. As a result, online methods such as web-based surveys, interviews, and focus groups have enabled both larger-scale and more targeted research. However, with the increased prevalence of online research, instances of fraud have also risen.
In the context of online research, fraud is defined as deceptive practices that interfere with study recruitment or data collection, often for personal gain. Typically, fraud occurs to gain access to incentives (monetary or in kind) that are being offered to research participants. There is also a growing concern of potential interference by malicious actors in research that is political or sensitive in nature.
Various types of fraudulent actors exist, including: fraudulent participants, who are not eligible for the study and who knowingly falsify information or impersonate the target population of a study to participate; participants who intentionally or unintentionally submit multiple responses to online data collection forms; and, more recently, ‘bots’: computer software programmed to automatically complete online data collection forms on a large scale. Currently, generative artificial intelligence (AI) (e.g., ChatGPT or other chatbots) appears to be enabling fraud in online research by generating human-like responses that make bots more difficult to detect.
Online research fraud poses a substantial threat to data integrity and the validity of research findings and has many negative implications for researchers. In recent web-based surveys in the social sciences and agricultural economics fields, approximately 60%-96% of responses were deemed fraudulent (Bonett et al. 2024, Goodrich et al. 2023, Griffin et al. 2022). Failing to prevent or address fraud by removing these responses from study samples can result in inaccurate or invalid research findings and study conclusions. Furthermore, beyond the financial burden of providing compensation for potentially thousands of fraudulent submissions, significant resources including time and money may be required to review and clean the collected data. Therefore, researchers need robust and ethically sound strategies to mitigate fraud and safeguard data integrity in online research.
Definitions
Online research fraud: Deceptive practices that interfere with study recruitment or data collection, often for personal gain.
Fraudulent actors: Humans or computer programs (e.g., bots) that commit online research fraud by falsifying or duplicating data.
Bots: Short for ‘robots’, bots are automated computer programs designed and/or administered by humans to complete online tasks. In the research context, bots complete surveys and provide email addresses to receive study incentives. They can be programmed to do so on a large scale, thus receiving the incentive many times.
Incentive: Anything offered to participants, monetary or otherwise, to encourage participation in research (e.g., compensation, honorarium).
Strategies for Mitigating Fraud in Online Research
The following section presents strategies that researchers may consider implementing to mitigate fraud in their online studies. The strategies have been organized into three main research stages: study design (including incentive and recruitment strategies), data collection tool design (including technology- and content-based strategies), and data cleaning. This structure is meant to enhance clarity; however, there may be overlap between these stages. The order of the strategies within each section does not represent a hierarchy of preference or effectiveness. Furthermore, the strategies are not separated by study design (e.g., web-based surveys, mixed-methods, interviews, focus groups), as there may be considerable overlap depending on the research stage and specific study design features. Researchers need to evaluate the applicability and appropriateness of the strategies for their unique research context.
Over time, fraudulent actors are expected to become increasingly sophisticated as the online research landscape evolves. While this guidance is intended to cover a breadth of strategies currently available to researchers, it is not comprehensive and requires careful consideration prior to implementation. Some overarching guidance for selecting strategies includes:
- Implementing a single strategy is very likely insufficient for preventing or mitigating the negative effects of fraudulent actors. Researchers should determine which combination(s) of strategies are appropriate and feasible for their unique research context.
- Robust and detailed fraud-mitigating procedures should be developed a priori and should be documented in research protocols and consent forms, as appropriate. The procedures should also be tracked diligently during the study to enable the process to be replicable and reported in any publications or presentations.
- Fraud-mitigating strategies and procedures must comply with all relevant research ethics policies.
- Researchers should inform their research team about potential sources of fraud and the strategies used, to increase vigilance and timely responses to fraud.
- Researchers should test the fraud-mitigating strategies before collecting data.
Study Design
Incentive and recruitment strategies for mitigating fraud in the study design stage
Study design: Incentive strategies
Do not automate incentive payments
Advantages
- Delaying incentive payments provides researchers with the opportunity to clean the study data and determine which responses are legitimate prior to administering the compensation.
Disadvantages
- Delaying payments and/or administering payment manually may require additional time and resources compared to automated processes.
Additional considerations
- Automating payments will almost always result in uncontrolled payments to bots or fraudulent participants. It is therefore strongly recommended that incentive payments are not automated.
- The incentive process should be described and justified in research ethics board applications.
- Information about incentive administration should be provided during the informed consent process.
Create separate data collection tools for research data and incentives data
Advantages
- Unlinking the research data collection from the incentives data collection may reduce the effect of fraudulent responses on data integrity since bots or fraudulent participants may bypass the research study portion to target only the incentive.
Disadvantages
- The incentive may still be targeted as evidenced by some researchers reporting that they received a greater number of responses to their incentive questionnaire than their survey questionnaire.
Additional considerations
- Separating the research data collection from the incentives data collection is a strongly recommended practice for online research.
Limit how the incentive is advertised
Advantages
- Bots may be programmed to complete studies based on keywords such as incentive, reward, draw, money, etc. Therefore, removing incentive details from public-facing recruitment materials can help prevent studies from being targeted by bots or other fraudulent actors.
Disadvantages
- Not advertising the incentive does not completely prevent studies from being targeted by fraudulent actors.
Additional considerations
- Information about incentives should be provided during the informed consent process.
Consider whether to offer a draw-based incentive or guaranteed compensation
Advantages
- Using a draw or lottery-based incentive may deter bots or fraudulent participants since the likelihood of receiving compensation is decreased.
Disadvantages
- Using a draw or lottery-based incentive does not eliminate fraudulent responses completely.
Additional considerations
- The use of a draw instead of guaranteed compensation should be documented and justified in research ethics board applications.
- Participants should be informed about the likelihood of winning during the informed consent process.
Consider offering geo-locked gift cards as incentives
Advantages
- Offering gift cards that can only be used within a certain geographic region or country may deter bots or fraudulent participants from targeting the study.
Disadvantages
- Bots or fraudulent participants may still complete the study, especially if they can benefit from the geo-locked gift card.
- May not be appropriate or feasible for research involving international populations.
Additional considerations
- The choice of incentive should be geographically and culturally appropriate for the study population.
- Information about incentives should be provided during the informed consent process.
Study design: Recruitment Strategies
Use a two-stage eligibility screening and data collection process
Advantages
- Separating the participant eligibility screening step from the research data collection step can allow researchers to verify participants’ identities by email, phone, video call, etc. prior to collecting study data. This can help deter or catch fraudulent actors before they participate in the data collection step, thus limiting the impact on data integrity and minimizing incentive payments to bots or fraudulent participants.
- Researchers can separate the eligibility screening and data collection steps using various methods, including posting the eligibility screening questionnaire link (as opposed to the survey or study link) in the recruitment materials or by posting an email address that interested individuals are instructed to contact to initiate the screening.
Disadvantages
- This process may be resource-intensive for researchers (e.g., time, personnel).
- May increase participant burden.
- Verifying potential participants’ identities may be inappropriate for anonymous research or studies where investigators are blinded to participant groups.
Additional considerations
- Participants should be informed of the eligibility screening procedure during the informed consent process.
- The effectiveness of this strategy to mitigate fraud may be enhanced by including similar items in the separate eligibility and research data collection steps, which allows researchers to monitor responses for consistency (see data cleaning strategies below).
Recruit by invitation only
Advantages
- Recruiting by invitation only minimizes the threat of fraud since only specific individuals can participate in the study.
- Online survey platforms often have an invitation feature that allows researchers to send invitation emails and track participation.
Disadvantages
- Researchers may not have access to a list of potential participants.
- Verifying the identity and contact information for participants prior to collecting data creates an additional step that requires time and resources.
- Not appropriate for anonymous research.
Use caution when sharing recruitment links online, including on social media and public sites
Advantages
- Studies that use public recruitment links are a major target of fraud since they are easily found and accessed by fraudulent participants and bots. Recruiting from restricted sources can help prevent the link from being shared widely and thus becoming targeted by fraudulent actors.
Disadvantages
- Posting recruitment links on public sites including social media can help reach a larger audience, which can be useful for research that requires large sample sizes or potentially hard-to-reach populations. Not posting on public social media sites may therefore limit researchers’ reach.
Additional considerations
- Consider whether a compromise is available or appropriate (e.g., recruiting through private special interest or population groups on social media sites, or asking community organizations to post study links internally rather than on their social media pages).
Create unique URLs for each recruitment source or campaign
Advantages
- Creating unique URLs for each recruitment source or campaign (e.g., website, social media, newsletter) allows researchers to monitor whether certain URL(s) have been targeted by bots and disable these links and/or remove responses in the data cleaning phase.
Disadvantages
- May require additional time and resources to monitor survey submissions.
Additional considerations
- Some online survey platforms may not support this feature so researchers should consult with IT and survey platform support services to identify specific solutions (e.g., creating multiple versions of the same survey instead of unique URLs).
Consider recruiting from online crowdsourcing platforms (e.g., Amazon Mechanical Turk (MTurk), Prolific, Leger)
Advantages
- Crowdsourcing platforms allow researchers to recruit participants from a large pool of users to complete various tasks, including surveys or panels, in exchange for compensation.
- These platforms offer several beneficial built-in features, including preventing users from submitting multiple responses. Researchers can also recruit based on participant approval ratings, which can limit poor quality data.
Disadvantages
- Users might become inattentive while striving to complete tasks rapidly to maximize their compensation.
- May not be ideal for reaching certain target populations.
Additional considerations
- More information about the practical uses and ethical considerations for using crowdsourcing platforms for research can be found in the References and Resources section.
Consider using a participant pool management system (e.g. Sona Systems)
Advantages
- Participant pool management systems allow researchers to set up and self-administer lab and online studies, and recruit and manage their pool of participants.
- Sona Systems is a participant pool management system that allows researchers to recruit students from their institution and provide compensation or course credit for completing research studies. Limiting recruitment to your home institution limits potential fraud from external actors.
- These platforms can help with assessing eligibility before participants access the survey.
- Eligible participants could be recruited for other studies or by other researchers.
Disadvantages
- Setting up studies and recruitment in a participant pool management system requires time and resources.
- Recruiting from within your home institution limits the size and demographics of the potential participant pool and may not be appropriate for reaching certain target populations.
Additional considerations
- The same participant pool should not be used perpetually. Participants should be removed at a regular interval or by request.
Data Collection Tool Design
Technology- and content-based strategies for mitigating fraud in the data collection tool design stage
Data collection tool design: Technology-based strategies
Include a CAPTCHA verification question
Advantages
- CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart) is a commonly used test that helps distinguish between humans and bots, often by interpreting images or text or clicking certain buttons. Failing the CAPTCHA test blocks potential bots from proceeding to the data collection step.
- Several online survey platforms offer this feature.
Disadvantages
- Some bots are able to pass CAPTCHA tests.
- If CAPTCHA tests are too difficult, humans may fail them and be blocked from proceeding to the data collection step.
Additional considerations
- It is important for researchers to evaluate what user information is collected from participants during CAPTCHA tests (e.g., IP address, account information, browser history, and cookies, etc.), especially if they are administered through third-party platforms (e.g., Google reCAPTCHA). More information about CAPTCHA tests can be found in the References and Resources section.
Enable the bot detection feature in the online survey platform
Advantages
- Bot detection is a feature on some survey platforms that provides researchers with a score based on the probability that a respondent is a bot.
- Researchers can filter, analyze, and remove data from respondents that do not meet the recommended score threshold.
Disadvantages
- Can inaccurately classify respondents as humans or bots.
Additional considerations
- May use Google reCAPTCHA technology, therefore researchers need to evaluate what user information is collected from participants.
- Not available on all survey platforms.
Prevent multiple submissions using cookies
Advantages
- Some survey platforms offer a feature that prevents multiple submissions by placing a cookie in the browser that flags any subsequent responses as a duplicate.
Disadvantages
- Clearing cookies overrides this strategy and allows individuals to submit multiple responses.
Additional considerations
- Not available on all survey platforms.
Include hidden or honeypot question(s)
Advantages
- Honeypot questions are survey questions that are not visible to human participants but can be detected by bots. If these questions receive a response, then it can be concluded that a non-human respondent completed the questionnaire.
Disadvantages
- Hiding questions requires the use of coding on some online survey platforms.
- Bots often skip hidden or honeypot questions, just as humans do.
Additional considerations
- Using hidden or honeypot questions does not appear to be an effective strategy for catching bots.
Protect the survey using a password
Advantages
- Requiring participants to provide a password to complete the data collection step can deter fraudulent actors.
- Several online survey platforms allow researchers to create a password-protected survey.
Disadvantages
- If the password is accidentally or maliciously shared publicly or to unintended audiences, the online survey could be targeted by fraudulent actors.
- May increase participant burden.
Protect the study using online identity authentication
Advantages
- Requiring participants to verify their identity by authenticating through third-party platforms (e.g., Single Sign-On through Google or Facebook) can deter fraudulent actors.
- Some online survey platforms support Single Sign-On authentication.
Disadvantages
- May exclude legitimate participants who do not have an account with the online authenticator due to access, digital literacy, or other reasons.
- May increase participant burden.
Additional considerations
- Not available on all survey platforms.
Use a “Secondary Unique Field”
Advantages
- When online survey tools such as REDCap create an entry, they assign a unique record ID. The “Secondary Unique Field” helps researchers prevent participants from submitting more than one response by enforcing that another field (e.g., email address, phone number) must also be unique.
Disadvantages
- Bots and fraudulent participants may be able to circumvent this strategy.
Additional considerations
- Not available on all survey platforms.
Prevent search engine indexing of survey/page URL
Advantages
- Include “noindex, nofollow” tags within your webpage’s HTML <head> (e.g., <meta name="robots" content="noindex, nofollow">), or select the equivalent options in your survey/page development tool, to prevent search engines from crawling and ranking your survey. This makes the study link harder to find and exploit (e.g., by preventing it from appearing on sites that advertise surveys).
Disadvantages
- Not all crawlers respect “noindex, nofollow” tags.
- May make the study link harder to find by legitimate participants.
Randomize the order of multiple-choice responses
Advantages
- Randomizing the response options in multiple-choice questions can make it more difficult for fraudulent actors to submit a coherent response. For example, researchers can evaluate whether related questions are answered in consistent ways regardless of the order of the response options.
Disadvantages
- Cannot randomize the order of response options for all questions (e.g., questions that use a Likert scale or ‘all of the above’ option).
- Requires time and resources during the data cleaning process to determine whether responses are coherent.
Prevent multiple responses from the same IP address
Advantages
- An IP address is a label assigned to a device that is connected to the internet or local network.
- Programming the data collection tool to prevent multiple responses from the same IP address can help prevent multiple submissions from the same individual.
Disadvantages
- IP addresses can be changed using a virtual private network (VPN) or virtual private server (VPS), thereby bypassing this strategy.
- Not appropriate for research that seeks responses from individuals who may be using the same internet connection (e.g., multiple members of a single family or household, participating in a study on a device at a community-based organization).
Additional considerations
- Not available on all survey platforms.
Data collection tool design: Content-based strategies
Include multiple questions that relate to the eligibility criteria
Advantages
- Including questions related to the study eligibility criteria at one or multiple points during data collection can help researchers determine whether a respondent is eligible to participate. For example, if participants must live or work in a certain geographic area, asking for a postal code, province or city abbreviation, colloquial name for their neighborhood etc. can help identify fraudulent or ineligible participants.
Disadvantages
- May increase participant burden.
- Participants may become frustrated or confused if they are answering similar questions multiple times.
Additional considerations
- The effectiveness of this strategy may be enhanced by using a two-stage eligibility and data collection process (see recruitment strategies above) and including questions related to the eligibility criteria in both steps.
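As a small illustration of the geography-based eligibility check described above, the sketch below flags answers that do not match the Canadian postal code format (letter-digit-letter, digit-letter-digit). The function name and sample answers are illustrative assumptions; a format match alone does not establish eligibility, it only surfaces obviously inconsistent answers for review:

```python
import re

# Hypothetical helper: checks whether a free-text answer matches the
# Canadian postal code format (e.g., "M5V 2T6"). Format validity alone
# does not prove eligibility; failures are flagged for manual review.
POSTAL_CODE_RE = re.compile(r"^[A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d$")

def looks_like_canadian_postal_code(answer: str) -> bool:
    return bool(POSTAL_CODE_RE.match(answer.strip()))

flags = [a for a in ["M5V 2T6", "12345", "k1a0b1"]
         if not looks_like_canadian_postal_code(a)]
print(flags)  # ['12345'] — fails the format check, flag for review
```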
Include question(s) that are highly specific to the target population or that test institutional knowledge
Advantages
- Including questions that test institutional knowledge of specific populations can help to identify fraudulent or ineligible participants. For example, if researchers are surveying pharmacists, including question(s) related to drug mechanism of action or interactions can help distinguish between valid and fraudulent participants.
Disadvantages
- Depending on the questions, bots or fraudulent participants may be able to ‘pass’ these questions purposefully or by luck.
- Determining what specific question(s) to ask may be challenging depending on how narrow or broad the target population is.
Additional considerations
- This strategy may be more effective for highly specific populations compared to more general populations.
- Consider formatting one or more of these questions as an open-text question, so that bots or fraudulent participants cannot randomly select a correct answer among multiple choices.
Repeat questions or include similar questions to enhance accuracy of fraud detection
Advantages
- Including similar questions that are worded differently allows researchers to evaluate the consistency between answers to identify potential fraud (e.g., asking for the participant’s age and the year they were born).
Disadvantages
- May increase participant burden.
- Participants may become frustrated or confused if they are answering similar questions multiple times.
- Legitimate participants may rethink their responses to certain questions and want to revise them.
Additional considerations
- The effectiveness of this strategy may be increased if the eligibility and data collection steps are separate since legitimate participants should have consistent responses, while bots or fraudulent participants may answer inconsistently.
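The consistency check described above can be sketched in a few lines. The example below compares a reported age against a reported birth year; the one-year tolerance (to account for whether the birthday has passed in the survey year) and the sample values are illustrative assumptions, not part of this guidance:

```python
# Illustrative sketch (assumed data): flag submissions whose reported
# age is inconsistent with the reported birth year. A one-year
# tolerance accounts for whether the birthday has passed this year.
def age_consistent(reported_age: int, birth_year: int,
                   survey_year: int) -> bool:
    expected = survey_year - birth_year
    return abs(expected - reported_age) <= 1

print(age_consistent(34, 1990, 2024))  # True: consistent
print(age_consistent(50, 1990, 2024))  # False: flag for review
```

Inconsistent pairs should prompt review rather than automatic exclusion, since typos by legitimate participants are possible.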
Include open-ended question(s)
Advantages
- Open-ended questions (as opposed to multiple choice questions) require participants to provide a text response that can be assessed for suspicious or fraudulent behaviour.
- Bots or fraudulent participants may enter nonsensical answers (e.g., gibberish, illogical responses) that alert researchers to possible fraud.
- More commonly, researchers can identify fraudulent responses because of word-for-word repeated answers across multiple submissions.
Disadvantages
- Researchers are increasingly noticing that bots are using AI to generate reasonable and logical answers to open-ended questions, making fraud detection more challenging. Researchers may benefit from using strategies to evaluate/detect AI-generated content (e.g., GPTZero, calculating ROUGE or BLEU scores), however these processes may require time, resources, and/or coding to conduct or automate.
- Legitimate participants may enter nonsensical answers or non-answers (e.g., “xxx”, “N/A”) if they think they need to enter a response to move forward.
- Analyzing textual data can require additional time compared to responses to multiple-choice questions.
Additional considerations
- It is recommended to include one or more open-ended questions.
- Open-ended questions can relate to the study purpose or eligibility, or can be catch-all questions (e.g., soliciting additional comments at the end of the survey).
- Researchers should determine whether it is appropriate and ethical to make at least one of the open-ended questions required so that all respondents must provide a response to facilitate fraud detection. If individuals are forced to respond to certain question(s) to submit their responses, this should be justified and documented in research ethics board applications.
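One way to operationalize the detection of word-for-word repeated answers mentioned above is to count exact duplicates after normalizing case and whitespace. The field name `comments` and the sample responses below are hypothetical:

```python
from collections import Counter

# Sketch (assumed data): count exact duplicates of open-ended answers
# across submissions; identical word-for-word responses across many
# records often indicate bot or scripted activity.
def duplicate_open_text(responses: list[dict], field: str = "comments") -> dict:
    normalized = [r[field].strip().lower() for r in responses if r.get(field)]
    counts = Counter(normalized)
    return {text: n for text, n in counts.items() if n > 1}

responses = [
    {"comments": "Great survey, very relevant to my work."},
    {"comments": "great survey, very relevant to my work."},
    {"comments": "I found question 4 hard to answer."},
]
print(duplicate_open_text(responses))
# the first two normalize to the same string, so that text maps to 2
```

Near-duplicate answers (small wording changes) would need fuzzier matching than this exact-match sketch provides.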
Include attention check question(s) or false question(s)
Advantages
- Attention check questions prompt respondents to perform a specific action, such as leaving a question unanswered or selecting a particular answer option. False questions have no true answer, thus requiring participants to select ‘none of the above’. These questions can help identify inattentive survey-takers or bots that answer questions at random.
Disadvantages
- Failing attention check or false question(s) does not necessarily indicate fraud or malicious intent.
- Attention check and false questions can be confusing to some human participants and/or may result in them questioning the legitimacy of the survey or study.
Data Cleaning Strategies
Strategies for mitigating fraud in the data cleaning stage
Data Cleaning Strategies
Verify the duplication or legitimacy of participant email addresses
Advantages
- Verifying the submitted email addresses can help mitigate fraud by ensuring that there are no duplicates, indicating that an individual has completed the survey or data collection step multiple times.
- Researchers can determine if participants used an institutional email address (e.g., from a university, organization, government body) to help assess the legitimacy of submissions.
- Researchers can verify whether email address formats are consistent with bulk email creation (e.g., a random string of numbers or letters, or namename##) to help determine which submissions are likely fraudulent.
Disadvantages
- Not appropriate for anonymous research.
- Legitimate participants may rethink their responses and want to revise them or accidentally submit multiple responses from the same email address.
Additional considerations
- Consider strategies for minimizing the possibility of accidental multiple submissions and develop a pre-determined protocol for managing multiple submissions from the same individual.
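The email checks above can be partially automated. The sketch below flags duplicate addresses and addresses whose local part ends in a long digit run, a pattern sometimes seen with bulk-created accounts (the "namename##" example above). The regular expression and sample addresses are assumptions; flagged records should go to manual review, not automatic exclusion:

```python
import re
from collections import Counter

# Sketch (assumed data): flag duplicate email addresses and addresses
# whose local part is letters followed by 3+ digits (a pattern sometimes
# seen with bulk-created accounts). Neither check proves fraud alone.
BULK_PATTERN = re.compile(r"^[a-z]+\d{3,}$", re.IGNORECASE)

def flag_emails(emails: list[str]) -> dict:
    normalized = [e.strip().lower() for e in emails]
    counts = Counter(normalized)
    duplicates = sorted(e for e, n in counts.items() if n > 1)
    bulk_like = sorted({e for e in normalized
                        if BULK_PATTERN.match(e.split("@")[0])})
    return {"duplicates": duplicates, "bulk_like": bulk_like}

emails = ["pat@uni.ca", "PAT@uni.ca", "janedoe4821@mail.com"]
print(flag_emails(emails))
# duplicates: ['pat@uni.ca']; bulk_like: ['janedoe4821@mail.com']
```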
Verify IP addresses or browser/OS combinations
Advantages
- An IP address is a label assigned to a device that is connected to the internet or local network. Similar to duplicated email addresses, multiple submissions from the same IP address can indicate that an individual has submitted multiple responses.
- IP addresses can be linked to a respondent’s geolocation (i.e., latitude and longitude). Some survey software may be able to redirect/filter out respondents from outside of an eligible geographic location (e.g., Canada) to prevent them from accessing the survey in real-time.
- Researchers can also verify geolocation duplication during the data cleaning phase.
Disadvantages
- Geolocation based on IP address can be imprecise.
- IP addresses can be changed using a virtual private network (VPN) or virtual private server (VPS), thereby bypassing these strategies.
- Verifying IP addresses may not be appropriate for anonymous research as IP addresses can be considered identifiable information.
- Excluding responses from the same IP address is not appropriate for research that seeks responses from individuals who may be using the same internet connection (e.g., multiple members of a single family or household, participating in a study on a device at a community-based organization).
- Obsolete browser/OS combinations can, but do not necessarily, indicate bots.
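A minimal duplicate-IP check, assuming submissions are available as records with an `ip` field (a hypothetical layout), could look like this. Given the caveats above (shared connections, VPNs), a duplicate IP is a prompt for review, not proof of fraud:

```python
from collections import Counter

# Sketch (assumed data layout): list IP addresses that appear on more
# than one submission, with their counts. Flag for review, not removal.
def duplicate_ips(submissions: list[dict]) -> dict:
    counts = Counter(s["ip"] for s in submissions if s.get("ip"))
    return {ip: n for ip, n in counts.items() if n > 1}

submissions = [{"ip": "203.0.113.7"}, {"ip": "203.0.113.7"},
               {"ip": "198.51.100.2"}]
print(duplicate_ips(submissions))  # {'203.0.113.7': 2}
```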
Review for survey response patterns
Advantages
- Survey responses that demonstrate “straightlining” (e.g., choosing [a] for every question, or cycling through [a], [b], [c] in sequential order) can be used to trigger additional verification.
Disadvantages
- This strategy may take additional time and resources during the data cleaning process, especially for studies with large volumes of responses.
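Both straightlining patterns above (one constant answer, or a short repeating cycle) can be detected mechanically. The sketch below encodes multiple-choice answers as option letters; the encoding and examples are assumptions for illustration:

```python
# Sketch: two simple straightlining signals over multiple-choice
# answers encoded as option letters (an assumed data layout).
def is_straightlined(answers: list[str]) -> bool:
    if len(set(answers)) == 1:  # same option chosen every time
        return True
    # repeating fixed-length cycle, e.g. a, b, c, a, b, c
    for cycle in range(2, len(answers) // 2 + 1):
        pattern = answers[:cycle]
        if all(answers[i] == pattern[i % cycle] for i in range(len(answers))):
            return True
    return False

print(is_straightlined(["a", "a", "a", "a"]))            # True
print(is_straightlined(["a", "b", "c", "a", "b", "c"]))  # True
print(is_straightlined(["b", "d", "a", "c", "b", "a"]))  # False
```

A flagged response should trigger the additional verification described above rather than automatic exclusion, since some legitimate respondents genuinely hold uniform views across items.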
Monitor speed of survey completion
Advantages
- Survey platforms can provide timestamps that allow researchers to calculate how quickly participants complete the survey.
- The speed of survey completion can be used to exclude responses from respondents that complete it either too quickly or too slowly.
- Collecting timestamps at the beginning and end of each question or section will also allow researchers to calculate intra-survey completion times and flag suspicious entries.
Disadvantages
- Some legitimate participants may leave the data collection tool open for a long period of time before submitting it, leading to a longer than expected completion time.
Additional considerations
- It is recommended that researchers test the survey to determine the average completion time prior to setting the lower and upper limits (i.e., the shortest and longest amount of time that a valid participant could reasonably take to complete the survey, respectively).
- Researchers need to consider how conservative or liberal to be with their criteria for excluding responses based on survey completion time.
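Once lower and upper limits have been set from pilot testing, the speed check can be sketched as follows. The timestamp format, thresholds (120 seconds and 1 hour), and sample records are assumptions to tune per study:

```python
from datetime import datetime

# Sketch (assumed timestamp format and thresholds): flag submissions
# completed faster than a lower bound or slower than an upper bound,
# both determined by piloting the survey as recommended above.
FMT = "%Y-%m-%d %H:%M:%S"

def completion_seconds(start: str, end: str) -> float:
    return (datetime.strptime(end, FMT)
            - datetime.strptime(start, FMT)).total_seconds()

def flag_by_speed(records, lower=120, upper=3600):
    flagged = []
    for r in records:
        secs = completion_seconds(r["start"], r["end"])
        if secs < lower or secs > upper:
            flagged.append((r["id"], secs))
    return flagged

records = [
    {"id": 1, "start": "2024-05-01 10:00:00", "end": "2024-05-01 10:00:45"},
    {"id": 2, "start": "2024-05-01 10:05:00", "end": "2024-05-01 10:15:00"},
]
print(flag_by_speed(records))  # [(1, 45.0)] — 45 s is below the 120 s floor
```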
Monitor survey start and stop time
Advantages
- Survey platforms can track the start and stop time of surveys.
- Bots often complete a large number of submissions quickly and researchers can monitor whether multiple surveys were started or ended within a short period of time (e.g., within 1-2 minutes of each other), or if there are waves of survey responses (e.g., 50 responses within 15 minutes when typically, responses are submitted once every 15-30 minutes).
- Researchers can also look at the time of day when submissions are made. If participants are expected to be from Ontario and batches of responses are submitted between 2 and 3 AM EST, there is a higher likelihood of fraud.
Disadvantages
- There may be an influx of submissions after the launch of a recruitment campaign, which could lead to clusters of similar start and stop times among valid submissions.
- Surveys completed at odd or unexpected times do not necessarily indicate fraud.
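The wave detection described above can be sketched as a sliding-window count over submission timestamps. The window size and per-window threshold below are illustrative assumptions only; they should be tuned against the study's expected response rate.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp format; match it to your platform's export.
FMT = "%Y-%m-%d %H:%M:%S"

def flag_bursts(submit_times, window_minutes=15, max_per_window=10):
    """Return submission times that fall inside an unusually dense window.

    Flags every submission belonging to any window of `window_minutes`
    that contains more than `max_per_window` submissions.
    """
    times = sorted(datetime.strptime(t, FMT) for t in submit_times)
    window = timedelta(minutes=window_minutes)
    flagged = set()
    start = 0
    for end in range(len(times)):
        # Shrink the window until it spans at most `window_minutes`.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > max_per_window:
            flagged.update(times[start:end + 1])
    return sorted(flagged)
```

Flagged clusters still require manual review, since a legitimate recruitment push can also produce a burst of near-simultaneous submissions.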
Monitor the timing of survey responses in relation to recruitment campaigns
Advantages
- Survey submissions are typically linked to the launch of recruitment campaigns (e.g., a social media post, newsletter circulation), where the number of submissions has a predictable rise and subsequent fall over time. Monitoring whether large batches of surveys are submitted at unexpected times after the launch of a recruitment campaign could indicate that a bot gained access to the survey link.
Disadvantages
- Surveys completed at odd or unexpected times do not necessarily indicate fraud.
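One simple way to operationalize this check is to flag submissions that arrive unusually long after the most recent recruitment event. The lag threshold below is a hypothetical assumption; a study's real response curve should inform it.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp format; match it to your platform's export.
FMT = "%Y-%m-%d %H:%M:%S"

def flag_off_campaign(submit_times, campaign_launches, max_lag_days=7):
    """Return submissions arriving more than `max_lag_days` after every launch.

    Assumes responses normally arrive within `max_lag_days` of a
    recruitment event (e.g., a social media post); the threshold is
    illustrative, not a recommendation.
    """
    launches = [datetime.strptime(t, FMT) for t in campaign_launches]
    flagged = []
    for t in submit_times:
        ts = datetime.strptime(t, FMT)
        lags = [ts - launch for launch in launches if launch <= ts]
        if not lags or min(lags) > timedelta(days=max_lag_days):
            flagged.append(t)
    return flagged
```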
Conduct a verification step and monitor for inconsistencies in responses
Advantages
- Researchers may conduct a verification step where participants are contacted after completing the study and asked to confirm their responses to certain questions (e.g., sociodemographic information) to verify their identity.
Disadvantages
- Not appropriate for anonymous research.
- This strategy can be time and resource intensive.
Additional considerations
- Researchers should select questions whose answers are expected to remain constant (e.g., date of birth, other study-specific questions) as opposed to changeable (e.g., weight if pertinent to the study, home address if verification step occurs a significant time after data collection).
- This strategy could be used for all respondents or for submissions that are flagged as suspicious due to specific criteria. However, participants should be informed of how and when they will be contacted during the study and what will be asked of them during the informed consent process.
Frequently Asked Questions
This FAQ section is designed to address common questions and provide guidance on best practices and ethical considerations for mitigating fraud in online research studies.
1. How many strategies should I implement in each of my studies?
The number of fraud-mitigating strategies that should be used will be unique to each research context. General guidance includes:
- Determine whether there are any criteria that can be used as definitive indicators that a submission is fraudulent (e.g., failing reCAPTCHA test, compromised survey URL, completed survey in <1 minute, responses that indicate that the participant is ineligible).
- Determine whether there are any criteria that indicate a suspicion of fraud (e.g., completes a 30-minute survey in <10 minutes, failed attention check question, start/stop time is within 1-2 minutes of another submission). The more of these criteria a submission meets, the more likely it is to be fraudulent. Determine how many ‘suspicious’ criteria must be met to label the submission as fraudulent.
- Select and define the criteria prior to the start of the survey and monitor them as data collection progresses to ensure that they are effectively detecting fraud.
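The decision rule above can be sketched as a simple scoring function: definitive criteria reject a submission outright, while suspicious criteria are counted against a pre-defined threshold. The flag names and the threshold below are hypothetical placeholders for a study's own pre-registered criteria.

```python
# Hypothetical boolean flags attached to each response during data cleaning.
DEFINITIVE = ["failed_recaptcha", "compromised_url", "under_one_minute", "ineligible"]
SUSPICIOUS = ["too_fast", "failed_attention_check", "clustered_start_time"]

# Number of suspicious criteria that together label a submission fraudulent;
# define this before data collection begins.
SUSPICION_THRESHOLD = 2

def classify(response):
    """Label a response 'fraudulent', 'suspicious', or 'retain'."""
    # Any definitive criterion is sufficient on its own.
    if any(response.get(flag) for flag in DEFINITIVE):
        return "fraudulent"
    score = sum(bool(response.get(flag)) for flag in SUSPICIOUS)
    if score >= SUSPICION_THRESHOLD:
        return "fraudulent"
    if score > 0:
        return "suspicious"
    return "retain"
```

Defining the rule in one place like this also makes it easy to monitor during data collection and to report exactly which criteria were applied.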
2. Can I just use the built-in antifraud technology in platforms such as REDCap or Qualtrics?
- Recent research has consistently shown that built-in antifraud technology alone (e.g., reCAPTCHA, bot detection, prevention of multiple submissions) is insufficient to prevent fraud in online research. In some cases, these features also appear less effective at distinguishing between legitimate and fraudulent responses than robust researcher-developed strategies. This technology should therefore be combined with additional strategies.
3. Do I need to provide compensation even though fraudulent actors seem to target studies offering incentives?
- The University of Toronto research ethics guidelines state that: “Researchers should normally provide compensation to participants for their time. Where compensation will not be provided, researchers may explain why in the protocol – resources lacking, inappropriateness for the area of research or culture of research participants, etc.” Researchers therefore need to evaluate their specific research context to determine the appropriate course of action.
4. What other ethical principles guide the use of study incentives?
The use of study incentives is subject to institutional and national research ethics policies. Some key ethical principles that impact how researchers should manage incentives in online research include:
- Researchers should normally provide compensation to participants for their time. Where compensation will not be provided, researchers must justify why in research ethics board applications.
- Incentives should not be so large or attractive as to undermine the voluntary nature of participation and they should be appropriate for the study population (e.g., vulnerable populations, Indigenous populations, professionals).
- Incentives may be structured incrementally (i.e., prorated based on the level of study completion) or as a lump-sum (i.e., independent from the level of study completion).
- Incentives may be administered to each participant or through a draw or lottery-based system.
- Participants are free to withdraw from a study without consequences. Therefore, participants should not have any prior incremental payment withdrawn or withheld, and they should be paid the lump-sum amount regardless of continuation in the study.
- Participants should be informed of the incentive during the informed consent process (e.g., type, dollar amount, how incentives will be awarded, expectations for participation including participating in good faith and participating only once).
5. What should I do if my survey is anonymous?
- If you are conducting anonymous research, the strategies you select must comply with the ethical requirements to maintain participant privacy and confidentiality and to avoid collecting any identifying information that can be linked to the survey responses. Strategies such as collecting IP addresses or verifying individuals’ identities to use their data would therefore be inappropriate in these cases. By separating the survey questionnaire from the incentive questionnaire, researchers can apply appropriate fraud-mitigation strategies to both questionnaires while preserving the anonymity of the survey.
References & Resources
References and additional reading
Bonett, S., Lin, W., Sexton Topper, P., Wolfe, J., Golinkoff, J., Deshpande, A., … & Bauermeister, J. (2024). Assessing and Improving Data Integrity in Web-Based Surveys: Comparison of Fraud Detection Systems in a COVID-19 Study. JMIR Formative Research, 8, e47091.
Goodrich, B., Fenton, M., Penn, J., Bovay, J., & Mountain, T. (2023). Battling bots: Experiences and strategies to mitigate fraudulent responses in online surveys. Applied Economic Perspectives and Policy, 45(2), 762-784.
Griffin, M., Martino, R. J., LoSchiavo, C., Comer-Carruthers, C., Krause, K. D., Stults, C. B., & Halkitis, P. N. (2021). Ensuring survey research data integrity in the era of internet bots. Quality & Quantity, 1-12.
King-Nyberg, B., Thomson, E. F., Morris-Reade, J., Borgen, R., & Taylor, C. (2023). The Bot Toolbox: An Accidental Case Study on How to Eliminate Bots from Your Online Survey. Journal for Social Thought, 7(1).
Lawlor, J., Thomas, C., Guhin, A. T., Kenyon, K., Lerner, M. D., Ucas Consortium, & Drahota, A. (2021). Suspicious and fraudulent online survey participation: Introducing the REAL framework. Methodological Innovations, 14(3), 20597991211050467.
Sterzing, P. R., Gartner, R. E., & McGeough, B. L. (2018). Conducting anonymous, incentivized, online surveys with sexual and gender minority adolescents: Lessons learned from a national polyvictimization study. Journal of Interpersonal Violence, 33(5), 740-761.
Yarrish, C., Groshon, L., Mitchell, J., Appelbaum, A., Klock, S., Winternitz, T., & Friedman-Wheeler, D. G. (2019). Finding the signal in the noise: Minimizing responses from bots and inattentive humans in online research. The Behavior Therapist, 42(7), 235-242.
Research Policies and Guidance
Guidance from peer institutions
Survey platform resources
University of Toronto Contacts
For questions related to research ethics contact the Human Research Ethics Unit (HREU)
For questions related to information security contact the Research Information Security Program (RISP)