The Use of Mechanical Turk for Experimental Research Purposes
Winter 2019
Sami West
Introduction
With an ever-increasing need for experimental research, online crowdsourcing and crowdwork platforms have emerged as powerful tools for companies and researchers to gather data from large audiences. Online crowdsourcing allows researchers to engage remotely with individuals from their populations of interest in order to solve problems, answer questions, and generate statistical data.
Of the platforms available to researchers, Amazon’s Mechanical Turk has become increasingly popular for scientific research (Jacquet, 2011). By leveraging Mechanical Turk for data collection, researchers can create “Human Intelligence Tasks,” and Mechanical Turk workers around the globe complete these tasks for some form of compensation (Volz et al., 2017). While this may appear to the average eye as a standard business transaction, utilizing such a broad network of crowdworkers for experimental research comes with its fair share of costs and benefits. We’ll reflect on Amazon’s Mechanical Turk crowdworking service to identify what researchers should consider positively, negatively, and ethically about the service before jumping in to collect data.
What exactly is Mechanical Turk?
Amazon’s website markets Mechanical Turk as a “global, on-demand, 24-7 workforce” that provides its customers with access to “human intelligence” (MTurk, 2018). While artificial intelligence exists and is evolving every day, there are still gaps in services that only real humans can fill. That is where Mechanical Turk, or MTurk, steps in.
Since its creation in 2005, MTurk has seen companies and individuals use its service for image and video processing, data verification and cleanup, data processing, and information processing (MTurk, 2018), the last of which is the area we’re interested in for experimental research purposes. Information and data processing tasks on MTurk allow researchers to propose scenarios and ask workers open-ended questions or Likert and rating-scale questions, or to gather other identifiable data with quick feedback through surveys.
What form of benefits does Mechanical Turk provide for researchers?
By utilizing MTurk for information gathering, researchers can collect large quantities of data from their population of interest in a short amount of time, for a fraction of the cost of other data collection methodologies (Volz et al., 2017). The researcher (or “Requester” in MTurk lingo) can create their task on MTurk, or use MTurk to redirect crowdworkers to an external site (Jacquet, 2011). From there, they can choose which demographic profiles they want to complete their task, which helps ensure that they are gathering meaningful data to answer their research questions. Finally, MTurk allows the researcher to choose their “Worker Reward,” or pay rate, for completing the task, usually in the form of a cent amount (MTurk, 2018). The customizability of the platform allows researchers to collect their data at a lower overhead cost, making it an affordable option.
In addition to letting requesters customize their data collection experience and the pay rate for workers, MTurk also promises its users a quick turnaround time for receiving data. In an experiment conducted by members of the User Experience Professionals Association (UXPA), Jake Volz and Charles Mitchell found that whether they needed tens of responses or hundreds, they were able to finish data collection in only one to two days (Volz et al., 2017). Thousands of MTurk workers could be reached for their study, which enabled them to screen their target number of users and reportedly achieve statistical power for their experimental research (Volz et al., 2017).
What are the drawbacks to using Mechanical Turk for research?
Although MTurk provides many benefits to its task requesters, those benefits also come at a few costs: monetary, ethical, and quality-related.
Monetary Costs
As mentioned previously, in order to request workers to complete your task, you must select which demographic profile you’re interested in and set your pay rate for completion. MTurk then charges a 20% fee on the pay amount you select, plus an additional 20% on the pay amount if you need 10 or more assignments completed by the MTurk workforce (MTurk, 2018).
On top of those fees, if you do not want to open your task to the entire MTurk workforce, you must pay additional fees to access the population of Mechanical Turk Masters (workers with high-quality completion rates) or to reach workers from premium demographic backgrounds. These premium demographics can include specific job titles, age ranges, educational qualifications, and more. So, although MTurk has a diverse network of users across the globe (MTurk, 2018), if you’re interested in reaching more specific populations, then you’re going to need to pay the price.
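To see how these fees add up, the following sketch estimates a batch’s total cost using the fee structure described above (a 20% base fee plus an additional 20% when 10 or more assignments are requested). The function name and the decision to omit Masters and premium qualification fees are our own simplifications, not part of MTurk’s tooling.

```python
def estimate_hit_cost(reward_per_assignment, num_assignments,
                      base_fee_rate=0.20, bulk_fee_rate=0.20,
                      bulk_threshold=10):
    """Estimate the total cost of a batch of assignments.

    Applies a 20% base fee on the worker reward, plus an extra 20%
    when 10 or more assignments are requested. Masters and premium
    demographic fees are intentionally left out of this sketch.
    """
    fee_rate = base_fee_rate
    if num_assignments >= bulk_threshold:
        fee_rate += bulk_fee_rate
    per_assignment = reward_per_assignment * (1 + fee_rate)
    return round(per_assignment * num_assignments, 2)

# 200 assignments at a $0.50 reward each: $0.50 * 1.40 * 200
print(estimate_hit_cost(0.50, 200))  # 140.0
```

Even at a modest $0.50 reward, fees turn a $100 worker payout into a $140 bill, which is worth budgeting for before posting a task.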
Ethical Costs
Because the pay rates selected for tasks are typically a matter of cents, the additional costs presented to users of MTurk do not take away from the affordability of the service. What does present itself, though, is a question about the ethics of gathering data at such low pay rates, or micro-payments as some have named them (Kittur et al., 2008). With the average MTurk pay rate at $1.40 per hour in 2011 (Jacquet, 2011), the morality of obtaining data in this fashion comes into question. MTurk is a voluntary service and does not force workers to sign up, but does that mean we can get away with paying such a small amount to the populations that are helping us break ground on our research? This is an important question researchers need to consider before selecting the pay rate for their tasks.
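One way to sanity-check a proposed pay rate against that $1.40-per-hour figure is to convert the per-task reward and an estimated completion time into an effective hourly wage. The sketch below does just that; the example reward of $0.07 for a three-minute task is purely illustrative, chosen because it works out to the average Jacquet (2011) reports.

```python
def effective_hourly_wage(reward_dollars, minutes_per_task):
    """Convert a per-task reward into an effective hourly wage,
    assuming a worker completes tasks back to back."""
    tasks_per_hour = 60 / minutes_per_task
    return round(reward_dollars * tasks_per_hour, 2)

# A hypothetical $0.07 task taking three minutes: $1.40/hour
print(effective_hourly_wage(0.07, 3))  # 1.4
```

Running this calculation with a realistic pilot-tested completion time, rather than an optimistic one, is a simple way to keep your reward choice honest.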
Quality Costs
These concerns about micro-payments relate to another problem within MTurk: the quality of the data being produced. As noted above, a researcher can set their own pay rate for the tasks they create, so it would be reasonable to assume that higher-quality work generally follows from a higher payout. When a worker is presented with a task for a small cent amount, many have been shown to complete the task quickly in order to move on to the next task and maximize their profits (Kittur et al., 2008).
In 2008, Kittur et al. conducted an experiment on MTurk to compare the quality of responses from crowdworkers to that of alternatively recruited Wikipedia admins. Although the tasks they posted to MTurk were rapidly completed, 58.6% of the completed tasks were found to be potentially invalid based on the length of time spent on the task or the feedback provided within the responses (Kittur et al., 2008).
There is no way to guarantee that the responses you receive will be of the quality you expect, but Kittur et al. found that you, as the researcher, can incorporate validity checks within your tasks to point out bad responses easily. In a second experiment, their team added four additional questions with verifiable quantitative answers to their tasks (Kittur et al., 2008). The new questions prompted workers to input the quantities of specific content types presented during the task. This round still returned some meaningless responses and some task completion times that were too quick, but their number was significantly reduced in comparison to the first round. The experiment showed that researchers don’t have full control over the quality of the data they receive, but they can improve the percentage of acceptable responses by spending the time to add validity checks.
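In the spirit of those validity checks, a post-collection filter might flag responses that were completed implausibly fast or that miss a “gold” question whose answer the requester already knows. The sketch below is our own minimal illustration, not Kittur et al.’s code; the field names, the 30-second threshold, and the `num_images` check question are all hypothetical.

```python
def flag_suspect_responses(responses, min_seconds=30, check_answers=None):
    """Return worker IDs whose responses fail basic validity checks.

    A response is flagged if it was completed faster than min_seconds
    or if it misses any verifiable check question. All thresholds and
    field names here are illustrative.
    """
    check_answers = check_answers or {}
    suspect = []
    for r in responses:
        too_fast = r["seconds"] < min_seconds
        failed_check = any(r["answers"].get(q) != a
                           for q, a in check_answers.items())
        if too_fast or failed_check:
            suspect.append(r["worker_id"])
    return suspect

responses = [
    {"worker_id": "W1", "seconds": 12, "answers": {"num_images": 4}},
    {"worker_id": "W2", "seconds": 95, "answers": {"num_images": 4}},
    {"worker_id": "W3", "seconds": 80, "answers": {"num_images": 9}},
]
print(flag_suspect_responses(responses, check_answers={"num_images": 4}))
# ['W1', 'W3']
```

Flagged responses can then be reviewed by hand rather than rejected automatically, which keeps borderline but honest workers from being penalized.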
Conclusion
Recruiting participants and collecting data can be a costly and lengthy process for researchers conducting experimental research, and online crowdsourcing platforms present a potential solution for improving this cycle. Mechanical Turk, specifically, offers resources for researchers to gather data across the globe in a matter of days, at a fraction of the cost of recruiting and running participants through an in-person experiment. Though that ease is very attractive, it is important to consider what you’re paying to access the populations you need, the ethics of recruiting crowdworkers at micro-payments, and the quality concerns that present themselves when workers perform tasks for little payoff. MTurk can still be a handy tool for researchers, but they should use it only after becoming well informed about what they’re getting into and how they plan to address the ethical and monetary concerns that could arise.
Works Cited
Jacquet, J. (2011, July 07). The pros & cons of Amazon Mechanical Turk for scientific surveys. Retrieved from https://blogs.scientificamerican.com/guilty-planet/httpblogsscientificamericancomguilty-planet20110707the-pros-cons-of-amazon-mechanical-turk-for-scientific-surveys/
Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. Proceedings of the Twenty-Sixth Annual CHI Conference on Human Factors in Computing Systems (CHI ’08). doi:10.1145/1357054.1357127
MTurk: Human intelligence through an API. Access a global, on-demand, 24x7 workforce. (n.d.). Retrieved from https://www.mturk.com/
Peer, E., Vosgerau, J., & Acquisti, A. (2013). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46(4), 1023-1031. doi:10.3758/s13428-013-0434-y
Volz, J., & Mitchell, C. (n.d.). Mechanical Turk: Quickly scale research without breaking the bank. User Experience Magazine. Retrieved from http://uxpamagazine.org/mechanical-turk/