Crowdsourcing for Machine Learning

Crowdsourcing labor markets are booming because they enable rapid construction of complex workflows that seamlessly mix human computation with computer automation. Example applications range from photo tagging to audio-visual transcription and interlingual translation.

Crowdsourcing makes it easy to collect high-quality training data tailored to the model you want to build. Supervised machine learning models need training data to work well, and crowdsourcing is a practical way to get it. For example, a sentiment algorithm needs to be trained on positive and negative examples to learn which words and sentences suggest positive or negative sentiment about a topic.
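To make the sentiment example concrete, here is a minimal sketch of learning word-level sentiment from labeled examples. The sentences, the labels, and the simple word-count scoring rule are all assumptions made up for illustration. They stand in for crowd-labeled data and for whatever model would actually be trained on it.

```python
from collections import Counter

# Hypothetical crowd-labeled training examples (invented for illustration).
TRAINING_DATA = [
    ("i love this phone", "positive"),
    ("great battery and great screen", "positive"),
    ("i hate the slow software", "negative"),
    ("terrible battery and awful support", "negative"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def predict(counts, text):
    """Sum per-word (positive count - negative count) and take the sign."""
    score = sum(
        counts["positive"][word] - counts["negative"][word]
        for word in text.split()
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"  # no evidence either way, or the evidence cancels out


model = train(TRAINING_DATA)
print(predict(model, "great screen"))
print(predict(model, "awful software"))
```

Words seen equally often under both labels, such as "battery" in this toy data, carry no signal. This is one reason a real model needs many more labeled examples than a handful.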

Today crowdsourcing is used for a wide variety of tasks. Many tasks, often language-related, require human cognition to carry out association and/or verification: assigning a label to a text object, translation, speech transcription, photo tagging, etc. Associative tasks are typically performed on raw data that is available in large volumes without the association. When the data to be processed by the crowd represents the minority class(es) of a highly skewed distribution, new issues emerge: submitting large volumes of unlabeled data to the crowd is efficient neither operationally nor financially. Stefano Vegnaduzzo, Vice President of Data Science at Integral Ad Science.
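The cost problem with skewed data can be illustrated with a back-of-the-envelope calculation. The sketch below assumes items are sent to the crowd by uniform random sampling, so finding a target number of minority-class items takes on average target / prevalence labeled items. The function name, prices, prevalence values, and redundancy factor are hypothetical and not taken from any real marketplace.

```python
def expected_labeling_cost(target_positives, prevalence, price_per_label,
                           labels_per_item=1):
    """Expected cost of labeling randomly sampled items until the target
    number of minority-class items has been found.

    All parameters are illustrative assumptions:
      prevalence       fraction of items that belong to the minority class
      price_per_label  payment for one worker judgment
      labels_per_item  redundant judgments collected per item
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    expected_items = target_positives / prevalence
    return expected_items * labels_per_item * price_per_label


# Same target, same price, different class balance (made-up numbers).
skewed = expected_labeling_cost(1000, prevalence=0.01,
                                price_per_label=0.05, labels_per_item=3)
balanced = expected_labeling_cost(1000, prevalence=0.5,
                                  price_per_label=0.05, labels_per_item=3)
print(f"skewed:   {skewed:,.0f}")
print(f"balanced: {balanced:,.0f}")
```

Under these made-up numbers the skewed case costs roughly fifty times as much as the balanced one, which is why filtering or prioritizing items before they reach the crowd becomes attractive.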

Daniel S. Weld is a venture partner with Madrona and the WRF/TJ Cable Professor of Computer Science & Engineering at the University of Washington. After formative education at Phillips Academy, he received bachelor’s degrees in both CS and Biochemistry at Yale University in 1982. He landed a Ph.D. from the MIT Artificial Intelligence Lab in 1988, received a Presidential Young Investigator’s award in 1989 and an Office of Naval Research Young Investigator’s award in 1990, and was named an AAAI Fellow in 1999 and an ACM Fellow in 2005. Dan was a founding editor for the Journal of AI Research, area editor for the Journal of the ACM, guest editor for Computational Intelligence and Artificial Intelligence, and Program Chair for AAAI-96. Dan has published two books and scads of technical papers.

Dan is an active entrepreneur with several patents and technology licenses. He co-founded Netbot Incorporated, creator of Jango Shopping Search (acquired by Excite); AdRelevance, a monitoring service for internet advertising (acquired by Media Metrix); and Nimble Technology, a data integration company (acquired by Actuate). Dan is a Venture Partner at the Madrona Venture Group, and a member of the Technical Advisory Boards for the Allen Institute for Artificial Intelligence, Context Relevant, Spare5, and Madrona.
