The Yahoo! Webscope™ Program is a reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists. All datasets have been reviewed to conform to Yahoo!'s data protection standards, including strict controls on privacy. We have a number of datasets that we are excited to share with you. Learn how to get involved.

Latest Publications
  • A Poodle or a Dog? Evaluating Automatic Image Annotation Using Human Descriptions at Different Levels of Granularity
  • Scalable similarity-based neighborhood methods with MapReduce
  • Distributed matrix factorization with mapreduce using a series of broadcast-joins
  • "All roads lead to Rome": optimistic recovery for distributed iterative data processing

Eligibility:

Yahoo! is pleased to make these datasets available to researchers who are advancing the state of knowledge and understanding in web sciences. The datasets are only available for academic use by faculty and university researchers who agree to the Data Sharing Agreement.

To be eligible to receive Webscope™ data you must:

  • Be a faculty member, research employee or student from an accredited university
  • Send the data request from an email address at an accredited university's .edu domain (or, for international universities, the university's own domain name)

We are not able to share data with:

  • Commercial entities
  • Employees of commercial entities who hold a university appointment
  • Research institutions not affiliated with a research university

Note: You must have a Yahoo! account to apply for Webscope™ datasets.
