Language Data

L31 - Questions on Yahoo Answers labeled as either informational or conversational, version 1.0 (766KB)

The dataset includes non-deleted English questions from Yahoo Answers, posted between 2006 and 2016 and sampled uniformly at random. Each question includes a URL to its Yahoo Answers page, its title, description, high-level category (one of 26), direct category, and a label marking it as informational ('0') or conversational ('1'). A small subset of the questions is marked as borderline ('2').
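
The record layout described above could be modeled roughly as follows. This is a minimal sketch, not the dataset's official loader: the on-disk file format and column order are not stated here, so the names `Label`, `Question`, `parse_row`, and `label_counts`, as well as the assumed six-field order, are hypothetical and should be adjusted to match the actual files.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """Label codes as given in the dataset description."""
    INFORMATIONAL = "0"
    CONVERSATIONAL = "1"
    BORDERLINE = "2"


@dataclass(frozen=True)
class Question:
    url: str
    title: str
    description: str
    top_category: str      # high-level category (one of 26)
    direct_category: str
    label: Label


def parse_row(row):
    """Build a Question from six strings.

    ASSUMPTION: fields arrive in the order url, title, description,
    high-level category, direct category, label. The real column
    order may differ.
    """
    url, title, description, top_category, direct_category, raw_label = row
    # Label(...) raises ValueError for any code other than '0', '1', '2'.
    return Question(url, title, description, top_category,
                    direct_category, Label(raw_label.strip()))


def label_counts(questions):
    """Count how many questions carry each label."""
    return Counter(q.label for q in questions)
```

Keeping the borderline code as its own enum member, rather than folding it into one of the two main classes, lets a downstream user decide whether to drop those questions or treat them separately.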