Language Data

L25 - Yahoo N-Gram Representations, version 2.0 (2.6Gb) (Hosted on AWS)

This dataset contains n-gram representations. The data may serve as a testbed for the query rewriting task, a common problem in IR research, as well as for word and sentence similarity tasks, which are common in NLP research. We would like researchers to be able to produce query rewrites based on these representations and test them against other state-of-the-art techniques.
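
As a rough illustration of how query rewrites might be produced from such representations, the sketch below ranks candidate n-grams by cosine similarity to a query n-gram. It assumes each n-gram maps to a dense numeric vector. The toy vectors, the names `TOY_VECTORS`, `cosine`, and `rewrite_candidates`, and the choice of cosine similarity are all hypothetical and are not taken from the dataset or its documentation. The actual file format is not assumed here.

```python
import math

# Hypothetical toy vectors for illustration only; these values are made up
# and do not come from the dataset.
TOY_VECTORS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low cost airfare": [0.8, 0.2, 0.1],
    "flight deals": [0.7, 0.3, 0.0],
    "pasta recipes": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rewrite_candidates(query, vectors, top_k=2):
    """Return the top_k n-grams most similar to `query`, excluding the query itself."""
    query_vec = vectors[query]
    scored = [
        (cosine(query_vec, vec), ngram)
        for ngram, vec in vectors.items()
        if ngram != query
    ]
    # Highest similarity first.
    scored.sort(reverse=True)
    return [ngram for _, ngram in scored[:top_k]]


if __name__ == "__main__":
    print(rewrite_candidates("cheap flights", TOY_VECTORS))
```

With real data, the dictionary would be filled from the dataset files, and a brute-force scan like this one would likely need to be replaced by an approximate nearest-neighbor index.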