Small text corpus?

Hi guys,

Do you happen to know where I can find a small corpus (or corpora) of texts?
Maybe 10-30 sentences at most.
The other requirement is that there is a clear semantic connection between the words used in the corpus, so that a human can follow them.

I’m looking for something small with which I can test my algorithm.

For short text documents, you may be interested in the IMDb reviews dataset or the Yelp Challenge dataset. Also check out NLTK, which offers a large collection of NLP corpora.
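
If you go the NLTK route, one way to get down to 10-30 sentences is to take just the first few sentences of a larger corpus. Below is a rough sketch, not a definitive recipe: `take_sentences` and `load_brown_sample` are helper names made up for this example, and the choice of the Brown corpus "news" category is only an assumption about what might read coherently. It assumes NLTK is installed and that the corpus can be downloaded.

```python
from itertools import islice


def take_sentences(tokenized_sentences, limit=30):
    """Return the first `limit` sentences, each joined back into one string."""
    return [" ".join(tokens) for tokens in islice(tokenized_sentences, limit)]


def load_brown_sample(limit=30, category="news"):
    """Fetch the Brown corpus via NLTK and return its first `limit` sentences.

    Assumes NLTK is installed and the corpus can be downloaded.
    """
    import nltk
    from nltk.corpus import brown

    nltk.download("brown", quiet=True)  # does nothing if already up to date
    return take_sentences(brown.sents(categories=category), limit)
```

Calling `load_brown_sample(limit=20)` should give you a list of 20 sentence strings. Because the sentences come from the start of a single category, they tend to belong to the same article, which may help with the "semantic connection" requirement, but you would want to read them over before relying on that.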

I have some of the NLTK data sets in this repo:
