![Python Natural Language Processing](https://wfqqreader-1252317822.image.myqcloud.com/cover/883/36700883/b_36700883.jpg)
Resources for accessing free corpora
Getting a corpus can be a challenging task, so in this section I will provide you with some links from which you can download free corpora and use them to build NLP applications.
The nltk library provides some built-in corpora. To list all the corpus names, execute the following commands:
    import nltk.corpus
    dir(nltk.corpus)         # Python shell: the result is echoed automatically
    print(dir(nltk.corpus))  # PyCharm IDE (script): the result must be printed explicitly
In Figure 2.2, you can see the output of the preceding code; the highlighted part indicates the names of the corpora that are already installed:
![](https://epubservercos.yuewen.com/86C0EB/19470405708946506/epubprivate/OEBPS/Images/pp095.png?sign=1739251276-RmrOCL7qSgQgi27ngzOCvKniRk38Y68O-0-c8e3b7fb55fcf775ae19541aa00710a6)
If you want to explore more corpus resources, take a look at Big Data: 33 Brilliant and Free Data Sources for 2016, Bernard Marr (https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#53369cd5b54d).
So far, we have covered a lot of the basics. Now let me give you an idea of how we can prepare a dataset for a natural language processing application that will be developed with the help of machine learning.