1. Tokenization, stemming, and generation of unigrams, bigrams, trigrams, and skip-grams.
2. Bag-of-words model for classification on the 20 Newsgroups dataset.
3. tf-idf weighting for classification on ...
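The preprocessing steps in item 1 can be illustrated with a minimal, standard-library-only sketch. It assumes a simple regex tokenizer (lowercased runs of letters and digits), and it uses a deliberately crude suffix stripper, `crude_stem`, as a stand-in for a real stemmer such as Porter's. All function names here are hypothetical and are not taken from this project. Skip-grams follow the usual k-skip-n-gram definition: n tokens kept in their original order, with at most k tokens skipped in total.

```python
import re
from itertools import combinations
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical crude stemmer: strips one common English suffix.
# This is NOT the Porter stemmer; it only illustrates mapping inflected
# forms such as "cats" and "jumped" onto a shared stem.
_SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(token: str) -> str:
    for suffix in _SUFFIXES:
        # Require a stem of at least 3 characters so short words survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens: List[str], n: int, k: int) -> List[Tuple[str, ...]]:
    """k-skip-n-grams: n tokens kept in order, skipping at most k tokens in total."""
    grams = []
    for i in range(len(tokens)):
        # The remaining n - 1 members must fall within the next n - 1 + k
        # positions, which caps the total number of skipped tokens at k.
        window = range(i + 1, min(i + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            grams.append((tokens[i],) + tuple(tokens[j] for j in rest))
    return grams

tokens = tokenize("The rain in Spain")
print([crude_stem(t) for t in tokenize("The cats jumped!")])  # ['the', 'cat', 'jump']
print(ngrams(tokens, 2))
print(skipgrams(tokens, 2, 1))
```

With k = 0 the skip-gram function reduces to ordinary contiguous n-grams, so one routine can cover both cases if preferred.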
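The list does not say which classifier is used with the bag-of-words features. As an assumption, the sketch below pairs raw word counts with a multinomial Naive Bayes classifier using additive (Laplace) smoothing, written from scratch with the standard library. The four training documents are made-up toy data that merely borrow two real 20 Newsgroups category names. Loading the actual dataset (for example with scikit-learn's `fetch_20newsgroups`) is left out so the example runs offline, and the class name `NaiveBayesBoW` is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def bag_of_words(text: str) -> Counter:
    """Unordered word counts; word order is discarded."""
    return Counter(text.lower().split())

class NaiveBayesBoW:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesBoW":
        self.class_docs = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            bow = bag_of_words(doc)
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        self.class_totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        bow = bag_of_words(doc)
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_docs.items():
            score = math.log(n_label / n_docs)  # log prior P(class)
            for word, count in bow.items():
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                p = (self.word_counts[label][word] + self.alpha) / (
                    self.class_totals[label] + self.alpha * v
                )
                score += count * math.log(p)  # log likelihood of each occurrence
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy data (made up for illustration, not drawn from 20 Newsgroups).
train_docs = [
    "the gpu renders graphics fast",
    "new graphics card driver",
    "the team won the hockey game",
    "hockey playoffs game tonight",
]
train_labels = ["comp.graphics", "comp.graphics", "rec.sport.hockey", "rec.sport.hockey"]

model = NaiveBayesBoW().fit(train_docs, train_labels)
print(model.predict("graphics driver update"))
print(model.predict("hockey game tonight"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together on long documents.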
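The tf-idf weighting in item 3 can be sketched independently of any dataset. The version below assumes the textbook variant tf(t, d) × log(N / df(t)), where tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. Libraries often use smoothed or sublinear variants instead (scikit-learn's `TfidfVectorizer`, for instance, smooths idf by default), so exact weights will differ. The helper names `tfidf` and `l2_normalize` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def tfidf(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return one sparse {term: weight} vector per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: each term counts at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors, idf

def l2_normalize(vec: Dict[str, float]) -> Dict[str, float]:
    """Scale a vector to unit length so long documents do not dominate."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)

docs = ["the cat sat", "the dog sat", "the cat ran"]
vectors, idf = tfidf(docs)
print(idf["the"], idf["cat"], idf["dog"])  # a term in every document gets idf 0
print(l2_normalize(vectors[1]))
```

A term that appears in every document, like "the" here, receives zero weight, which is how tf-idf down-weights uninformative words before the vectors are passed to a classifier.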