i have set of documents, , transform such form, allow me count tfidf words in documents (so each document being represented vector of tfidf-numbers). i thought enough call wordnetlemmatizer.lemmatize(word), , porterstemmer - 'have', 'has', 'had', etc not being transformed 'have' lemmatizer - , goes other words well. have read, supposed provide hint lemmatizer - tag representing type of word - whether noun, verb, adjective, etc. my question - how these tags? supposed excecute on documents this? i using python3.4, , lemmatizing + stemming single word @ time. tried wordnetlemmatizer, , englishstemmer nltk , stem() stemming.porter2. ok, googled more , found out how these tags. first 1 have preprocessing, sure file tokenized (in case removing stuff left off after conversion pdf txt). then these file has tokenized sentences, each sentence word array, , can tagged nltk tagger. lemmatization can done, , stemming added on top of it. from n...