python - scikit-learn: add features to a vectorized set of documents
I am starting with scikit-learn, and I am trying to transform a set of documents into a format on which I can apply clustering and classification. I have seen the details about the vectorization methods and the tf-idf transformations that load the files and index their vocabularies.
However, I have extra metadata for each document, such as the authors, the division that was responsible, a list of topics, etc.
How can I add these features to each document vector generated by the vectorizing function?
You could use DictVectorizer for the extra categorical data, and then use scipy.sparse.hstack to combine the resulting matrix with the text features.