python - scikit-learn, add features to a vectorized set of documents


I am just starting with scikit-learn, and I am trying to transform a set of documents into a format on which I can apply clustering and classification. I have seen the details of the vectorization methods and the tf-idf transformations that load the files and index their vocabularies.

However, I have additional metadata for each document, such as the authors, the division responsible, a list of topics, etc.

How can I add these features to each document vector generated by the vectorizing function?

You could use DictVectorizer for the categorical data and then use scipy.sparse.hstack to combine the resulting matrix with the text features.
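A minimal sketch of that approach, assuming TfidfVectorizer is the text vectorizer in use. The documents, metadata values, and variable names here are made up for illustration; topics are encoded as one numeric key per topic, which is only one possible way to represent a list.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus: one string per document.
docs = [
    "the quick brown fox",
    "machine learning with sparse matrices",
    "clustering documents by topic",
]

# Hypothetical metadata: one dict per document, in the same order as docs.
# String values are one-hot encoded; numeric values are kept as they are.
metadata = [
    {"author": "alice", "division": "research", "topic=animals": 1},
    {"author": "bob", "division": "engineering", "topic=ml": 1},
    {"author": "alice", "division": "engineering", "topic=ml": 1, "topic=clustering": 1},
]

# Vectorize the text into a sparse tf-idf matrix.
text_vec = TfidfVectorizer()
X_text = text_vec.fit_transform(docs)

# Vectorize the metadata into a sparse matrix of categorical features.
meta_vec = DictVectorizer()
X_meta = meta_vec.fit_transform(metadata)

# Stack the two matrices column-wise; each row is still one document.
X = hstack([X_text, X_meta]).tocsr()
```

The combined matrix X can then be passed to a clusterer or classifier. For new documents, call transform (not fit_transform) on both fitted vectorizers before stacking, so the columns line up with the training data.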

