hash - How to set the target feature dimension in Spark MLlib's HashingTF() function?


Apache Spark MLlib has a HashingTF() function that takes tokenized words as input and converts them into fixed-length feature vectors.

As mentioned in the Spark MLlib documentation,

it is advisable to use a power of two as the feature dimension.

My question is whether the exponent should be the number of distinct terms in the input.

If yes, suppose I take more than 1000 text documents as input containing more than 5000 terms; would the feature dimension then become 2^5000?

Is this assumption correct, or is there another way to find the exponent value?
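For context, HashingTF uses the "hashing trick": each term is hashed to a bucket index modulo numFeatures, so the dimension is a fixed, user-chosen number (Spark's spark.ml default is 2^18 = 262144), not a function of the vocabulary size. A minimal pure-Python sketch of that idea (illustrative only; Spark internally uses MurmurHash3, while plain `hash()` here is a stand-in):

```python
from collections import defaultdict

def hashing_tf(tokens, num_features=1 << 18):
    # num_features is a fixed, user-chosen dimension (a power of two
    # is recommended so the modulo maps hash bits evenly to buckets).
    # It does NOT grow with the number of distinct terms; unrelated
    # terms may simply collide into the same bucket.
    vec = defaultdict(float)
    for tok in tokens:
        idx = hash(tok) % num_features  # bucket index in [0, num_features)
        vec[idx] += 1.0                 # accumulate the term frequency
    return dict(vec)

# A tiny 16-dimensional feature vector for a three-token document:
vec = hashing_tf(["spark", "mllib", "spark"], num_features=16)
```

So even with 5000 terms, a dimension like 2^18 only means each of the 5000 terms lands in one of 262144 buckets; 2^5000 would be astronomically larger than anything representable.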
