How do I set the target feature dimension in Spark MLlib's HashingTF() function?
Apache Spark MLlib's HashingTF() function takes tokenized words as input and converts each set of terms into a fixed-length feature vector.

As mentioned in the Spark MLlib documentation, it is advisable to use a power of two as the feature dimension.

My question is whether the exponent should be the number of terms in the input. For example, if the input consists of more than 1000 text documents containing more than 5000 distinct terms, does the feature dimension become 2^5000?

Is that assumption correct, or is there another way to determine the exponent?
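To make the setting concrete, here is a minimal pure-Python sketch of the hashing trick that HashingTF implements, together with one common heuristic for picking the dimension: the smallest power of two at or above the vocabulary size (a heuristic assumed for illustration, not a rule from the documentation). The exponent is thus roughly log2 of the term count, not the term count itself. Note this sketch uses MD5 as a deterministic stand-in hash; Spark's actual implementation hashes terms differently, and the helper names here are hypothetical.

```python
import hashlib
import math

def next_power_of_two(n):
    # Smallest power of 2 >= n, e.g. 5000 -> 8192 (= 2^13)
    return 1 << math.ceil(math.log2(n))

def term_index(term, num_features):
    # Deterministic stand-in hash (illustrative only; not Spark's hash function)
    h = int(hashlib.md5(term.encode("utf-8")).hexdigest(), 16)
    return h % num_features

def hashing_tf(tokens, num_features):
    # Map each token to a bucket and count occurrences: a sparse TF vector
    vec = {}
    for t in tokens:
        i = term_index(t, num_features)
        vec[i] = vec.get(i, 0) + 1
    return vec

# For ~5000 distinct terms, the heuristic gives 2^13 = 8192 features,
# not 2^5000: the power of two only needs to cover the vocabulary size.
num_features = next_power_of_two(5000)
doc = ["spark", "mllib", "hashing", "spark"]
tf = hashing_tf(doc, num_features)
```

A larger dimension reduces hash collisions (distinct terms sharing a bucket) at the cost of a longer vector, which is the trade-off the exponent controls.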