How do I set the target feature dimension in Spark MLlib's HashingTF() function?
Apache Spark MLlib's HashingTF() function takes tokenized words as input and converts each set of terms into a fixed-length feature vector.

As mentioned in the Spark MLlib documentation, it is advisable to use a power of two as the feature dimension.

My question is whether the exponent should be the number of terms in the input. For example, if the input consists of more than 1000 text documents containing more than 5000 distinct terms, does the feature dimension become 2^5000?

Is that assumption correct, or is there another way to determine the exponent?
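To make the setting concrete, here is a minimal pure-Python sketch of the hashing trick that HashingTF implements, together with one common heuristic for picking the dimension: the smallest power of two at or above the vocabulary size (a heuristic assumed for illustration, not a rule from the documentation). The exponent is thus roughly log2 of the term count, not the term count itself. Note this sketch uses MD5 as a deterministic stand-in hash; Spark's actual implementation hashes terms differently, and the helper names here are hypothetical.

```python
import hashlib
import math

def next_power_of_two(n):
    # Smallest power of 2 >= n, e.g. 5000 -> 8192 (= 2^13)
    return 1 << math.ceil(math.log2(n))

def term_index(term, num_features):
    # Deterministic stand-in hash (illustrative only; not Spark's hash function)
    h = int(hashlib.md5(term.encode("utf-8")).hexdigest(), 16)
    return h % num_features

def hashing_tf(tokens, num_features):
    # Map each token to a bucket and count occurrences: a sparse TF vector
    vec = {}
    for t in tokens:
        i = term_index(t, num_features)
        vec[i] = vec.get(i, 0) + 1
    return vec

# For ~5000 distinct terms, the heuristic gives 2^13 = 8192 features,
# not 2^5000: the power of two only needs to cover the vocabulary size.
num_features = next_power_of_two(5000)
doc = ["spark", "mllib", "hashing", "spark"]
tf = hashing_tf(doc, num_features)
```

A larger dimension reduces hash collisions (distinct terms sharing a bucket) at the cost of a longer vector, which is the trade-off the exponent controls.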