c++ - Get nearest centroid using Thrust library? (K-Means) -
i finished computing distances , stored in thrust vector, instance, have 2 centroids , 5 datapoints , way computed distances each centroid computed distances 5 datapoints first , stored in array , later other centroid in 1d array in distances, this:
for (int = 0; < centroids.size(); ++i) { computedistance(data, distances, centroids[i], ndatapoints, ndimensions); }
resulting in vector 1d, instance:
distancesvalues = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7} datapointsindex = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}
where first 5 values represent centroid 1 , other 5 values centroid 2.
what know if there thrust function in can store count in array of minimum values each centroid?
comparing values of each index, result should be:
counts = {2, 3}
where:
countofcentroid 1 = 2 countofcentroid 2 = 3
here 1 possible approach:
create additional centroid index vector:
distancesvalues = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7} datapointsindex = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5} centroidindex = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2}
now sort_by_key, using
datapointsindex
keys, , other 2 vectors zipped values. has effect rearranging 3 vectorsdatapointsindex
has indices grouped together:datapointsindex = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5}
(and other 2 vectors correspondingly rearranged).
now reduce_by_key. if choose
thrust::minimum
operator, reduction chooses minimum value in group (rather summing values in group). reduce_by_key means reduction of type done on each consecutive group of keys. once again usedatapointsindex
our key vector, , other 2 vectors zipped our value vector. of output of reduce_by_key don't care about, except results vector emanatescentroidindex
vector. counting centroid indices in results vector, can desired output.
here worked example:
$ cat t428.cu #include <thrust/host_vector.h> #include <thrust/device_vector.h> #include <thrust/sort.h> #include <thrust/reduce.h> #include <thrust/copy.h> #include <thrust/iterator/zip_iterator.h> #include <thrust/iterator/discard_iterator.h> #include <stdio.h> #define num_points 5 #define num_centroid 2 #define dsize (num_points*num_centroid) int main(){ int distancesvalues[dsize] = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7}; int datapointsindex[dsize] = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}; int centroidindex[dsize] = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2}; thrust::device_vector<int> dv(distancesvalues, distancesvalues + dsize); thrust::device_vector<int> di(datapointsindex, datapointsindex + dsize); thrust::device_vector<int> ci(centroidindex, centroidindex + dsize); thrust::device_vector<int> ra(num_points); thrust::device_vector<int> rb(num_points); thrust::sort_by_key(di.begin(), di.end(), thrust::make_zip_iterator(thrust::make_tuple(dv.begin(), ci.begin()))); thrust::reduce_by_key(di.begin(), di.end(), thrust::make_zip_iterator(thrust::make_tuple(dv.begin(), ci.begin())), thrust::make_discard_iterator(), thrust::make_zip_iterator(thrust::make_tuple(ra.begin(), rb.begin())), thrust::equal_to<int>(), thrust::minimum<thrust::tuple<int, int> >()); printf("countofcentroid 1 = %d\n", thrust::count(rb.begin(), rb.end(), 1)); printf("countofcentroid 2 = %d\n", thrust::count(rb.begin(), rb.end(), 2)); return 0; } $ nvcc -arch=sm_20 -o t428 t428.cu $ ./t428 countofcentroid 1 = 2 countofcentroid 2 = 3 $
as eric points out in answer here (your question duplicate of one), sort_by_key
unnecessary. reordering of data follows regular pattern, don't need harness complexity of sort, , can therefore re-order data clever use of iterators. under circumstances, may possible entire operation (approximately) single call reduce_by_key
.
Comments
Post a Comment