C++ - Fusing a triangle loop for parallelization, calculating sub-indices
A common technique in parallelization is to fuse nested loops like this

    for(int i=0; i<n; i++) {
        for(int j=0; j<n; j++) {
            /* loop body */
        }
    }

to

    for(int x=0; x<n*n; x++) {
        int i = x/n;
        int j = x%n;
        /* loop body */
    }

I'm wondering how I can fuse a triangle loop like this
    for(int i=0; i<n; i++) {
        for(int j=0; j<i+1; j++) {
            /* loop body */
        }
    }

This has n*(n+1)/2 iterations. Let's call the fused iteration x. Using the quadratic formula I have come up with this:

    for(int x=0; x<(n*(n+1)/2); x++) {
        int i = (-1 + sqrt(1.0+8.0*x))/2;
        int j = x - i*(i+1)/2;
        /* loop body */
    }

Unlike fusing the square loop, this requires the sqrt function and conversions from int to floating point and from floating point back to int.
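For reference, the sqrt formula in the question follows from where each row of the triangle begins in the fused index. Row i holds i+1 iterations, so it starts after 1 + 2 + ... + i of them; solving the row-start equation for i and rounding down gives the row. The worked value at the end (x = 7) is just an illustrative check.

```latex
\begin{aligned}
x_{\text{start}}(i) &= \sum_{k=0}^{i-1} (k+1) = \frac{i(i+1)}{2}
  &&\text{(first fused index of row } i\text{)}\\
\frac{i(i+1)}{2} \le x &< \frac{(i+1)(i+2)}{2}
  &&\text{(x lies in row } i\text{)}\\
i^2 + i - 2x = 0 \;&\Rightarrow\; i = \frac{-1 + \sqrt{1 + 8x}}{2}
  &&\text{(positive root)}\\
i &= \left\lfloor \frac{-1 + \sqrt{1 + 8x}}{2} \right\rfloor,
\qquad j = x - \frac{i(i+1)}{2}\\
x = 7:\quad i &= \left\lfloor \frac{-1 + \sqrt{57}}{2} \right\rfloor = \lfloor 3.27\ldots \rfloor = 3,
\qquad j = 7 - 6 = 1
\end{aligned}
```

The C conversion from double to int truncates toward zero, which equals the floor here because the value is never negative.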
I'm wondering if there is a simpler or more efficient way of doing this, for example a solution that does not require the sqrt function or conversions from int to floating point and back.
Edit: I don't want a solution that depends on previous or next iterations. I only want solutions of the form int i = funci(x) and int j = funcj(x,i).
Here is some code showing that it works:

    #include <stdio.h>
    #include <math.h>

    int main() {
        int n = 5;
        int cnt = 0;
        /* reference: the original nested triangle loop */
        for(int i=0; i<n; i++) {
            for(int j=0; j<i+1; j++) {
                printf("%d: %d %d\n", cnt++, i, j);
            }
        }
        printf("\n");
        /* fused loop: recover (i,j) from x with the quadratic formula */
        int nmax = n*(n+1)/2;
        for(int x=0; x<nmax; x++) {
            int i = (-1 + sqrt(1.0+8.0*x))/2;
            int j = x - i*(i+1)/2;
            printf("%d: %d %d\n", x, i, j);
        }
        return 0;
    }
Considering you're trying to fuse a triangle loop with the intent of parallelizing it, a non-obvious solution is to choose a non-trivial mapping of x to (i,j):
(Diagram: the triangle, with j increasing downward, is cut and its pieces rearranged into a rectangle.)

After all, you're not processing the iterations in any special order, so you don't care about the exact mapping.
So to calculate x -> (i,j), you'd compute the indices as you would for a rectangle, and then, if i > j, set i = n-i and j = n-j (mirror the y axis, mirror the x axis).
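(Diagram: the rectangle is cut along its diagonal, and the piece on the other side of the diagonal is flipped back into place, reassembling the original triangle.)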
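A minimal sketch of the rectangle-then-mirror idea in C, under these assumptions: 0-based indices, the triangle 0 <= j <= i < n, and a rectangle n+1 columns wide with ceil(n/2) rows (the last row is only half full when n is odd). With those conventions the mirror works out to i = n-1-r and j = n-c rather than the literal n-i and n-j. The names tri_fold_index and check_fold are hypothetical. The OpenMP pragma is only honored when OpenMP is enabled (e.g. -fopenmp); otherwise the compiler ignores it and the loop runs serially.

```c
#include <stdlib.h>

/* Hypothetical helper: maps a fused index x in [0, n*(n+1)/2) to (i, j)
   with 0 <= j <= i < n, using one integer division and one modulo.
   Assumption: the triangle is folded into a rectangle n+1 columns wide. */
void tri_fold_index(int x, int n, int *i, int *j)
{
    int r = x / (n + 1);   /* row in the rectangle */
    int c = x % (n + 1);   /* column in the rectangle */
    if (c <= r) {
        /* on or below the diagonal: already inside the triangle */
        *i = r;
        *j = c;
    } else {
        /* above the diagonal: mirror both axes into the lower rows */
        *i = n - 1 - r;
        *j = n - c;
    }
}

/* Hypothetical check: runs the fused loop (parallel if OpenMP is on)
   and returns 1 if every (i, j) with 0 <= j <= i < n is produced
   exactly once, 0 otherwise. Assumes n >= 1. */
int check_fold(int n)
{
    int total = n * (n + 1) / 2;
    int *ii = malloc((size_t)total * sizeof *ii);
    int *jj = malloc((size_t)total * sizeof *jj);
    int *seen = calloc((size_t)n * n, sizeof *seen);
    int ok = (ii != NULL && jj != NULL && seen != NULL);

    if (ok) {
        /* Fused loop: each iteration depends only on x and writes only
           its own slot, so the iterations are independent. */
        #pragma omp parallel for
        for (int x = 0; x < total; x++)
            tri_fold_index(x, n, &ii[x], &jj[x]);

        /* Sequential verification: in range, and no cell hit twice.
           total distinct cells out of total cells means a bijection. */
        for (int x = 0; x < total && ok; x++) {
            int i = ii[x], j = jj[x];
            if (i < 0 || i >= n || j < 0 || j > i || ++seen[i * n + j] != 1)
                ok = 0;
        }
    }
    free(ii);
    free(jj);
    free(seen);
    return ok;
}
```

For example, with n = 3 the fused indices x = 0..5 map to (0,0), (2,2), (2,1), (2,0), (1,0), (1,1): every element of the triangle exactly once. The visiting order is not row by row, but as noted above that doesn't matter when the iterations are independent, and it avoids sqrt and floating point entirely.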