c++ - Fusing a triangle loop for parallelization, calculating sub-indices -
a common technique in parallelization fuse nested loops this
for(int i=0; i<n; i++) { for(int j=0; j<n; j++) {
to
for(int x=0; x<n*n; x++) { int = x/n; int j = x%n;
i'm wondering how can fuse triangle loop this
for(int i=0; i<n; i++) { for(int j=0; j<i+1; j++) {
this has n*(n+1)/2
iterations. let's call fused iteration x
. using quadratic formula have come this:
for(int x=0; x<(n*(n+1)/2); x++) { int = (-1 + sqrt(1.0+8.0*x))/2; int j = x - i*(i+1)/2;
unlike fusing square loop requires using sqrt
function , conversions int float , float int.
i'm wondering if there simpler or more efficient way of doing this? example solution not require sqrt
function or conversions int float or float int.
edit: don't want solution depends on previous or next iterations. want solutions int i = funci(x) , int j = funcj(x,i)
here code showing works:
#include <stdio.h> #include <math.h> int main() { int n = 5; int cnt = 0; for(int i=0; i<n; i++) { for(int j=0; j<i+1; j++) { printf("%d: %d %d\n", cnt++, i,j); } } printf("\n"); int nmax = n*(n+1)/2; for(int x=0; x<nmax; x++) { int = (-1 + sqrt(1.0+8.0*x))/2; int j = x - i*(i+1)/2; printf("%d: %d %d\n", x,i,j); } }
considering you're trying fuse triangle intent of parallelizing, non-obvious solution choose non-trivial mapping of x (i,j):
j |\ -> | \ ____ | | \ => |\\ | v |___\ |_\\__|
after all, you're not processing them in special order, exact mapping don't care.
so calculate x->i,j
you'd rectangle, if i > j
{ i=n-i, j = n-j }
(mirror y axis, mirror x axis).
____ |\\ | |\ |\ |_\\__| ==> |_\ __ => | \ / | | \ /__| |___\
Comments
Post a Comment