segmentation fault - LLVM-generated vector math assembly segfaulting


The following LLVM function takes three pointers to arrays of 5 doubles and calculates c = a*b + c:

define void @my_vector_math(double* %a1, double* %b2, double* %c3) {
entry:
  %0 = bitcast double* %c3 to <5 x double>*
  %1 = load <5 x double>* %0, align 64
  %2 = bitcast double* %b2 to <5 x double>*
  %3 = load <5 x double>* %2, align 64
  %4 = bitcast double* %a1 to <5 x double>*
  %5 = load <5 x double>* %4, align 64
  %6 = fmul <5 x double> %3, %5
  %7 = fadd <5 x double> %1, %6
  store <5 x double> %7, <5 x double>* %0, align 64
  ret void
}

It compiles down to the following assembly on my machine (64-bit Ubuntu 13.10, LLVM 3.4):

  30:   c5 fb 10 42 20          vmovsd 0x20(%rdx),%xmm0
  35:   c5 fb 10 4e 20          vmovsd 0x20(%rsi),%xmm1
  3a:   c5 fd 28 16             vmovapd (%rsi),%ymm2
  3e:   c5 fb 10 5f 20          vmovsd 0x20(%rdi),%xmm3
  43:   c5 f5 59 cb             vmulpd %ymm3,%ymm1,%ymm1
  47:   c5 ed 59 17             vmulpd (%rdi),%ymm2,%ymm2
  4b:   c5 fd 58 c1             vaddpd %ymm1,%ymm0,%ymm0
  4f:   c5 ed 58 0a             vaddpd (%rdx),%ymm2,%ymm1
  53:   c5 fd 29 0a             vmovapd %ymm1,(%rdx)
  57:   c5 f9 13 42 20          vmovlpd %xmm0,0x20(%rdx)
  5c:   c5 f8 77                vzeroupper
  5f:   c3                      retq

Called with arrays of size 5 it segfaults, but it produces correct results with oversized arrays:

#include <stdio.h>

void my_vector_math(double *a, double *b, double *c);

int main() {
  double a[] = {1,1,1,1,1};
  double b[] = {2,2,2,2,2};
  double c[] = {3,3,3,3,3};
  my_vector_math(a, b, c); // segfaults
  for (int i = 0; i < 5; i++) {
    printf("%f\n", c[i]);
  }
  return 0;
}

I'm having a lot of trouble figuring out why this is happening; any pointers appreciated.


Edit:

The LLVM IR without optimization passes:

define void @my_vector_math(double* %a1, double* %b2, double* %c3) {
entry:
  %a = alloca <5 x double>
  %b = alloca <5 x double>
  %c = alloca <5 x double>
  %c = alloca double*
  %b = alloca double*
  %a = alloca double*
  store double* %a1, double** %a
  store double* %b2, double** %b
  store double* %c3, double** %c
  %c4 = load double** %c
  %0 = bitcast double* %c4 to <5 x double>*
  %1 = load <5 x double>* %0
  store <5 x double> %1, <5 x double>* %c
  %b5 = load double** %b
  %2 = bitcast double* %b5 to <5 x double>*
  %3 = load <5 x double>* %2
  store <5 x double> %3, <5 x double>* %b
  %a6 = load double** %a
  %4 = bitcast double* %a6 to <5 x double>*
  %5 = load <5 x double>* %4
  store <5 x double> %5, <5 x double>* %a
  %c7 = load <5 x double>* %c
  %a8 = load <5 x double>* %a
  %b9 = load <5 x double>* %b
  %6 = fmul <5 x double> %a8, %b9
  %7 = fadd <5 x double> %c7, %6
  %c10 = load double** %c
  %8 = bitcast double* %c10 to <5 x double>*
  store <5 x double> %7, <5 x double>* %8
  ret void
}

Your IR claims the arrays of doubles have 64-byte alignment, which causes the compiler to generate aligned loads and stores (the vmovapd instructions above). You presumably intended to specify 8-byte alignment instead (the natural alignment of double on many platforms).

