C++ - Count floating-point instructions


I am trying to count the number of floating-point operations in one of my programs, and I think perf could be the tool I am looking for (are there alternatives?), but I have trouble limiting it to a certain function or block of code. Let's take the following example:

#include <complex>
#include <cstdlib>
#include <iostream>
#include <type_traits>

// Random real number in [0, 1] for floating-point types.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type myrand()
{
    return static_cast<T>(std::rand()) / static_cast<T>(RAND_MAX);
}

// Random complex number for std::complex types.
template <typename T>
typename std::enable_if<!std::is_floating_point<T>::value, std::complex<typename T::value_type>>::type myrand()
{
    typedef typename T::value_type S;

    return std::complex<S>(
        static_cast<S>(std::rand()) / static_cast<S>(RAND_MAX),
        static_cast<S>(std::rand()) / static_cast<S>(RAND_MAX)
    );
}

int main()
{
    auto const a = myrand<TYPE>();
    auto const b = myrand<TYPE>();

    // count here
    auto const c = a * b;
    // stop counting here

    // prevent the compiler from optimizing away c
    std::cout << c << "\n";

    return 0;
}

The myrand() function simply returns a random number; if the type T is complex, it returns a random complex number. I did not hardcode doubles into the program because they would be optimized away by the compiler.

You can compile the file (let's call it bench.cpp) with c++ -std=c++0x -DTYPE=double bench.cpp.

Now I would like to count the number of floating-point operations. On my processor (Nehalem architecture, x86_64, where floating point is done with scalar SSE) this can be done with the event r8010 (see Intel Manual 3B, Section 19.5), using

perf stat -e r8010 ./a.out 

and it works as expected. However, it counts the overall number of uops (is there a table telling how many uops an instruction such as movsd takes?), and I am only interested in the number for the multiplication (see the example above).

How can this be done?

I finally found a way to do this, although not by using perf but instead the corresponding perf API. One first has to define a perf_event_open function, which is actually a syscall:

#include <cstdlib> // stdlib.h in C
#include <cstdio>  // stdio.h in C
#include <cstring> // string.h in C
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

// Thin wrapper around the perf_event_open syscall.
long perf_event_open(
    perf_event_attr* hw_event,
    pid_t pid,
    int cpu,
    int group_fd,
    unsigned long flags
) {
    int ret = syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
    return ret;
}

Next, one selects the events one wishes to count:

perf_event_attr attr;

// select what we want to count
std::memset(&attr, 0, sizeof(perf_event_attr));
attr.size = sizeof(perf_event_attr);
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_INSTRUCTIONS;
attr.disabled = 1;
attr.exclude_kernel = 1; // do not count the instructions the kernel executes
attr.exclude_hv = 1;

// open a file descriptor (pid = 0, cpu = -1: the calling thread on any CPU)
int fd = perf_event_open(&attr, 0, -1, -1, 0);

if (fd == -1)
{
    // handle error
}

In this case I simply want to count the number of instructions. Floating-point instructions can be counted on my processor (Nehalem) by replacing the corresponding lines with

attr.type = PERF_TYPE_RAW;
attr.config = 0x8010; // event number = 10H, umask value = 80H

By setting the type to raw, one can count basically every event the processor offers; the number 0x8010 specifies which one. Note that this number is highly processor-dependent! One can find the right numbers in Intel Manual 3B, Part 2, Chapter 19, by picking the right subsection.
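To make the relation between the event number, the umask and the raw config value explicit, here is a minimal sketch. The helper name raw_event_config is hypothetical (it is not part of the perf API); it only assumes the usual x86 layout in which the event number occupies bits 0-7 and the umask bits 8-15 of the config value.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the perf API: packs an x86 event number
// and umask into a raw config value (event in bits 0-7, umask in bits 8-15).
constexpr std::uint64_t raw_event_config(std::uint8_t event, std::uint8_t umask)
{
    return (static_cast<std::uint64_t>(umask) << 8) | event;
}

// Event number 10H with umask 80H gives the value used above.
static_assert(raw_event_config(0x10, 0x80) == 0x8010,
              "event 10H, umask 80H should pack to 0x8010");
```

With such a helper one could write attr.config = raw_event_config(0x10, 0x80); the event number and umask themselves still have to be looked up for the specific processor.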

One can then measure the code by enclosing it in

// reset and enable the counter
ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

// perform the computation that should be measured here

// disable and read out the counter
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
long long count;
read(fd, &count, sizeof(long long));
// count now has the (approximated) result

// close the file descriptor
close(fd);
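The steps above can be bundled into one small class so that the file descriptor is always closed. This is only a sketch under assumptions: the class name ScopedCounter and its interface are made up for illustration, it requires Linux with the perf headers installed, and opening the counter may fail (for example because of the perf_event_paranoid setting or inside a virtual machine), in which case stop() returns -1.

```cpp
#include <cstdint>
#include <cstring>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

// Hypothetical RAII wrapper around one perf counter for the calling thread.
class ScopedCounter
{
public:
    ScopedCounter(std::uint32_t type, std::uint64_t config) : fd_(-1)
    {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = config;
        attr.disabled = 1;       // start disabled, enable explicitly in start()
        attr.exclude_kernel = 1; // do not count kernel instructions
        attr.exclude_hv = 1;
        // pid = 0, cpu = -1: measure the calling thread on any CPU
        fd_ = static_cast<int>(
            syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL));
    }

    ~ScopedCounter()
    {
        if (fd_ != -1) close(fd_);
    }

    ScopedCounter(const ScopedCounter&) = delete;
    ScopedCounter& operator=(const ScopedCounter&) = delete;

    // True if the counter could be opened.
    bool ok() const { return fd_ != -1; }

    // Reset and enable the counter.
    void start()
    {
        if (!ok()) return;
        ioctl(fd_, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_, PERF_EVENT_IOC_ENABLE, 0);
    }

    // Disable the counter and return its value, or -1 if it is unavailable.
    long long stop()
    {
        if (!ok()) return -1;
        ioctl(fd_, PERF_EVENT_IOC_DISABLE, 0);
        long long count = 0;
        if (read(fd_, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count)))
            return -1;
        return count;
    }

private:
    int fd_;
};
```

In the benchmark from the question, one would construct ScopedCounter counter(PERF_TYPE_RAW, 0x8010); before the multiplication, call counter.start(); right before auto const c = a * b; and counter.stop(); right after it. The result still includes the few uops spent around the measured statement, so it remains an approximation.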
