C++ - Count floating-point instructions
I am trying to count the number of floating-point operations in one of my programs, and I think perf could be the tool I am looking for (are there any alternatives?), but I have trouble limiting it to a certain function or block of code. Let's take the following example:
    #include <complex>
    #include <cstdlib>
    #include <iostream>
    #include <type_traits>

    // Random real number in [0, 1] for a floating-point type T.
    template <typename T>
    typename std::enable_if<std::is_floating_point<T>::value, T>::type myrand()
    {
        return static_cast<T>(std::rand()) / static_cast<T>(RAND_MAX);
    }

    // Random complex number for T = std::complex<S>.
    template <typename T>
    typename std::enable_if<!std::is_floating_point<T>::value,
                            std::complex<typename T::value_type>>::type myrand()
    {
        typedef typename T::value_type S;

        return std::complex<S>(
            static_cast<S>(std::rand()) / static_cast<S>(RAND_MAX),
            static_cast<S>(std::rand()) / static_cast<S>(RAND_MAX));
    }

    int main()
    {
        auto const a = myrand<TYPE>();
        auto const b = myrand<TYPE>();

        // count here
        auto const c = a * b;
        // stop counting here

        // prevent the compiler from optimizing away c
        std::cout << c << "\n";

        return 0;
    }
The myrand() function simply returns a random number; if the type T is complex, it returns a random complex number. I did not hardcode doubles into the program because they would be optimized away by the compiler.
You can compile the file (let's call it bench.cpp) with c++ -std=c++0x -DTYPE=double bench.cpp.
Now I would like to count the number of floating-point operations. On my processor (Nehalem architecture, x86_64, where floating point is done with scalar SSE) this can be done with the event r8010 (see Intel manual 3B, section 19.5), using

    perf stat -e r8010 ./a.out

This works as expected; however, it counts the overall number of uops (is there a table telling how many uops a movsd, for example, is?), and I am only interested in the number for the multiplication (see the example above).
How can this be done?
I found a way to do this, although not by using the perf tool but instead the corresponding perf API. One first has to define a perf_event_open function, which is actually a syscall:
    #include <cstdlib> // stdlib.h for C
    #include <cstdio>  // stdio.h for C
    #include <cstring> // string.h for C
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/perf_event.h>
    #include <asm/unistd.h>

    // Thin wrapper: glibc does not provide a function for this syscall.
    long perf_event_open(
        perf_event_attr* hw_event,
        pid_t pid,
        int cpu,
        int group_fd,
        unsigned long flags)
    {
        int ret = syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
        return ret;
    }
Next, one selects the events one wishes to count:
    perf_event_attr attr;

    // select what we want to count
    std::memset(&attr, 0, sizeof(perf_event_attr));
    attr.size = sizeof(perf_event_attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;
    attr.exclude_kernel = 1; // do not count the instructions the kernel executes
    attr.exclude_hv = 1;     // do not count the hypervisor either

    // open a file descriptor for the calling thread (pid 0) on any CPU (-1)
    int fd = perf_event_open(&attr, 0, -1, -1, 0);

    if (fd == -1)
    {
        // handle error
    }
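The "handle error" branch above could be filled in along these lines. This is only a sketch: describe_perf_error is a hypothetical helper name, and the hints attached to each errno value are the usual causes, not an exhaustive diagnosis.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the perf API): turn the errno left
// behind by a failed perf_event_open call into a readable message.
std::string describe_perf_error(int err)
{
    std::string msg = "perf_event_open failed: ";
    msg += std::strerror(err);

    switch (err)
    {
    case EACCES:
    case EPERM:
        // Often a permission problem controlled by this sysctl.
        msg += " (check /proc/sys/kernel/perf_event_paranoid)";
        break;
    case ENOENT:
        // Typically an event type or config the system does not support.
        msg += " (event type or config not supported here)";
        break;
    default:
        break;
    }
    return msg;
}
```

Inside the if (fd == -1) branch one could then write std::fprintf(stderr, "%s\n", describe_perf_error(errno).c_str()); and bail out.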
In this case I want to count the number of instructions. Floating-point instructions can be counted on my processor (Nehalem) by replacing the corresponding lines with

    attr.type = PERF_TYPE_RAW;
    attr.config = 0x8010; // event number = 10H, umask value = 80H

By setting the type to raw, one can count basically every event the processor offers; the number 0x8010 specifies which one. Note that this number is highly processor-dependent! One can find the right numbers in Intel manual 3B, part 2, chapter 19, by picking the subsection for one's processor.
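To make the relation between the manual's tables and the raw number explicit, here is a minimal sketch. It assumes the usual Intel x86 layout of the raw config value (event select in bits 0-7, unit mask in bits 8-15); raw_config is a hypothetical helper name, not part of the perf API.

```cpp
#include <cstdint>

// Hypothetical helper: combine the event number and umask listed in the
// Intel manual into the value expected in perf_event_attr::config.
// Assumes the Intel x86 layout: event in bits 0-7, umask in bits 8-15.
constexpr std::uint64_t raw_config(std::uint8_t event, std::uint8_t umask)
{
    return (static_cast<std::uint64_t>(umask) << 8) | event;
}

// Event 10H with umask 80H reproduces the 0x8010 used in the text.
static_assert(raw_config(0x10, 0x80) == 0x8010, "unexpected raw encoding");
```

With this, attr.config = raw_config(0x10, 0x80); is equivalent to the hardcoded 0x8010 and keeps the two values from the manual visible in the source.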
One can then measure any code by enclosing it in
    // reset and enable the counter
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    // perform the computation that should be measured here

    // disable and read out the counter
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count;
    read(fd, &count, sizeof(long long));
    // count now has the (approximated) result

    // close the file descriptor
    close(fd);
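The open, reset/enable, disable/read and close steps can also be bundled into a small class so that the file descriptor is always closed. The following is a sketch under the same assumptions as the snippets above (Linux, counting for the calling thread only); ScopedCounter and its member names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

// Hypothetical RAII wrapper around one perf counter for the calling thread.
class ScopedCounter
{
public:
    ScopedCounter(std::uint32_t type, std::uint64_t config)
    {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = config;
        attr.disabled = 1;       // start disabled, enable in start()
        attr.exclude_kernel = 1; // user-space only
        attr.exclude_hv = 1;
        // pid 0 = calling thread, cpu -1 = any CPU, no group, no flags
        fd_ = static_cast<int>(
            syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL));
    }

    ~ScopedCounter() { if (fd_ != -1) close(fd_); }

    ScopedCounter(const ScopedCounter&) = delete;
    ScopedCounter& operator=(const ScopedCounter&) = delete;

    // False if the counter could not be opened (permissions, no PMU, ...).
    bool valid() const { return fd_ != -1; }

    void start()
    {
        ioctl(fd_, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_, PERF_EVENT_IOC_ENABLE, 0);
    }

    // Disables the counter and returns its value, or -1 if reading failed.
    long long stop()
    {
        ioctl(fd_, PERF_EVENT_IOC_DISABLE, 0);
        long long count = 0;
        if (read(fd_, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count)))
            return -1;
        return count;
    }

private:
    int fd_ = -1;
};
```

In the benchmark from the question, one would construct ScopedCounter counter(PERF_TYPE_RAW, 0x8010); before the multiplication, call counter.start(), compute c, and then read counter.stop(). The raw event number remains specific to Nehalem, as noted above.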