c++ - Finding a performance issue that may be due to thread locking (possibly)

I've spent a little time running valgrind/callgrind to profile a server that does a lot of TCP/IP communication using many threads. After some time spent improving performance, I realised that in a particular test scenario the process is not CPU bound, so the performance "improvements" I'd been looking at are of no use.

In theory, the CPU should be busy. I know the TCP/IP device it connects to isn't the limitation. The server runs on 2 machines: one is a PC, the other an embedded device with an ARM processor. The embedded device also gets 2% CPU usage, but with far fewer transactions (about a tenth). Both systems sit at 2% even though we're trying to push data through as fast as possible.

My guess is that a mutex is locked and holding up a thread. That's a pure guess! There are a few threads in the system that share common data. Perhaps there are other possibilities, but how can I tell?

Is there any way to use a tool like valgrind/callgrind that might show the time spent in system calls? I can also run it on Windows with Visual Studio 2012 if that's better.

We might have to try walking through the code, but I'm not sure we have the time.

Any tips appreciated.

Thanks.

Callgrind is a great profiler, but it has some drawbacks. In particular, it assumes that the same instruction always executes in the same amount of time, and it assumes that instruction counts are the important metric.

This is fine for getting (mostly) reproducible profiling results and for analyzing in detail which instructions are executed, but there are types of performance problems that callgrind doesn't detect:

time spent waiting for locks
time spent sleeping (e.g. simple sleep()/usleep() calls that slow down the application won't show up in callgrind)
time spent waiting for disk I/O or network I/O
time spent waiting for data that has been swapped out
influences of CPU cache hits/misses (you can try cachegrind for that particular topic)
influences of CPU pipeline stalls, branch prediction failures, and other features of modern CPUs that can cause the same instruction to execute faster or slower depending on context

These problems can be detected quite well using a statistical (or sample-based) profiler. Examples are sysprof and oprofile, or the kind of "poor man's sampling profiler" described e.g. at https://stackoverflow.com/a/378024. The VS2012 built-in profiler mentioned by WhozCraig appears to be a sampling profiler as well.

While statistical profilers are useful because they provide "real-world" results instead of simple instruction counts, they have the possible drawback that they don't give reproducible results (the results vary a little on every run), and they need to gather a sufficient number of samples to give detailed results.
