Not a bug

exp() speed: Windows vs Linux comparison

Michal Kvasnicka 3 years ago updated by Pavel Holoborodko 3 years ago 5

Hi, I am trying to compare the exp() speed of the latest MCT versions on two platforms (Windows and Linux).

Windows:

Windows 7 Pro 64bit, Matlab R2017b, MCT 4.4.4 Build 12668

Linux:

Ubuntu 16.04.3 64bit, Matlab R2017b, MCT 4.4.4 Build 12666


I found very strange results:


rng(1), n = 1000;
A = randn(n); A_mp = mp(A,34);

t = clock; X = exp(A);     t_dp = etime(clock, t)
t = clock; X = exp(A_mp);  t_mp = etime(clock, t)


Windows:

t_dp =

    0.0260

t_mp =

    0.0550


Linux:

t_dp =

    0.0065

t_mp =

    0.2184


The Windows and Linux PCs have different hardware (the Linux PC is significantly faster; see the double-precision timing), but quadruple-precision computation is significantly slower on Linux.


Is there a bug in the Linux release?




Hi Michal,


There is no bug in the GNU Linux version of the toolbox.

The difference in performance is because we have better development tools for Windows.


We have the full stack of Intel development tools on Windows: Intel Compiler (ICC), Intel Profiler (VTune), etc.

This allows us to reach much better performance on Windows compared to other platforms. 


We are a not-for-profit company and cannot afford the full stack of Intel development tools for ALL platforms - Intel gives us no discounts despite our multiple requests.


Thus we use GCC on GNU Linux and MacOSX. The GCC stack of tools is good, but it is light years behind the Intel compiler. In fact, we maintain our own version of GCC to bring it to acceptable quality.


Hopefully we will be able to use Intel tools on all platforms in the future.

Pavel,


thanks for your reply. Yes, I know that Intel Parallel Studio XE is the best compiler suite available (I have been using this compiler for many years, too). But what I do not understand is that you sell the MCT for Linux and MacOSX with such significantly degraded performance. I just did some more thorough speed comparisons, and a slowdown factor of > 5x (Linux vs Windows) is very common! This situation is a really ugly surprise for me.


So, from my point of view, it would be fair to all MCT users to mention this fact explicitly, because there are definitely a lot of users who use the MCT mainly on Linux machines (including me). On the other hand, I am sure that the majority of users run MCT on Windows, so your decision to optimize the code best for the Windows platform is fully understandable.


Finally, I am desperately waiting for a future Linux MCT release based on Intel development tools, because performance is the key requirement in multi-precision computing.

Dear Michal,


We do explicitly state that the toolbox's performance is lower on *nix systems. For example, please check this comparison post: https://www.advanpix.com/2016/04/27/performance-of-elementary-functions/


However, we do not consider this performance difference too dramatic, nor of any "deceiving nature" on our side.

First of all, Linux is mainly used on high-performance computers with many CPUs/cores. A high number of cores/CPUs compensates for the performance loss and actually results in higher speed if enough cores are used. (Please open the mpstartup.m script in the toolbox directory and read the comments for the mp.NumberOfThreads command for optimal tuning.)


Secondly, toolbox performance is higher on GNU Linux than on Windows in some other areas (e.g. matrix computations for moderate precision ranges), thanks to better thread balance, etc.

MacOSX allows even better thread handling, which leads to even higher performance. Also, different CPUs have different capabilities, instruction sets, cache sizes, etc. - it is natural for software to deliver variable performance in such diverse environments.


Thirdly, on *nix systems the toolbox delivers the highest performance of any competing software.

This is the most important point.


Overall, 10% of our users rely on GNU Linux (and 15% use MacOSX). We work hard to provide the best performance everywhere, but some things are out of our control. I am sorry for the inconvenience.
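
The thread tuning mentioned above can be explored with a short timing loop. This is only a sketch, not the toolbox's documented procedure: it assumes that mp.NumberOfThreads(k) sets the number of threads the toolbox may use (the comments in mpstartup.m are the authoritative description), and the thread counts are hypothetical values to adapt to the machine at hand.

```matlab
% Sketch only. Assumes the Multiprecision Computing Toolbox is on the path and
% that mp.NumberOfThreads(k) sets the toolbox thread count (an assumption;
% check the comments in mpstartup.m for the exact usage).
rng(1); n = 1000;
A_mp = mp(randn(n), 34);             % quadruple precision, as in the question

threadCounts = [1 2 4 8];            % hypothetical values; adapt to your CPU
t = zeros(size(threadCounts));       % one timing per thread count
for i = 1:numel(threadCounts)
    mp.NumberOfThreads(threadCounts(i));
    tic; X = exp(A_mp); t(i) = toc;  % wall-clock time of a single exp() call
    fprintf('%2d threads: %.4f s\n', threadCounts(i), t(i));
end
```

A single tic/toc measurement is noisy, so repeating each run a few times and taking the median gives a fairer picture.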

Pavel,

thanks again for your comprehensive answer.


1. This is not a fully valid answer. Many users run Linux on a regular desktop computer (not on an HPC server). And the performance of elementary functions is definitely significantly degraded on Linux relative to Windows. I am not talking about matrix operations here at all.


2. You are right that MCT is the fastest solution currently available on *nix systems. Well done!!! I know, as you already said, that there is a licensing (money) problem on your side in getting Intel dev tools for the Linux and MacOSX platforms. But you must understand that for potential commercial users this argument is nearly irrelevant.


3. Anyway ... Next year I will try to persuade my boss to buy at least one commercial license for our company :) ... to support your further development.


Thanks for your great toolbox!!!

Thank you for your support, Michal! This would definitely help us a lot :).


One more thing - the difference in performance happens only in quadruple precision mode.

Try, for example, a setting other than 34 in mp.Digits().


That is because the quadruple case is where low-level optimizations matter the most - and the Intel compiler is the best at this level.


Other, true multiprecision code has more uniform performance across platforms / compilers, since memory bottlenecks are the most important issue there.

Thank you,

Pavel.
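
The suggestion to try settings other than 34 can be checked by repeating the original exp() timing at several precisions on each platform. The following is a minimal sketch under stated assumptions: the digit values other than 34 are arbitrary picks, and the script resets the default precision to 34 at the end, which assumes 34 is the setting you normally use.

```matlab
% Sketch only. Assumes the Multiprecision Computing Toolbox is on the path.
rng(1); n = 1000;
A = randn(n);

digitsList = [34 50 100];            % 34 = quadruple; the others are arbitrary
t = zeros(size(digitsList));         % one timing per precision
for i = 1:numel(digitsList)
    mp.Digits(digitsList(i));        % set the default working precision
    A_mp = mp(A, digitsList(i));     % same constructor form as in the question
    tic; X = exp(A_mp); t(i) = toc;  % wall-clock time of a single exp() call
    fprintf('%3d digits: %.4f s\n', digitsList(i), t(i));
end
mp.Digits(34);                       % back to quadruple (assumed usual setting)
```

Running the same script on Windows and Linux and comparing the per-precision timings would show whether the gap is confined to the 34-digit case, as described above.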