0
Answered

Does matrix multiplication use multiple cores?

Michael_ 5 years ago updated by Pavel Holoborodko 5 years ago 15
Hello Pavel.

I was wondering if you updated the software to use multiple cores in case of matrix multiplication. I read something about it in the change log, but it seems to me it still uses 1 core all the time.

Thank you.

Best regards,

Michael
Under review
Matrix multiplication should run in parallel (we added this almost from the beginning - several years ago).

Please let me know the matrix size, precision, toolbox & MATLAB & OS versions and CPU model.
Never mind. It works perfectly and really fast. The problem is/was the evaluation of the besselk-function. I was surprised that I did not have any speed improvements since the version of 2014 but the reason for that is not the matrix multiplication (which is way faster now), it is the evaluation of the besselk function. Are there any plans to improve the performance of the evaluatoins?
Thank you.
The current algorithm for besselk is consecutive and it focuses more on accuracy than speed.
I am not sure how to speed it up quickly. Will review it for new ideas.

Any chance you could try to divise computations of besselk by high-level threads? Using parfor/smpd for example?

(If you are computing matrix elements with besselk, probably you can compute different parts of the matrix in different workers/threads?)

P.S.
Performance of parallel algorithms (e.g. matrix multiplication) can be boosted up much further. But unfortunately there is a nasty bug in Intel threads runtime I am using for parallelism. They use simple spin-lock for 200msec at the end of every thread - which consumes a lot of CPU power! For example, if we use 8 threads, then possible overhead is 8*200msec = 1.6sec!! This alone makes parallelism inefficient for pieces of code which run under 1.6sec - rules out the mid-sized matrices.

If I reduce this delay - their runtime freezes in cases when only 2-cores are under load.
Now I am working on a custom thread manager among other things.
What arguments you use for besselk? Real, complex (what quadrant)?

Now I am working on building efficient rational approximations for besselk for real arguments (quadruple case). Should be fast.
I have only pure real or pure imaginary arguments. Thank you for your efforts.

Dear Michael,


I have just released new version with improved besselk & besseli: http://goo.gl/1jRKP0


Now both are more accurate, stable and faster. Accuracy is guaranteed up to last digit for all integer orders and complex arguments.


Please let me know if you notice any speed improvement.


Current timings on Core i7 (1st gen):

% real arguments
>> X = 150*mp(rand(50));
>> tic; Y = besselk(0,X); toc;
Elapsed time is 0.345778 seconds.
>> tic; Y = besselk(1,X); toc;
Elapsed time is 0.469967 seconds.
% complex arguments
>> X = 150*mp((rand(50)-0.5)+(rand(50)-0.5)*1i);
>> tic; Y = besselk(0,X); toc;
Elapsed time is 1.663378 seconds.
>> tic; Y = besselk(1,X); toc;
Elapsed time is 1.662077 seconds.

There is a good chance that speed will be even higher in next releases - work is in progress.

Dear Pavel,

I got a speed improvement of factor 10-15. Thank you very much.

Do you use quadruple or higher precision?

I used quadruple precision. It is probably even faster. Problem is, I don't have the previous version of the toolbox anymore to make more comparisons, but I think it is between 10-15 times faster. :)

Exact speed-up factor is difficult to estimate, especially when older version had issues with accuracy at some regions :(.


I am asking because I need to know what precision to optimize further :).

Now optimizing the bessely (you also asked for it before, right?)

In general I only use quadruple precision. I will need the bessely function again, but it will have non-integer order. But it is a great improvement for me at this point already.

Half-integer or arbitrary (real, complex)?


P.S. As of today, bessely is faster for integer orders (speed-up is similar to besselk).

Now is in the development trunk, will release it in a few days.

Thank you Pavel. It will use bessel functions with arbitrary order and real or imaginary argument.

+1

I have finished updating the Bessel library today. Now all of them support arbitrary orders & arguments, much more stable and produce guaranteed accuracy up-to last one-two digits.


That was fun week - finally I've done everything I planned for Bessel. The only thing left is deep optimization, I think x50-100 times possible for quadruple mode.


Btw, I found several/many bugs in MATLAB's Bessel functions (from Symbolic Math Toolbox) - few of them are shown in changelog: http://www.advanpix.com/documentation/version-history/

Please be cautious when using the Symbolic Toolbox.