0

Besvaret

# Does matrix multiplication use multiple cores?

Hello Pavel.

I was wondering if you updated the software to use multiple cores in case of matrix multiplication. I read something about it in the change log, but it seems to me it still uses 1 core all the time.

Thank you.

Best regards,

Michael

I was wondering if you updated the software to use multiple cores in case of matrix multiplication. I read something about it in the change log, but it seems to me it still uses 1 core all the time.

Thank you.

Best regards,

Michael

Kundesupport af UserEcho

Please let me know the matrix size, precision, toolbox & MATLAB & OS versions and CPU model.

Thank you.

I am not sure how to speed it up quickly. Will review it for new ideas.

Any chance you could try to divise computations of besselk by high-level threads? Using parfor/smpd for example?

(If you are computing matrix elements with besselk, probably you can compute different parts of the matrix in different workers/threads?)

P.S.

Performance of parallel algorithms (e.g. matrix multiplication) can be boosted up much further. But unfortunately there is a nasty bug in Intel threads runtime I am using for parallelism. They use simple spin-lock for 200msec at the end of every thread - which consumes a lot of CPU power! For example, if we use 8 threads, then possible overhead is 8*200msec = 1.6sec!! This alone makes parallelism inefficient for pieces of code which run under 1.6sec - rules out the mid-sized matrices.

If I reduce this delay - their runtime freezes in cases when only 2-cores are under load.

Now I am working on a custom thread manager among other things.

Now I am working on building efficient rational approximations for besselk for real arguments (quadruple case). Should be fast.

Dear Michael,

I have just released new version with improved besselk & besseli: http://goo.gl/1jRKP0

Now both are more accurate, stable and faster. Accuracy is guaranteed up to last digit for all integer orders and complex arguments.

Please let me know if you notice any speed improvement.

Current timings on Core i7 (1st gen):

There is a good chance that speed will be even higher in next releases - work is in progress.

Dear Pavel,

I got a speed improvement of factor 10-15. Thank you very much.

Do you use quadruple or higher precision?

I used quadruple precision. It is probably even faster. Problem is, I don't have the previous version of the toolbox anymore to make more comparisons, but I think it is between 10-15 times faster. :)

Exact speed-up factor is difficult to estimate, especially when older version had issues with accuracy at some regions :(.

I am asking because I need to know what precision to optimize further :).

Now optimizing the bessely (you also asked for it before, right?)

In general I only use quadruple precision. I will need the bessely function again, but it will have non-integer order. But it is a great improvement for me at this point already.

Half-integer or arbitrary (real, complex)?

P.S. As of today, bessely is faster for integer orders (speed-up is similar to besselk).

Now is in the development trunk, will release it in a few days.

Thank you Pavel. It will use bessel functions with arbitrary order and real or imaginary argument.

I have finished updating the Bessel library today. Now all of them support arbitrary orders & arguments, much more stable and produce guaranteed accuracy up-to last one-two digits.

That was fun week - finally I've done everything I planned for Bessel. The only thing left is deep optimization, I think x50-100 times possible for quadruple mode.

Btw, I found several/many bugs in MATLAB's Bessel functions (from Symbolic Math Toolbox) - few of them are shown in changelog: http://www.advanpix.com/documentation/version-history/

Please be cautious when using the Symbolic Toolbox.