Subject | : Re^2: Speeding up the linear algebra routines |
Date | : 2010/11/05(Fri) 17:03:51 |
Contributor | : Akio Morita |
I ran the BLAS/LAPACK benchmarks in a different environment.

SAD:amorita branch r3415
Module:Math/LPACK extension r3441
OS:FreeBSD/amd64 8.1-STABLE
CPU:Intel(R) Xeon(R) CPU X5550 @ 2.67GHz (2666.78-MHz K8-class CPU)
Date:2010/11/05
---------------------------------------------------------------------------------
Real Eigensystem[teigen]
N = 2 L = 32768 T = .004914 +/- .054961 msec # of failures = 0
N = 4 L = 32768 T = .010696 +/- .070075 msec # of failures = 0
TEIGEN convergence failed. Range = 2 5
Lower right corner = 0.52571493004973358 4.63855639037369441E-002 0.73160999814810090
N = 8 L = 32768 T = .033344 +/- .071230 msec # of failures = 0
N = 16 L = 10240 T = .141852 +/- .126846 msec # of failures = 0
N = 32 L = 2560 T = .836331 +/- .128061 msec # of failures = 0
N = 64 L = 640 T = 4.385395 +/- .289299 msec # of failures = 0
N = 128 L = 160 T = 32.750225 +/- 1.118398 msec # of failures = 0
N = 256 L = 40 T = 308.373100 +/- 9.375773 msec # of failures = 0
N = 512 L = 10 T = 2967.998600 +/- 92.109288 msec # of failures = 0

Real Eigensystem[DGEEVX@LAPACK-3.2.2]
N = 2 L = 32768 T = .011405 +/- .072050 msec # of failures = 0
N = 4 L = 32768 T = .020033 +/- .071798 msec # of failures = 0
N = 8 L = 32768 T = .073028 +/- .095900 msec # of failures = 0
N = 16 L = 10240 T = .182777 +/- .128211 msec # of failures = 0
N = 32 L = 2560 T = .942186 +/- .056568 msec # of failures = 0
N = 64 L = 640 T = 15.527852 +/- .385307 msec # of failures = 0
N = 128 L = 160 T = 135.973019 +/- 2.716700 msec # of failures = 0
N = 256 L = 40 T = 543.980725 +/- 5.152799 msec # of failures = 0
N = 512 L = 10 T = 2460.381000 +/- 35.651359 msec # of failures = 0

Real Eigensystem[DGEEVX@ATLAS-3.8.3]
N = 2 L = 32768 T = .012499 +/- .040732 msec # of failures = 0
N = 4 L = 32768 T = .022517 +/- .071029 msec # of failures = 0
N = 8 L = 32768 T = .055881 +/- .071899 msec # of failures = 0
N = 16 L = 10240 T = .320816 +/- .129054 msec # of failures = 0
N = 32 L = 2560 T = 1.366923 +/- .276729 msec # of failures = 0
N = 64 L = 640 T = 7.134169 +/- .876691 msec # of failures = 0
N = 128 L = 160 T = 55.171881 +/- 4.410096 msec # of failures = 0
N = 256 L = 40 T = 380.538625 +/- 7.824095 msec # of failures = 0
N = 512 L = 10 T = 2340.383800 +/- 61.905827 msec # of failures = 0
---------------------------------------------------------------------------------
Complex Eigensystem[tceigen]
N = 2 L = 32768 T = .005745 +/- .045651 msec # of failures = 0
N = 4 L = 32768 T = .014781 +/- .057704 msec # of failures = 0
N = 8 L = 32768 T = .055616 +/- .044269 msec # of failures = 0
N = 16 L = 10240 T = .271051 +/- .077106 msec # of failures = 0
N = 32 L = 2560 T = 1.633234 +/- .145197 msec # of failures = 0
TCEIGEN Convergence fail.
N = 64 L = 640 T = 11.523869 +/- .276761 msec # of failures = 0
N = 128 L = 160 T = 93.776237 +/- 7.066373 msec # of failures = 0
N = 256 L = 40 T = 1090.281600 +/- 13.078611 msec # of failures = 0
N = 512 L = 10 T = 9432.465000 +/- 327.177165 msec # of failures = 0

Complex Eigensystem[ZGEEVX@LAPACK-3.2.2]
N = 2 L = 32768 T = .013927 +/- .059110 msec # of failures = 0
N = 4 L = 32768 T = .030352 +/- .058527 msec # of failures = 0
N = 8 L = 32768 T = .085863 +/- .071898 msec # of failures = 0
N = 16 L = 10240 T = .372235 +/- .073488 msec # of failures = 0
N = 32 L = 2560 T = 2.088792 +/- .064885 msec # of failures = 0
N = 64 L = 640 T = 13.832725 +/- .237734 msec # of failures = 0
N = 128 L = 160 T = 100.479919 +/- 1.199283 msec # of failures = 0
N = 256 L = 40 T = 874.881150 +/- 13.824662 msec # of failures = 0
N = 512 L = 10 T = 5517.628600 +/- 177.549475 msec # of failures = 0

Complex Eigensystem[ZGEEVX@ATLAS-3.8.3]
N = 2 L = 32768 T = .017170 +/- .088134 msec # of failures = 0
N = 4 L = 32768 T = .036427 +/- .045012 msec # of failures = 0
N = 8 L = 32768 T = .104784 +/- .078750 msec # of failures = 0
N = 16 L = 10240 T = .489181 +/- .045275 msec # of failures = 0
N = 32 L = 2560 T = 2.414941 +/- .214154 msec # of failures = 0
N = 64 L = 640 T = 15.113606 +/- .927262 msec # of failures = 0
N = 128 L = 160 T = 107.582538 +/- 2.169691 msec # of failures = 0
N = 256 L = 40 T = 921.144450 +/- 78.077925 msec # of failures = 0
N = 512 L = 10 T = 6997.876600 +/- 65.148720 msec # of failures = 0
---------------------------------------------------------------------------------
Real SingularValues[tsvdm]
N = 2 L = 32768 T = .009644 +/- .046258 msec # of failures = 0
N = 4 L = 32768 T = .012214 +/- .067921 msec # of failures = 0
N = 8 L = 32768 T = .024322 +/- .041449 msec # of failures = 0
N = 16 L = 10240 T = .075302 +/- .073760 msec # of failures = 0
N = 32 L = 2560 T = .343740 +/- .036861 msec # of failures = 0
N = 64 L = 640 T = 3.107773 +/- .221245 msec # of failures = 0
N = 128 L = 160 T = 28.426794 +/- 4.764495 msec # of failures = 0
N = 256 L = 40 T = 480.008375 +/- 56.490995 msec # of failures = 0
N = 512 L = 10 T = 3570.363100 +/- 61.897338 msec # of failures = 0

Real SingularValues[DGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .026266 +/- .082811 msec # of failures = 0
N = 4 L = 32768 T = .039730 +/- .058722 msec # of failures = 0
N = 8 L = 32768 T = .085655 +/- .083125 msec # of failures = 0
N = 16 L = 10240 T = .277530 +/- .075076 msec # of failures = 0
N = 32 L = 2560 T = 1.038829 +/- .022839 msec # of failures = 0
N = 64 L = 640 T = 5.943048 +/- .239012 msec # of failures = 0
N = 128 L = 160 T = 38.459394 +/- .153021 msec # of failures = 0
N = 256 L = 40 T = 286.069500 +/- .713294 msec # of failures = 0
N = 512 L = 10 T = 2081.214000 +/- 2.634993 msec # of failures = 0

Real SingularValues[DGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .019553 +/- .054526 msec # of failures = 0
N = 4 L = 32768 T = .035733 +/- .075945 msec # of failures = 0
N = 8 L = 32768 T = .071613 +/- .081462 msec # of failures = 0
N = 16 L = 10240 T = .201926 +/- .106510 msec # of failures = 0
N = 32 L = 2560 T = .560016 +/- .142055 msec # of failures = 0
N = 64 L = 640 T = 2.359422 +/- .312576 msec # of failures = 0
N = 128 L = 160 T = 10.252638 +/- .265303 msec # of failures = 0
N = 256 L = 40 T = 55.665500 +/- 3.808465 msec # of failures = 0
N = 512 L = 10 T = 304.373800 +/- 19.260853 msec # of failures = 0
---------------------------------------------------------------------------------
Complex SingularValues[tcsvdm]
N = 2 L = 32768 T = .010985 +/- .078168 msec # of failures = 0
N = 4 L = 32768 T = .017390 +/- .010494 msec # of failures = 0
N = 8 L = 32768 T = 5.313555 +/- 4.616588 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .173400 +/- .103538 msec # of failures = 0
N = 32 L = 2560 T = .939715 +/- .022199 msec # of failures = 0
N = 64 L = 640 T = 6.465434 +/- .100230 msec # of failures = 0
N = 128 L = 160 T = 57.908787 +/- 4.018248 msec # of failures = 0
N = 256 L = 40 T = 669.606875 +/- 4.180820 msec # of failures = 0
N = 512 L = 10 T = 5147.778300 +/- 45.043540 msec # of failures = 0

Complex SingularValues[ZGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .022014 +/- .041939 msec # of failures = 0
N = 4 L = 32768 T = .033519 +/- .041508 msec # of failures = 0
N = 8 L = 32768 T = 4.537111 +/- 3.173818 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .234432 +/- .007166 msec # of failures = 0
N = 32 L = 2560 T = .996835 +/- .011787 msec # of failures = 0
N = 64 L = 640 T = 6.394497 +/- .065906 msec # of failures = 0
N = 128 L = 160 T = 40.739569 +/- .357006 msec # of failures = 0
N = 256 L = 40 T = 316.512525 +/- 1.539484 msec # of failures = 0
N = 512 L = 10 T = 2255.242000 +/- 3.732018 msec # of failures = 0

Complex SingularValues[ZGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .025624 +/- .084478 msec # of failures = 0
N = 4 L = 32768 T = .044957 +/- .060120 msec # of failures = 0
N = 8 L = 32768 T = .094604 +/- .060680 msec # of failures = 0
N = 16 L = 10240 T = .259461 +/- .076981 msec # of failures = 0
N = 32 L = 2560 T = 1.003860 +/- .035998 msec # of failures = 0
N = 64 L = 640 T = 4.478936 +/- .153480 msec # of failures = 0
N = 128 L = 160 T = 23.881556 +/- .207277 msec # of failures = 0
N = 256 L = 40 T = 148.580275 +/- 1.568393 msec # of failures = 0
N = 512 L = 10 T = 805.882300 +/- 4.571945 msec # of failures = 0
---------------------------------------------------------------------------------
Real LinearSolve[tsolvm]
N = 2 L = 32768 T = .009529 +/- .061895 msec # of failures = 0
N = 4 L = 32768 T = .010196 +/- .057294 msec # of failures = 0
N = 8 L = 32768 T = .014949 +/- .057801 msec # of failures = 0
N = 16 L = 10240 T = .037556 +/- .073777 msec # of failures = 0
N = 32 L = 2560 T = .160285 +/- .145266 msec # of failures = 0
N = 64 L = 640 T = 1.574538 +/- .050661 msec # of failures = 0
N = 128 L = 160 T = 17.699331 +/- .052163 msec # of failures = 0
N = 256 L = 40 T = 225.097700 +/- .333980 msec # of failures = 0
N = 512 L = 10 T = 2193.822700 +/- 10.421862 msec # of failures = 0

Real LinearSolve[DGELSD@LAPACK-3.2.2]
N = 2 L = 32768 T = .019176 +/- .071623 msec # of failures = 0
N = 4 L = 32768 T = .028825 +/- .042434 msec # of failures = 0
N = 8 L = 32768 T = .056619 +/- .041981 msec # of failures = 0
N = 16 L = 10240 T = .182530 +/- .009877 msec # of failures = 0
N = 32 L = 2560 T = .755317 +/- .017439 msec # of failures = 0
N = 64 L = 640 T = 4.405711 +/- .174577 msec # of failures = 0
N = 128 L = 160 T = 28.793131 +/- 1.137125 msec # of failures = 0
N = 256 L = 40 T = 215.818675 +/- .645456 msec # of failures = 0
N = 512 L = 10 T = 1561.887900 +/- 2.827443 msec # of failures = 0

Real LinearSolve[DGELSD@ATLAS-3.8.3]
N = 2 L = 32768 T = .021839 +/- .082602 msec # of failures = 0
N = 4 L = 32768 T = .037695 +/- .081893 msec # of failures = 0
N = 8 L = 32768 T = .073539 +/- .072191 msec # of failures = 0
N = 16 L = 10240 T = .203711 +/- .107180 msec # of failures = 0
N = 32 L = 2560 T = .657891 +/- .143076 msec # of failures = 0
N = 64 L = 640 T = 2.880484 +/- .055372 msec # of failures = 0
N = 128 L = 160 T = 15.440688 +/- .074582 msec # of failures = 0
N = 256 L = 40 T = 95.026875 +/- 3.481534 msec # of failures = 0
N = 512 L = 10 T = 579.009100 +/- .788075 msec # of failures = 0
---------------------------------------------------------------------------------
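
For anyone who wants to compare the routines numerically, the following is a minimal Python sketch for pulling the "N = ... T = ... +/- ... msec" records out of a log like the one above and computing ratios of mean times. The helper names parse_records and speedup are hypothetical and are not part of SAD or the benchmark script; the sample records are copied from the Real SingularValues section (tsvdm and DGESDD@ATLAS-3.8.3, N = 256 and 512).

```python
import re

# One benchmark record, e.g.
# "N = 512 L = 10 T = 304.373800 +/- 19.260853 msec"
RECORD = re.compile(
    r"N\s*=\s*(\d+)\s+L\s*=\s*(\d+)\s+T\s*=\s*([\d.]+)\s*\+/-\s*([\d.]+)\s*msec"
)

def parse_records(text):
    """Return {N: (mean_msec, stddev_msec)} for every record found in text."""
    return {int(n): (float(t), float(s)) for n, _loops, t, s in RECORD.findall(text)}

def speedup(baseline, candidate):
    """Ratio of mean times (baseline / candidate) for sizes present in both."""
    return {n: baseline[n][0] / candidate[n][0]
            for n in sorted(baseline) if n in candidate}

# Records copied from the Real SingularValues section of the log
tsvdm = parse_records(
    "N = 256 L = 40 T = 480.008375 +/- 56.490995 msec\n"
    "N = 512 L = 10 T = 3570.363100 +/- 61.897338 msec"
)
atlas = parse_records(
    "N = 256 L = 40 T = 55.665500 +/- 3.808465 msec\n"
    "N = 512 L = 10 T = 304.373800 +/- 19.260853 msec"
)

for n, ratio in speedup(tsvdm, atlas).items():
    print(f"N = {n}: tsvdm / DGESDD@ATLAS = {ratio:.1f}x")
```

The pattern matches on the record text rather than on line boundaries, so it should work whether or not the log keeps one record per line. For the small sizes the quoted spread is larger than the mean, so ratios there are probably not meaningful.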