Conference Room SAD

Subject: Re^2: Speeding up the linear algebra routines
Date: 2010/11/05(Fri) 17:03:51
Contributor: Akio Morita

I ran the benchmarks using BLAS/LAPACK in a different environment.
SAD:amorita branch r3415
Module:Math/LPACK extension r3441
OS:FreeBSD/amd64 8.1-STABLE
CPU:Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz (2666.78-MHz K8-class CPU)
Date:2010/11/05

---------------------------------------------------------------------------------
Real Eigensystem[teigen]
 N =      2 L =  32768 T =      .004914 +/-      .054961 msec   # of failures = 0
 N =      4 L =  32768 T =      .010696 +/-      .070075 msec   # of failures = 0
  TEIGEN convergence failed. Range =           2           5
         Lower right corner =  0.52571493004973358       4.63855639037369441E-002  0.73160999814810090     
 N =      8 L =  32768 T =      .033344 +/-      .071230 msec   # of failures = 0
 N =     16 L =  10240 T =      .141852 +/-      .126846 msec   # of failures = 0
 N =     32 L =   2560 T =      .836331 +/-      .128061 msec   # of failures = 0
 N =     64 L =    640 T =     4.385395 +/-      .289299 msec   # of failures = 0
 N =    128 L =    160 T =    32.750225 +/-     1.118398 msec   # of failures = 0
 N =    256 L =     40 T =   308.373100 +/-     9.375773 msec   # of failures = 0
 N =    512 L =     10 T =  2967.998600 +/-    92.109288 msec   # of failures = 0

Real Eigensystem[DGEEVX@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .011405 +/-      .072050 msec   # of failures = 0
 N =      4 L =  32768 T =      .020033 +/-      .071798 msec   # of failures = 0
 N =      8 L =  32768 T =      .073028 +/-      .095900 msec   # of failures = 0
 N =     16 L =  10240 T =      .182777 +/-      .128211 msec   # of failures = 0
 N =     32 L =   2560 T =      .942186 +/-      .056568 msec   # of failures = 0
 N =     64 L =    640 T =    15.527852 +/-      .385307 msec   # of failures = 0
 N =    128 L =    160 T =   135.973019 +/-     2.716700 msec   # of failures = 0
 N =    256 L =     40 T =   543.980725 +/-     5.152799 msec   # of failures = 0
 N =    512 L =     10 T =  2460.381000 +/-    35.651359 msec   # of failures = 0

Real Eigensystem[DGEEVX@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .012499 +/-      .040732 msec   # of failures = 0
 N =      4 L =  32768 T =      .022517 +/-      .071029 msec   # of failures = 0
 N =      8 L =  32768 T =      .055881 +/-      .071899 msec   # of failures = 0
 N =     16 L =  10240 T =      .320816 +/-      .129054 msec   # of failures = 0
 N =     32 L =   2560 T =     1.366923 +/-      .276729 msec   # of failures = 0
 N =     64 L =    640 T =     7.134169 +/-      .876691 msec   # of failures = 0
 N =    128 L =    160 T =    55.171881 +/-     4.410096 msec   # of failures = 0
 N =    256 L =     40 T =   380.538625 +/-     7.824095 msec   # of failures = 0
 N =    512 L =     10 T =  2340.383800 +/-    61.905827 msec   # of failures = 0

---------------------------------------------------------------------------------
Complex Eigensystem[tceigen]
 N =      2 L =  32768 T =      .005745 +/-      .045651 msec   # of failures = 0
 N =      4 L =  32768 T =      .014781 +/-      .057704 msec   # of failures = 0
 N =      8 L =  32768 T =      .055616 +/-      .044269 msec   # of failures = 0
 N =     16 L =  10240 T =      .271051 +/-      .077106 msec   # of failures = 0
 N =     32 L =   2560 T =     1.633234 +/-      .145197 msec   # of failures = 0
 TCEIGEN Convergence fail.
 N =     64 L =    640 T =    11.523869 +/-      .276761 msec   # of failures = 0
 N =    128 L =    160 T =    93.776237 +/-     7.066373 msec   # of failures = 0
 N =    256 L =     40 T =  1090.281600 +/-    13.078611 msec   # of failures = 0
 N =    512 L =     10 T =  9432.465000 +/-   327.177165 msec   # of failures = 0

Complex Eigensystem[ZGEEVX@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .013927 +/-      .059110 msec   # of failures = 0
 N =      4 L =  32768 T =      .030352 +/-      .058527 msec   # of failures = 0
 N =      8 L =  32768 T =      .085863 +/-      .071898 msec   # of failures = 0
 N =     16 L =  10240 T =      .372235 +/-      .073488 msec   # of failures = 0
 N =     32 L =   2560 T =     2.088792 +/-      .064885 msec   # of failures = 0
 N =     64 L =    640 T =    13.832725 +/-      .237734 msec   # of failures = 0
 N =    128 L =    160 T =   100.479919 +/-     1.199283 msec   # of failures = 0
 N =    256 L =     40 T =   874.881150 +/-    13.824662 msec   # of failures = 0
 N =    512 L =     10 T =  5517.628600 +/-   177.549475 msec   # of failures = 0

Complex Eigensystem[ZGEEVX@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .017170 +/-      .088134 msec   # of failures = 0
 N =      4 L =  32768 T =      .036427 +/-      .045012 msec   # of failures = 0
 N =      8 L =  32768 T =      .104784 +/-      .078750 msec   # of failures = 0
 N =     16 L =  10240 T =      .489181 +/-      .045275 msec   # of failures = 0
 N =     32 L =   2560 T =     2.414941 +/-      .214154 msec   # of failures = 0
 N =     64 L =    640 T =    15.113606 +/-      .927262 msec   # of failures = 0
 N =    128 L =    160 T =   107.582538 +/-     2.169691 msec   # of failures = 0
 N =    256 L =     40 T =   921.144450 +/-    78.077925 msec   # of failures = 0
 N =    512 L =     10 T =  6997.876600 +/-    65.148720 msec   # of failures = 0

---------------------------------------------------------------------------------
Real SingularValues[tsvdm]
 N =      2 L =  32768 T =      .009644 +/-      .046258 msec   # of failures = 0
 N =      4 L =  32768 T =      .012214 +/-      .067921 msec   # of failures = 0
 N =      8 L =  32768 T =      .024322 +/-      .041449 msec   # of failures = 0
 N =     16 L =  10240 T =      .075302 +/-      .073760 msec   # of failures = 0
 N =     32 L =   2560 T =      .343740 +/-      .036861 msec   # of failures = 0
 N =     64 L =    640 T =     3.107773 +/-      .221245 msec   # of failures = 0
 N =    128 L =    160 T =    28.426794 +/-     4.764495 msec   # of failures = 0
 N =    256 L =     40 T =   480.008375 +/-    56.490995 msec   # of failures = 0
 N =    512 L =     10 T =  3570.363100 +/-    61.897338 msec   # of failures = 0

Real SingularValues[DGESDD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .026266 +/-      .082811 msec   # of failures = 0
 N =      4 L =  32768 T =      .039730 +/-      .058722 msec   # of failures = 0
 N =      8 L =  32768 T =      .085655 +/-      .083125 msec   # of failures = 0
 N =     16 L =  10240 T =      .277530 +/-      .075076 msec   # of failures = 0
 N =     32 L =   2560 T =     1.038829 +/-      .022839 msec   # of failures = 0
 N =     64 L =    640 T =     5.943048 +/-      .239012 msec   # of failures = 0
 N =    128 L =    160 T =    38.459394 +/-      .153021 msec   # of failures = 0
 N =    256 L =     40 T =   286.069500 +/-      .713294 msec   # of failures = 0
 N =    512 L =     10 T =  2081.214000 +/-     2.634993 msec   # of failures = 0

Real SingularValues[DGESDD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .019553 +/-      .054526 msec   # of failures = 0
 N =      4 L =  32768 T =      .035733 +/-      .075945 msec   # of failures = 0
 N =      8 L =  32768 T =      .071613 +/-      .081462 msec   # of failures = 0
 N =     16 L =  10240 T =      .201926 +/-      .106510 msec   # of failures = 0
 N =     32 L =   2560 T =      .560016 +/-      .142055 msec   # of failures = 0
 N =     64 L =    640 T =     2.359422 +/-      .312576 msec   # of failures = 0
 N =    128 L =    160 T =    10.252638 +/-      .265303 msec   # of failures = 0
 N =    256 L =     40 T =    55.665500 +/-     3.808465 msec   # of failures = 0
 N =    512 L =     10 T =   304.373800 +/-    19.260853 msec   # of failures = 0

---------------------------------------------------------------------------------
Complex SingularValues[tcsvdm]
 N =      2 L =  32768 T =      .010985 +/-      .078168 msec   # of failures = 0
 N =      4 L =  32768 T =      .017390 +/-      .010494 msec   # of failures = 0
 N =      8 L =  32768 T =     5.313555 +/-     4.616588 msec   # of failures = 0(*slow)
 N =     16 L =  10240 T =      .173400 +/-      .103538 msec   # of failures = 0
 N =     32 L =   2560 T =      .939715 +/-      .022199 msec   # of failures = 0
 N =     64 L =    640 T =     6.465434 +/-      .100230 msec   # of failures = 0
 N =    128 L =    160 T =    57.908787 +/-     4.018248 msec   # of failures = 0
 N =    256 L =     40 T =   669.606875 +/-     4.180820 msec   # of failures = 0
 N =    512 L =     10 T =  5147.778300 +/-    45.043540 msec   # of failures = 0

Complex SingularValues[ZGESDD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .022014 +/-      .041939 msec   # of failures = 0
 N =      4 L =  32768 T =      .033519 +/-      .041508 msec   # of failures = 0
 N =      8 L =  32768 T =     4.537111 +/-     3.173818 msec   # of failures = 0(*slow)
 N =     16 L =  10240 T =      .234432 +/-      .007166 msec   # of failures = 0
 N =     32 L =   2560 T =      .996835 +/-      .011787 msec   # of failures = 0
 N =     64 L =    640 T =     6.394497 +/-      .065906 msec   # of failures = 0
 N =    128 L =    160 T =    40.739569 +/-      .357006 msec   # of failures = 0
 N =    256 L =     40 T =   316.512525 +/-     1.539484 msec   # of failures = 0
 N =    512 L =     10 T =  2255.242000 +/-     3.732018 msec   # of failures = 0

Complex SingularValues[ZGESDD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .025624 +/-      .084478 msec   # of failures = 0
 N =      4 L =  32768 T =      .044957 +/-      .060120 msec   # of failures = 0
 N =      8 L =  32768 T =      .094604 +/-      .060680 msec   # of failures = 0
 N =     16 L =  10240 T =      .259461 +/-      .076981 msec   # of failures = 0
 N =     32 L =   2560 T =     1.003860 +/-      .035998 msec   # of failures = 0
 N =     64 L =    640 T =     4.478936 +/-      .153480 msec   # of failures = 0
 N =    128 L =    160 T =    23.881556 +/-      .207277 msec   # of failures = 0
 N =    256 L =     40 T =   148.580275 +/-     1.568393 msec   # of failures = 0
 N =    512 L =     10 T =   805.882300 +/-     4.571945 msec   # of failures = 0

---------------------------------------------------------------------------------
Real LinearSolve[tsolvm]
 N =      2 L =  32768 T =      .009529 +/-      .061895 msec   # of failures = 0
 N =      4 L =  32768 T =      .010196 +/-      .057294 msec   # of failures = 0
 N =      8 L =  32768 T =      .014949 +/-      .057801 msec   # of failures = 0
 N =     16 L =  10240 T =      .037556 +/-      .073777 msec   # of failures = 0
 N =     32 L =   2560 T =      .160285 +/-      .145266 msec   # of failures = 0
 N =     64 L =    640 T =     1.574538 +/-      .050661 msec   # of failures = 0
 N =    128 L =    160 T =    17.699331 +/-      .052163 msec   # of failures = 0
 N =    256 L =     40 T =   225.097700 +/-      .333980 msec   # of failures = 0
 N =    512 L =     10 T =  2193.822700 +/-    10.421862 msec   # of failures = 0

Real LinearSolve[DGELSD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .019176 +/-      .071623 msec   # of failures = 0
 N =      4 L =  32768 T =      .028825 +/-      .042434 msec   # of failures = 0
 N =      8 L =  32768 T =      .056619 +/-      .041981 msec   # of failures = 0
 N =     16 L =  10240 T =      .182530 +/-      .009877 msec   # of failures = 0
 N =     32 L =   2560 T =      .755317 +/-      .017439 msec   # of failures = 0
 N =     64 L =    640 T =     4.405711 +/-      .174577 msec   # of failures = 0
 N =    128 L =    160 T =    28.793131 +/-     1.137125 msec   # of failures = 0
 N =    256 L =     40 T =   215.818675 +/-      .645456 msec   # of failures = 0
 N =    512 L =     10 T =  1561.887900 +/-     2.827443 msec   # of failures = 0

Real LinearSolve[DGELSD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .021839 +/-      .082602 msec   # of failures = 0
 N =      4 L =  32768 T =      .037695 +/-      .081893 msec   # of failures = 0
 N =      8 L =  32768 T =      .073539 +/-      .072191 msec   # of failures = 0
 N =     16 L =  10240 T =      .203711 +/-      .107180 msec   # of failures = 0
 N =     32 L =   2560 T =      .657891 +/-      .143076 msec   # of failures = 0
 N =     64 L =    640 T =     2.880484 +/-      .055372 msec   # of failures = 0
 N =    128 L =    160 T =    15.440688 +/-      .074582 msec   # of failures = 0
 N =    256 L =     40 T =    95.026875 +/-     3.481534 msec   # of failures = 0
 N =    512 L =     10 T =   579.009100 +/-      .788075 msec   # of failures = 0

---------------------------------------------------------------------------------
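
For anyone who wants to compare the implementations numerically rather than by eye, here is a minimal sketch of a parser for the output format above. It is not part of SAD or of the benchmark itself. The names `parse_benchmark`, `ratio` and `SAMPLE` are hypothetical. Reading T as an average time per call with its spread is an assumption, since the post does not define the columns. The sample rows are copied from the N = 512 results above.

```python
import re

# One timing row, e.g.
# " N =    512 L =     10 T =  2967.998600 +/-    92.109288 msec   # of failures = 0"
ROW = re.compile(
    r"N\s*=\s*(\d+)\s+L\s*=\s*(\d+)\s+T\s*=\s*([\d.]+)\s*\+/-\s*([\d.]+)\s*msec"
)
# One section header, e.g. "Real Eigensystem[DGEEVX@LAPACK-3.2.2]"
HEADER = re.compile(r"^([A-Za-z][^\[\]]*)\[([^\[\]]+)\]\s*$")


def parse_benchmark(text):
    """Return {(test, implementation): {N: (t_msec, spread_msec)}}."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header.group(1).strip(), header.group(2))
            results[current] = {}
            continue
        row = ROW.search(line)
        # Warning lines such as "TCEIGEN Convergence fail." match neither
        # pattern and are skipped.
        if row and current is not None:
            n = int(row.group(1))
            results[current][n] = (float(row.group(3)), float(row.group(4)))
    return results


def ratio(results, test, baseline, other, n):
    """Time of `baseline` divided by time of `other` for matrix size n."""
    return results[(test, baseline)][n][0] / results[(test, other)][n][0]


SAMPLE = """\
Real SingularValues[tsvdm]
 N =    512 L =     10 T =  3570.363100 +/-    61.897338 msec   # of failures = 0

Real SingularValues[DGESDD@ATLAS-3.8.3]
 N =    512 L =     10 T =   304.373800 +/-    19.260853 msec   # of failures = 0
"""

if __name__ == "__main__":
    parsed = parse_benchmark(SAMPLE)
    r = ratio(parsed, "Real SingularValues", "tsvdm", "DGESDD@ATLAS-3.8.3", 512)
    print(f"tsvdm / DGESDD@ATLAS-3.8.3 at N=512: {r:.1f}x")
```

Feeding the whole listing to `parse_benchmark` instead of `SAMPLE` should give one entry per bracketed section, so the same `ratio` call can be repeated for each test and each N.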

