Conference Room SAD

Subject: Re: Speeding up the linear algebra routines
Date: 2010/11/05(Fri) 15:59:08
Contributor: Akio Morita

I ran benchmarks using the BLAS/LAPACK implementations that were relatively easy to obtain in my local environment.
SAD:amorita branch r3415
Module:Math/LPACK extension r3441
OS:FreeBSD/amd64 8.1-STABLE
CPU:Quad-Core AMD Opteron(tm) Processor 2376 (2300.11-MHz K8-class CPU)

Real Eigensystem[teigen]
 N =      2 L =  32768 T =      .008237 +/-      .140829 msec   # of failures = 0
 N =      4 L =  32768 T =      .014519 +/-      .062383 msec   # of failures = 0
  TEIGEN convergence failed. Range =           2           5
         Lower right corner =  0.52571493004973358       4.63855639037369441E-002  0.73160999814810090     
 N =      8 L =  32768 T =      .048708 +/-      .144691 msec   # of failures = 0
 N =     16 L =  10240 T =      .216757 +/-      .132459 msec   # of failures = 0
 N =     32 L =   2560 T =     1.159240 +/-      .257751 msec   # of failures = 0
 N =     64 L =    640 T =     6.664422 +/-      .284198 msec   # of failures = 0
 N =    128 L =    160 T =    54.954325 +/-     1.280301 msec   # of failures = 0
 N =    256 L =     40 T =   474.934650 +/-     9.550046 msec   # of failures = 0
 N =    512 L =     10 T =  4954.452600 +/-    51.146981 msec   # of failures = 0
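
For anyone who wants to post-process these listings, here is a minimal Python sketch of a parser for the timing lines. The function name `parse_line` and the field names are hypothetical, and it assumes that L is the number of repetitions and that the value after +/- is the spread of the per-call time; the post itself does not define either.

```python
import re

# Pattern for one timing line of the benchmark output.
# Fractions are printed without a leading zero (".008237"), which float() accepts.
LINE_RE = re.compile(
    r"N\s*=\s*(\d+)\s+L\s*=\s*(\d+)\s+T\s*=\s*([\d.]+)\s*\+/-\s*([\d.]+)\s*msec"
    r"\s+# of failures\s*=\s*(\d+)"
)

def parse_line(line):
    """Return (n, loops, mean_msec, spread_msec, failures), or None for other lines.

    Assumption: L is a repetition count and +/- is the spread of T.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None  # e.g. a convergence warning or a section title
    n, loops, mean, spread, failures = m.groups()
    return int(n), int(loops), float(mean), float(spread), int(failures)
```

Lines such as the TEIGEN convergence warning do not match the pattern and yield `None`, so they can be skipped or collected separately.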

Real Eigensystem[DGEEVX@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .013218 +/-      .157030 msec   # of failures = 0
 N =      4 L =  32768 T =      .024014 +/-      .067510 msec   # of failures = 0
 N =      8 L =  32768 T =      .070447 +/-      .110767 msec   # of failures = 0
 N =     16 L =  10240 T =      .280239 +/-      .204191 msec   # of failures = 0
 N =     32 L =   2560 T =     1.517591 +/-      .306330 msec   # of failures = 0
 N =     64 L =    640 T =     8.358041 +/-      .380745 msec   # of failures = 0
 N =    128 L =    160 T =    94.711063 +/-     8.258527 msec   # of failures = 0
 N =    256 L =     40 T =   684.155500 +/-    10.204925 msec   # of failures = 0
 N =    512 L =     10 T =  3075.963100 +/-    45.029070 msec   # of failures = 0

Real Eigensystem[DGEEVX@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .014668 +/-      .160511 msec   # of failures = 0
 N =      4 L =  32768 T =      .027324 +/-      .072267 msec   # of failures = 0
 N =      8 L =  32768 T =      .075819 +/-      .112824 msec   # of failures = 0
 N =     16 L =  10240 T =      .498010 +/-      .199880 msec   # of failures = 0
 N =     32 L =   2560 T =     2.269072 +/-      .379338 msec   # of failures = 0
 N =     64 L =    640 T =    10.689588 +/-     1.262235 msec   # of failures = 0
 N =    128 L =    160 T =    95.835031 +/-     8.315531 msec   # of failures = 0
 N =    256 L =     40 T =   681.952975 +/-    16.886181 msec   # of failures = 0
 N =    512 L =     10 T =  4109.489400 +/-   127.084073 msec   # of failures = 0

Real Eigensystem[DGEEVX@GotoBLAS-2.1.13]
 N =      2 L =  32768 T =      .014868 +/-      .167066 msec   # of failures = 0
 N =      4 L =  32768 T =      .026354 +/-      .093779 msec   # of failures = 0
 N =      8 L =  32768 T =      .070111 +/-      .104149 msec   # of failures = 0
 N =     16 L =  10240 T =      .256042 +/-      .191495 msec   # of failures = 0
 N =     32 L =   2560 T =     1.271190 +/-      .221631 msec   # of failures = 0
 N =     64 L =    640 T =     6.846203 +/-      .493545 msec   # of failures = 0
 N =    128 L =    160 T =    80.511619 +/-     8.542452 msec   # of failures = 0
 N =    256 L =     40 T =   401.314400 +/-     9.575067 msec   # of failures = 0
 N =    512 L =     10 T =  1388.119100 +/-    25.401272 msec   # of failures = 0
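
To make the four Real Eigensystem listings easier to compare, the following Python sketch computes each library's speedup over the built-in teigen routine at N = 512. The timings are copied from the listings; the helper name `speedups` is hypothetical.

```python
# Mean time per call in msec at N = 512, copied from the Real Eigensystem listings.
t512 = {
    "teigen": 4954.452600,
    "DGEEVX@LAPACK-3.2.2": 3075.963100,
    "DGEEVX@ATLAS-3.8.3": 4109.489400,
    "DGEEVX@GotoBLAS-2.1.13": 1388.119100,
}

def speedups(times, baseline="teigen"):
    """Ratio baseline_time / time per entry; above 1 means faster than the baseline."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}

for name, ratio in speedups(t512).items():
    print(f"{name:24s} {ratio:5.2f}x")
```

The same helper applies to any other section by swapping in that section's N = 512 row (or any other N) and its built-in routine as the baseline.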

Complex Eigensystem[tceigen]
 N =      2 L =  32768 T =      .008042 +/-      .126553 msec   # of failures = 0
 N =      4 L =  32768 T =      .021977 +/-      .058947 msec   # of failures = 0
 N =      8 L =  32768 T =      .084895 +/-      .108231 msec   # of failures = 0
 N =     16 L =  10240 T =      .414323 +/-      .124297 msec   # of failures = 0
 N =     32 L =   2560 T =     2.466422 +/-      .159884 msec   # of failures = 0
 TCEIGEN Convergence fail.
 N =     64 L =    640 T =    17.269752 +/-      .420907 msec   # of failures = 0
 N =    128 L =    160 T =   151.543269 +/-     2.482483 msec   # of failures = 0
 N =    256 L =     40 T =  1536.990525 +/-    18.398861 msec   # of failures = 0
 N =    512 L =     10 T = 13866.533400 +/-   159.996810 msec   # of failures = 0

Complex Eigensystem[ZGEEVX@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .017757 +/-      .059012 msec   # of failures = 0
 N =      4 L =  32768 T =      .040167 +/-      .092559 msec   # of failures = 0
 N =      8 L =  32768 T =      .119518 +/-      .072383 msec   # of failures = 0
 N =     16 L =  10240 T =      .507698 +/-      .105572 msec   # of failures = 0
 N =     32 L =   2560 T =     2.669335 +/-      .141993 msec   # of failures = 0
 N =     64 L =    640 T =    16.661434 +/-      .290309 msec   # of failures = 0
 N =    128 L =    160 T =   123.347500 +/-     1.494198 msec   # of failures = 0
 N =    256 L =     40 T =  1033.310050 +/-    13.795092 msec   # of failures = 0
 N =    512 L =     10 T =  6427.120600 +/-    66.541751 msec   # of failures = 0

Complex Eigensystem[ZGEEVX@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .020239 +/-      .164884 msec   # of failures = 0
 N =      4 L =  32768 T =      .049169 +/-      .102389 msec   # of failures = 0
 N =      8 L =  32768 T =      .137168 +/-      .097606 msec   # of failures = 0
 N =     16 L =  10240 T =      .675185 +/-      .113269 msec   # of failures = 0
 N =     32 L =   2560 T =     3.237613 +/-      .222750 msec   # of failures = 0
 N =     64 L =    640 T =    19.622386 +/-      .812437 msec   # of failures = 0
 N =    128 L =    160 T =   143.707500 +/-     2.858266 msec   # of failures = 0
 N =    256 L =     40 T =  1256.908150 +/-    20.315763 msec   # of failures = 0
 N =    512 L =     10 T = 11165.999600 +/-   172.824775 msec   # of failures = 0

Complex Eigensystem[ZGEEVX@GotoBLAS-2.1.13]
 N =      2 L =  32768 T =      .019002 +/-      .153284 msec   # of failures = 0
 N =      4 L =  32768 T =      .043108 +/-      .073305 msec   # of failures = 0
 N =      8 L =  32768 T =      .114783 +/-      .068736 msec   # of failures = 0
 N =     16 L =  10240 T =      .560256 +/-      .131265 msec   # of failures = 0
 N =     32 L =   2560 T =     2.694456 +/-      .128608 msec   # of failures = 0
 N =     64 L =    640 T =    15.112442 +/-      .442129 msec   # of failures = 0
 N =    128 L =    160 T =    91.809056 +/-     1.331968 msec   # of failures = 0
 N =    256 L =     40 T =   424.419800 +/-     4.313518 msec   # of failures = 0
 N =    512 L =     10 T =  2694.422600 +/-    48.715928 msec   # of failures = 0

Real SingularValues[tsvdm]
 N =      2 L =  32768 T =      .011872 +/-      .135057 msec   # of failures = 0
 N =      4 L =  32768 T =      .017514 +/-      .056119 msec   # of failures = 0
 N =      8 L =  32768 T =      .033522 +/-      .049175 msec   # of failures = 0
 N =     16 L =  10240 T =      .112781 +/-      .078994 msec   # of failures = 0
 N =     32 L =   2560 T =      .513979 +/-      .205664 msec   # of failures = 0
 N =     64 L =    640 T =     3.567764 +/-      .212971 msec   # of failures = 0
 N =    128 L =    160 T =    49.825387 +/-      .410513 msec   # of failures = 0
 N =    256 L =     40 T =   561.219125 +/-     3.144857 msec   # of failures = 0
 N =    512 L =     10 T = 10544.715000 +/-    56.068778 msec   # of failures = 0

Real SingularValues[DGESDD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .022678 +/-      .187658 msec   # of failures = 0
 N =      4 L =  32768 T =      .032596 +/-      .060802 msec   # of failures = 0
 N =      8 L =  32768 T =      .074625 +/-      .068605 msec   # of failures = 0
 N =     16 L =  10240 T =      .250749 +/-      .085152 msec   # of failures = 0
 N =     32 L =   2560 T =      .920881 +/-      .073305 msec   # of failures = 0
 N =     64 L =    640 T =     4.754948 +/-      .094989 msec   # of failures = 0
 N =    128 L =    160 T =    26.753725 +/-      .139137 msec   # of failures = 0
 N =    256 L =     40 T =   187.731250 +/-      .537708 msec   # of failures = 0
 N =    512 L =     10 T =  1384.883600 +/-     3.334391 msec   # of failures = 0

Real SingularValues[DGESDD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .019903 +/-      .171499 msec   # of failures = 0
 N =      4 L =  32768 T =      .038882 +/-      .123298 msec   # of failures = 0
 N =      8 L =  32768 T =      .087390 +/-      .083651 msec   # of failures = 0
 N =     16 L =  10240 T =      .271005 +/-      .132160 msec   # of failures = 0
 N =     32 L =   2560 T =      .779123 +/-      .155682 msec   # of failures = 0
 N =     64 L =    640 T =     2.982009 +/-      .240249 msec   # of failures = 0
 N =    128 L =    160 T =    14.705319 +/-      .116347 msec   # of failures = 0
 N =    256 L =     40 T =    76.330825 +/-      .542039 msec   # of failures = 0
 N =    512 L =     10 T =   458.911900 +/-      .526803 msec   # of failures = 0

Real SingularValues[DGESDD@GotoBLAS-2.1.13]
 N =      2 L =  32768 T =      .021825 +/-      .177369 msec   # of failures = 0
 N =      4 L =  32768 T =      .032880 +/-      .081391 msec   # of failures = 0
 N =      8 L =  32768 T =      .069007 +/-      .088089 msec   # of failures = 0
 N =     16 L =  10240 T =      .195701 +/-      .052913 msec   # of failures = 0
 N =     32 L =   2560 T =      .583645 +/-      .071883 msec   # of failures = 0
 N =     64 L =    640 T =     2.153711 +/-      .180950 msec   # of failures = 0
 N =    128 L =    160 T =     9.765350 +/-      .120727 msec   # of failures = 0
 N =    256 L =     40 T =    56.321225 +/-      .302002 msec   # of failures = 0
 N =    512 L =     10 T =   378.101700 +/-      .426204 msec   # of failures = 0

Complex SingularValues[tcsvdm]
 N =      2 L =  32768 T =      .013533 +/-      .147806 msec   # of failures = 0
 N =      4 L =  32768 T =      .024613 +/-      .077979 msec   # of failures = 0
 N =      8 L =  32768 T =      .403952 +/-      .307302 msec   # of failures = 0
 N =     16 L =  10240 T =      .246291 +/-      .131441 msec   # of failures = 0
 N =     32 L =   2560 T =     1.311957 +/-      .081337 msec   # of failures = 0
 N =     64 L =    640 T =    10.824608 +/-     1.309137 msec   # of failures = 0
 N =    128 L =    160 T =   103.553988 +/-     1.870567 msec   # of failures = 0
 N =    256 L =     40 T =  1062.854400 +/-     5.127493 msec   # of failures = 0
 N =    512 L =     10 T = 17645.269400 +/-   881.390705 msec   # of failures = 0

Complex SingularValues[ZGESDD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .027102 +/-      .206568 msec   # of failures = 0
 N =      4 L =  32768 T =      .044584 +/-      .175870 msec   # of failures = 0
 N =      8 L =  32768 T =     6.774005 +/-     4.371503 msec   # of failures = 0(*slow)
 N =     16 L =  10240 T =      .336427 +/-      .025955 msec   # of failures = 0
 N =     32 L =   2560 T =     1.376423 +/-      .152718 msec   # of failures = 0
 N =     64 L =    640 T =     8.209416 +/-      .262218 msec   # of failures = 0
 N =    128 L =    160 T =    49.707750 +/-      .226245 msec   # of failures = 0
 N =    256 L =     40 T =   377.112475 +/-     1.425033 msec   # of failures = 0
 N =    512 L =     10 T =  2763.281800 +/-    12.012295 msec   # of failures = 0

Complex SingularValues[ZGESDD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .031600 +/-      .221910 msec   # of failures = 0
 N =      4 L =  32768 T =      .055400 +/-      .171011 msec   # of failures = 0
 N =      8 L =  32768 T =     7.043678 +/-     4.634607 msec   # of failures = 0(*slow)
 N =     16 L =  10240 T =      .372316 +/-      .162209 msec   # of failures = 0
 N =     32 L =   2560 T =     1.320179 +/-      .055972 msec   # of failures = 0
 N =     64 L =    640 T =     6.073683 +/-      .232270 msec   # of failures = 0
 N =    128 L =    160 T =    35.523719 +/-      .844582 msec   # of failures = 0
 N =    256 L =     40 T =   249.999800 +/-    10.159306 msec   # of failures = 0
 N =    512 L =     10 T =  1616.151700 +/-    35.023459 msec   # of failures = 0

Complex SingularValues[ZGESDD@GotoBLAS-2.1.13]
 N =      2 L =  32768 T =      .027644 +/-      .188919 msec   # of failures = 0
 N =      4 L =  32768 T =      .042908 +/-      .069401 msec   # of failures = 0
 N =      8 L =  32768 T =     6.596583 +/-     4.197659 msec   # of failures = 0(*slow)
 N =     16 L =  10240 T =      .259719 +/-      .014824 msec   # of failures = 0
 N =     32 L =   2560 T =      .879463 +/-      .223959 msec   # of failures = 0
 N =     64 L =    640 T =     3.943514 +/-      .047710 msec   # of failures = 0
 N =    128 L =    160 T =    20.680231 +/-      .259194 msec   # of failures = 0
 N =    256 L =     40 T =   134.837150 +/-      .787147 msec   # of failures = 0
 N =    512 L =     10 T =   884.559200 +/-    11.894268 msec   # of failures = 0

Real LinearSolve[tsolvm]
 N =      2 L =  32768 T =      .012932 +/-      .162209 msec   # of failures = 0
 N =      4 L =  32768 T =      .012732 +/-      .047418 msec   # of failures = 0
 N =      8 L =  32768 T =      .020592 +/-      .047073 msec   # of failures = 0
 N =     16 L =  10240 T =      .059602 +/-      .025690 msec   # of failures = 0
 N =     32 L =   2560 T =      .246922 +/-      .049520 msec   # of failures = 0
 N =     64 L =    640 T =     1.590381 +/-      .116929 msec   # of failures = 0
 N =    128 L =    160 T =    41.466450 +/-      .115509 msec   # of failures = 0
 N =    256 L =     40 T =   404.196600 +/-      .347976 msec   # of failures = 0
 N =    512 L =     10 T =  4553.732400 +/-     6.132115 msec   # of failures = 0

Real LinearSolve[DGELSD@LAPACK-3.2.2]
 N =      2 L =  32768 T =      .020902 +/-      .171563 msec   # of failures = 0
 N =      4 L =  32768 T =      .033962 +/-      .072665 msec   # of failures = 0
 N =      8 L =  32768 T =      .079500 +/-      .042726 msec   # of failures = 0
 N =     16 L =  10240 T =      .280457 +/-      .112379 msec   # of failures = 0
 N =     32 L =   2560 T =     1.144411 +/-      .081641 msec   # of failures = 0
 N =     64 L =    640 T =     6.406339 +/-      .079681 msec   # of failures = 0
 N =    128 L =    160 T =    40.585156 +/-      .134758 msec   # of failures = 0
 N =    256 L =     40 T =   310.538725 +/-      .785799 msec   # of failures = 0
 N =    512 L =     10 T =  2862.740200 +/-    12.274714 msec   # of failures = 0

Real LinearSolve[DGELSD@ATLAS-3.8.3]
 N =      2 L =  32768 T =      .024493 +/-      .190497 msec   # of failures = 0
 N =      4 L =  32768 T =      .040578 +/-      .101808 msec   # of failures = 0
 N =      8 L =  32768 T =      .093644 +/-      .114790 msec   # of failures = 0
 N =     16 L =  10240 T =      .283796 +/-      .136987 msec   # of failures = 0
 N =     32 L =   2560 T =      .897974 +/-      .101839 msec   # of failures = 0
 N =     64 L =    640 T =     3.688769 +/-      .129619 msec   # of failures = 0
 N =    128 L =    160 T =    22.626219 +/-      .179070 msec   # of failures = 0
 N =    256 L =     40 T =   134.290775 +/-      .545878 msec   # of failures = 0
 N =    512 L =     10 T =  1196.390700 +/-    41.841181 msec   # of failures = 0

Real LinearSolve[DGELSD@GotoBLAS-2.1.13]
 N =      2 L =  32768 T =      .023342 +/-      .195374 msec   # of failures = 0
 N =      4 L =  32768 T =      .034271 +/-      .082321 msec   # of failures = 0
 N =      8 L =  32768 T =      .072005 +/-      .105429 msec   # of failures = 0
 N =     16 L =  10240 T =      .205889 +/-      .056656 msec   # of failures = 0
 N =     32 L =   2560 T =      .681020 +/-      .155405 msec   # of failures = 0
 N =     64 L =    640 T =     2.651733 +/-      .128700 msec   # of failures = 0
 N =    128 L =    160 T =    16.469419 +/-      .132502 msec   # of failures = 0
 N =    256 L =     40 T =   115.589175 +/-      .481383 msec   # of failures = 0
 N =    512 L =     10 T =  1019.370300 +/-     1.474057 msec   # of failures = 0
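
One way to read the large-N rows is to estimate an empirical scaling exponent p from two sizes, assuming the time grows roughly as N^p. The Python sketch below does this for the last two rows of two Real LinearSolve listings. The function name `scaling_exponent` is hypothetical, and a two-point estimate like this is only a rough indication.

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Exponent p such that t2 / t1 == (n2 / n1) ** p."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# (T at N=256, T at N=512) in msec, copied from the Real LinearSolve listings.
pairs = {
    "tsolvm": (404.196600, 4553.732400),
    "DGELSD@GotoBLAS-2.1.13": (115.589175, 1019.370300),
}

for name, (t256, t512) in pairs.items():
    p = scaling_exponent(256, t256, 512, t512)
    print(f"{name:24s} p = {p:.2f}")
```

A dense O(N^3) algorithm would give p close to 3. An exponent noticeably above 3 may point to memory or cache effects at the larger size, though the listings alone cannot confirm the cause.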

