Subject | : Re: Speeding up the linear algebra routines |
Date | : 2010/11/05(Fri) 15:59:08 |
Contributor | : Akio Morita |
I ran benchmarks using the BLAS/LAPACK implementations that are relatively easy to obtain in my local environment.

SAD:amorita branch r3415
Module:Math/LPACK extension r3441
OS:FreeBSD/amd64 8.1-STABLE
CPU:Quad-Core AMD Opteron(tm) Processor 2376 (2300.11-MHz K8-class CPU)
Date:2010/11/05
---------------------------------------------------------------------------------
Real Eigensystem[teigen]
N = 2 L = 32768 T = .008237 +/- .140829 msec # of failures = 0
N = 4 L = 32768 T = .014519 +/- .062383 msec # of failures = 0
TEIGEN convergence failed. Range = 2 5
Lower right corner = 0.52571493004973358 4.63855639037369441E-002 0.73160999814810090
N = 8 L = 32768 T = .048708 +/- .144691 msec # of failures = 0
N = 16 L = 10240 T = .216757 +/- .132459 msec # of failures = 0
N = 32 L = 2560 T = 1.159240 +/- .257751 msec # of failures = 0
N = 64 L = 640 T = 6.664422 +/- .284198 msec # of failures = 0
N = 128 L = 160 T = 54.954325 +/- 1.280301 msec # of failures = 0
N = 256 L = 40 T = 474.934650 +/- 9.550046 msec # of failures = 0
N = 512 L = 10 T = 4954.452600 +/- 51.146981 msec # of failures = 0

Real Eigensystem[DGEEVX@LAPACK-3.2.2]
N = 2 L = 32768 T = .013218 +/- .157030 msec # of failures = 0
N = 4 L = 32768 T = .024014 +/- .067510 msec # of failures = 0
N = 8 L = 32768 T = .070447 +/- .110767 msec # of failures = 0
N = 16 L = 10240 T = .280239 +/- .204191 msec # of failures = 0
N = 32 L = 2560 T = 1.517591 +/- .306330 msec # of failures = 0
N = 64 L = 640 T = 8.358041 +/- .380745 msec # of failures = 0
N = 128 L = 160 T = 94.711063 +/- 8.258527 msec # of failures = 0
N = 256 L = 40 T = 684.155500 +/- 10.204925 msec # of failures = 0
N = 512 L = 10 T = 3075.963100 +/- 45.029070 msec # of failures = 0

Real Eigensystem[DGEEVX@ATLAS-3.8.3]
N = 2 L = 32768 T = .014668 +/- .160511 msec # of failures = 0
N = 4 L = 32768 T = .027324 +/- .072267 msec # of failures = 0
N = 8 L = 32768 T = .075819 +/- .112824 msec # of failures = 0
N = 16 L = 10240 T = .498010 +/- .199880 msec # of failures = 0
N = 32 L = 2560 T = 2.269072 +/- .379338 msec # of failures = 0
N = 64 L = 640 T = 10.689588 +/- 1.262235 msec # of failures = 0
N = 128 L = 160 T = 95.835031 +/- 8.315531 msec # of failures = 0
N = 256 L = 40 T = 681.952975 +/- 16.886181 msec # of failures = 0
N = 512 L = 10 T = 4109.489400 +/- 127.084073 msec # of failures = 0

Real Eigensystem[DGEEVX@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .014868 +/- .167066 msec # of failures = 0
N = 4 L = 32768 T = .026354 +/- .093779 msec # of failures = 0
N = 8 L = 32768 T = .070111 +/- .104149 msec # of failures = 0
N = 16 L = 10240 T = .256042 +/- .191495 msec # of failures = 0
N = 32 L = 2560 T = 1.271190 +/- .221631 msec # of failures = 0
N = 64 L = 640 T = 6.846203 +/- .493545 msec # of failures = 0
N = 128 L = 160 T = 80.511619 +/- 8.542452 msec # of failures = 0
N = 256 L = 40 T = 401.314400 +/- 9.575067 msec # of failures = 0
N = 512 L = 10 T = 1388.119100 +/- 25.401272 msec # of failures = 0
---------------------------------------------------------------------------------
Complex Eigensystem[tceigen]
N = 2 L = 32768 T = .008042 +/- .126553 msec # of failures = 0
N = 4 L = 32768 T = .021977 +/- .058947 msec # of failures = 0
N = 8 L = 32768 T = .084895 +/- .108231 msec # of failures = 0
N = 16 L = 10240 T = .414323 +/- .124297 msec # of failures = 0
N = 32 L = 2560 T = 2.466422 +/- .159884 msec # of failures = 0
TCEIGEN Convergence fail.
N = 64 L = 640 T = 17.269752 +/- .420907 msec # of failures = 0
N = 128 L = 160 T = 151.543269 +/- 2.482483 msec # of failures = 0
N = 256 L = 40 T = 1536.990525 +/- 18.398861 msec # of failures = 0
N = 512 L = 10 T = 13866.533400 +/- 159.996810 msec # of failures = 0

Complex Eigensystem[ZGEEVX@LAPACK-3.2.2]
N = 2 L = 32768 T = .017757 +/- .059012 msec # of failures = 0
N = 4 L = 32768 T = .040167 +/- .092559 msec # of failures = 0
N = 8 L = 32768 T = .119518 +/- .072383 msec # of failures = 0
N = 16 L = 10240 T = .507698 +/- .105572 msec # of failures = 0
N = 32 L = 2560 T = 2.669335 +/- .141993 msec # of failures = 0
N = 64 L = 640 T = 16.661434 +/- .290309 msec # of failures = 0
N = 128 L = 160 T = 123.347500 +/- 1.494198 msec # of failures = 0
N = 256 L = 40 T = 1033.310050 +/- 13.795092 msec # of failures = 0
N = 512 L = 10 T = 6427.120600 +/- 66.541751 msec # of failures = 0

Complex Eigensystem[ZGEEVX@ATLAS-3.8.3]
N = 2 L = 32768 T = .020239 +/- .164884 msec # of failures = 0
N = 4 L = 32768 T = .049169 +/- .102389 msec # of failures = 0
N = 8 L = 32768 T = .137168 +/- .097606 msec # of failures = 0
N = 16 L = 10240 T = .675185 +/- .113269 msec # of failures = 0
N = 32 L = 2560 T = 3.237613 +/- .222750 msec # of failures = 0
N = 64 L = 640 T = 19.622386 +/- .812437 msec # of failures = 0
N = 128 L = 160 T = 143.707500 +/- 2.858266 msec # of failures = 0
N = 256 L = 40 T = 1256.908150 +/- 20.315763 msec # of failures = 0
N = 512 L = 10 T = 11165.999600 +/- 172.824775 msec # of failures = 0

Complex Eigensystem[ZGEEVX@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .019002 +/- .153284 msec # of failures = 0
N = 4 L = 32768 T = .043108 +/- .073305 msec # of failures = 0
N = 8 L = 32768 T = .114783 +/- .068736 msec # of failures = 0
N = 16 L = 10240 T = .560256 +/- .131265 msec # of failures = 0
N = 32 L = 2560 T = 2.694456 +/- .128608 msec # of failures = 0
N = 64 L = 640 T = 15.112442 +/- .442129 msec # of failures = 0
N = 128 L = 160 T = 91.809056 +/- 1.331968 msec # of failures = 0
N = 256 L = 40 T = 424.419800 +/- 4.313518 msec # of failures = 0
N = 512 L = 10 T = 2694.422600 +/- 48.715928 msec # of failures = 0
---------------------------------------------------------------------------------
Real SingularValues[tsvdm]
N = 2 L = 32768 T = .011872 +/- .135057 msec # of failures = 0
N = 4 L = 32768 T = .017514 +/- .056119 msec # of failures = 0
N = 8 L = 32768 T = .033522 +/- .049175 msec # of failures = 0
N = 16 L = 10240 T = .112781 +/- .078994 msec # of failures = 0
N = 32 L = 2560 T = .513979 +/- .205664 msec # of failures = 0
N = 64 L = 640 T = 3.567764 +/- .212971 msec # of failures = 0
N = 128 L = 160 T = 49.825387 +/- .410513 msec # of failures = 0
N = 256 L = 40 T = 561.219125 +/- 3.144857 msec # of failures = 0
N = 512 L = 10 T = 10544.715000 +/- 56.068778 msec # of failures = 0

Real SingularValues[DGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .022678 +/- .187658 msec # of failures = 0
N = 4 L = 32768 T = .032596 +/- .060802 msec # of failures = 0
N = 8 L = 32768 T = .074625 +/- .068605 msec # of failures = 0
N = 16 L = 10240 T = .250749 +/- .085152 msec # of failures = 0
N = 32 L = 2560 T = .920881 +/- .073305 msec # of failures = 0
N = 64 L = 640 T = 4.754948 +/- .094989 msec # of failures = 0
N = 128 L = 160 T = 26.753725 +/- .139137 msec # of failures = 0
N = 256 L = 40 T = 187.731250 +/- .537708 msec # of failures = 0
N = 512 L = 10 T = 1384.883600 +/- 3.334391 msec # of failures = 0

Real SingularValues[DGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .019903 +/- .171499 msec # of failures = 0
N = 4 L = 32768 T = .038882 +/- .123298 msec # of failures = 0
N = 8 L = 32768 T = .087390 +/- .083651 msec # of failures = 0
N = 16 L = 10240 T = .271005 +/- .132160 msec # of failures = 0
N = 32 L = 2560 T = .779123 +/- .155682 msec # of failures = 0
N = 64 L = 640 T = 2.982009 +/- .240249 msec # of failures = 0
N = 128 L = 160 T = 14.705319 +/- .116347 msec # of failures = 0
N = 256 L = 40 T = 76.330825 +/- .542039 msec # of failures = 0
N = 512 L = 10 T = 458.911900 +/- .526803 msec # of failures = 0

Real SingularValues[DGESDD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .021825 +/- .177369 msec # of failures = 0
N = 4 L = 32768 T = .032880 +/- .081391 msec # of failures = 0
N = 8 L = 32768 T = .069007 +/- .088089 msec # of failures = 0
N = 16 L = 10240 T = .195701 +/- .052913 msec # of failures = 0
N = 32 L = 2560 T = .583645 +/- .071883 msec # of failures = 0
N = 64 L = 640 T = 2.153711 +/- .180950 msec # of failures = 0
N = 128 L = 160 T = 9.765350 +/- .120727 msec # of failures = 0
N = 256 L = 40 T = 56.321225 +/- .302002 msec # of failures = 0
N = 512 L = 10 T = 378.101700 +/- .426204 msec # of failures = 0
---------------------------------------------------------------------------------
Complex SingularValues[tcsvdm]
N = 2 L = 32768 T = .013533 +/- .147806 msec # of failures = 0
N = 4 L = 32768 T = .024613 +/- .077979 msec # of failures = 0
N = 8 L = 32768 T = .403952 +/- .307302 msec # of failures = 0
N = 16 L = 10240 T = .246291 +/- .131441 msec # of failures = 0
N = 32 L = 2560 T = 1.311957 +/- .081337 msec # of failures = 0
N = 64 L = 640 T = 10.824608 +/- 1.309137 msec # of failures = 0
N = 128 L = 160 T = 103.553988 +/- 1.870567 msec # of failures = 0
N = 256 L = 40 T = 1062.854400 +/- 5.127493 msec # of failures = 0
N = 512 L = 10 T = 17645.269400 +/- 881.390705 msec # of failures = 0

Complex SingularValues[ZGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .027102 +/- .206568 msec # of failures = 0
N = 4 L = 32768 T = .044584 +/- .175870 msec # of failures = 0
N = 8 L = 32768 T = 6.774005 +/- 4.371503 msec # of failures = 0 (*slow)
N = 16 L = 10240 T = .336427 +/- .025955 msec # of failures = 0
N = 32 L = 2560 T = 1.376423 +/- .152718 msec # of failures = 0
N = 64 L = 640 T = 8.209416 +/- .262218 msec # of failures = 0
N = 128 L = 160 T = 49.707750 +/- .226245 msec # of failures = 0
N = 256 L = 40 T = 377.112475 +/- 1.425033 msec # of failures = 0
N = 512 L = 10 T = 2763.281800 +/- 12.012295 msec # of failures = 0

Complex SingularValues[ZGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .031600 +/- .221910 msec # of failures = 0
N = 4 L = 32768 T = .055400 +/- .171011 msec # of failures = 0
N = 8 L = 32768 T = 7.043678 +/- 4.634607 msec # of failures = 0 (*slow)
N = 16 L = 10240 T = .372316 +/- .162209 msec # of failures = 0
N = 32 L = 2560 T = 1.320179 +/- .055972 msec # of failures = 0
N = 64 L = 640 T = 6.073683 +/- .232270 msec # of failures = 0
N = 128 L = 160 T = 35.523719 +/- .844582 msec # of failures = 0
N = 256 L = 40 T = 249.999800 +/- 10.159306 msec # of failures = 0
N = 512 L = 10 T = 1616.151700 +/- 35.023459 msec # of failures = 0

Complex SingularValues[ZGESDD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .027644 +/- .188919 msec # of failures = 0
N = 4 L = 32768 T = .042908 +/- .069401 msec # of failures = 0
N = 8 L = 32768 T = 6.596583 +/- 4.197659 msec # of failures = 0 (*slow)
N = 16 L = 10240 T = .259719 +/- .014824 msec # of failures = 0
N = 32 L = 2560 T = .879463 +/- .223959 msec # of failures = 0
N = 64 L = 640 T = 3.943514 +/- .047710 msec # of failures = 0
N = 128 L = 160 T = 20.680231 +/- .259194 msec # of failures = 0
N = 256 L = 40 T = 134.837150 +/- .787147 msec # of failures = 0
N = 512 L = 10 T = 884.559200 +/- 11.894268 msec # of failures = 0
---------------------------------------------------------------------------------
Real LinearSolve[tsolvm]
N = 2 L = 32768 T = .012932 +/- .162209 msec # of failures = 0
N = 4 L = 32768 T = .012732 +/- .047418 msec # of failures = 0
N = 8 L = 32768 T = .020592 +/- .047073 msec # of failures = 0
N = 16 L = 10240 T = .059602 +/- .025690 msec # of failures = 0
N = 32 L = 2560 T = .246922 +/- .049520 msec # of failures = 0
N = 64 L = 640 T = 1.590381 +/- .116929 msec # of failures = 0
N = 128 L = 160 T = 41.466450 +/- .115509 msec # of failures = 0
N = 256 L = 40 T = 404.196600 +/- .347976 msec # of failures = 0
N = 512 L = 10 T = 4553.732400 +/- 6.132115 msec # of failures = 0

Real LinearSolve[DGELSD@LAPACK-3.2.2]
N = 2 L = 32768 T = .020902 +/- .171563 msec # of failures = 0
N = 4 L = 32768 T = .033962 +/- .072665 msec # of failures = 0
N = 8 L = 32768 T = .079500 +/- .042726 msec # of failures = 0
N = 16 L = 10240 T = .280457 +/- .112379 msec # of failures = 0
N = 32 L = 2560 T = 1.144411 +/- .081641 msec # of failures = 0
N = 64 L = 640 T = 6.406339 +/- .079681 msec # of failures = 0
N = 128 L = 160 T = 40.585156 +/- .134758 msec # of failures = 0
N = 256 L = 40 T = 310.538725 +/- .785799 msec # of failures = 0
N = 512 L = 10 T = 2862.740200 +/- 12.274714 msec # of failures = 0

Real LinearSolve[DGELSD@ATLAS-3.8.3]
N = 2 L = 32768 T = .024493 +/- .190497 msec # of failures = 0
N = 4 L = 32768 T = .040578 +/- .101808 msec # of failures = 0
N = 8 L = 32768 T = .093644 +/- .114790 msec # of failures = 0
N = 16 L = 10240 T = .283796 +/- .136987 msec # of failures = 0
N = 32 L = 2560 T = .897974 +/- .101839 msec # of failures = 0
N = 64 L = 640 T = 3.688769 +/- .129619 msec # of failures = 0
N = 128 L = 160 T = 22.626219 +/- .179070 msec # of failures = 0
N = 256 L = 40 T = 134.290775 +/- .545878 msec # of failures = 0
N = 512 L = 10 T = 1196.390700 +/- 41.841181 msec # of failures = 0

Real LinearSolve[DGELSD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .023342 +/- .195374 msec # of failures = 0
N = 4 L = 32768 T = .034271 +/- .082321 msec # of failures = 0
N = 8 L = 32768 T = .072005 +/- .105429 msec # of failures = 0
N = 16 L = 10240 T = .205889 +/- .056656 msec # of failures = 0
N = 32 L = 2560 T = .681020 +/- .155405 msec # of failures = 0
N = 64 L = 640 T = 2.651733 +/- .128700 msec # of failures = 0
N = 128 L = 160 T = 16.469419 +/- .132502 msec # of failures = 0
N = 256 L = 40 T = 115.589175 +/- .481383 msec # of failures = 0
N = 512 L = 10 T = 1019.370300 +/- 1.474057 msec # of failures = 0
---------------------------------------------------------------------------------
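
For anyone who wants to compare the implementations numerically rather than by eye, here is a rough Python sketch that parses records in the log format shown above and computes the ratio of mean times between two implementations at a given N. The names `parse_log` and `speedup` are made up for this illustration and are not part of SAD; the sample text is copied from the Real Eigensystem results in this post. The regular expressions tolerate arbitrary whitespace, so they should also cope with a log whose line breaks were lost.

```python
import re

# One measurement record, e.g.
#   N = 512 L = 10 T = 4954.452600 +/- 51.146981 msec # of failures = 0
RECORD = re.compile(
    r"N\s*=\s*(\d+)\s+L\s*=\s*(\d+)\s+T\s*=\s*([\d.]+)\s*\+/-\s*([\d.]+)\s*msec"
)
# Section header, e.g. "Real Eigensystem[DGEEVX@GotoBLAS-2.1.13]"
HEADER = re.compile(r"(Real|Complex)\s+(\w+)\[([^\]]+)\]")


def parse_log(text):
    """Return {(kind, problem, impl): {N: mean time in msec}}."""
    results = {}
    headers = list(HEADER.finditer(text))
    for i, h in enumerate(headers):
        # A section runs from its header up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[h.end():end]
        results[h.groups()] = {
            int(m.group(1)): float(m.group(3)) for m in RECORD.finditer(body)
        }
    return results


def speedup(results, kind, problem, baseline, other, n):
    """Mean time of `baseline` divided by mean time of `other` at size n."""
    return results[(kind, problem, baseline)][n] / results[(kind, problem, other)][n]


# Excerpt copied from the Real Eigensystem results in this post.
SAMPLE = """
Real Eigensystem[teigen]
N = 256 L = 40 T = 474.934650 +/- 9.550046 msec # of failures = 0
N = 512 L = 10 T = 4954.452600 +/- 51.146981 msec # of failures = 0
Real Eigensystem[DGEEVX@GotoBLAS-2.1.13]
N = 256 L = 40 T = 401.314400 +/- 9.575067 msec # of failures = 0
N = 512 L = 10 T = 1388.119100 +/- 25.401272 msec # of failures = 0
"""

if __name__ == "__main__":
    parsed = parse_log(SAMPLE)
    for n in (256, 512):
        ratio = speedup(parsed, "Real", "Eigensystem",
                        "teigen", "DGEEVX@GotoBLAS-2.1.13", n)
        print(f"N = {n}: teigen / GotoBLAS = {ratio:.2f}")
```

The ratio uses only the mean T values; for the small sizes the quoted +/- is larger than the mean itself, so ratios there should probably not be read too closely.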