All articles in a thread |
---|
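CSR@MakeZL[] has two hotspots: * tqr() and thess(), called from tfeigensystem() * tsolvm(), called from tflinearsolve() Depending on the conditions, the eigenvalue/eigenvector decomposition accounts for about 60% of the CPU time and the matrix inversion for about 35%. Mathematical algorithms for this class of problems, and their implementations, are improving rapidly in numerical computing, and there are packages optimized for individual CPUs as well as implementations optimized for SMP and parallel machines. In numerical linear algebra, BLAS/LAPACK is the de facto standard interface, so these algorithmic and implementation improvements are relatively easy to use through the BLAS/LAPACK API. As an experiment on the amorita branch, I replaced the internal implementation of Eigensystem[] with BLAS/LAPACK. With GotoBLAS2, which is known as a fast BLAS implementation, as the backend, the eigenvalue/eigenvector decomposition of a real general matrix (N of about 512) is about 5 times faster, and the execution time of CSR@MakeZL[] is reduced by about 40% on my sample. # The standard BLAS/LAPACK does not make it faster. Anyone who wants to reproduce or develop this can experiment with the following environment: * amorita branch revision 3366 or later * LAPACK extension module * CSR extension module * an optimized or parallelized BLAS/LAPACK library |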
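As a rough consistency check on the figures in this post, Amdahl's law relates the speed-up of one part of a program to the overall run time. The sketch below assumes the 60% eigensystem share and the 5x speed-up quoted in the post, and that everything else is unchanged; the function name is illustrative and not part of SAD.

```python
def remaining_time_fraction(accelerated_share, speedup):
    """Fraction of the original run time left after speeding up one part (Amdahl's law)."""
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be between 0 and 1")
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    # The untouched part keeps its cost; the accelerated part shrinks by `speedup`.
    return (1.0 - accelerated_share) + accelerated_share / speedup


eigen_share = 0.60    # share of CPU time spent in the eigensystem code (from the post)
eigen_speedup = 5.0   # speed-up reported for N of about 512 with GotoBLAS2

remaining = remaining_time_fraction(eigen_share, eigen_speedup)
reduction_percent = (1.0 - remaining) * 100.0
print(f"estimated reduction of the run time: {reduction_percent:.0f}%")
```

Under these assumptions the estimate is a reduction of about 48%, which is of the same order as the roughly 40% measured on the sample. The shares depend on the conditions, so the two numbers need not agree exactly.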
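The extension module Math/LAPACK, which aims to speed up the linear algebra routines by using BLAS & LAPACK, is now more or less complete. It speeds up four functions: * LinearSolve[] * Inverse[] * SingularValues[] * Eigensystem[] (effectively three, because Inverse[] is implemented with LinearSolve[]). Usage: 1. Install BLAS & LAPACK on the system. An optimized BLAS such as ATLAS or GotoBLAS is recommended. 2. Download the LAPACK extension module and run make & make install. Depending on the BLAS & LAPACK installed on the system, some configuration is required, such as: * setting the USE_BLAS variable * specifying the BLAS & LAPACK libraries to link in the LDOPT_ADD variable 3. In SADScript, run Library@Require["Math/LAPACK"] After these steps, fast versions of the linear algebra routines with the `Lapack' prefix become available (e.g. LapackLinearSolve[]). To completely replace the functions provided by the SAD core, comment out ``COPT_ADD=-DUSE_LAPACK_PREFIX'' in the Makefile. The module requires SAD amorita branch revision 3407 or later. The speed depends on the matrix size and on how well the BLAS is optimized, but with a well-optimized BLAS and reasonably large matrices it is about 5 to 10 times faster than the SAD core routines. |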
I ran benchmarks in my own environment with the BLAS/LAPACK implementations that are relatively easy to obtain. SAD:amorita branch r3415 Module:Math/LAPACK extension r3441 OS:FreeBSD/amd64 8.1-STABLE CPU:Quad-Core AMD Opteron(tm) Processor 2376 (2300.11-MHz K8-class CPU) Date:2010/11/05 --------------------------------------------------------------------------------- Real Eigensystem[teigen] N = 2 L = 32768 T = .008237 +/- .140829 msec # of failures = 0 N = 4 L = 32768 T = .014519 +/- .062383 msec # of failures = 0 TEIGEN convergence failed. Range = 2 5 Lower right corner = 0.52571493004973358 4.63855639037369441E-002 0.73160999814810090 N = 8 L = 32768 T = .048708 +/- .144691 msec # of failures = 0 N = 16 L = 10240 T = .216757 +/- .132459 msec # of failures = 0 N = 32 L = 2560 T = 1.159240 +/- .257751 msec # of failures = 0 N = 64 L = 640 T = 6.664422 +/- .284198 msec # of failures = 0 N = 128 L = 160 T = 54.954325 +/- 1.280301 msec # of failures = 0 N = 256 L = 40 T = 474.934650 +/- 9.550046 msec # of failures = 0 N = 512 L = 10 T = 4954.452600 +/- 51.146981 msec # of failures = 0 Real Eigensystem[DGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .013218 +/- .157030 msec # of failures = 0 N = 4 L = 32768 T = .024014 +/- .067510 msec # of failures = 0 N = 8 L = 32768 T = .070447 +/- .110767 msec # of failures = 0 N = 16 L = 10240 T = .280239 +/- .204191 msec # of failures = 0 N = 32 L = 2560 T = 1.517591 +/- .306330 msec # of failures = 0 N = 64 L = 640 T = 8.358041 +/- .380745 msec # of failures = 0 N = 128 L = 160 T = 94.711063 +/- 8.258527 msec # of failures = 0 N = 256 L = 40 T = 684.155500 +/- 10.204925 msec # of failures = 0 N = 512 L = 10 T = 3075.963100 +/- 45.029070 msec # of failures = 0 Real Eigensystem[DGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .014668 +/- .160511 msec # of failures = 0 N = 4 L = 32768 T = .027324 +/- .072267 msec # of failures = 0 N = 8 L = 32768 T = .075819 +/- .112824 msec # of failures = 0 N = 16 L = 10240 T = .498010 +/- .199880 msec # of failures = 0 N = 32 L = 2560 T = 2.269072 +/- .379338 msec # of 
failures = 0 N = 64 L = 640 T = 10.689588 +/- 1.262235 msec # of failures = 0 N = 128 L = 160 T = 95.835031 +/- 8.315531 msec # of failures = 0 N = 256 L = 40 T = 681.952975 +/- 16.886181 msec # of failures = 0 N = 512 L = 10 T = 4109.489400 +/- 127.084073 msec # of failures = 0 Real Eigensystem[DGEEVX@GotoBLAS-2.1.13] N = 2 L = 32768 T = .014868 +/- .167066 msec # of failures = 0 N = 4 L = 32768 T = .026354 +/- .093779 msec # of failures = 0 N = 8 L = 32768 T = .070111 +/- .104149 msec # of failures = 0 N = 16 L = 10240 T = .256042 +/- .191495 msec # of failures = 0 N = 32 L = 2560 T = 1.271190 +/- .221631 msec # of failures = 0 N = 64 L = 640 T = 6.846203 +/- .493545 msec # of failures = 0 N = 128 L = 160 T = 80.511619 +/- 8.542452 msec # of failures = 0 N = 256 L = 40 T = 401.314400 +/- 9.575067 msec # of failures = 0 N = 512 L = 10 T = 1388.119100 +/- 25.401272 msec # of failures = 0 --------------------------------------------------------------------------------- Complex Eigensystem[tceigen] N = 2 L = 32768 T = .008042 +/- .126553 msec # of failures = 0 N = 4 L = 32768 T = .021977 +/- .058947 msec # of failures = 0 N = 8 L = 32768 T = .084895 +/- .108231 msec # of failures = 0 N = 16 L = 10240 T = .414323 +/- .124297 msec # of failures = 0 N = 32 L = 2560 T = 2.466422 +/- .159884 msec # of failures = 0 TCEIGEN Convergence fail. 
N = 64 L = 640 T = 17.269752 +/- .420907 msec # of failures = 0 N = 128 L = 160 T = 151.543269 +/- 2.482483 msec # of failures = 0 N = 256 L = 40 T = 1536.990525 +/- 18.398861 msec # of failures = 0 N = 512 L = 10 T = 13866.533400 +/- 159.996810 msec # of failures = 0 Complex Eigensystem[ZGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .017757 +/- .059012 msec # of failures = 0 N = 4 L = 32768 T = .040167 +/- .092559 msec # of failures = 0 N = 8 L = 32768 T = .119518 +/- .072383 msec # of failures = 0 N = 16 L = 10240 T = .507698 +/- .105572 msec # of failures = 0 N = 32 L = 2560 T = 2.669335 +/- .141993 msec # of failures = 0 N = 64 L = 640 T = 16.661434 +/- .290309 msec # of failures = 0 N = 128 L = 160 T = 123.347500 +/- 1.494198 msec # of failures = 0 N = 256 L = 40 T = 1033.310050 +/- 13.795092 msec # of failures = 0 N = 512 L = 10 T = 6427.120600 +/- 66.541751 msec # of failures = 0 Complex Eigensystem[ZGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .020239 +/- .164884 msec # of failures = 0 N = 4 L = 32768 T = .049169 +/- .102389 msec # of failures = 0 N = 8 L = 32768 T = .137168 +/- .097606 msec # of failures = 0 N = 16 L = 10240 T = .675185 +/- .113269 msec # of failures = 0 N = 32 L = 2560 T = 3.237613 +/- .222750 msec # of failures = 0 N = 64 L = 640 T = 19.622386 +/- .812437 msec # of failures = 0 N = 128 L = 160 T = 143.707500 +/- 2.858266 msec # of failures = 0 N = 256 L = 40 T = 1256.908150 +/- 20.315763 msec # of failures = 0 N = 512 L = 10 T = 11165.999600 +/- 172.824775 msec # of failures = 0 Complex Eigensystem[ZGEEVX@GotoBLAS-2.1.13] N = 2 L = 32768 T = .019002 +/- .153284 msec # of failures = 0 N = 4 L = 32768 T = .043108 +/- .073305 msec # of failures = 0 N = 8 L = 32768 T = .114783 +/- .068736 msec # of failures = 0 N = 16 L = 10240 T = .560256 +/- .131265 msec # of failures = 0 N = 32 L = 2560 T = 2.694456 +/- .128608 msec # of failures = 0 N = 64 L = 640 T = 15.112442 +/- .442129 msec # of failures = 0 N = 128 L = 160 T = 91.809056 +/- 1.331968 msec # of 
failures = 0 N = 256 L = 40 T = 424.419800 +/- 4.313518 msec # of failures = 0 N = 512 L = 10 T = 2694.422600 +/- 48.715928 msec # of failures = 0 --------------------------------------------------------------------------------- Real SingularValues[tsvdm] N = 2 L = 32768 T = .011872 +/- .135057 msec # of failures = 0 N = 4 L = 32768 T = .017514 +/- .056119 msec # of failures = 0 N = 8 L = 32768 T = .033522 +/- .049175 msec # of failures = 0 N = 16 L = 10240 T = .112781 +/- .078994 msec # of failures = 0 N = 32 L = 2560 T = .513979 +/- .205664 msec # of failures = 0 N = 64 L = 640 T = 3.567764 +/- .212971 msec # of failures = 0 N = 128 L = 160 T = 49.825387 +/- .410513 msec # of failures = 0 N = 256 L = 40 T = 561.219125 +/- 3.144857 msec # of failures = 0 N = 512 L = 10 T = 10544.715000 +/- 56.068778 msec # of failures = 0 Real SingularValues[DGESDD@LAPACK-3.2.2] N = 2 L = 32768 T = .022678 +/- .187658 msec # of failures = 0 N = 4 L = 32768 T = .032596 +/- .060802 msec # of failures = 0 N = 8 L = 32768 T = .074625 +/- .068605 msec # of failures = 0 N = 16 L = 10240 T = .250749 +/- .085152 msec # of failures = 0 N = 32 L = 2560 T = .920881 +/- .073305 msec # of failures = 0 N = 64 L = 640 T = 4.754948 +/- .094989 msec # of failures = 0 N = 128 L = 160 T = 26.753725 +/- .139137 msec # of failures = 0 N = 256 L = 40 T = 187.731250 +/- .537708 msec # of failures = 0 N = 512 L = 10 T = 1384.883600 +/- 3.334391 msec # of failures = 0 Real SingularValues[DGESDD@ATLAS-3.8.3] N = 2 L = 32768 T = .019903 +/- .171499 msec # of failures = 0 N = 4 L = 32768 T = .038882 +/- .123298 msec # of failures = 0 N = 8 L = 32768 T = .087390 +/- .083651 msec # of failures = 0 N = 16 L = 10240 T = .271005 +/- .132160 msec # of failures = 0 N = 32 L = 2560 T = .779123 +/- .155682 msec # of failures = 0 N = 64 L = 640 T = 2.982009 +/- .240249 msec # of failures = 0 N = 128 L = 160 T = 14.705319 +/- .116347 msec # of failures = 0 N = 256 L = 40 T = 76.330825 +/- .542039 msec # of failures = 0 
N = 512 L = 10 T = 458.911900 +/- .526803 msec # of failures = 0 Real SingularValues[DGESDD@GotoBLAS-2.1.13] N = 2 L = 32768 T = .021825 +/- .177369 msec # of failures = 0 N = 4 L = 32768 T = .032880 +/- .081391 msec # of failures = 0 N = 8 L = 32768 T = .069007 +/- .088089 msec # of failures = 0 N = 16 L = 10240 T = .195701 +/- .052913 msec # of failures = 0 N = 32 L = 2560 T = .583645 +/- .071883 msec # of failures = 0 N = 64 L = 640 T = 2.153711 +/- .180950 msec # of failures = 0 N = 128 L = 160 T = 9.765350 +/- .120727 msec # of failures = 0 N = 256 L = 40 T = 56.321225 +/- .302002 msec # of failures = 0 N = 512 L = 10 T = 378.101700 +/- .426204 msec # of failures = 0 --------------------------------------------------------------------------------- Complex SingularValues[tcsvdm] N = 2 L = 32768 T = .013533 +/- .147806 msec # of failures = 0 N = 4 L = 32768 T = .024613 +/- .077979 msec # of failures = 0 N = 8 L = 32768 T = .403952 +/- .307302 msec # of failures = 0 N = 16 L = 10240 T = .246291 +/- .131441 msec # of failures = 0 N = 32 L = 2560 T = 1.311957 +/- .081337 msec # of failures = 0 N = 64 L = 640 T = 10.824608 +/- 1.309137 msec # of failures = 0 N = 128 L = 160 T = 103.553988 +/- 1.870567 msec # of failures = 0 N = 256 L = 40 T = 1062.854400 +/- 5.127493 msec # of failures = 0 N = 512 L = 10 T = 17645.269400 +/- 881.390705 msec # of failures = 0 Complex SingularValues[ZGESDD@LAPACK-3.2.2] N = 2 L = 32768 T = .027102 +/- .206568 msec # of failures = 0 N = 4 L = 32768 T = .044584 +/- .175870 msec # of failures = 0 N = 8 L = 32768 T = 6.774005 +/- 4.371503 msec # of failures = 0(*slow) N = 16 L = 10240 T = .336427 +/- .025955 msec # of failures = 0 N = 32 L = 2560 T = 1.376423 +/- .152718 msec # of failures = 0 N = 64 L = 640 T = 8.209416 +/- .262218 msec # of failures = 0 N = 128 L = 160 T = 49.707750 +/- .226245 msec # of failures = 0 N = 256 L = 40 T = 377.112475 +/- 1.425033 msec # of failures = 0 N = 512 L = 10 T = 2763.281800 +/- 12.012295 msec # of 
failures = 0 Complex SingularValues[ZGESDD@ATLAS-3.8.3] N = 2 L = 32768 T = .031600 +/- .221910 msec # of failures = 0 N = 4 L = 32768 T = .055400 +/- .171011 msec # of failures = 0 N = 8 L = 32768 T = 7.043678 +/- 4.634607 msec # of failures = 0(*slow) N = 16 L = 10240 T = .372316 +/- .162209 msec # of failures = 0 N = 32 L = 2560 T = 1.320179 +/- .055972 msec # of failures = 0 N = 64 L = 640 T = 6.073683 +/- .232270 msec # of failures = 0 N = 128 L = 160 T = 35.523719 +/- .844582 msec # of failures = 0 N = 256 L = 40 T = 249.999800 +/- 10.159306 msec # of failures = 0 N = 512 L = 10 T = 1616.151700 +/- 35.023459 msec # of failures = 0 Complex SingularValues[ZGESDD@GotoBLAS-2.1.13] N = 2 L = 32768 T = .027644 +/- .188919 msec # of failures = 0 N = 4 L = 32768 T = .042908 +/- .069401 msec # of failures = 0 N = 8 L = 32768 T = 6.596583 +/- 4.197659 msec # of failures = 0(*slow) N = 16 L = 10240 T = .259719 +/- .014824 msec # of failures = 0 N = 32 L = 2560 T = .879463 +/- .223959 msec # of failures = 0 N = 64 L = 640 T = 3.943514 +/- .047710 msec # of failures = 0 N = 128 L = 160 T = 20.680231 +/- .259194 msec # of failures = 0 N = 256 L = 40 T = 134.837150 +/- .787147 msec # of failures = 0 N = 512 L = 10 T = 884.559200 +/- 11.894268 msec # of failures = 0 --------------------------------------------------------------------------------- Real LinearSolve[tsolvm] N = 2 L = 32768 T = .012932 +/- .162209 msec # of failures = 0 N = 4 L = 32768 T = .012732 +/- .047418 msec # of failures = 0 N = 8 L = 32768 T = .020592 +/- .047073 msec # of failures = 0 N = 16 L = 10240 T = .059602 +/- .025690 msec # of failures = 0 N = 32 L = 2560 T = .246922 +/- .049520 msec # of failures = 0 N = 64 L = 640 T = 1.590381 +/- .116929 msec # of failures = 0 N = 128 L = 160 T = 41.466450 +/- .115509 msec # of failures = 0 N = 256 L = 40 T = 404.196600 +/- .347976 msec # of failures = 0 N = 512 L = 10 T = 4553.732400 +/- 6.132115 msec # of failures = 0 Real LinearSolve[DGELSD@LAPACK-3.2.2] N 
= 2 L = 32768 T = .020902 +/- .171563 msec # of failures = 0 N = 4 L = 32768 T = .033962 +/- .072665 msec # of failures = 0 N = 8 L = 32768 T = .079500 +/- .042726 msec # of failures = 0 N = 16 L = 10240 T = .280457 +/- .112379 msec # of failures = 0 N = 32 L = 2560 T = 1.144411 +/- .081641 msec # of failures = 0 N = 64 L = 640 T = 6.406339 +/- .079681 msec # of failures = 0 N = 128 L = 160 T = 40.585156 +/- .134758 msec # of failures = 0 N = 256 L = 40 T = 310.538725 +/- .785799 msec # of failures = 0 N = 512 L = 10 T = 2862.740200 +/- 12.274714 msec # of failures = 0 Real LinearSolve[DGELSD@ATLAS-3.8.3] N = 2 L = 32768 T = .024493 +/- .190497 msec # of failures = 0 N = 4 L = 32768 T = .040578 +/- .101808 msec # of failures = 0 N = 8 L = 32768 T = .093644 +/- .114790 msec # of failures = 0 N = 16 L = 10240 T = .283796 +/- .136987 msec # of failures = 0 N = 32 L = 2560 T = .897974 +/- .101839 msec # of failures = 0 N = 64 L = 640 T = 3.688769 +/- .129619 msec # of failures = 0 N = 128 L = 160 T = 22.626219 +/- .179070 msec # of failures = 0 N = 256 L = 40 T = 134.290775 +/- .545878 msec # of failures = 0 N = 512 L = 10 T = 1196.390700 +/- 41.841181 msec # of failures = 0 Real LinearSolve[DGELSD@GotoBLAS-2.1.13] N = 2 L = 32768 T = .023342 +/- .195374 msec # of failures = 0 N = 4 L = 32768 T = .034271 +/- .082321 msec # of failures = 0 N = 8 L = 32768 T = .072005 +/- .105429 msec # of failures = 0 N = 16 L = 10240 T = .205889 +/- .056656 msec # of failures = 0 N = 32 L = 2560 T = .681020 +/- .155405 msec # of failures = 0 N = 64 L = 640 T = 2.651733 +/- .128700 msec # of failures = 0 N = 128 L = 160 T = 16.469419 +/- .132502 msec # of failures = 0 N = 256 L = 40 T = 115.589175 +/- .481383 msec # of failures = 0 N = 512 L = 10 T = 1019.370300 +/- 1.474057 msec # of failures = 0 --------------------------------------------------------------------------------- |
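The log lines in this post have the form `N = … L = … T = … +/- … msec`. The benchmark script itself is not shown, so the following is only a sketch of how such lines could be produced, under these assumptions: N is the matrix size, L is the number of repetitions, T is the mean time per call, and the value after `+/-` is the standard deviation. The rule in `loops_for` is inferred from the L column of the logs. `solve`, `random_system` and `bench` are hypothetical helpers, and a pure-Python Gaussian elimination stands in for LinearSolve[].

```python
import random
import statistics
import time


def loops_for(n):
    """Repetition count inferred from the logs: 10 * (512 / N)^2, capped at 32768."""
    return min(32768, 10 * (512 // n) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (stand-in workload)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy, inputs stay untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def random_system(n, rng):
    """Build a random n x n matrix and right-hand side with entries in [-1, 1]."""
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return a, b


def bench(n, loops, rng):
    """Return (mean, standard deviation) of the per-call time in milliseconds."""
    samples = []
    for _ in range(loops):
        a, b = random_system(n, rng)  # input generation is not timed
        start = time.perf_counter()
        solve(a, b)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (2, 4, 8, 16):
        loops = min(loops_for(n), 200)  # shortened so the demo finishes quickly
        mean_ms, std_ms = bench(n, loops, rng)
        print(f"N = {n} L = {loops} T = {mean_ms:.6f} +/- {std_ms:.6f} msec")
```

For small N the per-call time is close to the timer resolution, which is one plausible reason why the spread in the logs is larger than the mean for N = 2 to 8.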
I also ran the BLAS/LAPACK benchmarks in a different environment. SAD:amorita branch r3415 Module:Math/LAPACK extension r3441 OS:FreeBSD/amd64 8.1-STABLE CPU:Intel(R) Xeon(R) CPU X5550 @ 2.67GHz (2666.78-MHz K8-class CPU) Date:2010/11/05 --------------------------------------------------------------------------------- Real Eigensystem[teigen] N = 2 L = 32768 T = .004914 +/- .054961 msec # of failures = 0 N = 4 L = 32768 T = .010696 +/- .070075 msec # of failures = 0 TEIGEN convergence failed. Range = 2 5 Lower right corner = 0.52571493004973358 4.63855639037369441E-002 0.73160999814810090 N = 8 L = 32768 T = .033344 +/- .071230 msec # of failures = 0 N = 16 L = 10240 T = .141852 +/- .126846 msec # of failures = 0 N = 32 L = 2560 T = .836331 +/- .128061 msec # of failures = 0 N = 64 L = 640 T = 4.385395 +/- .289299 msec # of failures = 0 N = 128 L = 160 T = 32.750225 +/- 1.118398 msec # of failures = 0 N = 256 L = 40 T = 308.373100 +/- 9.375773 msec # of failures = 0 N = 512 L = 10 T = 2967.998600 +/- 92.109288 msec # of failures = 0 Real Eigensystem[DGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .011405 +/- .072050 msec # of failures = 0 N = 4 L = 32768 T = .020033 +/- .071798 msec # of failures = 0 N = 8 L = 32768 T = .073028 +/- .095900 msec # of failures = 0 N = 16 L = 10240 T = .182777 +/- .128211 msec # of failures = 0 N = 32 L = 2560 T = .942186 +/- .056568 msec # of failures = 0 N = 64 L = 640 T = 15.527852 +/- .385307 msec # of failures = 0 N = 128 L = 160 T = 135.973019 +/- 2.716700 msec # of failures = 0 N = 256 L = 40 T = 543.980725 +/- 5.152799 msec # of failures = 0 N = 512 L = 10 T = 2460.381000 +/- 35.651359 msec # of failures = 0 Real Eigensystem[DGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .012499 +/- .040732 msec # of failures = 0 N = 4 L = 32768 T = .022517 +/- .071029 msec # of failures = 0 N = 8 L = 32768 T = .055881 +/- .071899 msec # of failures = 0 N = 16 L = 10240 T = .320816 +/- .129054 msec # of failures = 0 N = 32 L = 2560 T = 1.366923 +/- .276729 msec # of failures = 0 N = 64 L = 
640 T = 7.134169 +/- .876691 msec # of failures = 0 N = 128 L = 160 T = 55.171881 +/- 4.410096 msec # of failures = 0 N = 256 L = 40 T = 380.538625 +/- 7.824095 msec # of failures = 0 N = 512 L = 10 T = 2340.383800 +/- 61.905827 msec # of failures = 0 --------------------------------------------------------------------------------- Complex Eigensystem[tceigen] N = 2 L = 32768 T = .005745 +/- .045651 msec # of failures = 0 N = 4 L = 32768 T = .014781 +/- .057704 msec # of failures = 0 N = 8 L = 32768 T = .055616 +/- .044269 msec # of failures = 0 N = 16 L = 10240 T = .271051 +/- .077106 msec # of failures = 0 N = 32 L = 2560 T = 1.633234 +/- .145197 msec # of failures = 0 TCEIGEN Convergence fail. N = 64 L = 640 T = 11.523869 +/- .276761 msec # of failures = 0 N = 128 L = 160 T = 93.776237 +/- 7.066373 msec # of failures = 0 N = 256 L = 40 T = 1090.281600 +/- 13.078611 msec # of failures = 0 N = 512 L = 10 T = 9432.465000 +/- 327.177165 msec # of failures = 0 Complex Eigensystem[ZGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .013927 +/- .059110 msec # of failures = 0 N = 4 L = 32768 T = .030352 +/- .058527 msec # of failures = 0 N = 8 L = 32768 T = .085863 +/- .071898 msec # of failures = 0 N = 16 L = 10240 T = .372235 +/- .073488 msec # of failures = 0 N = 32 L = 2560 T = 2.088792 +/- .064885 msec # of failures = 0 N = 64 L = 640 T = 13.832725 +/- .237734 msec # of failures = 0 N = 128 L = 160 T = 100.479919 +/- 1.199283 msec # of failures = 0 N = 256 L = 40 T = 874.881150 +/- 13.824662 msec # of failures = 0 N = 512 L = 10 T = 5517.628600 +/- 177.549475 msec # of failures = 0 Complex Eigensystem[ZGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .017170 +/- .088134 msec # of failures = 0 N = 4 L = 32768 T = .036427 +/- .045012 msec # of failures = 0 N = 8 L = 32768 T = .104784 +/- .078750 msec # of failures = 0 N = 16 L = 10240 T = .489181 +/- .045275 msec # of failures = 0 N = 32 L = 2560 T = 2.414941 +/- .214154 msec # of failures = 0 N = 64 L = 640 T = 15.113606 +/- .927262 
msec # of failures = 0 N = 128 L = 160 T = 107.582538 +/- 2.169691 msec # of failures = 0 N = 256 L = 40 T = 921.144450 +/- 78.077925 msec # of failures = 0 N = 512 L = 10 T = 6997.876600 +/- 65.148720 msec # of failures = 0 --------------------------------------------------------------------------------- Real SingularValues[tsvdm] N = 2 L = 32768 T = .009644 +/- .046258 msec # of failures = 0 N = 4 L = 32768 T = .012214 +/- .067921 msec # of failures = 0 N = 8 L = 32768 T = .024322 +/- .041449 msec # of failures = 0 N = 16 L = 10240 T = .075302 +/- .073760 msec # of failures = 0 N = 32 L = 2560 T = .343740 +/- .036861 msec # of failures = 0 N = 64 L = 640 T = 3.107773 +/- .221245 msec # of failures = 0 N = 128 L = 160 T = 28.426794 +/- 4.764495 msec # of failures = 0 N = 256 L = 40 T = 480.008375 +/- 56.490995 msec # of failures = 0 N = 512 L = 10 T = 3570.363100 +/- 61.897338 msec # of failures = 0 Real SingularValues[DGESDD@LAPACK-3.2.2] N = 2 L = 32768 T = .026266 +/- .082811 msec # of failures = 0 N = 4 L = 32768 T = .039730 +/- .058722 msec # of failures = 0 N = 8 L = 32768 T = .085655 +/- .083125 msec # of failures = 0 N = 16 L = 10240 T = .277530 +/- .075076 msec # of failures = 0 N = 32 L = 2560 T = 1.038829 +/- .022839 msec # of failures = 0 N = 64 L = 640 T = 5.943048 +/- .239012 msec # of failures = 0 N = 128 L = 160 T = 38.459394 +/- .153021 msec # of failures = 0 N = 256 L = 40 T = 286.069500 +/- .713294 msec # of failures = 0 N = 512 L = 10 T = 2081.214000 +/- 2.634993 msec # of failures = 0 Real SingularValues[DGESDD@ATLAS-3.8.3] N = 2 L = 32768 T = .019553 +/- .054526 msec # of failures = 0 N = 4 L = 32768 T = .035733 +/- .075945 msec # of failures = 0 N = 8 L = 32768 T = .071613 +/- .081462 msec # of failures = 0 N = 16 L = 10240 T = .201926 +/- .106510 msec # of failures = 0 N = 32 L = 2560 T = .560016 +/- .142055 msec # of failures = 0 N = 64 L = 640 T = 2.359422 +/- .312576 msec # of failures = 0 N = 128 L = 160 T = 10.252638 +/- .265303 msec # 
of failures = 0 N = 256 L = 40 T = 55.665500 +/- 3.808465 msec # of failures = 0 N = 512 L = 10 T = 304.373800 +/- 19.260853 msec # of failures = 0 --------------------------------------------------------------------------------- Complex SingularValues[tcsvdm] N = 2 L = 32768 T = .010985 +/- .078168 msec # of failures = 0 N = 4 L = 32768 T = .017390 +/- .010494 msec # of failures = 0 N = 8 L = 32768 T = 5.313555 +/- 4.616588 msec # of failures = 0(*slow) N = 16 L = 10240 T = .173400 +/- .103538 msec # of failures = 0 N = 32 L = 2560 T = .939715 +/- .022199 msec # of failures = 0 N = 64 L = 640 T = 6.465434 +/- .100230 msec # of failures = 0 N = 128 L = 160 T = 57.908787 +/- 4.018248 msec # of failures = 0 N = 256 L = 40 T = 669.606875 +/- 4.180820 msec # of failures = 0 N = 512 L = 10 T = 5147.778300 +/- 45.043540 msec # of failures = 0 Complex SingularValues[ZGESDD@LAPACK-3.2.2] N = 2 L = 32768 T = .022014 +/- .041939 msec # of failures = 0 N = 4 L = 32768 T = .033519 +/- .041508 msec # of failures = 0 N = 8 L = 32768 T = 4.537111 +/- 3.173818 msec # of failures = 0(*slow) N = 16 L = 10240 T = .234432 +/- .007166 msec # of failures = 0 N = 32 L = 2560 T = .996835 +/- .011787 msec # of failures = 0 N = 64 L = 640 T = 6.394497 +/- .065906 msec # of failures = 0 N = 128 L = 160 T = 40.739569 +/- .357006 msec # of failures = 0 N = 256 L = 40 T = 316.512525 +/- 1.539484 msec # of failures = 0 N = 512 L = 10 T = 2255.242000 +/- 3.732018 msec # of failures = 0 Complex SingularValues[ZGESDD@ATLAS-3.8.3] N = 2 L = 32768 T = .025624 +/- .084478 msec # of failures = 0 N = 4 L = 32768 T = .044957 +/- .060120 msec # of failures = 0 N = 8 L = 32768 T = .094604 +/- .060680 msec # of failures = 0 N = 16 L = 10240 T = .259461 +/- .076981 msec # of failures = 0 N = 32 L = 2560 T = 1.003860 +/- .035998 msec # of failures = 0 N = 64 L = 640 T = 4.478936 +/- .153480 msec # of failures = 0 N = 128 L = 160 T = 23.881556 +/- .207277 msec # of failures = 0 N = 256 L = 40 T = 148.580275 
+/- 1.568393 msec # of failures = 0 N = 512 L = 10 T = 805.882300 +/- 4.571945 msec # of failures = 0 --------------------------------------------------------------------------------- Real LinearSolve[tsolvm] N = 2 L = 32768 T = .009529 +/- .061895 msec # of failures = 0 N = 4 L = 32768 T = .010196 +/- .057294 msec # of failures = 0 N = 8 L = 32768 T = .014949 +/- .057801 msec # of failures = 0 N = 16 L = 10240 T = .037556 +/- .073777 msec # of failures = 0 N = 32 L = 2560 T = .160285 +/- .145266 msec # of failures = 0 N = 64 L = 640 T = 1.574538 +/- .050661 msec # of failures = 0 N = 128 L = 160 T = 17.699331 +/- .052163 msec # of failures = 0 N = 256 L = 40 T = 225.097700 +/- .333980 msec # of failures = 0 N = 512 L = 10 T = 2193.822700 +/- 10.421862 msec # of failures = 0 Real LinearSolve[DGELSD@LAPACK-3.2.2] N = 2 L = 32768 T = .019176 +/- .071623 msec # of failures = 0 N = 4 L = 32768 T = .028825 +/- .042434 msec # of failures = 0 N = 8 L = 32768 T = .056619 +/- .041981 msec # of failures = 0 N = 16 L = 10240 T = .182530 +/- .009877 msec # of failures = 0 N = 32 L = 2560 T = .755317 +/- .017439 msec # of failures = 0 N = 64 L = 640 T = 4.405711 +/- .174577 msec # of failures = 0 N = 128 L = 160 T = 28.793131 +/- 1.137125 msec # of failures = 0 N = 256 L = 40 T = 215.818675 +/- .645456 msec # of failures = 0 N = 512 L = 10 T = 1561.887900 +/- 2.827443 msec # of failures = 0 Real LinearSolve[DGELSD@ATLAS-3.8.3] N = 2 L = 32768 T = .021839 +/- .082602 msec # of failures = 0 N = 4 L = 32768 T = .037695 +/- .081893 msec # of failures = 0 N = 8 L = 32768 T = .073539 +/- .072191 msec # of failures = 0 N = 16 L = 10240 T = .203711 +/- .107180 msec # of failures = 0 N = 32 L = 2560 T = .657891 +/- .143076 msec # of failures = 0 N = 64 L = 640 T = 2.880484 +/- .055372 msec # of failures = 0 N = 128 L = 160 T = 15.440688 +/- .074582 msec # of failures = 0 N = 256 L = 40 T = 95.026875 +/- 3.481534 msec # of failures = 0 N = 512 L = 10 T = 579.009100 +/- .788075 msec # of 
failures = 0 --------------------------------------------------------------------------------- |
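To compare the backends at a glance, the N = 512 timings from the two logs so far can be turned into speed-up factors relative to the SAD core routine. The sketch below only copies the Real Eigensystem, Real SingularValues and Real LinearSolve values at N = 512 from those logs; the dictionary layout and the function name are illustrative.

```python
# T in msec at N = 512, copied from the benchmark logs in this thread.
# The first entry of each list is the SAD core routine and serves as the baseline.
TIMINGS_MS = {
    "Opteron 2376": {
        "Real Eigensystem": [
            ("teigen", 4954.4526), ("LAPACK-3.2.2", 3075.9631),
            ("ATLAS-3.8.3", 4109.4894), ("GotoBLAS-2.1.13", 1388.1191),
        ],
        "Real SingularValues": [
            ("tsvdm", 10544.7150), ("LAPACK-3.2.2", 1384.8836),
            ("ATLAS-3.8.3", 458.9119), ("GotoBLAS-2.1.13", 378.1017),
        ],
        "Real LinearSolve": [
            ("tsolvm", 4553.7324), ("LAPACK-3.2.2", 2862.7402),
            ("ATLAS-3.8.3", 1196.3907), ("GotoBLAS-2.1.13", 1019.3703),
        ],
    },
    "Xeon X5550": {
        "Real Eigensystem": [
            ("teigen", 2967.9986), ("LAPACK-3.2.2", 2460.3810),
            ("ATLAS-3.8.3", 2340.3838),
        ],
        "Real SingularValues": [
            ("tsvdm", 3570.3631), ("LAPACK-3.2.2", 2081.2140),
            ("ATLAS-3.8.3", 304.3738),
        ],
        "Real LinearSolve": [
            ("tsolvm", 2193.8227), ("LAPACK-3.2.2", 1561.8879),
            ("ATLAS-3.8.3", 579.0091),
        ],
    },
}


def speedups(entries):
    """Map each backend name to baseline_time / backend_time."""
    baseline = entries[0][1]
    return {name: baseline / t for name, t in entries[1:]}


for cpu, tests in TIMINGS_MS.items():
    for test, entries in tests.items():
        factors = ", ".join(
            f"{name} {factor:.2f}x" for name, factor in speedups(entries).items()
        )
        print(f"{cpu:13s} {test:20s} {factors}")
```

On the Opteron, for example, GotoBLAS comes out at roughly 3.6x for Real Eigensystem and roughly 28x for Real SingularValues at N = 512. These are single-size ratios; the logs show that for small N the SAD core routines are often faster.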
The QR decomposition performance of ATLAS-3.8.3 on the Opteron was rather disappointing, but I found a note that LAPACK QR tuning for AMD K10 (Opteron) was done in the ATLAS-3.9.x series, so I also ran the benchmarks with ATLAS-3.9.11. * It does not match GotoBLAS, but QR is considerably better than with ATLAS-3.8.3. * The slowdown in Complex SingularValues at N=8 probably requires checking the test vectors. SAD:amorita branch r3415 Module:Math/LAPACK extension r3441 OS:FreeBSD/amd64 8.1-STABLE CPU:Quad-Core AMD Opteron(tm) Processor 2376 (2300.11-MHz K8-class CPU) Date:2010/11/05 --------------------------------------------------------------------------------- Real Eigensystem[teigen] N = 2 L = 32768 T = .008237 +/- .140829 msec # of failures = 0 N = 4 L = 32768 T = .014519 +/- .062383 msec # of failures = 0 TEIGEN convergence failed. Range = 2 5 Lower right corner = 0.52571493004973358 4.63855639037369441E-002 0.73160999814810090 N = 8 L = 32768 T = .048708 +/- .144691 msec # of failures = 0 N = 16 L = 10240 T = .216757 +/- .132459 msec # of failures = 0 N = 32 L = 2560 T = 1.159240 +/- .257751 msec # of failures = 0 N = 64 L = 640 T = 6.664422 +/- .284198 msec # of failures = 0 N = 128 L = 160 T = 54.954325 +/- 1.280301 msec # of failures = 0 N = 256 L = 40 T = 474.934650 +/- 9.550046 msec # of failures = 0 N = 512 L = 10 T = 4954.452600 +/- 51.146981 msec # of failures = 0 Real Eigensystem[DGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .013218 +/- .157030 msec # of failures = 0 N = 4 L = 32768 T = .024014 +/- .067510 msec # of failures = 0 N = 8 L = 32768 T = .070447 +/- .110767 msec # of failures = 0 N = 16 L = 10240 T = .280239 +/- .204191 msec # of failures = 0 N = 32 L = 2560 T = 1.517591 +/- .306330 msec # of failures = 0 N = 64 L = 640 T = 8.358041 +/- .380745 msec # of failures = 0 N = 128 L = 160 T = 94.711063 +/- 8.258527 msec # of failures = 0 N = 256 L = 40 T = 684.155500 +/- 10.204925 msec # of failures = 0 N = 512 L = 10 T = 3075.963100 +/- 45.029070 msec # of failures = 0 Real Eigensystem[DGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .014668 +/- .160511 msec # of failures = 0 N = 4 L = 32768 T = .027324 +/- 
.072267 msec # of failures = 0 N = 8 L = 32768 T = .075819 +/- .112824 msec # of failures = 0 N = 16 L = 10240 T = .498010 +/- .199880 msec # of failures = 0 N = 32 L = 2560 T = 2.269072 +/- .379338 msec # of failures = 0 N = 64 L = 640 T = 10.689588 +/- 1.262235 msec # of failures = 0 N = 128 L = 160 T = 95.835031 +/- 8.315531 msec # of failures = 0 N = 256 L = 40 T = 681.952975 +/- 16.886181 msec # of failures = 0 N = 512 L = 10 T = 4109.489400 +/- 127.084073 msec # of failures = 0 Real Eigensystem[DGEEVX@ATLAS-3.9.11] N = 2 L = 32768 T = .014025 +/- .149389 msec # of failures = 0 N = 4 L = 32768 T = .026715 +/- .100128 msec # of failures = 0 N = 8 L = 32768 T = .074533 +/- .113714 msec # of failures = 0 N = 16 L = 10240 T = .287458 +/- .191506 msec # of failures = 0 N = 32 L = 2560 T = 1.484938 +/- .415549 msec # of failures = 0 N = 64 L = 640 T = 7.842805 +/- .391136 msec # of failures = 0 N = 128 L = 160 T = 93.693800 +/- 8.725549 msec # of failures = 0 N = 256 L = 40 T = 479.481400 +/- 8.149593 msec # of failures = 0 N = 512 L = 10 T = 1646.219700 +/- 31.246515 msec # of failures = 0 Real Eigensystem[DGEEVX@GotoBLAS-2.1.13] N = 2 L = 32768 T = .014868 +/- .167066 msec # of failures = 0 N = 4 L = 32768 T = .026354 +/- .093779 msec # of failures = 0 N = 8 L = 32768 T = .070111 +/- .104149 msec # of failures = 0 N = 16 L = 10240 T = .256042 +/- .191495 msec # of failures = 0 N = 32 L = 2560 T = 1.271190 +/- .221631 msec # of failures = 0 N = 64 L = 640 T = 6.846203 +/- .493545 msec # of failures = 0 N = 128 L = 160 T = 80.511619 +/- 8.542452 msec # of failures = 0 N = 256 L = 40 T = 401.314400 +/- 9.575067 msec # of failures = 0 N = 512 L = 10 T = 1388.119100 +/- 25.401272 msec # of failures = 0 --------------------------------------------------------------------------------- Complex Eigensystem[tceigen] N = 2 L = 32768 T = .008042 +/- .126553 msec # of failures = 0 N = 4 L = 32768 T = .021977 +/- .058947 msec # of failures = 0 N = 8 L = 32768 T = .084895 +/- 
.108231 msec # of failures = 0 N = 16 L = 10240 T = .414323 +/- .124297 msec # of failures = 0 N = 32 L = 2560 T = 2.466422 +/- .159884 msec # of failures = 0 TCEIGEN Convergence fail. N = 64 L = 640 T = 17.269752 +/- .420907 msec # of failures = 0 N = 128 L = 160 T = 151.543269 +/- 2.482483 msec # of failures = 0 N = 256 L = 40 T = 1536.990525 +/- 18.398861 msec # of failures = 0 N = 512 L = 10 T = 13866.533400 +/- 159.996810 msec # of failures = 0 Complex Eigensystem[ZGEEVX@LAPACK-3.2.2] N = 2 L = 32768 T = .017757 +/- .059012 msec # of failures = 0 N = 4 L = 32768 T = .040167 +/- .092559 msec # of failures = 0 N = 8 L = 32768 T = .119518 +/- .072383 msec # of failures = 0 N = 16 L = 10240 T = .507698 +/- .105572 msec # of failures = 0 N = 32 L = 2560 T = 2.669335 +/- .141993 msec # of failures = 0 N = 64 L = 640 T = 16.661434 +/- .290309 msec # of failures = 0 N = 128 L = 160 T = 123.347500 +/- 1.494198 msec # of failures = 0 N = 256 L = 40 T = 1033.310050 +/- 13.795092 msec # of failures = 0 N = 512 L = 10 T = 6427.120600 +/- 66.541751 msec # of failures = 0 Complex Eigensystem[ZGEEVX@ATLAS-3.8.3] N = 2 L = 32768 T = .020239 +/- .164884 msec # of failures = 0 N = 4 L = 32768 T = .049169 +/- .102389 msec # of failures = 0 N = 8 L = 32768 T = .137168 +/- .097606 msec # of failures = 0 N = 16 L = 10240 T = .675185 +/- .113269 msec # of failures = 0 N = 32 L = 2560 T = 3.237613 +/- .222750 msec # of failures = 0 N = 64 L = 640 T = 19.622386 +/- .812437 msec # of failures = 0 N = 128 L = 160 T = 143.707500 +/- 2.858266 msec # of failures = 0 N = 256 L = 40 T = 1256.908150 +/- 20.315763 msec # of failures = 0 N = 512 L = 10 T = 11165.999600 +/- 172.824775 msec # of failures = 0 Complex Eigensystem[ZGEEVX@ATLAS-3.9.11] N = 2 L = 32768 T = .022342 +/- .169591 msec # of failures = 0 N = 4 L = 32768 T = .046625 +/- .106590 msec # of failures = 0 N = 8 L = 32768 T = .129303 +/- .108754 msec # of failures = 0 N = 16 L = 10240 T = .515950 +/- .122293 msec # of failures = 0 
N = 32 L = 2560 T = 2.585764 +/- .121874 msec # of failures = 0
N = 64 L = 640 T = 15.882036 +/- .293558 msec # of failures = 0
N = 128 L = 160 T = 117.267269 +/- 1.404716 msec # of failures = 0
N = 256 L = 40 T = 700.721650 +/- 8.036642 msec # of failures = 0
N = 512 L = 10 T = 3687.228800 +/- 51.588553 msec # of failures = 0
Complex Eigensystem[ZGEEVX@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .019002 +/- .153284 msec # of failures = 0
N = 4 L = 32768 T = .043108 +/- .073305 msec # of failures = 0
N = 8 L = 32768 T = .114783 +/- .068736 msec # of failures = 0
N = 16 L = 10240 T = .560256 +/- .131265 msec # of failures = 0
N = 32 L = 2560 T = 2.694456 +/- .128608 msec # of failures = 0
N = 64 L = 640 T = 15.112442 +/- .442129 msec # of failures = 0
N = 128 L = 160 T = 91.809056 +/- 1.331968 msec # of failures = 0
N = 256 L = 40 T = 424.419800 +/- 4.313518 msec # of failures = 0
N = 512 L = 10 T = 2694.422600 +/- 48.715928 msec # of failures = 0
---------------------------------------------------------------------------------
Real SingularValues[tsvdm]
N = 2 L = 32768 T = .011872 +/- .135057 msec # of failures = 0
N = 4 L = 32768 T = .017514 +/- .056119 msec # of failures = 0
N = 8 L = 32768 T = .033522 +/- .049175 msec # of failures = 0
N = 16 L = 10240 T = .112781 +/- .078994 msec # of failures = 0
N = 32 L = 2560 T = .513979 +/- .205664 msec # of failures = 0
N = 64 L = 640 T = 3.567764 +/- .212971 msec # of failures = 0
N = 128 L = 160 T = 49.825387 +/- .410513 msec # of failures = 0
N = 256 L = 40 T = 561.219125 +/- 3.144857 msec # of failures = 0
N = 512 L = 10 T = 10544.715000 +/- 56.068778 msec # of failures = 0
Real SingularValues[DGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .022678 +/- .187658 msec # of failures = 0
N = 4 L = 32768 T = .032596 +/- .060802 msec # of failures = 0
N = 8 L = 32768 T = .074625 +/- .068605 msec # of failures = 0
N = 16 L = 10240 T = .250749 +/- .085152 msec # of failures = 0
N = 32 L = 2560 T = .920881 +/- .073305 msec # of failures = 0
N = 64 L = 640 T = 4.754948 +/- .094989 msec # of failures = 0
N = 128 L = 160 T = 26.753725 +/- .139137 msec # of failures = 0
N = 256 L = 40 T = 187.731250 +/- .537708 msec # of failures = 0
N = 512 L = 10 T = 1384.883600 +/- 3.334391 msec # of failures = 0
Real SingularValues[DGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .019903 +/- .171499 msec # of failures = 0
N = 4 L = 32768 T = .038882 +/- .123298 msec # of failures = 0
N = 8 L = 32768 T = .087390 +/- .083651 msec # of failures = 0
N = 16 L = 10240 T = .271005 +/- .132160 msec # of failures = 0
N = 32 L = 2560 T = .779123 +/- .155682 msec # of failures = 0
N = 64 L = 640 T = 2.982009 +/- .240249 msec # of failures = 0
N = 128 L = 160 T = 14.705319 +/- .116347 msec # of failures = 0
N = 256 L = 40 T = 76.330825 +/- .542039 msec # of failures = 0
N = 512 L = 10 T = 458.911900 +/- .526803 msec # of failures = 0
Real SingularValues[DGESDD@ATLAS-3.9.11]
N = 2 L = 32768 T = .020868 +/- .157107 msec # of failures = 0
N = 4 L = 32768 T = .035345 +/- .122849 msec # of failures = 0
N = 8 L = 32768 T = .080285 +/- .098572 msec # of failures = 0
N = 16 L = 10240 T = .273210 +/- .061332 msec # of failures = 0
N = 32 L = 2560 T = .783836 +/- .080288 msec # of failures = 0
N = 64 L = 640 T = 3.235420 +/- .120992 msec # of failures = 0
N = 128 L = 160 T = 14.952244 +/- .117878 msec # of failures = 0
N = 256 L = 40 T = 73.638775 +/- .300941 msec # of failures = 0
N = 512 L = 10 T = 461.864400 +/- .574151 msec # of failures = 0
Real SingularValues[DGESDD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .021825 +/- .177369 msec # of failures = 0
N = 4 L = 32768 T = .032880 +/- .081391 msec # of failures = 0
N = 8 L = 32768 T = .069007 +/- .088089 msec # of failures = 0
N = 16 L = 10240 T = .195701 +/- .052913 msec # of failures = 0
N = 32 L = 2560 T = .583645 +/- .071883 msec # of failures = 0
N = 64 L = 640 T = 2.153711 +/- .180950 msec # of failures = 0
N = 128 L = 160 T = 9.765350 +/- .120727 msec # of failures = 0
N = 256 L = 40 T = 56.321225 +/- .302002 msec # of failures = 0
N = 512 L = 10 T = 378.101700 +/- .426204 msec # of failures = 0
---------------------------------------------------------------------------------
Complex SingularValues[tcsvdm]
N = 2 L = 32768 T = .013533 +/- .147806 msec # of failures = 0
N = 4 L = 32768 T = .024613 +/- .077979 msec # of failures = 0
N = 8 L = 32768 T = .403952 +/- .307302 msec # of failures = 0
N = 16 L = 10240 T = .246291 +/- .131441 msec # of failures = 0
N = 32 L = 2560 T = 1.311957 +/- .081337 msec # of failures = 0
N = 64 L = 640 T = 10.824608 +/- 1.309137 msec # of failures = 0
N = 128 L = 160 T = 103.553988 +/- 1.870567 msec # of failures = 0
N = 256 L = 40 T = 1062.854400 +/- 5.127493 msec # of failures = 0
N = 512 L = 10 T = 17645.269400 +/- 881.390705 msec # of failures = 0
Complex SingularValues[ZGESDD@LAPACK-3.2.2]
N = 2 L = 32768 T = .027102 +/- .206568 msec # of failures = 0
N = 4 L = 32768 T = .044584 +/- .175870 msec # of failures = 0
N = 8 L = 32768 T = 6.774005 +/- 4.371503 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .336427 +/- .025955 msec # of failures = 0
N = 32 L = 2560 T = 1.376423 +/- .152718 msec # of failures = 0
N = 64 L = 640 T = 8.209416 +/- .262218 msec # of failures = 0
N = 128 L = 160 T = 49.707750 +/- .226245 msec # of failures = 0
N = 256 L = 40 T = 377.112475 +/- 1.425033 msec # of failures = 0
N = 512 L = 10 T = 2763.281800 +/- 12.012295 msec # of failures = 0
Complex SingularValues[ZGESDD@ATLAS-3.8.3]
N = 2 L = 32768 T = .031600 +/- .221910 msec # of failures = 0
N = 4 L = 32768 T = .055400 +/- .171011 msec # of failures = 0
N = 8 L = 32768 T = 7.043678 +/- 4.634607 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .372316 +/- .162209 msec # of failures = 0
N = 32 L = 2560 T = 1.320179 +/- .055972 msec # of failures = 0
N = 64 L = 640 T = 6.073683 +/- .232270 msec # of failures = 0
N = 128 L = 160 T = 35.523719 +/- .844582 msec # of failures = 0
N = 256 L = 40 T = 249.999800 +/- 10.159306 msec # of failures = 0
N = 512 L = 10 T = 1616.151700 +/- 35.023459 msec # of failures = 0
Complex SingularValues[ZGESDD@ATLAS-3.9.11]
N = 2 L = 32768 T = .030354 +/- .199105 msec # of failures = 0
N = 4 L = 32768 T = .047115 +/- .050325 msec # of failures = 0
N = 8 L = 32768 T = 6.618956 +/- 4.202718 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .332314 +/- .104221 msec # of failures = 0
N = 32 L = 2560 T = 1.189729 +/- .021061 msec # of failures = 0
N = 64 L = 640 T = 6.267263 +/- .078448 msec # of failures = 0
N = 128 L = 160 T = 33.797069 +/- .236447 msec # of failures = 0
N = 256 L = 40 T = 219.956975 +/- 1.235945 msec # of failures = 0
N = 512 L = 10 T = 1454.313100 +/- 6.742284 msec # of failures = 0
Complex SingularValues[ZGESDD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .027644 +/- .188919 msec # of failures = 0
N = 4 L = 32768 T = .042908 +/- .069401 msec # of failures = 0
N = 8 L = 32768 T = 6.596583 +/- 4.197659 msec # of failures = 0(*slow)
N = 16 L = 10240 T = .259719 +/- .014824 msec # of failures = 0
N = 32 L = 2560 T = .879463 +/- .223959 msec # of failures = 0
N = 64 L = 640 T = 3.943514 +/- .047710 msec # of failures = 0
N = 128 L = 160 T = 20.680231 +/- .259194 msec # of failures = 0
N = 256 L = 40 T = 134.837150 +/- .787147 msec # of failures = 0
N = 512 L = 10 T = 884.559200 +/- 11.894268 msec # of failures = 0
---------------------------------------------------------------------------------
Real LinearSolve[tsolvm]
N = 2 L = 32768 T = .012932 +/- .162209 msec # of failures = 0
N = 4 L = 32768 T = .012732 +/- .047418 msec # of failures = 0
N = 8 L = 32768 T = .020592 +/- .047073 msec # of failures = 0
N = 16 L = 10240 T = .059602 +/- .025690 msec # of failures = 0
N = 32 L = 2560 T = .246922 +/- .049520 msec # of failures = 0
N = 64 L = 640 T = 1.590381 +/- .116929 msec # of failures = 0
N = 128 L = 160 T = 41.466450 +/- .115509 msec # of failures = 0
N = 256 L = 40 T = 404.196600 +/- .347976 msec # of failures = 0
N = 512 L = 10 T = 4553.732400 +/- 6.132115 msec # of failures = 0
Real LinearSolve[DGELSD@LAPACK-3.2.2]
N = 2 L = 32768 T = .020902 +/- .171563 msec # of failures = 0
N = 4 L = 32768 T = .033962 +/- .072665 msec # of failures = 0
N = 8 L = 32768 T = .079500 +/- .042726 msec # of failures = 0
N = 16 L = 10240 T = .280457 +/- .112379 msec # of failures = 0
N = 32 L = 2560 T = 1.144411 +/- .081641 msec # of failures = 0
N = 64 L = 640 T = 6.406339 +/- .079681 msec # of failures = 0
N = 128 L = 160 T = 40.585156 +/- .134758 msec # of failures = 0
N = 256 L = 40 T = 310.538725 +/- .785799 msec # of failures = 0
N = 512 L = 10 T = 2862.740200 +/- 12.274714 msec # of failures = 0
Real LinearSolve[DGELSD@ATLAS-3.8.3]
N = 2 L = 32768 T = .024493 +/- .190497 msec # of failures = 0
N = 4 L = 32768 T = .040578 +/- .101808 msec # of failures = 0
N = 8 L = 32768 T = .093644 +/- .114790 msec # of failures = 0
N = 16 L = 10240 T = .283796 +/- .136987 msec # of failures = 0
N = 32 L = 2560 T = .897974 +/- .101839 msec # of failures = 0
N = 64 L = 640 T = 3.688769 +/- .129619 msec # of failures = 0
N = 128 L = 160 T = 22.626219 +/- .179070 msec # of failures = 0
N = 256 L = 40 T = 134.290775 +/- .545878 msec # of failures = 0
N = 512 L = 10 T = 1196.390700 +/- 41.841181 msec # of failures = 0
Real LinearSolve[DGELSD@ATLAS-3.9.11]
N = 2 L = 32768 T = .024281 +/- .195702 msec # of failures = 0
N = 4 L = 32768 T = .037526 +/- .113346 msec # of failures = 0
N = 8 L = 32768 T = .085461 +/- .083935 msec # of failures = 0
N = 16 L = 10240 T = .272406 +/- .061417 msec # of failures = 0
N = 32 L = 2560 T = .887479 +/- .084327 msec # of failures = 0
N = 64 L = 640 T = 3.960634 +/- .130929 msec # of failures = 0
N = 128 L = 160 T = 22.656294 +/- .118033 msec # of failures = 0
N = 256 L = 40 T = 132.068025 +/- .431280 msec # of failures = 0
N = 512 L = 10 T = 1235.568900 +/- 3.348272 msec # of failures = 0
Real LinearSolve[DGELSD@GotoBLAS-2.1.13]
N = 2 L = 32768 T = .023342 +/- .195374 msec # of failures = 0
N = 4 L = 32768 T = .034271 +/- .082321 msec # of failures = 0
N = 8 L = 32768 T = .072005 +/- .105429 msec # of failures = 0
N = 16 L = 10240 T = .205889 +/- .056656 msec # of failures = 0
N = 32 L = 2560 T = .681020 +/- .155405 msec # of failures = 0
N = 64 L = 640 T = 2.651733 +/- .128700 msec # of failures = 0
N = 128 L = 160 T = 16.469419 +/- .132502 msec # of failures = 0
N = 256 L = 40 T = 115.589175 +/- .481383 msec # of failures = 0
N = 512 L = 10 T = 1019.370300 +/- 1.474057 msec # of failures = 0
--------------------------------------------------------------------------------- |
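To read speedup factors off the benchmark output above, a small script along the following lines may help. This is only a sketch: the names `parse_log` and `speedup` are made up for illustration and are not part of SAD or the Math/LAPACK module, and the sample rows are the N = 512 Real SingularValues results copied from the log. It assumes one header or one `N = ...` row per line.

```python
import re

# Sample rows copied from the benchmark output above (N = 512 only).
LOG = """\
Real SingularValues[tsvdm]
N = 512 L = 10 T = 10544.715000 +/- 56.068778 msec # of failures = 0
Real SingularValues[DGESDD@LAPACK-3.2.2]
N = 512 L = 10 T = 1384.883600 +/- 3.334391 msec # of failures = 0
Real SingularValues[DGESDD@GotoBLAS-2.1.13]
N = 512 L = 10 T = 378.101700 +/- .426204 msec # of failures = 0
"""

# Header line, e.g. "Real SingularValues[DGESDD@LAPACK-3.2.2]".
HEADER = re.compile(r"^(Real|Complex) (\w+)\[([^\]]+)\]$")
# Data row; only N and the mean time T are used here.
ROW = re.compile(r"^N = (\d+) L = (\d+) T = ([\d.]+) \+/- ([\d.]+) msec")


def parse_log(text):
    """Return {(kind, function, backend): {N: mean time in msec}}."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = HEADER.match(line)
        if m:
            current = results.setdefault(m.groups(), {})
            continue
        m = ROW.match(line)
        if m and current is not None:
            current[int(m.group(1))] = float(m.group(3))
    return results


def speedup(results, kind, function, baseline, backend, n):
    """Ratio of the baseline mean time to the backend mean time at size n."""
    return results[(kind, function, baseline)][n] / results[(kind, function, backend)][n]


results = parse_log(LOG)
for backend in ("DGESDD@LAPACK-3.2.2", "DGESDD@GotoBLAS-2.1.13"):
    s = speedup(results, "Real", "SingularValues", "tsvdm", backend, 512)
    print(f"{backend}: {s:.1f}x faster than tsvdm at N = 512")
```

The ratio uses mean times only and ignores the `+/-` spread, so for small N, where the spread is larger than the mean, the resulting factor should not be taken too seriously.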
Changes required to run on the MAIN trunk |
Apart from the modularization of TFCODE.inc, the changes required for the build have been backported as of 1.0.10.4.14a2. |