Virginia Tech G5 Supercomputing Cluster

Posted: 11-7-2003





I just got back from the O'Reilly conference, where I attended an evening presentation by Dr. Srinidhi Varadarajan on the Virginia Tech G5 Supercomputing Cluster. It was a very interesting talk.

Below are some notes on the presentation. I am trying to see if I can get the full slides and/or other documentation on what they have done.


Timetable:


March/April 2003---project begins

April/May--finances raised

June 23, 2003--Apple announces G5

June 26, 2003--VA Tech contacts Apple. Deal sealed with Apple in a few days. Apple is stunned :-). They thought that Dr. Varadarajan must be a Mac fanatic. It turns out he had never touched a Mac before, but is certainly proficient in using Mach and Linux/Unix. VA Tech actually ordered the machines via the Apple Store mechanism!

September 5-11, 2003---G5's arrive

September 23, 2003--facility begins preliminary operations

October 1 through mid-November--performance optimizations

Experimental runs by users can begin today.

Full production use by the start of 2004.

Cost:


2 million for upgrading the facilities in which the system is housed.
5.2 million for hardware (besides computers, includes cards, cables, storage, etc)

Current facility scheduled to be followed by a 2nd system in a new building in 2006.

General design criteria:


--Major factor is Performance/Price
--64 bit design (32 bit systems need not apply)
--Benchmarks will depend on double-precision floating point--Altivec not being used in this case.
--Connectivity to Internet 1, Internet 2 (Abilene) and soon into NLR (National LambdaRail).
--High bandwidth with ultralow latency communication
--InfiniBand switched network, 20 Gb/sec/port full duplex, latency of less than 10 microseconds on top of MPI.
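
To get a feel for what the bandwidth and latency criteria mean for a single MPI message, here is a rough sketch of the usual latency-plus-bandwidth estimate. The 10 microsecond figure is the upper bound quoted above. The one-way rate of 10 gigabits per second is my own assumption (half of a 20 gigabit full-duplex port), and `transfer_time` is just an illustrative name, not anything from the talk.

```python
# Latency-bandwidth estimate of one point-to-point message.
# Both constants are illustrative assumptions, not measurements from the talk.
LATENCY_S = 10e-6            # upper bound from the notes: under 10 microseconds over MPI
BANDWIDTH_BITS_PER_S = 10e9  # assumed one-way rate per port (10 gigabits per second)


def transfer_time(message_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BITS_PER_S):
    """Return the estimated seconds needed to deliver one message."""
    return latency_s + (message_bytes * 8) / bandwidth


if __name__ == "__main__":
    for size in (8, 1024, 1024 * 1024):
        t = transfer_time(size)
        print(f"{size:>8} bytes: {t * 1e6:8.2f} microseconds")
```

Under these assumptions small messages are dominated by latency and large ones by bandwidth, which is presumably why both criteria are on the list.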

Platforms considered:


Some vendors proposed a variety of "turnkey" systems. This drove up the cost of some bids into the range of 9-12 million dollars, which was well beyond the budget.

Dell with Itanium 2----lost on processor and system cost and on overall performance
IBM with Opteron---lost on performance and overall system cost
IBM with PowerPC 970--won on performance, but lost on delivery time (January 2004) and overall system cost
Sun with SPARC--lost on performance and cost
Apple with PowerPC 970--won on performance and overall system cost.

Opteron apparently does not support the "fused multiply-add" instruction, which gives the G5 an edge in floating point performance. As such, a G5 can outpace an Opteron by a factor of 2 in floating point.
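
The factor of 2 is easier to see with a little arithmetic. A fused multiply-add computes a*b + c in one instruction, so it counts as two floating point operations per unit per cycle instead of one. The sketch below is only an illustration: the 2.0 GHz clock and the two floating point units per processor are my assumptions, applied equally to both cases so that only the multiply-add effect shows, and `peak_gflops` is a made-up helper name.

```python
# Peak double-precision rate per processor, counted from floating-point units.
# All hardware figures below are illustrative assumptions, not from the talk.

def peak_gflops(clock_ghz, fp_units, flops_per_unit_per_cycle):
    """Peak GFLOPS = clock (GHz) x units x flops each unit retires per cycle."""
    return clock_ghz * fp_units * flops_per_unit_per_cycle


CLOCK_GHZ = 2.0  # assumed equal clocks for both cases

# With fused multiply-add, each unit retires a*b + c (2 flops) per cycle.
with_fma = peak_gflops(CLOCK_GHZ, fp_units=2, flops_per_unit_per_cycle=2)
# Without it, each unit retires a single add or multiply per cycle.
without_fma = peak_gflops(CLOCK_GHZ, fp_units=2, flops_per_unit_per_cycle=1)

print(with_fma, without_fma, with_fma / without_fma)
```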

Itanium 2 apparently gives a GEMM efficiency (see Results section below) as much as 15 percent better than G5 right now. However, it is very expensive, and also loses whatever GEMM efficiency advantage it has due to other things like its slower clock speed. That is definitely an ironic twist. :-)

Software:


--Each G5 machine has a stock install of OSX 10.2.7
--Mellanox Infiniband drivers
--MPI implemented using MVAPICH from D.K. Panda's group at Ohio State University. Code ported from Linux with additions of message caching and dynamic memory management.

--Cache optimized memory manager for scientific apps written for OSX as a KEXT (written in-house)

--Scalable job starting system for MVAPICH (written in-house).

--Deja Vu as a system for fault tolerance ported to the G5. Intended to be separate from ordinary application logic. (written in-house)

Compilers used:


For C and C++, IBM xlc and GCC 3.3
For Fortran, IBM xlf and NAGWare

Results:


--Mellanox driver version 1 was started in July and finished in mid-August. Subsequent tweaks have improved things by around 10 percent.

--Benchmarked using LINPACK
--G5 solved a system of equations at N = 500K
--dense matrix operations
--main phase is LU decomposition: Gaussian elimination with partial row pivoting, O(n^3)
--back solution follows at a lower order, O(n^2)
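
For anyone who has not seen the two phases written out, here is a minimal sketch in plain Python. It is a textbook Gaussian elimination on an augmented matrix, not the benchmark code used on the cluster, and `lu_solve` is just a name I picked. It does not store the L and U factors separately, but the triple loop is the O(n^3) elimination phase and the final loop is the O(n^2) back solution.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial row pivoting.

    `a` is a list of n rows of n numbers, `b` a list of n numbers.
    Neither input is modified.
    """
    n = len(a)
    # Work on an augmented copy [A | b].
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    # Elimination phase: O(n^3)
    for k in range(n):
        # Partial pivoting: pick the row with the largest magnitude in column k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution: O(n^2)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


if __name__ == "__main__":
    print(lu_solve([[2, 1], [1, 3]], [3, 5]))
```

The pivoting step is what keeps the elimination stable when a diagonal entry is zero or very small.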

--Used BLAS libraries

--Core routines--matrix multiply (GEMM) optimized by Kazushige Goto in Japan. 84.1 percent efficiency at this time. Apple's vecLib framework also used.

AND THE CURRENT RESULTS AS OF 10/28/2003 ARE .................


9.6 teraflops


So on the current list, this puts them at number 3.
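
Some back-of-the-envelope numbers help put the result in perspective. The sketch below uses the conventional LINPACK operation count of 2n^3/3 + 2n^2 and assumes that the N = 500K problem size and the 9.6 teraflops figure describe the same run, which the talk did not state explicitly. It also assumes the 5.2 million hardware figure is in US dollars. The names are mine.

```python
N = 500_000            # problem size from the notes
RATE_FLOPS = 9.6e12    # 9.6 teraflops, as reported
HARDWARE_COST = 5.2e6  # hardware budget from the notes, assumed US dollars


def linpack_flops(n):
    """Conventional LINPACK operation count: LU phase plus triangular solves."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


total = linpack_flops(N)
seconds = total / RATE_FLOPS
cost_per_gflops = HARDWARE_COST / (RATE_FLOPS / 1e9)

print(f"operations: {total:.3e}")
print(f"run time: {seconds / 3600:.2f} hours")
print(f"hardware cost per gigaflops: {cost_per_gflops:.0f} dollars")
```

Under those assumptions a single benchmark run takes a couple of hours, and the n^2 term is negligible next to the n^3 term at this problem size.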


Immediate future plans:


Upgrade G5's to Panther in the next couple of weeks. All codes compile fine under Panther.
Along with some other optimization tricks, they anticipate at least another 10 percent improvement in performance.

Expecting to make their MPI enhancements and in-house software open source. For the Infiniband drivers, Dr. Varadarajan could not speak for them, but is hopeful that those drivers will be made available as open source as well. But that is Mellanox's call.

Related Articles:

http://www.computing.vt.edu/ (Virginia Tech Project: Terascale Cluster)

http://don.cc.vt.edu/ (Pictures: Terascale Cluster)

http://www.computerweekly.com/ (Apple chosen for supercomputing cluster)

http://macslash.org/ (TenCon Keynote - Dr. Srinidhi Varadarajan)

http://www.wired.com/ (Mac Supercomputer Just Got Faster)



Larry Peng
Directors Office