The first six lists featured a see-saw battle between the United States and Japan for the No. 1 position. The seventh list, published at the Supercomputer 1996 conference in Mannheim, saw Japan maintain its hold on the top spot, but with the University of Tokyo displacing the National Aerospace Laboratory of Japan. The No. 1 system was a 1,024-processor SR2201 built by Hitachi. It achieved 232.4 gigaflops running the Linpack benchmark.
The Hitachi SR2201 was a distributed-memory parallel system introduced in March 1996. Its processor, the 150 MHz HARP-1E based on the PA-RISC 1.1 architecture, addressed the cache miss penalty through pseudo vector processing (PVP). In PVP, data was prefetched into a special register bank, bypassing the cache. Each processor had a peak performance of 300 Mflop/s. Up to 2,048 RISC processors could be connected via a high-speed three-dimensional crossbar network, which transferred data at 300 MB/s over each link, giving a fully configured SR2201 a peak performance of roughly 600 Gflop/s.
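The peak figures above follow from simple multiplication, which the short sketch below works through. It uses only numbers quoted in the text (300 Mflop/s per processor, 1,024 and 2,048 processors, 232.4 gigaflops on Linpack); the function names are illustrative, and the efficiency ratio is a derived figure rather than one stated by the list's authors.

```python
# Figures taken from the text; the helper names are hypothetical.
PEAK_PER_PROCESSOR_MFLOPS = 300.0


def peak_gflops(processors: int) -> float:
    """Theoretical peak in Gflop/s: processor count times per-processor peak."""
    return processors * PEAK_PER_PROCESSOR_MFLOPS / 1000.0


def linpack_efficiency(linpack_gflops: float, processors: int) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return linpack_gflops / peak_gflops(processors)


full_config_peak = peak_gflops(2048)   # maximum configuration
tokyo_peak = peak_gflops(1024)         # the No. 1 system's processor count
tokyo_efficiency = linpack_efficiency(232.4, 1024)

print(f"2,048 processors: {full_config_peak:.1f} Gflop/s peak")
print(f"1,024 processors: {tokyo_peak:.1f} Gflop/s peak")
print(f"Linpack efficiency at 1,024 processors: {tokyo_efficiency:.1%}")
```

On these numbers the maximum configuration peaks at about 614 Gflop/s, which the text rounds to 600, while the 1,024-processor system peaks at about 307 Gflop/s and reaches roughly three quarters of that on Linpack.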
In 1996, the authors of the TOP500 list noted certain trends. In the first list (June 1993), 66 percent of the installed systems were based on ECL (emitter-coupled logic). In the June 1996 list, only 20 percent of the 500 listed systems were built with ECL.
Hand in hand with this evolution was the trend toward computing nodes built with boards taken directly from workstations and binary compatible with the workstation systems. The most prominent examples were the SGI Power Challenge, the IBM SP/2 and the Convex SPP. Together, these systems represented 53 percent of the 500 listed systems.