DNBFC_S12_12_Cluster

Xilinx Spartan-6 FPGA Rack Mount FPGA/HPC Cluster

  • 4U Rackmount Chassis containing
    • 1 dual Intel Xeon® processor card
    • 12 DNBFC_S12_PCIe FPGA cards, each with 13 of the largest Xilinx Spartan-6 FPGAs (XC6SLX150)
      • PCIe 4-lane (GEN1)
      • 156 FPGAs in total, 100% dedicated to application
    • 2 bays for SATA-2 hard drives
  • Processor card
    • Dual Intel Xeon® EC5500 series processors, 2 GHz
      • Quad-Core, 8MB shared L3 cache
      • 6 GB ECC DDR3 memory per processor (12 GB total)
        • Options to 12 GB per processor (24 GB total)
      • VGA with standard D-Sub connector
      • 10/100/1000BASE-T Ethernet (2 ports)
      • USB 2.0 (4 ports total)
        • 2 ports on front panel
        • 2 ports on back bracket
      • Supports virtually all Linux distributions
  • DNBFC_S12_PCIe FPGA HPC Acceleration card
    • 13 Xilinx Spartan-6 FPGAs per card
      • 156 FPGAs with 12 cards installed
      • Power/cooling to handle up to 5W per FPGA
    • 12 of the largest Xilinx Spartan-6 FPGAs: 6SLX150-2
      • 12, 128M x 16 (2Gb) DDR3-800 memories (1 per FPGA)
    • 1 Xilinx Spartan-6 FPGA: 6SLX150T-2
      • 2, 128M x 16 (2Gb) DDR3-800 memories
    • Fixed 4-lane GEN2-capable PCIe interface and controller
      • Full mastering DMA – 3 separate engines
    • Xilinx FPGA Spartan-6 LX150-2/LX150T-2 – 13 total
      • 184,464 flip-flops per FPGA
        • 92K 6-input LUTs, each paired with two flip-flops
      • 182, 18×18 multipliers + 48-bit accumulator per FPGA
      • 268, 18 Kbit block RAM (603 Kbytes) per FPGA
        • Fully dual-ported
        • Each block RAM configurable as:
          • 16Kx1, 8Kx2, 4Kx4, 2Kx8/9, 1Kx16/18 or 512 x 32/36
      • Options for LX150-1L (lower power) or LX150-3 (higher frequency)
  • Single-ended FPGA to FPGA interconnect
    • Source synchronous FPGA -> FPGA frequency: 150MHz
      • 300 Mb/s per pin when using DDR
  • SuperFastBus (SFB) connects all Spartan-6 FPGAs (8 signals)
    • 60MHz
  • 128M x 16 (2Gb) fixed external DDR3 memory dedicated to each field FPGA (12 total)
    • 2 – 128M x 16 (2Gb) fixed external DDR3 memories dedicated to USER Dataflow Manager FPGA
      • DDR3-800 (400MHz or 800 Mb/s per pin), 12.8 Gb/s per memory
    • Full support for FPGA memory controller block (MCB)
      • Up to 8 open banks
      • Configurable multi-port interface to FPGA fabric
        • 32-, 64-, or 128-bit data bus
      • Easy implementation with Xilinx CORE® Generator™
  • Intra-chassis board to board communication utilizing GTP transceivers (LX150T)
    • 3.125 Gb/s per lane, each direction (TX and RX)
    • 4 lanes (4 RX and 4 TX) for daisy chain left
    • 4 lanes (4 RX and 4 TX) for daisy chain right
    • Board to board data communication
      • >1 GB/s per connector TX
      • >1 GB/s per connector RX
      • Non-proprietary, off-the-shelf Samtec cable assembly
  • Three independent low-skew global clock networks distributed differentially and balanced
    • G0: programmable in 1 MHz increments (ICS84314 clock synthesizer)
      • 31.25 MHz to 350 MHz
    • G1: 100MHz PCIe reference
    • G2: SuperFastBus (SFB) clock
  • Fast and Painless FPGA configuration via PCIe
    • On-board battery for AES bitstream encryption
    • Unique Device DNA identifier for design authentication
  • Full support for embedded logic analyzers via JTAG interface
    • ChipScope and other third-party debug solutions

Overview

The DNBFC_S12_12_Cluster is a complete, 4U rack mount FPGA acceleration cluster. The standard configuration contains the following:

  • Trenton JXT6966 Dual Xeon processor card
  • 12 DNBFC_S12_PCIe Spartan-6 FPGA cards with 13 LX150 FPGAs per card
  • 500 GB SATA II Hard Drive

This system contains the maximum number of cost effective FPGAs that can reasonably be integrated into a 4U chassis. Power and cooling are the constraining variables. High performance data paths between FPGA boards enable data movement under algorithmic control that is wholly separate from the host processor, enabling FPGA-based acceleration of whole new classes of data intensive algorithms.

In short, the DNBFC_S12_12_Cluster is a massive number of large, low cost FPGAs integrated with an excellent dual Xeon-based processor host. High speed serial cables between FPGA cards add as much as 5 GB/s of data throughput within the chassis.

A partial list of possible applications includes:

  • bioinformatics
  • genomic search
  • financial analytics
    • low latency analysis
    • derivative calculations
  • image processing
  • signal processing
  • radar
  • scientific computing
  • video compression
  • encryption/decryption (cryptography)

The Processor Card – dual Xeons

Central to the DNBFC_S12_12_Cluster is the Trenton JXT6966 host processor card. This single-board computer has dual Intel Xeon processors clocked at 2 GHz. Each processor has 3 SODIMM slots, and we stuff a 2 GB DDR3 memory into each, resulting in 6 GB of memory per processor. The processor card has two 10/100/1000BASE-T Ethernet ports and four USB 2.0 ports. The chassis can host up to 2 SATA drives. Power and cooling are provided for up to 12 DNBFC_S12_PCIe cards. Power is cabled to the FPGA cards separately rather than drawn from the motherboard, allowing us to exceed the 25 W PCIe slot power limit. The power budget is 50 W per board. Note that this requires a lot of airflow, and the fans are noisy. Fully populated, the system is perhaps too noisy to be in close quarters with an engineer.

The DNBFC_S12_PCIe – 13 Xilinx Spartan-6 FPGAs

Designed for high performance computing (HPC) applications, the DNBFC_S12_PCIe is an FPGA-based peripheral that allows algorithm developers to employ hardware-in-the-loop acceleration utilizing cost effective Xilinx Spartan-6 FPGAs. The DNBFC_S12_12_Cluster can host up to 12 of these FPGA cards. Data movement between the host processor and each FPGA card is accomplished via a fixed 4-lane GEN1 PCIe interface. Each of the 12 field Spartan-6 FPGAs has its own 128M x 16 DDR3 memory that can be clocked at up to 400MHz (800 Mb/s per data pin). Two additional 128M x 16 DDR3 memories are connected to the USER Dataflow Manager FPGA (LX150T) for bulk data storage.

The DNBFC_S12_PCIe contains a fixed, full function, 4-lane master/target PCIe controller, freeing the user from integrating the PCIe function in with the application code. GEN1 PCIe is utilized here.

Spartan-6 FPGAs from Xilinx

We use the Xilinx Spartan-6 LX150 (and LX150T), a 45 nm FPGA and the largest member of this cost effective (read: CHEAP) family. The Spartan-6 FPGA family has an impressive price/performance ratio for hardware-in-the-loop accelerators, with device power consumption much lower than the higher performance FPGA families.

Features of Spartan-6 include efficient, dual-register 6-input look-up table (LUT) logic, 18 Kb (2 x 9 Kb) block RAMs, second generation DSP48A1 slices (includes 18 x 18 multipliers), and DDR3 memory controllers. Enhanced IP security with AES and Device DNA protection is new to this family and helps keep your proprietary IP secret.

We use the largest device from this family, the LX150, in the FF484 package. 100% of the 13 FPGAs on each board are dedicated to your application. All FPGAs, excluding the PCIe controller, are configured from the host via PCIe. The PCIe FPGA can be updated remotely in the field.
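As a rough illustration of what a fully populated chassis adds up to, the sketch below multiplies the per-FPGA figures quoted in the feature list by the number of installed FPGAs. The constant and function names are ours, chosen for illustration only, and the totals are simple products of the quoted per-device numbers rather than measured capacity.

```python
# Per-FPGA figures as quoted in the feature list on this page.
CARDS_PER_CHASSIS = 12
FPGAS_PER_CARD = 13
FLIP_FLOPS_PER_FPGA = 184_464
MULTIPLIERS_PER_FPGA = 182
BLOCK_RAMS_PER_FPGA = 268
BLOCK_RAM_KBITS = 18


def cluster_totals(cards=CARDS_PER_CHASSIS):
    """Return aggregate FPGA resources for a chassis with `cards` cards installed."""
    fpgas = cards * FPGAS_PER_CARD
    # 268 blocks x 18 Kbit / 8 bits per byte = Kbytes of block RAM per FPGA
    block_ram_kbytes_per_fpga = BLOCK_RAMS_PER_FPGA * BLOCK_RAM_KBITS / 8
    return {
        "fpgas": fpgas,
        "flip_flops": fpgas * FLIP_FLOPS_PER_FPGA,
        "multipliers": fpgas * MULTIPLIERS_PER_FPGA,
        "block_ram_kbytes": fpgas * block_ram_kbytes_per_fpga,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value:,.0f}")
```

Passing a smaller `cards` value gives the totals for a partially populated chassis.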

Memory

Each of the 12 field FPGAs has a dedicated 2Gb DDR3 memory. We test the FPGA to memory interface at the fastest frequency allowed by the given speed grade of FPGA stuffed. The fastest speed grade is -3 and if stuffed with the LX150-3, we test this interface at 400MHz. DDR3 is double data rate, multiplying to 800 Mb/s per pin. The configuration is 128M x 16, yielding 12.8 Gb/s (1.6 GB/s) maximum data rate per DDR3 memory.
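The per-memory figure follows directly from the clock rate and the bus width. A minimal sketch of that arithmetic, assuming the 400MHz clock used with the -3 speed grade; the helper name `ddr3_peak_bandwidth` and its parameters are hypothetical, and the results are theoretical peaks that ignore refresh, bank turnaround, and controller overhead:

```python
def ddr3_peak_bandwidth(clock_mhz=400, data_pins=16, devices=1):
    """Return the theoretical peak data rate as (Gb/s, GB/s)."""
    mbits_per_pin = clock_mhz * 2  # double data rate: two transfers per clock
    gbits = mbits_per_pin * data_pins * devices / 1000
    return gbits, gbits / 8  # 8 bits per byte


if __name__ == "__main__":
    gbits, gbytes = ddr3_peak_bandwidth()
    print(f"one memory: {gbits:.1f} Gb/s = {gbytes:.2f} GB/s")
    gbits, gbytes = ddr3_peak_bandwidth(devices=12)
    print(f"12 field FPGA memories: {gbits:.1f} Gb/s = {gbytes:.2f} GB/s")
```

With the defaults this works out to 12.8 Gb/s, or 1.6 GB/s, per memory. Slower speed grades can be modeled by lowering `clock_mhz`.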

The Xilinx Spartan-6 family has integrated hard IP for controlling this dedicated DDR3. The fixed memory controller block (MCB) significantly eases the implementation of high performance dataflow. The MCB can have up to 6 ports, and each port can be configured to have a 32-bit, 64-bit, or 128-bit bus interface. Configurable arbitration is included and up to 8 memory banks can be open simultaneously.

The User FPGA Dataflow Manager has two of its own 128M x 16 DDR3 memories and these memories are useful for bulk memory storage.

As always, we provide examples and reference designs to help you with all of your memory interface issues. Please check with us to make sure that what we ship for no charge meets your requirements.

Board to Board Dataflow via GTP serial transceivers

The DNBFC_S12_PCIe has expansion capabilities using the gigabit transceivers on the LX150T, labeled on the block diagram as the USER DATAFLOW MANAGER FPGA. The LX150T has a total of eight 3.125 Gb/s transceivers. Two non-proprietary Samtec connectors contain 4 GTP lanes each. Eight general purpose FPGA I/Os are also included. A standard cable is used to connect the installed boards in a daisy chain. The last board in the chain is connected back to the first board, completing the loop. Four GTP lanes running at 3.125 Gb/s each are capable of transmitting and receiving a combined data bandwidth of more than 2 GB/s (>1 GB/s each for independent TX and RX). This GTP daisy chain allows the user to move large amounts of data board to board without the intervention of the host processor, significantly speeding up algorithms that contain multiple different stages.
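The >1 GB/s per direction figure can be sanity-checked from the lane count and line rate. The sketch below assumes 8b/10b line coding (10 bits on the wire for every 8 bits of payload), which is our assumption and is not stated on this page; the function name and parameters are likewise hypothetical. Protocol framing overhead is ignored.

```python
def gtp_throughput_gbytes(lanes=4, line_rate_gbps=3.125, coding_efficiency=0.8):
    """Return usable GB/s in one direction for a group of GTP lanes.

    coding_efficiency=0.8 models 8b/10b encoding. This is an assumption;
    the product description does not name the line coding.
    """
    raw_gbps = lanes * line_rate_gbps          # bits on the wire, one direction
    payload_gbps = raw_gbps * coding_efficiency
    return payload_gbps / 8                    # 8 bits per byte


if __name__ == "__main__":
    per_direction = gtp_throughput_gbytes()
    print(f"per connector, per direction: {per_direction:.2f} GB/s")
    print(f"TX + RX combined: {2 * per_direction:.2f} GB/s")
```

Under that assumption one connector carries about 1.25 GB/s in each direction, which is consistent with the >1 GB/s figure quoted above.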

Power Consumption

The PCI Express specification limits slot power to 25 watts. The DNBFC_S12_PCIe is capable of consuming significantly more than that. In addition to the PCIe fingers, a separate HDD connector adds a second path for power. This product is shipped with adequate heatsinking to dissipate 50 watts per board, and we supply enough airflow to remove the heat from 12 installed DNBFC_S12_PCIe cards. We ship high reliability passive heatsinks on the FPGAs with an option for active heatsinks (i.e. with fan).
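When planning a design, it can help to check estimated FPGA power against these limits. The sketch below is a minimal budget check using the 25 W slot limit, the 50 W board budget, and the 5 W per FPGA figure quoted on this page. The function name, the `overhead_w` parameter, and the idea of summing per-FPGA estimates are our own illustration; real estimates should come from your power analysis tools.

```python
SLOT_LIMIT_W = 25.0      # PCIe slot power limit quoted above
BOARD_BUDGET_W = 50.0    # heatsinking and airflow budget per board
PER_FPGA_LIMIT_W = 5.0   # power/cooling figure per FPGA from the feature list
BOARDS_PER_CHASSIS = 12


def check_board(fpga_watts, overhead_w=0.0):
    """Compare estimated per-FPGA power (a list of watts) with the board limits.

    overhead_w is a hypothetical allowance for memories, regulators, etc.
    """
    total = sum(fpga_watts) + overhead_w
    return {
        "total_w": total,
        "within_per_fpga_limit": all(w <= PER_FPGA_LIMIT_W for w in fpga_watts),
        "within_board_budget": total <= BOARD_BUDGET_W,
        "needs_aux_power": total > SLOT_LIMIT_W,
    }


if __name__ == "__main__":
    print(check_board([3.0] * 13))
    print(f"chassis FPGA card budget: {BOARDS_PER_CHASSIS * BOARD_BUDGET_W:.0f} W")
```

Note that 13 FPGAs each drawing the full 5 W would total 65 W, so the 50 W board budget is the tighter constraint when every device is heavily loaded.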

Status LEDs, Debug

Although no specific testing was performed, sophisticated statistical finite element models and back-of-the-envelope calculations show the status LEDs to be numerous and bright enough to provide emergency illumination for a small parking structure. These LEDs are user controllable from the FPGAs, so they can be used as visual feedback in addition to emergency lighting. A JTAG connector provides an interface to ChipScope and other third party debug tools.
