NetFPGA

NetFPGA Guide

From NetFPGAWiki

Introduction

The NetFPGA is a low-cost platform, primarily designed as a tool for teaching networking hardware and router design. It has also proved to be a useful tool for networking researchers. Through partnerships and donations from sponsors of the project, the NetFPGA is widely available to students, teachers, researchers, and anyone else interested in experimenting with new ideas in high-speed networking hardware.

Usage Models

At a high level, the board contains four 1 Gigabit/second Ethernet (GigE) interfaces, a user-programmable Field Programmable Gate Array (FPGA), and four banks of locally-attached Static and Dynamic Random Access Memory (SRAM and DRAM). It has a standard PCI interface, allowing it to be connected to a desktop PC or server. Reference designs that can be readily loaded into the NetFPGA hardware are available from the http://NetFPGA.org website, including a hardware-accelerated Network Interface Card (NIC) and an Internet Protocol Version 4 (IPv4) router. The router kit allows the NetFPGA to interoperate with other IPv4 routers.

Image:System_Diagram_1.gif

The NetFPGA offloads processing from a host processor. The host's CPU has access to main memory and can use DMA to read and write registers and memories on the NetFPGA. Unlike other open-source projects, the NetFPGA provides a hardware-accelerated datapath: a direct hardware interface connected to four GigE ports and multiple banks of local memory installed on the card.

NetFPGA packages (NFPs) are available that contain source code (for both hardware and software) implementing networking functions. Using the reference router as an example, there are three main ways that a developer can use an NFP. In the first usage model, the default router hardware is configured into the FPGA and the software is modified to implement a custom protocol.

Image:System_Diagram_2.gif

Another way to modify the NetFPGA is to start with the reference router and extend the design with a custom user module. Finally, it is also possible to implement a completely new design where the user can place their own logic and data processing functions directly in the FPGA.

  1. Use the hardware as-is as an accelerator and modify the software to implement new protocols. In this scenario, the NetFPGA board is programmed with IPv4 hardware and the Linux host uses the Router Kit Software distributed in the NFP. The Router Kit daemon mirrors the routing table and ARP cache from software to the tables in the hardware, allowing for IPv4 routing at line rate. The user can modify Linux to implement new protocols and test them using the full system.
  2. Start with the provided hardware from the official NFP (or from a third-party NFP), modify it by using modules from the NFP's library or by writing your own Verilog code, then compile the source code using industry-standard design tools. The implemented bitfile can then be downloaded to the FPGA. The new functionality can be complemented by additional software or modifications to the existing software. For the IPv4 router, an example of this would be implementing a trie-based longest prefix match (LPM) lookup instead of the currently implemented CAM-based LPM lookup for the hardware routing table. Another example would be to modify the router to implement NAT or a firewall.
  3. Implement a new design from scratch: The design can use modules from the official NFP's library or third party modules to implement the needed functionality or can use completely new source code.
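To make the trie-versus-CAM distinction in the second usage model concrete, here is a minimal sketch of longest prefix match using a binary trie. It is only a Python illustration of what the lookup computes, not the NetFPGA's Verilog implementation; the `PrefixTrie` class and the interface names used as next hops are hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Binary trie for IPv4 longest prefix match (software illustration only)."""

    def __init__(self):
        # Each node is a dict: keys 0/1 point to children, "hop" stores a next hop.
        self.root = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk the prefix bits from the most significant bit downwards.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.get("hop")  # default route (/0), if one was inserted
        for i in range(32):
            bit = (addr >> (31 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            if "hop" in node:
                best = node["hop"]  # a longer matching prefix was found
        return best


if __name__ == "__main__":
    trie = PrefixTrie()
    trie.insert("0.0.0.0/0", "nf2c0")
    trie.insert("10.0.0.0/8", "nf2c1")
    trie.insert("10.1.0.0/16", "nf2c2")
    print(trie.lookup("10.1.2.3"))   # nf2c2
    print(trie.lookup("10.9.9.9"))   # nf2c1
    print(trie.lookup("192.0.2.1"))  # nf2c0
```

Each lookup visits at most 32 nodes, one per address bit, regardless of how many routes are stored.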

Major Components

A block diagram that shows the major components of NetFPGA platform is shown below.

Block diagram of the major components of the NetFPGA

The NetFPGA platform contains one large Xilinx Virtex-II Pro 50 FPGA, which is programmed with user-defined logic and has a core clock that runs at 125 MHz. The NetFPGA platform also contains one small Xilinx Spartan II FPGA that implements the control logic for the PCI interface to the host processor.

Two 18 Mbit external Cypress SRAMs, each arranged as 512k words by 36 bits (4.5 MBytes total), operate synchronously with the FPGA logic at 125 MHz. One bank of external Micron DDR2 SDRAM is arranged as 16M words by 32 bits (64 MBytes total). Using both edges of a separate 200 MHz clock, the memory has a bandwidth of 400 MWords/s (1,600 MBytes/s = 12,800 Mbit/s).
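The memory figures above follow directly from the word counts, widths, and clock rates. The short Python check below recomputes them; note that capacities use binary prefixes (1 MByte = 2^20 bytes) while bandwidth uses decimal ones (1 MHz = 10^6 Hz), which is how the figures above are stated. The helper name `capacity_bytes` is hypothetical.

```python
# Recompute the NetFPGA memory figures from the stated word counts and widths.

def capacity_bytes(words: int, width_bits: int, chips: int = 1) -> int:
    """Total capacity in bytes for `chips` devices of `words` x `width_bits`."""
    return words * width_bits * chips // 8


K = 1024
M = 1024 * 1024

sram_bytes = capacity_bytes(512 * K, 36, chips=2)  # two 512k x 36 SRAMs
dram_bytes = capacity_bytes(16 * M, 32)            # one 16M x 32 DDR2 bank

# DDR transfers a word on both clock edges: 2 x 200 MHz = 400 M words/s.
dram_words_per_s = 2 * 200_000_000
dram_bits_per_s = dram_words_per_s * 32

print(sram_bytes / M)             # 4.5     (MBytes)
print(dram_bytes / M)             # 64.0    (MBytes)
print(dram_bits_per_s / 1e6)      # 12800.0 (Mbit/s)
print(dram_bits_per_s / 8 / 1e6)  # 1600.0  (MBytes/s)
```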

The Broadcom Gigabit/second external physical-layer transceiver (PHY) sends packets over standard category 5, 5e, or 6 twisted-pair cables. The quad PHY interfaces with four Gigabit Ethernet Media Access Controllers (MACs) instantiated as a soft core on the FPGA. The NetFPGA also includes two interfaces with Serial ATA (SATA) connectors that enable multiple NetFPGA boards in a system to exchange traffic directly without use of the PCI bus.

How to read this Guide

Depending on your goals, you may find certain chapters of this guide more relevant than others.

To set up a laboratory

If your task is to set up machines, start by reading the steps to obtain hardware and software, follow the steps to install software, then verify the software and hardware.

To use the NetFPGA packages

If you already have NetFPGA systems up and running in your laboratory and want to understand how it works, read the walkthroughs of the Reference Designs to understand the operation of the reference NIC, the software component of the router (SCONE), the router kit, the reference router hardware, and the buffer monitoring system.

Connecting with the Community

We encourage feedback and discussion about the progress and problems with the NetFPGA. A bug-tracking system called Bugzilla is available to read about and report bugs. A forum is available to communicate with other members of the community and submit patches.

Track Bugs with Bugzilla

We track and maintain bugs using Bugzilla.

NetFPGA Forums

We've created a forum where users can post their questions and have them answered. The forum is easily searchable and anyone can register. It is available here.

  • Feel free to use the forum for:
    • Announcements on progress with the NetFPGA
    • General questions about the scripts and code
    • Answers to questions (feel free to contribute)
    • Submit patches! We always welcome these :)

NetFPGA-Beta Email list

This is here for historic purposes. The email list has been superseded by the NetFPGA Forums (see above). Please DO NOT use the beta e-mail list!

NO GUARANTEES

We do not guarantee that any or all of the NetFPGA components will work for you. FPGAs allow for an enormous range of freedom in the implementation of circuits. We do not guarantee that anything you get from us will not damage the hardware on the NetFPGA, the software on the PC, or anything else. And finally we do not guarantee that you will get support. However, we do guarantee that we did/will do our best. Hence, the license.

Obtain Hardware and Software

The first thing to do is put the board in the box and make sure it runs. To get started, you'll need to perform the following steps:

Obtaining NetFPGA Hardware
How you can acquire NetFPGA hardware
Obtaining a Host PC for the NetFPGA
How you can buy or build your Host PC.
Obtaining Gateware/Software Package
How you can get an account and download the Beta package from NetFPGA.org
Obtain Designs
Projects:Packet_generator
Contributed Designs

Obtaining NetFPGA Hardware

Image:NetFPGA_150.gif

The NetFPGA boards can be obtained from a third-party company, Digilent Inc. The cards are sold at a discounted price for educational use. They are also available for commercial use, but at a higher price. Stanford University provided the open reference design to Digilent Inc., but is not involved in the sale of the hardware.

Complete NetFPGA systems can also be ordered on-line that include the NetFPGA hardware pre-installed in a host PC.

Ordering From the Web

The easiest way to purchase hardware is to order on-line from Digilent

Ordering with a Purchase Order by Email or Phone

Academic institutions can order the hardware with a discount by placing a purchase order.

  • To start the process, send an email to request a quote from: sales@digilentinc.com
  • Have your university execute a purchase order and have that sent to Digilent Inc.
  • For help with ordering, call: (509) 334-6306

Obtaining a Host PC for the NetFPGA

NetFPGA host systems can be built from commodity, off-the-shelf (COTS) parts. The NetFPGA card fits into a standard PCI slot in a desktop or server-class PC. We have tested the NetFPGA in only a few widely available systems. The NetFPGA may work with other PCs as well, but we do not support such configurations.

There are currently multiple ways to obtain a NetFPGA host system:

  1. Assemble your own PC from components
  2. Purchase a Dell 2950 from Dell.com then add the NetFPGA.
  3. Purchase a complete pre-built system

To install a NetFPGA, you will need to open the case of your computer. To minimize the chance that you damage your computer or the NetFPGA module, we suggest that you wear an anti-static wrist strap when handling the hardware.

Assemble your PC from Components

The most cost-effective way to build a high-performance NetFPGA host system is to purchase the components from on-line vendors and assemble your own machine. This effort is not for the faint of heart, however, as you will need to place multiple orders for components and have the time to assemble and test the PC. We assembled all of the nf-test machines at Stanford University. You can use the Bill of Materials (BOM) below to do the same.

Image:CAD_PC.jpg

List of PC Components

At Stanford, we built 11 nf-test PCs that we use in the lab and used at the North American tutorials. This is the least expensive way to build a high-end development system, but does take some time to assemble the parts.

Motherboard
  • Use Micro ATX (uATX) for small case
  • Option 1: Gigabyte MA78GM-US2H mATX MB
    • AMD 780G Chipset / SB 700 / Rev 1.0
    • Includes one port of GigE on the motherboard
    • Includes ATI Radeon HD 3200 video (leaves PCI-E slot open)
    • DDR2 1200 DRAM support (supports RAM faster than DDR2 800)
    • 2PCI+PCIe x1+PCIe x16
    • AM2+ Phenom II Support (allows for use of quad-core CPU)
      • Be sure to upgrade BIOS to latest available to make use of all features
    • Availability
  • Option 2: ASUS M2N-VM DVI - Micro ATX Motherboard
    • Item=N82E16813131214 from NewEgg.com : $59.99
    • Set the BIOS to use the on-board Video. Use the PCI-express bus for the NIC
    • We built a dozen nf-test cube machines at Stanford using this motherboard in 2007-2008, combined with the dual-core CPU. If you can no longer locate this older board, use option 1.
CPU
Host Memory
DVD Reader/Writer (for boot disk)
MicroATX Chassis with clear covers
Intel Pro/1000 Dual-port Gigabit PCI Express x4 NIC
Hard Disk
Cat5E or Cat6 Ethernet Cables
  • Category 5e or Category 6 Ethernet Cables
    • Short-length: 1 foot ~= 30 cm, Blue (for host)
    • Short-length: 1 foot ~= 30 cm, Orange (for host)
    • Medium-length: 6 foot ~= 2m, White (for neighbor machine)
    • Medium-length: 6 foot ~= 2m, Red (for neighbor machine)
    • Long-length: 12 foot ~= 4m, Blue (for Internet)
Other Misc. Parts
Total estimated cost to build a cube
  • About $700 USD
  • Note that prices will vary
    • (but generally become less expensive over time)

Purchase a Dell 2950

A pre-configured Dell 2950 2U Rackmount PC can be purchased from Dell. We have verified that the NetFPGA works in the PCI-X slot of the Dell 2950 2U Rack-mount server. The cost for a pre-built Dell server typically ranges from $3,000 to $5,000 depending on the configuration you select. Running the selftest requires purchasing a SATA cable and two Ethernet cables.

Image:Dell_2950.jpg

Note: When installing the NetFPGA in a system, it is important that the card is securely fastened to the chassis.

In addition to locking in the faceplate at the front of the system, the card should also be locked in at the rear using a mounting bracket. As shown below, there is a gap between the end of the card and the rear of the slot, which is sized to hold a full-length PCI card:

Image:NetFPGA_in_Dell_2950.jpg

For PCs that have standard full-length PCI slots, retainers to secure the back of the NetFPGA to the chassis are available from Gompf: http://www.bracket.com/downloads/brackets/pdf/91060000AFG.pdf

The actual bracket required depends on the size of the chassis. http://www.bracket.com/retainerslist.asp

To use the card in the Dell 2950, a laser-cut extender was built to enable the card to extend to the end of the card slot. This may help if you ship the NetFPGA in the 2950, or if the 2950 is in a high-vibration environment. Support bracket design files and ordering info.

During shipment, the printed circuit board can vibrate or shake within the chassis, causing mechanical damage. Systems should not be shipped with the NetFPGA card pre-installed.

Purchase a Pre-built Machine

A third-party vendor has just started building complete systems with the NetFPGA hardware and software pre-installed. Complete turn-key systems, including the NetFPGA card, are available from Accent Technology Inc.

http://www.accenttechnologyinc.com/product_details.php?category_id=0&item_id=1

Image:Prebuilt_NetFPGA_System.jpg

During shipment, the printed circuit board can vibrate or shake within the chassis, causing mechanical damage. Systems should not be shipped with the NetFPGA card pre-installed. Cards should be shipped separately from the chassis and installed on site to avoid damage.

Obtaining Gateware/Software Package

The Beta release of the NetFPGA Package (NFP) contains the source code for gateware, system software, and regression tests. The NFP includes an IPv4 Router, a four-port NIC, an IPv4 Router with Output Queues Monitoring System, the PW-OSPF software that interacts with the IPv4 Router (SCONE), and the Router Kit which is a daemon that reflects the routing table and ARP cache from the Linux host to the IPv4 router on NetFPGA.
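The Router Kit's job is to reflect the host's routing table into hardware. Its internals are not described in this guide, so the following is only a sketch of the first half of that job: reading the kernel's IPv4 routing table from `/proc/net/route`, the file Linux uses to expose it. The `Route` type and function names are hypothetical, and pushing the entries into the NetFPGA's tables is not shown.

```python
import os
import socket
import struct
from typing import List, NamedTuple


class Route(NamedTuple):
    iface: str
    prefix: str   # e.g. "192.168.0.0/24"
    gateway: str  # "0.0.0.0" for directly connected routes


def _hex_to_ip(field: str) -> str:
    # /proc/net/route prints each address as the raw 32-bit value in the
    # host's native byte order, so pack it back natively to recover the bytes.
    return socket.inet_ntoa(struct.pack("=I", int(field, 16)))


def parse_proc_net_route(text: str) -> List[Route]:
    routes = []
    for line in text.splitlines()[1:]:  # skip the header line
        cols = line.split()
        if len(cols) < 8:
            continue
        iface, dest, gw, mask = cols[0], cols[1], cols[2], cols[7]
        plen = bin(int(mask, 16)).count("1")  # bit count is byte-order independent
        routes.append(Route(iface, f"{_hex_to_ip(dest)}/{plen}", _hex_to_ip(gw)))
    return routes


if __name__ == "__main__":
    if os.path.exists("/proc/net/route"):
        with open("/proc/net/route") as f:
            for route in parse_proc_net_route(f.read()):
                print(route)
```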

The instructions in this section have been superseded by the instructions in the Install Software 1.2 section below.

Register to download the Beta NetFPGA Package (NFP)

To download the Beta NFP:

  1. if you don't already have a dev or alpha account, sign up for a new beta account as:
    http://netfpga.org/netfpgawiki/index.php?title=Special:Userlogin&type=signup
  2. when your new account is created:
    1. you will be automatically added to the NetFPGA Beta mailing list.
      1. This email list will be used to post announcements about the NetFPGA
      2. Let your SPAM filter pass email for: netfpga-beta@lists.stanford.edu
    2. you will be given a Beta account on the NetFPGA Wiki
      1. Note that the first letter of the login name is Capitalized
      2. Remember this password, as you'll need it to download source code
    3. you will also be sent an email message from NetFPGAwiki.
      1. within that message will be a URL that needs to be opened
      2. click on the URL to verify that the email address you provided is valid.
  3. if you have an account but have forgotten your password, click the e-mail password button on:
    http://netfpga.org/netfpgawiki/index.php?title=Special:Userlogin

Download the Beta NetFPGA Package (NFP)

The NFP currently comprises two tarballs:

  • netfpga_base_beta_1_x.tar.gz, which includes regression scripts and binary versions of the reference projects. Replace 'x' with the latest version.
  • netfpga_lib.tar.gz, which includes all external Java code needed by the router GUI

Download the tarballs from http://NetFPGA.org/beta/distributions. Later, you will unpack them in the same directory.
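Because the tarball version is a number, a plain alphabetical sort can pick the wrong file (`1_10` sorts before `1_9`). This hypothetical helper picks the newest `netfpga_base_beta_1_x.tar.gz` by comparing `x` numerically; it assumes `x` is a plain integer, which this guide does not guarantee.

```python
import re
from typing import Iterable, Optional

# Assumed pattern, based on the netfpga_base_beta_1_x.tar.gz naming above.
_PATTERN = re.compile(r"^netfpga_base_beta_1_(\d+)\.tar\.gz$")


def newest_base_tarball(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest numeric 'x', or None if none match."""
    best_name, best_x = None, -1
    for name in names:
        m = _PATTERN.match(name)
        if m and int(m.group(1)) > best_x:
            best_name, best_x = name, int(m.group(1))
    return best_name


if __name__ == "__main__":
    listing = ["netfpga_lib.tar.gz",
               "netfpga_base_beta_1_2.tar.gz",
               "netfpga_base_beta_1_10.tar.gz",
               "netfpga_base_beta_1_9.tar.gz"]
    print(newest_base_tarball(listing))  # netfpga_base_beta_1_10.tar.gz
```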

Download the extended NetFPGA Package (BetaPlus, optional)

An extended package that includes source code to the reference router is available to instructors of courses like Stanford's CS344 and researchers that need full source code to the router. The full source distribution is NOT available to the general public because course instructors that use the NetFPGA assign projects to implement components of the router.

Users that join the BetaPlus group must promise not to redistribute the source code. For the sake of educating future generations of networking students, it is critical that teachers and researchers that download this package do not redistribute it.


To qualify for access to the additional source code:

  1. Complete the survey on-line as: http://netfpga.org/survey.html
  2. Send an email to jwlockwd@stanford.edu to request access to the Beta-plus group
    1. Be sure to send the email from .edu domain and include your credentials (position, homepage)
    2. Provide your Wiki login in your message (just the login, not the password)
    3. Provide a written guarantee that you will not re-distribute the source code or make it available to any students that may take a NetFPGA-like course
  3. Allow 7 days for review of your application
  4. Upon receipt of email confirmation that your application has been approved, download the NFP from: http://NetFPGA.org/betaplus

Install Software 1.2

Installing an Operating System on the Host PC
Describes how to install CentOS on the host computer
Software installation 1.2
Install the NetFPGA device driver and self-test program & bitfile
Install CAD Tools
Install Computer Aided Design tools to enable synthesis and simulation of hardware circuits (Optional)

Installing an Operating System on the Host PC

We support use of the popular Linux distribution CentOS as the operating system for the host PC. CentOS is a free rebuild of Red Hat Enterprise Linux.

CentOS Installation Instructions

We have tested the NetFPGA with the 32-bit version of CentOS 4.4, CentOS 4.5, CentOS 5.1, and CentOS 5.2 operating systems.

You can create your own bootable CentOS DVD by downloading an ISO [1]

Burn the image onto a DVD:

Image:CentOS_DVD.jpg

Install CentOS http://netfpga.org/CentOS_Install.pdf

  • We use GNOME as the default window manager
  • Do not install Virtualization
  • Set SELinux to: Not enforcing
    • Otherwise you will need to manually adjust security settings
  • Attach your Eth0 (motherboard's primary interface) to the external network
    • It is easiest to use DHCP to set the IP address, gateway, DNS

Apply CentOS Updates

  • After installation, Package updater should prompt for an update
  • Select to install all updates

Other tested but unsupported operating systems

Use of other operating systems is possible, but we do not support them.


Software installation

For archival purposes, the install instructions for NetFPGA Package 1.0 can be found at Install_Software_1.0. Use the instructions below to install the newest version of the NetFPGA package.

Log in as root

  • Log in as root or 'su -' to root

Install Java

  • Download the Java JDK (JDK 6 Update 6) Linux RPM in self-extracting file from SUN
Java JDK 6 update 6
  • If running the command:
 java -version
  • Add execute permission to JDK file
 chmod +x jdk-6u6-linux-i586-rpm.bin
  • Install JDK. Scroll down and say 'yes' when prompted.
 ./jdk-6u6-linux-i586-rpm.bin
  • Install the key for the JPackage repository
 rpm --import http://jpackage.org/jpackage.asc
  • Install the JPackage repository information for yum
 cd /etc/yum.repos.d
wget http://www.jpackage.org/jpackage17.repo
  • Install the Java JRE
yum -y --enablerepo=jpackage-generic-nonfree install java-1.6.0-sun-compat.i586
  • Expected Output
 Dependencies Resolved

=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
java-1.6.0-sun-compat i586 1.6.0.06-1jpp jpackage-generic-nonfree 54 k

Transaction Summary
=============================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 54 k
Is this ok [y/N]: y
Downloading Packages:
(1/1): java-1.6.0-sun-com 100% |=========================| 54 kB 00:00
Running Transaction Test
Finished Transaction Test
Running Transaction
Installing : java-1.6.0-sun-compat ######################### [1/1]

Installed: java-1.6.0-sun-compat.i586 0:1.6.0.06-1jpp
Complete!
  • Set default JAVA path to new JRE
 /usr/sbin/alternatives --config java
Expected Output
 There are 2 programs which provide 'java'.

Selection Command
-----------------------------------------------
1 /usr/lib/jvm/jre-1.4.2-gcj/bin/java
*+ 2 /usr/lib/jvm/jre-1.6.0-sun/bin/java

Enter to keep the current selection[+], or type selection number:
Select number corresponding to jre-1.6.0-sun
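To confirm that the selected `java` is the Sun 1.6 JRE, you can parse the banner printed by `java -version`. The sketch below assumes the banner contains a string like `version "1.6.0_06"`, as Sun's JRE prints; the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_java_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor), e.g. 'java version "1.6.0_06"' -> (1, 6)."""
    m = re.search(r'version "(\d+)\.(\d+)', banner)
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_java_version() -> Optional[Tuple[int, int]]:
    try:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no java on the PATH
    return parse_java_version(proc.stderr)


if __name__ == "__main__":
    v = installed_java_version()
    if v is None:
        print("java not found or banner not recognised")
    else:
        print(f"java {v[0]}.{v[1]}, 1.6 or later: {v >= (1, 6)}")
```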

Install RPMforge Yum repository

  • Install the RPMforge repository for your operating system.
For CentOS 4:
CentOS 4 wiki documentation on installing RPMforge
For CentOS 5:
CentOS 5 wiki documentation on installing RPMforge

Install NetFPGA Base Package

  • Install NetFPGA yum repository and GPG Key - there are two different versions for CentOS 4 and 5. To determine your version, run the command:
 cat /etc/redhat-release
For CentOS 4:
rpm -Uhv http://netfpga.org/yum/el4/RPMS/noarch/netfpga-repo-1-1_CentOS4.noarch.rpm
For CentOS 5:
rpm -Uhv http://netfpga.org/yum/el5/RPMS/noarch/netfpga-repo-1-1_CentOS5.noarch.rpm
  • Next, for both versions, run the following command to install the NetFPGA base package
yum install netfpga-base
  • Note that there may be some dependencies. Select 'y' to install these dependent packages.
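The CentOS 4/5 choice above can be scripted. This sketch reads `/etc/redhat-release`, assumes it contains text of the form `CentOS release 5.2 (Final)`, and prints (but does not run) the matching `rpm -Uhv` command using the two URLs given above; `repo_rpm_for` is a hypothetical name.

```python
import re
from typing import Optional

# Repository RPM URLs copied from the instructions above.
REPO_RPMS = {
    4: "http://netfpga.org/yum/el4/RPMS/noarch/netfpga-repo-1-1_CentOS4.noarch.rpm",
    5: "http://netfpga.org/yum/el5/RPMS/noarch/netfpga-repo-1-1_CentOS5.noarch.rpm",
}


def repo_rpm_for(release_text: str) -> Optional[str]:
    """Map the contents of /etc/redhat-release to the matching repo RPM URL."""
    m = re.search(r"release (\d+)", release_text)
    return REPO_RPMS.get(int(m.group(1))) if m else None


if __name__ == "__main__":
    try:
        with open("/etc/redhat-release") as f:
            url = repo_rpm_for(f.read())
    except OSError:
        url = None
    print(f"rpm -Uhv {url}" if url else "No matching NetFPGA repository RPM for this system")
```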

Create NF2 directory in your user account

Run the following script to copy the entire NF2 directory into your account (typically: /root/NF2). WARNING: Running this command WILL overwrite any existing NF2 directory or files in your user account! If you have files that you want to preserve, 'mv' your NF2 directory to another location, such as NF2_backup.
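If you prefer to script the backup, the hypothetical helper below moves an existing `NF2` directory to `NF2_backup` (or `NF2_backup1`, `NF2_backup2`, and so on if an earlier backup exists) before you run the setup script. It deliberately has no `__main__` block because it moves files.

```python
import os
import shutil


def backup_nf2(home: str) -> str:
    """Move <home>/NF2 aside before running user_account_setup.pl.

    Returns the backup path, or "" if there was no NF2 directory to move.
    """
    src = os.path.join(home, "NF2")
    if not os.path.exists(src):
        return ""
    # Choose NF2_backup, then NF2_backup1, NF2_backup2, ... so that an
    # older backup is never overwritten.
    dst = os.path.join(home, "NF2_backup")
    n = 1
    while os.path.exists(dst):
        dst = os.path.join(home, f"NF2_backup{n}")
        n += 1
    shutil.move(src, dst)
    return dst
```

For example, running `backup_nf2(os.path.expanduser("~"))` as root would move `/root/NF2` aside.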

To copy the NetFPGA directory and set the environment variables, run the following command:

 /usr/local/NF2/lib/scripts/user_account_setup/user_account_setup.pl

It also adds the following environment variables to your .bashrc file.

  • NF2_ROOT
  • NF2_DESIGN_DIR
  • NF2_WORK_DIR
  • PYTHONPATH
  • PERL5LIB
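These variables only take effect in shells that read the updated `.bashrc`, so a new login (or `source ~/.bashrc`) is needed before they are visible. A quick check, sketched in Python with a hypothetical function name:

```python
import os
from typing import List, Mapping, Optional

# Variable names listed above; user_account_setup.pl adds them to ~/.bashrc.
NF2_VARS = ["NF2_ROOT", "NF2_DESIGN_DIR", "NF2_WORK_DIR", "PYTHONPATH", "PERL5LIB"]


def missing_nf2_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the NetFPGA variables that are unset or empty (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in NF2_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_nf2_vars()
    if missing:
        print("Not set (open a new shell or run 'source ~/.bashrc'):", ", ".join(missing))
    else:
        print("All NetFPGA environment variables are set")
```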

Reboot your machine

Reboot your machine in order to finalize the installation.

Install CAD Tools

We provide the Verilog source code for the modules so that users can compile, simulate, and synthesize gateware for the NetFPGA. We have tested simulation and synthesis using a specific version of the Xilinx tools (as described below). Use of other versions of the tools (older or newer) is not supported. If you do not plan to rebuild the hardware circuits, you can skip installation of CAD tools.

Install Xilinx ISE

  • Xilinx: ISE Foundation, Version: 9.2i SP4
    • Install Service Pack 4
    • Install IP Update 2
    • Use of other versions of the tools (older or newer) is not supported.
    • Obtain a license for the V2Pro TEMAC core from Xilinx.
      • Part Number: DO-DI-TEMAC, Ver 3.3
      • For a free evaluation copy
        • Request "Full System hardware Evaluation"
        • Allows use of the TEMAC for 30 days, 8 hour run-time
      • Academic users can request a donation of the core and CAD tools
        • Mention use of the NetFPGA when you submit the request
      • Commercial users can purchase the core through their local sales representative.

Install ModelSim

  • To simulate Verilog, install:
    • Mentor Graphics: ModelSim
      • Version SE 6.2G
      • Allows simulation of circuits and viewing of simulated waveforms.
      • Testbench software assumes use of this version of ModelSim.

Debug with ChipScope

  • To debug signals on the FPGA using an on-chip logic analyzer, install:
    • Xilinx: ChipScope Pro
      • Version 9.1.02i
      • Allows monitoring of signals on NetFPGA
      • Requires use of a PC with JTAG interface

Install Memory Modules for Simulation

Micron DDR2 SDRAM

Cypress SRAM

Verify the software and hardware

Compile and Load Driver
Run the makefile to build the executables
Run Selftest
Verify the functionality of your NetFPGA system
Run Regression Tests
Each project has a set of regression tests that verify the functionality of the distributed code.
These should be run before starting to use any of the projects from the NFP.


Compile and Load Driver

Compile driver and tools

  • Compile
cd ~/NF2/
make
  • Sample correct output:
make -C C
make[1]: Entering directory `/home/gac1/temp/NF2/lib/C'
make -C kernel
make[2]: Entering directory `/home/gac1/temp/NF2/lib/C/kernel'
make -C /lib/modules/2.6.9-55.0.9.ELsmp/build M=/home/gac1/temp/NF2/lib/C/kernel LDDINC=/home/gac1/temp/NF2/lib/C/kernel/../include modules
make[3]: Entering directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-i686'
Building modules, stage 2.
MODPOST
make[3]: Leaving directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-i686'
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/C/kernel'
make -C download
make[2]: Entering directory `/home/gac1/temp/NF2/lib/C/download'
make -C ../common
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/common'
make[3]: Nothing to be done for `all'.
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/common'
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/C/download'
make -C reg_access
make[2]: Entering directory `/home/gac1/temp/NF2/lib/C/reg_access'
make -C ../common
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/common'
make[3]: Nothing to be done for `all'.
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/common'
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/C/reg_access'
make -C router
make[2]: Entering directory `/home/gac1/temp/NF2/lib/C/router'
gcc -lncurses cli.o ../common/nf2util.o ../common/util.o ../common/reg_defines.h -o cli
gcc -lncurses regdump.o ../common/nf2util.o ../common/reg_defines.h -o regdump
gcc -lncurses show_stats.o ../common/nf2util.o ../common/util.o ../common/reg_defines.h -o show_stats
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/C/router'
make[1]: Leaving directory `/home/gac1/temp/NF2/lib/C'
make -C scripts
make[1]: Entering directory `/home/gac1/temp/NF2/lib/scripts'
make -C cpci_reprogram
make[2]: Entering directory `/home/gac1/temp/NF2/lib/scripts/cpci_reprogram'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/scripts/cpci_reprogram'
make -C cpci_config_reg_access
make[2]: Entering directory `/home/gac1/temp/NF2/lib/scripts/cpci_config_reg_access'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/scripts/cpci_config_reg_access'
make[1]: Leaving directory `/home/gac1/temp/NF2/lib/scripts'
  • If you get an error message such as "make: *** /lib/modules/2.6.9-42.ELsmp/build: No such file or directory. Stop.", then the kernel sources needed to build the driver are not installed.
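The `make -C` line in the sample output shows that the driver is built against `/lib/modules/<kernel release>/build`, where the release is what `uname -r` prints. This sketch checks whether that directory exists for the running kernel before you run `make`; the function names are hypothetical. On CentOS this directory is typically provided by the kernel development package (`kernel-devel`, or `kernel-smp-devel` for SMP kernels on CentOS 4), but check which package matches your kernel.

```python
import os
from typing import Optional


def kernel_build_dir(release: Optional[str] = None, root: str = "/lib/modules") -> str:
    """Directory the driver Makefile passes to `make -C`."""
    release = release or os.uname().release  # same value as `uname -r`
    return os.path.join(root, release, "build")


def have_kernel_sources(release: Optional[str] = None, root: str = "/lib/modules") -> bool:
    # isdir() follows the usual build -> /usr/src/kernels/... symlink, so a
    # dangling link (sources not installed) correctly reports False.
    return os.path.isdir(kernel_build_dir(release, root))


if __name__ == "__main__":
    path = kernel_build_dir()
    print(path, "found" if have_kernel_sources() else "missing: install the kernel sources")
```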


Load driver and tools

  • Install the driver and reboot. The driver will be stored in /lib/modules/`uname -r`/kernel/drivers/nf2.ko
make install
  • Sample correct output:
 for dir in lib bitfiles projects/scone/base projects/selftest/sw ; do \
make -C $dir install; \
done
make[1]: Entering directory `/home/gac1/temp/NF2/lib'
for dir in C scripts java/gui ; do \
make -C $dir install; \
done
make[2]: Entering directory `/home/gac1/temp/NF2/lib/C'
for dir in kernel download reg_access router ; do \
make -C $dir install; \
done
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/kernel'
make -C /lib/modules/2.6.9-55.0.9.ELsmp/build M=/home/gac1/temp/NF2/lib/C/kernel LDDINC=/home/gac1/temp/NF2/lib/C/kernel/../include modules
make[4]: Entering directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-i686'
Building modules, stage 2.
MODPOST
make[4]: Leaving directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-i686'
install -m 644 nf2.ko /lib/modules/`uname -r`/kernel/drivers/nf2.ko /sbin/depmod -a
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/kernel'
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/download'
install nf2_download /usr/local/bin
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/download'
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/reg_access'
install regread /usr/local/bin
install regwrite /usr/local/bin
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/reg_access'
make[3]: Entering directory `/home/gac1/temp/NF2/lib/C/router'
make[3]: Nothing to be done for `install'.
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/C/router'
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/C'
make[2]: Entering directory `/home/gac1/temp/NF2/lib/scripts'
for dir in cpci_reprogram cpci_config_reg_access ; do \
make -C $dir install; \
done
make[3]: Entering directory `/home/gac1/temp/NF2/lib/scripts/cpci_reprogram'
install cpci_reprogram.pl /usr/local/sbin
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/scripts/cpci_reprogram'
make[3]: Entering directory `/home/gac1/temp/NF2/lib/scripts/cpci_config_reg_access'
install dumpregs.sh /usr/local/sbin
install loadregs.sh /usr/local/sbin
make[3]: Leaving directory `/home/gac1/temp/NF2/lib/scripts/cpci_config_reg_access'
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/scripts'
make[2]: Entering directory `/home/gac1/temp/NF2/lib/java/gui'
make[2]: Nothing to be done for `install'.
make[2]: Leaving directory `/home/gac1/temp/NF2/lib/java/gui'
make[1]: Leaving directory `/home/gac1/temp/NF2/lib'
make[1]: Entering directory `/home/gac1/temp/NF2/bitfiles'
for bitfile in CPCI_2.1.bit cpci_reprogrammer.bit ; do \
install -D -m 0644 $bitfile /usr/local/NF2/bitfiles/$bitfile ; \
done
make[1]: Leaving directory `/home/gac1/temp/NF2/bitfiles'
make[1]: Entering directory `/home/gac1/temp/NF2/projects/scone/base'
make[1]: Nothing to be done for `install'.
make[1]: Leaving directory `/home/gac1/temp/NF2/projects/scone/base'
make[1]: Entering directory `/home/gac1/temp/NF2/projects/selftest/sw'
make[1]: Nothing to be done for `install'.
make[1]: Leaving directory `/home/gac1/temp/NF2/projects/selftest/sw'
  • Reboot the machine. The driver currently crashes upon rmmod, so a reboot is required to load the newly compiled driver. You may want to check if other users are on the machine with the 'who' command first. If you don't like the other current machine users or you're the only one on the machine, run the following:
reboot
  • After reboot log in as root.
  • Verify that the driver loaded:
lsmod  | grep nf2
  • Sample correct output:
nf2                    28428  0
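`lsmod` formats the contents of `/proc/modules`, whose first column is the module name, so a script can make the same check without parsing `lsmod`'s column layout. A minimal sketch (function names hypothetical):

```python
from typing import Optional


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if `name` is the first field of any line in /proc/modules text."""
    return any(line.split()[:1] == [name] for line in proc_modules_text.splitlines())


def read_proc_modules(path: str = "/proc/modules") -> Optional[str]:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_proc_modules()
    if text is None:
        print("/proc/modules not available")
    else:
        print("nf2 loaded" if module_loaded("nf2", text) else "nf2 NOT loaded")
```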

Verify NetFPGA interfaces

  • Verify that four nf2cX interfaces have successfully loaded:
ifconfig -a | grep nf2
  • Sample correct output:
nf2c0     Link encap:Ethernet  HWaddr 00:4E:46:32:43:00
nf2c1 Link encap:Ethernet HWaddr 00:4E:46:32:43:01
nf2c2 Link encap:Ethernet HWaddr 00:4E:46:32:43:02
nf2c3 Link encap:Ethernet HWaddr 00:4E:46:32:43:03
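To automate the interface check, the sketch below parses `ifconfig -a` output in the older net-tools layout shown above (`<name> Link encap:Ethernet HWaddr <mac>`) and reports which of `nf2c0` through `nf2c3` are missing. Newer net-tools releases print a different layout, which this regular expression does not handle; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Dict, List

EXPECTED = [f"nf2c{i}" for i in range(4)]
# Older net-tools layout, as in the sample output above.
_LINE = re.compile(r"^(nf2c\d+)\s+Link encap:Ethernet\s+HWaddr\s+([0-9A-Fa-f:]{17})")


def nf2_interfaces(ifconfig_text: str) -> Dict[str, str]:
    """Map nf2cX interface names to MAC addresses."""
    found = {}
    for line in ifconfig_text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            found[m.group(1)] = m.group(2).upper()
    return found


def missing_interfaces(found: Dict[str, str]) -> List[str]:
    return [name for name in EXPECTED if name not in found]


if __name__ == "__main__":
    if shutil.which("ifconfig"):
        out = subprocess.run(["ifconfig", "-a"], capture_output=True, text=True).stdout
        found = nf2_interfaces(out)
        print(found)
        print("missing:", missing_interfaces(found) or "none")
```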

Reprogram the CPCI

  • Run the cpci reprogramming script
/usr/local/sbin/cpci_reprogram.pl --all 
(to reprogram all NetFPGAs in a system)
  • Expected output:
 Loading the CPCI Reprogrammer on NetFPGA 0
Loading the CPCI on NetFPGA 0
CPCI on NetFPGA 0 has been successfully reprogrammed
  • Every time you restart the computer, you need to reload the CPCI!
  • To have the CPCI reprogrammed when the computer boots add the following line to /etc/rc.local
 /usr/local/NF2/lib/scripts/cpci_reprogram/cpci_reprogram.pl --all
  • If the NetFPGA refuses to send packets and the regression tests or selftest are failing, make sure you have reprogrammed the CPCI.
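If you call the reprogramming script from your own tooling, you can confirm success by matching the `CPCI on NetFPGA N has been successfully reprogrammed` lines from the expected output above. The sketch assumes that message format and collects both stdout and stderr, since this guide does not say which stream the script writes to; the function names are hypothetical.

```python
import os
import re
import subprocess
from typing import List

# Message format copied from the expected output shown above.
_OK = re.compile(r"CPCI on NetFPGA (\d+) has been successfully reprogrammed")
SCRIPT = "/usr/local/sbin/cpci_reprogram.pl"


def reprogrammed_boards(output: str) -> List[int]:
    """Board numbers that the script reported as successfully reprogrammed."""
    return [int(n) for n in _OK.findall(output)]


def reprogram_all() -> List[int]:
    # Run as root; --all reprograms every NetFPGA in the system.
    proc = subprocess.run([SCRIPT, "--all"], capture_output=True, text=True)
    return reprogrammed_boards(proc.stdout + proc.stderr)


if __name__ == "__main__":
    if os.path.exists(SCRIPT):
        print("reprogrammed boards:", reprogram_all() or "none")
    else:
        print(SCRIPT, "not found")
```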

Run Selftest

The NetFPGA self-test is an FPGA bitfile and software that together ensure that all of the components on your platform are fully functional. The self-test consists of both an FPGA bitfile that contains logic and interfaces to external components, as well as the software that displays the results. The self-test exercises all of the hardware in parallel. The test runs repeatedly until terminated by the user. The self-test was run at the factory just after the cards were manufactured. Cards are not distributed unless they completely pass all functions of the self-test process.

The self-test bitfile performs rigorous testing of the SRAM and DDR2 DRAM to ensure that every memory location can be written and read back with the same data. Multiple data patterns are used to ensure that no address or data lines have faults. The network test sends bursts of packets on the Ethernet interfaces; the loopback cables are put in place so that the packets can be received and compared to the data that was transmitted. The SATA loopback test transmits data using the Multi-Gigabit I/O lines (MGIOs) to ensure that data can be reliably transmitted on the high-speed I/O interfaces. The DMA test exercises the PCI controller (CPCI), the Virtex-II, and the PCI bus to ensure that large blocks of data can be transferred between the NetFPGA and the host computer's memory. The self-test software displays the results of testing on a console.
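The data-pattern idea behind the memory checks can be illustrated in software. The following is a minimal C sketch of three classic patterns. It uses walking ones on the data bus, markers at power-of-two offsets on the address bus, and a value/inverse fill of every word. It runs against an ordinary array that stands in for an SRAM bank. It is not the self-test bitfile's logic, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones data-bus test: shift a single 1 bit across all 32 bit
 * positions of one word. A stuck or shorted data line shows up as a
 * read-back mismatch. Returns 0 on success, 1 on failure. */
int mem_test_data_bus(volatile uint32_t *mem)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        mem[0] = pattern;
        if (mem[0] != pattern)
            return 1;
    }
    return 0;
}

/* Address-bus test: write a marker at every power-of-two word offset,
 * then overwrite offset 0 with the inverted marker. If an address line
 * is stuck or shorted, two offsets alias and a marker is lost.
 * `words` is the region size in 32-bit words. Returns 0 on success. */
int mem_test_addr_bus(volatile uint32_t *mem, size_t words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;

    for (size_t off = 1; off < words; off <<= 1)
        mem[off] = pattern;
    mem[0] = antipattern;

    for (size_t off = 1; off < words; off <<= 1)
        if (mem[off] != pattern)
            return 1;
    return mem[0] == antipattern ? 0 : 1;
}

/* Full-region test with two patterns: every word first holds a value
 * derived from its own offset, then the bitwise inverse, so each bit of
 * each location is written both as 0 and as 1. Returns 0 on success. */
int mem_test_device(volatile uint32_t *mem, size_t words)
{
    for (size_t off = 0; off < words; off++)
        mem[off] = (uint32_t)(off + 1);
    for (size_t off = 0; off < words; off++)
        if (mem[off] != (uint32_t)(off + 1))
            return 1;

    for (size_t off = 0; off < words; off++)
        mem[off] = ~(uint32_t)(off + 1);
    for (size_t off = 0; off < words; off++)
        if (mem[off] != ~(uint32_t)(off + 1))
            return 1;
    return 0;
}
```

On real memory-mapped hardware you would pass a pointer to the mapped bank. Against a plain array, as here, every test simply passes.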

We provide the self-test bitfile and the software to end-users so that the self-test can be run when the hardware is delivered. When you receive a NetFPGA card, we suggest that you run the self-test to ensure that the card is still fully functional and that the card works properly in your environment. Before running the self-test, be sure that you have connected the loopback cables as shown in the directions on how to set up a NetFPGA in your system. If the loopback cables are not connected, the self-test will correctly report that an interface appears non-functional.

The following instructions assume that you have successfully installed a NetFPGA card with CentOS. The selftest is an enhanced version of the test run on every NetFPGA at the factory by Digilent to verify proper hardware operation.

Connect loopback cables

Install a SATA cable to loopback the board-to-board high-speed serial I/O.

Note: To minimize the chance of damaging your computer or the NetFPGA module, wear an anti-static wrist strap when handling the hardware.

Image:SATA_Loopback.jpg

Install two Ethernet cables as shown:

Image:ENET_Loopback1.jpg


Image:ENET_Loopback2.jpg

Bring nf2cX interfaces up

  • Type:
for i in `seq 0 3`; do ifconfig nf2c$i up; done

Load self-test bitfile

  • Type:
nf2_download ~/NF2/bitfiles/selftest.bit

Run Selftest

  • If you have connected a SATA cable to the NetFPGA, type the following command.
 ~/NF2/projects/selftest/sw/selftest
  • Otherwise, type the following command.
 ~/NF2/projects/selftest/sw/selftest -n
  • Expected Output:
 Found net device: nf2c0
NetFPGA selftest 1.00 alpha
Running..... PASSED

Run Regression Tests

The regression test suite is a set of tests that exercise the functionality of the released gateware and software. On a fast machine, this test should take approximately 10 minutes.

The features exercised by the regression test suite are the only features we will try to support. Additional features might be available and functional in the released gateware, but they are not supported.

For more information on the features we support, as defined by tests, see the following:

  • The NIC supports a set of features
    • The details of how each feature is tested are described in the NIC Regression test document, available both on the Wiki and on the Web.
  • The Reference Router (RR) supports a set of features
    • The details of how each feature is tested are described in the RR Regression test, a large document available both on the Wiki and on the Web.

Please make sure that you have successfully completed the selftest before proceeding to the regression tests.

Connect Ethernet test cables

  • Connect 'eth1' to 'nf2c0' (c0 is the port closest to the mainboard)
  • Connect 'eth2' to 'nf2c1' (c1 is the port one away from the mainboard)
    • The location of your eth1 and eth2 ports may vary depending on your NIC
    • The photo below shows the configuration of a nf-test machine
      • Image:Ethernet_Test_Cables.jpg

Log in as root through X session

  • Log in as root, or use 'su -' to become root, from within an X session, because the regression suite also tests the SCONE GUI (gui_scone)

Load reference_router bitfile

  • Download the reference bitfile to the NetFPGA board:
nf2_download ~/NF2/bitfiles/reference_router.bit
  • Sample correct output:
Found net device: nf2c0
Bit file built from: nf2_top_par.ncd
Part: 2vp50ff1152
Date: 2007/10/ 9
Time: 22: 3: 4
Error Registers: 1000000
Good, after resetting programming interface the FIFO is empty
Download completed - 2377668 bytes. (expected 2377668).
DONE went high - chip has been successfully programmed.

Run regression test suite

  • Run the regression test suite. The tests should take about 10 minutes total.
~/NF2/bin/nf21_regress_test.pl
  • Sample correct output:
 Running tests on project 'driver'...
Running test 'driver_compile'... PASS
Running test 'driver_install'... PASS
Running test 'verify_mtu'... PASS
Running global teardown... PASS

Running tests on project 'reference_nic'...
Running test 'download_nic'... PASS
Running test 'test_loopback_random'... PASS
Running test 'test_loopback_minsize'... PASS
Running test 'test_loopback_maxsize'... PASS
Running test 'test_loopback_drop'... PASS
Running test 'test_ip_interface'... PASS
Running global teardown... PASS

Running tests on project 'reference_router'...
Running global setup... PASS
Running test 'test_router_cpusend/run.pl'... PASS
Running test 'test_wrong_dest_mac'... PASS
Running test 'test_nonip_packet'... PASS
Running test 'test_nonipv4_packet'... PASS
Running test 'test_invalidttl_packet'... PASS
Running test 'test_lpm_misses'... PASS
Running test 'test_arp_misses'... PASS
Running test 'test_badipchecksum_packet'... PASS
Running test 'test_ipdest_filter_hit'... PASS
Running test 'test_packet_forwarding'... PASS
Running test 'test_lpm'... PASS
Running test 'test_lpm_next_hop'... PASS
Running test 'test_queue_overflow'... PASS
Running test 'test_oq_limit'... PASS
Running test 'test_ipdest_filter'... PASS
Running test 'test_oq_sram_sz_cpu'... PASS
Running test 'test_oq_sram_sz_mac'... PASS
Running test 'test_router_table/run.pl'... PASS
Running test 'test_send_rec/run.pl'... PASS
Running test 'test_lut_forward'... PASS
Running global teardown... PASS

Running tests on project 'scone'...
Running global setup... PASS
Running test 'test_build'... PASS
Running test 'test_mac_set'... PASS
Running test 'test_ip_set'... PASS
Running test 'test_rtable_set'... PASS
Running test 'test_disabled_interfaces/run.pl'... PASS
Running test 'test_noniparp_ethtype'... PASS
Running test 'test_arp_rpl/run.pl'... PASS
Running test 'test_arp_norpl/run.pl'... PASS
Running test 'test_arp_quepkt/run.pl'... PASS
Running test 'test_ip_error/run.pl'... PASS
Running test 'test_ip_rtblmiss/run.pl'... PASS
Running test 'test_ip_intfc/run.pl'... PASS
Running test 'test_ip_checksum/run.pl'... PASS
Running test 'test_ttl_expired/run.pl'... PASS
Running test 'test_send_receive/run.pl'... PASS
Running test 'test_arp_req/run.pl'... PASS
Running test 'test_tcp_port/run.pl'... PASS
Running test 'test_udp_packet/run.pl'... PASS
Running test 'test_icmp_echo/run.pl'... PASS
Running test 'test_icmp_notecho/run.pl'... PASS
Running global teardown... PASS

Running tests on project 'gui_scone'...
Running global setup... PASS
Running test 'test_main_frame'... PASS
Running test 'test_routing_table'... PASS
Running test 'test_arp_table'... PASS
Running test 'test_port_config_table'... PASS
Running global teardown... PASS

Running tests on project 'router_kit'...
Running global setup... PASS
Running test 'test_00_make/run.sh'... PASS
Running test 'test_01_ip_dst_filter/run.pl'... PASS
Running test 'test_02_route_table/run.pl'... PASS
Running test 'test_03_arp_table/run.pl'... PASS
Running test 'test_04_ip_packets/run.pl'... PASS
Running global teardown... PASS

Running tests on project 'router_buffer_sizing'...
Running global setup... PASS
Running test 'test_time_stamp/run'... PASS
Running test 'test_store_event/run'... PASS
Running global teardown... PASS
  • If there are no errors (all tests say PASS), you can play with your router, or go on to creating a bitfile from source.
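If you save the regression output to a file, a small filter can pick out anything that did not pass. The sketch below is a hypothetical helper, not part of the NetFPGA package. It assumes, as in the sample output, that each result line contains '... ' followed by a status word.

```c
#include <stdio.h>
#include <string.h>

/* Scan regression-test output and count result lines whose status word
 * is not "PASS". Result lines are recognized by the "... " separator in
 * front of the status; project headers end in "..." plus a newline and
 * are skipped. Failing lines are echoed to `report` (NULL suppresses
 * this). Returns the number of failures. */
int count_failures(FILE *in, FILE *report)
{
    char line[512];
    int failures = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        char *status = strstr(line, "... ");
        if (status == NULL)
            continue;                          /* header or blank line */
        status += 4;                           /* skip "... " */
        status[strcspn(status, "\r\n")] = '\0';
        if (strcmp(status, "PASS") != 0) {
            failures++;
            if (report != NULL)
                fprintf(report, "%s\n", line);
        }
    }
    return failures;
}
```

For example, you could call count_failures(stdin, stdout) from a small main() and pipe the output of nf21_regress_test.pl into it. A non-zero return means at least one test needs attention.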

Run regression scripts on new bitfile

If you installed the CAD tools, you should run this test to verify that you can build a new circuit. Skip this step if you do not plan to modify hardware.

Synthesize reference_router bitfile, from source

Note: This step will take about 45-60 mins. This can be used to verify the setup of the machine for synthesis. You will need to have the NetFPGA Beta Plus package. The Beta (not Plus) package does not include the sources for this step.

  • If you are a hardware developer and would like to synthesize your own NetFPGA Router hardware using the Verilog source code, follow the steps below. To synthesize FPGA hardware, you will need to have all of the FPGA Development tools installed.
  • Log in, either directly in an X session or via ssh -X. This step causes ~/nf2_profile, plus environment variables, to be sourced. You may say "But I'm not running anything graphical!" and you'd be right. Unfortunately, even when called with no GUI, the Xilinx tools require X to be running. A bug report on this issue has been filed with Xilinx.
  • Set up the Xilinx ISE tools (see Xilinx's website for instructions). Make sure the Xilinx tools are in your path and that the XILINX environment variable is set.
  • Go to the synthesis directory for the reference_router and run make. This step should take under an hour on a well-endowed machine.
cd ~/NF2/projects/reference_router/synth
time make
  • Verify the reference_router bitfile (nf2_top_par.bit) has been created.
ls | grep nf2_top_par.bit
  • Sample correct output:
nf2_top_par.bit

Load new bitfile

  • Download the fresh bitfile to the NetFPGA board:
nf2_download nf2_top_par.bit

Run regression-test suite on new bitfile

  • Re-run the regression test suite.
~/NF2/bin/nf21_regress_test.pl
  • Sample correct output: identical to the sample output shown under "Run regression test suite" above, with every test reporting PASS.

Walkthrough the Reference Designs

This section describes some walkthroughs to help in using the distributed source code. Each walkthrough will deal with a different aspect of using the reference designs. Some sections will be marked as optional reading. These sections explain what is going on behind the scenes to put things together and are not within the main focus of the walkthrough. The walkthroughs are the following:

Image:HW_SW_Diagram.gif

Reference NIC Walkthrough
Mainly an introduction to the software/hardware interface. How to make software talk to the hardware.
SCONE Walkthrough
More complex example of user process communicating to hardware. How to talk to the router and set entries.
Router Kit Walkthrough
How to install and use the router kit.
Reference Router Walkthrough
Introduces more details about the hardware, and how to modify it.
Buffer Monitoring System
Describes how a circuit was added to measure the lengths of buffers.

Reference NIC Walkthrough

The reference NIC walkthrough will go through an example of using some of the tools that are distributed with the release, and an example of how to write a simple C program to interface to the hardware.

Using counterdump

Assuming the regression tests have completed successfully and the NFP is installed correctly, we can now proceed to use one of the distributed projects: the NIC. In the rest of the exercises, we assume that the NFP is installed in the user's home directory; if it is not, replace '~' with the full path to the installation location. To run the tools, IP addresses must be assigned to the nf2cX interfaces. To do that, run the following command as root, replacing X with the interface number and x.x.x.x with the address:

/sbin/ifconfig nf2cX x.x.x.x

Next, we need to download the NIC bitfile onto the NetFPGA. As root, run the following:

~/NF2/lib/C/download/nf2_download ~/NF2/bitfiles/reference_nic.bit

You should see output similar to the following:

Found net device: nf2c0
Bit file built from: nf2_top_par.ncd
Part: 2vp50ff1152
Date: 2007/11/21
Time: 11: 0: 3
Error Registers: 1000000
Good, after resetting programming interface the FIFO is empty
Download completed - 2377668 bytes. (expected 2377668).
DONE went high - chip has been successfully programmed.

To compile the utilities type:

cd ~/NF2/lib/C/nic
make

One of the built tools is called counterdump. This simple tool reads several hardware counters and dumps the counts to the terminal. To use it, type:

./counterdump

You should see an output similar to this:

Found net device: nf2c0
Num pkts received on port 0: 0
Num pkts dropped (rx queue 0 full): 0
Num pkts dropped (bad fcs q 0): 0
Num bytes received on port 0: 0
Num pkts sent from port 0: 0
Num bytes sent from port 0: 0

Num pkts received on port 1: 0
Num pkts dropped (rx queue 1 full): 0
Num pkts dropped (bad fcs q 1): 0
Num bytes received on port 1: 0
Num pkts sent from port 1: 0
Num bytes sent from port 1: 0

Num pkts received on port 2: 0
Num pkts dropped (rx queue 2 full): 0
Num pkts dropped (bad fcs q 2): 0
Num bytes received on port 2: 0
Num pkts sent from port 2: 0
Num bytes sent from port 2: 0

Num pkts received on port 3: 0
Num pkts dropped (rx queue 3 full): 0
Num pkts dropped (bad fcs q 3): 0
Num bytes received on port 3: 0
Num pkts sent from port 3: 0
Num bytes sent from port 3: 0

Using send_pkts

Now let's send and receive some packets and check the counters again. One of the tools included in the NFP is send_pkts, which can send arbitrary Ethernet packets from any given port; you will need root access to use it. To build it:

cd ~/NF2/lib/C/tools/send_pkts
make

If everything goes correctly, you should see an output similar to this:

 gcc `libnet-config --defines --cflags` -O2 -o send_pkts send_pkts.c `libnet-config --libs` -L/usr/lib-lnet -lpcap --static

For the next part, we will test sending a few packets from one of the ports. To send packets, issue the following commands:

cd ~/NF2/lib/C/tools/send_pkts
sudo ./send_pkts -i nf2c0 -s 10 -l 100

The last command sends ten 100-byte packets out of port 0 on the NetFPGA (port 0 is the port closest to the PCI connector on the NetFPGA). If you have a machine connected to the same LAN segment as that port, you should be able to capture the packets using Wireshark.

Now check the counters again:

~/NF2/lib/C/nic/counterdump

You should see an output similar to this:

Found net device: nf2c0
Num pkts received on port 0: 0
Num pkts dropped (rx queue 0 full): 0
Num pkts dropped (bad fcs q 0): 0
Num bytes received on port 0: 0
Num pkts sent from port 0: 10
Num bytes sent from port 0: 1000

Num pkts received on port 1: 0
Num pkts dropped (rx queue 1 full): 0
Num pkts dropped (bad fcs q 1): 0
Num bytes received on port 1: 0
Num pkts sent from port 1: 0
Num bytes sent from port 1: 0

Num pkts received on port 2: 0
Num pkts dropped (rx queue 2 full): 0
Num pkts dropped (bad fcs q 2): 0
Num bytes received on port 2: 0
Num pkts sent from port 2: 0
Num bytes sent from port 2: 0

Num pkts received on port 3: 0
Num pkts dropped (rx queue 3 full): 0
Num pkts dropped (bad fcs q 3): 0
Num bytes received on port 3: 0
Num pkts sent from port 3: 0
Num bytes sent from port 3: 0

If your NetFPGA ports are not connected to a quiet network, then you'll probably see different results.
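Because the counters are cumulative and other traffic may arrive, comparing two snapshots is more robust than reading absolute values. Below is a minimal sketch with a hypothetical port_counters struct that mirrors the six per-port values counterdump prints. Filling it from the registers with readReg is omitted so the example runs without hardware. tx_matches assumes that the byte counter advances by exactly the -l length per packet, as in the sample output above (10 packets, 1000 bytes).

```c
/* One port's worth of the counters printed by counterdump. */
struct port_counters {
    unsigned rx_pkts;
    unsigned rx_dropped_full;
    unsigned rx_dropped_bad_fcs;
    unsigned rx_bytes;
    unsigned tx_pkts;
    unsigned tx_bytes;
};

/* Difference between two snapshots. Unsigned subtraction wraps around,
 * so the result is still correct if a 32-bit counter rolled over at most
 * once between the snapshots (assuming unsigned is 32 bits wide). */
struct port_counters counters_delta(const struct port_counters *before,
                                    const struct port_counters *after)
{
    struct port_counters d;
    d.rx_pkts            = after->rx_pkts - before->rx_pkts;
    d.rx_dropped_full    = after->rx_dropped_full - before->rx_dropped_full;
    d.rx_dropped_bad_fcs = after->rx_dropped_bad_fcs - before->rx_dropped_bad_fcs;
    d.rx_bytes           = after->rx_bytes - before->rx_bytes;
    d.tx_pkts            = after->tx_pkts - before->tx_pkts;
    d.tx_bytes           = after->tx_bytes - before->tx_bytes;
    return d;
}

/* Check a delta against what send_pkts was asked to send:
 * `npkts` packets of `len` bytes each (e.g. -s 10 -l 100). */
int tx_matches(const struct port_counters *d, unsigned npkts, unsigned len)
{
    return d->tx_pkts == npkts && d->tx_bytes == npkts * len;
}
```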

Understanding the Hardware/Software Interface

The counters that have been dumped using counterdump are actually memory-mapped I/O registers. These are counters that exist in the NetFPGA FPGA hardware and are exported via the PCI interface. The software uses ioctl calls to do reads and writes into these registers. The ioctl calls are wrapped in two simple functions readReg and writeReg. In this section, we will open up counterdump.c and understand the software/hardware interface. counterdump.c is shown below with line numbers for reference.

    1	/****************************************************************************
    2	 * vim:set shiftwidth=2 softtabstop=2 expandtab:
    3	 * $Id$
    4	 *
    5	 * Module: counterdump.c
    6	 * Project: NetFPGA NIC
    7	 * Description: dumps the MAC Rx/Tx counters to stdout
    8	 * Author: Jad Naous
    9	 *
   10	 * Change history:
   11	 *
   12	 */
   13
   14	#include <stdio.h>
   15	#include <stdlib.h>
   16	#include <unistd.h>
   17	#include <ctype.h>   /* isprint() */
   18	#include <net/if.h>
   19
   20	#include "../common/reg_defines.h"
   21	#include "../common/nf2.h"
   22	#include "../common/nf2util.h"
   23
   24	#define PATHLEN 80
   25
   26	#define DEFAULT_IFACE "nf2c0"
   27
   28	/* Global vars */
   29	static struct nf2device nf2;
   30
   31	/* Function declarations */
   32	void dumpCounts();
   33	void processArgs (int , char **);
   34	void usage (void);
   35
   36	int main(int argc, char *argv[])
   37	{
   38	  nf2.device_name = DEFAULT_IFACE;
   39
   40	  processArgs(argc, argv);
   41
   42	  // Open the interface if possible
   43	  if (check_iface(&nf2))
   44	  {
   45	    exit(1);
   46	  }
   47	  if (openDescriptor(&nf2))
   48	  {
   49	    exit(1);
   50	  }
   51
   52	  dumpCounts();
   53
   54	  closeDescriptor(&nf2);
   55
   56	  return 0;
   57	}
   58
   59	void dumpCounts()
   60	{
   61	  unsigned val;
   62
   63	  readReg(&nf2, RX_QUEUE_0_NUM_PKTS_STORED_REG, &val);
   64	  printf("Num pkts received on port 0:  %u\n", val);
   65	  readReg(&nf2, RX_QUEUE_0_NUM_PKTS_DROPPED_FULL_REG, &val);
   66	  printf("Num pkts dropped (rx queue 0 full):  %u\n", val);
   67	  readReg(&nf2, RX_QUEUE_0_NUM_PKTS_DROPPED_BAD_REG, &val);
   68	  printf("Num pkts dropped (bad fcs q 0):  %u\n", val);
   69	  readReg(&nf2, RX_QUEUE_0_NUM_BYTES_PUSHED_REG, &val);
   70	  printf("Num bytes received on port 0:  %u\n", val);
   71	  readReg(&nf2, TX_QUEUE_0_NUM_PKTS_SENT_REG, &val);
   72	  printf("Num pkts sent from port 0:  %u\n", val);
   73	  readReg(&nf2, TX_QUEUE_0_NUM_BYTES_PUSHED_REG, &val);
   74	  printf("Num bytes sent from port 0:  %u\n\n", val);
   75
   76	  readReg(&nf2, RX_QUEUE_1_NUM_PKTS_STORED_REG, &val);
   77	  printf("Num pkts received on port 1:  %u\n", val);
   78	  readReg(&nf2, RX_QUEUE_1_NUM_PKTS_DROPPED_FULL_REG, &val);
   79	  printf("Num pkts dropped (rx queue 1 full):  %u\n", val);
   80	  readReg(&nf2, RX_QUEUE_1_NUM_PKTS_DROPPED_BAD_REG, &val);
   81	  printf("Num pkts dropped (bad fcs q 1):  %u\n", val);
   82	  readReg(&nf2, RX_QUEUE_1_NUM_BYTES_PUSHED_REG, &val);
   83	  printf("Num bytes received on port 1:  %u\n", val);
   84	  readReg(&nf2, TX_QUEUE_1_NUM_PKTS_SENT_REG, &val);
   85	  printf("Num pkts sent from port 1:  %u\n", val);
   86	  readReg(&nf2, TX_QUEUE_1_NUM_BYTES_PUSHED_REG, &val);
   87	  printf("Num bytes sent from port 1:  %u\n\n", val);
   88
   89	  readReg(&nf2, RX_QUEUE_2_NUM_PKTS_STORED_REG, &val);
   90	  printf("Num pkts received on port 2:  %u\n", val);
   91	  readReg(&nf2, RX_QUEUE_2_NUM_PKTS_DROPPED_FULL_REG, &val);
   92	  printf("Num pkts dropped (rx queue 2 full):  %u\n", val);
   93	  readReg(&nf2, RX_QUEUE_2_NUM_PKTS_DROPPED_BAD_REG, &val);
   94	  printf("Num pkts dropped (bad fcs q 2):  %u\n", val);
   95	  readReg(&nf2, RX_QUEUE_2_NUM_BYTES_PUSHED_REG, &val);
   96	  printf("Num bytes received on port 2:  %u\n", val);
   97	  readReg(&nf2, TX_QUEUE_2_NUM_PKTS_SENT_REG, &val);
   98	  printf("Num pkts sent from port 2:  %u\n", val);
   99	  readReg(&nf2, TX_QUEUE_2_NUM_BYTES_PUSHED_REG, &val);
  100	  printf("Num bytes sent from port 2:  %u\n\n", val);
  101
  102	  readReg(&nf2, RX_QUEUE_3_NUM_PKTS_STORED_REG, &val);
  103	  printf("Num pkts received on port 3:  %u\n", val);
  104	  readReg(&nf2, RX_QUEUE_3_NUM_PKTS_DROPPED_FULL_REG, &val);
  105	  printf("Num pkts dropped (rx queue 3 full):  %u\n", val);
  106	  readReg(&nf2, RX_QUEUE_3_NUM_PKTS_DROPPED_BAD_REG, &val);
  107	  printf("Num pkts dropped (bad fcs q 3):  %u\n", val);
  108	  readReg(&nf2, RX_QUEUE_3_NUM_BYTES_PUSHED_REG, &val);
  109	  printf("Num bytes received on port 3:  %u\n", val);
  110	  readReg(&nf2, TX_QUEUE_3_NUM_PKTS_SENT_REG, &val);
  111	  printf("Num pkts sent from port 3:  %u\n", val);
  112	  readReg(&nf2, TX_QUEUE_3_NUM_BYTES_PUSHED_REG, &val);
  113	  printf("Num bytes sent from port 3:  %u\n\n", val);
  114	}
  115
  116	/*
  117	 * Process the arguments.
  118	 */
  119	void processArgs (int argc, char **argv )
  120	{
  121	  int c;  /* int, not char: getopt() returns an int and -1 when done */
  122
  123	  /* don't want getopt to moan - I can do that just fine thanks! */
  124	  opterr = 0;
  125
  126	  while ((c = getopt (argc, argv, "i:h")) != -1)
  127	  {
  128	    switch (c)
  129	    {
  130	      case 'i': /* interface name */
  131	        nf2.device_name = optarg;
  132	        break;
  133	      case '?':
  134	        if (isprint (optopt))
  135	          fprintf (stderr, "Unknown option `-%c'.\n", optopt);
  136	        else
  137	          fprintf (stderr,
  138	                   "Unknown option character `\\x%x'.\n",
  139	                   optopt);  /* then fall through to usage() */
  140	      case 'h':
  141	      default:
  142	        usage();
  143	        exit(1);
  144	    }
  145	  }
  146	}
  147
  148
  149	/*
  150	 * Describe usage of this program.
  151	 */
  152	void usage (void)
  153	{
  154	  printf("Usage: ./counterdump <options> \n\n");
  155	  printf("Options: -i <iface> : interface name (default nf2c0)\n");
  156	  printf(" -h : Print this message and exit.\n");
  157	}

Let's go through the code.

   20	#include "../common/reg_defines.h"
   21	#include "../common/nf2.h"
   22	#include "../common/nf2util.h"

Line 20 includes the header file that contains all the register addresses on the NetFPGA, so that registers can be referred to by constant names rather than numeric addresses. Lines 21 and 22 include the declarations of the device structure and the helper functions used to access registers; these functions are used later in the code.

   29	static struct nf2device nf2;

The nf2 struct will hold information about the device we are trying to access.

Now let's go into our main function.

   38	  nf2.device_name = DEFAULT_IFACE;

Set a default name for the device we are trying to access. This is useful so that the user of this program doesn't have to specify the name if she is using the default device (which is true in most cases).

   40	  processArgs(argc, argv);

Parses the command line options. In this simple program, the only command line option is to change the name of the interface we are trying to access.

   43	  if (check_iface(&nf2))

Checks that the interface exists and can be reached.

   47	  if (openDescriptor(&nf2))

Tries to open the interface for reading/writing using ioctl calls. The interface has to be up and assigned an IP address for a non-root user to be able to access it using ioctl calls.

   52	  dumpCounts();

Calls the function that dumps all the counts.

   54	  closeDescriptor(&nf2);

Closes the descriptor once we are done using the device.

Reading and writing registers uses two functions:

int readReg(struct nf2device *nf2, unsigned int addr, unsigned int *val)
Reads the register at address addr on device nf2 and stores the value in *val. Returns 1 on failure, 0 on success.
int writeReg(struct nf2device *nf2, unsigned int addr, unsigned int val)
Writes val into the register at address addr on device nf2. Returns 1 on failure, 0 on success.
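readReg and writeReg are enough to update one field of a control register without disturbing its other bits. The sketch below assumes the signatures listed above. So that it runs without a NetFPGA, it supplies a stand-in struct nf2device and array-backed readReg/writeReg. The stand-ins also assume byte addresses on 4-byte registers, which is purely illustrative. In real code, delete the stand-ins and include nf2.h and nf2util.h instead. The helper name updateRegBits and the addresses in the test are hypothetical.

```c
/* ---- Stand-ins so the example runs without a NetFPGA. In real code,
 *      remove this section and include nf2.h / nf2util.h instead. ---- */
#define SIM_NUM_REGS 64

struct nf2device {
    const char *device_name;
    unsigned int regs[SIM_NUM_REGS];   /* simulated register file */
};

int readReg(struct nf2device *nf2, unsigned int addr, unsigned int *val)
{
    if (addr / 4 >= SIM_NUM_REGS)
        return 1;                      /* out of range: report failure */
    *val = nf2->regs[addr / 4];
    return 0;
}

int writeReg(struct nf2device *nf2, unsigned int addr, unsigned int val)
{
    if (addr / 4 >= SIM_NUM_REGS)
        return 1;
    nf2->regs[addr / 4] = val;
    return 0;
}
/* ---------------------------------------------------------------------- */

/* Read-modify-write: replace only the bits selected by `mask` with the
 * corresponding bits of `bits`. Returns 0 on success, 1 on failure,
 * following the readReg/writeReg convention. */
int updateRegBits(struct nf2device *nf2, unsigned int addr,
                  unsigned int mask, unsigned int bits)
{
    unsigned int val;

    if (readReg(nf2, addr, &val))
        return 1;
    val = (val & ~mask) | (bits & mask);
    return writeReg(nf2, addr, val);
}
```

The read-modify-write is not atomic. If another process could touch the same register between the read and the write, you need your own locking.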

As an example we will look at two lines in the dumpCounts() function. The rest of the lines are similar:

   63	  readReg(&nf2, RX_QUEUE_0_NUM_PKTS_STORED_REG, &val);
   64	  printf("Num pkts received on port 0:  %u\n", val);

Line 63 reads the number of packets received into Rx Queue 0 into val, and line 64 prints out the result.

The registers available for access are documented in the Register Map. Unfortunately, registers keep getting added and removed as the design evolves, so the most current list of registers is in the reg_defines.h file that defines the register addresses. This file is generated automatically from the Verilog source code when it is simulated so it always has the most recent list of registers and their addresses. The registers in the Register Map are divided into groups corresponding to the modules described in the Verilog. The next section will go into more details on these modules.

Next, we will modify this program to dump the device ID, which is assigned at implementation time. To do that, open the file ~/NF2/lib/C/common/reg_defines.h, find the macro that defines the device ID register, and use it to replace the XXXX in the following lines. Then copy the lines into the start of the dumpCounts() function, after the declaration of val.

readReg(&nf2, XXXX, &val);
printf("Device ID:  %u\n\n", val);

Then after saving the file, type make in the directory containing counterdump.c and run the program again. You should now see an output similar to the following:

Found net device: nf2c0
Device ID: 1
Num pkts received on port 0:           0
Num pkts dropped (rx queue 0 full): 0
Num pkts dropped (bad fcs q 0): 0
Num bytes received on port 0: 0
Num pkts sent from port 0: 10
Num bytes sent from port 0: 1000

Num pkts received on port 1: 0
Num pkts dropped (rx queue 1 full): 0
Num pkts dropped (bad fcs q 1): 0
Num bytes received on port 1: 0
Num pkts sent from port 1: 0
Num bytes sent from port 1: 0

Num pkts received on port 2: 0
Num pkts dropped (rx queue 2 full): 0
Num pkts dropped (bad fcs q 2): 0
Num bytes received on port 2: 0
Num pkts sent from port 2: 0
Num bytes sent from port 2: 0

Num pkts received on port 3: 0
Num pkts dropped (rx queue 3 full): 0
Num pkts dropped (bad fcs q 3): 0
Num bytes received on port 3: 0
Num pkts sent from port 3: 0
Num bytes sent from port 3: 0

Reference Pipeline

Diagram of the reference pipeline

The division of the hardware into modules was hinted at in the previous section. Understanding these modules is essential to making the most of the available designs. The distributed projects in the NFP, including the NIC, all follow the same modular structure: a pipeline in which each stage is a separate module. The diagram above shows the pipeline.

The first stage in the pipeline consists of several queues, which we call the Rx queues. These queues receive packets from I/O ports, such as the Ethernet ports and the PCI bus (via DMA), and provide a unified interface to the rest of the system. These ports are connected into a wrapper called the User Data Path, which contains the processing stages. The current design (version 1.0 Beta) has 4 Ethernet Rx queues and 4 CPU DMA Rx queues. The Ethernet and DMA queues are interleaved so that, to the User Data Path, Rx Queue 0 is Ethernet Port 0, Rx Queue 1 is DMA port 0, Rx Queue 2 is Ethernet Port 1, and so on. Packets that arrive in CPU DMA Rx Queue X are packets that the software has sent out of interface nf2cX.

In the User Data Path, the first module a packet passes through is the Input Arbiter. The input arbiter decides which Rx queue to service next, pulls the packet from that Rx queue, and hands it to the next module in the pipeline: the output port lookup module. The output port lookup module decides which port the packet goes out of. The packet is then handed to the output queues module, which stores it in the output queue corresponding to the output port until the Tx queue is ready to accept it for transmission.
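The paragraph above does not say which policy the input arbiter uses to pick the next Rx queue. Round-robin is a common choice for this kind of arbiter, so the C model below assumes it purely for illustration; it is not a transcription of the Verilog. NUM_RX_QUEUES = 8 follows the 4 Ethernet + 4 CPU DMA queues described earlier. The function name next_queue is hypothetical.

```c
#define NUM_RX_QUEUES 8   /* 4 Ethernet + 4 CPU DMA Rx queues */

/* Round-robin arbiter model. `has_pkt` has bit q set when Rx queue q has
 * a packet waiting; `last` is the queue serviced last time (use -1 at
 * start-up). The search begins at the queue after `last`, so a busy
 * queue cannot starve the others: every non-empty queue is serviced
 * within NUM_RX_QUEUES decisions. Returns -1 if all queues are empty. */
int next_queue(unsigned has_pkt, int last)
{
    for (int i = 1; i <= NUM_RX_QUEUES; i++) {
        int q = (last + i) % NUM_RX_QUEUES;
        if (has_pkt & (1u << q))
            return q;
    }
    return -1;
}
```

For example, with packets waiting in queues 0 and 2 (has_pkt = 0x05), an arbiter that last served queue 0 picks queue 2 next, and then returns to queue 0.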

The Tx queues are analogous to the Rx queues and they send packets out of the IO ports instead of receiving. Tx queues are also interleaved so that packets sent out of User Data Path port 0 are sent to Ethernet Tx queue 0, and packets sent out of User Data Path port 1 are sent to CPU DMA Tx queue 0, and so on. Packets that are handed to DMA Tx queue X pop out of interface nf2cX. For more information on these modules, you can go here.
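The interleaving rule described for the Rx and Tx queues can be written as a small mapping between a User Data Path port number and the port it represents. The sketch below assumes exactly the even = Ethernet, odd = CPU DMA pattern given for the 1.0 Beta design. The enum and function names are hypothetical.

```c
/* What a User Data Path port number refers to. */
enum udp_port_kind { PORT_ETHERNET, PORT_CPU_DMA };

struct udp_port {
    enum udp_port_kind kind;
    int index;   /* Ethernet port number, or DMA queue X (interface nf2cX) */
};

/* Even port numbers are Ethernet ports, odd ones are CPU DMA queues:
 * 0 -> Ethernet 0, 1 -> DMA 0 (nf2c0), 2 -> Ethernet 1, 3 -> DMA 1, ... */
struct udp_port decode_udp_port(int n)
{
    struct udp_port p;
    p.kind = (n % 2 == 0) ? PORT_ETHERNET : PORT_CPU_DMA;
    p.index = n / 2;
    return p;
}

/* Inverse mapping: physical port back to a User Data Path port number. */
int encode_udp_port(enum udp_port_kind kind, int index)
{
    return 2 * index + (kind == PORT_CPU_DMA ? 1 : 0);
}
```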

Almost every one of these modules has a set of registers that expose status information and set control signals for the module. These registers are described in the Register Map.

What to do From Here

SCONE Walkthrough

What is SCONE?

SCONE (Software Component Of NetFPGA) is a user-level router that performs IPv4 forwarding, handles ARP and various ICMP messages, provides telnet (port 23) and web (port 8080) interfaces for router control, and implements a subset of OSPF named PW-OSPF. SCONE mirrors a copy of its MAC addresses, IP addresses, routing table, and ARP table to the NetFPGA card, which accelerates the forwarding path in hardware.

How to use SCONE?

  1. Ensure your NETFPGA is programmed with the Reference Router bitfile
  2. Check that your nf2c0, nf2c1, nf2c2, and nf2c3 interfaces are up and do not have an assigned IPv4 address
    • To remove an IPv4 address from an interface, run 'ifconfig nf2cX 0.0.0.0', replacing X with the interface number
  3. Setup the cpuhw file
    • A text file named cpuhw must exist in the sw directory from which you execute the scone binary. The format is as follows:
      • <interface name> <ip address> <ip mask> <mac address>
    • An example:
      • eth0 192.168.0.2 255.255.255.0 00:00:00:00:00:01
    • Inside SCONE the interfaces are typically named and referred to as eth0-3, which correspond to the nf2c0-3 ports.
  4. (Optional) Specify a text file containing static routes
    • A text file containing static routes to be added on startup can be specified on the command line using the '-r' parameter, e.g. '-r rtable.netfpga'. The format is as follows:
      • <net> <next hop> <mask> <outgoing interface name>
    • An example:
      • 192.168.130.14 192.168.130.9 255.255.255.254 eth1
  5. (Optional) Specify a file to log packets in pcap format
    • Specify a file using the '-l' command-line parameter, e.g. '-l scone.log'. All packets received by SCONE will be logged to this file in pcap format, and the file can be opened and examined using Wireshark. Note that packets that take the hardware forwarding path will not appear in this log file.
SCONE's web interface

To modify the way SCONE operates after launch, use the telnet command-line interface or the web interface. The web interface supports all the commands available from the telnet interface and provides AJAX-style updates to keep the data current as the router runs. To use either interface, connect to one of the IP addresses specified in the cpuhw file on port 23 for telnet or port 8080 for HTTP. To get a list of commands over telnet, issue the ? command; a graphical version of this list is available from the Command List link of the web interface. For help with a command, enter the command followed by ? in place of its arguments, e.g. 'show ip ?'. Many of the commands are documented in the Required Commands section here.

How does SCONE work with NetFPGA?

For an incoming packet to take the hardware forwarding path, it must hit in the destination MAC address table, the routing table, and the ARP table (for the next hop). SCONE builds and maintains the routing table (containing static and dynamic routes) and the ARP table in software. When it detects changes in the software tables, SCONE copies them down into the NetFPGA's hardware tables. If you want packets destined to the IP addresses assigned to the router's interfaces to be passed up to software, you must also push these IP addresses down into a hardware table. Following are some code snippets for writing to these various tables. In general, to write a row you first write the registers holding the data for the row, then write the row number into the corresponding WR register. To read a row, first write the row number into the corresponding RD register, then read the data from the data registers.
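To make the write/read ordering concrete, here is a small software model of a hardware table. It is a sketch under stated assumptions, not NetFPGA code: `sim_write_reg`/`sim_read_reg` are hypothetical stand-ins for `writeReg`/`readReg`, and the register indices and the two-column (IP, mask) table are invented for illustration; the real register names are in the Register Map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices for a two-column table. These are NOT real
   NetFPGA addresses, just stand-ins for the data, WR, and RD registers. */
enum { REG_DATA_IP, REG_DATA_MASK, REG_WR_ADDR, REG_RD_ADDR, NUM_REGS };
#define TABLE_ROWS 32

static uint32_t regs[NUM_REGS];        /* registers visible to software */
static uint32_t hw_ip[TABLE_ROWS];     /* table rows held in "hardware" */
static uint32_t hw_mask[TABLE_ROWS];

/* Stand-in for writeReg(): a row number written to WR_ADDR commits the
   staged data registers into that row; a row number written to RD_ADDR
   copies that row back into the data registers. */
static void sim_write_reg(unsigned addr, uint32_t val)
{
    regs[addr] = val;
    if (addr == REG_WR_ADDR && val < TABLE_ROWS) {
        hw_ip[val] = regs[REG_DATA_IP];
        hw_mask[val] = regs[REG_DATA_MASK];
    } else if (addr == REG_RD_ADDR && val < TABLE_ROWS) {
        regs[REG_DATA_IP] = hw_ip[val];
        regs[REG_DATA_MASK] = hw_mask[val];
    }
}

/* Stand-in for readReg(). */
static uint32_t sim_read_reg(unsigned addr) { return regs[addr]; }

/* Write: data registers first, row number last. */
static void table_write_row(unsigned row, uint32_t ip, uint32_t mask)
{
    assert(row < TABLE_ROWS);
    sim_write_reg(REG_DATA_IP, ip);
    sim_write_reg(REG_DATA_MASK, mask);
    sim_write_reg(REG_WR_ADDR, row);
}

/* Read: row number first, data registers afterwards. */
static void table_read_row(unsigned row, uint32_t *ip, uint32_t *mask)
{
    assert(row < TABLE_ROWS);
    sim_write_reg(REG_RD_ADDR, row);
    *ip = sim_read_reg(REG_DATA_IP);
    *mask = sim_read_reg(REG_DATA_MASK);
}
```

The key point is the ordering: the row number is written last on a write and first on a read, because writing it is what triggers the transfer between the data registers and the table.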

Writing MAC Addresses to the NetFPGA

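uint8_t mac_addr[6];      /* fill in with the MAC address for this port */
unsigned int mac_hi = 0;  /* upper 2 bytes of the MAC address */
unsigned int mac_lo = 0;  /* lower 4 bytes of the MAC address */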
mac_hi |= ((unsigned int)mac_addr[0]) << 8;
mac_hi |= ((unsigned int)mac_addr[1]);
mac_lo |= ((unsigned int)mac_addr[2]) << 24;
mac_lo |= ((unsigned int)mac_addr[3]) << 16;
mac_lo |= ((unsigned int)mac_addr[4]) << 8;
mac_lo |= ((unsigned int)mac_addr[5]);
writeReg(&netfpga, ROUTER_OP_LUT_MAC_0_HI_REG, mac_hi);
writeReg(&netfpga, ROUTER_OP_LUT_MAC_0_LO_REG, mac_lo);
// 1,2,3 can be substituted in for 0 to set the MAC for the other ports
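The snippet above assumes `mac_addr` is already filled in. As a hedged sketch, the helpers below parse a colon-separated MAC string and split it into the HI/LO register values using the same shifts; `parse_mac` and `mac_to_hi_lo` are hypothetical names, not part of nf2util. The same split is used for the ARP MAC registers later in this section.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes. Returns 0 on success, -1 on error. */
static int parse_mac(const char *str, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;
    /* The trailing %c makes any garbage after the sixth byte an error. */
    if (sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];
    return 0;
}

/* Split a MAC into the HI (upper 2 bytes) and LO (lower 4 bytes)
   register values, with the same shifts as the snippet above. */
static void mac_to_hi_lo(const uint8_t mac[6], uint32_t *hi, uint32_t *lo)
{
    assert(mac && hi && lo);
    *hi = ((uint32_t)mac[0] << 8) | mac[1];
    *lo = ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
          ((uint32_t)mac[4] << 8) | mac[5];
}
```

With these, `ca:fe:de:ad:d0:d0` yields `mac_hi = 0xCAFE` and `mac_lo = 0xDEADD0D0`.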

NOTE: One confusing aspect of using a user-level process to control the router is that the router's MAC addresses will not necessarily match the kernel's MAC addresses. That is, the MAC addresses of the nf2cX interfaces do not necessarily reflect those that are actually in the hardware: the nf2cX interfaces are independent software entities that do not necessarily reflect the state of the hardware. In the next section we describe the Router Kit, which reflects Linux's view of the network to HCORR (Hardware Component of Reference Router). This includes the ports' MAC addresses, IP addresses, the routing table, and the ARP cache.

Writing interface IP Addresses to the NetFPGA

HCORR has a table that stores IP addresses, called the destination IP filter table. Packets whose destination IP address matches an entry in this table are passed up to software. The number of entries in this table can be found in the Register Map. We use this table to catch packets addressed to the router's own IP addresses and to the destination IP addresses of PW-OSPF multicast packets. We write into the table as follows:

struct in_addr ip;
int i; // i is the row to write the IP address to
writeReg(&netfpga, ROUTER_OP_LUT_DST_IP_FILTER_IP_REG, ntohl(ip.s_addr));
writeReg(&netfpga, ROUTER_OP_LUT_DST_IP_FILTER_WR_ADDR_REG, i);
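The snippets in this section pass `ntohl(ip.s_addr)`, i.e. the address in host byte order, to `writeReg`. A minimal sketch of producing that value from a dotted-quad string, assuming a POSIX system with `inet_pton`; the helper name `ip_str_to_reg` is hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Convert a dotted-quad string into the host-order value that the
   snippets above pass to writeReg() via ntohl(). Returns 0 on success,
   -1 if the string is not a valid IPv4 address. */
static int ip_str_to_reg(const char *str, uint32_t *reg_val)
{
    struct in_addr addr;
    assert(reg_val != NULL);
    if (inet_pton(AF_INET, str, &addr) != 1)
        return -1;
    *reg_val = ntohl(addr.s_addr);   /* network -> host byte order */
    return 0;
}
```

For instance, `192.168.0.2` becomes `0xC0A80002`, with the first octet in the most significant byte.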

Writing routing entries to the NetFPGA

Writing to the routing table is similar to writing to the destination IP filter table. Note that the output port(s) are specified in one-hot-encoded format, corresponding to the output ports of the User Data Path that feed the Tx queues. The function getOneHotPortNumber returns the values (in decimal) 1, 4, 16, and 64, which correspond to physical output ports 0-3 on the NetFPGA card. You can also direct a route to software by specifying 2, 8, 32, or 128, corresponding to the nf2c0, nf2c1, nf2c2, and nf2c3 interfaces. To send out of MAC ports 0 and 1, write 1+4=5 as the output port entry.

struct in_addr ip, mask, gw;
int i;
char* iface;
/* write the ip */
writeReg(&netfpga, ROUTER_OP_LUT_RT_IP_REG, ntohl(ip.s_addr));
/* write the mask */
writeReg(&netfpga, ROUTER_OP_LUT_RT_MASK_REG, ntohl(mask.s_addr));
/* write the next hop */
writeReg(&netfpga, ROUTER_OP_LUT_RT_NEXT_HOP_IP_REG, ntohl(gw.s_addr));
/* write the port */
writeReg(&netfpga, ROUTER_OP_LUT_RT_OUTPUT_PORT_REG, getOneHotPortNumber(iface));
/* write the row number */
writeReg(&netfpga, ROUTER_OP_LUT_RT_LUT_WR_ADDR_REG, i);
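The one-hot port values described above follow a simple pattern: MAC port n is bit 2n and CPU (nf2cn) port n is bit 2n+1. The sketch below captures that; `mac_port_bit` and `cpu_port_bit` are hypothetical helpers, not the `getOneHotPortNumber` function mentioned above (which takes an interface name).

```c
#include <assert.h>
#include <stdint.h>

/* One-hot output-port encoding used by the reference pipeline:
   bit 2n   -> MAC (Ethernet) port n   (1, 4, 16, 64)
   bit 2n+1 -> CPU DMA port n / nf2cn  (2, 8, 32, 128) */
static uint8_t mac_port_bit(unsigned n)
{
    assert(n < 4);
    return (uint8_t)(1u << (2 * n));
}

static uint8_t cpu_port_bit(unsigned n)
{
    assert(n < 4);
    return (uint8_t)(1u << (2 * n + 1));
}
```

Because each port is a distinct bit, OR-ing the values is the same as adding them: `mac_port_bit(0) | mac_port_bit(1)` is 5, matching the example of sending out of MAC ports 0 and 1.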

Writing ARP entries to the NetFPGA

uint8_t mac_addr[6];
unsigned int mac_hi = 0;
unsigned int mac_lo = 0;
struct in_addr ip;
int i;
mac_hi |= ((unsigned int)mac_addr[0]) << 8;
mac_hi |= ((unsigned int)mac_addr[1]);
mac_lo |= ((unsigned int)mac_addr[2]) << 24;
mac_lo |= ((unsigned int)mac_addr[3]) << 16;
mac_lo |= ((unsigned int)mac_addr[4]) << 8;
mac_lo |= ((unsigned int)mac_addr[5]);
writeReg(&netfpga, ROUTER_OP_LUT_ARP_MAC_HI_REG, mac_hi);
writeReg(&netfpga, ROUTER_OP_LUT_ARP_MAC_LO_REG, mac_lo);
/* write the next hop ip data */
writeReg(&netfpga, ROUTER_OP_LUT_ARP_NEXT_HOP_IP_REG, ntohl(ip.s_addr));
/* set the row */
writeReg(&netfpga, ROUTER_OP_LUT_ARP_LUT_WR_ADDR_REG, i);

Router Kit Walkthrough

Overview

Router Kit is a simple approach to providing hardware acceleration to an unmodified Linux system. It consists of a single program, rkd (Router Kit Daemon), which monitors the Linux routing table and ARP cache and mirrors them down to the NetFPGA IPv4 reference router implementation.

Running Router Kit

  rkd [-h|--help] [-d|--daemon] [-i|--interval <in_ms>]

rkd should work from the command line without any external configuration. Simply run ./rkd. To run the process in the background, use -d. You may specify the polling interval in milliseconds using the -i option.

Using Router Kit

Router Kit is only useful on a Linux host with a NetFPGA2 card installed and the IPv4 reference router bitfile loaded. Given this setup, each port on the NetFPGA2 card is available to Linux via an nf2c* interface (i.e. nf2c0, nf2c1, nf2c2, and nf2c3, assuming a single card is installed).

rkd will attempt to mirror into hardware all ARP cache and routing table entries associated with a NetFPGA interface. This provides a very simple (and familiar) method of adding entries to the hardware. For example, to add a static ARP entry, simply use the arp(8) command. The following command will add a static ARP entry.

   arp -s 1.2.3.4 ca:fe:de:ad:d0:d0 -i nf2c0

To add an entry into the routing table, use route(8) (or ip(8)). For example, adding a default entry with a next hop of 10.0.0.1 out of the first port would look something like:

   route add default gw 10.0.0.1 dev nf2c0

Router Kit is not limited to manual manipulation from the command line. All state (including dynamic state) is mirrored. To wit, running rkd alongside a standard routing daemon, such as XORP or Zebra/Quagga, should provide hardware acceleration of the forwarding table without any further configuration (provided the routing software uses the NetFPGA interfaces for forwarding).

How it Works

rkd continuously polls the routing table and ARP cache state from /proc/net/route and /proc/net/arp, respectively. When it detects a change in state, rkd writes the updated state to the NetFPGA through the register interface. All traffic not handled by the hardware is DMA'd to software, where it is processed by the Linux kernel.
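As an illustration of the polling side, here is a hedged sketch of parsing one data line of /proc/net/route on Linux. It is not taken from the rkd source; `struct route_entry` and `parse_route_line` are hypothetical. The kernel prints the Destination, Gateway, and Mask columns as 8-digit hex of the raw network-order value, so the parsed integer can be stored directly in `s_addr` on the same host.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* One route taken from a data line of /proc/net/route. */
struct route_entry {
    char iface[16];
    struct in_addr dest, gw, mask;
};

/* Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
   Returns 0 on success, -1 if the line does not parse (e.g. the header). */
static int parse_route_line(const char *line, struct route_entry *re)
{
    unsigned int dest, gw, flags, mask;
    int refcnt, use, metric;
    if (sscanf(line, "%15s %x %x %x %d %d %d %x",
               re->iface, &dest, &gw, &flags, &refcnt, &use, &metric, &mask) != 8)
        return -1;
    /* The hex values are the raw in-memory (network-order) addresses,
       so they go straight into s_addr without byte swapping. */
    re->dest.s_addr = dest;
    re->gw.s_addr = gw;
    re->mask.s_addr = mask;
    return 0;
}
```

A poller in this style would read the file line by line, skip the header (which fails to parse), and compare the resulting entries against the previous snapshot to detect changes.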

Reference Router Walkthrough

In this section we go through the available tools for communicating with the Hardware Component of the Reference Router (HardCORR) and walk through the process of modifying the design, simulating it, and finally implementing it. The tools we cover briefly are a Java GUI and a standalone Command Line Interface (CLI).

Java GUI

The Java GUI allows the user to change entries in the routing table and ARP cache as well as the router's MAC and IP addresses. It also displays counter values, throughput graphs, and more. Part of the GUI is written in C and provides the interface between the Java binaries and the driver. This native library is compiled from the nf2util.c file, which contains the readReg and writeReg utilities used in the previous section. The library connects to the GUI using the Java Native Access (JNA) library.

To build the GUI, first make sure that you have Sun's Java Development Kit (version 1.6.0 or later) installed and that the java, javac, and jar binaries are in your path (otherwise, edit the Makefile under lib/java/gui to reflect their actual locations). Then cd into NF2/lib/java/gui, type make clean, and then type make. You should get output similar to the following:

make[1]: Entering directory `/home/jnaous/NF2/lib/C/common'
gcc -fpic -c nf2util.c
gcc -shared nf2util.o -o libnf2.so
make[1]: Leaving directory `/home/jnaous/NF2/lib/C/common'
Building java...
Done
Writing router gui manifest...
Building router jar...
Writing script to start router gui...
Writing event capturing router gui manifest...
Building event capturing router jar...
Writing script to start event capturing router gui...

To run the router GUI, cd into NF2/lib/java/gui and type ./router.sh. The GUI should pop up. The GUI continuously polls the hardware for the data it displays; to make updates faster, you can change the update rate under the File menu at the top left.

The Quickstart Panel provides a summary of things that can be done with the hardware. There is also a tab for viewing statistics and a tab for details. The details page shows the data path pipeline we saw earlier in the Reference NIC Walkthrough, with clickable buttons. Clicking on those buttons reveals more details and configuration settings.

Command Line Interpreter

A small standalone CLI that allows you to change routing table entries, ARP cache entries, and other settings is also provided, in case you don't want to run SCONE. The CLI is under NF2/lib/C/router. To build it, type make in that directory. You should see output similar to the following:

gcc -g    -c -o cli.o cli.c
gcc -g -c -o ../common/util.o ../common/util.c
gcc -lncurses cli.o ../common/nf2util.o ../common/util.o ../common/reg_defines.h -o cli
gcc -g -c -o regdump.o regdump.c
gcc -lncurses regdump.o ../common/nf2util.o ../common/reg_defines.h -o regdump
gcc -g -c -o show_stats.o show_stats.c
gcc -lncurses show_stats.o ../common/nf2util.o ../common/util.o ../common/reg_defines.h -o show_stats

For help on using the CLI, start it by typing ./cli and then type help in the CLI.

We invite you to extend this CLI and any of our software tools and contribute them back so we can expand our library and make it easier for anybody to use NetFPGA.

Modifying the Reference Router

This section will guide you through the process of creating your own project based on the reference router and adding a library module to the data path that limits the rate at which the NetFPGA outputs packets on a particular port. We will first go through an overview of the router design and the interface between modules. Then we will explain how to add a library module to the design and put it in the data path. Following that, we will verify the design by simulation and finally implement it as a downloadable bitfile.

Reference Pipeline Details

Diagram of the reference pipeline

Hopefully you still remember the reference pipeline and what each major module in the pipeline does. We show it again here for reference. Please go over it again if you need to know what each module does.

The user data path is 64 bits wide and runs at 125 MHz, giving the data path a peak bandwidth of 64 bits × 125 MHz = 8 Gbps. Packets are passed between modules using a simple FIFO-like push interface with four signals: WR, RDY, DATA, and CTRL. WR and RDY are each one bit wide, CTRL is 8 bits wide, and DATA is 64 bits wide. Suppose module i wants to push a packet to module i+1. When i+1 is ready to accept data, it asserts RDY. Module i then puts the packet data on the DATA bus, sets the CTRL bus according to the data it is transmitting (the values of the CTRL bus are discussed later), and raises WR whenever it is ready to send. Module i+1 should be ready to latch data starting the clock cycle after it asserts RDY. If module i+1 cannot accept any more data, it deasserts RDY at least one clock cycle before the data should stop arriving. The figure below demonstrates a valid transaction.

Packet hand-off between two consecutive modules.

As a packet is processed by the modules in the data path, each module has the choice of either adding a 64-bit word to the beginning of the packet that contains the result of the processing or modifying a word that already exists. We call this word a module header. Modules that come later in the pipeline can use the information in the module header(s) to do further processing on the packet. In the reference designs we use only one module header that is modified by almost all modules in the pipeline.

The CTRL bus serves two purposes: distinguishing module headers from each other and marking the end of the packet. The CTRL bus is non-zero for module headers, and its value distinguishes one module header from another when there are several. When the actual packet starts after the module headers, the CTRL bus is reset to 0; then, on the last word of the packet, the CTRL lines indicate which byte is the last byte in that word by setting a 1 in the position of the last byte. Note that the first byte of the packet is stored in the most significant byte position (byte 7, i.e. bits 63-56) of the first word of the packet, and so on. For example, if the last word has only 1 valid byte, that byte is in byte 7, and the CTRL value associated with that DATA word is 0b10000000. On the other hand, if the last word has six valid bytes (i.e. packet length in bytes mod 8 = 6), then the CTRL value that signifies the end of packet is 0b00000100.
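The end-of-packet rule reduces to a couple of lines of arithmetic. A minimal sketch, assuming the CTRL value for a completely full last word (8 valid bytes) is 0b00000001, which follows from the rule but is not stated explicitly above; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Number of 64-bit DATA words needed to carry a packet of len bytes. */
static unsigned packet_words(unsigned len)
{
    assert(len > 0);
    return (len + 7) / 8;
}

/* CTRL value on the last DATA word of a packet: a single 1 in the
   position of the last valid byte. The first byte of each word sits in
   byte 7 (bits 63-56), so k valid bytes put the last byte at position 8-k. */
static uint8_t eop_ctrl(unsigned len)
{
    unsigned valid;
    assert(len > 0);
    valid = (len % 8) ? (len % 8) : 8;   /* valid bytes in the last word */
    return (uint8_t)(1u << (8 - valid));
}
```

This reproduces both examples from the text: a last word with 1 valid byte gives 0b10000000 and one with 6 valid bytes gives 0b00000100.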

The Rx queues create a module header when they receive a packet and prepend it to the beginning of the packet. They store the packet length in bytes in the lowest 16 bits of the module header (bits 0-15), the source port as a binary-encoded number (port 0 is MAC port 0, port 1 is CPU port 0, ...) in bits 16-31, and the packet length in 64-bit words in bits 32-47. We call this module header the IOQ module header.

Format of a packet as it passes through the hardware pipeline.

The Input Arbiter selects an Rx queue to service and pushes a packet into the Output Port Lookup module without modifying the module header. The Output Port Lookup module decides which output port(s) a packet goes out of and writes the output port selection as a one-hot-encoded number into bits 56-63 of the IOQ module header. This number has a 1 for every port the packet should go out on, where port 0 is MAC0, port 1 is CPU0, port 2 is MAC1, and so on.

The Output Queues module looks at the IOQ module header to decide which output queue to store the packet in, and uses the lengths in the IOQ module header to store the packet efficiently. After the packet is removed from its output queue and pushed into its destination Tx queue, the IOQ module header is finally removed before the packet is sent out of the appropriate port. The diagram above shows the format of a packet as it passes through the reference pipeline.
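To make the IOQ module header layout concrete, the sketch below packs and unpacks it as a 64-bit value in C, with the byte length in bits 0-15, the source port in the 16 bits above it, the word length in bits 32-47, and the one-hot output ports in bits 56-63. It is a software model only; the function names are hypothetical, and rounding the word length up to whole 64-bit words is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Build an IOQ module header:
   bits 15-0  packet length in bytes
   bits 31-16 source port (binary: 0 = MAC0, 1 = CPU0, 2 = MAC1, ...)
   bits 47-32 packet length in 64-bit words (rounded up, an assumption)
   bits 63-56 output ports, one-hot (filled in by Output Port Lookup) */
static uint64_t ioq_hdr(uint16_t byte_len, uint16_t src_port, uint8_t dst_onehot)
{
    uint64_t word_len;
    assert(byte_len > 0);
    word_len = ((uint64_t)byte_len + 7) / 8;
    return (uint64_t)byte_len
         | ((uint64_t)src_port << 16)
         | (word_len << 32)
         | ((uint64_t)dst_onehot << 56);
}

static uint16_t ioq_byte_len(uint64_t h) { return (uint16_t)(h & 0xFFFF); }
static uint16_t ioq_src_port(uint64_t h) { return (uint16_t)((h >> 16) & 0xFFFF); }
static uint16_t ioq_word_len(uint64_t h) { return (uint16_t)((h >> 32) & 0xFFFF); }
static uint8_t  ioq_dst_ports(uint64_t h) { return (uint8_t)(h >> 56); }
```

For example, a 60-byte packet received on MAC1 (source port 2) and destined for MAC0 (one-hot 0x01) gives the header 0x010000080002003C.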

Register Pipeline

Module authors may wish to incorporate host-accessible registers within their modules. To simplify adding modules, the register interfaces of the modules are connected together in a pipeline instead of in a star topology around a central arbiter. This greatly simplifies adding a module to a design, as the central arbiter does not need to be modified.

The register pipeline is 32 bits wide and runs at 125 MHz. Each module should have two sets of ports: one for incoming requests and one for outgoing replies. The following signals are the inputs for a single module: REG_REQ_IN, REG_ACK_IN, REG_RD_WR_L_IN, REG_ADDR_IN (23 bits), REG_DATA_IN (32 bits), and REG_SRC_IN (2 bits). Equivalent signals ending in _OUT exist for the output port.

Register requests/replies are signaled by a high on REG_REQ_*. REG_REQ_* should be high for only a single clock cycle per request; otherwise it indicates multiple register accesses. (Note that a module is permitted to take more than one clock cycle to produce a reply, but it should ensure that requests following the initial request are not dropped.) The REG_RD_WR_L_* signal indicates whether the transaction is a read (high) or a write (low). REG_ACK_* should be low when the request is generated and should only be brought high by the module responding to the request.

A module identifies whether it is the target of a request by inspecting the REG_ADDR_IN signal. If the address matches the address range assigned to the module then the module should process the request and generate a response. Once the module has completed any necessary processing it should raise REG_ACK_OUT, set REG_DATA_OUT to the correct value in the case of a read, and forward all other inputs to the outputs, all for a single cycle. If a module determines that it is not the target of a request then it should forward all inputs unmodified to the outputs on the next clock cycle.

The REG_SRC_* signals are used by register request initiators to identify the responses destined for them. Each requestor should use a unique value as its source address.
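The respond-or-forward behavior of the register pipeline can be modeled in software. This sketch abstracts away clock cycles (one call to `reg_stage` stands for one module handling a request) and the REG_SRC_* filtering done by the requester; the struct and function names and the address ranges in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signals carried between stages of the register pipeline. */
struct reg_bus {
    bool req, ack, rd_wr_L;   /* rd_wr_L: true = read, false = write */
    uint32_t addr, data;
    uint8_t src;
};

/* A module that owns addresses [base, base + size), size <= 4 here,
   with one 32-bit register per address. */
struct reg_module {
    uint32_t base, size;
    uint32_t regs[4];
};

/* One pipeline stage: answer an unacknowledged request in our range,
   otherwise forward all signals unchanged. */
static struct reg_bus reg_stage(struct reg_module *m, struct reg_bus in)
{
    struct reg_bus out = in;
    if (in.req && !in.ack && in.addr >= m->base && in.addr < m->base + m->size) {
        uint32_t off = in.addr - m->base;
        assert(off < 4);
        if (in.rd_wr_L)
            out.data = m->regs[off];   /* read: return the register value */
        else
            m->regs[off] = in.data;    /* write: update the register */
        out.ack = true;                /* only the target raises ACK */
    }
    return out;
}

/* Pass one request through a ring of n modules and return what comes out. */
static struct reg_bus reg_ring(struct reg_module *mods, int n, struct reg_bus req)
{
    for (int i = 0; i < n; i++)
        req = reg_stage(&mods[i], req);
    return req;
}
```

In this model, a request whose address no module claims comes out of the ring with `ack` still low, while a serviced request carries its original `src` back to the requester.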

Outside the Reference Pipeline

There are a number of modules that are not described by the Reference Pipeline. These are the nf2_top and nf2_core modules, which contain the Reference Pipeline as well as the modules needed to generate the various clocks on the Virtex chip and to interface with the PCI controller, SRAM, DRAM, and so on. These modules are outside the scope of this document. We invite you to help us extend this section of the documentation so that others may benefit from your experience.

Using a Library Module

The NFP contains several modules that can be used to add more features to your hardware design. The modules all exist under NF2/lib/verilog. The module we will add in this walkthrough is a rate limiter, which allows you to control the rate at which packets are sent out of a port. The module will be added to the user_data_path.v file.

Type the following:

~$ cd ~/NF2/projects/
~/NF2/projects$ cp -r reference_router/ rate_limited_router
~/NF2/projects$ mkdir rate_limited_router/src/udp
~/NF2/projects$ cp ../lib/verilog/user_data_path/reference_user_data_path/src/user_data_path.v rate_limited_router/src/udp/

We have now created a copy of the reference HCORR and made a local copy of the user_data_path.v file that overrides the reference one. The following assumes that you know Verilog. We will now connect four rate_limiter modules in the pipeline between the output_queues module and the MAC output ports of user_data_path.v. The rate_limiter module is under NF2/lib/verilog/rate_limiter; you can take a look at it for reference. You can find the modified user_data_path.v here. After modifying NF2/projects/rate_limited_router/src/udp/user_data_path.v to add the rate_limiter modules, the diff of the original user_data_path.v against the new one should look similar to the following:

234a235,251
> //------- Rate limiter wires/regs ------
> wire [CTRL_WIDTH-1:0] rate_limiter_in_ctrl[0:3];
> wire [DATA_WIDTH-1:0] rate_limiter_in_data[0:3];
> wire rate_limiter_in_wr[0:3];
> wire rate_limiter_in_rdy[0:3];
>
> wire [CTRL_WIDTH-1:0] rate_limiter_out_ctrl[0:3];
> wire [DATA_WIDTH-1:0] rate_limiter_out_data[0:3];
> wire rate_limiter_out_wr[0:3];
> wire rate_limiter_out_rdy[0:3];
>
> wire rate_limiter_in_reg_req[0:4];
> wire rate_limiter_in_reg_ack[0:4];
> wire rate_limiter_in_reg_rd_wr_L[0:4];
> wire [`UDP_REG_ADDR_WIDTH-1:0] rate_limiter_in_reg_addr[0:4];
> wire [`CPCI_NF2_DATA_WIDTH-1:0] rate_limiter_in_reg_data[0:4];
> wire [UDP_REG_SRC_WIDTH-1:0] rate_limiter_in_reg_src[0:4];

In the above we have added wires to connect the new modules.

360,363c377,380
< .out_data_0 (out_data_0),
< .out_ctrl_0 (out_ctrl_0),
< .out_wr_0 (out_wr_0),
< .out_rdy_0 (out_rdy_0),
---
> .out_data_0 (rate_limiter_in_data[0]),
> .out_ctrl_0 (rate_limiter_in_ctrl[0]),
> .out_wr_0 (rate_limiter_in_wr[0]),
> .out_rdy_0 (rate_limiter_in_rdy[0]),
370,373c387,390
< .out_data_2 (out_data_2),
< .out_ctrl_2 (out_ctrl_2),
< .out_wr_2 (out_wr_2),
< .out_rdy_2 (out_rdy_2),
---
> .out_data_2 (rate_limiter_in_data[1]),
> .out_ctrl_2 (rate_limiter_in_ctrl[1]),
> .out_wr_2 (rate_limiter_in_wr[1]),
> .out_rdy_2 (rate_limiter_in_rdy[1]),
380,383c397,400
< .out_data_4 (out_data_4),
< .out_ctrl_4 (out_ctrl_4),
< .out_wr_4 (out_wr_4),
< .out_rdy_4 (out_rdy_4),
---
> .out_data_4 (rate_limiter_in_data[2]),
> .out_ctrl_4 (rate_limiter_in_ctrl[2]),
> .out_wr_4 (rate_limiter_in_wr[2]),
> .out_rdy_4 (rate_limiter_in_rdy[2]),
390,393c407,410
< .out_data_6 (out_data_6),
< .out_ctrl_6 (out_ctrl_6),
< .out_wr_6 (out_wr_6),
< .out_rdy_6 (out_rdy_6),
---
> .out_data_6 (rate_limiter_in_data[3]),
> .out_ctrl_6 (rate_limiter_in_ctrl[3]),
> .out_wr_6 (rate_limiter_in_wr[3]),
> .out_rdy_6 (rate_limiter_in_rdy[3]),
414,419c431,436
< .reg_req_out (udp_reg_req_in),
< .reg_ack_out (udp_reg_ack_in),
< .reg_rd_wr_L_out (udp_reg_rd_wr_L_in),
< .reg_addr_out (udp_reg_addr_in),
< .reg_data_out (udp_reg_data_in),
< .reg_src_out (udp_reg_src_in),
---
> .reg_req_out (rate_limiter_in_reg_req[0]),
> .reg_ack_out (rate_limiter_in_reg_ack[0]),
> .reg_rd_wr_L_out (rate_limiter_in_reg_rd_wr_L[0]),
> .reg_addr_out (rate_limiter_in_reg_addr[0]),
> .reg_data_out (rate_limiter_in_reg_data[0]),
> .reg_src_out (rate_limiter_in_reg_src[0]),

Above: Instead of connecting the output ports of the output_queues module to the user_data_path output ports, connect them to the rate limiter modules. The same goes for the register ring connections.

437c454,525
<
---
> generate
> genvar i;
> for (i=0; i<4; i=i+1) begin: gen_rate_limiters
> rate_limiter #(
> .DATA_WIDTH (DATA_WIDTH),
> .UDP_REG_SRC_WIDTH (UDP_REG_SRC_WIDTH)
> ) rate_limiter
> (
> .out_data (rate_limiter_out_data[i]),
> .out_ctrl (rate_limiter_out_ctrl[i]),
> .out_wr (rate_limiter_out_wr[i]),
> .out_rdy (rate_limiter_out_rdy[i]),
>
> .in_data (rate_limiter_in_data[i]),
> .in_ctrl (rate_limiter_in_ctrl[i]),
> .in_wr (rate_limiter_in_wr[i]),
> .in_rdy (rate_limiter_in_rdy[i]),
>
> // --- Register interface
> .reg_req_in (rate_limiter_in_reg_req[i]),
> .reg_ack_in (rate_limiter_in_reg_ack[i]),
> .reg_rd_wr_L_in (rate_limiter_in_reg_rd_wr_L[i]),
> .reg_addr_in (rate_limiter_in_reg_addr[i]),
> .reg_data_in (rate_limiter_in_reg_data[i]),
> .reg_src_in (rate_limiter_in_reg_src[i]),
>
> .reg_req_out (rate_limiter_in_reg_req[i+1]),
> .reg_ack_out (rate_limiter_in_reg_ack[i+1]),
> .reg_rd_wr_L_out (rate_limiter_in_reg_rd_wr_L[i+1]),
> .reg_addr_out (rate_limiter_in_reg_addr[i+1]),
> .reg_data_out (rate_limiter_in_reg_data[i+1]),
> .reg_src_out (rate_limiter_in_reg_src[i+1]),
>
> // --- Misc
> .clk (clk),
> .reset (reset));
> end // block: gen_rate_limiters
> endgenerate
>
> defparam gen_rate_limiters[0].rate_limiter.RATE_LIMIT_BLOCK_TAG = `RATE_LIMITER_0_BLOCK_TAG;
> defparam gen_rate_limiters[1].rate_limiter.RATE_LIMIT_BLOCK_TAG = `RATE_LIMITER_1_BLOCK_TAG;
> defparam gen_rate_limiters[2].rate_limiter.RATE_LIMIT_BLOCK_TAG = `RATE_LIMITER_2_BLOCK_TAG;
> defparam gen_rate_limiters[3].rate_limiter.RATE_LIMIT_BLOCK_TAG = `RATE_LIMITER_3_BLOCK_TAG;
>
> //--- Connect the wires from the rate limiters
> assign out_data_0 = rate_limiter_out_data[0];
> assign out_ctrl_0 = rate_limiter_out_ctrl[0];
> assign out_wr_0 = rate_limiter_out_wr[0];
> assign rate_limiter_out_rdy[0] = out_rdy_0;
>
> assign out_data_2 = rate_limiter_out_data[1];
> assign out_ctrl_2 = rate_limiter_out_ctrl[1];
> assign out_wr_2 = rate_limiter_out_wr[1];
> assign rate_limiter_out_rdy[1] = out_rdy_2;
>
> assign out_data_4 = rate_limiter_out_data[2];
> assign out_ctrl_4 = rate_limiter_out_ctrl[2];
> assign out_wr_4 = rate_limiter_out_wr[2];
> assign rate_limiter_out_rdy[2] = out_rdy_4;
>
> assign out_data_6 = rate_limiter_out_data[3];
> assign out_ctrl_6 = rate_limiter_out_ctrl[3];
> assign out_wr_6 = rate_limiter_out_wr[3];
> assign rate_limiter_out_rdy[3] = out_rdy_6;
>
> assign udp_reg_req_in = rate_limiter_in_reg_req[4];
> assign udp_reg_ack_in = rate_limiter_in_reg_ack[4];
> assign udp_reg_rd_wr_L_in = rate_limiter_in_reg_rd_wr_L[4];
> assign udp_reg_addr_in = rate_limiter_in_reg_addr[4];
> assign udp_reg_data_in = rate_limiter_in_reg_data[4];
> assign udp_reg_src_in = rate_limiter_in_reg_src[4];

Above: Add a rate limiter module on each output port that goes to an Ethernet Tx queue. The register ring goes through each of the rate limiter modules. Note the defparams used to assign a register block to each rate limiter module; these are explained later.

Now all that is left is telling the build system that we want to include the rate limiter in compilation. All the library modules that a project uses are found in projects/<project_name>/include/lib_modules.txt. For the rate limited router, we will have the rate limiter module as well as all the modules that are normally used by the reference router. The modified lib_modules.txt looks as follows, where line 16 was added:

 1 io_queues/cpu_dma_queue
 2 io_queues/ethernet_mac
 3 input_arbiter/rr_input_arbiter
 4 nf2/generic_top
 5 nf2/reference_core
 6 user_data_path/generic_cntr_reg
 7 output_port_lookup/cam_router
 8 output_queues/sram_rr_output_queues
 9 sram_arbiter/sram_weighted_rr
10 user_data_path/reference_user_data_path
11 io/mdio
12 cpci_bus
13 dma
14 user_data_path/udp_reg_master
15 io_queues/add_rm_hdr
16 rate_limiter

Note that the reference_user_data_path module is still listed in lib_modules.txt even though we are overriding one of its files. This is because we still want to use the other files in that module. The build environment automatically handles the override and makes sure that it uses the file in the project src dir instead of the library one. Also note that the build system only handles source files that are directly under the src dir of the project directory or one level below it. So if you put a file under rate_limited_router/src/udp/some_dir/some_file.v, it will not be included in the build.

All these library modules can be found under NF2/lib/verilog. Some library modules, such as the testbench module, are only used for simulation. Some libraries offer alternatives, such as the user_data_path module: if you look under NF2/lib/verilog/user_data_path, you will find two directories, one containing the user data path used for the buffer monitoring router and one containing the user data path used in the reference router. If you have the full source package (the teacher/researcher package), you will also find two implementations of the output_port_lookup module: cam_router, a router that uses a CAM to perform LPM lookups, and learning_cam_switch, which makes the NetFPGA act as a four-port learning switch. Note that the library modules that are not used in the reference NIC design and the reference IPv4 router design have not received the same thorough testing as those that are. This includes the learning_cam_switch module and the rate_limiter module.

Adding New Sources

If you wish to add new source code (i.e. not from the library) to your project, you can simply put the new Verilog files under the src dir of your project, or in a directory one level below it. They will automatically be included for synthesis and simulation. Note that if some files are used only for simulation, you will have to wrap the unsynthesizable code in synthesis translate_off and synthesis translate_on directives.

To add IP cores generated with Xilinx's tools, you can do one of two things:

Copy .xco file (recommended)
Simply copy the .xco file from the directory where you generated the IP core to the <project_name>/synth directory. The build environment will automatically regenerate the .v wrapper for simulation/synthesis and the .ngc or .edn needed for synthesis.
Copy the .ngc and .v files
Copy the generated .v files to your <project_name>/src directory and the .ngc/.edn files to your <project_name>/synth directory. This avoids regenerating the core.

If you wish to add registers, you can follow the template in the rate_limiter module (simple but not efficient) or the template in the cam_router module (complex, but very efficient since it uses block RAM). Macros (such as those used for the register addresses) can be defined in one of two places:

Local Macros
If your macros will be used only in your project, it is preferable to define them only in your project. This is easy, since any Verilog files in the include directory are automatically included in simulation and synthesis. The example that follows shows how this is done.
Global Macros
If your macros are used globally, then you should place them in the NF2/lib/verilog/common/src21/udp_defines.v file. Make sure that you follow the format of the already defined register addresses there.

Register addresses are automagically (well, not quite) written to the NF2/lib/C/common/reg_defines.h file when a design is simulated. So, to generate the reg_defines.h file with any new register addresses, you need to simulate the project (the simulation doesn't have to do anything except start and end).

As an example, we will add register addresses for the rate_limiter modules we have just added. Note that udp_defines.v already defines addresses for a single rate limiter, but we wish to extend that to multiple rate limiters. So, create rate_limit_include.v under rate_limited_router/include/. The file can be found here. We go through the file below.

Lines 44-51, shown below, are the block addresses of the predefined blocks. These can be found in the udp_defines.v file, and we should not reuse them. Note that there are different block sizes and each size has its own unique set of addresses, so OQ_BLOCK_ADDR does not conflict with OP_LUT_BLOCK_ADDR, since they are different sizes.

44 // --- Block register addresses
45 `define OP_LUT_BLOCK_ADDR `OP_LUT_BLOCK_ADDR_WIDTH'h1
46 `define IN_ARB_BLOCK_ADDR `IN_ARB_REG_ADDR_WIDTH'h2
47 `define EVT_CAP_BLOCK_ADDR `EVT_CAP_REG_ADDR_WIDTH'h3
48 `define RATE_LIMIT_BLOCK_ADDR `RATE_LIMIT_REG_ADDR_WIDTH'h4
49 `define DELAY_BLOCK_ADDR `DELAY_REG_ADDR_WIDTH'h5
50 `define SWITCH_OP_LUT_BLOCK_ADDR `SWITCH_OP_LUT_BLOCK_ADDR_WIDTH'h6
51 `define OQ_BLOCK_ADDR `OQ_BLOCK_ADDR_WIDTH'h1

In lines 62-73 we define new macros for the register addresses. Line 63 sets the width of an internal register address; here we have decided to use a block with 64 addresses for each rate limiter. Line 64 sets the width of the address of the whole block. (In this version of the file, lines 63-64 are wrapped in an EXCLUDED comment.) Lines 66-69 define the address of each 64-address block; these must not conflict with the block addresses above. Lines 70-73 define the tag that the rate limiter's register module (rate_limit_regs.v) matches against to know whether a requested register is its own.

   62	/* -----\/----- EXCLUDED -----\/-----
63 `define RATE_LIMIT_REG_ADDR_WIDTH `UDP_BLOCK_SIZE_64_REG_ADDR_WIDTH
64 `define RATE_LIMIT_BLOCK_ADDR_WIDTH `UDP_BLOCK_SIZE_64_BLOCK_ADDR_WIDTH
65 -----/\----- EXCLUDED -----/\----- */
66 `define RATE_LIMITER_0_BLOCK_ADDR `RATE_LIMIT_REG_ADDR_WIDTH'h7
67 `define RATE_LIMITER_1_BLOCK_ADDR `RATE_LIMIT_REG_ADDR_WIDTH'h8
68 `define RATE_LIMITER_2_BLOCK_ADDR `RATE_LIMIT_REG_ADDR_WIDTH'h9
69 `define RATE_LIMITER_3_BLOCK_ADDR `RATE_LIMIT_REG_ADDR_WIDTH'ha
70 `define RATE_LIMITER_0_BLOCK_TAG ({`UDP_BLOCK_SIZE_64_TAG, `RATE_LIMITER_0_BLOCK_ADDR})
71 `define RATE_LIMITER_1_BLOCK_TAG ({`UDP_BLOCK_SIZE_64_TAG, `RATE_LIMITER_1_BLOCK_ADDR})
72 `define RATE_LIMITER_2_BLOCK_TAG ({`UDP_BLOCK_SIZE_64_TAG, `RATE_LIMITER_2_BLOCK_ADDR})
73 `define RATE_LIMITER_3_BLOCK_TAG ({`UDP_BLOCK_SIZE_64_TAG, `RATE_LIMITER_3_BLOCK_ADDR})

Lines 83-84 are the internal addresses used inside the rate limiter register block. You can look at rate_limit_regs.v for more details.

   82	 // --- Rate limiter registers
83 `define RATE_LIMIT_ENABLE `RATE_LIMIT_REG_ADDR_WIDTH'h0
84 `define RATE_LIMIT_SHIFT `RATE_LIMIT_REG_ADDR_WIDTH'h1

Lines 93-103 define the addresses that will be visible externally to any software that is running on the host.

   93	 `define RATE_LIMIT_0_ENABLE_REG                (`UDP_BASE_ADDRESS | {`RATE_LIMITER_0_BLOCK_TAG, `RATE_LIMIT_ENABLE})
94 `define RATE_LIMIT_0_SHIFT_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_0_BLOCK_TAG, `RATE_LIMIT_SHIFT})
95
96 `define RATE_LIMIT_1_ENABLE_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_1_BLOCK_TAG, `RATE_LIMIT_ENABLE})
97 `define RATE_LIMIT_1_SHIFT_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_1_BLOCK_TAG, `RATE_LIMIT_SHIFT})
98
99 `define RATE_LIMIT_2_ENABLE_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_2_BLOCK_TAG, `RATE_LIMIT_ENABLE})
100 `define RATE_LIMIT_2_SHIFT_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_2_BLOCK_TAG, `RATE_LIMIT_SHIFT})
101
102 `define RATE_LIMIT_3_ENABLE_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_3_BLOCK_TAG, `RATE_LIMIT_ENABLE})
103 `define RATE_LIMIT_3_SHIFT_REG (`UDP_BASE_ADDRESS | {`RATE_LIMITER_3_BLOCK_TAG, `RATE_LIMIT_SHIFT})
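To make the concatenations in lines 70-73 and 93-103 concrete, here is a small C model of how a register's word address is assembled. It is a sketch only: apart from the 6-bit register field (64 registers per block), every width and value below is a hypothetical stand-in for the real macros in udp_defines.v, and the HYP_ names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values. Only the 6-bit register field follows from the text
 * (64 registers per block); the rest stand in for the block-address width,
 * UDP_BLOCK_SIZE_64_TAG and UDP_BASE_ADDRESS defined in udp_defines.v. */
#define HYP_REG_ADDR_WIDTH   6u          /* 64 registers per block */
#define HYP_BLOCK_FIELD_WIDTH 4u         /* assumed width of the block address */
#define HYP_SIZE_64_TAG      0x1u        /* assumed */
#define HYP_UDP_BASE_ADDRESS 0x2000000u  /* assumed word address */

/* Models Verilog concatenation {hi, lo}, where lo is lo_width bits wide. */
static uint32_t concat(uint32_t hi, uint32_t lo, unsigned lo_width)
{
    assert(lo < (1u << lo_width)); /* lo must fit in its field */
    return (hi << lo_width) | lo;
}

/* Word address of a rate limiter register, mirroring
 * (`UDP_BASE_ADDRESS | {`RATE_LIMITER_X_BLOCK_TAG, reg}). */
static uint32_t rate_limit_reg_word_addr(uint32_t block_addr, uint32_t reg)
{
    uint32_t tag = concat(HYP_SIZE_64_TAG, block_addr, HYP_BLOCK_FIELD_WIDTH);
    return HYP_UDP_BASE_ADDRESS | concat(tag, reg, HYP_REG_ADDR_WIDTH);
}
```

With these assumed values, rate_limit_reg_word_addr(0x8, 0x1), the model of RATE_LIMIT_1_SHIFT_REG, gives 0x2000601. The four limiters differ only in the block field, and the ENABLE and SHIFT registers of one limiter are adjacent words.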

Lines 112-120 define the macro that prints the user's register addresses into the generated reg_defines.h file.

  112	 `define PRINT_USER_REG_ADDRESSES                                                                         \
113 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_0_ENABLE_REG 0x%07x\n", `RATE_LIMIT_0_ENABLE_REG<<2); \
114 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_0_SHIFT_REG 0x%07x\n\n", `RATE_LIMIT_0_SHIFT_REG<<2); \
115 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_1_ENABLE_REG 0x%07x\n", `RATE_LIMIT_1_ENABLE_REG<<2); \
116 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_1_SHIFT_REG 0x%07x\n\n", `RATE_LIMIT_1_SHIFT_REG<<2); \
117 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_2_ENABLE_REG 0x%07x\n", `RATE_LIMIT_2_ENABLE_REG<<2); \
118 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_2_SHIFT_REG 0x%07x\n\n", `RATE_LIMIT_2_SHIFT_REG<<2); \
119 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_3_ENABLE_REG 0x%07x\n", `RATE_LIMIT_3_ENABLE_REG<<2); \
120 $fwrite(c_reg_defines_fd, "#define RATE_LIMIT_3_SHIFT_REG 0x%07x\n\n", `RATE_LIMIT_3_SHIFT_REG<<2)
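Each $fwrite shifts the address left by 2 before printing. The Verilog macros appear to count 32-bit registers (word addresses), while host software uses byte addresses, so multiplying by 4 converts one to the other. The C function below is a hypothetical mirror of one $fwrite line, handy for checking by hand what should appear in reg_defines.h; the example address is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors one $fwrite line of PRINT_USER_REG_ADDRESSES: word_addr is the
 * Verilog (32-bit word) address; "<< 2" turns it into a byte address. */
static int format_reg_define(char *buf, size_t len, const char *name,
                             uint32_t word_addr)
{
    assert(name != NULL && strlen(name) > 0);
    assert(word_addr <= (UINT32_MAX >> 2)); /* the shift must not overflow */
    return snprintf(buf, len, "#define %s 0x%07x\n", name,
                    (unsigned)(word_addr << 2));
}
```

For an assumed word address of 0x20005C0, format_reg_define produces the line `#define RATE_LIMIT_0_ENABLE_REG 0x8001700`.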

As mentioned above, simulating the design generates reg_defines.h, which can be used in C code, and the NF2/lib/Perl5/reg_defines.ph file, which can be used in Perl code.

To enable a rate limiter, write 1 into the corresponding RATE_LIMIT_X_ENABLE_REG register. The RATE_LIMIT_X_SHIFT_REG registers control the transmission rate exponentially. For more details, refer to the Verilog code.
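Here is a minimal sketch of the host-side sequence, assuming a register-write routine and byte addresses taken from the generated reg_defines.h. write_reg() is a hypothetical stand-in that only logs the writes, and the addresses, limiter stride, and shift value are invented for illustration. On a real board you would use the NetFPGA register access library and the actual RATE_LIMIT_X_* values. How a given shift value maps to a rate is not modeled here; see the Verilog code for that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte addresses; real ones come from reg_defines.h. */
#define HYP_RATE_LIMIT_0_ENABLE_REG 0x8001700u
#define HYP_RATE_LIMITER_STRIDE     0x100u  /* assumed spacing between limiters */
#define HYP_SHIFT_OFFSET            0x4u    /* SHIFT assumed to be the next register */

struct reg_write { uint32_t addr, val; };
static struct reg_write log_buf[16];
static int log_len;

/* Stand-in for a real register write: records the access instead. */
static void write_reg(uint32_t addr, uint32_t val)
{
    assert(log_len < 16);
    log_buf[log_len].addr = addr;
    log_buf[log_len].val = val;
    log_len++;
}

/* Set the shift value of rate limiter x (0-3), then enable it. */
static void rate_limit_configure(unsigned x, uint32_t shift)
{
    assert(x < 4);
    uint32_t enable = HYP_RATE_LIMIT_0_ENABLE_REG + x * HYP_RATE_LIMITER_STRIDE;
    write_reg(enable + HYP_SHIFT_OFFSET, shift); /* program the rate first */
    write_reg(enable, 1);                        /* then write 1 to enable */
}
```

Writing the shift value before setting the enable bit is a cautious choice so the limiter does not briefly run at a stale rate; the original text does not prescribe an order.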

Simulating the design

The next step after writing the code is to simulate the design. We will use the same testbenches used for the router. These testbenches are defined under NF2/projects/rate_limited_router/verif. Each testbench directory contains a description of the packets to send, when to send them, and which packets we expect to come out of the NetFPGA, whether via DMA or Ethernet. The simulation environment also allows you to specify register reads and writes. Each testbench consists of three files:

config.txt: specifies when a simulation should end.
run: generates packets and runs the simulation. Usually this should not be modified except in unusual circumstances, such as defining simulation time parameters.
make_pkts.pl: Perl script that generates all the packets and register requests.

test_router_short/make_pkts.pl is shown below ready for dissection:

File header. Nothing fancy.

    1	#!/usr/local/bin/perl -w
2 # make_pkts.pl
3 #
4 #

Select the libraries to be used for simulation. All these libraries provide functions to create, send, and expect packets and to generate register requests. In particular, NF21RouterLib wraps the nf_PCI_regread and nf_PCI_regwrite functions, which access registers, to provide router-specific functionality. These libraries, along with others used for testing on real hardware (as opposed to simulation), can be found under NF2/lib/Perl5/.

    7	use NF2::PacketGen;
8 use NF2::PacketLib;
9 use NF21RouterLib;

Include the file that defines all register addresses.

   11	require "reg_defines.ph";

Set the delay to send a packet. More information can be found in the comments and documentation in the libraries.

   13	$delay = 2000;

Required lines to initialize the libraries:

   14	$batch = 0;
15 nf_set_environment( { PORT_MODE => 'PHYSICAL', MAX_PORTS => 4 } );
16
17 # use strict AFTER the $delay, $batch and %reg are declared
18 use strict;
19 use vars qw($delay $batch %reg);

Define some variables for the tests:

   21	my $ROUTER_PORT_1_MAC = '00:ca:fe:00:00:01';
22 my $ROUTER_PORT_2_MAC = '00:ca:fe:00:00:02';
23 my $ROUTER_PORT_3_MAC = '00:ca:fe:00:00:03';
24 my $ROUTER_PORT_4_MAC = '00:ca:fe:00:00:04';
25
26 my $ROUTER_PORT_1_IP = '192.168.1.1';
27 my $ROUTER_PORT_2_IP = '192.168.2.1';
28 my $ROUTER_PORT_3_IP = '192.168.3.1';
29 my $ROUTER_PORT_4_IP = '192.168.4.1';
30
31 my $next_hop_1_DA = '00:fe:ed:01:d0:65';
32 my $next_hop_2_DA = '00:fe:ed:02:d0:65';
33

Prepare DMA and enable interrupts:

   34	# Prepare the DMA and enable interrupts
35 prepare_DMA('@3.9us');
36 enable_interrupts(0);

Write the registers that set up the router: the port MAC addresses, the destination IP filter, the routing table, and the ARP entries:

   38	# Write the ip addresses and mac addresses, routing table, filter, ARP entries
39 $delay = '@4us';
40 set_router_MAC(1, $ROUTER_PORT_1_MAC);
41 $delay = 0;
42 set_router_MAC(2, $ROUTER_PORT_2_MAC);
43 set_router_MAC(3, $ROUTER_PORT_3_MAC);
44 set_router_MAC(4, $ROUTER_PORT_4_MAC);
45
46 add_dst_ip_filter_entry(0,$ROUTER_PORT_1_IP);
47 add_dst_ip_filter_entry(1,$ROUTER_PORT_2_IP);
48 add_dst_ip_filter_entry(2,$ROUTER_PORT_3_IP);
49 add_dst_ip_filter_entry(3,$ROUTER_PORT_4_IP);
50
51 add_LPM_table_entry(0,'171.64.2.0', '255.255.255.0', '171.64.2.1', 0x04);
52 add_LPM_table_entry(15, '0.0.0.0', '0.0.0.0', '171.64.1.1', 0x01);
53
54 # Add the ARP table entries
55 add_ARP_table_entry(0, '171.64.1.1', $next_hop_1_DA);
56 add_ARP_table_entry(1, '171.64.2.1', $next_hop_2_DA);
57
58 my $length = 100;
59 my $TTL = 30;
60 my $DA = 0;
61 my $SA = 0;
62 my $dst_ip = 0;
63 my $src_ip = 0;
64 my $pkt;
65
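The two add_LPM_table_entry calls can be read as a small routing table, and the C sketch below models a lookup over it. It assumes, as the placement of the default route at index 15 suggests, that entries are checked in index order and the first match wins. It also assumes that the last argument is a one-hot output port mask following the NetFPGA convention of MAC ports on even bits and CPU (DMA) ports on odd bits. Both are assumptions about the hardware, not a description of the actual Verilog.

```c
#include <assert.h>
#include <stdint.h>

/* One routing-table entry, mirroring add_LPM_table_entry(index, prefix, mask,
 * next_hop, port_mask). Addresses are host-order 32-bit values. */
struct lpm_entry {
    int valid;
    uint32_t prefix, mask, next_hop;
    uint8_t port_mask; /* one-hot output port */
};

#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Assumption: first match in index order wins, so more specific prefixes
 * must use lower indices. Returns the matching index, or -1. */
static int lpm_lookup(const struct lpm_entry *tbl, int n, uint32_t dst)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].valid && (dst & tbl[i].mask) == (tbl[i].prefix & tbl[i].mask))
            return i;
    return -1;
}

/* Assumed encoding: bit 2k = MAC port k, bit 2k+1 = CPU (DMA) port k.
 * Returns the lowest MAC port index set, or -1 if none is set. */
static int port_mask_to_mac_port(uint8_t port_mask)
{
    for (int k = 0; k < 4; k++)
        if (port_mask & (1u << (2 * k)))
            return k;
    return -1;
}
```

With the table from lines 51-52, 171.64.2.7 matches entry 0 (port mask 0x04, i.e. MAC port 1), and the next hop 171.64.2.1 resolves through the ARP table to $next_hop_2_DA. That is why the expected packet is checked on port 2 in the simulation's 1-based numbering. Any other destination falls through to the default route at index 15.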

Send the first packet into port 1 (MAC port 0). Note that the ports in the simulation libraries are all numbered starting from 1 rather than 0.

   70	$delay = '@80us';
71 $length = 64;
72 $DA = $ROUTER_PORT_1_MAC;
73 $SA = '01:55:55:55:55:55';
74 $dst_ip = '171.64.2.7';
75 $src_ip = '171.64.8.1';

Create the packet:

   76	$pkt = make_IP_pkt($length, $DA, $SA, $TTL, $dst_ip, $src_ip);

Send it in:

   77	nf_packet_in(1, $length, $delay, $batch,  $pkt);

Create the packet that we expect to see coming out of port 2 (MAC port 1):

   79	$DA = $next_hop_2_DA;
80 $SA = $ROUTER_PORT_2_MAC;
81 $pkt = make_IP_pkt($length, $DA, $SA, $TTL-1, $dst_ip, $src_ip);
82 nf_expected_packet(2, $length, $pkt);
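The expected packet differs from the input in its Ethernet addresses (DA becomes the next hop's MAC and SA the router port's MAC) and in its TTL, which drops by one. Because the TTL is covered by the IPv4 header checksum, the checksum must change too. The following C sketch shows one way to perform that rewrite in software, updating the checksum incrementally as in RFC 1624 (eqn. 3) instead of recomputing it. It illustrates the transformation only; it is not the NetFPGA datapath, and the header values in the test are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard IPv4 header checksum over a 20-byte header (no options). */
static uint16_t ipv4_checksum(const uint8_t hdr[20])
{
    uint32_t sum = 0;
    for (int i = 0; i < 20; i += 2)
        if (i != 10) /* skip the checksum field itself */
            sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* frame: 14-byte Ethernet header followed by a 20-byte IPv4 header.
 * Sets DA/SA, decrements TTL, and updates the checksum incrementally. */
static void forward_rewrite(uint8_t *frame, const uint8_t next_hop_mac[6],
                            const uint8_t out_port_mac[6])
{
    uint8_t *ip = frame + 14;
    memcpy(frame, next_hop_mac, 6);     /* DA <- next hop */
    memcpy(frame + 6, out_port_mac, 6); /* SA <- router output port */

    uint16_t old_word = (uint16_t)((ip[8] << 8) | ip[9]); /* TTL | protocol */
    assert(ip[8] > 0);
    ip[8]--;
    uint16_t new_word = (uint16_t)((ip[8] << 8) | ip[9]);

    /* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') in one's complement */
    uint16_t hc = (uint16_t)((ip[10] << 8) | ip[11]);
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    uint16_t new_hc = (uint16_t)~sum;
    ip[10] = (uint8_t)(new_hc >> 8);
    ip[11] = (uint8_t)new_hc;
}
```

Recomputing the checksum from scratch with ipv4_checksum() gives the same result, which is a handy cross-check.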

Create a new packet from a different port that is destined for the router itself. The packet should be sent to the CPU via DMA.

   88	$length = 60;
89 $DA = $ROUTER_PORT_2_MAC;
90 $SA = '02:55:55:55:55:55';
91 $dst_ip = $ROUTER_PORT_1_IP;
92 $src_ip = '171.64.9.1';
93 $pkt = make_IP_pkt($length, $DA, $SA, $TTL, $dst_ip, $src_ip);
94 nf_packet_in(2, $length, '@82us', $batch, $pkt);

Specify that we expect the packet to arrive on DMA port 2 (a.k.a. nf2c1):

   96	nf_expected_dma_data(2, $length, $pkt);

Now send a packet from nf2c1. This call also specifies that we should expect the same packet to come out of MAC port 1:

   98	$delay = '@100us';
99 PCI_create_and_send_pkt(2, $length);

The remaining lines, up to line 163, test packets of different sizes. After that, the script writes all of this information to files; this final part should generally not be changed:

  163	# *********** Finishing Up - need this in all scripts ! ****************************
164 my $t = nf_write_sim_files();
165 print "--- make_pkts.pl: Generated all configuration packets.\n";
166 printf "--- make_pkts.pl: Last packet enters system at approx %0d microseconds.\n",($t/1000);
167 if (nf_write_expected_files()) {
168 die "Unable to write expected files\n";
169 }
170
171 nf_create_hardware_file('LITTLE_ENDIAN');
172 nf_write_hardware_file('LITTLE_ENDIAN');

To run the simulation, first make sure that your environment is set up. NF2_ROOT should point to the path of the NF2 directory (e.g. ~/NF2), and NF2_DESIGN_DIR should point to ${NF2_ROOT}/projects/rate_limited_router. Also make sure that you source the settings from ${NF2_ROOT}/bin/nf2_profile or ${NF2_ROOT}/bin/nf2_cshrc, depending on your shell. ModelSim must also be installed. To run the simulation:

nf21_run_test.pl --major router --minor short

The test should then generate the packets and register requests and run the simulation in console mode. When the test finishes, the script searches the log for the word ERROR and reports whether an error occurred. To run the test in a GUI, where you can view the waveforms and have complete control over the simulation, add the --gui switch to the command. For more information on other options, type:

nf21_run_test.pl --help

After you run the simulation you should see output similar to what is seen below (this is only the last part):

...
# Timecheck: 493645.00ns
# 500100 Simulation has reached finish time - ending.
# ** Note: $finish  : /home/jnaous/mckeown/NF2/new_tree/lib/verilog/testbench/target32.v(616)
# Time: 500100 ns Iteration: 0 Instance: /testbench/target32
--- Simulation is complete. Validating the output.
Comparing simulation output for port 1 ...
Port 1 matches [0 packets]
Comparing simulation output for port 2 ...
Port 2 matches [4 packets]
Comparing simulation output for port 3 ...
Port 3 matches [0 packets]
Comparing simulation output for port 4 ...
Port 4 matches [0 packets]
Comparing simulation output for DMA queue 1 ...
DMA queue 1 matches [0 packets]
Comparing simulation output for DMA queue 2 ...
DMA queue 2 matches [2 packets]
Comparing simulation output for DMA queue 3 ...
DMA queue 3 matches [0 packets]
Comparing simulation output for DMA queue 4 ...
DMA queue 4 matches [0 packets]
--- Test PASSED
Test test_router_short passed!
------------SUMMARY---------------
PASSING TESTS:
test_router_short
FAILING TESTS:
TOTAL: 1 PASS: 1 FAIL: 0

Implementing the Design

Implementing the design is a very simple process:

cd rate_limited_router/synth
make

When make finishes, you should have a bitfile called nf2_top_par.bit, which must be downloaded to the NetFPGA to run. To download it, use:

nf2_download ./nf2_top_par.bit

The synthesis process uses Smartguide by default when rebuilding a project. When the netlist changes dramatically between synthesis runs, or when place and route cannot route the nets to meet timing, Smartguide may produce results that fail to meet timing, or take a very long time to finish (and still fail to meet timing). In these cases, you can disable Smartguide by adding the following line before the include in synth/Makefile:

USE_SMARTGUIDE := 0

Also, by default, map uses the -timing switch to improve timing. In some cases this can lead to unexpected errors during the map process. To disable the -timing switch, add the following line to <project_name>/synth/Makefile:

TIMING := 0

To re-enable Smartguide or the -timing switch, set these variables to 1 or simply comment the lines out.

Testing the New Router

A full Perl library is available for testing the actual hardware. The library NF2/lib/Perl5/TestLib.pm contains many functions for sending and receiving packets and for reading and writing registers. To learn how to use it, look at the regression tests under NF2/projects/reference_router/regress, and read through the library code and its comments to see which functions are available and what they do. We invite you to help us by expanding this section or submitting patches that add documentation on using the Perl library.

Buffer Monitoring System

Introduction

To record how buffer occupancy evolves in switches, a system is needed that records the arrival and departure of packets in the queues. The buffer monitoring subsystem is designed to meet this need. It monitors events such as the arrival, departure, or drop of a packet on the queues, and records a timestamp along with the length of the packet. These events are then output in a packet with a specific format called the “event packet”.

Using the system

The system has two software components: monitor_ctrl, which controls the buffer monitoring system, and rcv_evts, which records the events at the receiver.

monitor_ctrl

The monitor_ctrl program is found in NF2/projects/router_buffer_sizing/sw. It implements all the functions to control and view the status of the buffer monitoring system. The program works by reading and changing the control register of the buffer monitoring system. In addition, several read-only registers were added to observe the behavior of the subsystem. Executing “monitor_ctrl -h” will show available options. “monitor_ctrl -p” will print the status.


For more information on the buffer monitoring registers, please see the Register Map.

rcv_evts

The rcv_evts program records incoming event packets, parses them, and prints the output to stdout. It can be used as a basis for doing more interesting things with the event packet information (such as reconstructing queue occupancy over time). To use the program, simply run:

rcv_evts -i nf2c3 -v

where nf2c3 is the interface receiving the event packets.
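As a starting point for the kind of post-processing mentioned above, here is a C sketch that rebuilds per-queue occupancy from decoded short events. The struct layout, the eight-queue count, and the assumption that the initial queue sizes use the same 64-bit-word units as the packet lengths are all hypothetical. Decoding the raw bits is left to the packet format described below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded short event; lengths are in 64-bit words. */
enum evt_type { EVT_ARRIVE, EVT_DEPART, EVT_DROP };

struct short_evt {
    enum evt_type type;
    unsigned queue;     /* output queue index */
    uint32_t len_words; /* packet length in 64-bit words */
};

#define NUM_QUEUES 8 /* assumed number of monitored queues */

/* Apply events in order to per-queue occupancy (in 64-bit words), starting
 * from the queue sizes carried at the front of the event packet. */
static void apply_events(uint32_t occ[NUM_QUEUES], const struct short_evt *ev,
                         int n)
{
    for (int i = 0; i < n; i++) {
        assert(ev[i].queue < NUM_QUEUES);
        switch (ev[i].type) {
        case EVT_ARRIVE:
            occ[ev[i].queue] += ev[i].len_words;
            break;
        case EVT_DEPART:
            assert(occ[ev[i].queue] >= ev[i].len_words);
            occ[ev[i].queue] -= ev[i].len_words;
            break;
        case EVT_DROP: /* assumed: a dropped packet was never stored */
            break;
        }
    }
}
```

Recording occ[] after each event, together with the event's reconstructed time, yields the queue occupancy curve.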

Design Details

The system is composed of two main modules: evt_rcrdr and evt_pkt_wrtr. The evt_rcrdr module records individual events as they happen and serializes them for the evt_pkt_wrtr module. The evt_pkt_wrtr reads and stores each event, and when the send_egress_pkt is ready, it sends out a complete event packet.

evt_rcrdr

On every clock cycle there are four possible types of events: a packet stored, a packet removed, a packet dropped, and a Timestamp event. We call the first three types “short events” because they need only 32 bits, whereas the Timestamp is a “long event” since it uses 64 bits. The short events carry the 19 least significant bits of the clock. Periodically, a Timestamp event is signaled and recorded.

The evt_rcrdr rearranges the events as they come in and stores them into the event fifo in a single clock cycle. The event fifo is a shallow fifo (depth=8) with a variable input size. The input is composed of five 32-bit words, of which a variable number x is stored (always the first x words); x is the number of words occupied by the events of the current clock cycle (the Timestamp event takes 2 words). Note that the evt_rcrdr assumes that the events are all independent.

The maximum sustained event recording capability is 62.5 million events per second, whereas the peak event recording capability is 8 events in any 32 ns interval (after which the event rate must not exceed the average, or events will be lost). The evt_rcrdr can be adapted to record any signal that meets these requirements, with an additional 9-bit field (usually the length field) for additional data. In addition, the evt_rcrdr makes no assumption about the sizes of the fields; these can be modified by changing the sizes in unet_defines.v. Note, however, that the evt_pkt_wrtr does assume 32-bit words, since that simplifies writing to the send_egress_pkt, although this could easily be adapted as well.
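The packing step can be modeled in software as follows. This C sketch illustrates the idea and is not the Verilog: it moves the valid event words to the front of the five-word input and returns x, the number of leading words to store. The ordering (Timestamp words first, then the stored, removed, and dropped events) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one clock cycle's events into the five-word FIFO input.
 * ts: the two words of a Timestamp event (used only if ts_valid).
 * short_valid/short_words: stored, removed, dropped events (assumed order).
 * Returns x, the number of leading words of out[] to write to the FIFO. */
static int pack_cycle_events(uint32_t out[5],
                             int ts_valid, const uint32_t ts[2],
                             const int short_valid[3],
                             const uint32_t short_words[3])
{
    int x = 0;
    if (ts_valid) {
        out[x++] = ts[0];
        out[x++] = ts[1];
    }
    for (int i = 0; i < 3; i++)
        if (short_valid[i])
            out[x++] = short_words[i];
    assert(x <= 5);
    return x;
}
```

For scale: with the 8 ns clock, 62.5 million events per second is one event every 16 ns, or one every other clock cycle.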

evt_pkt_wrtr

This module reads events from the evt_rcrdr module and groups them into packets. The module also monitors the datapath for activity and only injects event packets when the datapath is idle.

Schematic

The system follows the reference user data path.

Image:schematic.gif

Packet Format

Following the Ethernet, IP, and UDP headers, there are 4 reserved bits, then 4 bits indicating the version of the buffer monitoring system, then 8 bits indicating the number of event types excluding the Timestamp event type. This is followed by a 32-bit sequence number (starting at 0) and a list of the queue sizes before the first short event. A series of short events then follows. Periodic timestamps make it possible to keep track of the exact time of each event.
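Parsing the fixed part of the event packet header might look like the C sketch below. It assumes the fields are packed most-significant-bit first in network byte order, with the reserved nibble in the upper half of the first byte; check the format figure and rcv_evts before relying on that. The queue-size list is not parsed because its length and units are not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct evt_pkt_hdr {
    unsigned version;       /* 4 bits */
    unsigned num_evt_types; /* 8 bits, excluding the Timestamp type */
    uint32_t seq;           /* 32-bit sequence number, starts at 0 */
};

/* p points at the first byte after the UDP header. Returns 0 on success,
 * -1 if fewer than 6 bytes are available. */
static int parse_evt_pkt_hdr(const uint8_t *p, size_t len, struct evt_pkt_hdr *h)
{
    assert(p != NULL && h != NULL);
    if (len < 6)
        return -1;
    h->version = p[0] & 0x0f; /* low nibble; high nibble assumed reserved */
    h->num_evt_types = p[1];
    h->seq = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
             ((uint32_t)p[4] << 8) | (uint32_t)p[5];
    return 0; /* the queue-size list and events follow at p + 6 */
}
```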

Image:Pkt_format.gif

There are four event types (arrival, departure, drop, and Timestamp), which use two formats:

1- Arrival/Departure/Drop (short) events. Note that the packet length here is given in 64-bit words. The default time resolution is 8 ns.

2- Timestamp event: periodically recorded to keep the time updated.

Image:evt_format.gif
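Since short events carry only the 19 least significant bits of the clock, their full time has to be rebuilt from the most recent Timestamp event. Here is a C sketch, assuming the decoded Timestamp is a clock-cycle count and that each short event falls less than one wrap period after it:

```c
#include <assert.h>
#include <stdint.h>

#define EVT_TIME_BITS 19u /* short events carry the 19 LSBs of the clock */
#define CLK_NS        8u  /* default time resolution in ns */

/* Rebuild the full clock count of a short event from the last Timestamp.
 * Assumes the event is at or after last_ts and less than 2^19 cycles later. */
static uint64_t full_event_time(uint64_t last_ts, uint32_t low_bits)
{
    const uint64_t mask = (1ull << EVT_TIME_BITS) - 1;
    assert(low_bits <= mask);
    uint64_t t = (last_ts & ~mask) | low_bits;
    if (t < last_ts) /* the low bits wrapped since the timestamp */
        t += mask + 1;
    return t;
}
```

At the default 8 ns resolution the 19-bit field wraps every 2^19 × 8 ns = 4,194,304 ns (about 4.2 ms), so under this scheme Timestamp events must be recorded at least that often for the reconstruction to be unambiguous.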

Links

Schematic and board layout

The schematic of the NetFPGA board is available as both a PDF and an Orcad Capture file:

An Allegro layout of the board is also available:

Contributed Packages

Packages contributed by alpha testers and beta users

Tutorial Setup

Contains tutorial setup instructions

Other Pages

License

The NetFPGA code is distributed under a BSD-style license, shown below. Please make sure you read and understand it. The design of the board itself is also freely available.

Copyright (c) 2006 The Board of Trustees of The Leland Stanford Junior University

We are making the NetFPGA tools and associated documentation (Software) available for public use and benefit with the expectation that others will use, modify and enhance the Software and contribute those enhancements back to the community. However, since we would like to make the Software available for broadest use, with as few restrictions as possible permission is hereby granted, free of charge, to any person obtaining a copy of this Software) to deal in the Software under the copyrights without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The name and trademarks of copyright holder(s) may NOT be used in advertising or publicity pertaining to the Software or any derivatives without specific, written prior permission.
