Network Performance Testing

This page brings together approaches and resources for testing and tuning end systems to improve network performance, both the throughput seen by users' applications and the movement of large data files. The focus is on understanding how to tune and debug end systems to obtain the highest throughput possible with the systems you have. It builds upon, and makes public, a page developed by Chris Tracy a year ago. Viewers are encouraged to suggest additions and modifications.

Application Tools to Measure Network Performance & Increase Flows

Tuning and Measurement Links

Papers

Scientific Workshops & Projects

Chris' Tricks and Tips for Debugging Loss on Linux Systems

  • add typical tuning parameters (see below) to /etc/sysctl.conf, run sysctl -p to activate changes
  • check for TCP retransmissions: netstat -s | grep -i retrans
    • do this before and after every TCP test, check to see if counter increases
      • if TCP is retransmitting, it is likely that you will also see loss during a UDP test
    • feature will eventually be incorporated into nuttcp
    • TCP retransmissions will typically cause a big performance hit when operating at very high rates
      • it is possible to change how "fair" TCP's congestion control algorithm behaves (e.g. ctcp)
  • check for dropped packets using ifconfig: ifconfig -a | egrep -e "(^eth|drop)"
  • check detailed stats on ethernet interface: ethtool -S ethX
    • not all NIC drivers support this capability
    • e1000 and sk98lin drivers definitely have it
  • check if jumbo frames are working properly
    • router and switch interfaces along the layer2/layer3 components of the path need to be configured to support jumbo frames
    • run ping -s 8000 -M do <remote host> in each direction
    • run tcpdump -vvv -n -i ethX icmp on receiver to ensure incoming packets are not fragmented
      • length of incoming packet should be around 8028 bytes (8000 bytes of data + 8-byte ICMP header + 20-byte IP header), not a series of fragments of 1500 bytes or less
      • hosts that support the -M do option for ping should set the DF bit to prohibit fragmentation
    • if you need to fix the MTU, ifconfig ethX mtu 9000
  • make sure the txqueuelen parameter is set: ifconfig ethX txqueuelen 1000
  • show pause parameters/flow control settings & auto-negotiation settings: ethtool -a ethX
    • change settings with ethtool -A ethX
  • make sure you are using the correct window size parameter (e.g. nuttcp -w10m for a 10 megabyte window)
    • compute the bandwidth delay product (BDP) to determine the required window size given the round-trip time (RTT) and bandwidth
  • turn off TCP segmentation offload (TSO): ethtool -K ethX tso off
    • turn it back on if disabling it does not help
  • netstat gives lots of useful stats, but be aware the counters are not interface specific: netstat -s
  • check the PCI bus topology with lspci -tv, show chipset details with lspci -v or lspci -vvv for even more info
    • may need to consult your motherboard manual to really understand the PCI buses in your particular system
# lspci -tv 
-[0000:00]-+-00.0  Intel Corporation E7230 Memory Controller Hub
           +-01.0-[0000:01]--
           +-1c.0-[0000:02-03]--+-00.0-[0000:03]--
           |                    \-00.1  Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A
           +-1c.4-[0000:04]----00.0  Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
           +-1c.5-[0000:05]----00.0  Intel Corporation 82573L Gigabit Ethernet Controller
  • if you are unlucky and just see lines like Intel Corp.: Unknown device 109a, you are probably running an older OS with an outdated pci.ids on a machine with much newer hardware; search /usr/share/misc/pci.ids on a newer linux OS to figure out what you have
# lspci -tv
-[00]-+-00.0  Intel Corp.: Unknown device 2778
      +-01.0-[01]--
      +-1c.0-[09-0a]--+-00.0-[0a]--
      |               \-00.1  Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A
      +-1c.4-[0d]----00.0  Intel Corp.: Unknown device 108c
      +-1c.5-[0e]----00.0  Intel Corp.: Unknown device 109a
# grep 109a /usr/share/misc/pci.ids | grep -i eth
        109a  82573L Gigabit Ethernet Controller
  • excessive context switching can cause loss
    • use vmstat 1 to measure context switches ('cs' column)
    • in general, run as few applications as possible on the test hosts, stop all unused daemons
    • idle systems might have 10-20 context switches per second, busier systems might have 500-600 per second
    • when a perf test is running there will be thousands to tens of thousands per second
  • unload/reload kernel driver (use lsmod, rmmod, modprobe)
    • more difficult when using the same NIC for management and data plane testing
    • alternatively, reboot the test host
  • investigate ethernet driver parameters
    • the e1000 driver has many tweakables, see the e1000 driver README for more information
    • for example, modprobe e1000 TxDescriptors=80,128
  • make sure you are using the latest version of the NIC driver
    • for example, e1000-7.6.5.tar.gz: e1000 linux driver for intel chipsets v7.6.5
    • [ add sk98lin driver details..? ]
    • one way you might find out what driver version you are currently using, for each set of kernel modules installed:
find /lib/modules/ -name 'e1000.*o' -print | xargs strings -f | grep version=
/lib/modules/2.6.12-1.1381_FC3/kernel/drivers/net/e1000/e1000.ko: version=6.0.54-k2-NAPI
/lib/modules/2.6.12-1.1381_FC3/kernel/drivers/net/e1000/e1000.ko: srcversion=9D9D89286803155050355F2
/lib/modules/2.6.12-1.1381_FC3smp/kernel/drivers/net/e1000/e1000.ko: version=6.0.54-k2-NAPI
/lib/modules/2.6.12-1.1381_FC3smp/kernel/drivers/net/e1000/e1000.ko: srcversion=9D9D89286803155050355F2
/lib/modules/2.6.9-1.667smp/kernel/drivers/net/e1000/e1000.ko: version=7.2.7-NAPI
/lib/modules/2.6.9-1.667/kernel/drivers/net/e1000/e1000.ko: version=7.2.7-NAPI
  • make sure you are running linux 2.6: uname -a
    • linux 2.4.20 kernels seemed to perform well, before that, your mileage may vary
    • if you really love *BSD and want to get line-rate with copper/fiber interfaces, see dragonflybsd
    • good luck with any other OS, nuttcp does run on windows ;-)
  • make sure iptables is not running: lsmod | grep -i ipt
  • make sure you aren't passing through any NAT or other traffic filters/etc that might not be able to keep up
    • what do these problems typically look like? lots of loss? re-ordering? other strangeness?
    • at 10G, you may not want your packets going through the 8021q module to send tagged frames, for example
  • try testing to a different host, or try a different NIC which uses a different driver
  • setup additional measurement points along the path to isolate the problem
  • check interface statistics on the switch/router interfaces along the path
    • look for CRC errors, input/output queue drops, etc: sh int gi1/0/2 (for most devices)
    • watch to see if the counters increase after a perf test — could indicate dirty fiber, not enough buffering, etc
    • ideally these kinds of counters should stay at zero:
     0 symbol errors, 0 runts, 0 giants, 0 throttles
     0 CRC, 0 IP Checksum, 0 overrun, 0 discarded
      ...
     Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  • test memory bandwidth/cpu by running a local test where the transmitter and receiver are on the same host
    • nuttcp is capable of this, but be aware that the machine is doing double-duty since it is both a sender and receiver
    • this test does not put any packets on the wire, but it does help to measure what the host/OS is capable of
clpk-es1:~# ./nuttcp -w1m 192.168.98.43
12695.0625 MB /  10.00 sec = 10645.0237 Mbps 100 %TX 91 %RX
  • check for runaway processes that might be eating up lots of memory or CPU, here is an example:
$ ps auxwww|grep wxvlc
1000     10192  0.0  0.0   2884   760 pts/7    S+   21:04   0:00 grep wxvlc
1000     16932 15.3  0.4 1099360 8944 ?        Sl   Oct26 961:59 wxvlc
1000     18491 66.1 71.7 1615732 1488712 ?     Sl   Oct26 4142:39 wxvlc
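
The runaway-process check above can be scripted. The following is a minimal sketch, assuming `ps aux` column order (USER PID %CPU %MEM ... COMMAND); the function name `find_hogs` and the default threshold of 50 percent are illustrative choices, not part of the original tips.

```shell
# find_hogs [LIMIT]
# Reads `ps aux` output on stdin and prints "PID %CPU %MEM COMMAND" for every
# process whose CPU or memory share exceeds LIMIT percent (default 50).
# Column positions assume the usual `ps aux` layout.
find_hogs() {
    awk -v limit="${1:-50}" '
        # The header line is skipped naturally: "%CPU" + 0 evaluates to 0.
        (($3 + 0) > limit) || (($4 + 0) > limit) {
            print $2, $3, $4, $11
        }'
}

# Usage on a live system:
#   ps auxwww | find_hogs 50
```

Run against the example output above with a limit of 50, this would report only PID 18491 (66.1% CPU, 71.7% memory).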

Typical tuning parameters

  • these values are a bit overkill for most applications, especially on GigE; they will cause excessive memory utilization for things like SSH
  • experiments have also shown that turning off SACK, even for 10GigE, is not beneficial

# some of the defaults may be different for your kernel
# load this file with sysctl -p <this file>
# these are just suggested values that worked well to increase throughput in
# several network benchmark tests, your mileage may vary

### IPV4 specific settings
# turns TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_timestamps = 0
# turn SACK support off, default on -- you probably only want to do this at 10GigE
#net.ipv4.tcp_sack = 0
# on systems with a VERY fast bus -> memory interface this is the big gainer
# sets min/default/max TCP read buffer, default 4096 87380 174760
# setting to 100M - 10M is too small for cross country (chsmall)
net.ipv4.tcp_rmem = 150000000 150000000 150000000
# sets min/default/max TCP write buffer, default 4096 16384 131072
net.ipv4.tcp_wmem = 150000000 150000000 150000000
# sets min/pressure/max TCP buffer space, default 31744 32256 32768
# (unlike tcp_rmem and tcp_wmem, tcp_mem is measured in memory pages, not bytes)
net.ipv4.tcp_mem = 150000000 150000000 150000000

### CORE settings (mostly for socket and UDP effect)
# maximum receive socket buffer size, default 131071
net.core.rmem_max = 75000000
# maximum send socket buffer size, default 131071
net.core.wmem_max = 75000000
# default receive socket buffer size, default 65535
net.core.rmem_default = 2524287
# default send socket buffer size, default 65535
net.core.wmem_default = 2524287
# maximum amount of option memory buffers, default 10240
net.core.optmem_max = 2524287
# number of unprocessed input packets before kernel starts dropping them, default 300
net.core.netdev_max_backlog = 300000
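
The kernel interprets net.ipv4.tcp_mem in memory pages, while tcp_rmem and tcp_wmem are in bytes, so it can help to convert a tcp_mem value to bytes before choosing it. The following is a minimal sketch; the function name `pages_to_bytes` is illustrative, and the page size is taken from `getconf PAGESIZE` unless given explicitly (4096 is common but is an assumption about your system).

```shell
# pages_to_bytes PAGES [PAGE_SIZE]
# Converts a tcp_mem value (in pages) to bytes.
# PAGE_SIZE defaults to the running system's page size.
pages_to_bytes() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size ))
}

# Usage:
#   pages_to_bytes 32768 4096        # the default maximum quoted above
#   pages_to_bytes 150000000 4096    # the suggested value above
```

With 4096-byte pages, the default maximum of 32768 pages is 128 MiB, and the suggested 150000000 pages is far more than the RAM in either test host, so it effectively removes the limit.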

Test results and real-world examples

  • establish the RTT for these tests (CLPK--MCLN):

clpk-es1:~# ping -c 3 -s 8000 192.168.98.44
PING 192.168.98.44 (192.168.98.44) 8000(8028) bytes of data.
8008 bytes from 192.168.98.44: icmp_seq=1 ttl=64 time=1.65 ms
8008 bytes from 192.168.98.44: icmp_seq=2 ttl=64 time=1.58 ms
8008 bytes from 192.168.98.44: icmp_seq=3 ttl=64 time=1.49 ms
--- 192.168.98.44 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 1.490/1.576/1.657/0.082 ms
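
With the RTT known, the bandwidth delay product (BDP) gives the minimum window needed to keep the path full. The following is a minimal sketch of that arithmetic; the function name `bdp_bytes` is illustrative, and the 10 Gbps / 80 ms figures in the usage comment are a hypothetical cross-country path, not a measurement from this page.

```shell
# bdp_bytes RATE_MBPS RTT_MS
# Prints the bandwidth delay product in bytes:
#   rate (bits/s) * RTT (s) / 8
bdp_bytes() {
    awk -v rate="$1" -v rtt="$2" 'BEGIN {
        # Mbit/s * 1e6 = bit/s; ms / 1000 = s; bits / 8 = bytes
        printf "%.0f\n", rate * 1e6 * (rtt / 1000) / 8
    }'
}

# Usage:
#   bdp_bytes 1000 1.576     # GigE at the average RTT measured above
#   bdp_bytes 10000 80       # hypothetical 10G path with 80 ms RTT
```

For the GigE path measured above this comes to roughly 197 KB, so a 1 megabyte window (nuttcp -w1m) is comfortably larger than the BDP.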

  • establish whether there are any layer-3 routers along the path
    • this example uses a dedicated circuit or 'lightpath' provided by a VLAN across a 10G lambda between College Park MD and McLean VA on DRAGON, so it is only one hop away
    • it goes through a Raptor at College Park, a Raptor at McLean, to the HOPI Force10 in McLean and finally to an end system
clpk-es1:~# traceroute 192.168.98.44
traceroute to 192.168.98.44 (192.168.98.44), 30 hops max, 40 byte packets
 1  192.168.98.44 (192.168.98.44)  0.867 ms  0.868 ms  0.839 ms

  • the hosts used in this test are not identical
    • clpk-es1.dragon.maxgigapop.net is an Aberdeen/Supermicro running Debian 4.0 (linux 2.6.18) with 1GB RAM
      • NIC: on-board copper interface (Intel 82573 PRO/1000)
    • wash-pc2.hopi.internet2.edu is an HP running Red Hat Enterprise (linux 2.6.17.6-web100) with 4GB RAM
      • NIC: on-board copper interface (Tigon3 tg3 driver, Broadcom Corporation NetXtreme BCM5704 chipset)

Quick, 15-second tests

  • UDP test, as fast as possible over GigE VLAN, totally clean in both directions (no dropped packets) for 15 seconds, printing stats every 3 seconds:

clpk-es1:~# ./nuttcp -w1m -i3 -T15 -u -Ri1000M 192.168.98.44
  354.9922 MB /   3.01 sec =  990.9267 Mbps     0 / 45439 ~drop/pkt  0.00 ~%loss
  354.4688 MB /   3.00 sec =  991.0436 Mbps     0 / 45372 ~drop/pkt  0.00 ~%loss
  354.5312 MB /   3.00 sec =  991.0396 Mbps     0 / 45380 ~drop/pkt  0.00 ~%loss
  354.0000 MB /   3.00 sec =  991.0420 Mbps     0 / 45312 ~drop/pkt  0.00 ~%loss
  354.4766 MB /   3.00 sec =  991.0426 Mbps     0 / 45373 ~drop/pkt  0.00 ~%loss
 1773.5781 MB /  15.01 sec =  991.0186 Mbps 85 %TX 7 %RX 0 / 227018 drop/pkt 0.00 %loss

clpk-es1:~# ./nuttcp -w1m -i3 -T15 -u -Ri1000M -r 192.168.98.44
  355.2422 MB /   3.01 sec =  991.3953 Mbps     0 / 45471 ~drop/pkt  0.00 ~%loss
  354.6016 MB /   3.00 sec =  991.5309 Mbps     0 / 45389 ~drop/pkt  0.00 ~%loss
  354.6094 MB /   3.00 sec =  991.5524 Mbps     0 / 45390 ~drop/pkt  0.00 ~%loss
  354.6094 MB /   3.00 sec =  991.5481 Mbps     0 / 45390 ~drop/pkt  0.00 ~%loss
  354.6250 MB /   3.00 sec =  991.5994 Mbps     0 / 45392 ~drop/pkt  0.00 ~%loss
 1775.2969 MB /  15.02 sec =  991.5185 Mbps 99 %TX 5 %RX 0 / 227238 drop/pkt 0.00 %loss
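
When running many UDP tests it is convenient to pull the rate and drop counts out of the final summary line. The following is a minimal sketch that assumes the summary-line layout shown in the output above (it keys on the "Mbps" and "drop/pkt" labels); the function name `nuttcp_udp_summary` is illustrative and other nuttcp versions may format the line differently.

```shell
# nuttcp_udp_summary
# Reads nuttcp UDP output on stdin and prints the rate, dropped packets and
# total packets from the final summary line (the one containing %TX).
nuttcp_udp_summary() {
    awk '/%TX/ && /drop\/pkt/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "Mbps") mbps = $(i - 1)
            # the summary reads "<drops> / <pkts> drop/pkt"
            if ($i == "drop/pkt") { drops = $(i - 3); pkts = $(i - 1) }
        }
        printf "mbps=%s drops=%s pkts=%s\n", mbps, drops, pkts
    }'
}

# Usage:
#   ./nuttcp -w1m -i3 -T15 -u -Ri1000M 192.168.98.44 | nuttcp_udp_summary
```

The per-interval lines are ignored because they do not contain the %TX field.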

  • TCP test, as fast as possible over GigE VLAN, totally clean in both directions (no TCP retransmissions) for 15 seconds, printing stats every 3 seconds:

clpk-es1:~# netstat -s | grep -i trans
    15 segments retransmited

clpk-es1:~# ./nuttcp -w1m -i3 -T15 192.168.98.44
  354.1848 MB /   3.00 sec =  989.7925 Mbps
  354.3481 MB /   3.00 sec =  990.8520 Mbps
  354.3562 MB /   3.00 sec =  990.8805 Mbps
  354.8318 MB /   3.00 sec =  990.8852 Mbps
  354.3589 MB /   3.00 sec =  990.8778 Mbps
 1773.4375 MB /  15.02 sec =  990.6603 Mbps 6 %TX 11 %RX

clpk-es1:~# ./nuttcp -w1m -i3 -T15 -r 192.168.98.44
  354.2385 MB /   3.00 sec =  989.8288 Mbps
  354.3818 MB /   3.00 sec =  990.9162 Mbps
  354.3762 MB /   3.00 sec =  990.8912 Mbps
  354.8875 MB /   3.00 sec =  991.0172 Mbps
  354.3809 MB /   3.00 sec =  990.9131 Mbps
 1773.4355 MB /  15.02 sec =  990.7109 Mbps 12 %TX 6 %RX

clpk-es1:~# netstat -s | grep -i trans
    15 segments retransmited

Longer, 5-minute tests

  • If you really have well-performing hosts, you should be able to get clean performance for much longer durations:

clpk-es1:~# ./nuttcp -w1m -i60 -T300 -u -Ri1000M -r 192.168.98.44
 7092.8516 MB /  60.01 sec =  991.5409 Mbps     0 / 907885 ~drop/pkt  0.00 ~%loss
 7092.1562 MB /  60.00 sec =  991.5476 Mbps     0 / 907796 ~drop/pkt  0.00 ~%loss
 7091.6953 MB /  60.00 sec =  991.5490 Mbps     0 / 907737 ~drop/pkt  0.00 ~%loss
 7092.6484 MB /  60.00 sec =  991.5506 Mbps     0 / 907859 ~drop/pkt  0.00 ~%loss
35461.3281 MB / 300.01 sec =  991.5463 Mbps 99 %TX 5 %RX 0 / 4539050 drop/pkt 0.00 %loss

  • a 5-minute TCP test showing some retransmissions but TCP still performed quite well:

clpk-es1:~# netstat -s | grep -i trans
    45 segments retransmited
    10 times recovered from packet loss due to fast retransmit
    30 fast retransmits

clpk-es1:~# ./nuttcp -w1m -i60 -T300 192.168.98.44
 7080.1045 MB /  60.00 sec =  989.8027 Mbps
 7084.4465 MB /  60.00 sec =  990.5035 Mbps
 7084.9639 MB /  60.00 sec =  990.5104 Mbps
 7080.3718 MB /  60.00 sec =  989.8690 Mbps
 7078.8157 MB /  59.99 sec =  989.7820 Mbps
35410.3750 MB / 300.02 sec =  990.0939 Mbps 6 %TX 10 %RX

clpk-es1:~# netstat -s | grep -i trans
    75 segments retransmited
    20 times recovered from packet loss due to fast retransmit
    60 fast retransmits
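
The before/after comparison of the retransmission counter can be scripted. The following is a minimal sketch; the function name `retrans_count` is illustrative, and the pattern matches on "segments retransmit" so that it accepts both the spelling printed above and the corrected spelling used by some other net-tools versions (an assumption about those versions).

```shell
# retrans_count
# Reads `netstat -s` output on stdin and prints the number of retransmitted
# TCP segments, or 0 if the counter line is not present.
retrans_count() {
    awk '/segments retransmit/ { print $1; found = 1 }
         END { if (!found) print 0 }'
}

# Usage around a test run:
#   before=$(netstat -s | retrans_count)
#   ./nuttcp -w1m -i60 -T300 192.168.98.44
#   after=$(netstat -s | retrans_count)
#   echo "segments retransmitted during test: $(( after - before ))"
```

For the 5-minute TCP test above, the counter went from 45 to 75, so 30 segments were retransmitted during the run.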

Known Issues

  • Bill Fink has identified problems with cheaper 10GigE switches that appear to stem from a lack of buffering when packets arrive on a 10GigE interface and must be queued before going out a GigE interface
    • 10G-end-host--10G-switch--GigE-trunk--10G-switch--10G-end-host
    • or 10G-end-host--10G-switch--GigE-end-host
    • the switch may not have enough buffer when the 10G host is sending at GigE speeds
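
To get a feel for how much buffering such a speed mismatch can require, here is a rough sketch under a simplified model that is an assumption of this example, not a description of any particular switch: a burst arrives back-to-back at the ingress rate while the queue drains at the egress rate, so the queue peaks at burst * (in - out) / in. The function name `peak_queue_bytes` is illustrative.

```shell
# peak_queue_bytes BURST_BYTES IN_MBPS OUT_MBPS
# Estimates the peak queue depth (bytes) when a burst of BURST_BYTES arrives
# at IN_MBPS and is forwarded at OUT_MBPS. Integer arithmetic only.
peak_queue_bytes() {
    burst=$1 in_rate=$2 out_rate=$3
    echo $(( burst * (in_rate - out_rate) / in_rate ))
}

# Usage:
#   peak_queue_bytes 1048576 10000 1000   # a 1 MiB burst, 10G in, GigE out
```

Under this model roughly 90% of any line-rate 10G burst has to be held in the switch, which is why small per-port buffers can drop packets even when the average sending rate is at or below GigE.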

  • There is a jumbo frame problem with older Force10 EtherScale 10GigE linecards, discovered during SC2005 in Seattle WA
    • jumbo frames will work in one direction, but not the other
    • run this command on affected line cards: reset linecard # hard
    • supposedly this is fixed in a newer FTOS release

Future Work

  • how to measure and plot jitter/inter-packet interval
    • using udpmon or tcpdump -ttt
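
As a starting point for the inter-packet interval measurement, here is a minimal sketch that computes interval statistics from a capture. It assumes `tcpdump -tt` output (epoch seconds with a six-digit microsecond fraction as the first field) rather than `-ttt`, because that format is simple to parse; the function name `interval_stats` is illustrative.

```shell
# interval_stats
# Reads `tcpdump -tt` output on stdin and prints the count, minimum, mean and
# maximum inter-packet interval in microseconds.
# Seconds and microseconds are handled separately to avoid floating-point
# rounding on large epoch timestamps.
interval_stats() {
    awk '{
        split($1, t, ".")
        if (NR > 1) {
            d = (t[1] - prev_s) * 1000000 + (t[2] - prev_us)
            if (n == 0 || d < min) min = d
            if (n == 0 || d > max) max = d
            sum += d
            n++
        }
        prev_s = t[1]; prev_us = t[2]
    }
    END {
        if (n > 0)
            printf "n=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
    }'
}

# Usage on the receiver:
#   tcpdump -tt -n -i ethX udp | interval_stats
```

The per-packet intervals could also be printed instead of summarized if they are to be plotted.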

  • perf test automation and plotting

  • how to switch TCP congestion control algorithms
    • iperf 2.0.4 has a command-line switch to do this
    • you can also switch the algorithm by tweaking this variable (switch back to 'cubic' after running test):
      • echo reno > /proc/sys/net/ipv4/tcp_congestion_control
    • to find a list of available TCP congestion control modules:
      • ls /lib/modules/`uname -r`/kernel/net/ipv4/
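
Switching the algorithm for a single test and then restoring the previous setting can be wrapped in a small helper. The following is a minimal sketch; the names `with_cc` and `CC_FILE` are illustrative, writing to the /proc file requires root, and the requested algorithm must already be available to the kernel (on kernels that provide it, /proc/sys/net/ipv4/tcp_available_congestion_control lists the loaded ones).

```shell
# Path of the congestion control setting; can be overridden for testing.
CC_FILE=${CC_FILE:-/proc/sys/net/ipv4/tcp_congestion_control}

# with_cc ALGO COMMAND [ARGS...]
# Temporarily selects congestion control algorithm ALGO, runs COMMAND, then
# restores whatever algorithm was selected before. Returns COMMAND's status.
with_cc() {
    algo=$1; shift
    old=$(cat "$CC_FILE") || return 1
    echo "$algo" > "$CC_FILE" || return 1
    "$@"
    status=$?
    echo "$old" > "$CC_FILE"
    return $status
}

# Usage (as root):
#   with_cc reno ./nuttcp -w1m -i3 -T15 192.168.98.44
```

Restoring the saved value, rather than hard-coding 'cubic', keeps the helper correct on hosts whose default is something else.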

  • examples of how to launch nuttcp in different modes of operation:
    • parallel streams
    • separate path for control traffic & data traffic

  • examples of how to configure netem


Attachment: PathdiagPAM08paper.pdf (590.1 K, 04 Jun 2008 - 14:08, PeterONeil): Pathdiag PAM08 Conference Paper

Copyright © 1999-2012.
The information contained in these pages is the property of the Mid-Atlantic Crossroads (MAX).
If you have questions or comments, please contact MAX Administration