Adding Latency and Limiting Bandwidth
It really shouldn't be this hard.
Some aspects of Linux have a reputation for being hard. Traffic control via queueing disciplines for bandwidth management, for example. Even the title is enough to strike fear into the heart of a seasoned system admin.
Which is a pity really, as the things outlined in chapter 9 of the LARTC are very useful in practice. The problem is that the documentation is very descriptive - which is good once you know roughly what you're doing - but it has quite a steep learning curve if you don't. In fact it's pretty much vertical if you don't already know quite a lot about networking. A few more worked examples would help, over and above those in the cookbook.
Instead, like most people in a rush, I have relied on bashing together snippets of code from random blogs to make /sbin/tc do what I want it to do, without really understanding what is going on.
This time, when presented with a problem for which this is the exact tool, I found I needed to dive deeper and actually understand it, as none of the pre-canned recipes worked. It was a case of "if all else fails, try the manual".
So, now that I think I've got a vague handle on what is going on, I'm documenting what I ended up doing, because I'm sure I will need a worked example when I come back to this in the future. If it's useful to you too, so much the better.
The Problem
We need to test the loading speed of our web page and trading platform under a set of network conditions that approximate the following:
- Local LAN, unrestricted
- "Europe", 20ms round trip latency, limit of 512kbit/sec in and out
- "SE Asia", 330ms round trip latency, limit of 128kbit/sec in and out.
In practice that's quite generous, particularly in the case of the SE Asia profile. There was no way I was getting 128kbit on the wifi in Shakey's on Rizal Boulevard in Dumaguete earlier this month - and that was still better than the hotel wifi.
The Solution
Background
We have Selenium running the tests via WebDriver/RemoteDriver against two Windows virtual machines, one running Chrome and one running IE. They run on a Linux host system, and can see a load balancer behind which lies one of our performance test environments. We need to add latency and bandwidth restrictions to their connections, effectively putting them into each of the traffic classes above depending on which test our CI system asks them to run.
The load balancer has been set up with three virtual servers, all listening on the same IP address but different ports.
- Local: 9090
- Europe: 9092
- SE Asia: 9091
Each virtual server has the same webserver pool behind it, so they're all the same from the point of view of the load balancer, but we'll use the different destination ports to switch the traffic between the different combinations of latency and bandwidth restriction that simulate the different customer locations.
The Linux virtual machine host has the guests' vnet network devices attached to a bridge. The bridge in turn is attached to the network via a bonded interface - in our case bond0.30.
To make this work for both machines, we'll apply the traffic management on the bond0.30 side of the bridge.
ASCII art diagram of that:

    IE Windows VM - vnet0              eth0
                         \            /
                      host bridge 30 - bond0.30
                         /            \
Chrome Windows VM - vnet1              eth1
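Before applying any shaping it's worth sanity checking that the topology really does look like the diagram. Something along these lines does the job (just a verification sketch - it assumes bridge-utils is installed and that the bridge, bond and vnet names match the diagram; substitute your own):

# which ports (vnet*, bond0.30) are attached to the bridge
brctl show
# which physical NICs are enslaved to the underlying bond
cat /proc/net/bonding/bond0
# confirm the VLAN interface exists and is passing traffic
ip -s link show bond0.30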
Qdiscs and Classes
There are three creatures we're dealing with here:
- qdisc - a Queueing Discipline. These are the active things we're going to use to control how the traffic is managed. qdiscs can be classless or classful. We're going to use a classful qdisc called htb.
- classes - We'll use these to separate the traffic into its constituent flows and to apply different constraints to each flow.
- filters - As with iptables, these allow us to specify which traffic ends up in which class.
Chapter 9 says that you can only shape transmitted traffic, which is not 100% accurate, as we can do things to inbound traffic too; however, our options there are very limited.
So, looking at the default qdiscs, classes and filters:
[root@vm01 ~]# tc -s qdisc show dev bond0.30
qdisc pfifo_fast 0: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 47844819829 bytes 140593932 pkt (dropped 0, overlimits 0 requeues 22)
rate 0bit 0pps backlog 0b 0p requeues 22
[root@vm01 ~]# tc -s class show dev bond0.30
[root@vm01 ~]# tc -s filter show dev bond0.30
[root@vm01 ~]#
The "-s" option shows the statistics. So, but default, we have a queue discipline called pfifo_fast, which just passes traffic.
Each device has a default root which we use to build upon. We can also attach handles to classes and qdiscs to allow us to relate each part to the others and build up chains to process the packet stream. "root" is shorthand for a handle of 1:0, or the top of the tree.
One of the most useful pages I found is this one: http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm
Things worth repeating from that link are:
- tc tool (not only HTB) uses shortcuts to denote units of rate: kbps means kilobytes per second and kbit means kilobits per second (see the example just below)
- Note: In general (not just for HTB but for all qdiscs and classes in tc), handles are written x:y where x is an integer identifying a qdisc and y is an integer identifying a class belonging to that qdisc. The handle for a qdisc must have zero for its y value and the handle for a class must have a non-zero value for its y value. The "1:" above is treated as "1:0"
The whole page is worth reading carefully.
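That units point is worth a tiny example, because both spellings are accepted without complaint and differ by a factor of eight. The device and classid here are placeholders, not part of our setup:

# kbit means kilobits per second - this limits the class to 512 kilobits/sec;
# writing "rate 512kbps" instead would mean 512 kiloBYTES/sec, i.e. 4096 kbit/sec
tc class add dev eth0 parent 1:0 classid 1:99 htb rate 512kbit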
The Design
[Diagram: the full tree of qdiscs, classes and filters used in this design]
The pentagons are filters, the circles represent qdiscs, and the rectangles are classes. One important point is that this diagram in no way implies flow. This is hard to grasp, and I had problems understanding the comments in section 9.5.2.1 "How filters are used to classify traffic" - particularly:
"You should *not* imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to."
The way I squared it in the end was to think of it as an order of application for traffic flowing through the root qdisc.
So in the above we have the root qdisc, which is an instance of the HTB qdisc. From it hang the classes we set up to handle the three different types of traffic. We use htb to limit the outbound bandwidth for each of the classes (1:10, 1:11, 1:12). When we define the root qdisc we specify that class 1:10 will be the default class for the bulk of the traffic we don't want to delay.
Setting up the root qdisc:
INTERFACE=bond0.30
tc qdisc add dev $INTERFACE root handle 1:0 htb default 10
"root" is a synonym for handle 1:0. $INTERFACE is defined in the shell script to make the porting from machine to machine easier. This installs the htb qdisc on the root for our bond interface, and tells it that by default all traffic should be put in a class called 1:10.
Now we add classes for each of the types of traffic, along with the bandwidth limits we want to enforce on each of the traffic classes.
# default class
tc class add dev $INTERFACE parent 1:0 classid 1:10 htb rate 1024mbit
# "europe" traffic class - outbound bandwidth limit
tc class add dev $INTERFACE parent 1:0 classid 1:11 htb rate 512kbit
# "se asia" traffic class - outbound bandwidth limit
tc class add dev $INTERFACE parent 1:0 classid 1:12 htb rate 128kbit
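Before going further it's worth re-running the show commands from earlier to confirm the tree is taking shape (just a verification step, not part of the script proper):

# the root should now report "qdisc htb 1:" rather than pfifo_fast
tc -s qdisc show dev $INTERFACE
# and the three classes 1:10, 1:11 and 1:12 should be listed with their rates
tc -s class show dev $INTERFACE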
We now attach the network emulator qdisc, netem, which we will use to introduce latency into each of the classes:
# network emulation - add latency.
tc qdisc add dev $INTERFACE parent 1:11 handle 11:0 netem delay 20ms 5ms 25% \
distribution normal
tc qdisc add dev $INTERFACE parent 1:12 handle 12:0 netem delay 330ms 10ms 25% \
distribution normal
This attaches the emulator instances to their parent classes, with handles whose major numbers match the parents' minor (y) values, for ease of tracing. The netem parameters break down as follows.
- delay 20ms - the base delay to add to each packet; pretty self-explanatory.
- 5ms - a jitter on that latency, to give a bit of variation.
- 25% - how much the variation applied to each packet depends on that of its predecessor (the correlation).
- distribution normal - how the variation is distributed.
The netem module is described completely here: http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
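As an aside, once a netem instance is attached you can adjust its parameters in place with "tc qdisc change" rather than tearing the whole tree down and starting again, which is handy while experimenting with values. For example (a sketch reusing the SE Asia handle from above):

# bump the "se asia" delay to 500ms without rebuilding anything else
tc qdisc change dev $INTERFACE parent 1:12 handle 12:0 netem delay 500ms 10ms 25% \
 distribution normal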
One thing that could be improved here is that we're adding all the latency on the outbound leg. Ideally we'd add 165ms on the way there and 165ms on the way back for the SE Asia traffic (and 10ms each way for the EU traffic). Doing that means applying latency to the outbound interfaces in both directions - in our case, to both of the vnet interfaces as well as to the bond0.30 interface. However that is tricky to do cleanly, as the virtual machine interface names may change when the guests are rebooted. This way we end up with much the same result for far less faffing about.
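For the record, the symmetric version would look roughly like the sketch below: halve the delay on the egress class and add the other half on each guest's vnet device. It isn't what our script does, precisely because vnet0/vnet1 are whatever names the hypervisor hands out at boot - and note that this crude form delays all traffic heading back to that guest, not just the port 9091 flows:

# NOT part of our setup - halve the delay on the egress side...
tc qdisc change dev $INTERFACE parent 1:12 handle 12:0 netem delay 165ms 10ms 25% \
 distribution normal
# ...and delay the return traffic on the guest's tap device (name will vary)
tc qdisc add dev vnet0 root netem delay 165ms 10ms 25% distribution normal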
Now, all we need to do is add the filters that classify the packets into their classes:
SEASIAIP=172.16.10.10
SEASIAPORT=9091
EUIP=172.16.10.10
EUPORT=9092
# filter packets into appropriate traffic classes.
tc filter add dev $INTERFACE protocol ip parent 1:0 prio 1 \
u32 match ip dst $SEASIAIP match ip dport $SEASIAPORT 0xffff flowid 1:12
tc filter add dev $INTERFACE protocol ip parent 1:0 prio 1 \
u32 match ip dst $EUIP match ip dport $EUPORT 0xffff flowid 1:11
The action is mainly in the second line of each command, where we match the target IP of the load balancer and the ports we've set up. The flowid is the classid of the appropriate class. We don't need to set up a filter for the "normal" traffic, as it is covered by the "default 10" part of the original htb root qdisc declaration.
And that takes care of the outbound traffic shaping and latency.
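A couple of quick checks convinced me it was working (my own checks, run from a machine whose traffic actually leaves via bond0.30; not part of the original script):

# the byte/packet counters for 1:11 and 1:12 should climb as traffic hits 9092/9091
tc -s class show dev $INTERFACE
# the TCP connect time to the "se asia" port should be roughly 330ms longer
# than to the unshaped port
curl -o /dev/null -s -w "connect: %{time_connect}s\n" http://$SEASIAIP:9091/
curl -o /dev/null -s -w "connect: %{time_connect}s\n" http://$SEASIAIP:9090/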
We now need to handle inbound.
For this we use the special ingress qdisc. There's very little we can actually do with this qdisc: it has no classes, and all you can really do is attach a filter to it. Usefully, we can use the "police" keyword to restrict (by dropping packets) the inbound flow. It's not exact, but it's good enough for our purposes.
# inbound qdisc.
tc qdisc add dev $INTERFACE handle ffff: ingress
# attach a policer for "se asia" class.
tc filter add dev $INTERFACE protocol ip parent ffff: prio 1 \
u32 match ip src $SEASIAIP match ip sport $SEASIAPORT 0xffff \
police rate 128kbit burst 10k drop flowid :1
# attach a policer for "europe" traffic class.
tc filter add dev $INTERFACE protocol ip parent ffff: prio 1 \
u32 match ip src $EUIP match ip sport $EUPORT 0xffff \
police rate 512kbit burst 10k drop flowid :2
The handle ffff: is the conventional handle for the ingress qdisc, the root for inbound traffic; all you can do is attach it as shown. To be frank I've not dived into exactly how the burst keyword affects things. Essentially each filter rule above is the same as the one we used on the outbound side, except that we now match the source ports and IPs rather than the destinations, and alongside the flowid we add a police action to instruct the kernel to drop packets from each of our load balancer ports if they exceed the stated rates.
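To convince yourself the policer is biting, pull something reasonably large through each port and look at the achieved rate - at 128kbit/sec it should settle around 16 kilobytes/sec, while the same object via port 9090 comes down at LAN speed. The path /some/testfile below is a placeholder for anything the webserver pool actually serves:

# average download speed via the policed port should settle around 16KB/s
curl -o /dev/null -s -w "speed: %{speed_download} bytes/sec\n" \
 http://$SEASIAIP:9091/some/testfile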
Cleanup
To clean up after all of this, it's sufficient to just remove the root and ingress qdiscs. Removing the top of the tree removes all the other configuration.
# remove any existing ingress qdisc.
tc qdisc del dev $INTERFACE ingress
# remove any existing egress qdiscs
tc qdisc del dev $INTERFACE root
Which cleans up all classes and filters.
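One small wrinkle if you put the teardown in a script: tc complains if you try to delete a qdisc that isn't there, so for an idempotent stop it's worth silencing that (my tweak, not in the commands above):

# safe to run even when no shaping is configured yet
tc qdisc del dev $INTERFACE ingress 2>/dev/null || true
tc qdisc del dev $INTERFACE root 2>/dev/null || true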
Conclusion
There's an init script that encapsulates all of the above, which can be downloaded from here.
[root@vm01 ~]# chkconfig latency on
[root@vm01 ~]# /etc/init.d/latency
Usage: /etc/init.d/latency {start|stop|restart|condrestart|status}
[root@vm01 ~]# /etc/init.d/latency start
[root@vm01 ~]# /etc/init.d/latency stop
[root@vm01 ~]# /etc/init.d/latency status
Active Queue Disciplines for bond0.10
Active Queueing Classes for bond0.10
Active Traffic Control Filters for bond0.10
[root@vm01 ~]#
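If you'd rather roll your own than download the script, the shape of it is roughly the skeleton below - a sketch only, with the tc commands from the sections above slotted into start() and the idempotent teardown in stop():

#!/bin/sh
# latency - apply/remove latency and bandwidth shaping on $INTERFACE
# chkconfig: 2345 99 01
INTERFACE=bond0.30

start() {
    # root htb qdisc, classes, netem qdiscs, filters and ingress policers
    # from the sections above go here
    :
}

stop() {
    tc qdisc del dev $INTERFACE ingress 2>/dev/null || true
    tc qdisc del dev $INTERFACE root 2>/dev/null || true
}

status() {
    echo "Active Queue Disciplines for $INTERFACE"
    tc -s qdisc show dev $INTERFACE
    echo "Active Queueing Classes for $INTERFACE"
    tc -s class show dev $INTERFACE
    echo "Active Traffic Control Filters for $INTERFACE"
    tc -s filter show dev $INTERFACE
}

case "$1" in
    start) start ;;
    stop) stop ;;
    restart|condrestart) stop; start ;;
    status) status ;;
    *) echo "Usage: $0 {start|stop|restart|condrestart|status}"; exit 1 ;;
esac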
And that's it.
This mainly suits a static configuration, as is the case with our load balancer and continuous integration environment. However, for web development use this approach lacks flexibility, particularly if you don't have root access. For our developers I looked at ipdelay, but eventually settled on Charles, which was adequate for our purposes.
HTH.