# Flow Control #

The assumptions of TCP are wrong, so TCP's flow control is broken - that's
one of my reasons to create a new protocol. So what are my assumptions, and
what do I propose instead?

Let's first look at TCP: What does TCP assume?

TCP defines a window size.  This equals the amount of data which is in
flight, assuming there is no packet drop.  The tail is the last
acknowledged packet, and the head is the last sent packet.  TCP has a
"slow start", so it starts with one packet, and increases the number of
segments in the window depending on how many segments are acknowledged.  This
gives an exponential growth phase, until there is a packet drop.  The
assumption here is that when the sender sends too fast, packets are dropped.
If a packet is dropped, TCP halves the window size, and then slowly grows it
by one segment per round trip - until a packet is lost again.
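
As a rough illustration of that additive-increase/multiplicative-decrease
behaviour (a simplified sketch of classic Reno-style TCP, not net2o code;
`cwnd` and `ssthresh` are the usual segment-counted congestion window and
slow-start threshold):

```python
def on_ack(cwnd, ssthresh):
    """Called once per acknowledged segment."""
    if cwnd < ssthresh:
        return cwnd + 1, ssthresh         # slow start: doubles per round trip
    return cwnd + 1.0 / cwnd, ssthresh    # congestion avoidance: +1 per round trip

def on_loss(cwnd, ssthresh):
    """Called when a packet drop is detected."""
    ssthresh = max(cwnd / 2, 2)           # halve the window
    return ssthresh, ssthresh             # and grow linearly from there
```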

This means the number of packets in flight oscillates by a factor of two.
So what about the optimal buffer size?  The optimal buffer size for a
TCP connection is one that can hold packets for 0.5 RTT - because that is the
same amount of packets that fit onto the wire (they take 0.5 RTT from source
to destination).  So far nobody has implemented this, because it requires
that the buffering router measures the RTT for each TCP connection.  In
theory this measurement is possible: the RTT can be taken from the first SYN
to the first ACK.  In practice, there is usually no scientific method applied
to choosing the buffer size; if you are lucky, there have been a few
experiments that selected some buffer size on an educated guess, and the
router has a global FIFO.
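
As a worked example of that 0.5 RTT sizing rule (the link speed and RTT
below are made-up numbers, purely for illustration):

```python
link_rate = 1e9 / 8   # 1 Gbit/s link, in bytes per second (assumed)
rtt = 0.040           # 40 ms round-trip time (assumed)

# Buffer that can hold 0.5 RTT worth of data, as argued above:
buffer_bytes = link_rate * rtt / 2
print(buffer_bytes)   # 2500000.0 -> about 2.5 MB
```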

The worst problem with this algorithm shows up on networks with poor quality,
such as wireless networks, where packet drops are relatively frequent and
have nothing to do with the sender's rate.  The next problem is that a
filled-up buffer in the router delays all other connections, including
low-latency low-bandwidth real-time protocols.

So that's not the way to go.  We must reevaluate the assumptions and
find a solution.

## The assumptions ##

+ Network devices do have buffers, most of them "too large buffers" for TCP
  to work reasonably
+ The buffers are your friend, not your enemy: they avoid retransmissions
+ Buffers should usually stay almost empty
+ Packet drops are not related to flow control problems
+ Intermediate network hops can help fairness, by providing "fair queuing"

## The solution ##

Since network hops which may help with flow control are not likely to be
available soon (and probably are also not the right solution), the flow
control has to work end-to-end (like TCP's), on top of a single (unfair)
FIFO queue.  The flow control should be fair (_n_ competing connections
should each get 1/_n_ of the data rate), and it should not completely yield
to TCP, even in a buffer-bloat configuration.

The approach is the following: The sender sends short bursts of
packets (default: 8 packets per burst), and the receiver measures the
timing when these packets arrive - from earliest to latest - and
calculates the achievable data rate. The receiver sends this data rate
back to the sender, which adjusts its sending rate (to make sure the
rate is not faked, the receiver must prove that it has actually
received most of the packets). The data rate calculation accumulates
rates over several bursts (default: 4 bursts per block) and sends only
a final result, i.e. one acknowledgment per 32 packets. This is the P
part of a PID controller: the receiver constantly provides measurements
of achievable rates, and the sender adjusts its rate on every ack
received.
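
A minimal sketch of the receiver side of this measurement, with the defaults
above (8 packets per burst, 4 bursts per block); this is an illustration
only, not the actual net2o implementation - arrival timestamps are passed
in, lost packets within a burst are not handled, and the per-burst rates are
simply averaged here:

```python
BURST = 8              # packets per burst (default)
BURSTS_PER_BLOCK = 4   # bursts accumulated into one acknowledgment

class RateMeasurement:
    def __init__(self):
        self.burst_rates = []
        self.first_arrival = None
        self.bytes_after_first = 0
        self.packets_seen = 0

    def on_packet(self, size, now):
        """Record an arriving packet; return a rate when a block completes."""
        if self.first_arrival is None:
            self.first_arrival = now        # first packet only starts the clock
        else:
            self.bytes_after_first += size
        self.packets_seen += 1
        if self.packets_seen == BURST:
            elapsed = now - self.first_arrival
            if elapsed > 0:
                self.burst_rates.append(self.bytes_after_first / elapsed)
            self.first_arrival = None
            self.bytes_after_first = 0
            self.packets_seen = 0
        if len(self.burst_rates) == BURSTS_PER_BLOCK:
            rate = sum(self.burst_rates) / len(self.burst_rates)
            self.burst_rates = []
            return rate                     # sent back to the sender in the ack
        return None
```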

The sender tracks two things: slack and slack growth (the I and D parts of
the PID controller).  Slack, i.e. accumulated buffer space, provides an
exponential slowdown, where a factor of two corresponds to either half the
difference between maximum and minimum observed slack, or 20ms (whichever is
larger).
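
One way to read that rule, as a hedged sketch (the exact formula net2o uses
may differ, and the names here are made up):

```python
def slowdown_factor(slack, min_slack, max_slack):
    # One factor of two per "half the observed slack spread, or 20 ms,
    # whichever is larger" of accumulated slack (all values in seconds).
    step = max((max_slack - min_slack) / 2, 0.020)
    return 2 ** (slack / step)
```

The sender would then stretch its sending interval by this factor.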

Slack growth is observed by comparing the timing of the last burst with that
of the first burst in the four-burst sequence.  This tells us how excessive
our data rate is.  To compensate, we multiply that time difference by the
number of bursts in flight, and add the result as extra delay after the next
burst we send.  This allows the buffer to recover.
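
A sketch of that compensation step, with hypothetical names (the real code
may structure this differently):

```python
def recovery_delay(first_burst_timing, last_burst_timing, bursts_in_flight):
    # How much worse the last burst of the block fared than the first one,
    # i.e. by how much the buffer grew while the block was in flight.
    slack_growth = last_burst_timing - first_burst_timing
    # Scale by the bursts still in flight and insert the result as extra
    # delay after the next burst, so the buffer can drain again.
    return max(slack_growth, 0.0) * bursts_in_flight
```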

The whole algorithm, sender and receiver side, fits into about 100 lines at
the moment, which includes the rate control and burst generation on the sender
side, but does not include all the debugging and statistics code to observe
what happens.

To get fast long-distance connections up to speed quickly, the first rate
adjustment will also increase the number of packets in flight.  Later, each
ack allows for further packets in flight (default: a maximum of two bursts,
i.e. 64 packets) before the next ack is expected.  To achieve this, the
sender measures the round-trip delay.
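
A hedged sketch of that in-flight limit (names and structure are made up for
illustration, and the logic is simplified):

```python
MAX_AHEAD = 64   # default: at most two bursts' worth of packets beyond the last ack

def may_send(packets_sent, packets_acked):
    # Keep sending while we are at most MAX_AHEAD packets ahead of the last
    # acknowledgment; otherwise wait for the next ack, bounded by the
    # measured round-trip delay.
    return packets_sent - packets_acked < MAX_AHEAD
```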

This round-trip measurement also helps to detect broken connections - if the
receiver goes offline or has been suspended temporarily, the sender stops.
It cannot recall the packets already in flight, of course; these will get
lost, and might temporarily fill up buffers.

In measurements, the algorithm is fair and sufficiently stable for several
parallel connections to the same sender, and it works together with parallel
TCP and LEDBAT traffic.

## Fair Queuing ##

Instead of using a single FIFO buffer policy, routers (or, in net2o
terminology, switches) can help fairness: Under congestion, each connection
is given its own FIFO, and all filled FIFOs of the same QoS level are served
in a round-robin fashion.  This allows the receiver to accurately determine
the actual achievable bandwidth, and does not trigger the more heuristic
delay-optimizing strategies.  This buffer policy also allows different
protocols to use different flow control algorithms and still be fair to each
other.
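
A hedged sketch of such a per-connection round-robin scheduler (one instance
per QoS level; illustrative only, not a description of any particular switch
implementation):

```python
from collections import OrderedDict, deque

class FairQueue:
    """One FIFO per connection, served round-robin under congestion."""

    def __init__(self):
        self.queues = OrderedDict()   # connection id -> FIFO of packets

    def enqueue(self, conn_id, packet):
        self.queues.setdefault(conn_id, deque()).append(packet)

    def dequeue(self):
        # Take one packet from the first non-empty FIFO, then move that
        # FIFO to the back, so filled FIFOs are served in round-robin order.
        for conn_id in list(self.queues):
            fifo = self.queues[conn_id]
            if fifo:
                packet = fifo.popleft()
                self.queues.move_to_end(conn_id)
                return packet
            del self.queues[conn_id]  # forget connections with empty FIFOs
        return None
```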