SIPPING Working Group                                            V. Hilt
Internet-Draft                                                I. Widjaja
Expires: September 4, 2007                      Bell Labs/Alcatel-Lucent
                                                                D. Malas
                                                  Level 3 Communications
                                                          H. Schulzrinne
                                                     Columbia University
                                                           March 3, 2007


           Session Initiation Protocol (SIP) Overload Control
                     draft-hilt-sipping-overload-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 4, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 response code, SIP
   servers are still vulnerable to overload.  This document proposes
   several new overload control mechanisms for the SIP protocol.


Hilt, et al.            Expires September 4, 2007               [Page 1]

Internet-Draft               Overload Control                 March 2007


Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Design Considerations
     3.1.  System Model
     3.2.  Hop-by-Hop vs. End-to-End
     3.3.  Topologies
     3.4.  Overload Control Method
       3.4.1.  Rate-based Overload Control
       3.4.2.  Loss-based Overload Control
       3.4.3.  Window-based Overload Control
     3.5.  Overload Control Algorithms
       3.5.1.  Increase Algorithm
       3.5.2.  Decrease Algorithm
     3.6.  Load Status
     3.7.  SIP Mechanism
     3.8.  Backwards Compatibility
     3.9.  Interaction with Local Overload Control
   4.  SIP Application Considerations
     4.1.  How to Calculate Load Levels
     4.2.  Responding to an Overload Indication
     4.3.  Emergency Services Requests
     4.4.  Operations and Management
   5.  SIP Load Header Field
     5.1.  Generating the Load Header
     5.2.  Determining the Load Header Value
     5.3.  Determining the Throttle Parameter Value
     5.4.  Processing the Load Header
     5.5.  Using the Load Header Value
     5.6.  Using the Throttle Parameter Value
     5.7.  Rejecting Requests
   6.  Syntax
   7.  Security Considerations
   8.  IANA Considerations
   Appendix A.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP) [2]
   server can suffer from overload when the number of SIP messages it
   receives exceeds the number of messages it can process.  SIP server
   overload can pose a serious problem.  During periods of overload,
   the throughput of a SIP network can be significantly degraded.  In
   particular, SIP server overload may lead to a situation in which the
   throughput drops to a small fraction of the original capacity of the
   network.  This is often called congestion collapse.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 response code.  However, this mechanism cannot
   prevent SIP server overload and it cannot prevent congestion
   collapse in a network of SIP servers.  In fact, the 503 response
   code mechanism may cause traffic to move back and forth between SIP
   servers and thereby worsen an overload condition.  A detailed
   discussion of the SIP overload problem, the 503 response code and
   the requirements for a SIP overload control solution can be found in
   [5].

   Overload is said to occur if a SIP server does not have sufficient
   resources to process all incoming SIP messages.  These resources may
   include CPU processing capacity, memory, network bandwidth,
   input/output, or disk resources.  Generally speaking, overload
   occurs if a SIP server can no longer process or respond to all
   incoming SIP messages.
   We only consider failure cases in which SIP servers cannot process
   all incoming SIP requests.  There are other failure cases in which a
   SIP server can process, but not fulfill, requests.  These are beyond
   the scope of this document since SIP provides other response codes
   for these cases and overload control MUST NOT be used to handle
   them.  For example, a PSTN gateway that runs out of trunk lines but
   still has plenty of capacity to process SIP messages should reject
   incoming INVITEs using a 488 (Not Acceptable Here) response [4].
   Similarly, a SIP registrar that has lost connectivity to its
   registration database but is still capable of processing SIP
   messages should reject REGISTER requests with a 500 (Server Error)
   response [2].

   This specification is structured as follows: Section 3 discusses
   general design principles of a SIP overload control mechanism.
   Section 4 discusses general considerations for applying SIP overload
   control.  Section 5 defines a SIP protocol extension for overload
   control and Section 6 introduces the syntax of this extension.
   Section 7 and Section 8 discuss security and IANA considerations,
   respectively.

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
   RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
   described in BCP 14, RFC 2119 [1] and indicate requirement levels
   for compliant implementations.

3.  Design Considerations

   This section discusses key design considerations for a SIP overload
   control mechanism.  The goal of this mechanism is to keep upstream
   servers from sending SIP messages to an overloaded downstream
   server, rather than rejecting messages at the overloaded server
   after they have already been sent.

3.1.
System Model

   The model shown in Figure 1 identifies the fundamental components of
   a SIP overload control system:

   o  SIP Processor: component that processes SIP messages.  The SIP
      processor is the component that is protected by overload control.

   o  Monitor: component that monitors the current load of the SIP
      processor on the receiving entity.  The monitor implements the
      mechanisms needed to measure the current usage of resources
      relevant for the SIP processor.  It reports load samples (S) to
      the Control Function.

   o  Control Function: component that implements the actual overload
      control mechanism on the receiving and the sending entity.  The
      control function uses the load samples (S) provided by the
      monitor.  It determines if overload has occurred and a throttle
      (T) needs to be set to adjust the load sent to the SIP processor
      on the receiving entity.  The control function on the receiving
      entity sends load feedback (F) to the control function on the
      sending entity.

   o  Actuator: component that acts on the throttles (T) generated by
      the control function and adjusts the load forwarded to the
      receiving entity accordingly.  For example, a throttle may
      instruct the actuator to reduce the load destined to the
      receiving entity by 10%.  The actuator decides how the load
      reduction is achieved (e.g., by redirecting or rejecting
      requests).

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., rate-based,
   loss-based or window-based overload control; see Section 3.4) as
   well as other design parameters (e.g., whether load status
   information is included or not).  In any case, the feedback (F)
   informs the sending entity that overload has occurred and that the
   traffic forwarded to the receiving entity needs to be reduced.
       Sending                    Receiving
       Entity                     Entity
    +----------------+        +----------------+
    |    Server A    |        |    Server B    |
    |  +----------+  |        |  +----------+  |     -+
    |  | Control  |  |   F    |  | Control  |  |      |
    |  | Function |<-+--------+--| Function |  |      |
    |  +----------+  |        |  +----------+  |      |
    |     T |        |        |       ^        |      | Overload
    |       v        |        |     S |        |      | Control
    |  +----------+  |        |  +----------+  |      |
    |  | Actuator |  |        |  | Monitor  |  |      |
    |  +----------+  |        |  +----------+  |      |
    |       |        |        |       ^        |     -+
    |       v        |        |       |        |     -+
    |  +----------+  |        |  +----------+  |      |
  <-+--|   SIP    |  |        |  |   SIP    |  |      | SIP
  --+->|Processor |--+--------+->|Processor |--+->    | System
    |  +----------+  |        |  +----------+  |      |
    +----------------+        +----------------+     -+

            Figure 1: System Model for Overload Control

3.2.  Hop-by-Hop vs. End-to-End

   A SIP request is often processed by more than one SIP server.  Thus,
   overload control can in theory be applied hop-by-hop, i.e.,
   individually between each pair of servers, or end-to-end as a single
   control loop that stretches across the entire path from UAC to UAS
   (see Figure 2).

        +---------+
        |         |
        v         |
      +---+     +---+   //=>+---+
      | A |===>| B |==//    | C |
      +---+     +---+==\\   +---+
                  ^     \\=>+---+
                  |         | D |
                  +---------+---+

          (a) hop-by-hop loop

        +-------------------+
        |                   |
        v                   |
      +---+     +---+   //=>+---+
      | A |===>| B |==//    | C |
      +---+     +---+==\\   +---+
        ^               \\=>+---+
        |                   | D |
        +-------------------+---+

          (b) end-to-end loop

        ==> SIP request flow
        <-- Load feedback loop

              Figure 2: Hop-by-Hop vs. End-to-End

   In the hop-by-hop model, a separate overload control loop is
   instantiated between each pair of neighboring SIP servers on the
   path of a SIP request.  Each SIP server provides load feedback to
   its upstream neighbors, which then adjust the amount of traffic they
   are forwarding to the SIP server.  However, the neighbors do not
   forward the received feedback information further upstream.
   Instead, they act on the feedback and resolve the overload condition
   if needed, for example, by re-routing or rejecting traffic.  The
   upstream neighbor of a server can, and should, use a separate
   overload control loop with its own upstream neighbors.  If the
   neighbor becomes overloaded, it reports this problem to its upstream
   neighbors, which again take action based on the reported feedback.
   Thus, in hop-by-hop overload control, overload is resolved by the
   direct upstream neighbors of the overloaded server without the need
   to involve entities that are located multiple SIP hops away.

   Hop-by-hop overload control can effectively reduce the impact of
   overload on a SIP network and, in particular, can avoid congestion
   collapse.  In addition, hop-by-hop overload control is simple and
   scales well to networks with many SIP entities.  It does not require
   a SIP entity to aggregate a large number of load status values or to
   keep track of the load status of SIP servers it is not communicating
   with.

   End-to-end overload control implements an overload control loop
   along the entire path of a SIP request, from UAC to UAS.  An
   end-to-end overload control mechanism needs to consider load
   information from all SIP servers on the way (including all proxies
   and the UAS).  It has to be able to frequently collect the load
   status of all servers on the potential path(s) to a destination and
   combine this data into meaningful load feedback.  A UA or SIP server
   should not throttle its load unless it knows that all potential
   paths to the destination are overloaded.  Overall, the main problem
   of end-to-end overload control is its inherent complexity, since a
   UAC or SIP server would need to monitor all potential paths to a
   destination in order to know when to throttle.  Therefore,
   end-to-end overload control is likely to work only if a UA or server
   sends a large number of requests to the exact same destination.

3.3.
Topologies

   A simple topology for overload control is a SIP server that receives
   traffic from a single source (as shown in Figure 3(a)).  A load
   balancer is a typical example of this configuration.  In a more
   complex topology, a SIP server receives traffic from multiple
   upstream sources.  This is shown in Figure 3(b), where SIP servers
   A, B and C forward traffic to server D.  It is important to note
   that each of these servers may contribute a different amount of load
   to the overall load of D.  This load mix may vary over time.  If
   server D becomes overloaded, it generates feedback to reduce the
   amount of traffic it receives from its upstream neighbors (i.e.,
   server A in the first case, or servers A, B and C in the second).

   If a SIP server (server D) becomes overloaded, it needs to decide
   how overload control feedback is balanced across its upstream
   neighbors.  This decision needs to account for the actual amount of
   traffic received from each upstream neighbor and may need to be
   re-adjusted as the load contributed by each upstream neighbor varies
   over time.  A server may use a local policy to decide how much load
   it wants to receive from each upstream neighbor.  For example, a
   server may throttle all upstream sources equally (e.g., all sources
   need to reduce the traffic they forward by 10%) or prefer some
   servers over others.  For example, it may want to throttle a less
   preferred upstream neighbor earlier than a preferred neighbor, or
   first throttle the neighbor that sends the most traffic.  Since this
   decision is made by the receiving entity (i.e., server D), all
   senders for this entity are governed by the same overload control
   algorithm.

   In many network configurations, upstream servers (A, B and C) have
   alternative servers (server E) to which they can redirect excess
   messages if the primary target (server D) is overloaded (see
   Figure 3(c)).  Servers D and E may differ in their processing
   capacity.  When redirecting messages, the upstream servers need to
   ensure that these messages do not overload the alternate server.  An
   overload control mechanism should enable upstream servers to choose
   only alternative servers that have enough capacity to handle the
   redirected requests.

                +---+
            /-->| D |
           /    +---+
      +---+     +---+
      | A |---->| E |
      +---+     +---+
           \    +---+
            \-->| F |
                +---+

      (a) load balancer w/ alternate servers

      +---+
      | A |--\
      +---+   \
      +---+    \->+---+
      | B |------>| D |
      +---+    /->+---+
      +---+   /
      | C |--/
      +---+

      (b) multiple upstream neighbors

      +---+
      | A |==\ - - - - -\
      +---+   \          \
      +---+    \==>+---+  \
      | B |=======>| D |   v
      +---+    /==>+---+  +---+
      +---+   /           | E |
      | C |==/ - - - - - >+---+
      +---+

      (c) multiple upstream neighbors w/ alternate server

      a --\
           \-->+---+
      b ------>|   |
      c ------>| D |
      ...  --->|   |
           /-->+---+
      z --/

      (d) very large number of upstream neighbors

                    Figure 3: Topologies

   Overload control that is based on throttling the message rate is not
   suited for servers that receive requests from a very large
   population of senders, each of which only infrequently sends
   requests, as shown in Figure 3(d).  An edge proxy that is connected
   to many UAs is an example of such a configuration.  Since each UA
   typically contributes only a single request to an overload
   condition, it cannot decrease its message rate to resolve the
   overload.

   In such a configuration, a SIP server can gradually reduce its load
   by rejecting a percentage of the requests it receives with 503
   responses.  Since there are many upstream neighbors that contribute
   to the overall load, sending 503 to a fraction of them gradually
   reduces load without entirely stopping the incoming traffic and
   helps to resolve the overload condition in this scenario.

3.4.  Overload Control Method

   The method used by an overload control mechanism to curb the amount
   of traffic forwarded to an element is a key aspect of the design.
   Three different types of overload control methods exist: rate-based,
   loss-based and window-based overload control.

3.4.1.  Rate-based Overload Control

   The key idea of rate-based overload control is to indicate the
   message rate that an upstream element is allowed to send to the
   downstream neighbor.  If overload occurs, a SIP server instructs
   each upstream neighbor to send at most X messages per second.  This
   rate cap ensures that the offered load for a SIP server never
   increases beyond the sum of the rate caps granted to all upstream
   neighbors and can protect a SIP server from overload even during
   extreme load spikes.

   A common technique to implement a rate cap of X messages per second
   is message gapping.  After transmitting a message to a downstream
   neighbor, a server waits for 1/X seconds before it transmits the
   next message to the same neighbor.  Messages that arrive during the
   waiting period are not forwarded and are either redirected, rejected
   or buffered.

   The main drawback of this mechanism is that it requires a SIP server
   to assign a certain rate cap to each of its upstream neighbors based
   on its overall capacity.  Effectively, a server assigns a share of
   its capacity to each upstream neighbor.  The server needs to ensure
   that the sum of all rate caps assigned to upstream neighbors is not
   (significantly) higher than its actual processing capacity.  This
   requires a SIP server to continuously evaluate the amount of load it
   receives from each upstream neighbor and to assign a rate cap that
   is suitable for this neighbor.  For example, in a non-overloaded
   situation, it could assign a rate cap that is 10% higher than the
   current rate from this neighbor.  The rate cap needs to be adjusted
   whenever the load offered by upstream neighbors changes, new
   upstream neighbors appear, or an existing neighbor stops
   transmitting.  If the cap assigned to an upstream neighbor is too
   high, the server may still experience overload.
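   The message gapping technique described above can be sketched as
   follows.  This is an illustrative sketch only and not part of this
   specification; the class name and its interface are assumptions, and
   a real server would additionally have to decide what to do with
   messages that fall into the waiting period (redirect, reject or
   buffer).

```python
import time

class MessageGapper:
    """Enforce a rate cap of max_rate messages per second toward one
    downstream neighbor by spacing transmissions at least 1/max_rate
    seconds apart (message gapping).  One instance per neighbor."""

    def __init__(self, max_rate):
        self.gap = 1.0 / max_rate   # minimum interval between messages
        self.next_send = 0.0        # earliest time a message may be sent

    def try_send(self, now=None):
        """Return True if a message may be forwarded now, or False if
        it arrives during the waiting period and must be redirected,
        rejected or buffered instead."""
        if now is None:
            now = time.monotonic()
        if now >= self.next_send:
            self.next_send = now + self.gap
            return True
        return False
```

   With max_rate set to 10, for example, a second message offered less
   than 100 ms after the first falls into the waiting period and is
   held back.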
   However, if the cap is too low, the upstream neighbors will reject
   messages even though they could be processed by the server.  Thus,
   rate-based overload control is likely to work well only if the
   number of upstream servers is small and constant, e.g., as in the
   configuration shown in Figure 3(a).

3.4.2.  Loss-based Overload Control

   A loss percentage enables a SIP server to ask its upstream neighbors
   to reduce the amount of traffic they would normally forward to this
   server by a percentage X.  For example, a SIP server can ask its
   upstream neighbors to lower the traffic they would forward to it by
   10%.  Each upstream neighbor then redirects or rejects X percent of
   the traffic that is destined for this server.

   A loss percentage can be implemented in the upstream entity, for
   example, by drawing a random number between 1 and 100 for each
   request to be forwarded.  The request is not forwarded to the server
   if the random number is less than or equal to X.

   With loss-based overload control, a server does not need to track
   the message rate it receives from each upstream neighbor.  To reduce
   load, a server can ask each upstream neighbor to lower traffic by a
   certain percentage, which can be determined independently of the
   actual message rate contributed by each neighbor.  The appropriate
   loss percentage depends on the percentage currently used by the
   upstream servers and the current system load of the server.  For
   example, if the server load approaches 90% and the current loss
   percentage is set to a 50% load reduction, then the server may
   decide to increase the loss percentage to 55% in order to get back
   to a system load of 80%.  Similarly, the server can lower the loss
   percentage if permitted by the system utilization.  This requires
   that system load can be accurately measured and that these
   measurements are reasonably stable.
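   The loss-based throttle described above can be sketched as follows.
   The function names, the 5-point adjustment step and the 80% load
   target are illustrative assumptions taken from the example in this
   section, not normative values.

```python
import random

def should_forward(loss_percentage):
    """Percentage throttle at the upstream entity: draw a random number
    between 1 and 100 for each request and drop (redirect or reject)
    the request if the number is less than or equal to X."""
    return random.randint(1, 100) > loss_percentage

def adjust_loss_percentage(current_loss, measured_load,
                           target_load=0.80, step=5):
    """Illustrative control step at the overloaded server: raise the
    loss percentage when measured load is above the target and lower
    it when there is headroom, as in the 90%-load example above."""
    if measured_load > target_load:
        return min(100, current_loss + step)
    if measured_load < target_load:
        return max(0, current_loss - step)
    return current_loss
```

   An upstream neighbor would call should_forward() once per request
   destined for the throttled server, while the server itself would
   periodically re-run adjust_loss_percentage() against its load
   measurements.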
   The main drawback of percentage throttling is that the throttle
   percentage needs to be adjusted to the offered load, in particular,
   if the load fluctuates quickly.  For example, if a SIP server sets a
   throttle value of 10% at time t1 and load increases by 20% between
   time t1 and t2 (t1