Network Working Group                                          Yiqun Cai
Internet Draft                                              Mike McBride
Expiration Date: April 2007                                   Chris Hall
                                                         Maria Napierala
                                                            October 2006

                Multicast VPN Deployment Recommendations
                  draft-ycai-mboned-mvpn-deploy-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

This document is an Internet-Draft and is in full conformance with all provisions of RFC 3978/3979.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Cai, McBride, et al.                                            [Page 1]
Internet Draft    draft-ycai-mboned-mvpn-deploy-00.txt      October 2006

Abstract

Multicast VPN based on early standards has been in operation in production networks for several years now. This document describes some of the experience gained from implementation and deployment and as such is informational only.

Table of Contents

    1        Introduction
    2        Implementation
    2.1      RPF
    2.2      MTU
    2.3      Load Balancing
    2.4      MTRACE
    3        Operational Experience
    3.1      Multicast VPN Design Considerations
    3.2      PIM Modes For MI-PMSI
    3.2.1    PIM-SSM for MI-PMSI
    3.2.2    PIM-SM for MI-PMSI
    3.3      PIM Modes For S-PMSI
    3.4      CE to PE PIM Modes
    3.5      Timer Alignment
    3.6      MDT SAFI
    3.7      Addressing
    3.8      Filtering
    3.9      Scalability
    3.10     MPLS and IP
    3.11     QOS
    4        Security Considerations
    5        IANA Considerations
    6        Acknowledgments
    7        Normative References
    8        Informative References
    9        Authors' Addresses
    10       Full Copyright Statement
    11       Intellectual Property

1. Introduction

Multicast support for L3 VPNs based on RFC 2547 [2547bis] was first presented at the San Diego IETF meeting in 2000. It was not included in the charter of the L3VPN (formerly PPVPN) working group until the San Diego IETF meeting in 2004, and until then it remained an individual submission. During that time, the drafts, known as the "rosen-drafts" [ROSEN-8], continued to evolve. Several vendors provided implementations based on the drafts, and service providers started deploying the solution in production networks. Limited interoperability testing has also been done.
Since the working group officially accepted the challenge to define a solution or solutions to support multicast, several proposals have been suggested. They are now captured in [MVPN], which forms a base for future standards work.

The unique history of multicast support in L3VPN, in which implementation and deployment started well before the IETF adopted the work, has caused some confusion. This is only natural with any pre-standard work.

In this document, we describe some of the lessons learned from implementing and deploying MVPN. We hope it will benefit implementors as well as network operators looking to deploy MVPN services.

2. Implementation

As of writing, there are two known implementations: IOS from Cisco and JunOS from Juniper Networks. Contact these vendors for implementation details beyond what is provided in this draft. The following sections describe common MVPN deployment considerations.

2.1. RPF

[MVPN], as well as the early "rosen-drafts", specifies that the source address of any PIM packet a PE router generates over the MI-PMSI (or MDT tunnel) be the same as the BGP next hop of the updates the PE router originates for all multicast traffic sources existing in the site. However, it was discovered that one implementation did not do so, which caused interoperability problems. The symptom of the problem is that a PE router cannot resolve the RPF neighbour towards a source connected to a remote PE router. Interoperability should otherwise be achievable when recent OS versions are used.

2.2. MTU

When GRE encapsulation is used in the core, 24 bytes are added to the IP packets generated in the VPNs. Due to the lack of a path MTU discovery mechanism for multicast, a PE router may have to fragment the incoming packets. The best practice is to fragment the packets before performing any GRE encapsulation.
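The arithmetic behind this recommendation can be sketched as follows. This is only an illustration, not any vendor's forwarding logic: the function name and parameters are hypothetical, customer packets are assumed to be IPv4 with a 20-byte header and no options, and the 24-byte figure is taken from the text above. A packet that needs fragmentation but carries the DF bit is reported as dropped, which is the behavior this section describes.

```python
GRE_OVERHEAD = 24   # bytes added by GRE encapsulation, per the text above
IPV4_HEADER = 20    # assumption: customer IPv4 header without options


def fragment_before_encap(packet_len, core_mtu, df_set=False):
    """Return the sizes of the customer packets to be GRE-encapsulated.

    Returns a list of IP packet lengths (each fits the core MTU once the
    GRE overhead is added), or None if the packet has to be dropped
    because it needs fragmentation but has the DF bit set.
    """
    max_inner = core_mtu - GRE_OVERHEAD
    if packet_len <= max_inner:
        return [packet_len]          # fits as is, no fragmentation needed
    if df_set:
        return None                  # cannot fragment: packet is dropped
    # Fragment payloads (except the last) must be multiples of 8 bytes.
    max_payload = (max_inner - IPV4_HEADER) // 8 * 8
    if max_payload <= 0:
        raise ValueError("core MTU too small to carry any fragment")
    remaining = packet_len - IPV4_HEADER
    sizes = []
    while remaining > max_payload:
        sizes.append(IPV4_HEADER + max_payload)
        remaining -= max_payload
    sizes.append(IPV4_HEADER + remaining)
    return sizes


if __name__ == "__main__":
    # A 1500-byte customer packet crossing a core with a 1500-byte MTU.
    print(fragment_before_encap(1500, 1500))
```

With a 1500-byte core MTU, the largest customer packet that passes unfragmented is 1476 bytes; a 1500-byte packet is split into a 1476-byte and a 44-byte fragment before each is encapsulated.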
Fragmenting before encapsulation spares the egress PE routers from reassembling the fragments and leaves reassembly to the end systems. This approach does not work if the "DF" bit is set in the original packet, since the packet will then be dropped.

2.3. Load Balancing

Some vendors implement a special feature called "EIBGP load balancing", which installs multiple routes from both EBGP and IBGP in the VRF unicast routing table. When this feature is enabled on all PE routers, multicast RPF may be affected if it also supports load balancing. The best practice is to make sure that the multicast RPF procedure selects only EBGP paths when both EBGP and IBGP paths are present.

2.4. MTRACE

MTRACE is a tool that allows a network operator to obtain multicast routing information from routers, and to explore a path to the source of the traffic or to the RP. Since there is no security mechanism embedded in the protocol, some service providers expressed concern when the mtrace packet has to traverse the PE routers in order to obtain the full information.

At this moment, vendors have their own mechanisms to remove, or hide, certain fields in the MTRACE packets in order to satisfy the needs of their customers. It is becoming obvious that a better mechanism needs to be defined for using the protocol in MVPN.

3. Operational Experience

3.1. Multicast VPN Design Considerations

When deploying a multicast VPN service, providers try to optimize multicast traffic distribution and delays while reducing the amount of state. The following considerations have given MVPN providers direction in their MVPN deployments:

   + Core multicast routing state should typically be kept to a minimum

   + MVPN packet delays should typically be the same as those of unicast traffic

   + Data should typically be sent only to PEs with interested receivers

3.2. PIM Modes For MI-PMSI

In [ROSEN-8], the "MI-PMSI" is known as the default MDT, which is used to build an overlay network connecting all PE routers attached to the same MVPN.

Service providers have deployed both PIM-SM and PIM-SSM instantiated MI-PMSIs in production networks. When PIM-SSM is used, BGP-based auto-discovery as described in [ROSEN-8] has also been deployed. The majority of current default MDT deployments use PIM-SM with static RP assignment, using Anycast RP with MSDP. A dynamic RP discovery protocol, such as BSR, could also be used.

The decision to deploy either PIM-SM or PIM-SSM is based on the following concerns:

   + the number of multicast routing states

   + the overhead of managing the RP if PIM-SM is used

   + the difference in forwarding delay between shared trees and source trees

3.2.1. PIM-SSM for MI-PMSI

Optimal MVPN forwarding is most easily achieved when there is a single multicast tree per MVPN per PE. Such trees are naturally built with PIM-SSM, since it permits the PE to directly join a source tree for an MDT. With PIM-SSM, no Rendezvous Points are required.

With SSM, however, all PEs on an MVPN tree need to maintain source state, and each PE participating in an MVPN is a source. Unless VPN customers locate their multicast sources within a constrained set of sites, SSM may become a scalability concern in the service provider's network. Aggregating multiple VPNs into a single multicast tree might be necessary to reduce state.

3.2.2. PIM-SM for MI-PMSI

One solution to minimize the amount of multicast state in an MVPN environment is to configure PIM-SM to stay on the shared tree or to configure bi-directional (BIDIR) PIM. With shared trees, multicast state scalability is no longer a function of the number of PEs but rather of the number of VPNs.

The scale benefit of shared trees comes at the cost of less efficient multicast distribution.
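The state trade-off between the two modes can be made concrete with a rough count. The sketch below is only a back-of-the-envelope model under stated assumptions, not a measurement: it assumes one default MDT group per VPN, counts the worst case of a core router that lies on every tree, charges one (S,G) entry per participating PE per VPN for PIM-SSM, and one (*,G) entry per VPN for a shared tree or Bidir. The function name and mode labels are hypothetical.

```python
def default_mdt_states(num_vpns, pes_per_vpn, mode):
    """Estimate default MDT routing entries on a worst-case core router.

    mode "ssm":    every PE is a source, so one (S,G) per PE per VPN.
    mode "shared": PIM-SM kept on the shared tree, or Bidir, so one
                   (*,G) per VPN regardless of the number of PEs.
    """
    if mode == "ssm":
        return num_vpns * pes_per_vpn
    if mode == "shared":
        return num_vpns
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Illustrative figures only: 100 VPNs, each attached to 50 PEs.
    for mode in ("ssm", "shared"):
        print(mode, default_mdt_states(100, 50, mode))
```

For 100 VPNs with 50 PEs each, this model gives 5000 entries with PIM-SSM against 100 with shared trees, which is the dependence on the number of PEs versus the number of VPNs described in Sections 3.2.1 and 3.2.2.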
MVPN providers use Data MDTs as defined in [ROSEN-8] to achieve bandwidth optimality. MVPN providers may also address the sub-optimality of shared tree forwarding by deploying an RP at the best location for each VPN. Such an assignment would be based on the VPN source locations, which may be difficult to maintain.

3.3. PIM Modes For S-PMSI

In [ROSEN-8], the "S-PMSI" is known as the data MDT. Data MDTs have also been deployed by service providers, and both PIM-SM and PIM-SSM are used. As of writing, the switch from MI-PMSI to S-PMSI is based on traffic rate, which is what implementations support today. The majority of data MDT deployments today use SSM, since the source address is included in the PIM Hello packet sent from the source PE, i.e., no overlay signalling is necessary.

MVPN providers deploy Data MDTs (S-PMSIs) to achieve optimal bandwidth usage, especially when SSM is deployed as well. S-PMSIs are optimized for active sources and receivers and are triggered per (S,G) for a subset of the (S,G)s of a given VPN. Since Data MDTs are triggered by (S,G) states in a VPN, they could increase the amount of multicast state in an MVPN network.

3.4. CE to PE PIM Modes

The PIM protocols deployed within the customer VPN are independent of the PIM protocols in use within the provider core. Customers can choose to deploy PIM-DM, PIM-SM, Bidir, or SSM. With SM or Bidir, customers may choose to deploy the RP on either a PE or a CE router. It is recommended to have a CE router serve as the RP to avoid additional burden on a PE. We have, however, seen RPs deployed on PEs as well as on CE routers. If a customer desires a managed RP, they may consider having the service provider manage their CE and have it serve as the RP. To avoid managing an RP altogether, SSM should be deployed.
Deploying PIM-DM is not recommended, and at least one implementation does not switch to Data MDTs (S-PMSIs) upon receipt of customer PIM-DM traffic.

3.5. Timer Alignment

When PIM-SM is used for both MI-PMSI and S-PMSI, some interesting observations were made. For example, when BSR is used in the service provider network to discover RPs, it takes more than 3 minutes to detect the failure of an RP if the default timers are used. During this window, PIM Hellos originated by C-PIM instances will be dropped, which causes PIM adjacencies to be torn down. Since the default PIM Hello timer is 30 seconds, the C-PIM instance on a PE router detects an outage much faster than the P-PIM instance on the same PE router.

This is also a factor to be considered when choosing the protocol for RP redundancy. One option, when using BSR, is to use it only for RP discovery and then to rely on Anycast-RP for RP redundancy.

3.6. MDT SAFI

Prior to [MDT SAFI], the PE BGP VPNv4 prefix update was sent using an extended community with RD type 2. With the introduction of [MDT SAFI], the update is sent with RD type 0. For backward compatibility, BGP allows sending RD type 2 updates to peers unable to understand the new MDT SAFI. It is our experience, and our recommendation, that customers run either only routers that support the MDT address family or only routers that predate the MDT SAFI, to prevent any BGP update conflicts.

3.7. Addressing

It has become general practice to use the 239/8 private address space when assigning group addresses to MVPNs. This helps to prevent VPN traffic from leaking outside the MVPN core and helps keep customer data private. When SSM is used, addressing from 239.232/16 is the common practice, following RFC 2365, "Administratively Scoped IP Multicast". Operators typically deploy an addressing tool to manage their addresses.

3.8. Filtering

It may be necessary to modify existing filters to permit GRE and UDP port 3232, so that default and data MDT group traffic can pass.

3.9. Scalability

MVPN defines the use of PIM across the default MDT. The number of PIM Hello and Join/Prune (J/P) messages increases with the number of PEs participating in that default MDT. There have been no scaling issues in the current deployments of MVPN. Currently, MVPN deployments consist of up to a few hundred sites per MVPN. The number of PEs participating in a default MDT continues to increase as customers extend multicast group participation to additional VPN sites. There are unicast VPN customers with several thousand sites, and these sites are gradually becoming multicast enabled. At some level of scaling of the default MDT, PIM Hello and J/P messages may become a scaling issue. The scaling point at which these messages become a real operational problem is not clear. Empirical field data shows that they do not affect the broad range of MVPN deployments today.

[ROSEN-8] is scalable as specified across a wide range of deployments. Some analysis is needed to clarify at what operational level PIM messages do become a problem. The L3VPN WG has gathered requirements information in [MORIN]. A benchmarking draft [DRY] has been submitted to the BMWG to provide a consistent MVPN test methodology. The PIM WG is evaluating methods to decrease the number of PIM messages for when this becomes of operational value.

Increasing the Hello timer and the periodic Join/Prune timer may help in MVPN scaling. Doing so, however, may affect join and leave latency at times when control messages are lost. OAM, which verifies the health of the data and control paths, would also be affected if the Hello timer were increased or removed altogether.

3.10. MPLS and IP

Though the majority of MVPN deployments are over an MPLS core, there have been deployments over both MPLS and IP cores. We have seen L2TPv3 tunneling used successfully for transporting MVPN GRE across an IP core within a VRF.

3.11. QOS

MVPN deployments that have deployed QOS use the same QOS mechanisms for the MVPN GRE header as for their other data traffic. VPN customers may want to separate the queuing of multicast data from that of unicast data. Service providers are extending their QOS portfolios to support more classes of service, to allow for better separation of multicast and unicast traffic. Enhanced QOS mechanisms support applications that have short bursts but require bounded delay (such as video streaming). Since multicast (UDP) traffic might not be subject to the same drop behavior as TCP traffic, QOS profiles support Weighted Random Early Detection (WRED) treatment.

4. Security Considerations

This document has no known security implications.

5. IANA Considerations

This document creates no new requirements on IANA namespaces.

6. Acknowledgments

We'd like to thank Dino Farinacci, Yuji Kamite and Hitoshi Fukuda for their feedback on this draft.

7. Normative References

[2547bis] "BGP/MPLS VPNs", Rosen, Rekhter, et al., September 2003, draft-ietf-l3vpn-rfc2547bis-01.txt

[MVPN] "Multicast in MPLS/BGP IP VPNs", Rosen, Aggarwal, May 2005, draft-ietf-l3vpn-2547bis-mcast-00.txt

8. Informative References

[ROSEN-8] E. Rosen, Y. Cai, I. Wijnands, "Multicast in MPLS/BGP IP VPNs", draft-rosen-vpn-mcast-08.txt

[MVPN-PIM] R. Aggarwal, A. Lohiya, T. Pusateri, Y. Rekhter, "Base Specification for Multicast in MPLS/BGP VPNs", draft-raggarwa-l3vpn-2547-mvpn-00.txt

[RAGGARWA-MCAST] R. Aggarwal, et al., "Multicast in BGP/MPLS VPNs and VPLS", draft-raggarwa-l3vpn-mvpn-vpls-mcast-01.txt

[RP-MVPN] S. Yasukawa, et al., "BGP/MPLS IP Multicast VPNs", draft-yasukawa-l3vpn-p2mp-mcast-00.txt

[MDT SAFI] G. Nalawade, et al., "MDT SAFI", draft-nalawade-idr-mdt-safi-02.txt

[MORIN] T. Morin, "Requirements for Multicast in L3 Provider-Provisioned VPNs", draft-ietf-l3vpn-ppvpn-mcast-reqts-09.txt

[DRY] S. Dry, "Multicast VPN Scalability Benchmarking", draft-sdry-bmwg-mvpnscale-00.txt

9. Authors' Addresses

   Yiqun Cai
   ycai@cisco.com

   Mike McBride
   mmcbride@cisco.com

   Chris Hall
   chall@sprint.net

   Maria Napierala
   mnapierala@att.com

10. Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

11. Intellectual Property

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.