Dmitry Afanasiev, fl0w@yandex-team.ru
Daniel Ginsburg, dbg@yandex-team.ru
Network Architects
MPLS in DC and inter-DC
networks: the unified
forwarding mechanism for
network programmability at
scale
About Us
3
• Founded in 1993
• NASDAQ:YNDX, Mkt Cap ~$12.5B
• One of Europe's largest internet companies
and the leading search provider in Russia
• Over 60% of the local search market
• Monthly user audience of over 90 million
worldwide.
• Services: search, music, video, cloud storage,
news, weather, maps, traffic, email, ads ...
What is Yandex
4
• We're a rather typical MS-DC
• Several DCs in Russia and abroad + MPLS
backbone to connect them
• About 100k servers and growing fast
• Mostly IPv6 internally, need to serve external
IPv4
• Network architecture is a bit outdated, needs
rethinking
Our Infrastructure
In Search of New Arch
6
• It needs to be:
– Scalable
– Flexible
– Programmable
• Lots of approaches out there, some get many
things right…
• But not one combines all the right pieces in the
right way
• It's really surprising, because the right
combination seems almost inevitable.
In Search of New Arch
7
• Many of the ideas have been around for years
(or even decades)
• Interconnection network topology – folded Clos
• Let the edge handle complexity
• Core just delivers packets edge to edge
• Overlay/underlay logical split
• Control: mix of centralized and distributed.
Needs a nice way to combine both
• Simple commodity network elements
• Hierarchy and automation to scale the network
Ideas to Build Upon
8
• All these ideas are well known, understood
and almost universally accepted in the industry
• People are trying to implement them using a
wild mix of data plane mechanisms.
• And it introduces enormous complexity
• What's missing? Unified forwarding
mechanism
What’s missing
9
• Life is much easier when we don't have to deal
with a multitude of data planes and forwarding
mechanisms.
• Fortunately, there is already a well-known,
well-understood, standardized forwarding plane
mechanism upon which we can implement all
those ideas without compromising their value.
• It has a well-defined and standardized mapping
to many other popular forwarding planes.
• It's known as MPLS.
Missing… or overlooked?
Unified Forwarding: Why and How
11
• Different data plane mechanisms – different
features
• The unified data plane should be able to
support all useful features and produce their
combinations
• MPLS is very flexible:
– forwarding over a pre-signalled virtual circuit a-la ATM - this is what
RSVP-TE does
– source routing over a previously discovered topology a-la Token Ring
networks - see Segment Routing proposal
– hierarchical LPM a-la IP - just split the address over several labels and
allow routers to act on the topmost one (not that we suggest it is practical,
but it is definitely possible)
Flexibility
12
• Best way to implement arbitrary semantics is to
get rid of any semantics in protocol headers
and assign it externally
• Hardware works with protocol headers
• Control software defines the semantics
An Abstract Note on Semantics
13
• Why combine? To have the right features at
the right place or to produce a useful
combination of features
• There are basically two ways to combine
different data planes: stitch (interwork)
them, or overlay them on top of
each other
Combining Data Planes
14
• It’s a pain
• Might be done for a subset of protocol features
• Need to translate between protocols (complex,
never perfect, loses information)
• Need to provision interworking points: fragile,
an operational nightmare, creates bottlenecks
• Seems nobody really does this anymore… Or
maybe we still have to sometimes?
Stitching Data Planes
15
• Overlay to: scale, virtualize, augment one data
plane with properties of another
• Overlaying is building hierarchy
• But with multiple data planes it is limited and
ad-hoc
• Often ugly: IP over Ethernet over VXLAN over
IP over Ethernet
• MPLS is intrinsically hierarchical (overlayable,
if you will)
Overlaying Data Planes
16
• Many hierarchical structures are already in the
network: topology, addressing, management
and control
• Hierarchy is the most important and the most
reliable way to scale things
Hierarchy is your friend
17
• The ability to implement hierarchy natively
enables us to ditch the notion of hard
overlay/underlay boundary.
• In a stack of DC-label, ToR-label, port-label,
slice-label, vm-label, where's the boundary of
overlay/underlay? Not in the packet
• Placement of the boundary only depends on
how you structure your control
Overlay/underlay split is a metaphor
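
A minimal sketch of the label stack described above, assuming hypothetical tier names and label values. Each tier acts only on the topmost label, and nothing in the packet marks where "underlay" ends and "overlay" begins; the `split` helper shows that the boundary is simply a depth chosen by whoever structures the control.

```python
from typing import List, Tuple

# Hypothetical label values, chosen only for illustration.
# The first element of the list is the top of the stack.
STACK: List[Tuple[str, int]] = [
    ("dc", 1001),
    ("tor", 2017),
    ("port", 3005),
    ("slice", 4002),
    ("vm", 5042),
]


def walk(stack: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Pop one label per tier; record the tier, its label, and how many labels remain."""
    remaining = list(stack)
    trace = []
    while remaining:
        tier, label = remaining.pop(0)
        trace.append((tier, label, len(remaining)))
    return trace


def split(stack: List[Tuple[str, int]], underlay_depth: int):
    """Treat the first `underlay_depth` labels as underlay and the rest as overlay.

    The depth is a control-plane decision; the packet itself carries no such marker.
    """
    return stack[:underlay_depth], stack[underlay_depth:]


if __name__ == "__main__":
    for tier, label, left in walk(STACK):
        print(f"{tier:5s} acts on label {label}, {left} label(s) left")
    underlay, overlay = split(STACK, 2)
    print("underlay:", [t for t, _ in underlay])
    print("overlay: ", [t for t, _ in overlay])
```

Calling `split(STACK, 2)` or `split(STACK, 3)` produces two equally valid views of the same packet, which is the sense in which the split is a metaphor.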
18
• Can be as granular or coarse-grained as one
wishes. There's no network-imposed limitation
• Easy behavior aggregation. Just add an extra
label on top
• Easy behavior disaggregation. One can
expose additional granularity by adding an
extra label at the bottom
FEC is hierarchical
How to Control MPLS
20
• MPLS control plane is notoriously complex
• Good news: you don’t have to use all of it, you
can pick the good parts
• Classical distributed control is Ok for transport
• Centralized control seems better for higher
level artifacts on the edge, sometimes called
services
• Both styles can (and should) be combined
MPLS is complex?
21
• The device has to be a bit smarter than in OpenFlow (OF)
• Gets parts of label stack from different control
plane components
• Assembles the full stack from those parts,
using local logic to follow assembly instructions
provided by control plane
• Assembly instructions come in form of
referencing by “name”
• Assembly uses late binding
Enabling combinability
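
The assembly step above can be sketched as a small on-device agent. This is an illustration under stated assumptions, not the authors' implementation: the class `LabelAssembler`, its method names, and the label values are all hypothetical. Two control-plane components write to separate tables, a service entry refers to its transport by name, and the name is resolved only when the stack is requested (late binding).

```python
from typing import Dict, List, Tuple


class LabelAssembler:
    """Hypothetical device agent that assembles a label stack from named parts."""

    def __init__(self) -> None:
        # name -> transport label, e.g. fed by a distributed control plane
        self.transport: Dict[str, int] = {}
        # service -> (transport name, service label), e.g. fed by a controller
        self.services: Dict[str, Tuple[str, int]] = {}

    def set_transport(self, name: str, label: int) -> None:
        self.transport[name] = label

    def set_service(self, service: str, via: str, label: int) -> None:
        self.services[service] = (via, label)

    def stack_for(self, service: str) -> List[int]:
        """Return the full stack, top label first."""
        via, service_label = self.services[service]
        # Late binding: the transport name is looked up now,
        # not at the time the service entry was installed.
        return [self.transport[via], service_label]


if __name__ == "__main__":
    agent = LabelAssembler()
    agent.set_transport("tor-17", 2017)
    agent.set_service("tenant-a", via="tor-17", label=5042)
    print(agent.stack_for("tenant-a"))

    # The transport component changes its label; the service entry is untouched.
    agent.set_transport("tor-17", 2099)
    print(agent.stack_for("tenant-a"))
```

Because the service entry stores a name rather than a label, the component that owns the transport can renumber freely without the service component being involved.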
22
• MPLS VPN (abstraction A) refers to MPLS
tunnels (abstraction B), using next-hop
resolution.
• The resolution happens on the device itself,
and the two control plane entities are loosely
coupled - MPLS tunnels can change their
paths, assigned labels, etc., without
MP-BGP caring about it
• VPN abstraction refers to tunnel abstraction
using next-hops. Next-hop is the name by which
one control plane abstraction refers to another
Enabling combinability – example
23
• Recursive next-hop resolution with labeled
routes (RFC 3107) is a powerful way to
overlay one control plane abstraction over
another
• Able to express almost anything we currently
want. Still, a more expressive way is desired
• BGP 3107 is the way to interact with all-
classically-controlled MPLS networks
Enabling Combinability – BGP 3107
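
Recursive resolution over labeled routes can be sketched as follows. This is a simplified model, not a BGP implementation: the `Routes` table, the destination names, and the label values are hypothetical, and real resolution works on prefixes and next-hop addresses rather than string names. Each route contributes one label and names the next hop through which it is reached; following the chain yields the stack.

```python
from typing import Dict, List, Optional, Tuple

# destination name -> (label, next hop name)
# A next hop of None means "directly reachable", so recursion stops there.
Routes = Dict[str, Tuple[int, Optional[str]]]


def resolve(dest: str, routes: Routes, max_depth: int = 8) -> List[int]:
    """Return the label stack for `dest`, outermost (top) label first."""
    stack: List[int] = []
    seen = set()
    current: Optional[str] = dest
    while current is not None:
        if current in seen or len(seen) >= max_depth:
            raise ValueError(f"resolution loop or chain too deep at {current!r}")
        seen.add(current)
        if current not in routes:
            raise KeyError(f"no labeled route for {current!r}")
        label, next_hop = routes[current]
        # A route resolved later in the chain sits higher in the stack.
        stack.insert(0, label)
        current = next_hop
    return stack


if __name__ == "__main__":
    routes: Routes = {
        "vm-42": (5042, "tor-17"),
        "tor-17": (2017, "dc-1"),
        "dc-1": (1001, None),
    }
    print(resolve("vm-42", routes))
```

The depth limit and the `seen` set are defensive choices for the sketch; they guard against routes that point at each other.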
24
• If you can ensure that the labels at some point
of the network always stay the same (because
you assigned them to be so), you can use
static configuration on the other side
• The way to go, when one wants to avoid any
signaling dependencies
• Static configuration can be calculated and
disseminated automatically
Static Configuration
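
One way static configuration could be calculated automatically is to derive each label from an identifier the operator already controls. The scheme below is purely an assumption for illustration: `SWITCH_BASE` and the "base plus index" rule are hypothetical and do not come from the slides. The only facts relied on are that an MPLS label is 20 bits wide and that values 0 to 15 are reserved.

```python
from typing import Dict, Iterable

LABEL_MIN = 16           # label values 0-15 are reserved
LABEL_MAX = 2**20 - 1    # the label field is 20 bits wide
SWITCH_BASE = 10_000     # hypothetical base for per-switch labels


def switch_label(switch_index: int) -> int:
    """Derive a stable label from a switch index (hypothetical scheme)."""
    if switch_index < 0:
        raise ValueError("switch index must be non-negative")
    label = SWITCH_BASE + switch_index
    if not LABEL_MIN <= label <= LABEL_MAX:
        raise ValueError(f"label {label} is outside the usable range")
    return label


def static_config(switch_indices: Iterable[int]) -> Dict[int, int]:
    """Compute the switch-index to label mapping for dissemination to the other side."""
    return {idx: switch_label(idx) for idx in sorted(switch_indices)}


if __name__ == "__main__":
    for idx, label in static_config([3, 1, 17]).items():
        print(f"switch {idx}: label {label}")
```

Since the mapping is a pure function of the index, both sides can compute it independently and no signaling session is needed to agree on it.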
25
• On the host! Or even right from the application
• Hypervisor switch is the easiest point. SW only,
very flexible.
• Naturally fits centralized control
• Helps to scale. Lots of RAM, each element
keeps only needed state
• Modern CPUs can forward 10s of Gbps without
breaking a sweat
Where MPLS should start?
26
• A simple forwarding plane (3 simple ops)
• A simple software agent on the device
(receives parts of label stack from different
control plane components, assembles full
stack, and programs the HW)
• Centralized and distributed control, or anything
in between
• Combinability of different control plane
components with late binding via names, which
the device resolves
Looks SDNish
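
The slide does not name the three operations; the sketch below assumes they are the standard MPLS push, swap, and pop. Label values are hypothetical, and the stack is modeled as a list with the top label first.

```python
from typing import List


def push(stack: List[int], label: int) -> List[int]:
    """Add a new label on top of the stack."""
    return [label] + stack


def swap(stack: List[int], label: int) -> List[int]:
    """Replace the top label, leaving the rest of the stack untouched."""
    if not stack:
        raise ValueError("cannot swap on an empty stack")
    return [label] + stack[1:]


def pop(stack: List[int]) -> List[int]:
    """Remove the top label."""
    if not stack:
        raise ValueError("cannot pop an empty stack")
    return stack[1:]


if __name__ == "__main__":
    s = push([], 5042)    # edge imposes the bottom label
    s = push(s, 2017)     # then a transport label on top
    print(s)
    s = swap(s, 2018)     # a transit node rewrites only the top label
    print(s)
    s = pop(s)            # the egress node removes it
    print(s)
```

None of the three operations looks below the top of the stack, which is why transit devices can stay unaware of whatever the edge placed underneath.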
27
• “Modularity based on abstraction is the way
things get done” --Liskov
• “SDN ...Not a revolutionary technology... ...just
a way of organizing network functionality” --
Shenker
• “SDN is merely set of abstractions for control
plane, not a specific set of mechanisms.” --
Shenker
• “Most lasting legacy of SDN is not better
datacenters - But better ways of reasoning
about network control” --Shenker
What SDN is
28
• Let the edge handle complexity – do it on host
• Core just delivers packets edge to edge –
hierarchy enables the devices to be agnostic to
changes on the edge
• Overlay/underlay logical split – just a way to
implement hierarchy
• Control: mix of centralized and distributed.
Needs a nice way to combine both – yeah!
• Simple commodity network elements – cheap
MPLS capable silicon is finally there
How Ideas Map to MPLS
29
• Key point of S-MPLS was to extend MPLS to
access and separate transport and service in
MPLS network
• NFV describes how to host service nodes in
DC. If you don’t have MPLS in DC it’s no
longer seamless
• Fix is obvious – extend MPLS into DC
• Labels can carry additional metadata if one
wants them to
NFV and Seamless MPLS
Case Study: New Yandex DC
31
• Cheap and abundant bandwidth
• Scalable forwarding with minimal state
• Multitenancy (=> network virtualization)
• Efficient resource pooling
• InterDC traffic engineering
• Function chaining: load balancing, FW, etc.
• Interconnection with existing infrastructure
• Means to integrate all of above
• Local response to some events, e.g. failures
• Automation at scale
What we need?
32
We are trying to keep the design really simple. We
don’t need many functions often perceived as
desirable:
• L2 (neither real, nor emulated)
• VM mobility
– In scale-out applications nodes coming and going is a norm, no need to
move them around while preserving state and identity
– VM mobility increases complexity as it depends on other features
• Multicast
• We don't have too many changes in topology
What we don’t need
33
• Host with vLER (MPLS capable vRouter)
• Fabric switching elements – LSRs
• Centralized controller
• Legacy routers. Need to interwork with fabric
LSRs and controller. BGP 3107 is the tool
Components
34
• 3-label stack: topmost for egress switch, next
for egress port, bottom for VM
• vRouter uses {dst prefix, VRF} to impose label
stack
• Bottom label processed by destination vLER
• Expected state on a fabric switch:
#switches_in_the_fabric + #local_access_ports
Forwarding model
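
The forwarding model above can be sketched in a few lines. The table contents, VRF names, and label values are hypothetical, and the lookup is an exact match for brevity where a real vRouter would presumably do a longest-prefix match. The state formula is the one given on the slide.

```python
from typing import Dict, List, Tuple

# (VRF, destination prefix) -> (egress switch label, egress port label, VM label)
ImpositionTable = Dict[Tuple[str, str], Tuple[int, int, int]]


def impose(table: ImpositionTable, vrf: str, dst_prefix: str) -> List[int]:
    """Return the 3-label stack the vRouter imposes, top label first."""
    switch_label, port_label, vm_label = table[(vrf, dst_prefix)]
    return [switch_label, port_label, vm_label]


def fabric_switch_state(switches_in_fabric: int, local_access_ports: int) -> int:
    """Expected label entries on one fabric switch, per the slide's formula."""
    return switches_in_fabric + local_access_ports


if __name__ == "__main__":
    table: ImpositionTable = {
        ("tenant-a", "2001:db8:1::/64"): (2017, 3005, 5042),
    }
    print(impose(table, "tenant-a", "2001:db8:1::/64"))
    print(fabric_switch_state(switches_in_fabric=500, local_access_ports=48))
```

The state on a fabric switch grows with the number of switches and its own ports, not with the number of VMs or tenants, because the bottom label is only ever processed by the destination vLER.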
35
• iBGP 3107 (in-path RR w/ NHS) inside fabric
for reachability and label distribution
(draft-lapukhov…, but with iBGP and labels)
• iBGP 3107 to interwork with legacy routers
– Session with connected network element with NHS for switch label
– Session with controller for remaining labels, binds to switch label via next
hop
• Label mappings on edge of the fabric are
stable, can be provisioned rather than signaled
• Internal fabric failures are handled locally
• Label mappings on vRouters are distributed
centrally
Control plane
Why Now and What’s Next?
37
“The world is changed… I smell it in the air”
• A lot of similar ideas in the industry
• Seems that thinking converges on something
• But ... a lot of ugly ad-hoc solutions are
popping out here and there
• Better to implement a good solution before the
bad ones become entrenched
• It would be a shame and missed opportunity to
stick with VXLAN/… for years when we could
get MPLS instead
Why Now?
38
• Merchant silicon is finally MPLS capable. And
the price is almost right.
• Modern CPUs can process tens of Mpps in
SW, making host-based switching feasible.
• Several open source MPLS data plane
implementations are emerging
• Several "classical" MPLS control plane
components, such as BGP 3107, are very useful
and have been around for quite a long time.
What’s Ready?
39
• All RFC3107 implementations are broken
(multiple labels). Talk to your vendor
• Silicon is not perfect. Talk to your vendor
• A more expressive way to control late binding
of control plane artifacts than BGP 3107
• Perception of MPLS as a complex technology. It's
the current MPLS control plane that is complex
• Perception of MPLS as WAN or metro
technology
Gaps
Thank you!
Questions?

Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 

MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale

  • 1. Dmitry Afanasiev, fl0w@yandex-team.ru Daniel Ginsburg, dbg@yandex-team.ru Network Architects MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
  • 3. 3 • Founded in 1993 • NASDAQ:YNDX, Mkt Cap ~$12.5B • One of Europe's largest internet companies and the leading search provider in Russia • Over 60% of the local search market • Monthly user audience of over 90 million worldwide. • Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads ... What is Yandex
  • 4. 4 • We're rather typical MS-DC • Several DCs in Russia and abroad + MPLS backbone to connect them • About 100k servers and growing fast • Mostly IPv6 internally, need to serve external IPv4 • Network architecture is a bit outdated, needs rethinking Our Infrastructure
  • 5. In Search of New Arch
  • 6. 6 • It needs to be: – Scalable – Flexible – Programmable • Lots of approaches out there, some get many things right… • But not one combines all the right pieces in the right way • It's really surprising because the right combination seems almost inevitable. In Search of New Arch
  • 7. 7 • Many of the ideas have been around for years (or even decades) • Interconnection network topology – folded Clos • Let the edge handle complexity • Core just delivers packets edge to edge • Overlay/underlay logical split • Control: mix of centralized and distributed. Needs a nice way to combine both • Simple commodity network elements • Hierarchy and automation to scale the network Ideas to Build Upon
  • 8. 8 • All these ideas are well known, understood and almost universally accepted in the industry • People are trying to implement them using a wild mix of data plane mechanisms. • And it introduces enormous complexity • What's missing? A unified forwarding mechanism What’s missing
  • 9. 9 • Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms. • Fortunately, there is already a well known, well understood, standardized forwarding plane mechanism upon which we can implement all those ideas without compromising their value. • It has a well defined and standardized mapping to many other popular forwarding planes. • It's known as MPLS. Missing… or overlooked?
  • 11. 11 • Different data plane mechanisms – different features • The unified data plane should be able to support all useful features and produce their combinations • MPLS is very flexible: – forwarding over a pre-signalled virtual circuit a-la ATM - this is what RSVP-TE does – source routing over a previously discovered topology a-la Token Ring networks - see the Segment Routing proposal – hierarchical LPM a-la IP - just split the address over several labels and allow routers to act on the topmost one (not that we suggest it is practical, but it is definitely possible) Flexibility
  • 12. 12 • Best way to implement arbitrary semantics is to get rid of any semantics in protocol headers and assign it externally • Hardware works with protocol headers • Control software defines the semantics An Abstract Note on Semantics
  • 13. 13 • Why combine? To have the right features at the right place or produce a useful combination of features • There are basically two ways to combine different data planes: stitch or interwork them, and overlay them on top of each other Combining Data Planes
  • 14. 14 • It’s a pain • Might be done for a subset of protocol features • Need to translate between protocols (complex, never perfect, loses information) • Need to provision interworking points: fragile, an operational nightmare, creates bottlenecks • Seems nobody really does this anymore… Or maybe we still have to sometimes? Stitching Data Planes
  • 15. 15 • Overlay to: scale, virtualize, augment one data plane with properties of another • Overlaying is building hierarchy • But with multiple data planes it is limited and ad-hoc • Often ugly: IP over Ethernet over VXLAN over IP over Ethernet • MPLS is intrinsically hierarchical (overlayable, if you will) Overlaying Data Planes
  • 16. 16 • Many hierarchical structures are already in the network: topology, addressing, management and control • Hierarchy is the most important and the most reliable way to scale things Hierarchy is your friend
  • 17. 17 • The ability to implement hierarchy natively enables us to ditch the notion of hard overlay/underlay boundary. • In a stack of DC-label, ToR-label, port-label, slice-label, vm-label, where's the boundary of overlay/underlay? Not in the packet • Placement of the boundary only depends on how you structure your control Overlay/underlay split is a metaphor
  • 18. 18 • Can be as granular or coarse-grained as one wishes. There's no network-imposed limitation • Easy behavior aggregation. Just add an extra label on top • Easy behavior disaggregation. One can expose additional granularity by adding an extra label at the bottom FEC is hierarchical
  • 20. 20 • MPLS control plane is notoriously complex • Good news: you don’t have to use all of it, can pick good parts • Classical distributed control is Ok for transport • Centralized control seems better for higher level artifacts on the edge, sometimes called services • Both styles can (and should) be combined MPLS is complex?
  • 21. 21 • The device has to be a bit smarter than in OF • Gets parts of the label stack from different control plane components • Assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane • Assembly instructions come in the form of referencing by “name” • Assembly uses late binding Enabling combinability
  • 22. 22 • MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B), using next-hop resolution. • The resolution happens on the device itself, and the two control plane entities are loosely coupled - MPLS tunnels can change their paths, the assigned labels, etc., without MP-BGP caring about it • VPN abstraction refers to tunnel abstraction using next-hops. Next-hop is the name by which one control plane abstraction refers to another Enabling combinability – example
  • 23. 23 • Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another • Able to express almost anything we currently want. Still, a more expressive way is desired • BGP 3107 is the way to interact with all classically controlled MPLS networks Enabling Combinability – BGP 3107
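The late binding described on slides 21 to 23 can be illustrated with a small sketch. It is not the authors' implementation and not a BGP implementation: the route names, label values and the `resolve` helper are all hypothetical, and the RIB is reduced to a dictionary. It only shows how a device could assemble a label stack by resolving one abstraction's next hop through another's labeled route.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class LabeledRoute(NamedTuple):
    next_hop: str           # the "name" this route refers to
    label: Optional[int]    # label to impose for this route, if any


def resolve(dest: str,
            rib: Dict[str, LabeledRoute],
            connected: Dict[str, str],
            max_depth: int = 8) -> Tuple[str, List[int]]:
    """Return (outgoing interface, label stack with the topmost label first)."""
    stack: List[int] = []
    name = dest
    for _ in range(max_depth):
        if name in connected:
            # Directly reachable neighbor: resolution is complete.
            return connected[name], stack
        route = rib[name]
        if route.label is not None:
            # Routes resolved later are closer to the wire, so their
            # labels end up on top of the ones collected earlier.
            stack.insert(0, route.label)
        name = route.next_hop
    raise RuntimeError("next-hop resolution did not converge")


# Hypothetical state: a VPN route learned from one control plane component
# and a tunnel to the egress router learned from another.
rib = {
    "vpn-a:10.1.0.0/16": LabeledRoute(next_hop="pe2", label=3001),
    "pe2": LabeledRoute(next_hop="p1", label=17),
}
connected = {"p1": "eth0", "p3": "eth1"}

print(resolve("vpn-a:10.1.0.0/16", rib, connected))  # ('eth0', [17, 3001])

# The tunnel is re-routed and re-labeled; the VPN route is left untouched.
rib["pe2"] = LabeledRoute(next_hop="p3", label=42)
print(resolve("vpn-a:10.1.0.0/16", rib, connected))  # ('eth1', [42, 3001])
```

Because the VPN route refers to the tunnel only by the name `pe2`, the second lookup picks up the new path and label without any change to the VPN entry, which is the loose coupling slide 22 describes.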
  • 24. 24 • If you can ensure that the labels at some point of the network always stay the same (because you assigned them to be so), you can use static configuration on the other side • The way to go, when one wants to avoid any signaling dependencies • Static configuration can be calculated and disseminated automatically Static Configuration
  • 25. 25 • On the host! Or even right from the application • Hypervisor switch is the easiest point. SW only, very flexible. • Naturally fits centralized control • Helps to scale. Lots of RAM, each element keeps only needed state • Modern CPUs can forward 10s of Gbps without breaking a sweat Where MPLS should start?
  • 26. 26 • A simple forwarding plane (3 simple ops) • A simple software agent on the device (receives parts of label stack from different control plane components, assembles full stack, and programs the HW) • Centralized and distributed control, or anything in between • Combinability of different control plane components with late binding via names, which the device resolves Looks SDNish
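The slide does not name the "3 simple ops"; the sketch below assumes they are the usual MPLS push, swap and pop. The table layout, label values and port names are made up for illustration, and real hardware combines operations in ways this toy model does not.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Action(NamedTuple):
    op: str                    # "push", "swap" or "pop"
    out_label: Optional[int]   # label written by push/swap, unused by pop
    out_port: str


def forward(stack: List[int],
            lfib: Dict[int, Action]) -> Tuple[List[int], str]:
    """Apply the table entry for the topmost label; stack[0] is the top."""
    action = lfib[stack[0]]
    if action.op == "swap":
        # Replace the topmost label, leave the rest of the stack alone.
        return [action.out_label] + stack[1:], action.out_port
    if action.op == "pop":
        # Remove the topmost label and expose the one beneath it.
        return stack[1:], action.out_port
    if action.op == "push":
        # Keep the existing stack and add one more level of hierarchy.
        return [action.out_label] + stack, action.out_port
    raise ValueError(f"unknown operation: {action.op}")


# Hypothetical label table for one device.
lfib = {
    100: Action("swap", 200, "port1"),
    200: Action("pop", None, "port2"),
    300: Action("push", 400, "port3"),
}

print(forward([100, 7], lfib))  # ([200, 7], 'port1')
print(forward([200, 7], lfib))  # ([7], 'port2')
print(forward([300, 7], lfib))  # ([400, 300, 7], 'port3')
```

Every operation looks only at the topmost label, so labels deeper in the stack can carry meaning that this device never has to understand.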
  • 27. 27 • “Modularity based on abstraction is the way things get done” --Liskov • “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” -- Shenker • “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” -- Shenker • “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker What SDN is
  • 28. 28 • Let the edge handle complexity – do it on host • Core just delivers packets edge to edge – hierarchy enables the devices to be agnostic to changes on the edge • Overlay/underlay logical split – just a way to implement hierarchy • Control: mix of centralized and distributed. Needs a nice way to combine both – yeah! • Simple commodity network elements – cheap MPLS capable silicon is finally there How Ideas Map to MPLS
  • 29. 29 • Key point of S-MPLS was to extend MPLS to access and separate transport and service in MPLS network • NFV describes how to host service nodes in DC. If you don’t have MPLS in DC it’s no longer seamless • Fix is obvious – extend MPLS into DC • Labels can carry additional metadata if one wants them to NFV and Seamless MPLS
  • 30. Case Study: New Yandex DC
  • 31. 31 • Cheap and abundant bandwidth • Scalable forwarding with minimal state • Multitenancy (=> network virtualization) • Efficient resource pooling • InterDC traffic engineering • Function chaining: load balancing, FW, etc. • Interconnection with existing infrastructure • Means to integrate all of above • Local response to some events, e.g. failures • Automation at scale What we need?
  • 32. 32 We are trying to keep the design really simple. We don’t need many functions often perceived as desirable: • L2 (neither real, nor emulated) • VM mobility – In scale-out applications nodes coming and going is the norm, no need to move them around while preserving state and identity – VM mobility increases complexity as it depends on other features • Multicast • We don't have too many changes in topology What we don’t need
  • 33. 33 • Host with vLER (MPLS capable vRouter) • Fabric switching elements – LSRs • Centralized controller • Legacy routers. Need to interwork with fabric LSRs and controller. BGP 3107 is the tool Components
  • 34. 34 • 3-label stack: topmost for egress switch, next for egress port, bottom for VM • vRouter uses {dst prefix, VRF} to impose label stack • Bottom label processed by destination vLER • Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports Forwarding model
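The forwarding model on slide 34 can be walked through with a toy example. All label values, VRF names and table contents below are hypothetical, and the sketch assumes the egress switch itself removes both the switch label and the port label; the slides do not say whether the switch label is popped there or one hop earlier.

```python
from typing import Dict, List, Tuple

# vRouter (vLER) imposition table: (VRF, destination prefix) ->
# [egress switch label, egress port label, VM label], topmost first.
ImpositionTable = Dict[Tuple[str, str], List[int]]


def impose(vrf: str, prefix: str, table: ImpositionTable) -> List[int]:
    """Ingress vLER: pick the 3-label stack for {dst prefix, VRF}."""
    return list(table[(vrf, prefix)])


def egress_switch(stack: List[int], my_label: int,
                  ports: Dict[int, str]) -> Tuple[str, List[int]]:
    """Egress fabric switch: consume the switch and port labels."""
    if stack[0] != my_label:
        raise ValueError("packet is not addressed to this switch")
    return ports[stack[1]], stack[2:]


def destination_vler(stack: List[int], vms: Dict[int, str]) -> str:
    """Destination vLER: the bottom label selects the VM."""
    return vms[stack[0]]


def fabric_switch_state(switches_in_fabric: int, local_access_ports: int) -> int:
    """Expected number of label entries on one fabric switch (slide 34)."""
    return switches_in_fabric + local_access_ports


table = {("tenant-a", "2001:db8:1::/64"): [1012, 2007, 30055]}
stack = impose("tenant-a", "2001:db8:1::/64", table)
port, rest = egress_switch(stack, my_label=1012, ports={2007: "access7"})
vm = destination_vler(rest, vms={30055: "vm-55"})
print(stack, port, vm)                # [1012, 2007, 30055] access7 vm-55
print(fabric_switch_state(500, 48))   # 548
```

Transit switches forward on the topmost label only, which is why per-switch state grows with the number of switches and local ports rather than with the number of VMs or tenants.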
  • 35. 35 • iBGP 3107 (in-path RR w/ NHS) inside fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels) • iBGP 3107 to interwork with legacy routers – Session with connected network element with NHS for switch label – Session with controller for remaining labels, binds to switch label via next hop • Label mappings on edge of the fabric are stable, can be provisioned rather than signaled • Internal fabric failures are handled locally • Label mappings on vRouters are distributed centrally Control plane
  • 36. Why Now and What’s Next?
  • 37. 37 “The world is changed… I smell it in the air” • A lot of similar ideas in the industry • Seems that thinking converges on something • But ... a lot of ugly ad-hoc solutions are popping up here and there • Better to implement a good solution before the bad ones become entrenched • It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could get MPLS instead Why Now?
  • 38. 38 • Merchant silicon is finally MPLS capable. And the price is almost right. • Modern CPUs can process tens of Mpps in SW, making host-based switching feasible. • Several open source MPLS data plane implementations are emerging • Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time. What’s Ready?
  • 39. 39 • All RFC3107 implementations are broken (multiple labels). Talk to your vendor • Silicon is not perfect. Talk to your vendor • A more expressive way to control late binding of control plane artifacts than BGP 3107 is needed • Perception of MPLS as a complex technology. It's the current MPLS control plane that is complex • Perception of MPLS as a WAN or metro technology Gaps