Virtual Routing - The anti-matter of network routing
Submitted by Andreas Antonopoulos on Fri, 2008-12-12 15:47.
From an idea mentioned by Doug Gourlay (@dgourlay) at the C-Scape conference:
"How about using netflow information to re-balance servers in a data center"
Routing: Controlling the flow of network traffic to an optimal path between two nodes
Virtual-Routing or Anti-Routing: VMotioning nodes (servers) to optimize the flow of traffic on the network.
Using NetFlow information, identify the nodes (virtual servers) with the highest traffic "affinity" by volume (or by some other metric, such as desired latency), and move them (VMotion, XenMotion) to re-balance the network. For example, bring the virtual servers that exchange the most traffic onto hosts on the same switch, or even onto the same host, to minimize traffic crossing multiple switches. Create a whole-data-center map of traffic flows, solve for the fewest switch hops per flow, and re-map all the servers in the data center to optimize network traffic.
There's a startup there somewhere. Route the node, not the packet.
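The re-mapping step above can be sketched as a small local search. This is a minimal illustration under stated assumptions, not a product design. The flow records are simplified NetFlow-style `(src_vm, dst_vm, bytes)` tuples. The topology, the hop metric (0 on the same host, 1 through a shared top-of-rack switch, 3 otherwise) and the per-host VM capacity are all hypothetical. The function names `affinity`, `cost` and `rebalance` are illustrative. The search repeatedly applies the single VM move that most reduces total byte-hops:

```python
from collections import defaultdict

# Hypothetical topology: each host hangs off one top-of-rack (ToR) switch.
HOST_SWITCH = {"h1": "s1", "h2": "s1", "h3": "s2", "h4": "s2"}


def switch_hops(a, b, host_switch):
    """Assumed hop metric: 0 on the same host, 1 through a shared ToR switch,
    3 through ToR -> aggregation -> ToR."""
    if a == b:
        return 0
    if host_switch[a] == host_switch[b]:
        return 1
    return 3


def affinity(flows):
    """Sum NetFlow-style (src_vm, dst_vm, bytes) records into per-pair volumes,
    ignoring direction."""
    pairs = defaultdict(int)
    for src, dst, nbytes in flows:
        if src != dst:
            pairs[frozenset((src, dst))] += nbytes
    return pairs


def cost(placement, pairs, host_switch):
    """Total byte-hops: each pair's volume times the switch hops between its hosts."""
    total = 0
    for pair, volume in pairs.items():
        a, b = tuple(pair)
        total += volume * switch_hops(placement[a], placement[b], host_switch)
    return total


def rebalance(placement, flows, host_switch, capacity, max_moves=10):
    """Hill climbing: apply the single VM move that most reduces byte-hops,
    respecting a per-host VM capacity, until no move helps or the budget runs out."""
    pairs = affinity(flows)
    placement = dict(placement)
    moves = []
    for _ in range(max_moves):
        load = defaultdict(int)
        for host in placement.values():
            load[host] += 1
        current = cost(placement, pairs, host_switch)
        best = None
        for vm, old_host in list(placement.items()):
            for host in host_switch:
                if host == old_host or load[host] >= capacity:
                    continue
                placement[vm] = host  # try the move...
                gain = current - cost(placement, pairs, host_switch)
                placement[vm] = old_host  # ...and undo it
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, vm, host)
        if best is None:
            break  # local optimum: no single move reduces byte-hops
        _, vm, host = best
        moves.append((vm, placement[vm], host))
        placement[vm] = host
    return placement, moves


flows = [("web", "db", 9_000_000), ("db", "web", 1_000_000), ("web", "cache", 500_000)]
start = {"web": "h1", "db": "h3", "cache": "h4"}
final, moves = rebalance(start, flows, HOST_SWITCH, capacity=2)
print(final)  # {'web': 'h3', 'db': 'h3', 'cache': 'h4'}
print(moves)  # [('web', 'h1', 'h3')]
```

Hill climbing only finds a local optimum. The `max_moves` budget is one simple way to cap how much VMotion churn a single re-balancing pass can cause.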


Another thought on this...
For workloads segmented by the virtual machine construct that have some amount of I/O intensity but do not necessarily consume an entire N-way CPU, you could go so far as to co-locate the VMs on the same physical machine. As long as policy is implemented consistently across wired and virtual switching, you then have a significant performance-optimization engine on the network.
Some companies are out talking about how low their latency is (as if 200ns really makes that much of a difference to aggregate workload processing, but hey, who is pragmatic in marketing anyway...), but if you remove switching latency AND transit latency AND serialization delay for workloads that can take advantage of that kind of optimization, then you have something intriguing. If you keep policy consistent with the virtual machine regardless of its location, and keep state on the counters that are critical for billing and remediation in client-funded models, while eliminating most or all of that latency by moving from a wired transfer to a zero-copy memory infrastructure, that would be powerful, I feel.
dg
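To see why serialization delay matters more than a 200ns switch figure, here is a small worked calculation. It is a rough model under stated assumptions. It uses a 1500-byte frame and 1 Gb/s links. It assumes a store-and-forward path in which the frame is serialized once per link. It ignores propagation delay on short data-center cables. The 200ns per-switch latency is the figure from the comment above. The function names are illustrative:

```python
def serialization_delay_s(frame_bytes, link_bps):
    """Time to clock one frame onto the wire: bits divided by bits per second."""
    return frame_bytes * 8 / link_bps


def one_way_latency_s(frame_bytes, link_bps, switches, switch_latency_s):
    """Assumed store-and-forward model: a path through `switches` switches has
    switches + 1 links, and the frame is serialized once on each link.
    Propagation delay is ignored."""
    links = switches + 1
    return links * serialization_delay_s(frame_bytes, link_bps) + switches * switch_latency_s


# 1500-byte frame, 1 Gb/s links, two switches at 200 ns each.
cross_dc = one_way_latency_s(1500, 1e9, switches=2, switch_latency_s=200e-9)
print(f"{cross_dc * 1e6:.1f} us")  # 36.4 us, of which only 0.4 us is switch latency
```

Under these assumptions, each link adds 12 us of serialization for a full-size frame, and the two switches add only 0.4 us between them. VMs co-located on one host avoid the wire entirely, although the cost of the in-memory copy is not modeled here.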
Telemetry feedback for dynamic traffic engineering.
NetFlow is used for this purpose today, but manually. It's certainly possible to envisage a dynamic control-plane interaction/feedback mechanism that could trigger dynamic topological/resource shifting, but an accurate baseline of expected behavior, coupled with nuanced anomaly detection and strong rulesets, will be required to avoid inducing pathological behaviors, whether deliberately or inadvertently.
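One way to picture the "baseline plus anomaly detection" safeguard is an exponentially weighted moving average (EWMA) of a flow metric. A move is triggered only when a sample deviates by more than a few standard deviations, and a cooldown keeps a single trigger from re-firing immediately. This is a sketch under assumptions. The class name, the `alpha`, `threshold`, `cooldown` and `warmup` values, and the sample series are all hypothetical. Nothing here is taken from a Cisco product:

```python
class FlowBaseline:
    """EWMA baseline of one traffic metric with a threshold test and a cooldown
    that suppresses repeated triggers (a crude damper against flapping).
    alpha, threshold, cooldown and warmup are illustrative, untuned values."""

    def __init__(self, alpha=0.2, threshold=3.0, cooldown=5, warmup=5):
        self.alpha = alpha
        self.threshold = threshold
        self.cooldown = cooldown
        self.warmup = warmup
        self.mean = None
        self.var = 0.0
        self.seen = 0
        self.quiet_until = 0

    def observe(self, t, value):
        """Feed the sample for interval t; return True if a re-placement should trigger."""
        if self.mean is None:
            self.mean = value
            self.seen = 1
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        anomalous = (
            self.seen >= self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        # Incremental EWMA update of mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.seen += 1
        if anomalous and t >= self.quiet_until:
            self.quiet_until = t + self.cooldown
            return True
        return False


baseline = FlowBaseline()
samples = [100, 102, 98, 101, 99, 100, 500, 500, 100]  # e.g. Mb/s between two VMs
triggers = [t for t, v in enumerate(samples) if baseline.observe(t, v)]
print(triggers)  # [6]
```

Note that anomalous samples still feed the baseline here. An adversary who ramps traffic slowly could drag the baseline along, which is exactly the kind of deliberately induced behavior the stronger rulesets mentioned above would need to catch.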
IP SLA and EEM should also play a role in such a system. As Flexible NetFlow becomes more widespread, it will, of course, allow deeper insight into network traffic heuristics.
Hosts, applications, virtualization systems, load-balancers, caches, DNS servers, et al. should be instrumented to export their relevant telemetry in NetFlow v9 format (as we at Cisco have done with the ASA5580 and ASR1000 firewall policy-plane NetFlow export). Flow telemetry scales more effectively than syslog or SNMP polling/traps, and it can be analyzed in combination with the TCP/IP heuristics telemetry currently exported via NetFlow from network infrastructure devices to provide operational context.
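To make "export telemetry in NetFlow v9 format" concrete for a host, here is a sketch of the bytes a host-side exporter might emit. The layout follows RFC 3954: a 20-byte packet header, a Template FlowSet (FlowSet ID 0), and a Data FlowSet whose ID is the template ID. The field type numbers are the RFC's IPV4_SRC_ADDR (8), IPV4_DST_ADDR (12), IN_BYTES (1) and IN_PKTS (2). The template ID 256, the choice of fields, the source ID and the function names are illustrative assumptions. Sending the packet over UDP to a collector is left out:

```python
import socket
import struct
import time

# NetFlow v9 field type IDs (RFC 3954) used by this illustrative template.
IN_BYTES, IN_PKTS, IPV4_SRC_ADDR, IPV4_DST_ADDR = 1, 2, 8, 12
TEMPLATE_ID = 256  # data template IDs must be greater than 255
FIELDS = [(IPV4_SRC_ADDR, 4), (IPV4_DST_ADDR, 4), (IN_BYTES, 4), (IN_PKTS, 4)]


def template_flowset():
    """Template FlowSet: ID 0, length, then template ID, field count, (type, length) pairs."""
    body = struct.pack("!HH", TEMPLATE_ID, len(FIELDS))
    body += b"".join(struct.pack("!HH", ftype, flen) for ftype, flen in FIELDS)
    return struct.pack("!HH", 0, 4 + len(body)) + body


def data_flowset(records):
    """Data FlowSet keyed by the template ID; padded to a 32-bit boundary."""
    body = b"".join(
        socket.inet_aton(src) + socket.inet_aton(dst) + struct.pack("!II", nbytes, npkts)
        for src, dst, nbytes, npkts in records
    )
    body += b"\x00" * (-(4 + len(body)) % 4)
    return struct.pack("!HH", TEMPLATE_ID, 4 + len(body)) + body


def export_packet(records, seq, source_id=1, uptime_ms=0):
    """Header: version 9, record count (1 template + data records), sysUptime,
    UNIX seconds, sequence number, source ID."""
    header = struct.pack(
        "!HHIIII", 9, 1 + len(records), uptime_ms, int(time.time()), seq, source_id
    )
    return header + template_flowset() + data_flowset(records)


pkt = export_packet([("10.0.0.1", "10.0.0.2", 9_000_000, 6_000)], seq=0)
print(len(pkt))  # 64: 20-byte header + 24-byte template + 20-byte data FlowSet
```

A real exporter would track sequence numbers and resend templates periodically rather than necessarily in every packet. The point here is only that a host can speak the same format as the network gear.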
Traveling nodes = traveling salesmen
Interesting idea to turn the problem upside down! Optimizing placement is another traveling-salesman sort of optimization, but with a significant difference from the routing problem: the salesman never had to take into account the actions and interactions of other salesmen! I fear the complexity of optimizing a large set of servers may be too much to handle quickly and robustly except where systems are relatively isolated and connections are sparse. Imagine "data center flapping"! I hope whoever takes this on gets some really BIG math brains on it.
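The size of the math problem can be stated precisely. The notation below is introduced here for illustration. $T_{ij}$ is the traffic volume between VMs $i$ and $j$ as measured by NetFlow. $d(h, h')$ is the number of switch hops between hosts $h$ and $h'$. $p$ maps VMs to hosts, and $p_0$ is the current placement. $c_h$ is the VM capacity of host $h$. The $\lambda$ term is an assumed migration penalty, where $[\cdot]$ is 1 when the condition holds and 0 otherwise:

```latex
\min_{p}\ \sum_{i<j} T_{ij}\, d\big(p(i),\, p(j)\big) \;+\; \lambda \sum_{i} \big[\, p(i) \neq p_0(i) \,\big]
\quad \text{s.t.} \quad \big|\{\, i : p(i) = h \,\}\big| \le c_h \quad \forall h
```

With one VM per host ($c_h = 1$, as many hosts as VMs) and $\lambda = 0$, this is the quadratic assignment problem. That problem is NP-hard and contains the traveling salesman problem as a special case. The raw search space is $H^N$ placements for $N$ VMs on $H$ hosts, for example $10^{100}$ for 100 VMs on 10 hosts. A positive $\lambda$ is one way to discourage the "data center flapping" the comment warns about, since every move must pay for itself in reduced byte-hops.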
...and then enter security.
This is a fascinating discussion from a network-centric perspective and a terrifying one from that of security.
Instrumentation is critical to automation, but besides being limited today, it is dangerous to confine the discussion to delivery (née availability) when, at a minimum, confidentiality and integrity as defined by binding policy compacts are not also part of it.
Further, the granularity and fidelity of the variables exchanged in this telemetry must go deeper than just flow; determining state, disposition, intent, content and context is critical. One needs to be able to make decisions on content in context, not just on the distribution of traffic.
I wrote a little bit about this in response yesterday:
http://rationalsecurity.typepad.com/blog/2008/12/virtual-routing-the-antimatter-of-network-security.html
/Hoff
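At its very simplest, the security objection means a traffic-driven move should be gated by policy before it happens. The sketch below shows such a gate, assuming hypothetical trust-zone labels on VMs, a per-host set of allowed zones, and a hypothetical table of zones that must never share a physical host. It covers only the placement check. It does not attempt the state, intent and content inspection described above:

```python
from dataclasses import dataclass, field

# Hypothetical pairs of trust zones that must never share a physical host.
INCOMPATIBLE = {frozenset(("pci", "dmz"))}


@dataclass
class Host:
    name: str
    allowed_zones: set
    residents: dict = field(default_factory=dict)  # VM name -> trust zone


def migration_allowed(vm, zone, target):
    """Permit a move only if the target host accepts the VM's zone and no
    resident VM is in a zone declared incompatible with it."""
    if zone not in target.allowed_zones:
        return False
    for other_vm, other_zone in target.residents.items():
        if other_vm != vm and frozenset((zone, other_zone)) in INCOMPATIBLE:
            return False
    return True


h3 = Host("h3", {"pci", "dmz", "internal"}, {"web01": "dmz"})
print(migration_allowed("db01", "pci", h3))        # False: pci may not share a host with dmz
print(migration_allowed("app01", "internal", h3))  # True
```

In a combined system, a placement optimizer would consult a check like this before accepting any candidate move, so that a high-affinity pair is never co-located in violation of policy.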