A comparison, a contrast and a complementarity (and why agile service providers cannot live without both)
By Mats Nordlund, CEO & co-founder, Netrounds
As seen in Connect-World Asia-Pacific issue 1 2017.
Are your network services delivered as promised?
Consider this – your orchestrator has just configured a new service. Does it work across all network layers and domains? Will it continue to work throughout its lifetime and deliver the end-to-end quality your customers expect? This article will show you how to answer these important questions and why they are crucial for staying competitive in the expanding, dynamic networks of today’s Asian marketplace.
Looking beyond the Impossible Mapping Machine
Every network operator and communications service provider (CSP) in Asia relies on several data sources to analyze the quality of its network and of the services it delivers to customers. The predominant source is the rudimentary, built-in mechanism that every network node has for indicating faults in the hardware or software components of network devices and interconnecting links. CSPs widely use this today as the foundation for network assurance, and it is referred to as passive monitoring. In this context, passive means that devices listen to network traffic as it flows through them and store statistics as counters that can be polled later. Devices also store counters for observed resources such as CPU load, memory usage and interface queue utilization. These device-oriented counters and statistics are then collected at a central location, evaluated against thresholds, and correlated with other alarm sources, syslogs, traps and inventories in an attempt to estimate how end users are experiencing their end-to-end services. We refer to this frequently unsuccessful correlation activity as the Impossible Mapping Machine, as illustrated in Figure 1.
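The threshold evaluation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the device names, metric names and threshold values are hypothetical, and a real system would poll the counters from the devices rather than use hard-coded samples.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    device: str
    metric: str   # hypothetical metric name, e.g. "cpu_load_pct"
    value: float


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = {
    "cpu_load_pct": 90.0,
    "memory_used_pct": 85.0,
    "queue_util_pct": 80.0,
}


def evaluate_thresholds(samples, thresholds=THRESHOLDS):
    """Return (device, metric, value) for every sample above its threshold."""
    alarms = []
    for sample in samples:
        limit = thresholds.get(sample.metric)
        if limit is not None and sample.value > limit:
            alarms.append((sample.device, sample.metric, sample.value))
    return alarms


# Counters as they might look after one polling round (made-up values).
samples = [
    CounterSample("edge-router-1", "cpu_load_pct", 95.0),
    CounterSample("edge-router-1", "memory_used_pct", 40.0),
    CounterSample("agg-switch-7", "queue_util_pct", 82.5),
]
alarms = evaluate_thresholds(samples)
for device, metric, value in alarms:
    print(f"ALARM {device}: {metric}={value}")
```

Note that each alarm identifies a device and a resource, but nothing in it says which customers or services, if any, are affected. That gap is what the correlation step tries, and often fails, to bridge.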
As the Machine’s name implies, the mapping is virtually impossible to accomplish as there is very little correlation between the collected infrastructure-centric counters and actual end-user experience, so the Impossible Mapping Machine produces a misleading view of the network and services. Instead of guiding the operations team towards a solution, this correlation of passive monitoring sources by the Impossible Mapping Machine creates the following challenges:
- Hard to understand customer impact of quality problems and alarms
- Difficult to prioritize most important network issues
- Time consuming to localize and isolate problems
A Network Operations Center (NOC), in Japan for example, that bases its alarm dashboard solely on device-oriented passive monitoring will likely end up in a situation like the one depicted in Figure 2. Here the alarm list is overflowing with entries that are unrelated to any customer problem. Even more concerning is the number of customers who are affected by real problems yet have no associated alarms. This is a serious problem.
The solution to this lack of end-to-end service quality visibility is to complement the Impossible Mapping Machine with active test methods that directly assess the quality from the customer’s perspective. In this article we will compare and contrast active test methods and passive monitoring. Most importantly, we will explain how these solutions complement one another and shed some light on why competitive, agile network operators will need to take advantage of both solutions to become the customer experience winners.
What "active" means
Active testing injects traffic into your network to consume network services and imitate the usage behaviors of real end users. This method provides authentic real-time insights into how the network is behaving and how an end user is experiencing a service. This is possible as active test systems know exactly what traffic is sent out and what is received at the other end of the network, or what is coming back from content servers for further analysis. This active traffic is injected by test agents placed strategically throughout the network to provide extensive geographical coverage, within each country or across the Asian continent, as well as to provide a means for fault localization through segmentation.
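Because an active test system knows exactly what was sent and what arrived, loss and delay can be computed directly by comparing the two. The Python sketch below illustrates the idea under simplifying assumptions: probes are identified by a sequence number, the sending and receiving test agents have synchronized clocks, and the function name and sample timestamps are hypothetical.

```python
def summarize_probe_stream(sent, received):
    """Compare sent and received probes.

    sent:     dict mapping sequence number -> send time in seconds
    received: dict mapping sequence number -> receive time in seconds
    """
    if not sent:
        raise ValueError("no probes were sent")
    # One-way delay for every probe that arrived (assumes synchronized clocks).
    delays = [received[seq] - t for seq, t in sent.items() if seq in received]
    lost = len(sent) - len(delays)
    return {
        "sent": len(sent),
        "lost": lost,
        "loss_ratio": lost / len(sent),
        "mean_delay_ms": 1000 * sum(delays) / len(delays) if delays else None,
        "max_delay_ms": 1000 * max(delays) if delays else None,
    }


# Four probes sent 20 ms apart; probe 3 never arrives (made-up values).
sent = {1: 0.000, 2: 0.020, 3: 0.040, 4: 0.060}
received = {1: 0.010, 2: 0.032, 4: 0.074}
report = summarize_probe_stream(sent, received)
print(report)
```

A passive counter on an intermediate device cannot produce this kind of end-to-end figure, since it does not know what the sender intended to transmit.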
Motivations for and benefits of using active testing methods
Customers expect immediate and flawless service turn-up. This puts pressure on providers to deliver the service “right the first time”. To achieve this goal of agile, assured service creation and delivery, active testing must be an integral part of the service provisioning process.
Today’s bandwidth and media-intensive services also require a consistently high level of quality. An inability to monitor and assure end-to-end service quality from the end-user perspective will lead to customer dissatisfaction and churn. Active, multi-layer monitoring is required to gain real-time insights into service quality.
Active testing methods enable these outcomes:
- Actively verify that services work after being provisioned: Generate real world traffic to ensure services are delivered correctly before they are exposed to end users and deliver birth certificates to key stakeholders.
- Ensure that provisioned services continue to work throughout their lifetime: Get service quality insights from the end-user perspective through active, real-time measurements.
- Resolve problems faster: Take advantage of remote testing capabilities; automate advanced test scenarios across layers, services, and domains.
- Minimize manual and field test efforts: Automate test sequences and use remote troubleshooting to reduce manual field efforts, dispatching technicians to fix problems, not to find them.
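Faster problem resolution relies on the segmentation idea mentioned earlier: with test agents placed along a service path, each segment between adjacent agents can be measured separately and the faulty one singled out. The following Python sketch assumes per-segment loss ratios have already been measured; the segment names and the 1% threshold are hypothetical.

```python
def localize_fault(segment_loss, max_loss_ratio=0.01):
    """Return the names of segments whose measured loss exceeds the limit.

    segment_loss: list of (segment_name, loss_ratio) tuples in path order
    """
    return [name for name, loss in segment_loss if loss > max_loss_ratio]


# Loss measured between adjacent test agents along one path (made-up values).
path = [("access", 0.000), ("aggregation", 0.035), ("core", 0.001)]
suspects = localize_fault(path)
print(suspects)
```

In this example only the aggregation segment is flagged, so a technician would be dispatched there rather than sent to search the whole path.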
Passive monitoring is still required
The main use of passive monitoring systems is to deliver network statistics for capacity planning and for business insights into what customers are doing.
Passive monitoring solutions come in a number of different flavors, often separated into these two major categories:
- Infrastructure-centric – produced by observing resource utilization
  - Monitoring of resource utilization such as CPU load, interface load, memory and storage usage, I/O transaction rates and interface queue utilization
- Traffic-centric – produced by listening to network traffic
  - Methods like NetFlow, sFlow or IPFIX analysis produce statistics on protocol distribution, traffic types and host data usage over any IP/Ethernet network
  - Decoding of mobile signaling protocols and inspection of the packet core data plane produces statistics such as handover failure rates, session setup times and RTP packet loss rates
  - Deep packet inspection (DPI) identifies and classifies traffic based on a signature database that includes information extracted from the data part of a packet, allowing finer control than the above methods. DPI is also typically aware of flows/sessions rather than just individual packets.
Technically, the latter two traffic-centric methods require external passive monitoring components connected to span ports on network devices, whereas NetFlow-like and infrastructure-centric methods collect counters directly from network devices.
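As an illustration of the flow-based statistics mentioned above, the Python sketch below aggregates a handful of flow records into a protocol distribution by byte count. The record layout is a hypothetical simplification and not the actual NetFlow, sFlow or IPFIX export format.

```python
from collections import Counter


def protocol_distribution(flows):
    """Return each protocol's share of the total bytes across all flows.

    flows: iterable of dicts with at least "protocol" and "bytes" keys
    """
    totals = Counter()
    for flow in flows:
        totals[flow["protocol"]] += flow["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proto: count / grand_total for proto, count in totals.items()}


# Simplified flow records with made-up addresses and byte counts.
flows = [
    {"src": "10.0.0.1", "protocol": "TCP", "bytes": 6000},
    {"src": "10.0.0.2", "protocol": "UDP", "bytes": 3000},
    {"src": "10.0.0.1", "protocol": "TCP", "bytes": 1000},
]
shares = protocol_distribution(flows)
print(shares)
```

Statistics of this kind are useful for capacity planning, but they exist only where live customer traffic is flowing.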
There are some known limitations of passive systems. These include:
- Reliance on live customer traffic
- Off-net parts of the network are not covered
- Not possible to directly measure quality perceived by end users
Combining active testing and passive monitoring
Passive monitoring systems allow operators to collect utilization information, whereas active systems give operators a precise, real-time tool that verifies services are delivered correctly to customers the first time, performs ongoing service-specific tests to report issues before customers find them, pinpoints problems when they occur, and sends alarms for quick resolution.
As shown in Figure 3, active and passive methods are complementary and both types of solutions are needed to achieve efficient operations.
Netrounds provides out-of-the-box capabilities to actively test and monitor performance metrics for L2/L3 connections, as well as voice, video and data services (Layer 2 to Layer 7 testing and monitoring capabilities). All of these active test and monitoring activities are automatically driven by service orchestrators when the service is provisioned to achieve closed-loop automation. All integration towards Netrounds is done through a NETCONF/YANG API.
About the author
Mats comes from a successful engineering background focused on test and measurement in the telecommunications industry. Prior to co-founding Netrounds, Mats managed nationwide projects within broadband services and fixed wireline access at the Swedish network operator Telia Company. Previously, he also spent 7 years as a research engineer with both Ericsson’s and Telia’s research and development arms. Mats obtained his Master of Science degree in Computer Engineering and Signal Processing from Luleå University of Technology in Sweden.