
Author Profile - Robert Merrill is Telchemy’s Creative Writer. Before joining Telchemy in 2007, he worked for eight years (and surfed a succession of corporate mergers) as an Engineering Technical Writer for BellSouth.net, BellSouth Science and Technology, and AT&T Labs. He enjoys writing, learning foreign languages, riding motorcycles, and drinking craft-brewed and imported beer. Robert suggests that riding motorcycles and drinking beer, no matter how finely brewed, should not be done simultaneously, and that tasting should begin only after the cycle is put up for the day. Robert writes and works with Dr. Alan Clark, the Founder and CEO of Telchemy, and his skills for accurate review of technologies are well reviewed and formidable.
In my previous article, I discussed different methods of evaluating VoIP call quality using subjective methods, such as the polling of live test subjects (e.g., ACR testing), and objective methods, both intrusive (PSQM, PESQ) and non-intrusive (the E Model and Telchemy’s VQmon® performance analysis algorithm). I also described the distributed performance management framework—comprising strategic placement of mid-stream monitoring devices along with embedded performance analysis agents integrated into VoIP endpoints—and why it was the best approach for many VoIP quality monitoring applications.
Regardless of which topology or performance analysis algorithm is used, a VoIP quality assessment model is useful only if it delivers reliably accurate, repeatable results. The ITU-T Recommendation P.564, Conformance testing for voice over IP transmission quality assessment models, is an effort to define operating standards for these models and to present a methodology for objectively measuring and comparing the accuracy of their results.
P.564 establishes minimum criteria for speech quality assessment models that use objective data to assess the impact of IP impairments on one-way listening quality. Originally specific to narrowband (3.1 kHz) telephony applications, the Recommendation was extended to include wideband (7 kHz) telephony in November 2007.
Models that comply with P.564:
- Produce Mean Opinion Scores (MOS) on the ACR Listening Quality Scale: a range of 1 to 5, where 1 represents “Bad” and 5 “Excellent.”
- Evaluate voice quality without regard to the actual voice payload; i.e., independent of the speech content of the analyzed RTP stream.
- Consider the impact of the voice codec used, but do not consider speech level, background noise, delay, sidetone level, echo, or other impairments that can greatly impact the conversational quality of a call.
- Can be deployed in endpoint locations as embedded monitoring agents, at mid-network monitoring locations, or a combination of both.
The accuracy of each assessment model is determined by comparing the model’s performance with that of the P.862.1 PESQ full reference algorithm, using predetermined test vectors created with a set of four 8-second sample speech files that are included as an electronic attachment to the P.564 Recommendation.
Conformance: why does it matter?
Conformance with P.564 helps ensure minimum levels of consistency and accuracy in quality assessment models, defines an objective methodology for measuring their accuracy, and thereby allows the accuracy of various models to be compared in a meaningful way. A specific aim is the reduction of “false positive” and “false negative” errors: estimated speech quality scores that are too low or too high, respectively (a “positive” here means that the model has flagged poor quality). This is particularly important when the speech quality assessment model is used to monitor compliance with service level agreements (SLAs), a situation where inaccurate results can have a direct financial impact on both the VoIP service provider and the customer.
Caveats and limitations: is compliance enough?
A P.564 device can be located anywhere in the network, and may be a VoIP endpoint (such as an IP phone or gateway) containing an integrated performance analysis agent, a mid-stream device such as a router or switch with an embedded agent, or a mid-stream probe or analyzer. In IP networks, packets are individually routed and may take various paths—so care must be taken to place mid-point monitoring devices strategically, making sure that they have visibility of all packets on the data stream being monitored. (This, of course, is essential for P.564 conformance testing and for day-to-day VoIP performance monitoring.)
Also note that “minimum criteria” means just that—although compliance with P.564 is an important step, compliance alone is not a guarantee that the model will produce accurate MOS scores in real-world conditions. P.564-compliant models are required to show accuracy in predicting listening quality (more specifically MOS-LQON/MOS-LQOW, a MOS Listening Quality Objective measurement for narrowband or wideband telephony systems, respectively) from RTP packet analysis, but in real life, phone conversations are (in most cases) a two-way affair and quality can be impacted by analog (signal) impairments. Advanced performance analysis algorithms such as VQmon® significantly improve the accuracy of their estimated quality scores by including analog and conversational factors, and provide measurements of both the listening quality (MOS-LQ) and conversational quality (MOS-CQ) of each monitored call.
What are VoIP call quality assessment models used for?
P.564-compliant quality assessment models are used for a variety of VoIP performance management applications, including:
- Non-intrusive monitoring of live calls using passive agents such as VQmon
- Diagnosing network problems and their impact on service quality using one or more correlation methods—such as observing the time intervals when quality is degraded, identifying similar impairment events on multiple call streams, and performing mid-stream monitoring at multiple points to locate problematic network links
- Managing service quality of IP Centrex/Hosted PBX customers by monitoring at the service provider edge, the customer demarcation point, and the endpoint (IP phone or gateway)
- Monitoring service quality for compliance with service level agreements (SLAs)
- Call quality monitoring and reporting using endpoint-generated metrics delivered at the end of each call (or at specified intervals) using protocols such as SIP or RTCP XR
- Active test call generation using active test agents (such as those provided with Telchemy’s DVQattest®) that can make and receive test calls, for pre-deployment testing, troubleshooting, and SLA monitoring
P.564 conformance testing methodology in a nutshell
Conformance testing in P.564 involves performing the following (simplified) steps:
- Creating a set of test vectors and associated PESQ MOS-LQOx scores by subjecting the P.564 sample speech files to various network impairments,
- Processing the test vectors using the VoIP quality assessment model in order to obtain its estimated MOS-LQOx scores,
- For each test vector, comparing the PESQ listening quality scores to those generated by the quality assessment model to observe correlation, incidence of errors, and the percentage of “false positive” and “false negative” scores predicted by the VoIP quality assessment model.
Each conformance test evaluates compliance for a particular endpoint, voice codec, and packet size, and therefore multiple variations of each test type may be necessary.
To generate test vectors, the four 8-second sample speech files (four individual talkers, two male and two female), provided as an attachment to P.564, are encoded and packetized into IP/UDP/RTP streams. Each stream is then impaired (to simulate real-world conditions of packet loss and delay), and the modified stream is captured as a PCAP-format file for use as a test vector.
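The impairment step can be pictured with a small simulation. The sketch below is only an illustration under stated assumptions, not the procedure defined in P.564: it assumes 20 ms packets, independent random loss, and uniformly distributed jitter, and the names `Packet` and `impair` are hypothetical. It works on packet timing only and does not encode speech or write a PCAP file.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int           # RTP sequence number
    send_ms: float     # time the packet left the sender
    arrival_ms: float  # time the packet reaches the capture point


def impair(n_packets, ptime_ms=20.0, loss_rate=0.05,
           base_delay_ms=40.0, max_jitter_ms=30.0, seed=1):
    """Return the packets that survive random loss, each with a jittered arrival time.

    All parameter values are illustrative assumptions, not values from P.564.
    """
    rng = random.Random(seed)  # seeded so that a test vector is repeatable
    survivors = []
    for seq in range(n_packets):
        if rng.random() < loss_rate:
            continue  # packet dropped in the network
        send = seq * ptime_ms
        arrival = send + base_delay_ms + rng.uniform(0.0, max_jitter_ms)
        survivors.append(Packet(seq, send, arrival))
    return survivors


# An 8-second sample at 20 ms per packet is 400 packets.
stream = impair(400)
print(f"{len(stream)} of 400 packets delivered")
```

Seeding the random generator keeps the impaired stream identical from run to run, which matters because the same vector must be fed both to the reference path and to the model under test.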
To obtain the PESQ score, the impaired stream is also delivered to the VoIP endpoint, which may be an actual or simulated device, but which consists (at a minimum) of a jitter buffer and decoder. Using the received RTP packets, the endpoint reconstructs the speech signal, which is compared to the original (unimpaired) signal using the PESQ algorithm to determine a listening quality (MOS-LQON or MOS-LQOW) score.
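To show why the jitter buffer matters to the reference score, here is a minimal sketch of a fixed-delay jitter buffer. It is an assumption-laden simplification (real endpoints often use adaptive buffers), and the function name `jitter_buffer` and the 60 ms playout delay are hypothetical. A packet that arrives after its playout deadline is as useless to the decoder as one that was lost in the network.

```python
def jitter_buffer(packets, n_sent, playout_delay_ms=60.0):
    """Classify packets for a fixed-delay jitter buffer.

    packets: iterable of (seq, send_ms, arrival_ms) tuples that reached the endpoint.
    Returns (played, late, lost) lists of sequence numbers.
    """
    played, late = [], []
    for seq, send_ms, arrival_ms in packets:
        deadline = send_ms + playout_delay_ms  # latest usable arrival time
        if arrival_ms <= deadline:
            played.append(seq)
        else:
            late.append(seq)  # arrived, but too late to be played out
    received = set(played) | set(late)
    lost = [seq for seq in range(n_sent) if seq not in received]
    return played, late, lost


# Four packets sent 20 ms apart; packet 2 never arrives, packet 1 arrives late.
received = [(0, 0.0, 45.0), (1, 20.0, 95.0), (3, 60.0, 110.0)]
played, late, lost = jitter_buffer(received, n_sent=4)
print(played, late, lost)  # [0, 3] [1] [2]
```

The decoder would have to conceal both the late and the lost frames, so the reconstructed speech reflects network loss and jitter buffer discards together.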
Each test vector is then in turn processed by the VoIP quality assessment model to obtain an estimated MOS-LQOx score, and the results are compared with the reference score obtained using PESQ. The levels of correlation, distribution and percentage of errors, and percentage of false positive/false negative predictions made by the quality assessment model are determined. Note that “positive” and “negative” refer to the model’s ability to recognize poor performance, so a “false positive” is a case where the model predicts a low score, i.e., a MOS of less than 2.00 (poor), and the PESQ MOS is greater than or equal to 3.00 (fair). (A “false negative” is the inverse.)
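The comparison can be sketched in a few lines. This is a rough illustration rather than the scoring procedure from the Recommendation: the function names are hypothetical, the false negative rule (model at or above 3.00 while PESQ is below 2.00) is one reading of “the inverse,” and the scores are made-up values chosen to trigger each case, not measured results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare(pesq, model, low=2.00, high=3.00):
    """Summarise how well model scores track the PESQ reference scores."""
    n = len(pesq)
    # False positive: the model reports poor quality, but PESQ says fair or better.
    false_pos = sum(1 for p, m in zip(pesq, model) if m < low and p >= high)
    # False negative (assumed inverse): the model reports fair or better, PESQ says poor.
    false_neg = sum(1 for p, m in zip(pesq, model) if m >= high and p < low)
    return {
        "correlation": pearson(pesq, model),
        "false_positive_pct": 100.0 * false_pos / n,
        "false_negative_pct": 100.0 * false_neg / n,
    }


# Made-up scores for four test vectors, for illustration only.
pesq_scores = [4.1, 3.2, 1.8, 2.5]
model_scores = [4.0, 1.9, 3.1, 2.6]
result = compare(pesq_scores, model_scores)
print(result)
```

In this toy data set the second vector is a false positive and the third is a false negative, so each category accounts for one vector in four.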
For every conformance test completed, a statement of P.564 compliance can be created that lists descriptive information about the test performed and a summary of the performance measurements taken.
Example Test Results
The charts below show some example P.564 test results. In the first chart, the P.564 test methodology was applied to VQmon; in the second, P.564 was applied to the E Model (G.107). To achieve compliance, the algorithm being tested (VQmon or the E Model in this case) must predict the MOS-LQO score to within ±0.25 over the full range of impairments.
[Chart: VQmon test results for P.564 – Class 1 Compliance]
[Chart: E Model test results for P.564 – Failed to Comply]
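The 0.25 tolerance mentioned above can be expressed as a simple check. The sketch below covers only that simplified rule as this article states it; the Recommendation itself should be consulted for the exact acceptance criteria, and the function name `within_tolerance` and the sample scores are hypothetical.

```python
def within_tolerance(pesq, model, tolerance=0.25):
    """Return (all_ok, worst_error) for paired reference and model scores.

    all_ok is True only if every model score is within `tolerance` MOS
    of the corresponding PESQ reference score.
    """
    errors = [abs(m - p) for p, m in zip(pesq, model)]
    worst = max(errors)
    return worst <= tolerance, worst


# Made-up scores: every prediction is within 0.25 MOS of the reference.
ok, worst = within_tolerance([4.0, 3.0, 2.0], [4.2, 2.8, 2.1])
print(ok)  # True
```

Reporting the worst-case error alongside the pass or fail result makes it easier to see how much margin a model has.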
Summary
P.564 takes VoIP quality assessment models a step in the right direction, establishing minimum criteria and presenting a methodology for objectively measuring, and comparing, the accuracy of their estimated perceptual quality scores. However, signal/noise/echo levels and conversational factors—which are not considered in P.564’s conformance requirements—can have a significant impact on the quality of VoIP calls.
For real-world performance monitoring applications, the most accurate results are provided by quality assessment models using advanced performance analysis algorithms, such as Telchemy’s VQmon, that can exceed P.564’s minimum standards by calculating the impact of analog and conversational factors on VoIP call quality.
Telchemy and its CEO and founder, Dr. Alan Clark, were actively involved in the ITU SG12 committee that developed the P.564 standard. Telchemy has been an active contributor to a number of standards organizations—including ATIS, ETSI, IETF, ITU, PacketCable, Telemanagement Forum, and TIA—and has led the development of many of the protocols and standards related to VoIP and IPTV performance reporting.
Founded in 1999, Telchemy is the world's leading provider of technology for performance management of voice and video over IP. Over 65 million units of Telchemy's VQmon® performance analysis agent have been licensed by more than 90 equipment vendors worldwide, providing accurate, reliable quality monitoring and diagnostic technology in a wide range of available network infrastructure and test equipment.