Thursday, February 1, 2007

Tuesday, January 30, 2007 Understanding VoIP

Introduction to VoIP

Since the telephone was invented in the late 1800s, telephone communication has not changed substantially. Of course, new technologies like digital circuits, DTMF (or, "touch tone"), and caller ID have improved on this invention, but the basic functionality is still the same. Over the years, service providers made a number of changes "behind the scenes" to improve on the kinds and types of services offered to subscribers, including toll-free numbers, call-return, call forwarding, etc. By and large, users do not know how those services work, but they do know two things: the same old telephone is used and the service provider charges for each and every little incremental service addition introduced.
In the 1990s, a number of individuals in research environments, both in educational and corporate institutions, took a serious interest in carrying voice and video over IP networks, especially corporate intranets and the Internet. This technology is commonly referred to today as VoIP and is, in simple terms, the process of breaking up audio or video into small chunks, transmitting those chunks over an IP network, and reassembling those chunks at the far end so that two people can communicate using audio and video.

This idea of VoIP is certainly not new, as there are research papers and patents dating back several decades and demonstrations of the concept given at various times over the years. VoIP took center stage with the "information super highway" (or, the Internet) concept that was popularized by former Vice President Al Gore in the 1990s, as the Internet would make it possible to interconnect every home and every business with a packet-switched data network. Before Al Gore's effort to grow the Internet, the Internet was generally limited to use in academic environments, but the possibility of mass deployment of the Internet sparked this renewed interest in VoIP.

Why is VoIP Important?

One of the most important things to point out is that VoIP is not limited to voice communication. In fact, a number of efforts have been made to change this popular marketing term to better reflect the fact that VoIP means voice, video, and data conferencing. All such attempts have failed up to this point, but do understand that video telephony and real-time text communication (ToIP), for example, are definitely within the scope of VoIP.

VoIP is important because, for the first time in more than 100 years, there is an opportunity to bring about significant change in the way that people communicate. In addition to being able to use the telephones we have today to communicate in real-time, we also have the possibility of using pure IP-based phones, including desktop and wireless phones. We also have the ability to use videophones, much like those seen in science fiction movies. Rather than calling home to talk to the family, a person can call home to see the family.

One of the more interesting aspects of VoIP is that we also have the ability to integrate a stand-alone telephone or videophone with the personal computer. One can use a computer entirely for voice and video communications (softphones), use a telephone for voice and the computer for video, or can simply use the computer in conjunction with a separate voice/video phone to provide data conferencing functions, like application sharing, electronic whiteboarding, and text chat.

VoIP allows something else: the ability to use a single high-speed Internet connection for all voice, video, and data communications. This idea is commonly referred to as convergence and is one of the primary drivers for corporate interest in the technology. The benefit of convergence should be fairly obvious: by using a single data network for all communications, it is possible to reduce the overall maintenance and deployment costs. The benefit for both home and corporate customers is that they now have the opportunity to choose from a much larger selection of service providers to provide voice and video communication services. Since the VoIP service provider can be located virtually anywhere in the world, a person with Internet access is no longer geographically restricted in their selection of service providers and is certainly not bound to their Internet access provider.

In short, VoIP enables people to communicate in more ways and with more choices.

How Does VoIP Work?

It is very easy to get into a discussion that is very technical and confusing to most readers. The purpose of this section will be to provide a very high-level overview of Voice over IP (VoIP) aimed at those who do not consider themselves experts in the subject and hopefully with enough clarity that it serves as a good introduction to most readers.

Many people have used a computer and a microphone to record a human voice or other sounds. The process involves sampling the sound that is heard by the computer at a very high rate (at least 8,000 times per second) and storing those "samples" in memory or in a file on the computer. Each sample of sound is just a very tiny bit of the person's voice or other sound recorded by the computer. The computer has the wherewithal to take all of those samples and play them, so that the listener can hear what was recorded.
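To make the sampling idea concrete, here is a minimal sketch in Python. It is only an illustration: a generated sine tone stands in for the microphone, and the function name, the 440 Hz tone, and the 16-bit sample size are assumptions chosen for the example, not details from the text above.

```python
import math

SAMPLE_RATE_HZ = 8000  # samples per second, the minimum rate mentioned above


def record_tone(frequency_hz, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return 16-bit integer samples of a sine tone (a stand-in for a microphone)."""
    count = int(duration_s * sample_rate_hz)
    return [
        int(32767 * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(count)
    ]


samples = record_tone(frequency_hz=440, duration_s=1.0)
print(len(samples))      # 8000 samples for one second of sound
print(len(samples) * 2)  # 16000 bytes when each sample takes 2 bytes
```

Each number in the list is one "sample" in the sense used above; playing them back at the same rate reproduces the tone.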

VoIP is based on the same idea, but the difference is that the audio samples are not stored locally. Instead, they are sent over the IP network to another computer and played there.

Of course, there is much more required in order to make VoIP work. When recording the sound samples, the computer might compress those sounds so that they require less space and will certainly record only a limited frequency range. There are a number of ways to compress audio, the algorithm for which is referred to as a "compressor/de-compressor", or simply CODEC. Many CODECs exist for a variety of applications (e.g., movies and sound recordings) and, for VoIP, the CODECs are optimized for compressing voice, which significantly reduces the bandwidth used compared to an uncompressed audio stream. Speech CODECs are optimized to preserve spoken words at the expense of sounds outside the frequency range of human speech. Recorded music and other sounds do not generally sound very good when passed through a speech CODEC, but that is perfectly OK for the task at hand.
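As a rough illustration of what a speech CODEC does, the sketch below squeezes 16-bit samples into 8 bits using the continuous mu-law companding curve with mu = 255. This is a simplified stand-in: G.711, a widely used telephony CODEC, uses a segmented approximation of this curve, and the function names here are hypothetical.

```python
import math

MU = 255.0  # companding constant used by mu-law


def compress(sample):
    """Map a 16-bit sample (-32768..32767) to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def expand(code):
    """Approximately invert compress(); some precision is lost for good."""
    y = code / 255.0 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return max(-32768, min(32767, int(round(x * 32768.0))))


# Bandwidth before and after, at 8,000 samples per second:
print(8000 * 16)  # 128000 bits per second uncompressed
print(8000 * 8)   # 64000 bits per second after companding
```

Quiet samples are reproduced more precisely than loud ones, which suits speech; more aggressive speech CODECs reduce the bit rate much further than this simple halving.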

Once the sound is recorded by the computer and compressed into very small samples, the samples are collected together into larger chunks and placed into data packets for transmission over the IP network. This process is referred to as packetization. Generally, a single IP packet will contain 10 or more milliseconds of audio, with 20 or 30 milliseconds being most common.
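The packetization step can be sketched as follows. The example assumes one byte per sample at 8,000 samples per second and 20 ms of audio per packet; the function name and the 40 bytes of header overhead (minimum IPv4, UDP, and RTP headers) are assumptions made for the arithmetic, not figures from the text above.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20  # milliseconds of audio carried in each packet


def packetize(encoded, frame_ms=FRAME_MS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Split one-byte-per-sample audio into (sequence_number, payload) chunks."""
    frame_len = sample_rate_hz * frame_ms // 1000
    return [
        (seq, encoded[start:start + frame_len])
        for seq, start in enumerate(range(0, len(encoded), frame_len))
    ]


one_second = bytes(8000)        # one second of silent placeholder audio
packets = packetize(one_second)
print(len(packets))             # 50 packets per second
print(len(packets[0][1]))       # 160 bytes of audio in each packet

# Assumed overhead: 20 (IPv4) + 8 (UDP) + 12 (RTP) = 40 header bytes per packet.
print((160 + 40) * 8 * 50)      # 80000 bits per second on the wire
```

Shorter packets lower the delay but spend a larger share of the bandwidth on headers, which is one reason 20 or 30 ms is a common compromise.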

Vint Cerf, who is often called the Father of the Internet, once explained packets in a way that is very easy to understand. Paraphrasing his description, he suggested thinking of a packet as a postcard sent via postal mail. A postcard contains just a limited amount of information. To deliver a very long message, one must send a lot of postcards. Of course, the post office might lose one or more postcards. One also has to assemble the received postcards in order, so some kind of mechanism must be used to properly order the postcards, such as placing a sequence number on the bottom right corner. One can think of data packets in an IP network as postcards.
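The postcard analogy translates almost directly into code. In this small sketch (the function name and sample data are made up for illustration), each postcard carries a sequence number, the receiver puts the cards back in order, and any number that never shows up is reported as lost.

```python
def reassemble(received, expected_count):
    """Order (sequence_number, text) pairs and report which numbers are missing."""
    by_seq = dict(received)
    ordered = [by_seq.get(seq) for seq in range(expected_count)]
    missing = [seq for seq, text in enumerate(ordered) if text is None]
    return ordered, missing


# Four postcards were sent; number 1 was lost and the rest arrived out of order.
arrived = [(2, "over"), (0, "Voice"), (3, "IP")]
ordered, missing = reassemble(arrived, expected_count=4)
print(ordered)  # ['Voice', None, 'over', 'IP']
print(missing)  # [1]
```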

Just like postcards sent via the postal system, some IP data packets get lost and the CODECs must compensate for lost packets by "filling in the gaps" with audio that is acceptable to the human ear. This process is referred to as packet-loss concealment (PLC). In some cases, packets are sent multiple times in order to overcome packet loss. This method is called, appropriately enough, redundancy. Another method to address packet loss, known as forward-error correction (FEC), is to include some information from previously transmitted packets in subsequent packets. By performing mathematical operations in a particular FEC scheme, it is possible to reconstruct a lost packet from information bits in neighboring packets.
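One very simple way to see how mathematical operations can rebuild a lost packet is XOR parity. The sketch below is only one possible scheme and not necessarily what any particular VoIP product uses: it assumes equal-length payloads, sends the parity as one extra packet per group, and can repair at most one loss per group. All names are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads):
    """Build one parity payload covering a group of equal-length payloads."""
    parity = bytes(len(payloads[0]))
    for payload in payloads:
        parity = xor_bytes(parity, payload)
    return parity


def recover(payloads_with_gap, parity):
    """Rebuild the single missing payload (marked None) from the others."""
    missing_index = payloads_with_gap.index(None)
    rebuilt = parity
    for payload in payloads_with_gap:
        if payload is not None:
            rebuilt = xor_bytes(rebuilt, payload)
    return missing_index, rebuilt


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
print(recover([b"abcd", None, b"ijkl"], parity))  # (1, b'efgh')
```

The cost is extra bandwidth and a little extra delay, since the receiver must wait for the rest of the group before it can repair the gap.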

Packets are also sometimes delayed, just as with the postcards sent through the post office. This is particularly problematic for VoIP systems, as delays in delivering a voice packet mean the information is too old to play. Such old packets are simply discarded, just as if the packet had never been received. This is acceptable, as the same PLC algorithms can smooth the audio to provide good audio quality.
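The "too old to play" rule can be sketched as a simple deadline check. This is an illustrative model only: it assumes packet number n was sent n * 20 ms after the start of the call, that the receiver plays each packet a fixed number of milliseconds after it was sent, and that both clocks agree. The function and parameter names are hypothetical.

```python
def split_by_deadline(arrivals, playout_delay_ms, frame_ms=20):
    """Separate packets that arrive in time from those that arrive too late.

    arrivals is a list of (sequence_number, arrival_time_ms) pairs.
    """
    playable, discarded = [], []
    for seq, arrival_ms in arrivals:
        deadline_ms = seq * frame_ms + playout_delay_ms
        if arrival_ms <= deadline_ms:
            playable.append(seq)
        else:
            discarded.append(seq)
    return playable, discarded


arrivals = [(0, 40), (1, 65), (2, 150), (3, 110)]
print(split_by_deadline(arrivals, playout_delay_ms=60))  # ([0, 1, 3], [2])
```

Packet 2 did arrive, but after its play-out time had passed, so the receiver treats it exactly like a lost packet.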

Computers generally measure the packet delay and expect the delay to remain relatively constant, though delay can increase and decrease during the course of a conversation. Variation in delay (called jitter) is the most frustrating for IP devices. Delay, itself, just means it takes longer for the recorded voice spoken by the first person to be heard by the user on the far end. In general, good networks have an end-to-end delay of less than 100ms, though delay up to 400ms is considered acceptable (especially when using satellite systems). Jitter can result in choppy voice or temporary glitches, so VoIP devices must implement jitter buffer algorithms to compensate for jitter. Essentially, this means that a certain number of packets are queued before play-out and the queue length may be increased or decreased over time to reduce the number of discarded, late-arriving packets or to reduce "mouth to ear" delay. Such "adaptive jitter buffer" schemes are also used by CD recorders and other types of devices that deal with variable delay.
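A device can only adapt its jitter buffer if it has some running measure of jitter. The sketch below uses the smoothing formula from the RTP specification (RFC 3550), J = J + (|D| - J) / 16, where D is the change in transit time between consecutive packets. RFC 3550 expresses times in timestamp units; milliseconds are used here for readability, and the function names are assumptions.

```python
def update_jitter(jitter, previous_transit_ms, transit_ms):
    """Apply one step of the RFC 3550 running jitter estimate."""
    difference = abs(transit_ms - previous_transit_ms)
    return jitter + (difference - jitter) / 16.0


def estimate_jitter(send_times_ms, arrival_times_ms):
    """Return the jitter estimate after processing a sequence of packets."""
    jitter = 0.0
    previous_transit = None
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        transit = arrived - sent
        if previous_transit is not None:
            jitter = update_jitter(jitter, previous_transit, transit)
        previous_transit = transit
    return jitter


print(estimate_jitter([0, 20, 40], [50, 70, 90]))   # 0.0 (constant delay)
print(estimate_jitter([0, 20, 40], [50, 70, 100]))  # 0.625 (last packet 10 ms late)
```

A constant delay, however large, yields zero jitter; only variation moves the estimate. An adaptive jitter buffer could lengthen its queue when this value rises and shorten it when the value falls.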

Video works in much the same way as voice. Video information received through a camera is broken into small pieces, compressed with a CODEC, placed into small packets, and transmitted over the IP network. This is one reason why VoIP is promising as a new technology: adding video or other media is relatively simple. Of course, there are certain issues that must be considered that are unique to video (e.g., frame refresh and much higher bandwidth requirements), but the basic principles of VoIP equally apply to video telephony.

Of course there is much more to VoIP than just sending the audio/video packets over the Internet. There must also be an agreed protocol for how computers find each other and how information is exchanged in order to allow packets to ultimately flow between the communicating devices. There must also be an agreed format (called payload format) for the contents of the media packets. We will describe some of the popular VoIP protocols in the next section.

Through this section, we have focused on computers that communicate with each other. However, VoIP is certainly not limited to desktop computers. VoIP is implemented in a variety of hardware devices, including IP phones, analog terminal adapters (ATAs), and gateways. In short, a large number of devices can enable VoIP communication, some of which allow one to use traditional telephone devices to interface with the IP networks: one does not have to throw out existing equipment to migrate to VoIP.


VoIP Protocols

There are a number of protocols that may be employed in order to provide for VoIP communication services. In this section, we will focus on those which are most common to the majority of the devices deployed and being deployed today.

Virtually every VoIP device in the world uses a standard called the Real-time Transport Protocol (RTP) for transmitting audio and video packets between communicating computers. RTP is defined by the IETF in RFC 3550. The payload formats for a number of CODECs are defined in RFC 3551, though payload format specifications are also defined in documents published by the ITU and in other IETF RFCs. RTP also addresses issues like packet order and provides mechanisms (via the RTP Control Protocol, or RTCP, also defined in RFC 3550) to help address delay and jitter.
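For readers curious what an RTP packet looks like on the wire, the sketch below builds and parses the 12-byte fixed header described in RFC 3550: version, marker bit, payload type, sequence number, timestamp, and a source identifier (SSRC). It is deliberately incomplete, as it ignores padding, header extensions, and contributing-source lists, and the function names and sample values are made up for the example.

```python
import struct

# Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
# all in network (big-endian) byte order.
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_packet(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    """Prepend a minimal RTP version 2 header to a payload."""
    byte0 = 2 << 6  # version 2, no padding, no extension, zero CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(
        byte0, byte1, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


def parse_rtp_packet(packet):
    """Decode the fixed header fields; assumes no CSRC list or extension."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


# Payload type 0 is G.711 mu-law (PCMU) in RFC 3551; 160 bytes is 20 ms of it.
packet = build_rtp_packet(0, sequence=7, timestamp=1120, ssrc=0x1234ABCD, payload=bytes(160))
fields = parse_rtp_packet(packet)
print(len(packet), fields["version"], fields["sequence"])  # 172 2 7
```

The sequence number lets the receiver reorder packets and detect loss, while the timestamp drives play-out timing and jitter measurement.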

One of the areas of concern for people communicating over the Internet is the potential for a person to eavesdrop on communication. To address these security concerns, RTP was improved upon with the result being called Secure RTP (defined in RFC 3711). Secure RTP provides for encryption, authentication, and integrity of the audio and video packets transmitted between communicating devices.

Before audio or video media can flow between two computers, various protocols must be employed to find the remote device and to negotiate the means by which media will flow between the two devices. The protocols that are central to this process are referred to as call-signaling protocols, the most popular of which are H.323 and Session Initiation Protocol (SIP). Both rely on static provisioning, RAS (ITU-T Rec. H.225.0), DNS, TRIP (RFC 3219), ENUM (RFC 3761), and other protocols to find other users.
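As one small example of how a user can be found, ENUM turns an ordinary telephone number into a DNS name: the digits are reversed, separated by dots, and placed under e164.arpa. The sketch below shows only that name transformation; the DNS query itself and the interpretation of the records it returns are left out, and the function name and phone number are illustrative.

```python
def enum_domain(e164_number):
    """Convert a telephone number such as '+44 1632 960083' to its ENUM domain."""
    digits = [ch for ch in e164_number if ch.isdigit()]
    return ".".join(reversed(digits)) + ".e164.arpa"


print(enum_domain("+44 1632 960083"))  # 3.8.0.0.6.9.2.3.6.1.4.4.e164.arpa
```

A device would then query DNS for that name to learn, for instance, an H.323 or SIP address at which the number's owner can be reached.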

H.323 and SIP both have their origins in 1995 as researchers looked to solve the problem of how two computers can initiate communication in order to exchange audio and video media streams. H.323 enjoyed the first commercial success, due to the fact that those working on the protocol in the ITU worked quickly to publish the first standard in early 1996. SIP, on the other hand, progressed much more slowly in the IETF, with the first draft published in 1996, but the first recognized "standard" published later in 1999. SIP was revised over the years and re-published in 2002 as RFC 3261, which is the currently recognized standard for SIP. These delays in the standards process resulted in delays in market adoption of the SIP protocol.

Fundamentally, H.323 and SIP allow users to do the same thing: to establish multimedia communication (audio, video, or other data communication). However, H.323 and SIP differ significantly in design, with H.323 borrowing heavily from legacy communication systems and being a binary protocol, and with SIP not adopting many of the information elements found in legacy systems and being an ASCII-based protocol. Supporters of each protocol have debated at length as to which approach is better and the results are certainly mixed.
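To give a feel for what "ASCII-based" means in practice, the sketch below parses an abbreviated SIP request using nothing more than string splitting. The message and all of its addresses and identifiers are invented for illustration, and the parser is far from complete (it ignores repeated headers, folded lines, and compact header names). An H.323 message, being binary, would need a dedicated decoder instead.

```python
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP pc.example.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "From: <sip:alice@example.com>;tag=1928301774",
    "To: <sip:bob@example.com>",
    "Call-ID: a84b4c76e66710@pc.example.com",
    "CSeq: 1 INVITE",
    "Content-Length: 0",
    "",
    "",
])


def parse_sip_request(text):
    """Split a SIP request into its start line, headers, and body."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


method, uri, version, headers, body = parse_sip_request(SAMPLE_INVITE)
print(method, uri, version)  # INVITE sip:bob@example.com SIP/2.0
print(headers["cseq"])       # 1 INVITE
```

Being readable makes such messages easy to inspect by eye, though, as the surrounding discussion notes, that alone does not settle which protocol does the job better.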

Over the years, there have been a lot of papers debating H.323 vs. SIP, but most of the arguments have been "religious" in nature (e.g., "ITU vs. IETF" and "binary versus ASCII"). Very few of the papers and reports have compared the protocols on the basis of functionality and what really matters: does the protocol do the job? The fact is, both can do the job, though H.323 is superior in a number of ways: better interoperability with the PSTN, better support for video, excellent interoperability with legacy video systems (e.g., H.320), and reliable out-of-band transport of DTMF. SIP, being a "session initiation protocol", was not designed to address many of the problems that were raised and solved in legacy communication systems. SIP was also popularized in the market through misstatements that it was "easy to implement and debug". The truth is that there is a certain amount of complexity in any communication system and, no matter how one looks at it, it requires about the same amount of work to do the same thing two different ways.

In the simplest deployment, the SIP implementation is certainly easier to develop and troubleshoot. However, there are very few real-world deployments that are "simple". As a result, SIP proponents have defined a number of non-standard variations of SIP (e.g., SIP-T and SIP-I), as well as a number of non-standard extensions in order to carry the necessary information or provide the required functionality. Some have said that there are as many variations of SIP as there are SIP deployments.

Today, H.323 still commands the bulk of the VoIP deployments in the service provider market for voice transit, especially for transporting voice calls internationally. H.323 is also widely used in room-based video conferencing systems and is the #1 protocol for IP-based video systems. SIP has, most recently, become more popular for use in instant messaging systems, though there have been no successful commercial deployments of SIP-based instant messaging at the time of this writing.

Both H.323 and SIP can be referred to as "intelligent endpoint protocols". What this means is that all of the intelligence required to locate the remote endpoint and to establish media streams between the local and remote device is an integral part of the protocol. There is another class of protocols which is complementary to H.323 and SIP referred to as "device control protocols". Those protocols are H.248 and MGCP.

To understand the purpose of H.248 and MGCP, it is important to first understand the function of a gateway. A gateway is a device that offers an IP interface on one side and some sort of legacy telephone interface on the other side. The legacy telephone interface may be complex, such as an interface to a legacy PSTN switch, or may be a simple interface that allows one to connect one or a few traditional telephones. Depending on the size and purpose of the gateway, it may allow IP-originated calls to terminate to the PSTN (and vice-versa) or may simply provide a means for a person to connect a telephone to the Internet.

Originally, gateways were viewed as monolithic devices that had call control (H.323/SIP) and hardware required to control the PSTN interface. In 1998, the idea of splitting the gateway into two logical parts was proposed: one part, which contains the call control logic, is called the media gateway controller (MGC) or call agent (CA), and the other part, which interfaces with the PSTN, is called the media gateway (MG). With this functional split, a new interface existed (going between the MGC and MG), driving the necessity to define MGCP and H.248.

Some service providers provide users with devices that implement H.248 or MGCP (or comparable protocols). In the core of the network, some device serving as the MGC provides the H.323 or SIP logic necessary to properly terminate VoIP calls around the world.

Outside of H.323/SIP and H.248/MGCP, there are also non-standard protocols introduced by various companies that have been very successful in the market. Skype is one such company that has been extremely successful using a proprietary protocol. Which protocol is best for you? It really depends on your requirements, but most people simply want to make a phone call and, as such, it really does not matter.

VoIP-Enabled Services

Many people have proclaimed that VoIP enables all kinds of new services that were never possible before. This is certainly true, though the hype far exceeds reality and what is practical. Even so, there are a number of new capabilities which are practical and will come forward as we continue to deploy VoIP systems.

Video telephony is probably the first new service that will come forward that helps set VoIP apart from traditional telephone systems. Service providers are already rolling out services offering video terminals to allow people to call friends and family using video-enabled phones.

VoIP also allows one to potentially launch calls from the PC, determine the availability of friends and family members (called "presence"), control telephone services from the PC, etc. The market acceptance of most of these new kinds of services is questionable at this point, but the potential is there and has certainly garnered a tremendous amount of focus from companies trying to find a niche in this new market.

The one business application that VoIP, video telephony (or, videoconferencing), and instant messaging will enable is application sharing and electronic whiteboarding. The ITU has defined a suite of protocols (called T.120) to address this application and it has been used in tools like Microsoft NetMeeting. While NetMeeting met some success, it failed to gain wider market adoption due to the fact that it was somewhat difficult to set up and use in a corporate environment. By having better integration with the phone and wider deployment of VoIP, businesses will probably find the ability to do application sharing and electronic whiteboarding very appealing in order to improve productivity. These kinds of services that are related to VoIP are most exciting.

Hype vs. Reality

VoIP has enjoyed a significant amount of hype in the marketplace. It was initially viewed as a way to get free phone calls over the Internet and has evolved to being viewed as the technology that will replace the legacy PSTN. There have been literally hundreds of companies who have entered the market, the vast majority of which have failed. As with any new technology, there is a certain time required to grow the market and the growth of the VoIP market has been much slower than anticipated.

Even so, VoIP is real, it works, and companies that have been able to "hang in there" are starting to reap the reward. Literally hundreds of thousands of end users and a very large number of enterprise customers are now using VoIP as their primary phone service. Also, while many people do not know it, a very large percentage of international phone calls go over VoIP networks today.

The work on VoIP is far from over, though. Many experts in the field are still actively working to make improvements on the technology. Over time, it should prove to be an adequate replacement for the current PSTN used around the world today and is already an adequate replacement in limited deployments, such as enterprise environments where network quality-of-service (QoS) is well-managed. It also works extremely well for residential users who are willing to sacrifice a little voice quality for significantly lower telephone costs. Companies like Vonage provide an excellent service to such residential customers.

With that said, there is still a lot of hype. The technology does not always deliver the same QoS as the PSTN, so customers on networks that are not well-managed may hear distorted or poor quality audio. As a practical matter, nobody today can come to a person's home and help install VoIP service so the customer can use VoIP service on all phones in the house. This may sound like a small matter, but some people simply cannot or will not do the necessary re-wiring in the home. Finally, some service providers offer very different levels of service and have varying degrees of reliability. It's not uncommon with some service providers to see phone calls to a destination work one day and not the next. This fact is not the fault of VoIP, but due to the fact that some new, smaller VoIP service providers do not have the resources to provide the same level of reliability found in the older, mature, well-funded PSTN.

As service providers mature in their business, the quality on all fronts will improve. Until then, VoIP will remain a viable technology that should be approached with some caution. Users of the technology need to understand the limits and the potential issues before using VoIP as a replacement for current service. Residential customers should keep a mobile phone as a back-up "just in case" and enterprise customers should take the necessary steps to provide QoS on corporate networks.

Next Generation Network (NGN)

One of the interesting side-effects of VoIP is that the technology has forced all of the incumbent service providers around the world to pause and re-examine their own business. They have all come to one realization: VoIP will replace the PSTN and is a serious threat to their current business model.

In an effort to regain control of the explosion of new service providers and competition that will erode their revenues, traditional service providers have initiated a new effort referred to as the Next Generation Network (NGN). The definition of the NGN seems fairly benign as defined in ITU Recommendation Y.2001:

Next Generation Network (NGN): a packet-based network able to provide telecommunication services and able to make use of multiple broadband, QoS-enabled transport technologies and in which service-related functions are independent from underlying transport-related technologies. It offers unrestricted access by users to different service providers. It supports generalized mobility which will allow consistent and ubiquitous provision of services to users.

Any person who reads this definition and understands the technology would summarize this definition as "a well-managed Internet". This certainly sounds encouraging for those who hope to perpetuate the growth of VoIP and other multimedia services.

Unfortunately, not all things are as they appear. One of the statements made in the NGN specifications is that the IP Multimedia Subsystem (IMS) defined by 3GPP is at the core of the NGN, while "all other" IP services (including data collaboration, movies-on-demand, Internet radio, etc.) are simply lumped into one small part of the NGN and are given little or no attention at all. As such, the NGN can rightfully be viewed as a very voice-centric effort with no real desire to grow and encourage other non-voice services.

The NGN work has a long way to go, but there is certainly a lot of hype around the effort, and it may well end up stunting the growth of new services and new choices in the market. In any case, it is far too early to tell what kind of impact the NGN effort will have on the market.

Regardless, the NGN work currently underway will have a big impact on the communication systems employed today. For more information, visit the NGN Service Provider site.

Source: http://www.packetizer.com/voip/papers/understanding_voip/voip_introduction.html
