HOW DO YOU TELL THE TELCO THE PROBLEM IS IN THEIR NETWORK?

 

Most large distributed systems depend on software, hardware, and human operators and maintainers all functioning correctly. Failure of any one of these elements can disrupt or bring down the entire system. One such system, the US Public Switched Telephone Network (PSTN), is the US portion of possibly the largest distributed system in existence.[1] Like all telephone switching networks, the PSTN performs a fairly simple task: it connects point A with point B. Paradoxically, this seemingly trivial task requires some of the most complex and sophisticated computing systems in existence. Software for a switch with even a relatively small set of features may comprise several million lines of code, and the PSTN contains thousands of switches. Switches include redundant hardware and extensive self-checking and recovery software. For several decades, AT&T has expected its switches to experience no more than two hours of failure in 40 years,[2] a failure rate (the fraction of time out of service) of about 5.7 x 10^-6.
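As a rough check on that figure, the arithmetic can be sketched in Python. The function name and the 365.25-day year are assumptions made for illustration; they do not come from the cited source.

```python
# Rough check of the "two hours in 40 years" figure.
# Assumption: a year is taken as 365.25 days (8766 hours).
HOURS_PER_YEAR = 365.25 * 24


def unavailability(downtime_hours: float, period_years: float) -> float:
    """Return the fraction of the period spent out of service."""
    return downtime_hours / (period_years * HOURS_PER_YEAR)


rate = unavailability(downtime_hours=2, period_years=40)
minutes_per_year = 2 * 60 / 40  # the same target as average downtime per year

print(f"unavailability: {rate:.2e}")
print(f"average downtime: {minutes_per_year:.0f} minutes per year")
```

Put another way, the target allows an average of about three minutes of downtime per switch per year.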


The PSTN's dependability stems from a design that successfully exploits the loose coupling of system components. Because the PSTN has many similarities with other types of distributed systems, an analysis of its failures may suggest factors to consider in the design of distributed systems in general. Major sources of failure were human error (on the part of both telephone company personnel and others), acts of nature, and overloads. Overloads caused nearly half of all downtime (44 percent), measured in outage minutes. An unexpected finding, given the complexity of the PSTN and its heavy reliance on software, was that software errors caused less system downtime (2 percent) than any other source of failure except vandalism. Hardware and software failures were similar in average number of customers affected (96,000 and 118,000, respectively) and in duration of outage (160 and 119 minutes). Errors by telephone company personnel and acts of nature caused similar amounts of downtime (14 and 18 percent).
Usually we can ignore the network when considering telephone problems. Figure 1 shows the usual simplified view of the network. Normally, this simplified view is enough to let us solve our difficulties. After all, everything in the cloud is the Telco's responsibility, right? True enough, and network problems will frequently resolve of their own accord. However, sometimes we can't wait for someone else to discover and fix the problem. Put on your Sherlock Holmes hat and let's investigate!
The key to troubleshooting network problems is persistence. If we make enough calls and eventually get one that does not fail, that tells us two things:
• The problem is not our equipment. Terminal equipment (such as a telephone or codec) should not care how many calls you make; it should act similarly in each case.
• The problem is acting like a network problem (e.g. a trunk problem) in that it is not absolute; rather, it is probabilistic.
Generally, we will want to make 15 calls, carefully keeping track of the number of calls where the problem occurs (we can then calculate a "success rate" from this raw data). Next, we reverse the direction of the call and place another 15 calls. If the success rates are markedly different, we can be very suspicious that this is a network problem. The logic for this conclusion is as follows: on each call the same customer equipment and the same Central Office switches will be used. However, as we have seen, trunk selection is dynamic, so a difference between the two directions points to the trunks rather than to the equipment at either end. Another clue that the problem may be network related is a success rate that varies substantially with the time of day. You will also sometimes note that the success of Circuit Switched Data (CSD) calls at 56 kbps differs from that of CSD calls at 64 kbps, and both will usually act differently from voice calls.
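The bookkeeping for this test can be sketched in a few lines of Python. This is only an illustration: the `CallBatch` and `looks_like_network_problem` names, the example counts, and the 0.25 cutoff for "markedly different" are all assumptions, not values from any Telco procedure.

```python
from dataclasses import dataclass


@dataclass
class CallBatch:
    """Tally for one batch of test calls placed in one direction."""
    direction: str
    attempts: int
    failures: int

    @property
    def success_rate(self) -> float:
        if self.attempts <= 0:
            raise ValueError("at least one call attempt is required")
        return (self.attempts - self.failures) / self.attempts


def looks_like_network_problem(a_to_b: CallBatch, b_to_a: CallBatch,
                               cutoff: float = 0.25) -> bool:
    """Flag the case where the two directions differ markedly.

    The cutoff is an assumed threshold chosen for illustration only.
    """
    return abs(a_to_b.success_rate - b_to_a.success_rate) > cutoff


# Hypothetical tallies: 15 calls each way, as suggested above.
outbound = CallBatch("A to B", attempts=15, failures=6)
inbound = CallBatch("B to A", attempts=15, failures=1)

for batch in (outbound, inbound):
    print(f"{batch.direction}: {batch.success_rate:.0%} success")
print("suspect the network:", looks_like_network_problem(outbound, inbound))
```

Keeping a separate tally for each time of day and each call type (voice, CSD at 56 kbps, CSD at 64 kbps) makes the other clues mentioned above easy to spot in the same way.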
THE NETWORK - THE BIG PICTURE
Before we go on to more detailed troubleshooting, let's examine the network in greater detail. We will look at the USA network, but you will find a similar topology in other parts of the world.


Long distance access 

Tandem switches and trunks. Figure 5 adds the network facilities that allow A to make long distance calls from CO1. USA telecommunications policy requires that users be permitted "equal access" to various competing long distance carriers. Therefore, the local Telcos have something called an "Access Tandem Switch" that allows for this flexibility. This means that you may observe the somewhat paradoxical situation where a problem that occurs only with long distance calls is actually due to the local Telco.
