Project team
Victor Malishev, professor
Boris Lavrenko, postgraduate student
Anastasiya Tsyba, postgraduate student
Saint-Petersburg State Electrotechnical University
Introduction
Wireless technologies are increasingly used to transmit human speech. As the number of wireless users continues to grow, a better trade-off between network capacity and speech quality is in greater demand than ever before.
One of the most important factors in using channel capacity effectively is the choice of the most appropriate algorithm for coding and decoding speech information, the codec. A codec should represent an audio signal at a minimum bit rate while maintaining perceptual quality, so there is always a trade-off between bit rate and quality. Note that codecs developed for packet-switched networks are usually designed for wired technologies. It is therefore worth examining how well such codecs suit a wireless network with random packet delays and bit errors.
Currently, a great number of speech coding schemes exist. All codecs can be broadly divided into three groups: waveform codecs, source codecs and hybrid codecs. Waveform codecs are typically used at high bit rates and give very good speech quality. Source codecs operate at very low bit rates but tend to produce speech that sounds synthetic, because they resynthesize the signal harmonically from information about its vocal components (phonemes). Hybrid codecs use techniques from both source and waveform coding and give good speech quality at intermediate bit rates. Their trade-off between bandwidth and quality is much better, which is why this group of codecs is of interest.
Hybrid codecs use different coding algorithms, so they differ in algorithmic complexity, packet length, introduced delay and speech quality. They also respond differently to channel errors and packet loss.
Hybrid speech codecs can also be classified by operating bit rate. Some codecs have a fixed bit rate, while others can respond dynamically to varying channel conditions. For example, the present research will consider G.729, GSM FR and iLBC, which have a fixed bit rate, and EVRC, GSM AMR and Speex, which have a dynamically changing bit rate.
The project concentrates on narrowband speech codecs, which are designed to provide an efficient digital representation of telephone-like signals band-limited to between 200 and 3400 Hz and sampled at 8 kHz. The target is a short-range (100…1000 meters) wireless ad hoc packet-switched network based on chirp spread spectrum signals in the 2.4 GHz ISM band with a bandwidth of 60…80 MHz. The throughput of a data link is about 500 kbit/s, and the packet length varies from 64 to 256 bytes. On-board computational power is strictly limited.
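The packet sizes above limit how many codec frames one packet can carry. The following Python sketch illustrates the arithmetic for the three fixed-rate codecs, using their nominal frame formats. The 16-byte header (HEADER_BYTES) is purely an assumption made for illustration, since the actual packet format of the network is not specified here, and the function name packing is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecFrame:
    name: str
    frame_ms: int     # speech duration covered by one frame, in ms
    frame_bytes: int  # size of one encoded frame, in bytes


# Nominal frame formats of the fixed-rate codecs under consideration.
CODECS = [
    CodecFrame("G.729", 10, 10),              # 80 bits per 10 ms = 8 kbit/s
    CodecFrame("GSM FR", 20, 33),             # 260 bits per 20 ms, stored in 33 bytes
    CodecFrame("iLBC (20 ms mode)", 20, 38),  # 304 bits per 20 ms = 15.2 kbit/s
]

HEADER_BYTES = 16  # ASSUMPTION: per-packet overhead, not taken from the project


def packing(codec, packet_bytes, header_bytes=HEADER_BYTES):
    """Return (frames per packet, speech ms per packet, on-air kbit/s)."""
    frames = (packet_bytes - header_bytes) // codec.frame_bytes
    if frames <= 0:
        return 0, 0, 0.0
    speech_ms = frames * codec.frame_ms
    used_bytes = header_bytes + frames * codec.frame_bytes
    # bits per millisecond is numerically equal to kbit/s
    return frames, speech_ms, used_bytes * 8 / speech_ms


if __name__ == "__main__":
    for codec in CODECS:
        for size in (64, 256):  # packet length limits of the network
            frames, ms, rate = packing(codec, size)
            print(f"{codec.name:18s} {size:3d} B: {frames:2d} frames, "
                  f"{ms:3d} ms of speech, {rate:5.1f} kbit/s on air")
```

Under this assumption, larger packets reduce the relative header overhead, but the speech duration carried per packet is also the packetization delay, so the two have to be balanced.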
The project therefore aims to choose the most suitable speech codec and to implement the chosen algorithm, or possibly a modification of it, for a wireless network with specific characteristics such as bit errors, random delays and short-term link losses.
Project stages
First, it is necessary to gather all relevant information about speech codecs and to rank them. Of interest are the complexity of the algorithm, which determines the MIPS required of the DSP, the introduced delay, the MOS and other codec parameters. This theoretical research is the initial stage.
The choice of coding scheme is closely tied to the channel conditions and to the parameters of the DSP used, so a change in either of these can, but need not, lead to a change of codec.
The next part of the research deals with the practical application of codecs and with codec selection. The main interest is to study how different codecs respond to channel errors, packet delay and packet loss, and what bandwidth each of them requires.
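One possible way to prepare such a study before field tests is to generate packet-loss patterns in software and apply them to an encoded packet stream. The sketch below uses a two-state Gilbert-Elliott model, a common choice for bursty losses such as short-term link outages. The choice of model, the function names and all parameter values are assumptions made for illustration and are not taken from the project.

```python
import random


def gilbert_elliott(n_packets, p_good_to_bad, p_bad_to_good,
                    loss_in_bad=1.0, loss_in_good=0.0, seed=0):
    """Return a list of booleans, True where a packet is lost.

    The channel alternates between a 'good' and a 'bad' state; each
    state has its own loss probability, which produces loss bursts.
    """
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    bad = False
    lost = []
    for _ in range(n_packets):
        # State transition first, then the loss decision for this packet.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        p_loss = loss_in_bad if bad else loss_in_good
        lost.append(rng.random() < p_loss)
    return lost


def loss_stats(lost):
    """Return (overall loss rate, length of the longest loss burst)."""
    rate = sum(lost) / len(lost) if lost else 0.0
    longest = run = 0
    for flag in lost:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return rate, longest


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show a bursty pattern.
    pattern = gilbert_elliott(1000, p_good_to_bad=0.02, p_bad_to_good=0.3)
    rate, burst = loss_stats(pattern)
    print(f"loss rate {rate:.3f}, longest burst {burst} packets")
```

Frames marked as lost would be dropped before decoding, so that the concealment behaviour of each codec can be compared on the same loss pattern.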
To complete the study, the system is intended to be tested in real working conditions.
Targets
The main purpose of this project is to determine the most suitable speech coding algorithm and to apply it in a real network. Such an algorithm should be robust to packet loss and channel errors and should provide the best trade-off between bit rate, quality of the reconstructed speech, algorithmic complexity and introduced delay.
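Because the four criteria conflict, a single best codec may not exist. One simple way to narrow the choice, shown here only as a sketch, is to discard every candidate that another candidate matches or beats on all criteria (a Pareto filter). The candidate names and all numbers below are placeholders invented for illustration, not measurements of any real codec. Robustness to loss is left out and would need to be added as a further criterion.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    bit_rate_kbps: float    # lower is better
    delay_ms: float         # lower is better
    complexity_mips: float  # lower is better
    quality_mos: float      # higher is better


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (a.bit_rate_kbps <= b.bit_rate_kbps
                and a.delay_ms <= b.delay_ms
                and a.complexity_mips <= b.complexity_mips
                and a.quality_mos >= b.quality_mos)
    better = (a.bit_rate_kbps < b.bit_rate_kbps
              or a.delay_ms < b.delay_ms
              or a.complexity_mips < b.complexity_mips
              or a.quality_mos > b.quality_mos)
    return no_worse and better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# PLACEHOLDER values: hypothetical codecs, not real measurements.
PLACEHOLDERS = [
    Candidate("codec_a", 8.0, 25.0, 20.0, 3.9),
    Candidate("codec_b", 13.0, 20.0, 5.0, 3.5),
    Candidate("codec_c", 13.0, 30.0, 20.0, 3.4),
]

if __name__ == "__main__":
    for c in pareto_front(PLACEHOLDERS):
        print(c.name)
```

With these placeholder values, codec_c is removed because codec_a is no worse on every criterion, while codec_a and codec_b both remain: the first has the lower bit rate and higher quality, the second the lower delay and complexity. The final choice between the remaining candidates still depends on the channel conditions and the DSP used.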