author | Aki <please@ignore.pl> | 2022-02-09 22:23:03 +0100 |
---|---|---|
committer | Aki <please@ignore.pl> | 2022-02-09 22:53:55 +0100 |
commit | 373dc625f82b47096893add42c4472e4a57ab7eb (patch) | |
tree | 640228d02476d379de13071b13d1b1fa322b767f /vorbis/doc/stereo.html | |
parent | 2d7dd844219965b81e81848e60d7f7bf23035ee4 (diff) | |
Moved third-party libraries to a separate subdirectory
Diffstat (limited to 'vorbis/doc/stereo.html')
-rw-r--r-- | vorbis/doc/stereo.html | 419 |
1 files changed, 0 insertions, 419 deletions
diff --git a/vorbis/doc/stereo.html b/vorbis/doc/stereo.html deleted file mode 100644 index 9cfbbea..0000000 --- a/vorbis/doc/stereo.html +++ /dev/null @@ -1,419 +0,0 @@ -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html> -<head> - -<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> -<title>Ogg Vorbis Documentation</title> - -<style type="text/css"> -body { - margin: 0 18px 0 18px; - padding-bottom: 30px; - font-family: Verdana, Arial, Helvetica, sans-serif; - color: #333333; - font-size: .8em; -} - -a { - color: #3366cc; -} - -img { - border: 0; -} - -#xiphlogo { - margin: 30px 0 16px 0; -} - -#content p { - line-height: 1.4; -} - -h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a { - font-weight: bold; - color: #ff9900; - margin: 1.3em 0 8px 0; -} - -h1 { - font-size: 1.3em; -} - -h2 { - font-size: 1.2em; -} - -h3 { - font-size: 1.1em; -} - -li { - line-height: 1.4; -} - -#copyright { - margin-top: 30px; - line-height: 1.5em; - text-align: center; - font-size: .8em; - color: #888888; - clear: both; -} -</style> - -</head> - -<body> - -<div id="xiphlogo"> - <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a> -</div> - -<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1> - -<h2>Abstract</h2> - -<p>The Vorbis audio CODEC provides channel coupling -mechanisms designed to reduce effective bitrate by both eliminating -interchannel redundancy and eliminating stereo image information -labeled inaudible or undesirable according to spatial psychoacoustic -models. 
This document describes both the mechanical coupling -mechanisms available within the Vorbis specification and the -specific stereo coupling models used by the reference -<tt>libvorbis</tt> codec provided by xiph.org.</p> - -<h2>Mechanisms</h2> - -<p>In encoder release beta 4 and earlier, Vorbis supported multiple -channel encoding, but the channels were encoded entirely separately -with no cross-analysis or redundancy elimination between channels. -This multichannel strategy is very similar to mp3's <em>dual -stereo</em> mode, and Vorbis uses the same name for its analogous -uncoupled multichannel modes.</p> - -<p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and -later implement, a coupled channel strategy. Vorbis has two specific -mechanisms that may be used alone or in conjunction to implement -channel coupling. The first is <em>channel interleaving</em> via -residue backend type 2, and the second is <em>square polar -mapping</em>. These two general mechanisms are particularly well -suited to coupling due to the structure of Vorbis encoding, as we'll -explore below, and using both we can implement totally -<em>lossless stereo image coupling</em> [bit-for-bit decode-identical -to uncoupled modes], as well as various lossy models that seek to -eliminate inaudible or unimportant aspects of the stereo image in -order to reduce bitrate. The exact coupling implementation is -generalized to allow the encoder a great deal of flexibility in -implementation of a stereo or surround model without requiring any -significant complexity increase over the combinatorially simpler -mid/side joint stereo of mp3 and other current audio codecs.</p> - -<p>A particular Vorbis bitstream may apply channel coupling directly to -more than a pair of channels; polar mapping is hierarchical such that -polar coupling may be extrapolated to an arbitrary number of channels -and is not restricted to only stereo, quadraphonics, ambisonics or 5.1 -surround. 
However, this document restricts itself to the -stereo coupling case.</p> - -<a name="sqpm"></a> -<h3>Square Polar Mapping</h3> - -<h4>maximal correlation</h4> - -<p>Recall that the basic structure of a Vorbis I stream first generates -from input audio a spectral 'floor' function that serves as an -MDCT-domain whitening filter. This floor is meant to represent the -rough envelope of the frequency spectrum, using whatever metric the -encoder cares to define. This floor is subtracted from the log -frequency spectrum, effectively normalizing the spectrum by frequency. -Each input channel is associated with a unique floor function.</p> - -<p>The basic idea behind any stereo coupling is that the left and right -channels usually correlate. This correlation is even stronger if one -first accounts for energy differences in any given frequency band -across left and right; think, for example, of individual instruments -mixed into different portions of the stereo image, or a stereo -recording with a dominant feature not perfectly in the center. The -floor functions, each specific to a channel, provide the perfect means -of normalizing left and right energies across the spectrum to maximize -correlation before coupling. This feature of the Vorbis format is not -a convenient accident.</p> - -<p>Because we strive to maximally correlate the left and right channels -and generally succeed in doing so, left and right residue is typically -nearly identical. 
We could use channel interleaving (discussed below) -alone to efficiently remove the redundancy between the left and right -channels as a side effect of entropy encoding, but a polar -representation gives benefits when left/right correlation is -strong.</p> - -<h4>point and diffuse imaging</h4> - -<p>The first advantage of a polar representation is that it effectively -separates the spatial audio information into a 'point image' -(magnitude) at a given frequency and located somewhere in the sound -field, and a 'diffuse image' (angle) that fills a large amount of -space simultaneously. Even if we preserve only the magnitude (point) -data, a detailed and carefully chosen floor function in each channel -provides us with a free, fine-grained, frequency relative intensity -stereo*. Angle information represents diffuse sound fields, such as -reverberation that fills the entire space simultaneously.</p> - -<p>*<em>Because the Vorbis model supports a number of different possible -stereo models and these models may be mixed, we do not use the term -'intensity stereo' talking about Vorbis; instead we use the terms -'point stereo', 'phase stereo' and subcategories of each.</em></p> - -<p>The majority of a stereo image is representable by polar magnitude -alone, as strong sounds tend to be produced at near-point sources; -even non-diffuse, fast, sharp echoes track very accurately using -magnitude representation almost alone (for those experimenting with -Vorbis tuning, this strategy works much better with the precise, -piecewise control of floor 1; the continuous approximation of floor 0 -results in unstable imaging). Reverberation and diffuse sounds tend -to contain less energy and be psychoacoustically dominated by the -point sources embedded in them. Thus, we again tend to concentrate -more represented energy into a predictably smaller number of numbers. 
-Separating representation of point and diffuse imaging also allows us -to model and manipulate point and diffuse qualities separately.</p> - -<h4>controlling bit leakage and symbol crosstalk</h4> - -<p>Because polar -representation concentrates represented energy into fewer large -values, we reduce bit 'leakage' during cascading (multistage VQ -encoding) as a secondary benefit. A single large, monolithic VQ -codebook is more efficient than a cascaded book due to entropy -'crosstalk' among symbols between different stages of a multistage cascade. -Polar representation is a way of further concentrating entropy into -predictable locations so that codebook design can take steps to -improve multistage codebook efficiency. It also allows us to cascade -various elements of the stereo image independently.</p> - -<h4>eliminating trigonometry and rounding</h4> - -<p>Rounding and computational complexity are potential problems with a -polar representation. As our encoding process involves quantization, -mixing a polar representation and quantization makes it potentially -impossible, depending on implementation, to construct a coupled stereo -mechanism that results in bit-identical decompressed output compared -to an uncoupled encoding should the encoder desire it.</p> - -<p>Vorbis uses a mapping that preserves the most useful qualities of -polar representation, relies only on addition/subtraction (during -decode; high quality encoding still requires some trig), and makes it -trivial before or after quantization to represent an angle/magnitude -through a one-to-one mapping from possible left/right value -permutations. 
We do this by basing our polar representation on the -unit square rather than the unit circle.</p> - -<p>Given a magnitude and angle, we recover left and right using the -following function (note that A/B may be left/right or right/left -depending on the coupling definition used by the encoder):</p> - -<pre> - if(magnitude>0) - if(angle>0){ - A=magnitude; - B=magnitude-angle; - }else{ - B=magnitude; - A=magnitude+angle; - } - else - if(angle>0){ - A=magnitude; - B=magnitude+angle; - }else{ - B=magnitude; - A=magnitude-angle; - } -</pre> - -<p>The function is antisymmetric for positive and negative magnitudes in -order to eliminate a redundant value when quantizing. For example, if -we're quantizing to integer values, we can visualize a magnitude of 5 -and an angle of -2 as follows:</p> - -<p><img src="squarepolar.png" alt="square polar"/></p> - -<p>This representation loses or replicates no values; if A -and B each range over the integers -5 through 5, the number of possible Cartesian -permutations is 121. Represented in square polar notation, the -possible values are:</p> - -<pre> - 0, 0 - --1,-2 -1,-1 -1, 0 -1, 1 - - 1,-2 1,-1 1, 0 1, 1 - --2,-4 -2,-3 -2,-2 -2,-1 -2, 0 -2, 1 -2, 2 -2, 3 - - 2,-4 2,-3 ... following the pattern ... - - ... 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 5, 7 5, 8 5, 9 - -</pre> - -<p>...for a grand total of 121 possible values, the same number as in -Cartesian representation (note that, for example, <tt>5,-10</tt> is -the same as <tt>-5,10</tt>, so there's no reason to represent -both. 2,10 cannot happen, and there's no reason to account for it.) -It's also obvious that this mapping is exactly reversible.</p> - -<h3>Channel interleaving</h3> - -<p>We can remap an A/B vector using polar mapping into a magnitude/angle -vector, and it's clear that, in general, this concentrates energy in -the magnitude vector and reduces the amount of information to encode -in the angle vector. 
Encoding these vectors independently with -residue backend #0 or residue backend #1 will result in bitrate -savings. However, there are still implicit correlations between the -magnitude and angle vectors. The most obvious is that the amplitude -of the angle is bounded by its corresponding magnitude value.</p> - -<p>Entropy coding the results, then, further benefits from the entropy -model being able to compress magnitude and angle simultaneously. For -this reason, Vorbis implements residue backend #2, which pre-interleaves -a number of input vectors (in the stereo case, two, A and B) into a -single output vector (with the elements in the order of -A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus -each vector to be coded by the vector quantization backend consists of -matching magnitude and angle values.</p> - -<p>The astute reader, at this point, will notice that in the theoretical -case in which we can use monolithic codebooks of arbitrarily large -size, we can directly interleave and encode left and right without -polar mapping; in fact, the polar mapping does not appear to lend any -benefit whatsoever to the efficiency of the entropy coding. Indeed, -it is perfectly possible and reasonable to build a Vorbis encoder that -dispenses with polar mapping entirely and merely interleaves the -channels. 
Libvorbis-based encoders may configure such an encoding and -it will work as intended.</p> - -<p>However, when we leave the ideal/theoretical domain, we notice that -polar mapping does give additional practical benefits, as discussed in -the above section on polar mapping and summarized again here:</p> - -<ul> -<li>Polar mapping aids in controlling entropy 'leakage' between stages -of a cascaded codebook.</li> -<li>Polar mapping separates the stereo image -into point and diffuse components which may be analyzed and handled -differently.</li> -</ul> - -<h2>Stereo Models</h2> - -<h3>Dual Stereo</h3> - -<p>Dual stereo refers to stereo encoding where the channels are entirely -separate; they are analyzed and encoded as entirely distinct entities. -This terminology is familiar from mp3.</p> - -<h3>Lossless Stereo</h3> - -<p>Using polar mapping and/or channel interleaving, it's possible to -couple Vorbis channels losslessly, that is, construct a stereo -coupling encoding that both saves space and decodes -bit-identically to dual stereo. OggEnc 1.0 and later use this -mode in all high-bitrate encoding.</p> - -<p>Overall, this stereo mode is overkill; however, it offers a safe -alternative to users concerned about the slightest possible -degradation to the stereo image or archival quality audio.</p> - -<h3>Phase Stereo</h3> - -<p>Phase stereo is the least aggressive means of gracefully dropping -resolution from the stereo image; it affects only diffuse imaging.</p> - -<p>It's often claimed that the human ear is deaf to signal phase above -about 4kHz; this is nearly true and a passable rule of thumb, but it -can be demonstrated that even an average user can tell the difference -between high frequency in-phase and out-of-phase noise. Obviously, -then, the statement is not entirely true. 
However, it's also the case -that one must resort to a demonstration nearly that extreme before -finding the counterexample.</p> - -<p>'Phase stereo' is simply a more aggressive quantization of the polar -angle vector; above 4kHz it's generally quite safe to quantize noise -and noisy elements to only a handful of allowed phases, or to thin the -phase with respect to the magnitude. The phases of high amplitude -pure tones may or may not be preserved more carefully (they are -relatively rare and L/R tend to be in phase, so there is generally -little reason not to spend a few more bits on them).</p> - -<h4>example: eight phase stereo</h4> - -<p>Vorbis may implement phase stereo coupling by preserving the entirety -of the magnitude vector (essential to fine amplitude and energy -resolution overall) and quantizing the angle vector to one of only -four possible values. Given that the magnitude vector may be positive -or negative, this results in left and right phase having eight -possible permutations, thus 'eight phase stereo':</p> - -<p><img src="eightphase.png" alt="eight phase"/></p> - -<p>Left and right may be in phase (positive or negative), the most common -case by far, or out of phase by 90 or 180 degrees.</p> - -<h4>example: four phase stereo</h4> - -<p>Similarly, four phase stereo takes the quantization one step further; -it allows only in-phase and 180 degree out-of-phase signals:</p> - -<p><img src="fourphase.png" alt="four phase"/></p> - -<h3>example: point stereo</h3> - -<p>Point stereo eliminates the possibility of out-of-phase signal -entirely. Any diffuse quality to a sound source tends to collapse -inward to a point somewhere within the stereo image. A practical -example would be balanced reverberations within a large, live space; -normally the sound is diffuse and soft, giving a sonic impression of -volume. 
In point stereo, the reverberations would still exist, but -sound fairly firmly centered within the image (assuming the -reverberation was centered overall; if the reverberation is stronger -to the left, then the point of localization in point stereo would be -to the left). This effect is most noticeable at low and mid -frequencies and when using headphones (which grant perfect stereo -separation). Point stereo is a graceful but generally easy to -detect degradation to the sound quality and is thus used in frequency -ranges where it is least noticeable.</p> - -<h3>Mixed Stereo</h3> - -<p>Mixed stereo is the simultaneous use of more than one of the above -stereo encoding models, generally using more aggressive modes in -higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p> - -<p>Near-DC frequencies should also be encoded using -lossless coupling to avoid frame blocking artifacts.</p> - -<h3>Vorbis Stereo Modes</h3> - -<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes -constructed out of lossless and point stereo. Phase stereo was used -in the rc2 encoder, but is not currently used for simplicity's sake. It -will likely be re-added to the stereo model in the future.</p> - -<div id="copyright"> - The Xiph Fish Logo is a - trademark (™) of Xiph.Org.<br/> - - These pages © 1994 - 2005 Xiph.Org. All rights reserved. -</div> - -</body> -</html> - - - - - - |
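The square polar mapping described in the deleted document's "eliminating trigonometry and rounding" section can be checked with a short script. This is a minimal Python sketch, not code from libvorbis. `square_polar_decode` transcribes the recovery function quoted in the document. `enumerate_square_polar` is a hypothetical helper; it assumes the table's pattern is angles from -2m through 2m-1 for each magnitude +m and -m, which is read off the listed rows rather than stated in the text.

```python
def square_polar_decode(magnitude, angle):
    """Recover (A, B) from a square polar (magnitude, angle) pair.

    Mirrors the branch structure of the function quoted in the document;
    only addition and subtraction are needed.
    """
    if magnitude > 0:
        if angle > 0:
            a, b = magnitude, magnitude - angle
        else:
            b, a = magnitude, magnitude + angle
    else:
        if angle > 0:
            a, b = magnitude, magnitude + angle
        else:
            b, a = magnitude, magnitude - angle
    return a, b


def enumerate_square_polar(limit):
    """List the (magnitude, angle) pairs for |A|, |B| <= limit.

    Assumption: for each magnitude +m and -m the angle runs from
    -2m through 2m-1, following the pattern of the table in the text.
    """
    pairs = [(0, 0)]
    for m in range(1, limit + 1):
        for sign in (-1, 1):
            for angle in range(-2 * m, 2 * m):
                pairs.append((sign * m, angle))
    return pairs


pairs = enumerate_square_polar(5)
decoded = {square_polar_decode(m, a) for m, a in pairs}
cartesian = {(a, b) for a in range(-5, 6) for b in range(-5, 6)}

# Number of polar pairs, number of distinct decoded pairs, and whether
# the decoded set equals the full -5..5 Cartesian grid.
print(len(pairs), len(decoded), decoded == cartesian)
```

Under these assumptions the 121 square polar pairs decode to 121 distinct (A, B) pairs that cover the whole -5 through 5 grid, which is the one-to-one property the text claims. The worked example from the text, magnitude 5 and angle -2, decodes to A=3, B=5.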