summaryrefslogtreecommitdiffhomepage
path: root/contrib/vorbis/doc/08-residue.tex
blob: ea382437d91b38adf6af51d1f677d9add7d0c335 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
\section{Residue setup and decode} \label{vorbis:spec:residue}

\subsection{Overview}

A residue vector represents the fine detail of the audio spectrum of
one channel in an audio frame after the encoder subtracts the floor
curve and performs any channel coupling.  A residue vector may
represent spectral lines, spectral magnitude, spectral phase or
hybrids as mixed by channel coupling.  The exact semantic content of
the vector does not matter to the residue abstraction.

Whatever the exact qualities, the Vorbis residue abstraction codes the
residue vectors into the bitstream packet, and then reconstructs the
vectors during decode.  Vorbis makes use of three different encoding
variants (numbered 0, 1 and 2) of the same basic vector encoding
abstraction.



\subsection{Residue format}

Residue format partitions each vector in the vector bundle into chunks,
classifies each chunk, encodes the chunk classifications and finally
encodes the chunks themselves using the the specific VQ arrangement
defined for each selected classification.
The exact interleaving and partitioning vary by residue encoding number,
however the high-level process used to classify and encode the residue
vector is the same in all three variants.

A set of coded residue vectors are all of the same length.  High level
coding structure, ignoring for the moment exactly how a partition is
encoded and simply trusting that it is, is as follows:

\begin{itemize}
\item Each vector is partitioned into multiple equal sized chunks
according to configuration specified.  If we have a vector size of
\emph{n}, a partition size \emph{residue\_partition\_size}, and a total
of \emph{ch} residue vectors, the total number of partitioned chunks
coded is \emph{n}/\emph{residue\_partition\_size}*\emph{ch}.  It is
important to note that the integer division truncates.  In the below
example, we assume an example \emph{residue\_partition\_size} of 8.

\item Each partition in each vector has a classification number that
specifies which of multiple configured VQ codebook setups are used to
decode that partition.  The classification numbers of each partition
can be thought of as forming a vector in their own right, as in the
illustration below.  Just as the residue vectors are coded in grouped
partitions to increase encoding efficiency, the classification vector
is also partitioned into chunks.  The integer elements of each scalar
in a classification chunk are built into a single scalar that
represents the classification numbers in that chunk.  In the below
example, the classification codeword encodes two classification
numbers.

\item The values in a residue vector may be encoded monolithically in a
single pass through the residue vector, but more often efficient
codebook design dictates that each vector is encoded as the additive
sum of several passes through the residue vector using more than one
VQ codebook.  Thus, each residue value potentially accumulates values
from multiple decode passes.  The classification value associated with
a partition is the same in each pass, thus the classification codeword
is coded only in the first pass.

\end{itemize}


\begin{center}
\includegraphics[width=\textwidth]{residue-pack}
\captionof{figure}{illustration of residue vector format}
\end{center}



\subsection{residue 0}

Residue 0 and 1 differ only in the way the values within a residue
partition are interleaved during partition encoding (visually treated
as a black box--or cyan box or brown box--in the above figure).

Residue encoding 0 interleaves VQ encoding according to the
dimension of the codebook used to encode a partition in a specific
pass.  The dimension of the codebook need not be the same in multiple
passes, however the partition size must be an even multiple of the
codebook dimension.

As an example, assume a partition vector of size eight, to be encoded
by residue 0 using codebook sizes of 8, 4, 2 and 1:

\begin{programlisting}

            original residue vector: [ 0 1 2 3 4 5 6 7 ]

codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]

codebook dimensions = 4  encoded as: [ 0 2 4 6 ], [ 1 3 5 7 ]

codebook dimensions = 2  encoded as: [ 0 4 ], [ 1 5 ], [ 2 6 ], [ 3 7 ]

codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]

\end{programlisting}

It is worth mentioning at this point that no configurable value in the
residue coding setup is restricted to a power of two.



\subsection{residue 1}

Residue 1 does not interleave VQ encoding.  It represents partition
vector scalars in order.  As with residue 0, however, partition length
must be an integer multiple of the codebook dimension, although
dimension may vary from pass to pass.

As an example, assume a partition vector of size eight, to be encoded
by residue 0 using codebook sizes of 8, 4, 2 and 1:

\begin{programlisting}

            original residue vector: [ 0 1 2 3 4 5 6 7 ]

codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]

codebook dimensions = 4  encoded as: [ 0 1 2 3 ], [ 4 5 6 7 ]

codebook dimensions = 2  encoded as: [ 0 1 ], [ 2 3 ], [ 4 5 ], [ 6 7 ]

codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]

\end{programlisting}



\subsection{residue 2}

Residue type two can be thought of as a variant of residue type 1.
Rather than encoding multiple passed-in vectors as in residue type 1,
the \emph{ch} passed in vectors of length \emph{n} are first
interleaved and flattened into a single vector of length
\emph{ch}*\emph{n}.  Encoding then proceeds as in type 1. Decoding is
as in type 1 with decode interleave reversed. If operating on a single
vector to begin with, residue type 1 and type 2 are equivalent.

\begin{center}
\includegraphics[width=\textwidth]{residue2}
\captionof{figure}{illustration of residue type 2}
\end{center}


\subsection{Residue decode}

\subsubsection{header decode}

Header decode for all three residue types is identical.
\begin{programlisting}
  1) [residue\_begin] = read 24 bits as unsigned integer
  2) [residue\_end] = read 24 bits as unsigned integer
  3) [residue\_partition\_size] = read 24 bits as unsigned integer and add one
  4) [residue\_classifications] = read 6 bits as unsigned integer and add one
  5) [residue\_classbook] = read 8 bits as unsigned integer
\end{programlisting}

\varname{[residue\_begin]} and
\varname{[residue\_end]} select the specific sub-portion of
each vector that is actually coded; it implements akin to a bandpass
where, for coding purposes, the vector effectively begins at element
\varname{[residue\_begin]} and ends at
\varname{[residue\_end]}.  Preceding and following values in
the unpacked vectors are zeroed.  Note that for residue type 2, these
values as well as \varname{[residue\_partition\_size]}apply to
the interleaved vector, not the individual vectors before interleave.
\varname{[residue\_partition\_size]} is as explained above,
\varname{[residue\_classifications]} is the number of possible
classification to which a partition can belong and
\varname{[residue\_classbook]} is the codebook number used to
code classification codewords.  The number of dimensions in book
\varname{[residue\_classbook]} determines how many
classification values are grouped into a single classification
codeword.  Note that the number of entries and dimensions in book
\varname{[residue\_classbook]}, along with
\varname{[residue\_classifications]}, overdetermines to
possible number of classification codewords.  
If \varname{[residue\_classifications]}\^{}\varname{[residue\_classbook]}.dimensions
exceeds \varname{[residue\_classbook]}.entries, the
bitstream should be regarded to be undecodable.

Next we read a bitmap pattern that specifies which partition classes
code values in which passes.

\begin{programlisting}
  1) iterate [i] over the range 0 ... [residue\_classifications]-1 {

       2) [high\_bits] = 0
       3) [low\_bits] = read 3 bits as unsigned integer
       4) [bitflag] = read one bit as boolean
       5) if ( [bitflag] is set ) then [high\_bits] = read five bits as unsigned integer
       6) vector [residue\_cascade] element [i] = [high\_bits] * 8 + [low\_bits]
     }
  7) done
\end{programlisting}

Finally, we read in a list of book numbers, each corresponding to
specific bit set in the cascade bitmap.  We loop over the possible
codebook classifications and the maximum possible number of encoding
stages (8 in Vorbis I, as constrained by the elements of the cascade
bitmap being eight bits):

\begin{programlisting}
  1) iterate [i] over the range 0 ... [residue\_classifications]-1 {

       2) iterate [j] over the range 0 ... 7 {

            3) if ( vector [residue\_cascade] element [i] bit [j] is set ) {

                 4) array [residue\_books] element [i][j] = read 8 bits as unsigned integer

               } else {

                 5) array [residue\_books] element [i][j] = unused

               }
          }
      }

  6) done
\end{programlisting}

An end-of-packet condition at any point in header decode renders the
stream undecodable.  In addition, any codebook number greater than the
maximum numbered codebook set up in this stream also renders the
stream undecodable. All codebooks in array [residue\_books] are
required to have a value mapping.  The presence of codebook in array
[residue\_books] without a value mapping (maptype equals zero) renders
the stream undecodable.



\subsubsection{packet decode}

Format 0 and 1 packet decode is identical except for specific
partition interleave.  Format 2 packet decode can be built out of the
format 1 decode process.  Thus we describe first the decode
infrastructure identical to all three formats.

In addition to configuration information, the residue decode process
is passed the number of vectors in the submap bundle and a vector of
flags indicating if any of the vectors are not to be decoded.  If the
passed in number of vectors is 3 and vector number 1 is marked 'do not
decode', decode skips vector 1 during the decode loop.  However, even
'do not decode' vectors are allocated and zeroed.

Depending on the values of \varname{[residue\_begin]} and
\varname{[residue\_end]}, it is obvious that the encoded
portion of a residue vector may be the entire possible residue vector
or some other strict subset of the actual residue vector size with
zero padding at either uncoded end.  However, it is also possible to
set \varname{[residue\_begin]} and
\varname{[residue\_end]} to specify a range partially or
wholly beyond the maximum vector size.  Before beginning residue
decode, limit \varname{[residue\_begin]} and
\varname{[residue\_end]} to the maximum possible vector size
as follows.  We assume that the number of vectors being encoded,
\varname{[ch]} is provided by the higher level decoding
process.

\begin{programlisting}
  1) [actual\_size] = current blocksize/2;
  2) if residue encoding is format 2
       3) [actual\_size] = [actual\_size] * [ch];
  4) [limit\_residue\_begin] = minimum of ([residue\_begin],[actual\_size]);
  5) [limit\_residue\_end] = minimum of ([residue\_end],[actual\_size]);
\end{programlisting}

The following convenience values are conceptually useful to clarifying
the decode process:

\begin{programlisting}
  1) [classwords\_per\_codeword] = [codebook\_dimensions] value of codebook [residue\_classbook]
  2) [n\_to\_read] = [limit\_residue\_end] - [limit\_residue\_begin]
  3) [partitions\_to\_read] = [n\_to\_read] / [residue\_partition\_size]
\end{programlisting}

Packet decode proceeds as follows, matching the description offered earlier in the document.
\begin{programlisting}
  1) allocate and zero all vectors that will be returned.
  2) if ([n\_to\_read] is zero), stop; there is no residue to decode.
  3) iterate [pass] over the range 0 ... 7 {

       4) [partition\_count] = 0

       5) while [partition\_count] is less than [partitions\_to\_read]

            6) if ([pass] is zero) {

                 7) iterate [j] over the range 0 .. [ch]-1 {

                      8) if vector [j] is not marked 'do not decode' {

                           9) [temp] = read from packet using codebook [residue\_classbook] in scalar context
                          10) iterate [i] descending over the range [classwords\_per\_codeword]-1 ... 0 {

                               11) array [classifications] element [j],([i]+[partition\_count]) =
                                   [temp] integer modulo [residue\_classifications]
                               12) [temp] = [temp] / [residue\_classifications] using integer division

                              }

                         }

                    }

               }

           13) iterate [i] over the range 0 .. ([classwords\_per\_codeword] - 1) while [partition\_count]
               is also less than [partitions\_to\_read] {

                 14) iterate [j] over the range 0 .. [ch]-1 {

                      15) if vector [j] is not marked 'do not decode' {

                           16) [vqclass] = array [classifications] element [j],[partition\_count]
                           17) [vqbook] = array [residue\_books] element [vqclass],[pass]
                           18) if ([vqbook] is not 'unused') {

                                19) decode partition into output vector number [j], starting at scalar
                                    offset [limit\_residue\_begin]+[partition\_count]*[residue\_partition\_size] using
                                    codebook number [vqbook] in VQ context
                          }
                     }

                 20) increment [partition\_count] by one

               }
          }
     }

 21) done

\end{programlisting}

An end-of-packet condition during packet decode is to be considered a
nominal occurrence.  Decode returns the result of vector decode up to
that point.



\subsubsection{format 0 specifics}

Format zero decodes partitions exactly as described earlier in the
'Residue Format: residue 0' section.  The following pseudocode
presents the same algorithm. Assume:

\begin{itemize}
\item  \varname{[n]} is the value in \varname{[residue\_partition\_size]}
\item \varname{[v]} is the residue vector
\item \varname{[offset]} is the beginning read offset in [v]
\end{itemize}


\begin{programlisting}
 1) [step] = [n] / [codebook\_dimensions]
 2) iterate [i] over the range 0 ... [step]-1 {

      3) vector [entry\_temp] = read vector from packet using current codebook in VQ context
      4) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {

           5) vector [v] element ([offset]+[i]+[j]*[step]) =
	        vector [v] element ([offset]+[i]+[j]*[step]) +
                vector [entry\_temp] element [j]

         }

    }

  6) done

\end{programlisting}



\subsubsection{format 1 specifics}

Format 1 decodes partitions exactly as described earlier in the
'Residue Format: residue 1' section.  The following pseudocode
presents the same algorithm. Assume:

\begin{itemize}
\item  \varname{[n]} is the value in
\varname{[residue\_partition\_size]}
\item \varname{[v]} is the residue vector
\item \varname{[offset]} is the beginning read offset in [v]
\end{itemize}


\begin{programlisting}
 1) [i] = 0
 2) vector [entry\_temp] = read vector from packet using current codebook in VQ context
 3) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {

      4) vector [v] element ([offset]+[i]) =
	  vector [v] element ([offset]+[i]) +
          vector [entry\_temp] element [j]
      5) increment [i]

    }

  6) if ( [i] is less than [n] ) continue at step 2
  7) done
\end{programlisting}



\subsubsection{format 2 specifics}

Format 2 is reducible to format 1.  It may be implemented as an additional step prior to and an additional post-decode step after a normal format 1 decode.


Format 2 handles 'do not decode' vectors differently than residue 0 or
1; if all vectors are marked 'do not decode', no decode occurrs.
However, if at least one vector is to be decoded, all the vectors are
decoded.  We then request normal format 1 to decode a single vector
representing all output channels, rather than a vector for each
channel.  After decode, deinterleave the vector into independent vectors, one for each output channel.  That is:

\begin{enumerate}
 \item If all vectors 0 through \emph{ch}-1 are marked 'do not decode', allocate and clear a single vector \varname{[v]}of length \emph{ch*n} and skip step 2 below; proceed directly to the post-decode step.
 \item Rather than performing format 1 decode to produce \emph{ch} vectors of length \emph{n} each, call format 1 decode to produce a single vector \varname{[v]} of length \emph{ch*n}.
 \item Post decode: Deinterleave the single vector \varname{[v]} returned by format 1 decode as described above into \emph{ch} independent vectors, one for each outputchannel, according to:
  \begin{programlisting}
  1) iterate [i] over the range 0 ... [n]-1 {

       2) iterate [j] over the range 0 ... [ch]-1 {

            3) output vector number [j] element [i] = vector [v] element ([i] * [ch] + [j])

          }
     }

  4) done
  \end{programlisting}

\end{enumerate}