From 74f4b1bc3b627ba4c7e03498234d88cacdfbe97b Mon Sep 17 00:00:00 2001 From: Aki Date: Wed, 29 Sep 2021 22:52:49 +0200 Subject: Squashed 'vorbis/' content from commit d22c3ab5f git-subtree-dir: vorbis git-subtree-split: d22c3ab5f633460abc2532feee60ca0892134cbf --- doc/vorbis-clip.txt | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 doc/vorbis-clip.txt (limited to 'doc/vorbis-clip.txt') diff --git a/doc/vorbis-clip.txt b/doc/vorbis-clip.txt new file mode 100644 index 0000000..2e67034 --- /dev/null +++ b/doc/vorbis-clip.txt @@ -0,0 +1,139 @@ +Topic: + +Sample granularity editing of a Vorbis file; inferred arbitrary sample +length starting offsets / PCM stream lengths + +Overview: + +Vorbis, like mp3, is a frame-based* audio compression where audio is +broken up into discrete short time segments. These segments are +'atomic' that is, one must recover the entire short time segment from +the frame packet; there's no way to recover only a part of the PCM time +segment from part of the coded packet without expanding the entire +packet and then discarding a portion of the resulting PCM audio. + +* In mp3, the data segment representing a given time period is called + a 'frame'; the roughly equivalent Vorbis construct is a 'packet'. + +Thus, when we edit a Vorbis stream, the finest physical editing +granularity is on these packet boundaries (the mp3 case is +actually somewhat more complex and mp3 editing is more complicated +than just snipping on a frame boundary because time data can be spread +backward or forward over frames. In Vorbis, packets are all +stand-alone). Thus, at the physical packet level, Vorbis is still +limited to streams that contain an integral number of packets. + +However, Vorbis streams may still exactly represent and be edited to a +PCM stream of arbitrary length and starting offset without padding the +beginning or end of the decoded stream or requiring that the desired +edit points be packet aligned. Vorbis makes use of Ogg stream +framing, and this framing provides time-stamping data, called a +'granule position'; our starting offset and finished stream length may +be inferred from correct usage of the granule position data. + +Time stamping mechanism: + +Vorbis packets are bundled into into Ogg pages (note that pages do not +necessarily contain integral numbers of packets, but that isn't +inportant in this discussion. More about Ogg framing can be found in +ogg/doc/framing.html). Each page that contains a packet boundary is +stamped with the absolute sample-granularity offset of the data, that +is, 'complete samples-to-date' up to the last completed packet of that +page. (The same mechanism is used for eg, video, where the number +represents complete 2-D frames, and so on). + +(It's possible but rare for a packet to span more than two pages such +that page[s] in the middle have no packet boundary; these packets have +a granule position of '-1'.) + +This granule position mechaism in Ogg is used by Vorbis to indicate when the +PCM data intended to be represented in a Vorbis segment begins a +number of samples into the data represented by the first packet[s] +and/or ends before the physical PCM data represented in the last +packet[s]. + +File length a non-integral number of frames: + +A file to be encoded in Vorbis will probably not encode into an +integral number of packets; such a file is encoded with the last +packet containing 'extra'* samples. These samples are not padding; they +will be discarded in decode. + +*(For best results, the encoder should use extra samples that preserve +the character of the last frame. Simply setting them to zero will +introduce a 'cliff' that's hard to encode, resulting in spread-frame +noise. Libvorbis extrapolates the last frame past the end of data to +produce the extra samples. Even simply duplicating the last value is +better than clamping the signal to zero). + +The encoder indicates to the decoder that the file is actually shorter +than all of the samples ('original' + 'extra') by setting the granule +position in the last page to a short value, that is, the last +timestamp is the original length of the file discarding extra samples. +The decoder will see that the number of samples it has decoded in the +last page is too many; it is 'original' + 'extra', where the +granulepos says that through the last packet we only have 'original' +number of samples. The decoder then ignores the 'extra' samples. +This behavior is to occur only when the end-of-stream bit is set in +the page (indicating last page of the logical stream). + +Note that it not legal for the granule position of the last page to +indicate that there are more samples in the file than actually exist, +however, implementations should handle such an illegal file gracefully +in the interests of robust programming. + +Beginning point not on integral packet boundary: + +It is possible that we will the PCM data represented by a Vorbis +stream to begin at a position later than where the decoded PCM data +really begins after an integral packet boundary, a situation analagous +to the above description where the PCM data does not end at an +integral packet boundary. The easiest example is taking a clip out of +a larger Vorbis stream, and choosing a beginning point of the clip +that is not on a packet boundary; we need to ignore a few samples to +get the desired beginning point. + +The process of marking the desired beginning point is similar to +marking an arbitrary ending point. If the encoder wishes sample zero +to be some location past the actual beginning of data, it associates a +'short' granule position value with the completion of the second* +audio packet. The granule position is associated with the second +packet simply by making sure the second packet completes its page. + +*(We associate the short value with the second packet for two reasons. + a) The first packet only primes the overlap/add buffer. No data is + returned before decoding the second packet; this places the decision + information at the point of decision. b) Placing the short value on + the first packet would make the value negative (as the first packet + normally represents position zero); a negative value would break the + requirement that granule positions increase; the headers have + position values of zero) + +The decoder sees that on the first page that will return +data from the overlap/add queue, we have more samples than the granule +position accounts for, and discards the 'surplus' from the beginning +of the queue. + +Note that short granule values (indicating less than the actually +returned about of data) are not legal in the Vorbis spec outside of +indicating beginning and ending sample positions. However, decoders +should, at minimum, tolerate inadvertant short values elsewhere in the +stream (just as they should tolerate out-of-order/non-increasing +granulepos values, although this too is illegal). + +Beginning point at arbitrary positive timestamp (no 'zero' sample): + +It's also possible that the granule position of the first page of an +audio stream is a 'long value', that is, a value larger than the +amount of PCM audio decoded. This implies only that we are starting +playback at some point into the logical stream, a potentially common +occurence in streaming applications where the decoder may be +connecting into a live stream. The decoder should not treat the long +value specially. + +A long value elsewhere in the stream would normally occur only when a +page is lost or out of sequence, as indicated by the page's sequence +number. A long value under any other situation is not legal, however +a decoder should tolerate both possibilities. + + -- cgit v1.1