From 8e94244f86e657e4113e35438e59cf5771882b25 Mon Sep 17 00:00:00 2001 From: Aki Date: Sun, 3 Mar 2024 12:51:03 +0100 Subject: libogg and libvorbis are no longer part of this source tree --- contrib/ogg/doc/ogg-multiplex.html | 446 ------------------------------------- 1 file changed, 446 deletions(-) delete mode 100644 contrib/ogg/doc/ogg-multiplex.html (limited to 'contrib/ogg/doc/ogg-multiplex.html') diff --git a/contrib/ogg/doc/ogg-multiplex.html b/contrib/ogg/doc/ogg-multiplex.html deleted file mode 100644 index bd08e25..0000000 --- a/contrib/ogg/doc/ogg-multiplex.html +++ /dev/null @@ -1,446 +0,0 @@ - - - - - -Ogg Documentation - - - - - - - - - -

Page Multiplexing and Ordering in a Physical Ogg Stream

- -

The low-level mechanisms of an Ogg stream (as described in the Ogg -Bitstream Overview) provide means for mixing multiple logical streams -and media types into a single linear-chronological stream. This -document specifies the high-level arrangement and use of page -structure to multiplex multiple streams of mixed media type within a -physical Ogg stream.

- -

Design Elements

- -

The design and arrangement of the Ogg container format is governed by -several high-level design decisions that form the reasoning behind -specific low-level design decisions.

- -

Linear media

- -

The Ogg bitstream is intended to encapsulate chronological, -time-linear mixed media into a single delivery stream or file. The -design is such that an application can always encode and/or decode a -full-featured bitstream in one pass with no seeking and minimal -buffering. Seeking to provide optimized encoding (such as two-pass -encoding) or interactive decoding (such as scrubbing or instant -replay) is not disallowed or discouraged, however no bitstream feature -must require nonlinear operation on the bitstream.

- -

Multiplexing

- -

Ogg bitstreams multiplex multiple logical streams into a single -physical stream at the page level. Each page contains an abstract -time stamp (the Granule Position) that represents an absolute time -landmark within the stream. After the pages representing stream -headers (all logical stream headers occur at the beginning of a -physical bitstream section before any logical stream data), logical -stream data pages are arranged in a physical bitstream in strict -non-decreasing order by chronological absolute time as -specified by the granule position.

- -

The only exception to arranging pages in strictly ascending time order -by granule position is those pages that do not set the granule -position value. This is a special case when exceptionally large -packets span multiple pages; the specifics of handling this special -case are described later under 'Continuous and Discontinuous -Streams'.

- -

Seeking

- -

Ogg is designed to use an interpolated bisection search to -implement exact positional seeking. Interpolated bisection search is -a spec-mandated mechanism.

- -

An index may improve objective performance, but it seldom -improves subjective performance outside of a few high-latency use -cases and adds no additional functionality as bisection search -delivers the same functionality for both one- and two-pass stream -types. For these reasons, use of indexes is discouraged, except in -cases where an index provides demonstrable and noticable performance -improvement.

- -

Seek operations are by absolute time; a direct bisection search must -find the exact time position requested. Information in the Ogg -bitstream is arranged such that all information to be presented for -playback from the desired seek point will occur at or after the -desired seek point. Seek operations are neither 'fuzzy' nor -heuristic.

- -

Although key frame handling in video appears to be an exception to -"all needed playback information lies ahead of a given seek", -key frames can still be handled directly within this indexless -framework. Seeking to a key frame in video (as well as seeking in other -media types with analogous restraints) is handled as two seeks; first -a seek to the desired time which extracts state information that -decodes to the time of the last key frame, followed by a second seek -directly to the key frame. The location of the previous key frame is -embedded as state information in the granulepos; this mechanism is -described in more detail later.

- -

Continuous and Discontinuous Streams

- -

Logical streams within a physical Ogg stream belong to one of two -categories, "Continuous" streams and "Discontinuous" streams. -Although these are discussed in more detail later, the distinction is -important to a high-level understanding of how to buffer an Ogg -stream.

- -

A stream that provides a gapless, time-continuous media type with a -fine-grained timebase is considered to be 'Continuous'. A continuous -stream should never be starved of data. Clear examples of continuous -data types include broadcast audio and video.

- -

A stream that delivers data in a potentially irregular pattern or with -widely spaced timing gaps is considered to be 'Discontinuous'. A -discontinuous stream may be best thought of as data representing -scattered events; although they happen in order, they are typically -unconnected data often located far apart. One possible example of a -discontinuous stream types would be captioning. Although it's -possible to design captions as a continuous stream type, it's most -natural to think of captions as widely spaced pieces of text with -little happening between.

- -

The fundamental design distinction between continuous and -discontinuous streams concerns buffering.

- -

Buffering

- -

Because a continuous stream is, by definition, gapless, Ogg buffering -is based on the simple premise of never allowing any active continuous -stream to starve for data during decode; buffering proceeds ahead -until all continuous streams in a physical stream have data ready to -decode on demand.

- -

Discontinuous stream data may occur on a fairly regular basis, but the -timing of, for example, a specific caption is impossible to predict -with certainty in most captioning systems. Thus the buffering system -should take discontinuous data 'as it comes' rather than working ahead -(for a potentially unbounded period) to look for future discontinuous -data. As such, discontinuous streams are ignored when managing -buffering; their pages simply 'fall out' of the stream when continuous -streams are handled properly.

- -

Buffering requirements need not be explicitly declared or managed for -the encoded stream; the decoder simply reads as much data as is -necessary to keep all continuous stream types gapless (also ensuring -discontinuous data arrives in time) and no more, resulting in optimum -implicit buffer usage for a given stream. Because all pages of all -data types are stamped with absolute timing information within the -stream, inter-stream synchronization timing is always explicitly -maintained without the need for explicitly declared buffer-ahead -hinting.

- -

Further details, mechanisms and reasons for the differing arrangement -and behavior of continuous and discontinuous streams is discussed -later.

- -

Whole-stream navigation

- -

Ogg is designed so that the simplest navigation operations treat the -physical Ogg stream as a whole summary of its streams, rather than -navigating each interleaved stream as a separate entity.

- -

First Example: seeking to a desired time position in a multiplexed (or -unmultiplexed) Ogg stream can be accomplished through a bisection -search on time position of all pages in the stream (as encoded in the -granule position). More powerful searches (such as a key frame-aware -seek within video) are also possible with additional search -complexity, but similar computational complexity.

- -

Second Example: A bitstream section may consist of three multiplexed -streams of differing lengths. The result of multiplexing these -streams should be thought of as a single mixed stream with a length -equal to the longest of the three component streams. Although it is -also possible to think of the multiplexed results as three concurrent -streams of different lengths and it is possible to recover the three -original streams, it will also become obvious that once multiplexed, -it isn't possible to find the internal lengths of the component -streams without a linear search of the whole bitstream section. -However, it is possible to find the length of the whole bitstream -section easily (in near-constant time per section) just as it is for a -single-media unmultiplexed stream.

- -

Granule Position

- -

Description

- -

The Granule Position is a signed 64 bit field appearing in the header -of every Ogg page. Although the granule position represents absolute -time within a logical stream, its value does not necessarily directly -encode a simple timestamp. It may represent frames elapsed (as in -Vorbis), a simple timestamp, or a more complex bit-division encoding -(such as in Theora). The exact encoding of the granule position is up -to a specific codec.

- -

The granule position is governed by the following rules:

- - - -

Example: timestamp

- -

In general, a codec/stream type should choose the simplest granule -position encoding that addresses its requirements. The examples here -are by no means exhaustive of the possibilities within Ogg.

- -

A simple granule position could encode a timestamp directly. For -example, a granule position that encoded milliseconds from beginning -of stream would allow a logical stream length of over 100,000,000,000 -days before beginning a new logical stream (to avoid the granule -position wrapping).

- -

Example: framestamp

- -

A simple millisecond timestamp granule encoding might suit many stream -types, but a millisecond resolution is inappropriate to, eg, most -audio encodings where exact single-sample resolution is generally a -requirement. A millisecond is both too large a granule and often does -not represent an integer number of samples.

- -

In the event that audio frames are always encoded as the same number of -samples, the granule position could simply be a linear count of frames -since beginning of stream. This has the advantages of being exact and -efficient. Position in time would simply be [granule_position] * -[samples_per_frame] / [samples_per_second].

- -

Example: samplestamp (Vorbis)

- -

Frame counting is insufficient in codecs such as Vorbis where an audio -frame [packet] encodes a variable number of samples. In Vorbis's -case, the granule position is a count of the number of raw samples -from the beginning of stream; the absolute time of -a granule position is [granule_position] / -[samples_per_second].

- -

Example: bit-divided framestamp (Theora)

- -

Some video codecs may be able to use the simple framestamp scheme for -granule position. However, most modern video codecs introduce at -least the following complications:

- - - -

The first two points can be handled straightforwardly via the fact -that the codec has complete control mapping granule position to -absolute time; non-integer frame rates and offsets can be set in the -codec's initial header, and the rest is just arithmetic.

- -

The third point appears trickier at first glance, but it too can be -handled through the granule position mapping mechanism. Here we -arrange the granule position in such a way that granule positions of -key frames are easy to find. Divide the granule position into two -fields; the most-significant bits are an absolute frame counter, but -it's only updated at each key frame. The least significant bits encode -the number of frames since the last key frame. In this way, each -granule position both encodes the absolute time of the current frame -as well as the absolute time of the last key frame.

- -

Seeking to a most recent preceding key frame is then accomplished by -first seeking to the original desired point, inspecting the granulepos -of the resulting video page, extracting from that granulepos the -absolute time of the desired key frame, and then seeking directly to -that key frame's page. Of course, it's still possible for an -application to ignore key frames and use a simpler seeking algorithm -(decode would be unable to present decoded video until the next -key frame). Surprisingly many player applications do choose the -simpler approach.

- -

granule position, packets and pages

- -

Although each packet of data in a logical stream theoretically has a -specific granule position, only one granule position is encoded -per page. It is possible to encode a logical stream such that each -page contains only a single packet (so that granule positions are -preserved for each packet), however a one-to-one packet/page mapping -is not intended to be the general case.

- -

Because Ogg functions at the page, not packet, level, this -once-per-page time information provides Ogg with the finest-grained -time information is can use. Ogg passes this granule positioning data -to the codec (along with the packets extracted from a page); it is the -responsibility of codecs to track timing information at granularities -finer than a single page.

- -

start-time and end-time positioning

- -

A granule position represents the instantaneous time location -between two pages. However, continuous streams and discontinuous -streams differ on whether the granulepos represents the end-time of -the data on a page or the start-time. Continuous streams are -'end-time' encoded; the granulepos represents the point in time -immediately after the last data decoded from a page. Discontinuous -streams are 'start-time' encoded; the granulepos represents the point -in time of the first data decoded from the page.

- -

An Ogg stream type is declared continuous or discontinuous by its -codec. A given codec may support both continuous and discontinuous -operation so long as any given logical stream is continuous or -discontinuous for its entirety and the codec is able to ascertain (and -inform the Ogg layer) as to which after decoding the initial stream -header. The majority of codecs will always be continuous (such as -Vorbis) or discontinuous (such as Writ).

- -

Start- and end-time encoding do not affect multiplexing sort-order; -pages are still sorted by the absolute time a given granulepos maps to -regardless of whether that granulepos represents start- or -end-time.

- -

Multiplex/Demultiplex Division of Labor

- -

The Ogg multiplex/demultiplex layer provides mechanisms for encoding -raw packets into Ogg pages, decoding Ogg pages back into the original -codec packets, determining the logical structure of an Ogg stream, and -navigating through and synchronizing with an Ogg stream at a desired -stream location. Strict multiplex/demultiplex operations are entirely -in the Ogg domain and require no intervention from codecs.

- -

Implementation of more complex operations does require codec -knowledge, however. Unlike other framing systems, Ogg maintains -strict separation between framing and the framed bitstream data; Ogg -does not replicate codec-specific information in the page/framing -data, nor does Ogg blur the line between framing and stream -data/metadata. Because Ogg is fully data-agnostic toward the data it -frames, operations which require specifics of bitstream data (such as -'seek to key frame') also require interaction with the codec layer -(because, in this example, the Ogg layer is not aware of the concept -of key frames). This is different from systems that blur the -separation between framing and stream data in order to simplify the -separation of code. The Ogg system purposely keeps the distinction in -data simple so that later codec innovations are not constrained by -framing design.

- -

For this reason, however, complex seeking operations require -interaction with the codecs in order to decode the granule position of -a given stream type back to absolute time or in order to find -'decodable points' such as key frames in video.

- -

Unsorted Discussion Points

- -

flushes around key frames? RFC suggestion: repaginating or building a -stream this way is nice but not required

- -

Appendix A: multiplexing examples

- - - - - -- cgit v1.1