From 8f3471999e929bb99116fac52b94d572c42ba15e Mon Sep 17 00:00:00 2001
From: Aki
Date: Wed, 29 Sep 2021 22:52:15 +0200
Subject: Squashed 'ogg/' content from commit 4380566a4

git-subtree-dir: ogg
git-subtree-split: 4380566a44b8d5e85ad511c9c17eb04197863ec5
---
 doc/ogg-multiplex.html | 446 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 446 insertions(+)
 create mode 100644 doc/ogg-multiplex.html

diff --git a/doc/ogg-multiplex.html b/doc/ogg-multiplex.html
new file mode 100644
index 0000000..bd08e25
--- /dev/null
+++ b/doc/ogg-multiplex.html
@@ -0,0 +1,446 @@

Ogg Documentation

Page Multiplexing and Ordering in a Physical Ogg Stream

The low-level mechanisms of an Ogg stream (as described in the Ogg Bitstream Overview) provide means for mixing multiple logical streams and media types into a single linear-chronological stream. This document specifies the high-level arrangement and use of page structure to multiplex multiple streams of mixed media type within a physical Ogg stream.

Design Elements

The design and arrangement of the Ogg container format are governed by several high-level design decisions that form the reasoning behind specific low-level design decisions.

Linear media

The Ogg bitstream is intended to encapsulate chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode a full-featured bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is neither disallowed nor discouraged; however, no bitstream feature may require nonlinear operation on the bitstream.

Multiplexing

Ogg bitstreams multiplex multiple logical streams into a single physical stream at the page level. Each page contains an abstract time stamp (the Granule Position) that represents an absolute time landmark within the stream. After the pages representing stream headers (all logical stream headers occur at the beginning of a physical bitstream section before any logical stream data), logical stream data pages are arranged in a physical bitstream in strict non-decreasing order by chronological absolute time as specified by the granule position.

The only exception to arranging pages in non-decreasing time order by granule position is those pages that do not set the granule position value. This is a special case that arises when exceptionally large packets span multiple pages; the specifics of handling this special case are described later under 'Continuous and Discontinuous Streams'.
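The ordering rule can be illustrated with a minimal sketch. It assumes each logical stream already yields its data pages in time order and that every page carries a granule position that has been mapped to seconds; header pages and pages with an unset granule position are deliberately left out. The `Page` type and the `multiplex` function are hypothetical names for illustration, not part of any Ogg library.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Page(NamedTuple):
    serial: int   # hypothetical identifier of the logical stream
    time: float   # absolute time (seconds) the page's granule position maps to


def multiplex(*streams: Iterable[Page]) -> Iterator[Page]:
    """Interleave per-stream page sequences in non-decreasing time order.

    Each input must already be sorted by time. Pages with equal times are
    emitted in the order the streams were passed in.
    """
    return heapq.merge(*streams, key=lambda page: page.time)
```

For example, merging an audio stream with pages at 0.5 s, 1.0 s and 1.5 s and a video stream with pages at 0.4 s, 1.0 s and 2.0 s yields a single sequence whose page times never decrease.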


Seeking

Ogg is designed to use an interpolated bisection search to implement exact positional seeking. Interpolated bisection search is a spec-mandated mechanism.

An index may improve objective performance, but it seldom improves subjective performance outside of a few high-latency use cases, and it adds no functionality, because bisection search delivers the same functionality for both one- and two-pass stream types. For these reasons, use of indexes is discouraged, except in cases where an index provides a demonstrable and noticeable performance improvement.

Seek operations are by absolute time; a direct bisection search must find the exact time position requested. Information in the Ogg bitstream is arranged such that all information to be presented for playback from the desired seek point will occur at or after the desired seek point. Seek operations are neither 'fuzzy' nor heuristic.
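The following is a minimal sketch of the bisection idea only. It assumes the page times are already available as an in-memory, non-decreasing sequence; a real demultiplexer would instead bisect over byte offsets in the file and resynchronize to the next page boundary at each probe, and the interpolation step is omitted here. The function name `seek_page` is hypothetical.

```python
from typing import Sequence


def seek_page(page_times: Sequence[float], target: float) -> int:
    """Return the index of the first page whose time is >= target.

    page_times must be in non-decreasing order. Returns len(page_times)
    when the target lies beyond the last page.
    """
    lo, hi = 0, len(page_times)
    while lo < hi:
        mid = (lo + hi) // 2
        if page_times[mid] < target:
            lo = mid + 1   # the wanted page lies strictly after mid
        else:
            hi = mid       # mid is a candidate; keep searching to its left
    return lo
```

Because the search always lands on the first page at or after the requested time, the result is exact rather than approximate, which matches the requirement that seeks are neither fuzzy nor heuristic.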

Although key frame handling in video appears to be an exception to "all needed playback information lies ahead of a given seek", key frames can still be handled directly within this indexless framework. Seeking to a key frame in video (as well as seeking in other media types with analogous constraints) is handled as two seeks: first a seek to the desired time, which extracts state information that decodes to the time of the last key frame, followed by a second seek directly to the key frame. The location of the previous key frame is embedded as state information in the granulepos; this mechanism is described in more detail later.

Continuous and Discontinuous Streams

Logical streams within a physical Ogg stream belong to one of two categories: "Continuous" streams and "Discontinuous" streams. Although these are discussed in more detail later, the distinction is important to a high-level understanding of how to buffer an Ogg stream.

A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Clear examples of continuous data types include broadcast audio and video.

A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected and often located far apart. One possible example of a discontinuous stream type would be captioning. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.

The fundamental design distinction between continuous and discontinuous streams concerns buffering.

Buffering

Because a continuous stream is, by definition, gapless, Ogg buffering is based on the simple premise of never allowing any active continuous stream to starve for data during decode; buffering proceeds ahead until all continuous streams in a physical stream have data ready to decode on demand.

Discontinuous stream data may occur on a fairly regular basis, but the timing of, for example, a specific caption is impossible to predict with certainty in most captioning systems. Thus the buffering system should take discontinuous data 'as it comes' rather than working ahead (for a potentially unbounded period) to look for future discontinuous data. As such, discontinuous streams are ignored when managing buffering; their pages simply 'fall out' of the stream when continuous streams are handled properly.

Buffering requirements need not be explicitly declared or managed for the encoded stream; the decoder simply reads as much data as is necessary to keep all continuous stream types gapless (also ensuring discontinuous data arrives in time) and no more, resulting in optimum implicit buffer usage for a given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always explicitly maintained without the need for explicitly declared buffer-ahead hinting.
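A rough sketch of this buffering premise follows. It assumes pages arrive from an already-multiplexed iterator as `(serial, time)` tuples and that the set of continuous stream serials is known from the codecs; the names `fill_buffers`, `StreamPage`, `continuous` and `buffers` are hypothetical. Pages of discontinuous streams are queued as they are encountered and never drive further reading.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Set, Tuple

# (hypothetical stream serial, absolute time in seconds)
StreamPage = Tuple[int, float]


def fill_buffers(pages: Iterator[StreamPage],
                 continuous: Set[int],
                 buffers: Dict[int, Deque[StreamPage]]) -> bool:
    """Read pages until every continuous stream has at least one page queued.

    Discontinuous pages met along the way are stored too, but only the
    continuous streams decide when reading stops. Returns False if the
    input ran out before all continuous streams had data.
    """
    while any(not buffers.get(serial) for serial in continuous):
        page = next(pages, None)
        if page is None:
            return False
        buffers.setdefault(page[0], deque()).append(page)
    return True
```

The loop reads no further than necessary: as soon as the last starving continuous stream receives a page, reading stops, so buffer usage follows implicitly from the page ordering.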

Further details, mechanisms and reasons for the differing arrangement and behavior of continuous and discontinuous streams are discussed later.

Whole-stream navigation

Ogg is designed so that the simplest navigation operations treat the physical Ogg stream as a whole summary of its streams, rather than navigating each interleaved stream as a separate entity.

First example: seeking to a desired time position in a multiplexed (or unmultiplexed) Ogg stream can be accomplished through a bisection search on the time position of all pages in the stream (as encoded in the granule position). More powerful searches (such as a key frame-aware seek within video) are also possible with additional search complexity, but similar computational complexity.

Second example: a bitstream section may consist of three multiplexed streams of differing lengths. The result of multiplexing these streams should be thought of as a single mixed stream with a length equal to the longest of the three component streams. Although it is also possible to think of the multiplexed result as three concurrent streams of different lengths, and it is possible to recover the three original streams, once multiplexed it isn't possible to find the internal lengths of the component streams without a linear search of the whole bitstream section. However, it is possible to find the length of the whole bitstream section easily (in near-constant time per section) just as it is for a single-media unmultiplexed stream.

Granule Position


Description

The Granule Position is a signed 64-bit field appearing in the header of every Ogg page. Although the granule position represents absolute time within a logical stream, its value does not necessarily directly encode a simple timestamp. It may represent a count of samples elapsed (as in Vorbis), a simple timestamp, or a more complex bit-division encoding (such as in Theora). The exact encoding of the granule position is up to a specific codec.


The granule position is governed by the following rules:


Example: timestamp

In general, a codec/stream type should choose the simplest granule position encoding that addresses its requirements. The examples here are by no means exhaustive of the possibilities within Ogg.

A simple granule position could encode a timestamp directly. For example, a granule position that encoded milliseconds from beginning of stream would allow a logical stream length of over 100,000,000,000 days before beginning a new logical stream (to avoid the granule position wrapping).
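The figure of over 100,000,000,000 days can be checked with a short calculation, assuming the full positive range of the signed 64-bit field is available for the millisecond count. The constant names are chosen here for illustration.

```python
# Largest value representable in a signed 64-bit granule position.
MAX_GRANULEPOS = 2**63 - 1

# Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms = 86,400,000.
MS_PER_DAY = 24 * 60 * 60 * 1000

# Whole days that fit before a millisecond counter would wrap.
max_days = MAX_GRANULEPOS // MS_PER_DAY
```

Under that assumption `max_days` comes to 106,751,991,167, which is indeed slightly more than one hundred billion days.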


Example: framestamp

A simple millisecond timestamp granule encoding might suit many stream types, but millisecond resolution is inappropriate for, e.g., most audio encodings, where exact single-sample resolution is generally a requirement. A millisecond is too large a granule, and it often does not represent an integer number of samples.

In the event that audio frames are always encoded as the same number of samples, the granule position could simply be a linear count of frames since beginning of stream. This has the advantages of being exact and efficient. Position in time would simply be [granule_position] * [samples_per_frame] / [samples_per_second].


Example: samplestamp (Vorbis)

Frame counting is insufficient in codecs such as Vorbis where an audio frame [packet] encodes a variable number of samples. In Vorbis's case, the granule position is a count of the number of raw samples from the beginning of stream; the absolute time of a granule position is [granule_position] / [samples_per_second].
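The framestamp and samplestamp mappings described above can be written out as a small sketch. The function names are hypothetical, and `Fraction` is used here only so that the results stay exact rather than being rounded to floating point.

```python
from fractions import Fraction


def framestamp_to_seconds(granule_position: int,
                          samples_per_frame: int,
                          samples_per_second: int) -> Fraction:
    """Time for a fixed-size-frame count: frames * samples/frame / rate."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)


def samplestamp_to_seconds(granule_position: int,
                           samples_per_second: int) -> Fraction:
    """Time for a raw sample count: samples / rate."""
    return Fraction(granule_position, samples_per_second)
```

For instance, 100 frames of 1024 samples at 48000 samples per second map to exactly 32/15 seconds, and 66150 samples at 44100 samples per second map to exactly 3/2 seconds.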


Example: bit-divided framestamp (Theora)

Some video codecs may be able to use the simple framestamp scheme for granule position. However, most modern video codecs introduce at least the following complications:


The first two points can be handled straightforwardly because the codec has complete control over mapping granule position to absolute time; non-integer frame rates and offsets can be set in the codec's initial header, and the rest is just arithmetic.

The third point appears trickier at first glance, but it too can be handled through the granule position mapping mechanism. Here we arrange the granule position in such a way that granule positions of key frames are easy to find. Divide the granule position into two fields: the most significant bits are an absolute frame counter that is updated only at each key frame, and the least significant bits encode the number of frames since the last key frame. In this way, each granule position encodes both the absolute time of the current frame and the absolute time of the last key frame.
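A minimal sketch of such a two-field layout is shown below. The width of the low field (`shift`) is treated as a free parameter supplied by the caller; how a real codec chooses and signals it is outside this sketch, and all function names are hypothetical.

```python
from typing import Tuple


def pack_granulepos(keyframe_index: int, frames_since_key: int, shift: int) -> int:
    """Combine the key frame's absolute frame number (high bits) with the
    count of frames since that key frame (low `shift` bits)."""
    if not 0 <= frames_since_key < (1 << shift):
        raise ValueError("frames_since_key does not fit in the low field")
    return (keyframe_index << shift) | frames_since_key


def unpack_granulepos(granulepos: int, shift: int) -> Tuple[int, int]:
    """Split a granule position into (keyframe_index, frames_since_key)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)


def absolute_frame(granulepos: int, shift: int) -> int:
    """Absolute frame number of the current frame: key frame + offset."""
    keyframe_index, frames_since_key = unpack_granulepos(granulepos, shift)
    return keyframe_index + frames_since_key
```

With a 6-bit low field, the seventh frame after a key frame at absolute frame 120 packs to 120 * 64 + 7 = 7687, from which both the key frame (120) and the current frame (127) can be recovered.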

Seeking to the most recent preceding key frame is then accomplished by first seeking to the original desired point, inspecting the granulepos of the resulting video page, extracting from that granulepos the absolute time of the desired key frame, and then seeking directly to that key frame's page. Of course, it's still possible for an application to ignore key frames and use a simpler seeking algorithm (decode would be unable to present decoded video until the next key frame). Surprisingly many player applications do choose the simpler approach.
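The two-seek procedure can be sketched as follows, under simplifying assumptions that are not taken from the text: each video page holds exactly one frame, frame numbers start at 0, the page granule positions are available as an in-memory sequence, and the low field is `shift` bits wide. All names are hypothetical.

```python
from typing import Sequence


def frame_of(granulepos: int, shift: int) -> int:
    """Absolute frame number encoded by a bit-divided granule position."""
    return (granulepos >> shift) + (granulepos & ((1 << shift) - 1))


def first_page_at_or_after(page_granules: Sequence[int], frame: int, shift: int) -> int:
    """Bisection: index of the first page whose frame number is >= frame."""
    lo, hi = 0, len(page_granules)
    while lo < hi:
        mid = (lo + hi) // 2
        if frame_of(page_granules[mid], shift) < frame:
            lo = mid + 1
        else:
            hi = mid
    return lo


def seek_to_keyframe(page_granules: Sequence[int], target_frame: int, shift: int) -> int:
    """Return the page index of the key frame preceding target_frame."""
    # First seek: land on the page for the requested frame.
    first = first_page_at_or_after(page_granules, target_frame, shift)
    if first == len(page_granules):
        raise ValueError("target frame lies beyond the end of the stream")
    # The high bits of that page's granulepos name the last key frame.
    key_frame = page_granules[first] >> shift
    # Second seek: go directly to the key frame's page.
    return first_page_at_or_after(page_granules, key_frame, shift)
```

Both seeks are ordinary bisections, so the key frame-aware seek costs roughly twice a plain seek and needs no index.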


Granule position, packets and pages

Although each packet of data in a logical stream theoretically has a specific granule position, only one granule position is encoded per page. It is possible to encode a logical stream such that each page contains only a single packet (so that granule positions are preserved for each packet); however, a one-to-one packet/page mapping is not intended to be the general case.

Because Ogg functions at the page, not packet, level, this once-per-page time information provides Ogg with the finest-grained time information it can use. Ogg passes this granule positioning data to the codec (along with the packets extracted from a page); it is the responsibility of codecs to track timing information at granularities finer than a single page.


Start-time and end-time positioning

A granule position represents the instantaneous time location between two pages. However, continuous streams and discontinuous streams differ on whether the granulepos represents the end-time of the data on a page or the start-time. Continuous streams are 'end-time' encoded; the granulepos represents the point in time immediately after the last data decoded from a page. Discontinuous streams are 'start-time' encoded; the granulepos represents the point in time of the first data decoded from the page.

An Ogg stream type is declared continuous or discontinuous by its codec. A given codec may support both continuous and discontinuous operation so long as any given logical stream is continuous or discontinuous for its entirety and the codec is able to ascertain which applies (and inform the Ogg layer) after decoding the initial stream header. The majority of codecs will always be continuous (such as Vorbis) or discontinuous (such as Writ).

Start- and end-time encoding do not affect multiplexing sort-order; pages are still sorted by the absolute time a given granulepos maps to regardless of whether that granulepos represents start- or end-time.

Multiplex/Demultiplex Division of Labor

The Ogg multiplex/demultiplex layer provides mechanisms for encoding raw packets into Ogg pages, decoding Ogg pages back into the original codec packets, determining the logical structure of an Ogg stream, and navigating through and synchronizing with an Ogg stream at a desired stream location. Strict multiplex/demultiplex operations are entirely in the Ogg domain and require no intervention from codecs.

Implementation of more complex operations does require codec knowledge, however. Unlike other framing systems, Ogg maintains strict separation between framing and the framed bitstream data; Ogg does not replicate codec-specific information in the page/framing data, nor does Ogg blur the line between framing and stream data/metadata. Because Ogg is fully data-agnostic toward the data it frames, operations which require specifics of bitstream data (such as 'seek to key frame') also require interaction with the codec layer (because, in this example, the Ogg layer is not aware of the concept of key frames). This is different from systems that blur the separation between framing and stream data in order to simplify the separation of code. The Ogg system purposely keeps the distinction in data simple so that later codec innovations are not constrained by framing design.

For this reason, however, complex seeking operations require interaction with the codecs in order to decode the granule position of a given stream type back to absolute time or in order to find 'decodable points' such as key frames in video.

Unsorted Discussion Points

flushes around key frames? RFC suggestion: repaginating or building a stream this way is nice but not required

Appendix A: multiplexing examples

--
cgit v1.1