From 8e94244f86e657e4113e35438e59cf5771882b25 Mon Sep 17 00:00:00 2001 From: Aki Date: Sun, 3 Mar 2024 12:51:03 +0100 Subject: libogg and libvorbis are no longer part of this source tree --- contrib/ogg/doc/oggstream.html | 594 ----------------------------------------- 1 file changed, 594 deletions(-) delete mode 100644 contrib/ogg/doc/oggstream.html (limited to 'contrib/ogg/doc/oggstream.html') diff --git a/contrib/ogg/doc/oggstream.html b/contrib/ogg/doc/oggstream.html deleted file mode 100644 index 71bbce7..0000000 --- a/contrib/ogg/doc/oggstream.html +++ /dev/null @@ -1,594 +0,0 @@ - - - - - -Ogg Documentation - - - - - - -
- - - -

Ogg bitstream overview

- -

This document serves as starting point for understanding the design -and implementation of the Ogg container format. If you're new to Ogg -or merely want a high-level technical overview, start reading here. -Other documents linked from the index page -give distilled technical descriptions and references of the container -mechanisms. This document is intended to aid understanding. - -

Container format design points

- -

Ogg is intended to be a simplest-possible container, concerned only -with framing, ordering, and interleave. It can be used as a stream delivery -mechanism, for media file storage, or as a building block toward -implementing a more complex, non-linear container (for example, see -the Skeleton or Annodex/CMML). - -

The Ogg container is not intended to be a monolithic -'kitchen-sink'. It exists only to frame and deliver in-order stream -data and as such is vastly simpler than most other containers. -Elementary and multiplexed streams are both constructed entirely from a -single building block (an Ogg page) composed of eight fields -totalling twenty-seven bytes (the page header), a list of packet lengths -(up to 255 bytes), and payload data (up to 65025 bytes). The structure -of every page is the same. There are no optional fields or alternate -encodings. - -
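
As a rough illustration of how uniform the page structure is, the sketch below unpacks one page with a single fixed `struct` layout. The field order and widths follow the Ogg framing specification (RFC 3533); the names `PageHeader` and `parse_page` are made up for this example, and the sketch skips checksum verification.

```python
import struct
from typing import NamedTuple

# Fixed part of the page header, little-endian: capture pattern, version,
# flags, granule position, serial number, page sequence number, CRC,
# and the number of entries in the packet-length (segment) table.
HEADER = struct.Struct("<4sBBqIIIB")


class PageHeader(NamedTuple):  # hypothetical name, for illustration only
    capture: bytes
    version: int
    flags: int
    granule_position: int
    serial: int
    sequence: int
    checksum: int
    segment_count: int


def parse_page(buf: bytes):
    """Return (header, packet-length table, payload) for the page at buf[0]."""
    header = PageHeader(*HEADER.unpack_from(buf, 0))
    if header.capture != b"OggS" or header.version != 0:
        raise ValueError("not an Ogg page")
    table_end = HEADER.size + header.segment_count
    lacing = list(buf[HEADER.size:table_end])
    # The payload length is simply the sum of the table entries.
    payload = buf[table_end:table_end + sum(lacing)]
    return header, lacing, payload
```

Every page, in every stream, can be read with this one routine; there is no per-codec or per-stream variant of the layout.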

Stream and media metadata is contained in Ogg and not built into -the Ogg container itself. Metadata is thus compartmentalized and -layered rather than part of a monolithic design, an especially good -idea as no two groups seem able to agree on what a complete or -complete-enough metadata set should be. In this way, the container and -container implementation are isolated from unnecessary metadata design -flux. - -

Streaming

- -

The Ogg container is primarily a streaming format, -encapsulating chronological, time-linear mixed media into a single -delivery stream or file. The design is such that an application can -always encode and/or decode all features of a bitstream in one pass -with no seeking and minimal buffering. Seeking to provide optimized -encoding (such as two-pass encoding) or interactive decoding (such as -scrubbing or instant replay) is not disallowed or discouraged, however -no container feature requires nonlinear access of the bitstream. - -

Variable Bit Rate, Variable Payload Size

- -

Ogg is designed to contain any size data payload with bounded, -predictable efficiency. Ogg packets have no maximum size and a -zero-byte minimum size. There is no restriction on size changes from -packet to packet. Variable size packets do not require the use of any -optional or additional container features. There is no optimal -suggested packet size, though special consideration was paid to make -sure 50-200 byte packets were no less efficient than larger packet -sizes. The original design criterion was a 2% overhead at 50 byte -packets, dropping to a maximum working overhead of 1% with larger -packets, and a typical working overhead of 0.5-0.7% for most practical -uses. - -
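
The overhead figures can be approximated with some simple arithmetic. The sketch below assumes pages are packed as full as possible with equal-sized packets, a 27-byte fixed header (per RFC 3533), and one packet-length byte per started 255 bytes of packet; `page_overhead` is a hypothetical helper, and real encoders often flush pages earlier, so actual overhead will differ.

```python
HEADER_BYTES = 27    # fixed page header size, per RFC 3533
MAX_SEGMENTS = 255   # maximum entries in one page's packet-length table


def page_overhead(packet_size: int) -> float:
    """Fraction of a fully packed page spent on framing rather than payload."""
    # A packet of n bytes needs n // 255 + 1 entries in the length table.
    segments_per_packet = packet_size // 255 + 1
    packets = MAX_SEGMENTS // segments_per_packet
    if packets == 0:
        raise ValueError("packet spans more than one page; not modelled here")
    framing = HEADER_BYTES + packets * segments_per_packet
    payload = packets * packet_size
    return framing / (framing + payload)


for size in (50, 200, 4096):
    print(f"{size:5d}-byte packets: {page_overhead(size):.2%} overhead")
```

Under these assumptions, 50-byte packets land near the 2% design figure and larger packets fall well under 1%.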

Simple pagination

- -

Ogg is a byte-aligned container with no context-dependent, optional -or variable-length fields. Ogg requires no repacking of codec data. -The page structure is written out in-line as packet data is submitted -to the streaming abstraction. In addition, it is possible to -implement both Ogg mux and demux as MT-hot zero-copy abstractions (as -is done in the Tremor sourcebase). - -

Capture

- -

Ogg is designed for efficient and immediate stream capture with -high confidence. Although packets have no size limit in Ogg, pages -are a maximum of just under 64kB meaning that any Ogg stream can be -captured with confidence after seeing 128kB of data or less [worst -case; typical figure is 6kB] from any random starting point in the -stream. - -
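
A minimal sketch of capture, assuming the whole buffer is in memory: scan for the `OggS` capture pattern, read the page length from the header, and accept the page only if its checksum verifies. The CRC parameters (polynomial 0x04c11db7, zero initial value, no bit reflection) are those given in the framing specification; `ogg_crc` and `capture` are illustrative names, not libogg functions.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32 as used by Ogg pages (MSB first, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = (crc << 1) ^ 0x04C11DB7 if crc & 0x80000000 else crc << 1
            crc &= 0xFFFFFFFF
    return crc


def capture(buf: bytes, start: int = 0):
    """Return (begin, end) of the first verifiable page at or after start."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            header_len = 27 + buf[pos + 26]            # header + length table
            if pos + header_len <= len(buf):
                end = pos + header_len + sum(buf[pos + 27:pos + header_len])
                if end <= len(buf):
                    page = bytearray(buf[pos:end])
                    stored = struct.unpack_from("<I", page, 22)[0]
                    page[22:26] = b"\0\0\0\0"          # CRC is computed with the field zeroed
                    if ogg_crc(bytes(page)) == stored:
                        return pos, end
        pos = buf.find(b"OggS", pos + 1)               # false sync; keep looking
    return None
```

Because a page is bounded in size, a scan like this needs only a bounded amount of data before it either locks on or rejects a candidate.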

Seeking

- -

Ogg implements simple coarse- and fine-grained seeking by design. - -

Coarse seeking may be performed by simply 'moving the tone arm' to a -new position and 'dropping the needle'. Rapid capture with -accompanying timecode from any location in an Ogg file is guaranteed -by the stream design. From the acquisition of the first timecode, -all data needed to play back from that time code forward is ahead of -the stream cursor. - -

Ogg implements full sample-granularity seeking using an -interpolated bisection search built on the capture and timecode -mechanisms used by coarse seeking. As above, once a search finds -the desired timecode, all data needed to play back from that time code -forward is ahead of the stream cursor. - -
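
The following is a toy model of interpolated bisection, not the algorithm used by any particular library. It assumes the stream is represented as a list of `(byte_offset, granule_position)` pairs and replaces real capture with a lookup of the first page at or after a byte position; `seek` and `capture` are hypothetical names. A production implementation would typically stop bisecting once the window is about a page wide and scan linearly from there.

```python
import bisect


def seek(pages, target):
    """pages: list of (byte_offset, granule_position), sorted by offset.
    Returns the offset of the first page whose granule position >= target."""
    if not pages or pages[-1][1] < target:
        return None
    offsets = [offset for offset, _ in pages]

    def capture(byte_pos):
        # Stand-in for "dropping the needle": first page starting at or after byte_pos.
        return pages[bisect.bisect_left(offsets, byte_pos)]

    best, hi_gran = pages[-1]   # a page known to satisfy the target
    lo, lo_gran = 0, 0          # every page starting before lo is too early
    end = best                  # an earlier match can only start in [lo, end)
    use_midpoint = False
    while lo < end:
        if use_midpoint:
            guess = (lo + end) // 2
        else:
            # Interpolate the byte position from the granule positions seen so far.
            span = max(hi_gran - lo_gran, 1)
            guess = lo + (end - lo) * (target - lo_gran) // span
        guess = min(max(guess, lo), end - 1)
        offset, gran = capture(guess)
        if offset < end and gran < target:
            lo, lo_gran = offset + 1, gran
            use_midpoint = False
        else:
            if offset < end:
                best, hi_gran = offset, gran
            # Fall back to plain bisection when the guess found nothing new.
            use_midpoint = offset >= end
            end = guess
    return best
```

The search needs nothing beyond the per-page timestamps and the ability to capture from an arbitrary byte position, which is why no index is required.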

Both coarse and fine seeking use the page structure and sequencing -inherent to the Ogg format. All Ogg streams are fully seekable from -creation; seekability is unaffected by truncation or missing data, and -is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor -heuristic. - -

Seeking without use of an index is a major point of the Ogg -design. There are two primary reasons why Ogg transport forgoes an index: - -

    - -
  1. An index is only marginally useful in Ogg for the complexity -added; it adds no new functionality and seldom improves performance -noticeably. Empirical testing shows that indexless interpolation -search does not require many more seeks in practice than using an -index would. - -
  2. 'Optional' indexes encourage lazy implementations that can seek -only when indexes are present, or that implement indexless seeking -only by building an internal index after reading the entire file -beginning to end. This has been the fate of other containers that -specify optional indexing. - -
- -

In addition, it must be possible to create an Ogg stream in a -single pass. Although an optional index can simply be tacked on the -end of the created stream, some software groups object to -end-positioned indexes and claim to be unwilling to support indexes -not located at the stream beginning. - -

All this said, it's become clear that an optional index is a -demanded feature. For this reason, the OggSkeleton now defines a -proposed index. - -

Simple multiplexing

- -

Ogg multiplexes streams by interleaving pages from multiple elementary streams into a -multiplexed stream in time order. The multiplexed pages are not -altered. Muxing an Ogg AV stream out of separate audio, -video and data streams is akin to shuffling several decks of cards -together into a single deck; the cards themselves remain unchanged. -Demultiplexing is similarly simple (as the cards are marked). - -
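
The card-shuffling analogy can be sketched in a few lines. This assumes each page already carries a timestamp derived from its granule position and that each elementary stream is itself in time order; `Page`, `mux` and `demux` are illustrative names.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Page(NamedTuple):  # simplified stand-in for a real Ogg page
    time: float          # timestamp the codec derived from the granule position
    serial: int          # stream serial number (the "mark" on the card)
    payload: bytes


def mux(*streams):
    """Interleave whole pages from several elementary streams in time order."""
    return list(heapq.merge(*streams, key=lambda page: page.time))


def demux(physical):
    """Route each page back to its logical stream by serial number."""
    logical = defaultdict(list)
    for page in physical:
        logical[page.serial].append(page)
    return dict(logical)
```

Neither direction inspects or rewrites a page; `demux(mux(a, b))` hands back the original pages unchanged.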

The goal of this design is to make the mux/demux operation as -trivial as possible to allow live streaming systems to build and -rebuild streams on the fly with minimal CPU usage and no additional -storage or latency requirements. - -

Continuous and Discontinuous Media

- -

Ogg streams belong to one of two categories, "Continuous" streams and -"Discontinuous" streams. - -

A stream that provides a gapless, time-continuous media type with a -fine-grained timebase is considered to be 'Continuous'. A continuous -stream should never be starved of data. Examples of continuous data -types include broadcast audio and video. - -

A stream that delivers data in a potentially irregular pattern or -with widely spaced timing gaps is considered to be 'Discontinuous'. A -discontinuous stream may be best thought of as data representing -scattered events; although they happen in order, they are typically -unconnected data often located far apart. One example of a -discontinuous stream type would be captioning such as Ogg Kate. Although it's -possible to design captions as a continuous stream type, it's most -natural to think of captions as widely spaced pieces of text with -little happening between. - -

The fundamental reason for distinction between continuous and -discontinuous streams concerns buffering. - -

Buffering

- -

A continuous stream is, by definition, gapless. Ogg buffering is based -on the simple premise of never allowing an active continuous stream -to starve for data during decode; buffering works ahead until all -continuous streams in a physical stream have data ready and no further. - -

Discontinuous stream data is not assumed to be predictable. The -buffering design takes discontinuous data 'as it comes' rather than -working ahead to look for future discontinuous data for a potentially -unbounded period. Thus, the buffering process makes no attempt to fill -discontinuous stream buffers; their pages simply 'fall out' of the -stream when continuous streams are handled properly. - -

Buffering requirements in this design need not be explicitly -declared or managed in the encoded stream. The decoder simply reads as -much data as is necessary to keep all continuous stream types gapless -and no more, with discontinuous data processed as it arrives in the -continuous data. Buffering is implicitly optimal for the given -stream. Because all pages of all data types are stamped with absolute -timing information within the stream, inter-stream synchronization -timing is always maintained without the need for explicitly declared -buffer-ahead hinting. - -
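
A minimal sketch of this buffering rule, assuming pages arrive as `(serial, payload)` pairs and the set of continuous stream serial numbers is already known from the codecs; `buffer_ahead` is a hypothetical helper.

```python
from collections import defaultdict, deque


def buffer_ahead(pages, buffers, continuous):
    """Read pages until every continuous stream has data ready, and no further.

    pages: iterator of (serial, payload); buffers: defaultdict(deque).
    Returns False if the input ran out first.
    """
    while not all(buffers[serial] for serial in continuous):
        try:
            serial, payload = next(pages)
        except StopIteration:
            return False
        # Discontinuous pages are queued as they come; they never drive reading.
        buffers[serial].append(payload)
    return True


# Streams 1 (audio) and 2 (video) are continuous; stream 3 carries captions.
pages = iter([(1, b"a0"), (3, b"caption"), (1, b"a1"), (2, b"v0"), (1, b"a2")])
buffers = defaultdict(deque)
buffer_ahead(pages, buffers, continuous={1, 2})
```

In this example the caption page is picked up on the way to the first video page, and reading stops before the final audio page.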

Codec metadata

- -

Ogg does not replicate codec-specific metadata into the mux layer -in an attempt to make the mux and codec layer implementations 'fully -separable'. Things like specific timebase, keyframing strategy, frame -duration, etc, do not appear in the Ogg container. The mux layer is, -instead, expected to query a codec through a centralized interface, -left to the implementation, for this data when it is needed. - -

Though modern design wisdom usually prefers to predict all possible -needs of current and future codecs then embed these dependencies and -the required metadata into the container itself, this strategy -increases container specification complexity, fragility, and rigidity. -The mux and codec code becomes more independent, but the -specifications become logically less independent. A codec can't do -what a container hasn't already provided for. Novel codecs are harder -to support, and you can do fewer useful things with the ones you've -already got (eg, try to make a good splitter without using any codecs. -Such a splitter is limited to splitting at keyframes only, or building -yet another new mechanism into the container layer to mark what frames -to skip displaying). - -

Ogg's design goes the opposite direction, where the specification -is to be as simple, easy to understand, and 'proofed' against novel -codecs as possible. When an Ogg mux layer requires codec-specific -information, it queries the codec (or a codec stub). This trades a -more complex implementation for a simpler, more flexible -specification. - -

Stream structure metadata

- -

The Ogg container itself does not define a metadata system for -declaring the structure and interrelations between multiple media -types in a muxed stream. That is, the Ogg container itself does not -specify data like 'which stream is the subtitle stream?' or 'which -video stream is the primary angle?'. This metadata still exists, but -is stored by the Ogg container rather than being built into the Ogg -container itself. Xiph specifies the 'Skeleton' metadata format for Ogg -streams, but this decoupling of container and stream structure -metadata means it is possible to use Ogg with any metadata -specification without altering the container itself, or without stream -structure metadata at all. - -

Frame accurate absolute position

- -

Every Ogg page is stamped with a 64 bit 'granule position' that -serves as an absolute timestamp for mux and seeking. A few nifty -little tricks are usually also embedded in the granpos state, but -we'll leave those aside for the moment (strictly speaking, they're -part of each codec's mapping, not Ogg). - -

As mentioned above, granule positions are mapped into -absolute timestamps by the codec, rather than being a hard timestamp. -This allows maximally efficient use of the available 64 bits to -address every sample/frame position without approximation while -supporting new and previously unknown timebase encodings without -needing to extend or update the mux layer. When a codec needs a novel -timebase, it simply brings the code for that mapping along with it. -This is not a theoretical curiosity; new, wholly novel timebases were -deployed with the adoption of both Theora and Dirac. "Rolling INTRA" -(keyframeless video) also benefits from novel use of the granule -position. - -
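
To make the idea of a codec-supplied mapping concrete, here are two simplified mappings. The Vorbis-style one treats the granule position as a PCM sample count. The Theora-style one splits it into a keyframe number and an offset from that keyframe; the exact frame-counting convention varies between Theora bitstream versions, so treat this as an approximation. Function names and parameters are illustrative only.

```python
from fractions import Fraction


def vorbis_time(granpos: int, sample_rate: int) -> Fraction:
    """Granule position counts PCM samples; time is samples / rate."""
    return Fraction(granpos, sample_rate)


def theora_time(granpos: int, shift: int, fps: Fraction) -> Fraction:
    """High bits: frame number of the last keyframe. Low `shift` bits:
    frames elapsed since that keyframe. (Simplified; see the lead-in.)"""
    keyframe = granpos >> shift
    offset = granpos & ((1 << shift) - 1)
    return (keyframe + offset) / fps


# One second of 48 kHz audio, and frame 15 of 30 fps video (keyframe 10 + 5).
print(vorbis_time(48000, 48000), theora_time((10 << 6) | 5, 6, Fraction(30)))
```

The mux layer only ever calls a mapping like these; it never needs to know what the bits inside the granule position mean.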

Ogg stream arrangement

- -

Packets, pages, and bitstreams

- -

Ogg codecs place raw compressed data into packets. -Packets are octet payloads containing the data needed for a single -decompressed unit, eg, one video frame. Packets have no maximum size -and may be zero length. They do not generally have any framing -information; strung together, the unframed packets form a logical -bitstream of codec data with no internal landmarks. - -

- - -

Packets of raw codec data are not typically internally framed. - When they are strung together into a stream without any container to - provide framing, they lose their individual boundaries. Seek and - capture are not possible within an unframed stream, and for many - codecs with variable length payloads and/or early-packet termination - (such as Vorbis), it may become impossible to recover the original - frame boundaries even if the stream is scanned linearly from - beginning to end. - -

- -

Logical bitstream packets are grouped and framed into Ogg pages -along with a unique stream serial number to produce a -physical bitstream. An elementary stream is a -physical bitstream containing only a single logical bitstream. Each -page is a self contained entity, although a packet may be split and -encoded across one or more pages. The page decode mechanism is -designed to recognize, verify and handle single pages at a time from -the overall bitstream. - -

- - -

The primary purpose of a container is to provide framing for raw - packets, marking the packet boundaries so the exact packets can be - retrieved for decode later. The container also provides secondary - functions such as capture, timestamping, sequencing, stream - identification and so on. Not all of these functions are represented in the diagram. - -

In the Ogg container, pages do not necessarily contain - integer numbers of packets. Packets may span across page boundaries - or even multiple pages. This is necessary as pages have a maximum - possible size in order to provide capture guarantees, but packet - size is unbounded. -
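
How a packet boundary survives a page boundary comes down to the packet-length table. In the framing specification each packet is recorded as a run of 255s followed by one value below 255, so a table that ends on 255 means the packet continues on the next page. The helper names below are made up for this sketch.

```python
def lacing_values(length: int) -> list:
    """Length-table entries for one packet of the given size."""
    # A packet whose size is an exact multiple of 255 ends with a 0 entry.
    return [255] * (length // 255) + [length % 255]


def packet_lengths(lacing, carried=None):
    """Decode one page's length table.

    carried: bytes of a packet left unfinished by the previous page, or None.
    Returns (completed packet sizes, bytes of a still-unfinished packet or None).
    """
    done, current = [], carried
    for value in lacing:
        current = (current or 0) + value
        if value < 255:          # any value below 255 terminates the packet
            done.append(current)
            current = None
    return done, current


# A 600-byte packet, a 10-byte packet, then the first 255 bytes of a third.
sizes, pending = packet_lengths([255, 255, 90, 10, 255])
# The next page finishes the third packet with 45 more bytes.
more, _ = packet_lengths([45], carried=pending)
```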

- - -

Ogg Bitstream Framing specifies -the page format of an Ogg bitstream, the packet coding process -and elementary bitstreams in detail. - -

Multiplexed bitstreams

- -

Multiple logical/elementary bitstreams can be combined into a single -multiplexed bitstream by interleaving whole pages from each -contributing elementary stream in time order. The result is a single -physical stream that multiplexes and frames multiple logical streams. -Each logical stream is identified by the unique stream serial number -stamped in its pages. A physical stream may include a 'meta-header' -(such as the Ogg Skeleton) comprising its -own Ogg page at the beginning of the physical stream. A decoder -recovers the original logical/elementary bitstreams out of the -physical bitstream by taking the pages in order from the physical -bitstream and redirecting them into the appropriate logical decoding -entity. - -

- - -

Multiple media types are multiplexed into a single Ogg stream by -interleaving the pages from each elementary physical stream. - -

- -

Ogg Bitstream Multiplexing specifies -proper multiplexing of an Ogg bitstream in detail. - -

Chaining

- -

Multiple Ogg physical bitstreams may be concatenated into a single new -stream; this is chaining. The bitstreams do not overlap; the -final page of a given logical bitstream is immediately followed by the -initial page of the next.

- -

Each logical bitstream in a chain must have a unique serial number -within the scope of the full physical bitstream, not only within a -particular link or segment of the chain.

- -

Continuous and discontinuous streams

- -

Within Ogg, each stream must be declared (by the codec) to be -continuous- or discontinuous-time. Most codecs treat all streams they -use as either inherently continuous- or discontinuous-time, although -this is not a requirement. A codec may, as part of its mapping, choose -according to data in the initial header. - -

Continuous-time pages are stamped by end-time, discontinuous pages -are stamped by begin-time. Pages in a multiplexed stream are -interleaved in order of the time stamp regardless of stream type. -Both continuous and discontinuous logical streams are used to seek -within a physical stream, however only continuous streams are used to -determine buffering depth; because discontinuous streams are stamped -by start time, they will always 'fall out' at the proper time when -buffering the continuous streams. See 'Examples' for an illustration -of the buffering mechanism. - -

Multiplexing Requirements

- -

Multiplexing requirements within Ogg are straightforward. When -constructing a single-link (unchained) physical bitstream consisting -of multiple elementary streams: - -

    - -
  1. The initial header for each stream appears in sequence, each -header on a single page. All initial headers must appear with no -intervening data (no auxiliary header pages or packets, no data pages -or packets). Order of the initial headers is unspecified. The -'beginning of stream' flag is set on each initial header. - -

  2. All auxiliary headers for all streams must follow. Order -is unspecified. The final auxiliary header of each stream must flush -its page. - -

  3. Data pages for each stream follow, interleaved in time order. - -

  4. The final page of each stream sets the 'end of stream' flag. -Unlike initial pages, terminal pages for the logical bitstreams need -not occur contiguously; indeed it may not be possible for them to do so. -

- -

Each grouped bitstream must have a unique serial number within the -scope of the physical bitstream.
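
The ordering rules above lend themselves to a small checker. The sketch below looks only at each page's serial number and header flags (the flag bit values are from the framing specification) and does not distinguish auxiliary header pages from data pages or check time order; `check_link` is a hypothetical name.

```python
BOS, EOS = 0x02, 0x04   # 'beginning of stream' / 'end of stream' flag bits


def check_link(pages):
    """pages: iterable of (serial, flags) for one unchained physical stream.
    Returns a list of human-readable problems (empty if none were found)."""
    problems, started, ended = [], set(), set()
    headers_done = False
    for serial, flags in pages:
        if flags & BOS:
            if headers_done:
                problems.append(f"stream {serial}: initial header after other pages")
            if serial in started:
                problems.append(f"stream {serial}: duplicate serial number")
            started.add(serial)
        else:
            headers_done = True
            if serial not in started:
                problems.append(f"stream {serial}: page before initial header")
        if serial in ended:
            problems.append(f"stream {serial}: page after end of stream")
        if flags & EOS:
            ended.add(serial)
    problems += [f"stream {s}: no end-of-stream page" for s in sorted(started - ended)]
    return problems
```

Note that the checker accepts terminal pages in any position, matching the rule that they need not be contiguous.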

- -

Chaining and multiplexing

- -

Multiplexed and/or unmultiplexed bitstreams may be chained -consecutively. Such a physical bitstream obeys all the rules of both -chained and multiplexed streams. Each link, when unchained, must -stand on its own as a valid physical bitstream. Chained streams do -not mix or interleave; a new segment may not begin until all streams -in the preceding segment have terminated.
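
One way to unchain such a stream, sketched under a simplifying assumption: a 'beginning of stream' page that arrives while no stream is open starts a new link. This cannot tell apart the degenerate case of several single-page streams multiplexed within one link. `split_links` is an illustrative name, and pages are reduced to `(serial, flags)` pairs.

```python
BOS, EOS = 0x02, 0x04   # header flag bits from the framing specification


def split_links(pages):
    """Split a chained physical stream into its links (lists of pages)."""
    links, current, open_streams = [], [], set()
    for serial, flags in pages:
        # All streams of the previous link have terminated, so this
        # initial header must belong to the next link in the chain.
        if flags & BOS and current and not open_streams:
            links.append(current)
            current = []
        if flags & BOS:
            open_streams.add(serial)
        if flags & EOS:
            open_streams.discard(serial)
        current.append((serial, flags))
    if current:
        links.append(current)
    return links
```

Each returned link can then be treated as a standalone physical bitstream, as the rule above requires.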

- -

Codec Mapping Requirements

- -

Each codec is allowed some freedom in deciding how its logical -bitstream is encapsulated into an Ogg bitstream (even if it is a -trivial mapping, eg, 'plop the packets in and go'). This is the -codec's mapping. Ogg imposes a few mapping requirements -on any codec. - -

    - -
  1. The framing specification defines -'beginning of stream' and 'end of stream' page markers via a header -flag (it is possible for a stream to consist of a single page). A -correct stream always consists of an integer number of pages, an easy -requirement given the variable size nature of pages.

    - -
  2. The first page of an elementary Ogg bitstream consists of a single, -small 'initial header' packet that must include sufficient information -to identify the exact CODEC type. From this initial header, the codec -must also be able to determine its timebase and whether or not it is a -continuous- or discontinuous-time stream. The initial header must fit -on a single page. If a codec makes use of auxiliary headers (for -example, Vorbis uses two auxiliary headers), these headers must follow -the initial header immediately. The last header finishes its page; -data begins on a fresh page. - -

    As an example, Ogg Vorbis places the name and revision of the -Vorbis CODEC, the audio rate and the audio quality into this initial -header. Vorbis comments and detailed codec setup appear in the larger -auxiliary headers.

    - -
  3. Granule positions must be translatable to an exact absolute -time value. As described above, the mux layer is permitted to query a -codec or codec stub plugin to perform this mapping. It is not -necessary for an absolute time to be mappable into a single unique -granule position value. - -

  4. Codecs are not required to use a fixed duration-per-packet (for -example, Vorbis does not). The mux layer is permitted to query a -codec or codec stub plugin for the time duration of a packet. - -

  5. Although an absolute time need not be translatable to a unique -granule position, a codec must be able to determine the unique granule -position of the current packet using the granule position of a -preceding packet. - -

  6. Packets and pages must be arranged in ascending -granule-position and time order. - -

- -

Examples

- -[More to come shortly; this section is currently being revised and expanded] - -

Below, we present an example of a multiplexed and chained bitstream:

- -

stream

- -

In this example, we see pages from five total logical bitstreams -multiplexed into a physical bitstream. Note the following -characteristics:

- -
    -
  1. Multiplexed bitstreams in a given link begin together; all of the -initial pages must appear before any data pages. When concurrently -multiplexed groups are chained, the new group does not begin until all -the bitstreams in the previous group have terminated.
  2. The ordering of pages of concurrently multiplexed bitstreams is -governed by timestamp (not shown here); there is no regular -interleaving order. Pages within a logical bitstream appear in -sequence order.
- - - -