diff options
Diffstat (limited to 'zlib/doc/rfc1952.txt')
-rw-r--r-- | zlib/doc/rfc1952.txt | 675 |
1 files changed, 0 insertions, 675 deletions
diff --git a/zlib/doc/rfc1952.txt b/zlib/doc/rfc1952.txt deleted file mode 100644 index a8e51b4..0000000 --- a/zlib/doc/rfc1952.txt +++ /dev/null @@ -1,675 +0,0 @@ - - - - - - -Network Working Group P. Deutsch -Request for Comments: 1952 Aladdin Enterprises -Category: Informational May 1996 - - - GZIP file format specification version 4.3 - -Status of This Memo - - This memo provides information for the Internet community. This memo - does not specify an Internet standard of any kind. Distribution of - this memo is unlimited. - -IESG Note: - - The IESG takes no position on the validity of any Intellectual - Property Rights statements contained in this document. - -Notices - - Copyright (c) 1996 L. Peter Deutsch - - Permission is granted to copy and distribute this document for any - purpose and without charge, including translations into other - languages and incorporation into compilations, provided that the - copyright notice and this notice are preserved, and that any - substantive changes or deletions from the original are clearly - marked. - - A pointer to the latest version of this and related documentation in - HTML format can be found at the URL - <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>. - -Abstract - - This specification defines a lossless compressed data format that is - compatible with the widely used GZIP utility. The format includes a - cyclic redundancy check value for detecting data corruption. The - format presently uses the DEFLATE method of compression but can be - easily extended to use other compression methods. The format can be - implemented readily in a manner not covered by patents. - - - - - - - - - - -Deutsch Informational [Page 1] - -RFC 1952 GZIP File Format Specification May 1996 - - -Table of Contents - - 1. Introduction ................................................... 2 - 1.1. Purpose ................................................... 2 - 1.2. Intended audience ......................................... 3 - 1.3. Scope ..................................................... 3 - 1.4. Compliance ................................................ 3 - 1.5. Definitions of terms and conventions used ................. 3 - 1.6. Changes from previous versions ............................ 3 - 2. Detailed specification ......................................... 4 - 2.1. Overall conventions ....................................... 4 - 2.2. File format ............................................... 5 - 2.3. Member format ............................................. 5 - 2.3.1. Member header and trailer ........................... 6 - 2.3.1.1. Extra field ................................... 8 - 2.3.1.2. Compliance .................................... 9 - 3. References .................................................. 9 - 4. Security Considerations .................................... 10 - 5. Acknowledgements ........................................... 10 - 6. Author's Address ........................................... 10 - 7. Appendix: Jean-Loup Gailly's gzip utility .................. 11 - 8. Appendix: Sample CRC Code .................................. 11 - -1. Introduction - - 1.1. Purpose - - The purpose of this specification is to define a lossless - compressed data format that: - - * Is independent of CPU type, operating system, file system, - and character set, and hence can be used for interchange; - * Can compress or decompress a data stream (as opposed to a - randomly accessible file) to produce another data stream, - using only an a priori bounded amount of intermediate - storage, and hence can be used in data communications or - similar structures such as Unix filters; - * Compresses data with efficiency comparable to the best - currently available general-purpose compression methods, - and in particular considerably better than the "compress" - program; - * Can be implemented readily in a manner not covered by - patents, and hence can be practiced freely; - * Is compatible with the file format produced by the current - widely used gzip utility, in that conforming decompressors - will be able to read data produced by the existing gzip - compressor. - - - - -Deutsch Informational [Page 2] - -RFC 1952 GZIP File Format Specification May 1996 - - - The data format defined by this specification does not attempt to: - - * Provide random access to compressed data; - * Compress specialized data (e.g., raster graphics) as well as - the best currently available specialized algorithms. - - 1.2. Intended audience - - This specification is intended for use by implementors of software - to compress data into gzip format and/or decompress data from gzip - format. - - The text of the specification assumes a basic background in - programming at the level of bits and other primitive data - representations. - - 1.3. Scope - - The specification specifies a compression method and a file format - (the latter assuming only that a file can store a sequence of - arbitrary bytes). It does not specify any particular interface to - a file system or anything about character sets or encodings - (except for file names and comments, which are optional). - - 1.4. Compliance - - Unless otherwise indicated below, a compliant decompressor must be - able to accept and decompress any file that conforms to all the - specifications presented here; a compliant compressor must produce - files that conform to all the specifications presented here. The - material in the appendices is not part of the specification per se - and is not relevant to compliance. - - 1.5. Definitions of terms and conventions used - - byte: 8 bits stored or transmitted as a unit (same as an octet). - (For this specification, a byte is exactly 8 bits, even on - machines which store a character on a number of bits different - from 8.) See below for the numbering of bits within a byte. - - 1.6. Changes from previous versions - - There have been no technical changes to the gzip format since - version 4.1 of this specification. In version 4.2, some - terminology was changed, and the sample CRC code was rewritten for - clarity and to eliminate the requirement for the caller to do pre- - and post-conditioning. Version 4.3 is a conversion of the - specification to RFC style. - - - -Deutsch Informational [Page 3] - -RFC 1952 GZIP File Format Specification May 1996 - - -2. Detailed specification - - 2.1. Overall conventions - - In the diagrams below, a box like this: - - +---+ - | | <-- the vertical bars might be missing - +---+ - - represents one byte; a box like this: - - +==============+ - | | - +==============+ - - represents a variable number of bytes. - - Bytes stored within a computer do not have a "bit order", since - they are always treated as a unit. However, a byte considered as - an integer between 0 and 255 does have a most- and least- - significant bit, and since we write numbers with the most- - significant digit on the left, we also write bytes with the most- - significant bit on the left. In the diagrams below, we number the - bits of a byte so that bit 0 is the least-significant bit, i.e., - the bits are numbered: - - +--------+ - |76543210| - +--------+ - - This document does not address the issue of the order in which - bits of a byte are transmitted on a bit-sequential medium, since - the data format described here is byte- rather than bit-oriented. - - Within a computer, a number may occupy multiple bytes. All - multi-byte numbers in the format described here are stored with - the least-significant byte first (at the lower memory address). - For example, the decimal number 520 is stored as: - - 0 1 - +--------+--------+ - |00001000|00000010| - +--------+--------+ - ^ ^ - | | - | + more significant byte = 2 x 256 - + less significant byte = 8 - - - -Deutsch Informational [Page 4] - -RFC 1952 GZIP File Format Specification May 1996 - - - 2.2. File format - - A gzip file consists of a series of "members" (compressed data - sets). The format of each member is specified in the following - section. The members simply appear one after another in the file, - with no additional information before, between, or after them. - - 2.3. Member format - - Each member has the following structure: - - +---+---+---+---+---+---+---+---+---+---+ - |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->) - +---+---+---+---+---+---+---+---+---+---+ - - (if FLG.FEXTRA set) - - +---+---+=================================+ - | XLEN |...XLEN bytes of "extra field"...| (more-->) - +---+---+=================================+ - - (if FLG.FNAME set) - - +=========================================+ - |...original file name, zero-terminated...| (more-->) - +=========================================+ - - (if FLG.FCOMMENT set) - - +===================================+ - |...file comment, zero-terminated...| (more-->) - +===================================+ - - (if FLG.FHCRC set) - - +---+---+ - | CRC16 | - +---+---+ - - +=======================+ - |...compressed blocks...| (more-->) - +=======================+ - - 0 1 2 3 4 5 6 7 - +---+---+---+---+---+---+---+---+ - | CRC32 | ISIZE | - +---+---+---+---+---+---+---+---+ - - - - -Deutsch Informational [Page 5] - -RFC 1952 GZIP File Format Specification May 1996 - - - 2.3.1. Member header and trailer - - ID1 (IDentification 1) - ID2 (IDentification 2) - These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 - (0x8b, \213), to identify the file as being in gzip format. - - CM (Compression Method) - This identifies the compression method used in the file. CM - = 0-7 are reserved. CM = 8 denotes the "deflate" - compression method, which is the one customarily used by - gzip and which is documented elsewhere. - - FLG (FLaGs) - This flag byte is divided into individual bits as follows: - - bit 0 FTEXT - bit 1 FHCRC - bit 2 FEXTRA - bit 3 FNAME - bit 4 FCOMMENT - bit 5 reserved - bit 6 reserved - bit 7 reserved - - If FTEXT is set, the file is probably ASCII text. This is - an optional indication, which the compressor may set by - checking a small amount of the input data to see whether any - non-ASCII characters are present. In case of doubt, FTEXT - is cleared, indicating binary data. For systems which have - different file formats for ascii text and binary data, the - decompressor can use FTEXT to choose the appropriate format. - We deliberately do not specify the algorithm used to set - this bit, since a compressor always has the option of - leaving it cleared and a decompressor always has the option - of ignoring it and letting some other program handle issues - of data conversion. - - If FHCRC is set, a CRC16 for the gzip header is present, - immediately before the compressed data. The CRC16 consists - of the two least significant bytes of the CRC32 for all - bytes of the gzip header up to and not including the CRC16. - [The FHCRC bit was never set by versions of gzip up to - 1.2.4, even though it was documented with a different - meaning in gzip 1.2.4.] - - If FEXTRA is set, optional extra fields are present, as - described in a following section. - - - -Deutsch Informational [Page 6] - -RFC 1952 GZIP File Format Specification May 1996 - - - If FNAME is set, an original file name is present, - terminated by a zero byte. The name must consist of ISO - 8859-1 (LATIN-1) characters; on operating systems using - EBCDIC or any other character set for file names, the name - must be translated to the ISO LATIN-1 character set. This - is the original name of the file being compressed, with any - directory components removed, and, if the file being - compressed is on a file system with case insensitive names, - forced to lower case. There is no original file name if the - data was compressed from a source other than a named file; - for example, if the source was stdin on a Unix system, there - is no file name. - - If FCOMMENT is set, a zero-terminated file comment is - present. This comment is not interpreted; it is only - intended for human consumption. The comment must consist of - ISO 8859-1 (LATIN-1) characters. Line breaks should be - denoted by a single line feed character (10 decimal). - - Reserved FLG bits must be zero. - - MTIME (Modification TIME) - This gives the most recent modification time of the original - file being compressed. The time is in Unix format, i.e., - seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this - may cause problems for MS-DOS and other systems that use - local rather than Universal time.) If the compressed data - did not come from a file, MTIME is set to the time at which - compression started. MTIME = 0 means no time stamp is - available. - - XFL (eXtra FLags) - These flags are available for use by specific compression - methods. The "deflate" method (CM = 8) sets these flags as - follows: - - XFL = 2 - compressor used maximum compression, - slowest algorithm - XFL = 4 - compressor used fastest algorithm - - OS (Operating System) - This identifies the type of file system on which compression - took place. This may be useful in determining end-of-line - convention for text files. The currently defined values are - as follows: - - - - - - -Deutsch Informational [Page 7] - -RFC 1952 GZIP File Format Specification May 1996 - - - 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32) - 1 - Amiga - 2 - VMS (or OpenVMS) - 3 - Unix - 4 - VM/CMS - 5 - Atari TOS - 6 - HPFS filesystem (OS/2, NT) - 7 - Macintosh - 8 - Z-System - 9 - CP/M - 10 - TOPS-20 - 11 - NTFS filesystem (NT) - 12 - QDOS - 13 - Acorn RISCOS - 255 - unknown - - XLEN (eXtra LENgth) - If FLG.FEXTRA is set, this gives the length of the optional - extra field. See below for details. - - CRC32 (CRC-32) - This contains a Cyclic Redundancy Check value of the - uncompressed data computed according to CRC-32 algorithm - used in the ISO 3309 standard and in section 8.1.1.6.2 of - ITU-T recommendation V.42. (See http://www.iso.ch for - ordering ISO documents. See gopher://info.itu.ch for an - online version of ITU-T V.42.) - - ISIZE (Input SIZE) - This contains the size of the original (uncompressed) input - data modulo 2^32. - - 2.3.1.1. Extra field - - If the FLG.FEXTRA bit is set, an "extra field" is present in - the header, with total length XLEN bytes. It consists of a - series of subfields, each of the form: - - +---+---+---+---+==================================+ - |SI1|SI2| LEN |... LEN bytes of subfield data ...| - +---+---+---+---+==================================+ - - SI1 and SI2 provide a subfield ID, typically two ASCII letters - with some mnemonic value. Jean-Loup Gailly - <gzip@prep.ai.mit.edu> is maintaining a registry of subfield - IDs; please send him any subfield ID you wish to use. Subfield - IDs with SI2 = 0 are reserved for future use. The following - IDs are currently defined: - - - -Deutsch Informational [Page 8] - -RFC 1952 GZIP File Format Specification May 1996 - - - SI1 SI2 Data - ---------- ---------- ---- - 0x41 ('A') 0x70 ('P') Apollo file type information - - LEN gives the length of the subfield data, excluding the 4 - initial bytes. - - 2.3.1.2. Compliance - - A compliant compressor must produce files with correct ID1, - ID2, CM, CRC32, and ISIZE, but may set all the other fields in - the fixed-length part of the header to default values (255 for - OS, 0 for all others). The compressor must set all reserved - bits to zero. - - A compliant decompressor must check ID1, ID2, and CM, and - provide an error indication if any of these have incorrect - values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC - at least so it can skip over the optional fields if they are - present. It need not examine any other part of the header or - trailer; in particular, a decompressor may ignore FTEXT and OS - and always produce binary output, and still be compliant. A - compliant decompressor must give an error indication if any - reserved bit is non-zero, since such a bit could indicate the - presence of a new field that would cause subsequent data to be - interpreted incorrectly. - -3. References - - [1] "Information Processing - 8-bit single-byte coded graphic - character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987). - The ISO 8859-1 (Latin-1) character set is a superset of 7-bit - ASCII. Files defining this character set are available as - iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/ - - [2] ISO 3309 - - [3] ITU-T recommendation V.42 - - [4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification", - available in ftp://ftp.uu.net/pub/archiving/zip/doc/ - - [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in - ftp://prep.ai.mit.edu/pub/gnu/ - - [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table - Look-Up", Communications of the ACM, 31(8), pp.1008-1013. - - - - -Deutsch Informational [Page 9] - -RFC 1952 GZIP File Format Specification May 1996 - - - [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal, - pp.118-133. - - [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt, - describing the CRC concept. - -4. Security Considerations - - Any data compression method involves the reduction of redundancy in - the data. Consequently, any corruption of the data is likely to have - severe effects and be difficult to correct. Uncompressed text, on - the other hand, will probably still be readable despite the presence - of some corrupted bytes. - - It is recommended that systems using this data format provide some - means of validating the integrity of the compressed data, such as by - setting and checking the CRC-32 check value. - -5. Acknowledgements - - Trademarks cited in this document are the property of their - respective owners. - - Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler, - the related software described in this specification. Glenn - Randers-Pehrson converted this document to RFC and HTML format. - -6. Author's Address - - L. Peter Deutsch - Aladdin Enterprises - 203 Santa Margarita Ave. - Menlo Park, CA 94025 - - Phone: (415) 322-0103 (AM only) - FAX: (415) 322-1734 - EMail: <ghost@aladdin.com> - - Questions about the technical content of this specification can be - sent by email to: - - Jean-Loup Gailly <gzip@prep.ai.mit.edu> and - Mark Adler <madler@alumni.caltech.edu> - - Editorial comments on this specification can be sent by email to: - - L. Peter Deutsch <ghost@aladdin.com> and - Glenn Randers-Pehrson <randeg@alumni.rpi.edu> - - - -Deutsch Informational [Page 10] - -RFC 1952 GZIP File Format Specification May 1996 - - -7. Appendix: Jean-Loup Gailly's gzip utility - - The most widely used implementation of gzip compression, and the - original documentation on which this specification is based, were - created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>. Since this - implementation is a de facto standard, we mention some more of its - features here. Again, the material in this section is not part of - the specification per se, and implementations need not follow it to - be compliant. - - When compressing or decompressing a file, gzip preserves the - protection, ownership, and modification time attributes on the local - file system, since there is no provision for representing protection - attributes in the gzip file format itself. Since the file format - includes a modification time, the gzip decompressor provides a - command line switch that assigns the modification time from the file, - rather than the local modification time of the compressed input, to - the decompressed output. - -8. Appendix: Sample CRC Code - - The following sample code represents a practical implementation of - the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42 - for a formal specification.) - - The sample code is in the ANSI C programming language. Non C users - may find it easier to read with these hints: - - & Bitwise AND operator. - ^ Bitwise exclusive-OR operator. - >> Bitwise right shift operator. When applied to an - unsigned quantity, as here, right shift inserts zero - bit(s) at the left. - ! Logical NOT operator. - ++ "n++" increments the variable n. - 0xNNN 0x introduces a hexadecimal (base 16) constant. - Suffix L indicates a long value (at least 32 bits). - - /* Table of CRCs of all 8-bit messages. */ - unsigned long crc_table[256]; - - /* Flag: has the table been computed? Initially false. */ - int crc_table_computed = 0; - - /* Make the table for a fast CRC. */ - void make_crc_table(void) - { - unsigned long c; - - - -Deutsch Informational [Page 11] - -RFC 1952 GZIP File Format Specification May 1996 - - - int n, k; - for (n = 0; n < 256; n++) { - c = (unsigned long) n; - for (k = 0; k < 8; k++) { - if (c & 1) { - c = 0xedb88320L ^ (c >> 1); - } else { - c = c >> 1; - } - } - crc_table[n] = c; - } - crc_table_computed = 1; - } - - /* - Update a running crc with the bytes buf[0..len-1] and return - the updated crc. The crc should be initialized to zero. Pre- and - post-conditioning (one's complement) is performed within this - function so it shouldn't be done by the caller. Usage example: - - unsigned long crc = 0L; - - while (read_buffer(buffer, length) != EOF) { - crc = update_crc(crc, buffer, length); - } - if (crc != original_crc) error(); - */ - unsigned long update_crc(unsigned long crc, - unsigned char *buf, int len) - { - unsigned long c = crc ^ 0xffffffffL; - int n; - - if (!crc_table_computed) - make_crc_table(); - for (n = 0; n < len; n++) { - c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8); - } - return c ^ 0xffffffffL; - } - - /* Return the CRC of the bytes buf[0..len-1]. */ - unsigned long crc(unsigned char *buf, int len) - { - return update_crc(0L, buf, len); - } - - - - -Deutsch Informational [Page 12] - |