summaryrefslogtreecommitdiffstats
path: root/lzma.txt
diff options
context:
space:
mode:
Diffstat (limited to 'lzma.txt')
-rwxr-xr-xlzma.txt598
1 files changed, 598 insertions, 0 deletions
diff --git a/lzma.txt b/lzma.txt
new file mode 100755
index 0000000..579f2cc
--- /dev/null
+++ b/lzma.txt
@@ -0,0 +1,598 @@
+LZMA SDK 9.20
+-------------
+
+LZMA SDK provides the documentation, samples, header files, libraries,
+and tools you need to develop applications that use LZMA compression.
+
+LZMA is default and general compression method of 7z format
+in 7-Zip compression program (www.7-zip.org). LZMA provides high
+compression ratio and very fast decompression.
+
+LZMA is an improved version of famous LZ77 compression algorithm.
+It was improved in way of maximum increasing of compression ratio,
+keeping high decompression speed and low memory requirements for
+decompressing.
+
+
+
+LICENSE
+-------
+
+LZMA SDK is written and placed in the public domain by Igor Pavlov.
+
+Some code in LZMA SDK is based on public domain code from another developers:
+ 1) PPMd var.H (2001): Dmitry Shkarin
+ 2) SHA-256: Wei Dai (Crypto++ library)
+
+
+LZMA SDK Contents
+-----------------
+
+LZMA SDK includes:
+
+ - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
+ - Compiled file->file LZMA compressing/decompressing program for Windows system
+
+
+UNIX/Linux version
+------------------
+To compile C++ version of file->file LZMA encoding, go to directory
+CPP/7zip/Bundles/LzmaCon
+and call make to recompile it:
+ make -f makefile.gcc clean all
+
+In some UNIX/Linux versions you must compile LZMA with static libraries.
+To compile with static libraries, you can use
+LIB = -lm -static
+
+
+Files
+---------------------
+lzma.txt - LZMA SDK description (this file)
+7zFormat.txt - 7z Format description
+7zC.txt - 7z ANSI-C Decoder description
+methods.txt - Compression method IDs for .7z
+lzma.exe - Compiled file->file LZMA encoder/decoder for Windows
+7zr.exe - 7-Zip with 7z/lzma/xz support.
+history.txt - history of the LZMA SDK
+
+
+Source code structure
+---------------------
+
+C/ - C files
+ 7zCrc*.* - CRC code
+ Alloc.* - Memory allocation functions
+ Bra*.* - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
+ LzFind.* - Match finder for LZ (LZMA) encoders
+ LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
+ LzHash.h - Additional file for LZ match finder
+ LzmaDec.* - LZMA decoding
+ LzmaEnc.* - LZMA encoding
+ LzmaLib.* - LZMA Library for DLL calling
+ Types.h - Basic types for another .c files
+ Threads.* - The code for multithreading.
+
+ LzmaLib - LZMA Library (.DLL for Windows)
+
+ LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).
+
+ Archive - files related to archiving
+ 7z - 7z ANSI-C Decoder
+
+CPP/ -- CPP files
+
+ Common - common files for C++ projects
+ Windows - common files for Windows related code
+
+ 7zip - files related to 7-Zip Project
+
+ Common - common files for 7-Zip
+
+ Compress - files related to compression/decompression
+
+ Archive - files related to archiving
+
+ Common - common files for archive handling
+ 7z - 7z C++ Encoder/Decoder
+
+ Bundles - Modules that are bundles of other modules
+
+ Alone7z - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
+ LzmaCon - lzma.exe: LZMA compression/decompression
+ Format7zR - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
+ Format7zExtractR - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.
+
+ UI - User Interface files
+
+ Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
+ Common - Common UI files
+ Console - Code for console archiver
+
+
+
+CS/ - C# files
+ 7zip
+ Common - some common files for 7-Zip
+ Compress - files related to compression/decompression
+ LZ - files related to LZ (Lempel-Ziv) compression algorithm
+ LZMA - LZMA compression/decompression
+ LzmaAlone - file->file LZMA compression/decompression
+ RangeCoder - Range Coder (special code of compression/decompression)
+
+Java/ - Java files
+ SevenZip
+ Compression - files related to compression/decompression
+ LZ - files related to LZ (Lempel-Ziv) compression algorithm
+ LZMA - LZMA compression/decompression
+ RangeCoder - Range Coder (special code of compression/decompression)
+
+
+C/C++ source code of LZMA SDK is part of 7-Zip project.
+7-Zip source code can be downloaded from 7-Zip's SourceForge page:
+
+ http://sourceforge.net/projects/sevenzip/
+
+
+
+LZMA features
+-------------
+ - Variable dictionary size (up to 1 GB)
+ - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
+ - Estimated decompressing speed:
+ - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
+ - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
+ - Small memory requirements for decompressing (16 KB + DictionarySize)
+ - Small code size for decompressing: 5-8 KB
+
+LZMA decoder uses only integer operations and can be
+implemented in any modern 32-bit CPU (or on 16-bit CPU with some conditions).
+
+Some critical operations that affect the speed of LZMA decompression:
+ 1) 32*16 bit integer multiply
+ 2) Misspredicted branches (penalty mostly depends from pipeline length)
+ 3) 32-bit shift and arithmetic operations
+
+The speed of LZMA decompressing mostly depends from CPU speed.
+Memory speed has no big meaning. But if your CPU has small data cache,
+overall weight of memory speed will slightly increase.
+
+
+How To Use
+----------
+
+Using LZMA encoder/decoder executable
+--------------------------------------
+
+Usage: LZMA <e|d> inputFile outputFile [<switches>...]
+
+ e: encode file
+
+ d: decode file
+
+ b: Benchmark. There are two tests: compressing and decompressing
+ with LZMA method. Benchmark shows rating in MIPS (million
+ instructions per second). Rating value is calculated from
+ measured speed and it is normalized with Intel's Core 2 results.
+ Also Benchmark checks possible hardware errors (RAM
+ errors in most cases). Benchmark uses these settings:
+ (-a1, -d21, -fb32, -mfbt4). You can change only -d parameter.
+ Also you can change the number of iterations. Example for 30 iterations:
+ LZMA b 30
+ Default number of iterations is 10.
+
+<Switches>
+
+
+ -a{N}: set compression mode 0 = fast, 1 = normal
+ default: 1 (normal)
+
+ d{N}: Sets Dictionary size - [0, 30], default: 23 (8MB)
+ The maximum value for dictionary size is 1 GB = 2^30 bytes.
+ Dictionary size is calculated as DictionarySize = 2^N bytes.
+ For decompressing file compressed by LZMA method with dictionary
+ size D = 2^N you need about D bytes of memory (RAM).
+
+ -fb{N}: set number of fast bytes - [5, 273], default: 128
+ Usually big number gives a little bit better compression ratio
+ and slower compression process.
+
+ -lc{N}: set number of literal context bits - [0, 8], default: 3
+ Sometimes lc=4 gives gain for big files.
+
+ -lp{N}: set number of literal pos bits - [0, 4], default: 0
+ lp switch is intended for periodical data when period is
+ equal 2^N. For example, for 32-bit (4 bytes)
+ periodical data you can use lp=2. Often it's better to set lc0,
+ if you change lp switch.
+
+ -pb{N}: set number of pos bits - [0, 4], default: 2
+ pb switch is intended for periodical data
+ when period is equal 2^N.
+
+ -mf{MF_ID}: set Match Finder. Default: bt4.
+ Algorithms from hc* group doesn't provide good compression
+ ratio, but they often works pretty fast in combination with
+ fast mode (-a0).
+
+ Memory requirements depend from dictionary size
+ (parameter "d" in table below).
+
+ MF_ID Memory Description
+
+ bt2 d * 9.5 + 4MB Binary Tree with 2 bytes hashing.
+ bt3 d * 11.5 + 4MB Binary Tree with 3 bytes hashing.
+ bt4 d * 11.5 + 4MB Binary Tree with 4 bytes hashing.
+ hc4 d * 7.5 + 4MB Hash Chain with 4 bytes hashing.
+
+ -eos: write End Of Stream marker. By default LZMA doesn't write
+ eos marker, since LZMA decoder knows uncompressed size
+ stored in .lzma file header.
+
+ -si: Read data from stdin (it will write End Of Stream marker).
+ -so: Write data to stdout
+
+
+Examples:
+
+1) LZMA e file.bin file.lzma -d16 -lc0
+
+compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
+and 0 literal context bits. -lc0 allows to reduce memory requirements
+for decompression.
+
+
+2) LZMA e file.bin file.lzma -lc0 -lp2
+
+compresses file.bin to file.lzma with settings suitable
+for 32-bit periodical data (for example, ARM or MIPS code).
+
+3) LZMA d file.lzma file.bin
+
+decompresses file.lzma to file.bin.
+
+
+Compression ratio hints
+-----------------------
+
+Recommendations
+---------------
+
+To increase the compression ratio for LZMA compressing it's desirable
+to have aligned data (if it's possible) and also it's desirable to locate
+data in such order, where code is grouped in one place and data is
+grouped in other place (it's better than such mixing: code, data, code,
+data, ...).
+
+
+Filters
+-------
+You can increase the compression ratio for some data types, using
+special filters before compressing. For example, it's possible to
+increase the compression ratio on 5-10% for code for those CPU ISAs:
+x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.
+
+You can find C source code of such filters in C/Bra*.* files
+
+You can check the compression ratio gain of these filters with such
+7-Zip commands (example for ARM code):
+No filter:
+ 7z a a1.7z a.bin -m0=lzma
+
+With filter for little-endian ARM code:
+ 7z a a2.7z a.bin -m0=arm -m1=lzma
+
+It works in such manner:
+Compressing = Filter_encoding + LZMA_encoding
+Decompressing = LZMA_decoding + Filter_decoding
+
+Compressing and decompressing speed of such filters is very high,
+so it will not increase decompressing time too much.
+Moreover, it reduces decompression time for LZMA_decoding,
+since compression ratio with filtering is higher.
+
+These filters convert CALL (calling procedure) instructions
+from relative offsets to absolute addresses, so such data becomes more
+compressible.
+
+For some ISAs (for example, for MIPS) it's impossible to get gain from such filter.
+
+
+LZMA compressed file format
+---------------------------
+Offset Size Description
+ 0 1 Special LZMA properties (lc,lp, pb in encoded form)
+ 1 4 Dictionary size (little endian)
+ 5 8 Uncompressed size (little endian). -1 means unknown size
+ 13 Compressed data
+
+
+ANSI-C LZMA Decoder
+~~~~~~~~~~~~~~~~~~~
+
+Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
+If you want to use old interfaces you can download previous version of LZMA SDK
+from sourceforge.net site.
+
+To use ANSI-C LZMA Decoder you need the following files:
+1) LzmaDec.h + LzmaDec.c + Types.h
+LzmaUtil/LzmaUtil.c is example application that uses these files.
+
+
+Memory requirements for LZMA decoding
+-------------------------------------
+
+Stack usage of LZMA decoding function for local variables is not
+larger than 200-400 bytes.
+
+LZMA Decoder uses dictionary buffer and internal state structure.
+Internal state structure consumes
+ state_size = (4 + (1.5 << (lc + lp))) KB
+by default (lc=3, lp=0), state_size = 16 KB.
+
+
+How To decompress data
+----------------------
+
+LZMA Decoder (ANSI-C version) now supports 2 interfaces:
+1) Single-call Decompressing
+2) Multi-call State Decompressing (zlib-like interface)
+
+You must use external allocator:
+Example:
+void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
+void SzFree(void *p, void *address) { p = p; free(address); }
+ISzAlloc alloc = { SzAlloc, SzFree };
+
+You can use p = p; operator to disable compiler warnings.
+
+
+Single-call Decompressing
+-------------------------
+When to use: RAM->RAM decompressing
+Compile files: LzmaDec.h + LzmaDec.c + Types.h
+Compile defines: no defines
+Memory Requirements:
+ - Input buffer: compressed size
+ - Output buffer: uncompressed size
+ - LZMA Internal Structures: state_size (16 KB for default settings)
+
+Interface:
+ int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
+ const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
+ ELzmaStatus *status, ISzAlloc *alloc);
+ In:
+ dest - output data
+ destLen - output data size
+ src - input data
+ srcLen - input data size
+ propData - LZMA properties (5 bytes)
+ propSize - size of propData buffer (5 bytes)
+ finishMode - It has meaning only if the decoding reaches output limit (*destLen).
+ LZMA_FINISH_ANY - Decode just destLen bytes.
+ LZMA_FINISH_END - Stream must be finished after (*destLen).
+ You can use LZMA_FINISH_END, when you know that
+ current output buffer covers last bytes of stream.
+ alloc - Memory allocator.
+
+ Out:
+ destLen - processed output size
+ srcLen - processed input size
+
+ Output:
+ SZ_OK
+ status:
+ LZMA_STATUS_FINISHED_WITH_MARK
+ LZMA_STATUS_NOT_FINISHED
+ LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
+ SZ_ERROR_DATA - Data error
+ SZ_ERROR_MEM - Memory allocation error
+ SZ_ERROR_UNSUPPORTED - Unsupported properties
+ SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
+
+ If LZMA decoder sees end_marker before reaching output limit, it returns OK result,
+ and output value of destLen will be less than output buffer size limit.
+
+ You can use multiple checks to test data integrity after full decompression:
+ 1) Check Result and "status" variable.
+ 2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize.
+ 3) Check that output(srcLen) = compressedSize, if you know real compressedSize.
+ You must use correct finish mode in that case. */
+
+
+Multi-call State Decompressing (zlib-like interface)
+----------------------------------------------------
+
+When to use: file->file decompressing
+Compile files: LzmaDec.h + LzmaDec.c + Types.h
+
+Memory Requirements:
+ - Buffer for input stream: any size (for example, 16 KB)
+ - Buffer for output stream: any size (for example, 16 KB)
+ - LZMA Internal Structures: state_size (16 KB for default settings)
+ - LZMA dictionary (dictionary size is encoded in LZMA properties header)
+
+1) read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
+ unsigned char header[LZMA_PROPS_SIZE + 8];
+ ReadFile(inFile, header, sizeof(header)
+
+2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties
+
+ CLzmaDec state;
+ LzmaDec_Constr(&state);
+ res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
+ if (res != SZ_OK)
+ return res;
+
+3) Init LzmaDec structure before any new LZMA stream. And call LzmaDec_DecodeToBuf in loop
+
+ LzmaDec_Init(&state);
+ for (;;)
+ {
+ ...
+ int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
+ const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
+ ...
+ }
+
+
+4) Free all allocated structures
+ LzmaDec_Free(&state, &g_Alloc);
+
+For full code example, look at C/LzmaUtil/LzmaUtil.c code.
+
+
+How To compress data
+--------------------
+
+Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
+LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h
+
+Memory Requirements:
+ - (dictSize * 11.5 + 6 MB) + state_size
+
+Lzma Encoder can use two memory allocators:
+1) alloc - for small arrays.
+2) allocBig - for big arrays.
+
+For example, you can use Large RAM Pages (2 MB) in allocBig allocator for
+better compression speed. Note that Windows has bad implementation for
+Large RAM Pages.
+It's OK to use same allocator for alloc and allocBig.
+
+
+Single-call Compression with callbacks
+--------------------------------------
+
+Check C/LzmaUtil/LzmaUtil.c as example,
+
+When to use: file->file decompressing
+
+1) you must implement callback structures for interfaces:
+ISeqInStream
+ISeqOutStream
+ICompressProgress
+ISzAlloc
+
+static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
+static void SzFree(void *p, void *address) { p = p; MyFree(address); }
+static ISzAlloc g_Alloc = { SzAlloc, SzFree };
+
+ CFileSeqInStream inStream;
+ CFileSeqOutStream outStream;
+
+ inStream.funcTable.Read = MyRead;
+ inStream.file = inFile;
+ outStream.funcTable.Write = MyWrite;
+ outStream.file = outFile;
+
+
+2) Create CLzmaEncHandle object;
+
+ CLzmaEncHandle enc;
+
+ enc = LzmaEnc_Create(&g_Alloc);
+ if (enc == 0)
+ return SZ_ERROR_MEM;
+
+
+3) initialize CLzmaEncProps properties;
+
+ LzmaEncProps_Init(&props);
+
+ Then you can change some properties in that structure.
+
+4) Send LZMA properties to LZMA Encoder
+
+ res = LzmaEnc_SetProps(enc, &props);
+
+5) Write encoded properties to header
+
+ Byte header[LZMA_PROPS_SIZE + 8];
+ size_t headerSize = LZMA_PROPS_SIZE;
+ UInt64 fileSize;
+ int i;
+
+ res = LzmaEnc_WriteProperties(enc, header, &headerSize);
+ fileSize = MyGetFileLength(inFile);
+ for (i = 0; i < 8; i++)
+ header[headerSize++] = (Byte)(fileSize >> (8 * i));
+ MyWriteFileAndCheck(outFile, header, headerSize)
+
+6) Call encoding function:
+ res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
+ NULL, &g_Alloc, &g_Alloc);
+
+7) Destroy LZMA Encoder Object
+ LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
+
+
+If callback function return some error code, LzmaEnc_Encode also returns that code
+or it can return the code like SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.
+
+
+Single-call RAM->RAM Compression
+--------------------------------
+
+Single-call RAM->RAM Compression is similar to Compression with callbacks,
+but you provide pointers to buffers instead of pointers to stream callbacks:
+
+HRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
+ CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
+ ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
+
+Return code:
+ SZ_OK - OK
+ SZ_ERROR_MEM - Memory allocation error
+ SZ_ERROR_PARAM - Incorrect paramater
+ SZ_ERROR_OUTPUT_EOF - output buffer overflow
+ SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version)
+
+
+
+Defines
+-------
+
+_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.
+
+_LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for
+ some structures will be doubled in that case.
+
+_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.
+
+_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.
+
+
+_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in AMSI-C .7z decoder.
+
+
+C++ LZMA Encoder/Decoder
+~~~~~~~~~~~~~~~~~~~~~~~~
+C++ LZMA code use COM-like interfaces. So if you want to use it,
+you can study basics of COM/OLE.
+C++ LZMA code is just wrapper over ANSI-C code.
+
+
+C++ Notes
+~~~~~~~~~~~~~~~~~~~~~~~~
+If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
+you must check that you correctly work with "new" operator.
+7-Zip can be compiled with MSVC 6.0 that doesn't throw "exception" from "new" operator.
+So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines "new" operator:
+operator new(size_t size)
+{
+ void *p = ::malloc(size);
+ if (p == 0)
+ throw CNewException();
+ return p;
+}
+If you use MSCV that throws exception for "new" operator, you can compile without
+"NewHandler.cpp". So standard exception will be used. Actually some code of
+7-Zip catches any exception in internal code and converts it to HRESULT code.
+So you don't need to catch CNewException, if you call COM interfaces of 7-Zip.
+
+---
+
+http://www.7-zip.org
+http://www.7-zip.org/sdk.html
+http://www.7-zip.org/support.html