Tech Stuff - Audio/Video Files, Codecs and Containers

We can never find stuff in one place - here is a collection of audio/video file formats together with some basic notes and links to additional information. Helped us, perhaps it will help you. If not .... well .... sad, but not terminal.

<Big Deal> September, 2012 marked the publication of RFC 6716 which defined a new royalty-free codec which goes by the name of Opus, and represents, essentially, the merger of work done by Skype and Xiph.org (developers of ogg, speex and other codecs). While we have had royalty-free codecs for some time (notably, speex and ogg vorbis) this is the first time we have seen one with the imprimatur of the IETF. This is a big deal. Opus offers a wide range of bit rates (6 kbit/s to 510 kbit/s), frame sizes (2.5 ms to 120 ms) and sampling rates (8 kHz to 48 kHz) making it suitable for interactive (VoIP), streaming and audio playback uses. Quality is comparable to HE-AAC/Vorbis at the top end and very close to the best proprietary narrowband codecs at the bottom end. Not only is it free, it's also very good. </Big Deal>

Overview - Is it a file, a container or a codec
File Types and Extensions
Codecs
Audio/Video Containers
- AIFF/AIFF-C Container
- ADTS Container
- WAVE Container
- OGG Container
Meta or Tag Data

Is it a File, a Container or a Codec

So you have a sound file with the name sound.mp3 or sound.wav - does this describe a file format, a container format or a codec?

It depends on the file extension. For example the file sound.mp3 contains MP3 data that can only be interpreted and played by an MP3 codec (except, confusingly, it can also contain an ID3 tag frame at the beginning or the end of the file). Thus a file with the extension .mp3 has a file format specific to a single codec, in this case mp3 - it does not use a container format.

So what is a container? A container is a standardized envelope that typically includes fields that indicate which codec should be used to play the enclosed audio/video material and may or may not contain a format to describe meta (tag) data. As an example, the file sound.wav uses a WAVE container in which the codec to be used is indicated in the format chunk's CODEC field. Thus an application could read a .wav file and select from a number of different codecs to handle the audio material in the data chunk (each codec would clearly have to be able to interpret the data).
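
As a rough illustration of the file/container distinction, the sketch below guesses what a file holds from its first few bytes rather than from its extension. The function name sniff_container is hypothetical. The RIFF/WAVE, FORM/AIFF/AIFC and ADTS sync values are described later on this page; the "OggS", "fLaC", "ftyp" and "ID3" signatures are not taken from this page and are included as assumptions about those formats. It is a minimal sketch, not an exhaustive detector.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container from the first 12 bytes of a file (sketch only)."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"
    if len(header) >= 12 and header[:4] == b"FORM":
        if header[8:12] == b"AIFF":
            return "AIFF"
        if header[8:12] == b"AIFC":
            return "AIFF-C"
    if header[:4] == b"OggS":          # assumed Ogg page signature
        return "Ogg"
    if header[:4] == b"fLaC":          # assumed native FLAC signature
        return "FLAC"
    if len(header) >= 8 and header[4:8] == b"ftyp":   # assumed MP4 file type box
        return "MP4"
    if header[:3] == b"ID3":           # ID3 tag frame placed before the audio data
        return "ID3 tag (audio follows)"
    # ADTS: 12 sync bits set to 1, followed by ID (1 bit) and layer (2 bits, always 0)
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xF6) == 0xF0:
        return "ADTS"
    return "unknown"


if __name__ == "__main__":
    print(sniff_container(b"RIFF\x24\x00\x00\x00WAVEfmt "))
    print(sniff_container(b"FORM\x00\x00\x00\x2eAIFCFVER"))
```

A real application would typically read the first 12 bytes of the file (for example with open(path, "rb").read(12)) and fall back to the file extension only when the result is "unknown".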

So what is a codec? A codec is a widely used generic term used to describe software that handles a specific audio data format. The term originally was a shortened version of coder/decoder and thus referred to software that could provide both encoding (writing) functions and decoding (playback) functions. Today, what is typically called a codec rarely provides both capabilities. Rather, what is called an mp3 codec is typically an mp3 decoder used to play back mp3 sound files. If the software also supports creation (encoding) of mp3 files it will typically use a separate mp3 encoder. The distinction is only important if you are trying to transcode (convert) A/V files.

The following section on AV file extensions defines whether files are codec specific or use a container.

File Types and Extensions

The following table is a (currently incomplete) list of commonly used file extensions. It gives a brief description of the contents, codecs and any container formats used.

.aac	International standard. Files may contain raw data (no container), an ADIF container or ADTS (streaming) container. File contains audio data in MPEG-2 Part 7 (AAC) format which can only be used by an AAC encoder or decoder. AAC encoded audio is also used by MPEG-4 but in this case appears with an .m4a or .mp4 extension and typically uses different containers. Neither ADIF nor ADTS containers allow for meta (tag) data but some transcoders add an ID3 frame before the audio material; such files may not be readable by all players.
.aiff (.aif/.aifc)	Apple Audio Interchange File Format standard. The latest AIFF specification is dated 1989, however this is a tad confusing because there is a (slightly) newer AIFF-C (AIFF - Compressed) specification dating from 1990/1991. AIFF uses an AIFF container but in its basic form this does not allow encoding of meta (tag) data or even selection of codec and is therefore assumed to contain uncompressed audio (LPCM) data that can be processed by an AIFF codec. However, files which use the .aiff extension can also be in AIFF-C format (occasionally they use the .aifc extension) which provides significant extensions. The AIFF-C extension format does allow for a codec type field in the COMM chunk. AIFF files are widely used by sound professionals.
.flac (.fla)	xiph.org open standard. Indicates a FLAC (Free Lossless Audio Codec) audio file. Files with this extension normally use FLAC's own native format, which provides the ability to encode meta (tag) data. FLAC audio can also be encapsulated (enveloped) in an ogg container, which likewise supports meta (tag) data and could, theoretically, contain other formats, such as, Vorbis. All Xiph.org codecs are royalty-free and open source.
.m4a	International standard (mostly). Defines the container format to be standard MP4 but the content to be audio only (MP4 can support audio and video). The most common audio format is AAC or AAC+. The name was largely popularised by Apple's iTunes. In general files with the extension .m4a or .mp4 can both be handled by the same players. Because .m4a files use the standard MP4 container, meta (tag) data is provided. In some cases files with the extension .m4a can contain audio data encoded using Apple Lossless Audio Codec (ALAC) which is not widely supported outside the Apple ecosystem and such files will typically fail to play on many media players.
.m4p	Defines the container format to be standard MP4 but the content to be audio only (MP4 can support Audio and video) and containing Apple's FairPlay Digital Rights Management (DRM) encoding. Apple announced on 6 January 2009 that all music files would be made DRM free but that FairPlay DRM would remain on movies and television shows.
.mp3	International standard. No container format. File contains audio data in MPEG-1/2 Audio Layer 3 (MP3) format which can only be used by an MP3 encoder or decoder. MP3 files now widely contain an ID3 tag frame used for meta (tag) data such as author, artist etc. The ID3 frame format, while being open, is not part of the MPEG-1/2 standards. It was adopted by popular demand and has become a de-facto standard used in other file and container types as well.
.mp4	International standard. Only defines the container format - a significant number of audio or video formats (codecs) can appear within the container. The most common audio format contained in .mp4 files is AAC or AAC+. In order to disambiguate files which only contain audio data the file extension .m4a is commonly used. Sometimes MP4 files which contain video are given the extension .m4v.
.ogg	Xiph.org open standard. Indicates a vorbis audio file encapsulated (enveloped) in an ogg container. Ogg containers provide the ability to encode meta (tag) data. Since this file type uses an ogg container it could, theoretically, contain other formats, such as, FLAC. While colloquially known as ogg files or even an ogg codec, technically ogg is the container and vorbis is the codec. All Xiph.org codecs are royalty-free and open source.
.opus	xiph.org open standard. Indicates an Opus audio file encapsulated (enveloped) in an ogg container. Ogg containers provide the ability to encode meta (tag) data. Since this file type uses an ogg container it could, theoretically, contain other formats, such as, Vorbis or FLAC. All Xiph.org codecs are royalty-free and open source. Note: The Opus audio data format is defined by the IETF (RFC 6716).
.wav	IBM and Microsoft audio only standard. Uses a rudimentary RIFF container format which does not include meta (tag) data - though a number of extensions have been added by various groups including ID3 tags. While the file may contain audio data in a variety of formats the almost universal use of .wav files is to contain uncompressed audio data.

Codecs

A codec (originally short for coder/decoder) is a software program or library that knows how to handle audio/visual material in a specific format.

The crucial difference in the various formats is between lossy and lossless. In a lossy format some of the original source material is discarded using a variety of sophisticated algorithms to retain as much as possible of the original sound quality (typically using psychoacoustic models). File sizes for lossy formats are typically 10:1 smaller than the original source material. Lossless formats as the name suggests retain all the source material. There are now a number of compressed lossless formats, such as FLAC, which maintain all the original material but use lossless compression techniques (FLAC, for example, uses linear prediction with entropy coding of the residual) to reduce file size. File sizes tend to be only ~2:1 smaller than the original (though for high volume archival applications this can represent a huge space saving). Some additional points to note: open or proprietary standards and whether or not there are patent/royalty issues involved.

AAC+	a.k.a HE-AAC. AAC+ (Officially MPEG-4 High-Efficiency Advanced Audio Coding) is AAC with bells and whistles. MPEG-4 defines a number of profiles which combine various Audio Object Types. The HE-AAC profile combines AAC-LC (2) with SBR (5) and packages the file in an MP4 container (though others are allowed by MPEG-4). To make matters a little more interesting there are two versions of HE-AAC, being HE-AACv1 (AAC+) and HE-AACv2 (eAAC+). The HE-AACv2 profile combines the Audio Object Types AAC-LC (2), SBR (5) and PS (29).
AAC	AAC (Advanced Audio Coding) is used in both MPEG-2 Part 7 (with file extensions of .aac) and MPEG-4 Part 3 (with file extensions .m4a and .mp4). AAC is a compressed, lossy, audio format designed to supersede MP3. It comes in a huge variety of flavors not all of which are supported by all AAC codecs. It provides support for a wider range of sampling rates (8, 16, 22.05, 24, 32, 44.1, 48 and 96 kHz) and is designed to be more efficient at lower bit rates. It supports essentially the same bit rates as MP3 (32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kbit/s) as well as variable bit rate (VBR) output. There are no patent/royalty issues involved with AAC when distributing content but codec suppliers are liable for licensing fees. AAC, while supported by a large number of vendors, is most commonly associated with Apple (iTunes). Because of the number of variations it is not enough to say that a codec is AAC; it requires further clarification as to which Audio Object Types it supports. The most common types are AAC Main (1), AAC LC (2) and AAC SSR (3) which are the default modes. When used with MPEG-2 (with file extension .aac) the AAC data may be raw (no container), encapsulated in an ADIF container or an ADTS container. Neither of these containers supports meta (tag) information. When used with MPEG-4 the most common container is MP4 (file extensions .m4a and .mp4) but it also may be contained in ADIF or ADTS containers like MPEG-2 (this is an optional part of the standards and not all AAC/MP4 codecs can interpret such encapsulation) or Low-overhead Audio Transport Multiplex (LATM) or Low Overhead Audio Stream (LOAS) containers.
Note: Of particular interest are AAC-LD (Low Delay mode - type 23) and AAC-ELD (Enhanced Low Delay - type 39) both of which provide very low delay (latencies of ~20ms) at various low bit rates and which make them potentially very useful for interactive uses, such as, telephony.
AIFF	AIFF files use an AIFF container which in its basic form does not support a codec type. AIFF (Audio Interchange File Format) is an uncompressed, lossless audio format developed by Apple and used extensively on the Mac range of computers (as well as others). AIFF audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). AIFF may be viewed as Apple's equivalent of WAV though Apple would doubtless claim significant advantages for their format. The standard file extension is .aiff (or .aif). Confusingly, files with .aiff (.aif) suffix can also contain an extended AIFF-C format container though such files typically use a .aifc suffix.
FLAC	FLAC codec data is normally stored in FLAC's own native format, though it may also be enclosed in an Ogg container; both provide the ability to encode meta (tag) data. FLAC (Free Lossless Audio Codec) is a compressed but lossless audio format developed by Xiph.org, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. FLAC supports 4 to 32 bits per sample and sampling rates from 1 Hz to 655,350 Hz (~655 kHz) in 1 Hz increments. Compressed, lossless codecs (audio formats) typically achieve file size reductions of ~2:1 with no loss of the source material. A FLAC file may be viewed crudely as a compressed WAV file. FLAC is used in both high-quality playback systems and especially archival applications due to the significant reduction in storage requirements. Xiph.org also provide Theora (video compression standard), speex (a variable bit rate CODEC for low-latency VoIP) and Ogg Vorbis (a compressed, lossy MP3 alternative).
MP3	MPEG-1/2 Audio Layer 3 (that is it may be used in both MPEG-1 and MPEG-2 systems) or MP3 for short. A compressed and lossy (stuff is lost from the original recording) standard for storing audio data. The bit rate (not to be confused with the sampling rate) determines how much data is discarded and therefore the resulting sound quality and file size. Supported bit rates are: 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kbit/s. The lower the bit rate the lower the audio quality and the smaller the file size. Most systems use a 128 kbit/s bit rate (increasingly moving to higher rates such as 192 kbit/s) which gives what is sometimes called radio or FM quality. Depending on the bit rate file sizes are typically 10:1 smaller than if stored in uncompressed format. For comparison uncompressed CD data has a bit rate of 1,411.2 kbit/s. Sampling rates supported are: 16 kHz, 22.05 kHz, 24 kHz, 32 kHz, 44.1 kHz and 48 kHz. While the standard is developed by internationally recognized bodies there are patent issues related to MP3 technology (Fraunhofer Institute). You need to buy the MP3 specification and license the resulting products. Oooh. That said, there are multiple readers available and in practice the file format is widely known and understood. File suffix is .mp3.
MP4	Defined by MPEG-4 Part 14, MP4 is a container used to encapsulate many different audio (and video) types and it uses the file extension .mp4. However, it most commonly contains the audio data for AAC or AAC+ and in this case typically takes the file extension .m4a (though other codecs may appear in such files especially from Apple's iTunes). There is no difference in the container formats of .mp4 and .m4a files. MP4 incorporates ISO Base Media File format ISO/IEC 14496-12:2004. The MP4 container does allow for the definition of meta (tag) data.
Ogg/Ogg Vorbis/Vorbis	The quaintly named Ogg Vorbis (colloquially shortened to just ogg) technically defines a vorbis codec encapsulated in an ogg container. Vorbis is a compressed, lossy, variable bit rate standard (from 45 kbit/s to 500 kbit/s) developed by Xiph.org Foundation, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. There are now a variety of native applications and plug-ins available to support vorbis codecs for many popular players. The vorbis web site claims that the Ogg Vorbis standard is competitive with MPEG-4 (AAC), and similar to, but higher performance than MPEG-1/2 audio layer 3 (MP3), MPEG-4 audio (TwinVQ), WMA and PAC. The quality (and therefore file size) of the audio stream is based on a Q factor corresponding to: -1 = 45 kbit/s; 0 = 64 kbit/s; 1 = 80 kbit/s; 2 = 96 kbit/s; 3 = 112 kbit/s; 4 = 128 kbit/s; 5 = 160 kbit/s; 6 = 192 kbit/s; 7 = 224 kbit/s; 8 = 256 kbit/s; 9 = 320 kbit/s; 10 = 500 kbit/s. Sampling rates are theoretically infinitely variable but due to equipment availability will typically be the same as for MP3 (16, 22.05, 24, 32, 44.1 and 48 kHz). Xiph.org also provide Theora (video compression standard), speex (a variable bit rate CODEC for low-latency speech such as VoIP) and FLAC (a compressed, lossless audio standard). And not a patent in sight. File suffix is normally .ogg.
Opus	The Opus codec is currently (September 2012) the only codec defined by the IETF (RFC 6716). Opus is a royalty-free, open source, compressed, lossy, variable bit rate standard. This is a new standard with a reference encoder/decoder (released under a simplified BSD license) and it has been implemented in some (September 2012) popular players, audio libraries and web browsers. Frame sizes can be 2.5 ms, 5 ms, 10 ms, 20 ms (default), 40 ms or 60 ms thus allowing its use across a wide range of latencies including VoIP. Bit rates are infinitely variable from 6 kbit/s to 510 kbit/s but the standard defines "sweet-spot" ranges depending on the available bandwidth, for example, for full bandwidth stereo the "sweet-spot" is in the range 64 - 128 kbit/s. To achieve the impressively wide range of bit rate support the codec uses two separate algorithms: SILK at low bit rates and CELT at high bit rates with a hybrid mode in the mid-ranges (all invisible to the user). A single sample size of 16 bits is supported. Sample rates supported are 8 kHz, 12 kHz, 16 kHz, 24 kHz and 48 kHz. When running at 48 kHz the codec does not encode any material above 20 kHz (such material is in any case superfluous since it lies outside the limits of - good - human hearing). The codec does not support the classic 44.1 kHz CD sample rate, however most modern A/D converters (in PCs for example) support 48 kHz by default. When transcoding from other formats 44.1 kHz can be upsampled to 48 kHz with no loss of material. File suffix is .opus when it is encapsulated in an ogg container which supports meta (tag) data - though other container formats could be supported, at this time (September 2012) none have been explicitly defined or announced. There is currently (September 2012) no file suffix for unencapsulated Opus files (though we note that .opu is apparently available for use).
WAVE	The term WAV/WAVE refers to the WAVE container and not a single codec. However, the term WAV codec is widely understood to mean lossless uncompressed audio files as captured from the source input. WAV audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). Almost all audio players support WAV format and indeed some normalize input formats into WAV before playing or transcoding. See also FLAC (a compressed, lossless audio format) and AIFF, Apple's equivalent to WAV.
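
The bit rate figures quoted in the codec notes above can be checked with a little arithmetic. The sketch below, with hypothetical helper names pcm_bit_rate and size_in_bytes, derives the uncompressed CD bit rate from its sample rate, sample size and channel count and compares it with a 128 kbit/s MP3 stream. The four minute track length is an arbitrary assumption for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed LPCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def size_in_bytes(bit_rate_bps: int, seconds: int) -> int:
    """Approximate audio payload size for a constant bit rate stream."""
    return bit_rate_bps * seconds // 8


cd_rate = pcm_bit_rate(44100, 16, 2)   # CD audio: 1,411,200 bit/s (1,411.2 kbit/s)
mp3_rate = 128_000                     # a common MP3 bit rate
ratio = cd_rate / mp3_rate             # a little over 11:1

four_minutes = 4 * 60                  # assumed track length in seconds
print("CD bit rate:", cd_rate)
print("compression ratio:", round(ratio, 1))
print("CD bytes:", size_in_bytes(cd_rate, four_minutes))
print("MP3 bytes:", size_in_bytes(mp3_rate, four_minutes))
```

The ratio comes out close to the "typically 10:1" figure quoted for lossy formats; higher MP3 bit rates such as 192 kbit/s give a correspondingly smaller ratio.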

Containers

Containers are simply envelopes (or frameworks) that describe the contents of the audio/video material and typically, but not always, define which codec should be used to handle the encapsulated A/V data. Containers may, or may not, include a standard method of providing meta (tag) data.

AIFF and AIFF-C Container

The AIFF and AIFF-C containers are based on the IFF (Interchange File Format) family, from which RIFF is also derived, and use a series of chunks, each with a unique Chunk ID, to describe various sections of the container. It is similar in format to the WAV container with one vital exception - AIFF/AIFF-C format containers use big-endian multi-byte format, whereas WAV containers use little-endian multi-byte format.

The basic AIFF format does not provide a codec field in the COMM chunk, whereas the AIFF-C format does. Neither the AIFF nor AIFF-C containers allow for the definition of meta (tag) information though the use of the ID3 chunk is widespread.

A number of other chunk types are defined within the AIFF/AIFF-C specifications and typically have specialized usage. In addition third parties have defined chunks for a variety of purposes.

An AIFF container must have at least a FORM chunk, a COMM chunk and a SSND chunk. Theoretically the chunks can be in any order but this leads to very inefficient file processing.

FORM Chunk (12 bytes):

Offset	Length	Contents	Notes
0	4	FORM	Chunk ID. 4 ASCII characters "FORM" identifying the container definition chunk
4	4	Chunk Length	In the FORM case this should be the entire length of the file from the end of this field on (and is thus the total file size minus 8).
8	4	AIFF (or AIFC)	4 ASCII chars "AIFF" to identify the AIFF container type or "AIFC" to identify the AIFF-C container type. NOTE: In AIFF and AIFF-C containers all multi-byte values are in big-endian order.
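
The FORM chunk layout above can be read with a few lines of code. The following is a minimal sketch, assuming the first 12 bytes of the file are available as a bytes object; the function name parse_form_chunk is hypothetical. Note the ">" in the struct format, which selects big-endian order as required for AIFF and AIFF-C.

```python
import struct


def parse_form_chunk(data: bytes) -> tuple:
    """Return (container type, chunk length) from the 12-byte FORM chunk."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    # 4 ASCII chars, 32-bit unsigned length, 4 ASCII chars - all big-endian
    chunk_id, length, form_type = struct.unpack(">4sI4s", data[:12])
    if chunk_id != b"FORM":
        raise ValueError("not an AIFF/AIFF-C container")
    if form_type not in (b"AIFF", b"AIFC"):
        raise ValueError("unexpected container type")
    return form_type.decode("ascii"), length


if __name__ == "__main__":
    header = b"FORM" + struct.pack(">I", 46) + b"AIFF"
    print(parse_form_chunk(header))
```

Since the chunk length excludes the chunk ID and the length field itself, the total file size should be the returned length plus 8.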

COMM Chunk:

Offset	Length	Contents	Notes
0	4	COMM	Chunk ID. 4 ASCII characters "COMM" identifying a Common (sound characteristics) chunk
4	4	Chunk Length	In the COMM case (and all other non-FORM chunks) this is the entire length of this chunk from the end of this field on (that is the length value does NOT include the chunk ID and its length). Chunks are always multiples of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing AIFF/AIFF-C files the chunk length is known and the chunk data should be treated as raw (binary) data.
8	2	Channels	AIFF and AIFF-C. The number of channels: 1 = mono, 2 = stereo (order: left, right), 3 channels (order: left, right, center), 4 channels (order: left, center, right, surround), 6 channels (order: left, left center, center, right, right center, surround).
10	4	Sample Frames	AIFF and AIFF-C. The number of sample frames in the SSND chunk such that Sample Frames * Channels * bytes per sample will give the size of the sample data in the SSND chunk.
14	2	Bits per sample	AIFF and AIFF-C. A.k.a. Sample Size. The number of significant bits in each sample. Typically 8, 16, 24, 32. If a value that is not a multiple of 8 is used (12 and 20 are also common) then the sample is placed in a 2 or 3 byte aligned field with the top bits set to zero. Thus a 12 bit sample will be placed in a 2 byte field and the top 4 bits set to 0. Samples are always integral byte normalized. If AIFF-C is being used with compressed data this is the bit size before compression.
16	10	Sample Rate	AIFF and AIFF-C. 80 bit IEEE Standard 754 floating point number in Hz, thus 44.1 kHz will be 44100.
26	4	CODEC	AIFF-C only. The CODEC type consists of 4 characters and may take the following values: NONE (big-endian uncompressed LPCM data in SSND chunk); fl32 (32 bit floating point); fl64 (64 bit floating point); alaw (a-law encoded Logarithmic PCM (ITU G.711)); ulaw (μ-law encoded Logarithmic PCM (mu-law) (ITU G.711)); sowt (little-endian uncompressed LPCM data in SSND chunk). There are other codec types supported but there appears to be no definitive list available from Apple or any web resource.
30	-	string	A string of characters that may be used to generate a human readable message describing the codec. If not present the value x'0 should be used. If this string is present but contains an odd number of characters it should be padded with a x'0 byte.
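
The 80 bit Sample Rate field is the one part of the COMM chunk that standard struct format codes cannot decode directly. The sketch below shows one way to convert it, assuming the usual extended precision layout (1 sign bit, 15 exponent bits with a bias of 16383, and a 64 bit mantissa with an explicit integer bit). The function name decode_extended is hypothetical and the code ignores infinities, NaNs and denormals, none of which make sense as a sample rate.

```python
import struct


def decode_extended(raw: bytes) -> float:
    """Decode a big-endian 80 bit extended precision float (sketch only)."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    sign_exponent, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if sign_exponent & 0x8000 else 1.0
    exponent = sign_exponent & 0x7FFF
    if exponent == 0:
        # zero (or a denormal far too small to be a sample rate)
        return 0.0
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN is not a valid sample rate")
    # the mantissa carries an explicit integer bit, hence the extra -63
    return sign * float(mantissa) * 2.0 ** (exponent - 16383 - 63)


if __name__ == "__main__":
    # bytes 16..25 of a COMM chunk for a 44.1 kHz file
    print(decode_extended(bytes.fromhex("400eac44000000000000")))
```

The example bytes encode 44100: the mantissa holds x'AC44 (44100) in its top 16 bits and the exponent x'400E scales it back to an integer value.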

FVER Chunk (12 bytes): (Mandatory for AIFF-C files)

Offset	Length	Contents	Notes
0	4	FVER	Chunk ID. 4 ASCII characters "FVER" identifying a Format Version chunk. The FVER chunk must be present for AIFF-C containers and must NOT be present for AIFF containers.
4	4	Chunk Length	In the FVER case (and all other non-FORM chunks) this is the entire length of this chunk from the end of this field on. Chunks are always multiples of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing AIFF-C files the chunk length is known and the chunk data should be treated as raw (binary) data.
8	4	Timestamp	The Apple format timestamp (seconds since midnight 1st January 1904) when the AIFF-C file was written.
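
Apple format timestamps count seconds from 1st January 1904 rather than the Unix epoch of 1st January 1970. A minimal conversion sketch follows; the function name mac_timestamp_to_datetime is hypothetical, and treating the value as UTC is an assumption since the timestamp itself carries no time zone.

```python
from datetime import datetime, timedelta, timezone

# Apple epoch: midnight, 1st January 1904 (assumed here to be UTC)
MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def mac_timestamp_to_datetime(seconds: int) -> datetime:
    """Convert seconds since 1904-01-01 into a datetime."""
    return MAC_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    # the 4 byte Timestamp field would first be read big-endian,
    # for example with struct.unpack(">I", field)[0]
    print(mac_timestamp_to_datetime(2082844800))
```

The value 2082844800 is the number of seconds between the two epochs, so subtracting it from an Apple timestamp gives a Unix timestamp.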

SSND Chunk (variable):

Offset	Length	Contents	Notes
0	4	SSND	Chunk ID. 4 ASCII characters "SSND" identifying a sound chunk
4	4	Chunk Length	Chunks are always multiples of words (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Only one SSND chunk will be present in most cases and may cross physical block boundaries. In the SSND case this is the entire length of all the samples contained in the file plus the 8 bytes of the Offset and Block Size fields. The data format is unique to the codec and described in the COMM chunk
8	4	Offset	The number of bytes to skip before the first sample (used only for block alignment). Normally 0.
12	4	Block Size	The block alignment size in bytes. Normally 0 (no block alignment).
16	-	-	The raw samples. Each sample consists of an integral number of bytes and typically consists of interleaved samples captured at the same time from the channels thus, assuming a 2 channel audio stream with a 16 bit sample size the byte representation would look like 112211221122 etc.. Note: In AIFF containers (supporting only uncompressed LPCM data) 16 bit samples are stored in big-endian order. In AIFF-C containers with a CODEC value of NONE 16 bit samples are stored in big-endian order. In AIFF-C containers with a CODEC value of sowt 16 bit samples are stored in little-endian order. 8 bit LPCM is signed 2's complement data in the range -128 to 127 (unlike WAV, where 8 bit data is unsigned), 16 bit data is signed 2's complement data in the range -32768 to 32767 (center point is 0 in both cases). A 12 bit sample size is always moved into an integral number of bytes so becomes a 16 bit sample with 2's complement values.
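
The rule that every chunk starts with a 4 byte ID and a 4 byte length, with a pad byte after odd-length data, is enough to walk an AIFF/AIFF-C file without understanding every chunk. The sketch below assumes the whole file is held in memory as bytes and that chunks start immediately after the 12 byte FORM chunk; the function name iter_chunks is hypothetical and the example payloads are made up for illustration.

```python
import struct


def iter_chunks(data: bytes, offset: int = 12):
    """Yield (chunk ID, chunk data) for each chunk following the FORM chunk."""
    while offset + 8 <= len(data):
        # chunk ID and big-endian chunk length (length excludes these 8 bytes)
        chunk_id, length = struct.unpack(">4sI", data[offset:offset + 8])
        start = offset + 8
        yield chunk_id, data[start:start + length]
        # skip the data plus the pad byte that follows odd-length chunks
        offset = start + length + (length & 1)


if __name__ == "__main__":
    body = (
        b"NAME" + struct.pack(">I", 3) + b"abc" + b"\x00"   # odd length, padded
        + b"ANNO" + struct.pack(">I", 2) + b"hi"
    )
    example = b"FORM" + struct.pack(">I", 4 + len(body)) + b"AIFF" + body
    for chunk_id, payload in iter_chunks(example):
        print(chunk_id, payload)
```

Software that only wants the audio would loop until it sees the COMM and SSND chunk IDs and ignore everything else.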

ADTS Container

Hardly worthy of the name container, the Audio Data Transport Stream is a vestigial envelope defined in MPEG-2 Part 7 (and used with file extension .aac) but optional in MPEG-4 (with file extensions .m4a or .mp4), thus while it is permitted to encode MPEG-4 using ADTS not all MPEG-4 codecs can handle it. ADTS is optimized for streaming media, however it can be, and frequently is, also used for playback files. It does not provide a method to encode meta (tag) information nor does it provide any codec selection though it does, of necessity, allow selection of one of the many AAC options.

The ADTS header (7 or 9 bytes) is transmitted with every ADTS frame or packet (the header defines the frame length). In the table below bits are numbered left to right starting from 0.

Bit no.	Length in bits	Contents	Notes
0	12	Sync word	Set to all 1 values (x'fff) to indicate this is an ADTS header start.
12	1	ID	Indicates whether the source was MPEG-2 (1) or MPEG-4 (0).
13	2	layer	always 0 for ADTS.
15	1	Protection absence	1 indicates there is no CRC for the block, 0 indicates there is a CRC for the block.
16	2	profile	Indicates the type of audio being supplied and takes the value of the Audio Object Type minus 1. Thus from the list below (incomplete but covers the most common values) a value of x'2 in the ADTS header would indicate AAC SSR (not all the objects below are AAC types): 0 = Null (invalid); 1 = AAC Main; 2 = AAC LC (Low Complexity); 3 = AAC SSR (Scalable Sample Rate); 4 = AAC LTP (Long Term Prediction); 5 = SBR (Spectral Band Replication); 6 = AAC Scalable; 7 = TwinVQ; 8 = CELP (Code Excited Linear Prediction); 9 = HVXC (Harmonic Vector eXcitation Coding); 10 = Reserved; 11 = Reserved; 12 = TTSI (Text-To-Speech Interface); 13 = Main Synthesis; 14 = Wavetable Synthesis; 15 = General MIDI; 16 = Algorithmic Synthesis and Audio Effects; 17 = ER (Error Resilient) AAC LC; 18 = Reserved; 19 = ER AAC LTP; 20 = ER AAC Scalable; 21 = ER TwinVQ; 22 = ER BSAC (Bit-Sliced Arithmetic Coding); 23 = ER AAC LD (Low Delay); 24 = ER CELP; 25 = ER HVXC; 26 = ER HILN (Harmonic and Individual Lines plus Noise); 27 = ER Parametric; 28 = SSC (SinuSoidal Coding); 29 = PS (Parametric Stereo); 30 = MPEG Surround; 31 = (Escape value); 32 = Layer-1; 33 = Layer-2; 34 = Layer-3; 35 = DST (Direct Stream Transfer); 36 = ALS (Audio Lossless); 37 = SLS (Scalable LosslesS); 38 = SLS non-core; 39 = ER AAC ELD (Enhanced Low Delay); 40 = SMR (Symbolic Music Representation) Simple; 41 = SMR Main; 42 = USAC (Unified Speech and Audio Coding) (no SBR); 43 = SAOC (Spatial Audio Object Coding); 44 = LD MPEG Surround; 45 = USAC. Most AAC codecs support a very small subset of the above list. Since the profile field is only 2 bits, an ADTS header can directly indicate only Audio Object Types 1 to 4.
18	4	Sample Rate	Index value from the following supported sample rates. Thus a value of x'4 = 44100 (44.1K samples per second). 0 = 96000 Hz; 1 = 88200 Hz; 2 = 64000 Hz; 3 = 48000 Hz; 4 = 44100 Hz; 5 = 32000 Hz; 6 = 24000 Hz; 7 = 22050 Hz; 8 = 16000 Hz; 9 = 12000 Hz; 10 = 11025 Hz; 11 = 8000 Hz; 12 = 7350 Hz; 13 = Reserved; 14 = Reserved; 15 = invalid in ADTS.
22	1	private	always 0 for ADTS.
23	3	Channels	0 = Defined in AOT Specific Config (not valid for ADTS); 1 = 1 channel: front-center; 2 = 2 channels: front-left, front-right; 3 = 3 channels: front-center, front-left, front-right; 4 = 4 channels: front-center, front-left, front-right, back-center; 5 = 5 channels: front-center, front-left, front-right, back-left, back-right; 6 = 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel (surround); 7 = 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel (surround); 8-15 = Reserved (not valid).
26	1	origin	always 0 for ADTS.
27	1	home	always 0 for ADTS.
28	1	copyright	copyrighted stream (1), not copyrighted (0). If set whole stream copyrighted.
29	1	copyright start	copyrighted (1), not copyrighted (0). Allows for copy protection of only parts of the stream.
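30	13	frame length	The length of the ADTS frame in bytes, including the header (and the CRC if present).
43	11	ADTS buffer fullness	-
54	2	raw data blocks	The number of AAC raw data blocks in the frame minus 1 (0 indicates a single block, the usual case).
56	16	CRC	Only present if Protection absence is 0 (giving the 9 byte header).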
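
The bit-level layout of the ADTS header is easier to follow in code. The sketch below treats the first 7 bytes as a single 56 bit number and pulls out fields by bit offset and length, numbering bits left to right from 0 as in the table. The function name parse_adts_header and the returned key names are hypothetical, and the sketch assumes the 13 bit frame length field starts at bit 30. The CRC, if present, is not checked.

```python
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(header: bytes) -> dict:
    """Extract the main fields from the fixed 7 bytes of an ADTS header."""
    if len(header) < 7:
        raise ValueError("need at least 7 bytes")
    bits = int.from_bytes(header[:7], "big")

    def field(offset: int, length: int) -> int:
        # bits are numbered left to right starting from 0
        return (bits >> (56 - offset - length)) & ((1 << length) - 1)

    if field(0, 12) != 0xFFF:
        raise ValueError("sync word not found")
    protection_absent = field(15, 1)
    rate_index = field(18, 4)
    return {
        "mpeg_version": 2 if field(12, 1) else 4,
        "audio_object_type": field(16, 2) + 1,      # profile + 1
        "sample_rate": SAMPLE_RATES[rate_index] if rate_index < len(SAMPLE_RATES) else None,
        "channel_config": field(23, 3),
        "frame_length": field(30, 13),              # bytes, header included
        "header_length": 7 if protection_absent else 9,
    }


if __name__ == "__main__":
    # hand-built example: MPEG-4, AAC LC, 44.1 kHz, 2 channels, no CRC
    print(parse_adts_header(bytes.fromhex("fff150802e7ffc")))
```

To step through a stream, a reader would parse a header, advance by frame_length bytes and expect to find the next sync word there.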

OGG Container

The OGG container format is used for Vorbis and Opus audio files and may also be used for FLAC.

WAVE Container

IBM and Microsoft standard but has been extended by others. The format of a WAV container is a subset of the generic RIFF (Resource Interchange File Format - a close relative of the format used by AIFF files, which use different chunk IDs and byte order) format as shown below. The current specification is version 3.0 though an extended multi-channel format is available. The basic definition consists of three standard 'Chunks' - the HEADER chunk (identifying the file type), the FORMAT chunk (identifying key characteristics of the payload) and the DATA chunk containing the data (or file payload). If a non PCM codec is being used a FACT chunk must be included. Due to the format of the file additional chunks can be added. Software that knows about such chunks can handle them, but sensible software that does not can simply skip over unrecognized chunks until it reaches the audio data (in the data chunk). There are a number of well known chunks (typically used to incorporate meta data (tags)) and the chunk we use to incorporate OGG COMMENT tags in both WAV and AIFF format files.

HEADER Chunk (12 bytes):

Offset	Length	Contents	Notes
0	4	RIFF	Chunk ID. 4 ASCII characters "RIFF" identifying the container definition chunk
4	4	Chunk Length	In the HEADER case this should be the entire length of the file from the end of this field on (and is thus the total file size minus 8).
8	4	WAVE	4 ASCII chars "WAVE" to identify the container type

NOTE: All multi-byte values are in little-endian order. If big-endian (network) order is being used the value RIFX will replace RIFF in the HEADER chunk.

FORMAT Chunk (24+ bytes):

Offset	Length	Contents	Notes
0	4	fmt	Chunk ID. 4 ASCII characters "fmt " identifying a format chunk
4	4	Chunk Length	In the FORMAT case (and all other non-HEADER chunks) this is the entire length of this chunk from the end of this field on (that is, the length value does NOT include the chunk ID and the length field itself). Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing WAV files the chunk length is known and the chunk data should be treated as raw (binary) data. In the case of uncompressed LPCM (CODEC = 1) this value is always 16 (x'10'), but additional FORMAT data can be added (CODEC dependent) and is reflected in the size value of this field.
8	2	CODEC	The CODEC type (decimal values): 0 = unknown, 1 = uncompressed LPCM, 6 = A-law encoded Logarithmic PCM (ITU G.711), 7 = μ-law (mu-law) encoded Logarithmic PCM (ITU G.711), 20 = ITU G.723 ADPCM, 49 = GSM 6.10, 64 = ITU G.721 ADPCM. There is a remarkably complete list of types at this site.
10	2	Channels	The number of channels 1 = mono, 2 = stereo
12	4	Sample Rate	in Hz thus 44.1KHz will be 44100
16	4	Average Bytes per second	Sample Rate * bytes per sample
20	2	Bytes per sample	Includes all channels. Assuming the sample size is 16 and there are two channels this would be 4. However a 16 bit mono sample and an 8 bit stereo sample would both give the value 2. The next field is used to disambiguate the two cases.
22	2	Bits per sample	A.k.a. Sample Size. The number of significant bits in each sample. Typically 8, 16, 24 or 32. If a value which is not a multiple of 8 is used (12 and 20 are also common) then the sample is placed in a 2 or 3 byte aligned field with the top bits set to zero. Thus a 12 bit sample will be placed in a 2 byte field and the top 4 bits set to 0. Samples always occupy an integral number of bytes.
24	-	Additional Data	Optional CODEC dependent headers - not present for CODEC type 1 (uncompressed LPCM)
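
The FORMAT chunk table above maps directly onto a fixed binary layout, so it can be unpacked in one step. The sketch below is a minimal illustration only: the WaveFormat type and the parse_format_chunk function are names we made up for this example, and little-endian (RIFF) order is assumed unless the caller says otherwise.

```python
import struct
from typing import NamedTuple

class WaveFormat(NamedTuple):
    codec: int                 # 1 = uncompressed LPCM
    channels: int              # 1 = mono, 2 = stereo
    sample_rate: int           # in Hz
    avg_bytes_per_second: int  # sample_rate * bytes_per_sample
    bytes_per_sample: int      # includes all channels
    bits_per_sample: int       # significant bits in each sample
    extra: bytes               # optional CODEC dependent data

def parse_format_chunk(chunk: bytes, order: str = "<") -> WaveFormat:
    """Parse a complete FORMAT chunk, including its 8 byte ID/length prefix."""
    chunk_id, length = struct.unpack(order + "4sI", chunk[:8])
    if chunk_id != b"fmt ":
        raise ValueError("not a FORMAT chunk")
    if length < 16 or len(chunk) < 8 + length:
        raise ValueError("FORMAT chunk is too short")
    fields = struct.unpack(order + "HHIIHH", chunk[8:24])
    return WaveFormat(*fields, extra=chunk[24:8 + length])

# A hand-built chunk: 16 bit stereo LPCM at 44.1 kHz.
example = b"fmt " + struct.pack("<IHHIIHH", 16, 1, 2, 44100, 176400, 4, 16)
fmt = parse_format_chunk(example)
print(fmt)
```

For uncompressed LPCM the extra field is empty because the chunk length is exactly 16; for other CODEC types it carries whatever additional header data that CODEC defines.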

FACT Chunk (variable):

Offset	Length	Contents	Notes
0	4	fact	Chunk ID. 4 ASCII characters "fact" identifying a fact chunk. Fact chunks are required if the value of the CODEC field in the format chunk is NOT uncompressed LPCM (01).
4	4	Chunk Length	In the FACT case (as for all other non-HEADER chunks) this is the entire length of this chunk from the end of this field on. Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing WAV files the chunk length is known and the chunk data should be treated as raw (binary) data.
8	4	Samples	The number of samples in the file.
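
The skip-over rule described in the Chunk Length notes (add the length, plus one pad byte if the length is odd) can be sketched as a small generator. This is an illustration under our own assumptions: iter_chunks is a made-up name, the chunk ID "odd " in the example is purely hypothetical, and each chunk is read fully into memory, which would not be sensible for a large DATA chunk.

```python
import io
import struct

def iter_chunks(stream, order="<"):
    """Yield (chunk_id, data) for every chunk following the 12 byte HEADER chunk."""
    stream.seek(12)  # skip "RIFF", the overall length and "WAVE"
    while True:
        prefix = stream.read(8)  # the standard first 8 bytes of every chunk
        if len(prefix) < 8:
            return  # end of file
        chunk_id, length = struct.unpack(order + "4sI", prefix)
        data = stream.read(length)
        if length % 2:
            stream.read(1)  # pad byte, not counted in the chunk length
        yield chunk_id, data

# A hand-built file: an unknown 3 byte chunk (plus pad byte), then a data chunk.
body = (b"odd " + struct.pack("<I", 3) + b"abc" + b"\x00"
        + b"data" + struct.pack("<I", 4) + b"\x01\x00\x02\x00")
wav = b"RIFF" + struct.pack("<I", 4 + len(body)) + b"WAVE" + body

for chunk_id, data in iter_chunks(io.BytesIO(wav)):
    print(chunk_id, len(data))
```

A player would simply ignore any chunk_id it does not recognize and carry on until it finds the chunks it needs.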

DATA Chunk (variable):

Offset	Length	Contents	Notes
0	4	data	Chunk ID. 4 ASCII characters "data" identifying a data chunk
4	4	Chunk Length	Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Only one DATA chunk will be present in most cases and it may cross physical block boundaries. In the DATA case this is the entire length of all the samples contained in the file. The data format is unique to the codec and described in the FORMAT chunk.
8	-	-	The raw samples. Each sample consists of an integral number of bytes and the data typically consists of interleaved samples captured at the same time from each of the channels. Thus, assuming a 2 channel audio stream with a 16 bit sample size, the byte representation would look like 112211221122 etc. Note: 16 bit samples are stored in little-endian order. It is thus possible to start playing a WAV file immediately without reading the whole file. The end of file is reached when the value in the DATA chunk length field is exhausted. 8 bit LPCM is assumed to be unsigned data in the range 0 to 255 (center point is 128), 16 bit data is signed 2's complement data in the range -32768 to 32767 (center point is 0). A 12 bit sample size is always moved into an integral number of bytes so becomes a 16 bit sample with 2's complement values.
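
To make the interleaving and the signed/unsigned rules concrete, here is a minimal Python sketch which splits the payload of a DATA chunk into one list of samples per channel. The function name decode_lpcm is our own, only 8 and 16 bit uncompressed LPCM in little-endian (RIFF) order is handled, and subtracting 128 from 8 bit samples (to center them on 0 like the 16 bit case) is our choice for this example rather than anything the container requires.

```python
import struct

def decode_lpcm(data: bytes, channels: int, bits_per_sample: int):
    """Split interleaved LPCM data into one list of integer samples per channel."""
    if bits_per_sample == 8:
        # 8 bit samples are unsigned with a center point of 128.
        samples = [value - 128 for value in data]
    elif bits_per_sample == 16:
        # 16 bit samples are signed 2's complement, little-endian.
        count = len(data) // 2
        samples = list(struct.unpack("<%dh" % count, data[:count * 2]))
    else:
        raise ValueError("this sketch only handles 8 and 16 bit samples")
    # Samples are interleaved: channel 1, channel 2, channel 1, channel 2 ...
    return [samples[channel::channels] for channel in range(channels)]

# Two stereo frames of 16 bit data: (100, -100) then (200, -200).
payload = struct.pack("<4h", 100, -100, 200, -200)
left, right = decode_lpcm(payload, channels=2, bits_per_sample=16)
print(left, right)
```

Because each frame is self-contained, the same slicing works on any prefix of the data that ends on a frame boundary, which is why playback can start before the whole file has been read.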

OGG COMMENT Chunk Format

OGG Comment Chunk (variable). This is a completely non-standard chunk which we use in WAVE and AIFF files to encapsulate Ogg style comments. Serious implementations of AIFF or WAV playback systems will skip unknown chunks (the container format allows for this). Crummy implementations will croak:

Offset	Length	Contents	Notes
0	4	ogg	4 ASCII characters "ogg " identifying an ogg format comment chunk
4	4	Chunk Length	Variable. The entire length of the chunk from the end of this field to end. The following information defines a standard (not extended) Ogg comment field. When transcoding to an ogg file it would normally be parsed and each comment added via the appropriate library function e.g. vorbis_comment_add_tag.
8	4	Vendor string length	The length of the vendor string (normally the library reference being used to encode the file). The vendor string is not the same as any ENCODER=data string. It is permissible to have a length of zero which indicates there is no vendor string.
12	V	Vendor string of length defined by vendor string length	UTF-8 with no termination nulls. It is a character string, not a C string.
V	4	Total Length of all Comments	Sum of all comments such that adding this value to the end of this field will skip all comments
V	4	Comment Length	Defines the length of the following comment string in name=data format
V	V	Comment string	UTF-8 string. This is a character string and is not null terminated. It is not a C string.
Repeated as many times as defined by Total Length of all Comments field.
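
The layout in the table above can be sketched in Python as a matching pair of functions, one to build the chunk and one to read it back. Treat this as an illustration under stated assumptions rather than a reference implementation: the function names are made up for this example, all length fields are assumed to be 4 byte little-endian values (as in a WAV file; an AIFF file would use big-endian), and the Total Length field is taken to cover every comment length field plus every comment string.

```python
import struct

def build_ogg_chunk(vendor, comments):
    """Build an "ogg " comment chunk from a vendor string and name=data strings."""
    vendor_bytes = vendor.encode("utf-8")  # no terminating null
    body = b""
    for comment in comments:
        text = comment.encode("utf-8")
        body += struct.pack("<I", len(text)) + text
    payload = (struct.pack("<I", len(vendor_bytes)) + vendor_bytes
               + struct.pack("<I", len(body)) + body)
    chunk = b"ogg " + struct.pack("<I", len(payload)) + payload
    if len(payload) % 2:
        chunk += b"\x00"  # pad byte, not counted in the chunk length
    return chunk

def parse_ogg_chunk(chunk):
    """Return (vendor, [comments]) from a chunk made by build_ogg_chunk."""
    chunk_id, _length = struct.unpack("<4sI", chunk[:8])
    if chunk_id != b"ogg ":
        raise ValueError("not an ogg comment chunk")
    pos = 8
    (vendor_length,) = struct.unpack_from("<I", chunk, pos)
    pos += 4
    vendor = chunk[pos:pos + vendor_length].decode("utf-8")
    pos += vendor_length
    (total_length,) = struct.unpack_from("<I", chunk, pos)
    pos += 4
    end = pos + total_length
    comments = []
    while pos < end:
        (comment_length,) = struct.unpack_from("<I", chunk, pos)
        pos += 4
        comments.append(chunk[pos:pos + comment_length].decode("utf-8"))
        pos += comment_length
    return vendor, comments

chunk = build_ogg_chunk("libexample 1.0", ["TITLE=Test", "ARTIST=Someone"])
print(parse_ogg_chunk(chunk))
```

When transcoding to an Ogg file, each string returned by the parser would then be handed to the relevant library call rather than copied across as raw bytes.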

Well Known Chunks

There are a number of other chunk types that can appear in either WAV or AIFF containers. Because of the file format used in both containers serious software can easily skip over unrecognized chunks using the chunk length field. If files are being copied, sensible and serious software should always write all chunks even if it does not support/recognize them, thus the possibility of losing alien chunks should be remote. The following is an incomplete list of additional chunks:

iXML: The chunk ID "iXML" indicates the presence of an iXML frame - a widely supported, open, freely available meta data (tag) specification based on XML and used by many audio equipment manufacturers.
ID3: The chunk ID "ID3 " indicates the presence of a standard ID3v2.3 or v2.4 tag frame as used in mp3 files.
BWF: The Broadcast Wave Format, defined by European Broadcasting Union (EBU) Tech 3285, defines a number of chunk types for use in WAV format files.

Meta Data - Audio Tags

It is a great idea. The media/audio file is self describing by containing tagging or meta data. You send the audio file. The player can both play the file and extract a ton of meta data like who is performing, the name of the work etc. No need to supply separate identification information. However, as you will see below the world has gone a bit crazy with a huge number of tags being made available or proposed. In our view the whole idea has gone off the rails. Audio files are not databases. Instead, again in our view, the media/audio file should contain essential and summary information only - perhaps limited to identification (artist/title), rights, and maybe audio parameters (gain etc.). Everything else should be defined in a simple MORE= tag containing the URL of a networked resource which can supply unlimited information about the material. This is our proposal.

MP3 Meta Data (ID3)

The MP3 standard does not use a container format and thus does not provide the ability to encode meta (tag) data. What are colloquially known as ID3 tags came into existence - bolted on afterwards and now almost universally supported - to allow mp3 files to describe the contents of the audio file. ID3 was initially released as version 1 and has since been updated to the completely different version 2.x format. Currently this is at version 2.4, though 2.3 still seems the most widely implemented. The list of tags defined by www.id3.org is pretty daunting (currently 94 and counting) and includes the shoe sizes of all the musicians involved. In practice, only a trivial subset is widely supported - perhaps 10 to 20 - and the other tags tend to be for specialized usage. So now you want to transcode from/to MP3, WMA and Vorbis or whatever. And the world suddenly gets a lot murkier.

Ogg Meta Data

Ogg does not support ID3. Instead it uses a flexible and extensible name=data tag format (unfortunately generically called a comment). Ogg is currently promoting a list of name types. The most important characteristic of this list is that it seems to be widely ignored (see below) - in some cases because the list does not have an equivalent for an ID3 (still the dominant standard) tag. FLAC - from the Ogg stable - also uses the Vorbis comment format. Numerous alternative proposals exist on the web for uses of this flexible tag format and taken together they make the 94 items of ID3 look very modest indeed. Nevertheless, IOHO, it's a more useful format than ID3 and has the overriding merit of being human readable even if the player/reader software is unable to interpret the tags. An extended comment field format to include images has been made available by Ogg (Ogg Extended comment field format proposal).

Wave/AIFF Meta Data

WAVE and AIFF both use the same style of chunk-based format (RIFF and the closely related IFF respectively) which allows for new chunks to be added without breaking the container format - unless, of course, your player/reader is unbelievably stupid. Software can just skip over the unknown chunks so at least the audio file can be played. Thus it is possible to embed any tag format using a possibly proprietary chunk format. However, when transcoding the player/writer must understand the meta data format in order to map it to an equivalent format. Now this is where it gets really stupid. AIF defines an ID3 chunk whereas WAVE does not. WAVE has the Broadcast WAVE Format (BWF) which is optimized for, as you would expect, broadcasters and does allow meta (tag) data, but this is very specialized data for use by broadcasters and is not equivalent, in any way, to typical ID3 tags.

While there are differences between WAVE and AIF these are mostly in the area of extended capabilities. We use a common CODEC for both thus, as far as we are concerned, WAVE has embedded id3 capability using the ID3 chunk format. However, for the reasons mentioned we like the flexibility of OGG and have started to make serious use of both Vorbis and FLAC files. To handle this problem we use the OGG chunk format defined above to encapsulate the standard Ogg comment fields when we are forced to use either AIF or WAV files. And if you need to export the files, add both the id3 and ogg chunks. Both .aif (.aiff) and .wav files are pretty big - what's a few (hundred) extra K here or there.

MP4 Meta Data

The standard MP4 container (used in .mp4 and .m4a files) allows definition of meta (tag) data within the container.

Placeholder - additional detail to be added

AAC Meta Data

When AAC audio is encapsulated in an MP4 container (MPEG-4 Part 3) using the extensions .mp4 and .m4a meta (tag) data is allowed. However, AAC can also be encapsulated (as MPEG-2 Part 7) in raw (no container), or ADIF or ADTS containers (all with file extension .aac) and in this case meta (tag) data is not defined by the container format.

Placeholder - additional detail to be added

WMA Meta Data

The standard WMA container provides a Content Description Object which allows a subset of meta (tag) data to be defined and additionally provides a flexible Extended Content Description Object which allows almost unlimited meta (tag) data.

Placeholder - additional detail to be added

Meta Tag Mapping

The following table shows some of the widely supported ID3 tags and one possible mapping to Ogg container alternatives which could be used during transcoding.

Tag	Purpose	MP3 Writers	Vorbis/FLAC
Tag/Meta format	-	ID3	Vorbis comments. Free format name=data
TIT2 Tag	Work Title	Widely	TITLE=data
TPE1 Tag	Artist Name	Widely	ARTIST=data
TRCK Tag	Track Number	Widely	TRACKNUMBER=data
TALB Tag	Album name	Widely	ALBUM=data
TDRC Tag	Year of Recording	Widely supported but mostly used for publication date not recording date. No consistency.	No real equivalent in Ogg list so widespread use of YEAR=data. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TDAT Tag	Day and month of recording (DDMM)	Not widely supported/used.	No real equivalent in Ogg list. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TENC Tag	Encoded by	Widely supported/used.	No equivalent in Ogg list so widespread use of ENCODED=data.
COMM Tag	Comments. Non specific use	Widely	No real equivalent in Ogg list so COMMENTS=data seems widely used. Nearest Ogg value is perhaps DESCRIPTION=data
TCON Tag	Genre name	Widely	GENRE=data
TCOP Tag	Copyright text	Varies	COPYRIGHT=data
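
For transcoding, the mapping in the table above boils down to a lookup table. The Python sketch below is one possible rendering of it, not a standard: the dictionary and function names are our own, the input is assumed to be a simple dictionary of ID3 frame ID to text value (how you extract that from a real ID3 tag is up to your tag library), and where the table offers a choice we picked DATE for TDRC and the widely used ENCODED and COMMENTS names.

```python
# One possible ID3 frame ID to Vorbis comment name mapping, following the table.
ID3_TO_VORBIS = {
    "TIT2": "TITLE",
    "TPE1": "ARTIST",
    "TRCK": "TRACKNUMBER",
    "TALB": "ALBUM",
    "TDRC": "DATE",      # YEAR is also in widespread use
    "TENC": "ENCODED",   # not in the Ogg list but widely used
    "COMM": "COMMENTS",  # ID3v2 comment frame; DESCRIPTION is the nearest Ogg name
    "TCON": "GENRE",
    "TCOP": "COPYRIGHT",
}

def id3_to_vorbis_comments(frames):
    """Convert {frame_id: text} into a list of name=data comment strings.

    Frames without a mapping are silently dropped in this sketch.
    """
    comments = []
    for frame_id, value in frames.items():
        name = ID3_TO_VORBIS.get(frame_id)
        if name is not None:
            comments.append("%s=%s" % (name, value))
    return comments

print(id3_to_vorbis_comments({"TIT2": "Song", "TPE1": "Band", "XXXX": "ignored"}))
```

Dropping unmapped frames keeps the example short; a real transcoder might prefer to carry them across under their original frame ID so that nothing is lost.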

Other tags seem to be supported to a greater or lesser extent, depending on the software. Certainly you can be sure of only one thing - inconsistency.
