Digital Audio/Video - Files, Codecs and Containers

We can never find stuff in one place - here is a collection of audio/video file formats together with some basic notes and links to additional information. Helped us, perhaps it will help you. If not .... well ....

Contents

Overview - Is it a file, a container or a codec
File Types and Extensions
Codecs
Audio/Video Containers
Meta or Tag Data

Is it a File, a Container or a Codec

So you have a sound file with the name sound.mp3 or sound.wav - does this describe a file format, a container format or a codec?

It depends on the file extension. For example, the file sound.mp3 contains MP3 data that can only be interpreted and played by an MP3 codec (except, confusingly, it can also contain an ID3 tag frame at the beginning or the end of the file). Thus a file with the extension .mp3 has a file format specific to a single codec, in this case MP3 - it does not have a container format.

So what is a container? A container is a standardized envelope that typically includes fields that indicate which codec should be used to play the enclosed audio/video material and may or may not contain a format to describe meta (tag) data. As an example, the file sound.wav uses a WAVE container in which the codec to be used is indicated in the format chunk's CODEC field. Thus an application could read a .wav file and select from a number of different codecs to handle the audio material in the data chunk (each codec would clearly have to be able to interpret the data).

So what is a codec? Codec is a widely used generic term for software that handles a specific audio data format. The term was originally a shortened version of coder/decoder and thus referred to software that could provide both encoding (writing) functions and decoding (playback) functions. Modern codecs rarely provide both capabilities. Instead, what is called an mp3 codec is typically an mp3 decoder used to play back mp3 sound files, while software that also supports creation (writing) of mp3 files will typically use a separate mp3 encoder. The distinction is only important if you are trying to transcode A/V files.

The following section on AV file extensions defines whether files are codec specific or use a container.

File Types and Extensions

The following table is a (currently incomplete) list of commonly used file extensions. It gives a brief description of the contents, codecs and any container formats used.

.aiff (.aif/.aifc) Apple Audio Interchange File Format standard. The latest AIFF specification is dated 1989, however this is a tad confusing because there is a newer AIFF-C (AIFF - Compressed) specification. AIFF uses an AIFF container but in its basic form this does not allow encoding of meta (tag) data or even selection of codec and is therefore assumed to contain uncompressed audio (LPCM) data that can be processed by an AIFF codec. However, files which use the .aiff extension can also be in AIFF-C format (occasionally they use the .aifc extension) which provides significant extensions. The AIFF-C format does allow for a codec type field in the COMM chunk. AIFF files are widely used by sound professionals.
.flac (.fla) Xiph.org open standard. Normally uses FLAC's own native container format (an Ogg container can also be used, giving what is usually called Ogg FLAC); both include the ability to encode meta (tag) data as Vorbis comments. File contains audio data in Free Lossless Audio Codec (FLAC) format which can only be used by a FLAC encoder or decoder. All Xiph.org codecs are royalty-free and open source.
.mp3 International standard. No container format. File contains audio data in MPEG-1/2 Audio Layer 3 (MP3) format which can only be used by an MP3 encoder or decoder. MP3 files now widely contain an ID3 tag frame used for meta (tag) data such as author, artist etc. The ID3 frame format, while being open, is not part of the MPEG-1/2 standards. It was adopted by popular demand and has become a de-facto standard used in other file and container types as well.
.ogg Xiph.org open standard. Uses the ogg container format which does include the ability to encode meta (tag) data. File contains audio data in vorbis format which can only be used by an Ogg Vorbis encoder or decoder (though since this uses a container it could, theoretically, contain other formats). While colloquially known as ogg files, or even an ogg codec, technically ogg is the container and vorbis the codec in .ogg format files. All Xiph.org codecs are royalty-free and open source.
.wav IBM and Microsoft audio-only standard. Uses a rudimentary RIFF container format which does not include meta (tag) data - though a number of extensions have been added by various groups. While the file may contain audio data in a variety of formats, the almost universal use of .wav files is to contain uncompressed audio data.
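
Since the extension is only a hint, a program will usually look at the first few bytes of the file instead. The following is a minimal sketch of that idea, not a complete detector. The function name sniff_container is our own invention, and the "OggS", "fLaC" and "ID3" signatures are well known markers that are not otherwise described on this page. The RIFF/WAVE and FORM/AIFF/AIFC tests follow the container layouts described later.

```python
def sniff_container(head: bytes) -> str:
    """Guess the container or file format from the first bytes of a file.

    Hypothetical helper for illustration only; real files can be more
    awkward (for example junk before the first MPEG frame).
    """
    if len(head) >= 12 and head[0:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "WAVE container"
    if len(head) >= 12 and head[0:4] == b"FORM" and head[8:12] == b"AIFF":
        return "AIFF container"
    if len(head) >= 12 and head[0:4] == b"FORM" and head[8:12] == b"AIFC":
        return "AIFF-C container"
    if head[0:4] == b"OggS":
        return "Ogg container"
    if head[0:4] == b"fLaC":
        return "FLAC stream marker"
    if head[0:3] == b"ID3":
        return "ID3v2 tag (usually followed by MP3 data)"
    # An MPEG audio frame starts with 11 set bits (the frame sync).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (possibly MP3)"
    return "unknown"
```

Passing the first 12 bytes of a file is enough for every test above.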

Codecs

A codec (originally short for coder/decoder) is a software program or library that knows how to handle audio/visual material in a specific format.

The crucial difference between the various formats is whether they are lossy or lossless. In a lossy format some of the original source material is discarded using a variety of sophisticated algorithms (typically based on psychoacoustic models) designed to retain as much as possible of the original sound quality. File sizes for lossy formats are typically around one tenth (10:1) of the original source material. Lossless formats, as the name suggests, retain all the source material. There are now a number of compressed lossless formats, such as FLAC, though file sizes tend to be only around half (~2:1) of the original. Some additional points to note: whether the standard is open or proprietary and whether or not there are patent/royalty issues involved.
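
The rough 10:1 figure can be checked with a little arithmetic. The sketch below, with function names of our own choosing, computes the bit rate of uncompressed CD-style LPCM (44.1 kHz, 16 bits, 2 channels) and compares it with a 128K bit rate lossy stream.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate in bits per second of uncompressed LPCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(uncompressed_bps: int, compressed_bps: int) -> float:
    """How many times smaller the compressed stream is."""
    return uncompressed_bps / compressed_bps


cd_bps = pcm_bit_rate(44_100, 16, 2)          # 1,411,200 bit/s
ratio = compression_ratio(cd_bps, 128_000)    # roughly 11:1
print(f"CD audio: {cd_bps / 1000} kbit/s, ratio at 128K: {ratio:.1f}:1")
```

So a 128K stream is about 11 times smaller than the CD original, which is where the "typically 10:1" rule of thumb comes from.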

AAC+ a.k.a. HE-AAC. AAC+ (officially MPEG-4 High-Efficiency Advanced Audio Coding) is a compressed, lossy audio format alternative to AAC and standardized under MPEG-4. It provides support for the same range of sampling rates (8, 16, 22.05, 24, 32, 44.1, 48 and 96 kHz) as AAC but is designed to be more efficient, especially at lower bit rates, and hence provide a higher quality at any given bit rate than AAC. It supports the same bit rates as AAC and MP3 (32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320K bits per second) as well as variable bit rate output. There are patent/royalty issues involved with HE-AAC. To make matters a little more interesting there are two versions of HE-AAC: HE-AACv1 and HE-AACv2.
AAC AAC (officially MPEG-4 Advanced Audio Coding) is a compressed, lossy audio format designed to supersede MP3. It provides support for a wider range of sampling rates (8, 16, 22.05, 24, 32, 44.1, 48 and 96 kHz) and is designed to be more efficient at lower bit rates. It supports essentially the same bit rates as MP3 (32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320K bits per second) as well as variable bit rate output. No license or payment is needed to distribute content in AAC format, though developers of AAC encoders and decoders do need a patent license. AAC, while supported by a large number of vendors, is most commonly associated with Apple (iPod).
AIFF AIFF files use an AIFF container which in its basic form does not support a codec type. AIFF (Audio Interchange File Format) is an uncompressed, lossless audio format developed by Apple and used extensively on the Mac range of computers (as well as others). AIFF audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). AIFF may be viewed as Apple's equivalent of WAV though Apple would doubtless claim significant advantages for their format. The standard file extension is .aiff (or .aif). Confusingly, files with the .aiff (.aif) suffix can also contain an extended AIFF-C format container which is significantly different.

The SDII (Sound Designer II) audio file format is also widely supported and used on Macs.

FLAC FLAC codec data is normally enclosed in FLAC's own native container, or optionally in an Ogg container (Ogg FLAC); both provide the ability to encode meta (tag) data. FLAC (Free Lossless Audio Codec) is a compressed but lossless audio format developed by Xiph.org, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. Compressed, lossless codecs (audio formats) typically achieve file size reductions of ~2:1 with no loss of the source material. A FLAC file may be viewed crudely as a compressed WAV file. FLAC is used in both high-quality playback systems and especially archival applications due to the significant reduction in storage requirements. Xiph.org also provide Theora (video compression standard), Speex (a variable bit rate CODEC for low-latency VoIP) and Ogg Vorbis (a compressed, lossy MP3 alternative).
MP3 MPEG-1/2 Audio Layer 3 or MP3 for short. A compressed and lossy (stuff is lost from the original recording) standard for storing audio data. The bit rate (not to be confused with the sampling rate) determines how much data is discarded and therefore the resulting sound quality and file size. Supported bit rates are 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320K bits per second. The lower the bit rate the lower the audio quality and the smaller the file size. Most systems use a 128K bit rate (increasingly moving to higher rates such as 192K) which gives what is sometimes called radio or FM quality and typically results in a file that is about one tenth of the size (10:1) it would be in uncompressed format. For comparison uncompressed CD data has a bit rate of 1,411.2 kbit/s. Sampling rates supported are 16, 22.05, 24, 32, 44.1 and 48 kHz. While the standard is developed by internationally recognized bodies there are patent issues related to MP3 technology (Fraunhofer Institute). You need to buy the MP3 specification and license the resulting products. Oooh. There are, however, multiple readers available and in practice the file format is widely known and understood. File suffix is .mp3.
Ogg/Ogg Vorbis/Vorbis The quaintly named Ogg Vorbis (colloquially shortened to just ogg) technically defines a vorbis codec encapsulated in an ogg container. Vorbis is a compressed, lossy, variable bit rate standard (from 45K to 500K) developed by the Xiph.org Foundation, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. There are now a variety of native applications and plug-ins available to support vorbis codecs for many popular players. The vorbis web site claims that the Ogg Vorbis standard is competitive with MPEG-4 (AAC), and similar to, but higher performance than MPEG-1/2 audio layer 3 (MP3), MPEG-4 audio (TwinVQ), WMA and PAC. The quality (and therefore file size) of the audio stream is based on a Q factor corresponding to the following nominal bit rates:
-1 45 kbit/s
0 64 kbit/s
1 80 kbit/s
2 96 kbit/s
3 112 kbit/s
4 128 kbit/s
5 160 kbit/s
6 192 kbit/s
7 224 kbit/s
8 256 kbit/s
9 320 kbit/s
10 500 kbit/s
Sampling rates are theoretically variable but due to equipment availability will typically be the same as for MP3 (16, 22.05, 24, 32, 44.1 and 48 kHz). Xiph.org also provide Theora (video compression standard), Speex (a variable bit rate CODEC for low-latency speech such as VoIP) and FLAC (a compressed, lossless audio standard). And not a patent in sight. File suffix is normally .ogg.
WAVE

The term WAV/WAVE refers to the WAVE container and not a single codec. However, the term WAV codec is widely understood to mean lossless, uncompressed audio files as captured from the source input. WAV audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). Almost all audio players support WAV format and indeed some normalize input formats into WAV before playing or transcoding. See also FLAC (a compressed, lossless audio format) and AIFF, Apple's equivalent to WAV.

Containers

Containers are simply envelopes (or frameworks) that describe the contents of the audio/video material and typically, but not always, define which codec should be used to handle the encapsulated A/V data. Containers may or may not include a standard method of providing meta (tag) data.

AIFF and AIFF-C Container

The AIFF and AIFF-C containers are based on the Electronic Arts IFF (Interchange File Format), from which RIFF was also derived, and use a series of chunks, each with a unique Chunk ID, to describe various sections of the container. The format is similar to the WAV container with one vital exception - AIFF/AIFF-C containers store multi-byte values in big-endian order, whereas WAV containers use little-endian order.
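
To see why the byte order matters, here is a small sketch (the helper names are ours) that reads the same four bytes as a 32 bit length field in both orders using Python's standard struct module.

```python
import struct


def read_u32_big_endian(raw: bytes) -> int:
    """Read a 4 byte unsigned value the way AIFF/AIFF-C stores it."""
    return struct.unpack(">I", raw)[0]


def read_u32_little_endian(raw: bytes) -> int:
    """Read a 4 byte unsigned value the way WAV stores it."""
    return struct.unpack("<I", raw)[0]


raw = bytes([0x00, 0x00, 0x00, 0x24])
print(read_u32_big_endian(raw))     # 36
print(read_u32_little_endian(raw))  # 603979776
```

Reading a length field in the wrong order gives a wildly wrong value, so a reader that gets this wrong will usually fail on the very first chunk.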

The basic AIFF format does not provide a codec field in the COMM chunk, whereas the AIFF-C format does. Neither the AIFF nor AIFF-C containers allow for the definition of meta (tag) information though the use of the ID3 chunk is widespread.

A number of other chunk types are defined within the AIFF/AIFF-C specifications and typically have specialized usage. In addition third parties have defined chunks for a variety of purposes.

An AIFF container must have at least a FORM chunk, a COMM chunk and a SSND chunk. Theoretically the chunks can be in any order but this leads to very inefficient file processing.

FORM Chunk (12 bytes):

Offset Length Contents Notes
0 4 FORM Chunk ID. 4 ASCII characters "FORM" identifying the container definition chunk
4 4 Chunk Length In the FORM case this should be the entire length of the file from the end of this field on (and is thus the total file size minus 8).
8 4 AIFF (or AIFC) 4 ASCII chars "AIFF" to identify the AIFF container type or "AIFC" to identify the AIFF-C container type. NOTE: In AIFF and AIFF-C containers all multi-byte values are in big-endian order.
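
As a minimal sketch of reading this 12 byte FORM chunk (the function name parse_form_header is an assumption of ours, not part of any library), using only the fields in the table above:

```python
import struct


def parse_form_header(header: bytes):
    """Return (container type, total file size) from the first 12 bytes."""
    if len(header) < 12:
        raise ValueError("need at least 12 bytes")
    # Big-endian: 4 char ID, 4 byte length, 4 char container type.
    chunk_id, length, form_type = struct.unpack(">4sI4s", header[:12])
    if chunk_id != b"FORM":
        raise ValueError("not a FORM chunk")
    if form_type not in (b"AIFF", b"AIFC"):
        raise ValueError("not an AIFF or AIFF-C container")
    # The length counts from the end of the length field, so add 8.
    return form_type.decode("ascii"), length + 8


header = b"FORM" + struct.pack(">I", 46) + b"AIFF"
print(parse_form_header(header))  # ('AIFF', 54)
```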

COMM Chunk:

Offset Length Contents Notes
0 4 COMM Chunk ID. 4 ASCII characters "COMM" identifying a Common (sound characteristics) chunk
4 4 Chunk Length In the COMM case (and all other non-FORM chunks) this is the entire length of this chunk from the end of this field on (that is, the length value does NOT include the chunk ID and the length field itself). Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing AIFF/AIFF-C files the chunk length is known and the chunk data should be treated as raw (binary) data. For a basic AIFF container this value is always 18 (x'12), the sum of the four fixed fields below; for AIFF-C it is larger since it also covers the CODEC field and the codec description string.
8 2 Channels AIFF and AIFF-C. The number of channels 1 = mono, 2 = stereo
10 4 Sample Frames AIFF and AIFF-C. The number of sample frames in the SSND chunk such that Sample Frames * Channels gives the total number of samples in the SSND chunk (multiply again by the bytes per sample to get the size of the sample data in bytes).
14 2 Bits per sample AIFF and AIFF-C. A.k.a. Sample Size. The number of significant bits in each sample. Typically 8, 16, 24, 32. If a value that is not a multiple of 8 is used (12 and 20 are also common) then the sample is placed in a 2 or 3 byte field, left-justified, with the unused low-order bits set to zero. Thus a 12 bit sample will be placed in a 2 byte field with the bottom 4 bits set to 0. Samples always occupy an integral number of bytes. If AIFF-C is being used with compressed data this is the bit size before compression.
16 10 Sample Rate AIFF and AIFF-C. 80 bit IEEE Standard 754 extended precision floating point number in Hz, thus 44.1 kHz will be 44100
26 4 CODEC AIFF-C only.

The CODEC type consists of 4 characters and may take the following values
NONE (big-endian uncompressed LPCM data in SSND chunk)
fl32 (32 bit floating point)
fl64 (64 bit floating point)
alaw (a-law encoded Logarithmic PCM (ITU G.711))
ulaw (μ-law encoded Logarithmic PCM (mu-law) (ITU G.711))
sowt (little-endian uncompressed LPCM data in SSND chunk)

There are other codec types supported but there appears to be no definitive list available from Apple or any web resource.

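30 - string AIFF-C only. A Pascal-style string (a one byte character count followed by that many characters) that may be used to generate a human readable message describing the codec. If not present a count of x'0 should be used. The count byte plus the characters must together occupy an even number of bytes, so if the string contains an even number of characters it should be padded with a x'0 byte.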
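
The only awkward field in the COMM chunk is the 80 bit sample rate, since few languages have a native type for it. The following is a sketch only, with function names of our own, that decodes it by hand and then unpacks the fixed COMM fields. It takes the chunk data after the 8 byte ID/length header, so the offsets are 8 less than in the table, and it does not handle infinities or NaNs.

```python
import math
import struct


def decode_extended80(raw: bytes) -> float:
    """Decode a 10 byte big-endian 80 bit extended precision float."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    sign_exp, mantissa = struct.unpack(">HQ", raw)
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity/NaN is not a usable sample rate")
    if mantissa == 0:
        return 0.0
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    # The 15 bit exponent is biased by 16383 and the 64 bit mantissa
    # carries an explicit integer bit, hence the extra shift of 63.
    return sign * math.ldexp(mantissa, exponent - 16383 - 63)


def parse_comm(body: bytes) -> dict:
    """Parse COMM chunk data (the bytes after the 8 byte chunk header)."""
    if len(body) < 18:
        raise ValueError("COMM data must be at least 18 bytes")
    channels, frames, bits = struct.unpack(">hIh", body[:8])
    return {
        "channels": channels,
        "sample_frames": frames,
        "bits_per_sample": bits,
        "sample_rate": decode_extended80(body[8:18]),
        # Basic AIFF has no CODEC field: uncompressed LPCM is implied.
        "codec": body[18:22].decode("ascii") if len(body) >= 22 else "NONE",
    }


rate_44100 = bytes.fromhex("400EAC44000000000000")
body = struct.pack(">hIh", 2, 1000, 16) + rate_44100
print(parse_comm(body))
```

The byte string 40 0E AC 44 00 00 00 00 00 00 is how 44100 Hz appears in a file, so it is a handy value to test a decoder with.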

FVER Chunk (12 bytes): (Mandatory for AIFF-C files)

Offset Length Contents Notes
0 4 FVER Chunk ID. 4 ASCII characters "FVER" identifying a Format Version chunk. The FVER chunk must be present for AIFF-C containers and must NOT be present for AIFF containers.
4 4 Chunk Length As for all other non-FORM chunks this is the entire length of this chunk from the end of this field on; for the FVER chunk it is always 4. Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing AIFF-C files the chunk length is known and the chunk data should be treated as raw (binary) data.
8 4 Timestamp An Apple format timestamp (seconds since midnight 1st January 1904) identifying the version of the AIFF-C specification the file conforms to, not when the file was written. For version 1 of the specification the value is x'A2805140 (May 23, 1990, 2:40 pm).
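
Apple format timestamps count seconds from 1st January 1904 rather than the more familiar Unix epoch of 1970. A minimal sketch of reading the FVER chunk and converting its timestamp follows. The names are our own, and the sample value x'A2805140 is the version 1 constant published in Apple's AIFF-C specification.

```python
import struct
from datetime import datetime, timedelta

# Apple timestamps count seconds from midnight, 1st January 1904.
MAC_EPOCH = datetime(1904, 1, 1)


def parse_fver(chunk: bytes) -> datetime:
    """Return the timestamp of a complete 12 byte FVER chunk."""
    if len(chunk) < 12:
        raise ValueError("FVER chunk is 12 bytes long")
    chunk_id, length, timestamp = struct.unpack(">4sII", chunk[:12])
    if chunk_id != b"FVER" or length != 4:
        raise ValueError("not a valid FVER chunk")
    return MAC_EPOCH + timedelta(seconds=timestamp)


fver = b"FVER" + struct.pack(">II", 4, 0xA2805140)
print(parse_fver(fver))  # 1990-05-23 14:40:00
```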

SSND Chunk (variable):

Offset Length Contents Notes
0 4 SSND Chunk ID. 4 ASCII characters "SSND" identifying a sound chunk
4 4 Chunk Length Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the FORM chunk). Only one SSND chunk will be present in most cases and may cross physical block boundaries. In the SSND case this is the entire length of all the samples contained in the file plus the 8 bytes of the Offset and Block Size fields. The data format is unique to the codec and described in the COMM chunk
8 4 Offset Number of bytes to skip before the first sample when block-aligned data is used. Normally 0.
12 4 Block Size Size of the blocks the sample data is aligned to. Normally 0 (no block alignment).
16 - -

The raw samples. Each sample consists of an integral number of bytes and the data typically consists of interleaved samples captured at the same time from each of the channels. Thus, assuming a 2 channel audio stream with a 16 bit sample size, the byte representation would look like 112211221122 etc. Note: In AIFF containers (supporting only uncompressed LPCM data) 16 bit samples are stored in big-endian order. In AIFF-C containers with a CODEC value of NONE 16 bit samples are stored in big-endian order. In AIFF-C containers with a CODEC value of sowt 16 bit samples are stored in little-endian order.

LPCM samples in AIFF/AIFF-C containers are signed 2's complement values: 8 bit data is in the range -128 to 127 and 16 bit data is in the range -32768 to 32767 (center point is 0 in both cases). This differs from the WAVE container, where 8 bit data is unsigned. A 12 bit sample is always moved into an integral number of bytes so becomes a 16 bit sample with 2's complement values.
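
The interleaving and byte order rules can be sketched as follows. The function name is our own invention, and it assumes that the caller has already extracted the raw 16 bit sample bytes from the SSND chunk.

```python
import struct


def decode_pcm16(data: bytes, channels: int, little_endian: bool = False):
    """Split interleaved 16 bit samples into one list per channel.

    Use little_endian=True for AIFF-C data with a CODEC value of sowt;
    the default matches AIFF and AIFF-C with a CODEC value of NONE.
    """
    count = len(data) // 2
    order = "<" if little_endian else ">"
    samples = struct.unpack(f"{order}{count}h", data[: count * 2])
    # Samples alternate channel 1, channel 2, channel 1, channel 2 ...
    return [list(samples[ch::channels]) for ch in range(channels)]


stereo = struct.pack(">4h", 1, -1, 2, -2)
print(decode_pcm16(stereo, channels=2))  # [[1, 2], [-1, -2]]
```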

WAVE Container

IBM and Microsoft standard but has been extended by others. The format of a WAV container is a subset of the generic RIFF (Resource Interchange File Format) format as shown below; AIFF files use a closely related chunk layout but with different chunk IDs and byte order. The current specification is version 3.0 though an extended multi-channel format is available. The basic definition consists of three standard 'Chunks' - the HEADER chunk (identifying the file type), the FORMAT chunk (identifying key characteristics of the payload) and the DATA chunk containing the data (or file payload). If a non PCM codec is being used a FACT chunk must be included. Due to the format of the file additional chunks can be added. Software that knows about such chunks can handle them, but sensible software that does not can simply skip over unrecognized chunks until it reaches the audio data (in the data chunk). There are a number of well known chunks (typically used to incorporate meta data (tags)), as well as the chunk we use to incorporate OGG COMMENT tags in both WAV and AIFF format files.

HEADER Chunk (12 bytes):

Offset Length Contents Notes
0 4 RIFF Chunk ID. 4 ASCII characters "RIFF" identifying the container definition chunk
4 4 Chunk Length In the HEADER case this should be the entire length of the file from the end of this field on (and is thus the total file size minus 8).
8 4 WAVE 4 ASCII chars "WAVE" to identify the container type

NOTE: All multi-byte values are in little-endian order. If big-endian (network) order is being used the value RIFX will replace RIFF in the HEADER chunk.
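
The skip-over rule for unrecognized chunks is easy to get slightly wrong because of the pad byte. The sketch below (iter_chunks is a name of our own) walks every chunk that follows the 12 byte header, picking the byte order from the first four bytes so that it works for RIFF, RIFX and FORM files alike. It assumes the whole file has been read into memory.

```python
import struct


def iter_chunks(data: bytes):
    """Yield (chunk ID, chunk data) for each chunk after the header."""
    magic = data[0:4]
    if magic == b"RIFF":
        order = "<"            # little-endian
    elif magic in (b"RIFX", b"FORM"):
        order = ">"            # big-endian
    else:
        raise ValueError("not a RIFF, RIFX or FORM file")
    pos = 12                   # skip ID, length and container type
    while pos + 8 <= len(data):
        chunk_id, length = struct.unpack(order + "4sI", data[pos:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + length]
        # The length excludes the pad byte added to odd sized chunks.
        pos += 8 + length + (length & 1)


fmt_data = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)
payload = (b"WAVE"
           + b"fmt " + struct.pack("<I", 16) + fmt_data
           + b"junk" + struct.pack("<I", 3) + b"abc\x00"
           + b"data" + struct.pack("<I", 4) + b"\x80\x80\x80\x80")
wav = b"RIFF" + struct.pack("<I", len(payload)) + payload
print([chunk_id for chunk_id, _ in iter_chunks(wav)])
```

The made-up "junk" chunk has an odd length of 3, so the walker must step over 4 bytes of data to land on the start of the data chunk.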

FORMAT Chunk (24+ bytes):

Offset Length Contents Notes
0 4 fmt Chunk ID. 4 ASCII characters "fmt " identifying a format chunk
4 4 Chunk Length In the FORMAT case (and all other non-HEADER chunks) this is the entire length of this chunk from the end of this field on (that is, the length value does NOT include the chunk ID and the length field itself). Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing WAV files the chunk length is known and the chunk data should be treated as raw (binary) data. In the case of uncompressed LPCM (CODEC = 1) this value is always 16 (x'10), but additional FORMAT data can be added (CODEC dependent) and is reflected in the size value of this field.
8 2 CODEC The CODEC type (values shown in decimal)
00 = unknown
01 = uncompressed LPCM
06 = a-law encoded Logarithmic PCM (ITU G.711)
07 = μ-law encoded Logarithmic PCM (mu-law) (ITU G.711)
20 = ITU G.723 ADPCM
49 = GSM 6.10
64 = ITU G.721 ADPCM

There is a remarkably complete list of types at this site.

10 2 Channels The number of channels 1 = mono, 2 = stereo
12 4 Sample Rate in Hz thus 44.1KHz will be 44100
16 4 Average Bytes per second Sample Rate * bytes per sample
20 2 Bytes per sample Includes all channels. Assuming the sample size is 16 and there are two channels this would be 4. However a 16 bit mono sample and an 8 bit stereo sample would both give the value 2. The next field is used to disambiguate the two cases.
22 2 Bits per sample A.k.a. Sample Size. The number of significant bits in each sample. Typically 8, 16, 24, 32. If a value that is not a multiple of 8 is used (12 and 20 are also common) then the sample is placed in a 2 or 3 byte field, left-justified, with the unused low-order bits set to zero. Thus a 12 bit sample will be placed in a 2 byte field with the bottom 4 bits set to 0. Samples always occupy an integral number of bytes.
24 - Additional Data Optional CODEC dependent headers - not present for CODEC type 1 (uncompressed LPCM)
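
A minimal sketch of unpacking the 16 fixed bytes of the FORMAT chunk is shown below. The function and dictionary names are our own, and the input is the chunk data after the 8 byte ID/length header, so the offsets are 8 less than in the table.

```python
import struct

# CODEC values from the table above (decimal).
CODEC_NAMES = {
    0: "unknown",
    1: "uncompressed LPCM",
    6: "a-law",
    7: "mu-law",
    20: "ITU G.723 ADPCM",
    49: "GSM 6.10",
    64: "ITU G.721 ADPCM",
}


def parse_fmt(body: bytes) -> dict:
    """Parse FORMAT chunk data (the bytes after the 8 byte header)."""
    if len(body) < 16:
        raise ValueError("FORMAT data must be at least 16 bytes")
    codec, channels, rate, avg_bytes, bytes_per_sample, bits = struct.unpack(
        "<HHIIHH", body[:16]
    )
    return {
        "codec": CODEC_NAMES.get(codec, f"other ({codec})"),
        "channels": channels,
        "sample_rate": rate,
        "avg_bytes_per_second": avg_bytes,
        "bytes_per_sample": bytes_per_sample,   # includes all channels
        "bits_per_sample": bits,
        "extra": body[16:],                      # CODEC dependent headers
    }


cd_style = struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16)
print(parse_fmt(cd_style))
```

For CD-style audio the average bytes per second is 44100 * 4 = 176400, which matches the Sample Rate * bytes per sample rule in the table.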

FACT Chunk (variable):

Offset Length Contents Notes
0 4 fact Chunk ID. 4 ASCII characters "fact" identifying a fact chunk. Fact chunks are required if the value of the CODEC field in the format chunk is NOT uncompressed LPCM (01).
4 4 Chunk Length As for all other non-HEADER chunks this is the entire length of this chunk from the end of this field on; with only the Samples field present it is 4. Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Software which does not recognize a chunk ID simply adds the chunk length value (rounded up to the next word multiple) to skip over the chunk. In the case of writing WAV files the chunk length is known and the chunk data should be treated as raw (binary) data.
8 4 Samples The number of samples in the file.

DATA Chunk (variable):

Offset Length Contents Notes
0 4 data Chunk ID. 4 ASCII characters "data" identifying a data chunk
4 4 Chunk Length Chunks always occupy a multiple of a word (16 bits or 2 bytes) so if the data enclosed in the chunk is an odd number of bytes a pad byte must be added. The chunk length does NOT include this pad byte. The first 8 bytes of each chunk are standard (except as noted for the HEADER chunk). Only one DATA chunk will be present in most cases and may cross physical block boundaries. In the DATA case this is the entire length of all the samples contained in the file. The data format is unique to the codec and described in the FORMAT chunk
8 - -

The raw samples. Each sample consists of an integral number of bytes and the data typically consists of interleaved samples captured at the same time from each of the channels. Thus, assuming a 2 channel audio stream with a 16 bit sample size, the byte representation would look like 112211221122 etc. Note: 16 bit samples are stored in little-endian order. It is thus possible to start playing a WAV file immediately without reading the whole file. The end of file is reached when the value in the DATA chunk length field is exhausted.

8 bit LPCM is assumed to be unsigned data in the range 0 to 255 (center point is 128), 16 bit data is signed 2's complement data in the range -32768 to 32767 (center point is 0). A 12 bit sample is always moved into an integral number of bytes so becomes a 16 bit sample with 2's complement values.
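
The different treatment of 8 bit and 16 bit data trips up many first attempts at a WAV reader. Here is a sketch, using a function name of our own, that converts both to signed values centered on 0. Only the 8 and 16 bit cases described above are handled.

```python
import struct


def decode_wav_samples(data: bytes, bits_per_sample: int):
    """Return WAV LPCM samples as signed integers centered on 0."""
    if bits_per_sample == 8:
        # Unsigned with a center point of 128, so shift down.
        return [value - 128 for value in data]
    if bits_per_sample == 16:
        count = len(data) // 2
        # Signed 2's complement, little-endian.
        return list(struct.unpack(f"<{count}h", data[: count * 2]))
    raise ValueError("only 8 and 16 bit samples are handled in this sketch")


print(decode_wav_samples(bytes([0, 128, 255]), 8))     # [-128, 0, 127]
print(decode_wav_samples(b"\x00\x80\xff\x7f", 16))     # [-32768, 32767]
```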

OGG COMMENT Chunk Format

OGG Comment Chunk (variable). This is a completely non-standard chunk used for WAVE and AIFF files to encapsulate Ogg style comments:

Offset Length Contents Notes
0 4 ogg 4 ASCII characters "ogg " identifying an ogg format comment chunk
4 4 Chunk Length Variable. The entire length of the chunk from the end of this field to end. The following information defines a standard (not extended) Ogg comment field. When transcoding to an ogg file it would normally be parsed and each comment added via the appropriate library function e.g. vorbis_comment_add_tag.
8 4 Vendor string length The length of the vendor string (normally the library reference being used to encode the file). It is not the same as any ENCODER=data string. It is permissible to have a length of zero which indicates there is no vendor string.
12 V Vendor string of length defined by vendor string length UTF-8 with no termination nulls. It is a character string, not a C string.
V 4 Total Length of all Comments Sum of all comments such that adding this value to the end of this field will skip all comments
V 4 Comment Length Defines the length of the following comment string in name=data format
V V Comment string UTF-8 string. This is a character string and is not null terminated. It is not a C string.
Repeated as many times as defined by Total Length of all Comments field.
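
The layout above can be sketched as a pair of functions that build and parse the chunk data following the Chunk Length field. This is an illustration under stated assumptions, not a definitive implementation: the function names are ours, the byte order of the length fields is not stated above so little-endian is assumed by default (pass ">" for big-endian), and the Total Length field is taken to be the byte count of all comment entries including each entry's own 4 byte length.

```python
import struct


def build_ogg_comment_data(vendor: str, comments, order: str = "<") -> bytes:
    """Build the data that follows the Chunk Length field."""
    vendor_raw = vendor.encode("utf-8")
    entries = b""
    for comment in comments:                 # each is a name=data string
        raw = comment.encode("utf-8")        # no terminating null
        entries += struct.pack(order + "I", len(raw)) + raw
    return (struct.pack(order + "I", len(vendor_raw)) + vendor_raw
            + struct.pack(order + "I", len(entries)) + entries)


def parse_ogg_comment_data(data: bytes, order: str = "<"):
    """Return (vendor string, list of name=data comment strings)."""
    (vendor_len,) = struct.unpack_from(order + "I", data, 0)
    vendor = data[4:4 + vendor_len].decode("utf-8")
    pos = 4 + vendor_len
    (total,) = struct.unpack_from(order + "I", data, pos)
    pos += 4
    end = pos + total                        # skipping to here skips all
    comments = []
    while pos < end:
        (length,) = struct.unpack_from(order + "I", data, pos)
        pos += 4
        comments.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return vendor, comments


data = build_ogg_comment_data("example encoder", ["TITLE=Test", "ARTIST=Nobody"])
print(parse_ogg_comment_data(data))
```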

Well Known Chunks

There are a number of other chunk types that can appear in either WAV or AIFF containers. Because of the file format used in both containers software can easily skip over unrecognized chunks using the chunk length field. If files are being copied, sensible software should always write all chunks even if it does not support them, thus the possibility of alien chunk loss should be remote. The following is an incomplete list of additional chunks:

Meta Data - Audio Tags

It is a great idea. The audio file is self describing by containing tagging or meta data. You send the audio file. The player can both play the file and extract a ton of meta data like who is performing, the name of the work etc. No need to supply separate identification information. However, as you will see below the world has gone a bit crazy with a huge number of tags being made available or proposed. In our view the whole idea has gone off the rails. Audio files are not databases. Instead the audio file should contain essential and summary information only, perhaps limited to identification, rights and perhaps audio parameters. Everything else should be defined in a simple MORE= tag containing a URL to a networked resource which can supply unlimited information about the material. This is our proposal.

MP3 Meta Data (ID3)

The MP3 standard does not use a container format and thus does not provide the ability to encode meta (tag) data. What are colloquially known as ID3 tags came into existence - bolted on afterwards and now almost universally supported - to allow mp3 files to describe the contents of the audio file. ID3 was initially released as version 1 and has since been updated to the completely different version 2.x format. Currently this is at version 2.4, though 2.3 still seems the most widely implemented. The list of tags defined by www.id3.org is pretty daunting (currently 94 and counting) and includes the shoe sizes of all the musicians involved. In practice only a trivial subset is widely supported - perhaps 10 - 20 - the other tags tend to be for specialized usage. So now you want to transcode from/to MP3, WMA and Vorbis or whatever. And the world suddenly gets a lot murkier.

Ogg Meta Data

Ogg does not support ID3. Instead it uses a flexible and extensible name=data tag format (unfortunately generically called a comment). Ogg are currently promoting a list of name types. The most important characteristic of this list is that it seems to be widely ignored (see below) - in some cases because the list does not have an equivalent for an ID3 (still the dominant standard) tag. FLAC - from the Ogg stable - also uses the Vorbis comment format. Numerous alternative proposals exist on the web for uses of this flexible tag format and taken together they make the 94 items of ID3 look very modest indeed. Nevertheless, IOHO, it's a more useful format than ID3 and has the overriding merit of being human readable even if the player/reader software is unable to interpret the tags. An extended comment field format to include images has been made available by Ogg (Ogg Extended comment field format proposal).

Wave/AIFF Meta Data

WAVE and AIFF both use the same style of chunk-based format which allows for new chunks to be added without breaking the container format - unless, of course, your player/reader is unbelievably stupid. Software can just skip over the unknown chunks so at least the audio file can be played. Thus it is possible to embed any tag format. However, when transcoding the player/writer must understand the meta data format in order to map it to an equivalent format. Now this is where it gets really stupid. AIFF defines an id3 chunk whereas WAVE does not. WAVE has the Broadcast WAVE Format (BWF) which is optimized for, as you would expect, broadcasters and is not really an alternative to normal audio file meta data.

While there are differences between WAVE and AIFF these are mostly in the area of extended capabilities. We use a common CODEC for both thus, as far as we are concerned, WAVE has embedded id3 capability. However, for the reasons mentioned we like the flexibility of OGG and have started to make serious use of both Vorbis and FLAC files. To handle this problem we use the OGG chunk format defined above to encapsulate the standard Ogg comment fields. And if you need to export the files, add both the id3 and ogg chunks. These are pretty big files - what's a few (hundred) K here or there.

MP4 Meta Data

AAC Meta Data

WMA Meta Data

Meta Tag Mapping

The following table shows some of the widely supported ID3 tags and one possible mapping to alternative ones during transcoding.

Tag Purpose MP3 Writers Vorbis/FLAC
Tag/Meta format - ID3 Vorbis comments. Free format name=data
TIT2 Tag Work Title Widely TITLE=data
TPE1 Tag Artist Name Widely ARTIST=data
TRCK Tag Track Number Widely TRACKNUMBER=data
TALB Tag Album name Widely ALBUM=data
TDRC Tag Year of Recording Widely supported but mostly used for publication date not recording date. No consistency. No real equivalent in Ogg list so widespread use of YEAR=data. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TDAT Tag Day and month of recording (DDMM) Not widely supported/used. No real equivalent in Ogg list. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TENC Tag Encoded by Widely supported/used. No equivalent in Ogg list so widespread use of ENCODED=data.
COMMENTS Tag Non specific use Widely No real equivalent in Ogg list so COMMENTS=data seems widely used. Nearest Ogg value is perhaps DESCRIPTION=data
TCON Tag Genre name Widely GENRE=data
TCOP Tag Copyright text Varies COPYRIGHT=data
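
When transcoding, the table boils down to a lookup. The sketch below shows one possible mapping only: the function and dictionary names are ours, and where the table offers a choice we have arbitrarily picked DATE for TDRC and DESCRIPTION for COMMENTS. Tags without an entry are simply dropped.

```python
# One possible mapping taken from the table above.
ID3_TO_VORBIS = {
    "TIT2": "TITLE",
    "TPE1": "ARTIST",
    "TRCK": "TRACKNUMBER",
    "TALB": "ALBUM",
    "TDRC": "DATE",             # YEAR is also in widespread use
    "TENC": "ENCODED",
    "COMMENTS": "DESCRIPTION",  # COMMENTS is also in widespread use
    "TCON": "GENRE",
    "TCOP": "COPYRIGHT",
}


def map_id3_to_vorbis(frames: dict) -> list:
    """Turn {ID3 tag: value} into a list of name=data comment strings."""
    comments = []
    for tag, value in frames.items():
        name = ID3_TO_VORBIS.get(tag)
        if name is not None:    # unmapped tags are dropped
            comments.append(f"{name}={value}")
    return comments


print(map_id3_to_vorbis({"TIT2": "Work Title", "TPE1": "Some Artist", "XXXX": "?"}))
```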

Other tags seem to be supported to a greater or lesser extent, depending on the software. Certainly you can be sure of only one thing - inconsistency.



Copyright © 1994 - 2012 ZyTrax, Inc.
All rights reserved. Legal and Privacy
Page modified: July 11 2011.