Digital Audio/Video - Formats

We can never find stuff in one place - here is a collection of audio/video file formats together with some basic notes and links to additional information. Helped us, perhaps it will help you. If not .... well ....sad really.

The crucial difference between the various formats is whether they are lossy or lossless. In a lossy format some of the original source material is discarded, using a variety of sophisticated algorithms (typically based on psychoacoustic models) to retain as much as possible of the original sound quality. Files in lossy formats are typically about one tenth the size of the original source material (10:1). Lossless formats, as the name suggests, retain all the source material. There are now a number of compressed lossless formats, such as FLAC, though these tend to achieve only ~2:1 compression. Some additional points to note: whether the standard is open or proprietary, and whether or not there are patent/royalty issues involved.

The Vexed Question of Song Tags (Meta Data)

So life was already pretty complicated what with all those file formats and then we started to look at tags when transcoding audio files. Phew. What a mess. ID3 and all that Jazz.

AAC+ aka HE-AAC. AAC+ (Officially MPEG-4 High-Efficiency Advanced Audio Coding) is a compressed lossy audio format alternative to AAC and standardized under MPEG-4. It provides support for the same range of sampling rates (8, 16, 22.05, 24, 32, 44.1, 48 and 96 kHz) as AAC but is designed to be more efficient, especially at lower bit rates, and hence provide a higher quality at any given bit rate than AAC. It supports the same bit rates as AAC and MP3 (32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kbit/s) as well as variable bit rate output. There are patent/royalty issues involved with HE-AAC. To make matters a little more interesting there are two versions of HE-AAC: HE-AACv1 and HE-AACv2.
AAC AAC (Officially MPEG-4 Advanced Audio Coding) is a compressed lossy audio format designed to supersede MP3. It provides support for a wider range of sampling rates (8, 16, 22.05, 24, 32, 44.1, 48 and 96 kHz) and is designed to be more efficient at lower bit rates. It supports essentially the same bit rates as MP3 (32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kbit/s) as well as variable bit rate output. No licence fees are payable for streaming or distributing content in AAC format, though there are patent/royalty issues for anyone building AAC encoders or decoders. AAC, while supported by a large number of vendors, is most commonly associated with Apple (iPod).
AIFF AIFF (Audio Interchange File Format) is an uncompressed lossless audio format developed by Apple and used extensively on the Mac range of computers (as well as others). AIFF audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). AIFF may be viewed as Apple's equivalent of WAV, though Apple would doubtless claim significant advantages for their format. The SDII (Sound Designer II) audio file format is also widely supported and used on Macs.
FLAC FLAC (Free Lossless Audio Codec) is a compressed but lossless audio format developed by Xiph.org, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. Compressed, lossless codecs (audio formats) typically achieve compression ratios of ~2:1 with no loss of the source material. A FLAC file may be viewed simply as a compressed WAV file. FLAC is used both in high-quality playback systems and, especially, in archival applications due to the significant reduction in storage requirements. Xiph.org also provide Theora (video compression standard), Speex (a variable bit rate CODEC for low-latency VoIP) and Ogg Vorbis (a compressed, lossy MP3 alternative).
MP3 MPEG-1/2 Audio Layer 3 or MP3 for short. A compressed and lossy (stuff is lost from the original recording) standard for storing audio data. The bit rate (not to be confused with the sampling rate) determines how much data is discarded and therefore the resulting sound quality and file size. Supported bit rates are 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kbit/s. The lower the bit rate the lower the audio quality and the smaller the file size. Most systems use a 128 kbit/s bit rate (increasingly moving to higher rates such as 192 kbit/s) which gives what is sometimes called radio quality and typically results in a file that is about one tenth the size (10:1) of the same audio stored in uncompressed format. For comparison uncompressed CD audio has a bit rate of 1,411.2 kbit/s. Sampling rates supported are 16, 22.05, 24, 32, 44.1 and 48 kHz. While the standard was developed by internationally recognized bodies there are patent issues related to MP3 technology (Fraunhofer Institute). You need to buy the MP3 specification and license the resulting products. Oooh. That said, multiple readers are available and in practice the file format is widely known and understood.
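
The 10:1 figure follows directly from the bit rates quoted above. A minimal Python sketch of the arithmetic (the function names are illustrative only, and the 4 minute track length is an assumed example, not something from the MP3 standard):

```python
# Bit-rate arithmetic for uncompressed CD audio versus a 128 kbit/s MP3.

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed LPCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def compression_ratio(source_bit_rate, encoded_bit_rate):
    """How many times smaller the encoded stream is than the source."""
    return source_bit_rate / encoded_bit_rate

def file_size_bytes(bit_rate, seconds):
    """Approximate payload size, ignoring headers and tags."""
    return bit_rate * seconds / 8

# CD audio: 44.1 kHz, 16 bit samples, 2 channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                                        # 1411200 (1,411.2 kbit/s)
print(round(compression_ratio(cd_rate, 128_000), 1))  # 11.0

# An assumed 4 minute (240 second) track.
print(file_size_bytes(cd_rate, 240))   # 42336000.0 bytes uncompressed
print(file_size_bytes(128_000, 240))   # 3840000.0 bytes at 128 kbit/s
```

The ratio comes out at roughly 11:1, which is where the commonly quoted "10:1" comes from.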
Ogg Vorbis The quaintly named Ogg Vorbis (its developers increasingly seem to prefer the term Vorbis on its own) is a compressed, lossy, variable bit rate standard (from 45 to 500 kbit/s) developed by the Xiph.org Foundation, a non-profit organization dedicated to providing open, royalty/patent-free open source standards and software. The Vorbis web site claims that the Ogg Vorbis standard is competitive with MPEG-4 (AAC), and similar to, but higher performance than, MPEG-1/2 audio layer 3 (MP3), MPEG-4 audio (TwinVQ), WMA and PAC. The quality (and therefore file size) of the audio stream is based on a Q factor corresponding to:
Q factor | Nominal bit rate
-1 | 45 kbit/s
0 | 64 kbit/s
1 | 80 kbit/s
2 | 96 kbit/s
3 | 112 kbit/s
4 | 128 kbit/s
5 | 160 kbit/s
6 | 192 kbit/s
7 | 224 kbit/s
8 | 256 kbit/s
9 | 320 kbit/s
10 | 500 kbit/s
Sampling rates are theoretically variable but due to equipment availability will typically be the same as for MP3 (16, 22.05, 24, 32, 44.1 and 48 kHz). Xiph.org also provide Theora (video compression standard), Speex (a variable bit rate CODEC for low-latency speech such as VoIP) and FLAC (a compressed, lossless audio standard). And not a patent in sight. File suffix is normally .ogg.
WAV

WAV (Waveform Audio Format) is an uncompressed lossless audio format developed by IBM and Microsoft. WAV audio data is stored in a raw PCM (LPCM) format. A CD also stores the audio in LPCM format (at a sample rate of 44.1 kHz and a sample size of 16 bits) but uses a different file standard (The Red Book - IEC 60908). Modern versions of WAV do allow for compression of the audio stream but the term WAV is widely understood to mean lossless uncompressed audio files as captured from the source input. Almost all audio players support WAV format and indeed some normalize input formats into WAV before playing or transcoding. See also FLAC (a compressed, lossless audio format).

The format of a WAV file is a subset of the generic RIFF (Resource Interchange File Format) format as shown below and consists of three 'Chunks' - the HEADER chunk (identifying the file type), the FORMAT chunk (identifying key characteristics of the payload) and the DATA chunk containing the data (or file payload):

HEADER Chunk (12 bytes):

Offset | Length | Contents | Notes
0 | 4 | RIFF | 4 ASCII characters "RIFF" identifying the file format
4 | 4 | Chunk Length | In the HEADER case this is the entire length of the file from the end of this field on (and is thus total file size minus 8)
8 | 4 | WAVE | 4 ASCII characters "WAVE" to identify the file type

NOTE: All multi-byte values are in little-endian order. If big-endian (network) order is being used the value RIFX will replace RIFF in the HEADER chunk.
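
As a rough illustration, the 12 byte HEADER chunk can be read with Python's standard `struct` module. This is a sketch only: the function name and the returned dictionary keys are invented for the example, and the sample header is built in memory rather than read from a real file.

```python
import struct

def parse_riff_header(header):
    """Decode the 12 byte HEADER chunk described above."""
    if len(header) < 12:
        raise ValueError("HEADER chunk needs 12 bytes")
    magic = header[0:4]
    if magic == b"RIFF":
        order = "<"            # little-endian multi-byte values
    elif magic == b"RIFX":
        order = ">"            # big-endian (network order) variant
    else:
        raise ValueError("not a RIFF/RIFX file")
    (chunk_length,) = struct.unpack(order + "I", header[4:8])
    if header[8:12] != b"WAVE":
        raise ValueError("RIFF file is not of type WAVE")
    return {
        "byte_order": "little" if order == "<" else "big",
        "chunk_length": chunk_length,
        # The length field excludes the 8 bytes that precede its end.
        "file_size": chunk_length + 8,
    }

# A hand-built header for a 44 byte file (36 bytes follow the length field).
header = struct.pack("<4sI4s", b"RIFF", 36, b"WAVE")
info = parse_riff_header(header)
print(info["byte_order"], info["file_size"])   # little 44
```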

FORMAT Chunk (24 bytes):

Offset | Length | Contents | Notes
0 | 4 | "fmt " | 4 ASCII characters "fmt " (the fourth character is a space) identifying a format chunk
4 | 4 | Chunk Length | In the FORMAT case this is the entire length of the format chunk from the end of this field on. In the case of uncompressed LPCM (CODEC = 1) this value is always 16 (0x10), but additional FORMAT data can be added (CODEC dependent) and is reflected in the size value of this field.
8 | 2 | CODEC | The CODEC type (decimal values): 0 = unknown; 1 = uncompressed LPCM; 6 = A-law encoded logarithmic PCM (ITU G.711); 7 = μ-law (mu-law) encoded logarithmic PCM (ITU G.711); 20 = ITU G.723 ADPCM; 49 = GSM 6.10; 64 = ITU G.721 ADPCM. There is a remarkably complete list of types at this site.
10 | 2 | Channels | The number of channels: 1 = mono, 2 = stereo
12 | 4 | Sample Rate | In Hz, thus 44.1 kHz will be 44100
16 | 4 | Average Bytes per second | Sample Rate * Bytes per sample
20 | 2 | Bytes per sample | Includes all channels. Assuming the sample size is 16 bits and there are two channels this would be 4. However a 16 bit mono sample and an 8 bit stereo sample would both give the value 2. The next field is used to disambiguate the two cases.
22 | 2 | Bits per sample | A.k.a. Sample Size. The number of significant bits in each sample. Typically 8, 16, 24 or 32. If a non-byte value is used (12 and 20 are also common) then the sample is placed in a 2 or 3 byte aligned field with the top bits set to zero. Thus a 12 bit sample will be placed in a 2 byte field and the top 4 bits set to 0. Samples are always normalized to an integral number of bytes.
24 | - | Additional Data | Optional CODEC dependent headers - not present for CODEC type 1 (uncompressed LPCM)
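
The fixed 24 byte part of the FORMAT chunk maps neatly onto a single `struct` format string. The Python sketch below assumes little-endian (RIFF, not RIFX) data; the function name, dictionary keys and the hand-built CD-quality chunk are illustrative only.

```python
import struct

# CODEC values taken from the table above (decimal).
CODEC_NAMES = {
    0: "unknown",
    1: "uncompressed LPCM",
    6: "A-law (ITU G.711)",
    7: "mu-law (ITU G.711)",
    20: "ITU G.723 ADPCM",
    49: "GSM 6.10",
    64: "ITU G.721 ADPCM",
}

def parse_format_chunk(chunk):
    """Decode the first 24 bytes of a little-endian FORMAT chunk."""
    if len(chunk) < 24:
        raise ValueError("FORMAT chunk needs at least 24 bytes")
    chunk_id, chunk_length = struct.unpack("<4sI", chunk[0:8])
    if chunk_id != b"fmt ":
        raise ValueError("not a FORMAT chunk")
    (codec, channels, sample_rate, avg_bytes_per_second,
     bytes_per_sample, bits_per_sample) = struct.unpack("<HHIIHH", chunk[8:24])
    return {
        "chunk_length": chunk_length,
        "codec": codec,
        "codec_name": CODEC_NAMES.get(codec, "other"),
        "channels": channels,
        "sample_rate": sample_rate,
        "average_bytes_per_second": avg_bytes_per_second,
        "bytes_per_sample": bytes_per_sample,   # all channels together
        "bits_per_sample": bits_per_sample,
    }

# CD quality: LPCM, stereo, 44100 Hz, 4 bytes per sample frame, 16 bits.
cd_fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, 2, 44100, 44100 * 4, 4, 16)
fmt = parse_format_chunk(cd_fmt)
print(fmt["codec_name"], fmt["channels"], fmt["average_bytes_per_second"])
```

For the CD-quality example the average bytes per second is 44100 * 4 = 176400, matching the "Sample Rate * Bytes per sample" rule in the table.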

OGG Comment Chunk (variable). This is a completely non-standard chunk that we use for WAVE and AIF files to encapsulate Ogg style comment fields:

Offset | Length | Contents | Notes
0 | 4 | "ogg " | 4 ASCII characters "ogg " (the fourth character is a space) identifying an ogg format comment chunk
4 | 4 | Chunk Length | Variable. The entire length of the chunk from the end of this field to the end. The following information defines a standard (not extended) Ogg comment field. When adding to an ogg file it would normally be parsed and each comment added via the appropriate library function e.g. vorbis_comment_add_tag.
8 | 4 | Vendor string length | The length of the vendor string (normally the library reference being used to encode the file). This is not the same as any ENCODER=data string.
12 | V | Vendor string | Of the length defined by the vendor string length field. UTF-8, no terminating nulls. It is a character string not a C string.
V | 4 | Total Length of all Comments | Sum of all comments, such that adding this value to the end of this field will skip all comments
V | 4 | Comment Length | Defines the length of the following comment string in name=data format
V | V | Comment string | UTF-8 string. This is a character string and is not null terminated. It is not a C string.

(V = variable.) The Comment Length and Comment string fields are repeated as many times as defined by the Total Length of all Comments field.
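
One way to read the layout above is sketched below in Python. Several things here are assumptions rather than part of the description: that all length fields are little-endian like the rest of a RIFF file, that the Total Length of all Comments counts each 4 byte Comment Length field as well as the comment strings, and the function names and the vendor string "example-lib 1.0" are invented for the example.

```python
import struct

def build_ogg_comment_chunk(vendor, comments):
    """Build an "ogg " chunk from a vendor string and name=data comments."""
    vendor_bytes = vendor.encode("utf-8")
    body = b""
    for comment in comments:
        data = comment.encode("utf-8")
        body += struct.pack("<I", len(data)) + data    # length, then string
    payload = (struct.pack("<I", len(vendor_bytes)) + vendor_bytes
               + struct.pack("<I", len(body)) + body)  # assumed total length
    return b"ogg " + struct.pack("<I", len(payload)) + payload

def parse_ogg_comment_chunk(chunk):
    """Return (vendor, [comments]) from a chunk built as above."""
    if chunk[0:4] != b"ogg ":
        raise ValueError("not an ogg comment chunk")
    (vendor_len,) = struct.unpack("<I", chunk[8:12])
    pos = 12
    vendor = chunk[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (total,) = struct.unpack("<I", chunk[pos:pos + 4])
    pos += 4
    end = pos + total
    comments = []
    while pos < end:
        (length,) = struct.unpack("<I", chunk[pos:pos + 4])
        pos += 4
        comments.append(chunk[pos:pos + length].decode("utf-8"))
        pos += length
    return vendor, comments

chunk = build_ogg_comment_chunk("example-lib 1.0", ["TITLE=Song", "ARTIST=Band"])
print(parse_ogg_comment_chunk(chunk))
```

Because the strings are length-prefixed rather than null terminated, the parser never has to scan for a terminator, which is the point of the "not a C string" notes in the table.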

DATA Chunk:

Offset | Length | Contents | Notes
0 | 4 | data | 4 ASCII characters "data" identifying a data chunk
4 | 4 | Chunk Length | Only one DATA chunk will be present in most cases and will cross physical block boundaries. In the DATA case this is the entire length of all the samples contained in the file.
8 | - | - | The raw samples. Each sample consists of an integral number of bytes, and the samples captured at the same time from each of the channels are interleaved. Thus, assuming a 2 channel audio stream with a 16 bit sample size, the byte representation would look like 112211221122 etc. It is thus possible to start playing a WAV file immediately without reading the whole file. The end of file is reached when the value in the DATA chunk length field is exhausted.

8 bit LPCM is assumed to be unsigned data in the range 0 to 255 (center point is 128), 16 bit data is signed 2's complement data in the range -32768 to 32767 (center point is 0). A 12 bit sample size is always moved into an integral number of bytes so becomes a 16 bit sample with 2's complement values.
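
The interleaving and the two sample encodings can be illustrated with a short Python sketch. It assumes little-endian LPCM data already extracted from the DATA chunk; the function names are made up for the example.

```python
import struct

def deinterleave_16bit(data, channels):
    """Split interleaved 16 bit signed LPCM into one list per channel."""
    frame_size = 2 * channels
    frames = len(data) // frame_size         # ignore any trailing partial frame
    count = frames * channels
    samples = struct.unpack("<%dh" % count, data[:frames * frame_size])
    return [list(samples[c::channels]) for c in range(channels)]

def center_8bit(data):
    """Convert unsigned 8 bit samples (center 128) to signed values."""
    return [value - 128 for value in data]

# Two stereo frames: (left=1, right=-1) then (left=2, right=-2).
stereo = struct.pack("<4h", 1, -1, 2, -2)
print(deinterleave_16bit(stereo, 2))          # [[1, 2], [-1, -2]]
print(center_8bit(bytes([0, 128, 255])))      # [-128, 0, 127]
```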

Meta Data - Audio Tags

It was a great idea. The audio file is self describing by containing tagging or meta data. You send the audio file. The player can both play the file and extract a ton of meta data like who is performing, the name of the work etc. No need to supply separate identification information. However, as you will see below, the world has gone a bit crazy with a huge number of tags being made available or proposed. In our view the whole idea has gone off the rails. Audio files are not databases. Instead the audio file should contain essential and summary information only, perhaps limited to identification, rights and perhaps audio parameters. Everything else should be defined in a simple MORE= tag containing a URL to a networked resource which can supply unlimited information about the material. This is our proposal.

MP3 Meta Data

So what became known as ID3 tags came into existence - bolted on afterwards and not part of the MP3 standard. Initially in version 1 now in the completely different version 2.x format. Currently this is at version 2.4, though 2.3 seems the most widely implemented. The list of tags defined by www.id3.org is pretty daunting (currently 94 and counting) and includes the shoe sizes of all the musicians involved. In practice only a trivial subset is widely supported - perhaps 10 - 20. So now you want to transcode from/to MP3, WMA and Vorbis or whatever. And the world suddenly gets a lot murkier.

Ogg Meta Data

Ogg does not support ID3. Instead it uses a flexible and extensible name=data tag format (unfortunately generically called a comment). Ogg are currently promoting a list of name types. The most important characteristic of this list is that it seems to be widely ignored (see below) - in some cases because the list does not have an equivalent for an ID3 tag (ID3 is still the dominant standard). FLAC, also from the Ogg stable, uses the Vorbis comment format too. Numerous alternative proposals exist on the web for uses of this flexible tag format and taken together they make the 94 items of ID3 look very modest indeed. Nevertheless, IOHO, it's a more useful format than ID3 and has the overriding merit of being human readable even if the player/reader software is unable to interpret the tags. An extended comment field format to include images has been made available by Ogg (Ogg Extended comment field format proposal).

Wave/AIF Meta Data

WAVE and AIF both use the same kind of chunk-based container (RIFF for WAVE, and the closely related big-endian IFF for AIF) which allows for new chunks to be added without breaking the container format - unless of course your player/reader is unbelievably stupid. Software can just skip over the unknown chunks so at least the audio file can be played. Thus it is possible to embed any tag format. However, when transcoding, the player/writer must understand the meta data format in order to map it to an equivalent format. Now this is where it gets really stupid. AIF defines an id3 chunk whereas WAVE does not. WAVE has the Broadcast WAVE Format (BWF) which is optimized for, as you would expect, broadcasters and is not really an alternative to normal audio file meta data.
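
Skipping unknown chunks is straightforward because every chunk carries its own length. A minimal Python sketch for the WAVE (little-endian RIFF) case follows; the function names, the made-up "xtra" chunk ID and the in-memory test file are illustrative assumptions. It also honours the RIFF rule that a chunk with an odd length is followed by one pad byte.

```python
import struct

def iter_chunks(riff):
    """Yield (chunk_id, payload) for each chunk after the 12 byte HEADER."""
    if riff[0:4] != b"RIFF" or riff[8:12] != b"WAVE":
        raise ValueError("not a little-endian WAVE file")
    pos = 12
    while pos + 8 <= len(riff):
        chunk_id, length = struct.unpack("<4sI", riff[pos:pos + 8])
        yield chunk_id, riff[pos + 8:pos + 8 + length]
        pos += 8 + length + (length & 1)     # skip pad byte after odd lengths

def find_chunk(riff, wanted):
    """Return the payload of the first chunk with the given ID, else None."""
    for chunk_id, payload in iter_chunks(riff):
        if chunk_id == wanted:
            return payload
    return None

def make_chunk(chunk_id, payload):
    """Helper used only to build the example file below."""
    pad = b"\x00" if len(payload) & 1 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

# A tiny file with an unknown "xtra" chunk between FORMAT and DATA.
body = (b"WAVE" + make_chunk(b"fmt ", bytes(16))
        + make_chunk(b"xtra", b"abc") + make_chunk(b"data", b"\x01\x02\x03\x04"))
wave = b"RIFF" + struct.pack("<I", len(body)) + body

print([chunk_id for chunk_id, _ in iter_chunks(wave)])
print(find_chunk(wave, b"data"))
```

A player that only cares about audio can call `find_chunk(wave, b"data")` and never needs to understand the tag chunks it passes over.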

While there are differences between WAVE and AIF these are mostly in the area of extended capabilities. We use a common CODEC for both; thus, as far as we are concerned, WAVE has embedded id3 capability. However, for the reasons mentioned we like the flexibility of OGG and have started to make serious use of both Vorbis and FLAC files. To handle this problem we use the OGG chunk format defined above to encapsulate the standard Ogg comment fields. And if you need to export the files, add both the id3 and ogg chunks. These are pretty big files - what's a few (hundred) K here or there.

MP4 Meta Data

AAC Meta Data

WMA Meta Data

Meta Tag Mapping

The following table shows some of the widely supported ID3 tags and one possible mapping to alternative ones during transcoding.

Tag | Purpose | MP3 Writers | Vorbis/FLAC
ID3 support | - | Widely | Free format name=data
TIT2 | Track title | Widely | TITLE=data
TPE1 | Artist name | Widely | ARTIST=data
TRCK | Track number | Widely | TRACKNUMBER=data
TALB | Album name | Widely | ALBUM=data
TDRC | Year of recording | Widely supported but also used for publication date not recording date. No consistency. | No real equivalent in Ogg list so everyone uses YEAR=data. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TDAT | Day and month of recording (DDMM) | Not widely supported/used. | No real equivalent in Ogg list so everyone uses YEAR=data. Nearest Ogg value is DATE=data which is the day, month and year of recording.
TENC | Encoded by | Widely supported/used. | No equivalent in Ogg list so many use ENCODED=data.
COMMENTS | Non specific use | Widely | No real equivalent in Ogg list so COMMENTS=data seems widely used. Nearest Ogg value is perhaps DESCRIPTION=data
TCON | Genre name | Widely | GENRE=data
TCOP | Copyright text | Varies | COPYRIGHT=data

Other tags seem to be supported to a greater or lesser extent, depending on the software. Certainly you can be sure of only one thing - inconsistency.
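
For a transcoder the mapping above boils down to a lookup table. The Python sketch below is one possible reading of it, not a definitive mapping: the function name is invented, the input is assumed to be a plain dictionary of already decoded ID3 frames, TDAT is left out because the table gives it no distinct Vorbis name, and the comments frame is keyed by its ID3v2 frame ID COMM (listed as COMMENTS in the table).

```python
# One possible ID3 frame to Vorbis/FLAC comment name mapping, following
# the table above. Frames not listed here are simply dropped.
ID3_TO_VORBIS = {
    "TIT2": "TITLE",
    "TPE1": "ARTIST",
    "TRCK": "TRACKNUMBER",
    "TALB": "ALBUM",
    "TDRC": "YEAR",        # widely used in practice; DATE is the nearest Ogg name
    "TENC": "ENCODED",     # no Ogg equivalent; common usage
    "COMM": "COMMENTS",    # the table's COMMENTS row; DESCRIPTION is the nearest Ogg name
    "TCON": "GENRE",
    "TCOP": "COPYRIGHT",
}

def id3_to_vorbis_comments(frames):
    """Turn {frame_id: text} into a list of name=data comment strings."""
    comments = []
    for frame_id, text in frames.items():
        name = ID3_TO_VORBIS.get(frame_id)
        if name is None:
            continue                 # unsupported frame: skip it
        comments.append("%s=%s" % (name, text))
    return comments

frames = {"TIT2": "Song", "TPE1": "Band", "TDRC": "2009", "XXXX": "ignored"}
print(id3_to_vorbis_comments(frames))
# ['TITLE=Song', 'ARTIST=Band', 'YEAR=2009']
```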





Copyright © 1994 - 2010 ZyTrax, Inc.
All rights reserved. Legal and Privacy
Page modified: November 19 2009.