Advanced Streaming Format (ASF)
Specification
February 26, 1998
Public Specification Version 1.0
Co-authored by Microsoft Corporation and RealNetworks, Inc.
© 1997-1998 Microsoft Corporation. All rights reserved.
Errata
1 Introduction
1.1 Disclaimer
1.2 What is ASF?
1.3 Design Goals
1.4 Scope
2 ASF Features
2.1 Extensible Media Types
2.2 Component Download
2.3 Scalable Media Types
2.4 Author-specified Stream Prioritization
2.5 Multiple Languages
2.6 Bibliographic Information
3 File Format Organization
3.1 ASF Object definition
3.2 High-level File Structure
3.3 ASF Header Object
3.4 ASF Data Object
3.5 ASF Index Object
3.6 Minimal Implementation
4 Additional Considerations
4.1 Time Units
4.2 Send Time vs. Presentation Time
4.3 Scalable Media Types
4.4 Multimedia Composition
5 ASF Header Object
5.1 Header Object
5.2 File Properties Object
5.3 Stream Properties Object
5.3.1 Data Unit Extension Object
5.4 Content Description Object
5.5 Script Command Object
5.6 Marker Object
5.7 Component Download Object
5.8 Stream Group Object
5.9 Scalable Object
5.10 Prioritization Object
5.11 Mutual Exclusion Object
5.12 Inter-Media Dependency Object
5.13 Rating Object
5.14 Index Parameters Object
5.15 Color Table Object
5.16 Language List Object
6 Data Object
6.1 ASF Data Unit Definition
6.2 ASF Data Unit Examples
6.2.1 Complete Key Frame Example
6.2.2 Partial JPEG Example
6.2.3 Three Delta Frames Example
7 Index Object
8 Standard ASF Media Types
8.1 Audio Media Type
8.1.1 Scrambled Audio
8.2 Video Media Type
8.3 Image Media Type
8.4 Timecode Media Type
8.5 Text Media Type
8.6 MIDI Media Type
8.7 Command Media Type
8.8 Media-Objects (Hotspot) Media Type
9 Bibliography
Appendix A: ASF GUIDs
Appendix B: Bit Stream Types
ASCII
FILETIME
GUID
UINT
UNICODE
Appendix C: GUIDs and UUIDs
Introduction
Motivation
Specification
C.1 Format
C.2 Algorithms for Creating a GUID
C.3 String Representation of GUIDs
C.4 Comparing GUIDs
C.5 Node IDs when no IEEE 802 network card is available
C.6 References
The following changes have been made to the November 12, 1997 version of this specification:
The final paragraph of Section 1.2 was modified to indicate that the semantics of the ASF Header Object must be received before the ASF Data Object can be interpreted. Previous versions inadvertently implied that the header object format itself must be transmitted, which would have precluded the use of session announcement protocols to convey this information.
The text was changed in Section 3.2 to indicate that the Header Object should be the first object in an ASF file.
Clarifications were made within Section 5.3 (Stream Properties Object) concerning the implementation of the following fields:
In Section 5.6 (Marker Object), it is now stated that the invalid offset value used in the Index Object (Section 7) is also used in the Marker Object to signify invalid offsets.
The Index Entry Time Interval of Section 5.14 (Index Parameters Object) was changed to be a UINT 32 (instead of 16) so that it conforms to the Index Entry Time Interval of the Index Object (Section 7). The text was also changed to indicate that the indices are in terms of presentation time.
The text in Section 6.1 (ASF Data Unit) was altered to state that if an object containing a clean point flag is fragmented, the clean point flag is set for all fragments of that object.
An explanation is given of what the Block Position and Index Entry Count fields of the Index Object (Section 7) refer to. An invalid offset value is defined for sparse indexes. It was also made explicit that the Entry offsets are ordered according to the ordering specified by the Index Parameters Object, thereby permitting the same stream to be indexed by multiple Index Types (e.g., Nearest Clean Point, Nearest Object, Nearest Data Unit).
It was clarified that the Seek to Marker command (Section 8.8) refers to indices into the Markers field of the Marker Object defined in Section 5.6.
Corrected the typographical error within Section 8.8 that gave the same definition for “vertical resolution” as was previously given for “horizontal resolution”. Also, an editorial comment referring to earlier versions of the specification was removed from the Command Entry Structure Notes.
Open issues:
The following changes have been made to the September 30, 1997 version of this specification:
It should be explicitly noted that length fields of Unicode Strings indicate the number of Unicode “characters” within the field, while length fields of ASCII or character strings indicate the number of bytes within the field.
In Section 5.4 (Content Description Object):
In Section 6.1 (ASF Data Unit):
In Appendix A (GUID values):
This document presumes a basic level of multimedia and networking knowledge on the part of the reader. Readers not familiar with basic multimedia concepts (such as audio and video compression, multimedia synchronization, and so on) may misunderstand some of the terminology or arguments presented in this document.
Advanced Streaming Format (ASF) is an extensible file format designed to store synchronized multimedia data. It supports data delivery over a wide variety of networks and protocols while remaining suitable for local playback. The explicit goal of ASF is to provide a basis for industry-wide multimedia interoperability, with ASF being adopted by all major streaming solution providers and multimedia authoring tool vendors.
Each ASF file is composed of one or more media streams. The file header specifies the properties of the entire file, along with stream-specific properties. Multimedia data, stored after the file header, references a particular media stream number to indicate its type and purpose. The delivery and presentation of all media stream data is synchronized to a common timeline.
The ASF file definition includes the specification of some commonly used media types (see Section 8). The explicit intention is that if an implementation supports media types from within this set of standard media types (that is, audio, video, image, timecode, text, MIDI, command, or media object), then that media type must be supported in the manner described in Section 8 if the resulting content is to be considered “content compliant” with the ASF specification. Implementations are free to support other media types (in addition to the currently defined standard media types) in any way they see fit.
Finally, ASF is said to support the transmission of “live content” over a network. This refers to multimedia content that may or may not ever be recorded on a persistent medium (for example, a disk, CD-ROM, or DVD). The term explicitly and solely means that information describing the multimedia content must have been received before the multimedia data itself is received (in order to interpret the multimedia data), and that this information must convey the semantics of the ASF Header Object. Similarly, the received data must conform to the format of the ASF data units. The term conveys no additional information. Specifically, it does not refer to (or contain) any information about network control protocols or network transmission protocols. It refers solely to the order of information arrival (header semantics before data) and to the data format.
ASF was designed with the following goals:
ASF is a multimedia presentation file format. It supports live and on-demand multimedia content. It can be used as a vehicle to record or play back H.32X (for example, H.323 and H.324) or MBONE conferences. ASF files may be edited. ASF data is specifically designed for streaming and/or local playback.
ASF is not:
ASF files permit authors to easily define new media types. The ASF format provides sufficient flexibility to allow the definition of new media stream types that conform to the file format definition. Each stored media stream is logically independent from all others unless a relationship to another media stream has been explicitly established in the file header.
Stream-specific information about playback components (for example, decompressors and renderers) can be stored in the file header. This information enables each client implementation to retrieve the appropriate version of the required playback component if it is not already present on the client machine.
ASF is designed to express the dependency relationships between logical “bands” of scalable media types. It stores each band as a distinct media stream. Dependency information among these media streams is stored in the file header, providing sufficient information for clients to interpret scalability options (such as spatial, temporal, or quality scaling for video) in a compression-independent manner.
Modern multimedia delivery systems can dynamically adjust to changing constraints (for example, available bandwidth). Authors of multimedia content must be able to express their preferences in terms of relative stream priorities as well as a minimum set of streams to deliver. Stream prioritization is complicated by the presence of scalable media types, since it is not always possible to determine the order of stream application at authoring time. ASF allows content authors to effectively communicate their preferences, even when scalable media streams are present.
ASF is designed to support multiple languages. Media streams can optionally indicate the language of the contained media. This feature is typically used for audio or text streams. A multilingual ASF file indicates that a set of media streams contains different language versions of the same content, allowing an implementation to choose the most appropriate version for a given client.
ASF provides the capability to maintain extensive bibliographic information in a manner that is highly flexible and very extensible. All bibliographic information is stored in the file header in Unicode and is designed for multiple language support, if needed. Bibliographic fields can either be predefined (for example, author and title) or author-defined (for example, search terms). Bibliographic entries can apply to either the whole file or a single media stream.
The base unit of organization for ASF files is called the ASF Object. It consists of a 128-bit globally unique identifier (GUID) for the object, a 64-bit integer object size, and variable length object data. The value of the object size field is the sum of 24 bytes plus the size of the object data in bytes.

Figure 1. ASF Object
This unit of file organization is similar to the Resource Interchange File Format (RIFF) chunk, which is the basis for AVI and WAV files. The ASF object enhances the design of the RIFF chunk in two ways. First, there is no need for a central authority to manage the object identifier system, since any computer with a network card can generate valid, unique GUIDs (see Appendix C). Second, the object size has been chosen to be large enough to handle the very large files needed for high-bandwidth multimedia content.
All ASF objects and structures (including data unit headers) are stored in little-endian byte order (the inverse of network byte order). However, ASF files can contain media stream data in either byte order within the data unit.
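As an illustration of the object layout and byte order described above, the following Python sketch reads and writes the 24-byte ASF object header (GUID plus 64-bit size) and walks a run of consecutive objects. It assumes that GUIDs are stored in the Windows GUID byte layout (first three fields little-endian), which `uuid.UUID(bytes_le=...)` decodes. The GUID format itself is described in Appendix C. The function names are hypothetical, and any GUID passed to them in testing is an arbitrary placeholder rather than a registered ASF GUID.

```python
import struct
import uuid
from typing import Iterator, Optional, Tuple

OBJECT_HEADER_SIZE = 24  # 128-bit GUID + 64-bit object size


def read_object_header(buf: bytes, offset: int = 0) -> Tuple[uuid.UUID, int]:
    """Return (object GUID, object size in bytes) for the object at offset."""
    if len(buf) - offset < OBJECT_HEADER_SIZE:
        raise ValueError("truncated ASF object header")
    # Assumption: Windows GUID byte layout (Data1..Data3 little-endian).
    guid = uuid.UUID(bytes_le=bytes(buf[offset:offset + 16]))
    (size,) = struct.unpack_from("<Q", buf, offset + 16)  # little-endian UINT64
    if size < OBJECT_HEADER_SIZE:
        raise ValueError("object size smaller than the object header")
    return guid, size


def iter_objects(buf: bytes, offset: int = 0,
                 end: Optional[int] = None) -> Iterator[Tuple[uuid.UUID, int, bytes]]:
    """Yield (GUID, start offset, object data) for consecutive objects."""
    end = len(buf) if end is None else end
    while offset < end:
        guid, size = read_object_header(buf, offset)
        if offset + size > end:
            raise ValueError("object extends past the end of its container")
        yield guid, offset, buf[offset + OBJECT_HEADER_SIZE:offset + size]
        offset += size


def make_object(guid: uuid.UUID, data: bytes) -> bytes:
    """Serialize an object; its size field is 24 bytes plus the data length."""
    return guid.bytes_le + struct.pack("<Q", OBJECT_HEADER_SIZE + len(data)) + data
```

Because the Header Object contains other objects, the same `iter_objects` loop could be run over the Header Object's own data to enumerate what it holds. This assumes the contained objects immediately follow its 24-byte header, as its two-field structure in Section 5.1 suggests.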
ASF files are logically composed of three top-level objects: the Header Object, the Data Object, and the Index Object. The Header Object is mandatory and must be placed at the very beginning of every ASF file. The Data Object is also mandatory, and should normally follow the Header Object. The Index Object is optional, but it is strongly recommended that it be used.
Implementations will support files containing out-of-order objects, but in certain cases the resulting ASF files will not be usable from certain sources such as HTTP servers. Also, additional top-level objects may be defined by implementations and inserted into ASF files. It is recommended that they follow the Index Object (in object placement order).
A requirement of ASF is that the Header Object must have been received for the contents of the Data Object to be interpreted. ASF does not address how this information arrives at the client. Rather, “arrival mechanisms” are deemed to be a “local implementation issue,” which is explicitly out of the scope of the file specification. It is similarly a local implementation issue whether or not the Header Object is transferred “in band” or “out of band” (vis-a-vis the Data Object’s data units) or whether the Header Object is sent once or is repeatedly sent. Implementations may choose to meet this order requirement (in other words, the Header Object must arrive before ASF data units can be interpreted) in many possible ways including: (A) include the Header Object information as part of the “session announcement”; (B) send the Header Object in a different “channel” (for example, link) than the data object; (C) send the Header Object immediately before the ASF data units; and so on.

Figure 2. High-level ASF File Structure
Of the three top-level ASF objects, the Header Object is the only one that contains other ASF objects. The Header Object may include many objects, including the following:
The role of the Header Object is to provide a well-known byte sequence at the beginning of ASF files (its GUID) and to contain all other header information. This information provides global information about the file as a whole as well as specific information about the multimedia data stored within the Data Object.
ASF Data Object
The Data Object contains all the multimedia data of an ASF file. This data is stored in the form of ASF data units. Each ASF Data Unit is of variable length, and contains data for only one media stream. Data units are sorted within the Data Object based on the time at which they should be delivered (send time). This sorting results in an interleaved data format.
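The send-time ordering can be pictured as a k-way merge of per-stream queues. This Python sketch uses a hypothetical in-memory `DataUnit` tuple, which is not the on-disk data unit format of Section 6.1. It also assumes that each stream's units are already sorted by send time.

```python
import heapq
from typing import Dict, List, NamedTuple


class DataUnit(NamedTuple):
    """Hypothetical in-memory stand-in for an ASF data unit."""
    send_time_ms: int
    stream_number: int
    payload: bytes


def interleave(per_stream: Dict[int, List[DataUnit]]) -> List[DataUnit]:
    """Merge per-stream lists (each sorted by send time) into send-time order."""
    return list(heapq.merge(*per_stream.values(), key=lambda du: du.send_time_ms))
```

For example, audio units sent at 0 and 100 ms merged with video units sent at 40 and 80 ms come out in the order 0, 40, 80, 100. The stream numbers in that output alternate as audio, video, video, audio.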
The Index Object contains a time-based index into the multimedia data of an ASF file. The time interval that each index entry represents is set at authoring time and stored in the Index Object. Since it is not required to index into every media stream in a file, a list of the media streams that are indexed follows the time interval value.
Each index entry consists of one data unit offset per media stream being indexed. This information allows stream-specific index operations to occur.
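The following Python sketch shows a time-based lookup under several assumptions:

- The index is modeled as a list of entries, each holding one offset per indexed stream, in the order of the indexed-stream list.
- The entry interval and the query time are both in milliseconds.
- Entry k covers the time slot starting at k times the interval.

The exact Index Object layout and the value used for invalid offsets are defined in Section 7, so `invalid_offset` is left as a caller-supplied parameter. The function name is hypothetical.

```python
from typing import Optional, Sequence


def lookup_offset(entries: Sequence[Sequence[int]], indexed_streams: Sequence[int],
                  interval_ms: int, stream_number: int, time_ms: int,
                  invalid_offset: Optional[int] = None) -> Optional[int]:
    """Return the offset stored for stream_number in the entry covering time_ms.

    entries[k][i] is the offset for indexed_streams[i] in the k-th time slot.
    Returns None for an empty index or an entry equal to invalid_offset.
    """
    if not entries or interval_ms <= 0 or time_ms < 0:
        return None
    column = list(indexed_streams).index(stream_number)  # ValueError if not indexed
    row = min(time_ms // interval_ms, len(entries) - 1)  # clamp past the last entry
    offset = entries[row][column]
    return None if offset == invalid_offset else offset
```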
A minimal ASF implementation consists of a Header Object containing only a File Properties Object, one Stream Properties object, and one Language List Object, as well as a Data Object containing only a single ASF data unit.
Additional Considerations
Time Units
All time fields in ASF objects and ASF data units use the same timeline, which begins at time zero. Send Times (see Section 4.2) are expressed with millisecond granularity. Presentation Times (see Section 4.2) are expressed in Rational Time units. Other timecode systems (such as SMPTE) are supported through the use of a timecode media stream that binds alternate timecode values to each data unit (see Section 8.4). This stream binding is achieved using the Inter-Media Dependency Object. This allows authoring and editing tools to keep alternate timestamps while permitting client/server implementations to ignore them. In all cases, all time references are to the same timeline.
ASF Data Units all contain a millisecond timestamp, called the data unit’s send time. This is the time on the ASF timeline at which the data unit should be delivered to the client. A media stream can optionally store a fixed delta between send time and presentation time in its Stream Properties Object. If it does, every data unit of that stream is presented exactly that amount of time after being sent. If this delta is zero, the send time is equivalent to the presentation time. Otherwise, each data unit stores its presentation time in the data unit itself, either as a delta from the send time or as an explicit presentation timestamp. Data unit-specific presentation times give authoring tools more flexibility to reduce a stream’s maximum bandwidth requirement by sending data before it is needed.
Unlike Send Time, Presentation Time is specified in Rational Time units, which permit finer time granularity than millisecond units. The numerator and denominator from which each media stream’s Rational Time unit is computed are established in that media stream’s Stream Properties Object.
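To make the rational-unit arithmetic concrete, here is a Python sketch. It derives a data unit's presentation time from its send time for each Presentation Time Flags setting of the Stream Properties Object (Section 5.3). The sketch assumes that:

- One Rational Time Unit lasts Numerator/Denominator seconds, so the defaults of 1 and 1000 give milliseconds.
- A stream whose flags are 00 is presented at its send time.
- Deltas are non-negative.

The function and parameter names are hypothetical. `fractions.Fraction` keeps the result exact even for clocks such as 90 kHz.

```python
from fractions import Fraction
from typing import Optional

# Presentation Time Flags values from the Stream Properties Object (Section 5.3).
NOT_USED, FIXED_DELTA, DELTA_IN_DATA_UNIT, FULL_TIME = 0b00, 0b01, 0b10, 0b11


def presentation_time_s(send_time_ms: int, pt_flags: int,
                        numerator: int = 1, denominator: int = 1000,
                        spo_delta: int = 0,
                        du_value: Optional[int] = None) -> Fraction:
    """Presentation time in seconds, as an exact fraction.

    One Rational Time Unit lasts numerator/denominator seconds; send times are ms.
    """
    unit = Fraction(numerator, denominator)   # seconds per Rational Time Unit
    send = Fraction(send_time_ms, 1000)       # send time in seconds
    if pt_flags == NOT_USED:
        return send                           # assumed: presented at its send time
    if pt_flags == FIXED_DELTA:
        return send + spo_delta * unit        # SPO Presentation Time Delta
    if du_value is None:
        raise ValueError("data unit Presentation Time field required")
    if pt_flags == DELTA_IN_DATA_UNIT:
        return send + du_value * unit         # 16-bit delta carried in the data unit
    if pt_flags == FULL_TIME:
        return du_value * unit                # 32-bit absolute presentation time
    raise ValueError("invalid Presentation Time Flags value")
```

For example, with a Numerator of 1 and a Denominator of 90000, a data unit sent at 500 ms that carries a delta of 9000 units is presented at 3/5 s (600 ms).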
Information about each scalable media source (for example, audio or video) is stored in a Scalable Object in the header. If multiple types of scalable media are present in one ASF file, the header will contain multiple Scalable Objects.
Each Scalable Object contains the dependency information for all media streams that comprise bands of the same media source. The Scalable Object also includes an author-specified default sequence in which the media stream bands should be applied. This information is useful if a client is unable or unwilling to resolve the user’s scalability preferences. The sequence also specifies the enhancement type of each media stream band. For scalable video, there are three common enhancement types: spatial (increasing frame size), temporal (increasing frame rate), and quality (increasing image quality without resizing). Similarly, scalable audio has three common enhancement types: number of channels (for example, stereo), frequency response, and quality. Additional user-defined enhancement types may also be defined.
One of ASF’s design goals is to be independent of any particular multimedia composition system. The ASF format provides no information concerning three-dimensional positions of streams or relative positioning between streams. Instead, the Stream Group Object provides a general mechanism to group logically related media streams. Implementations then determine how to render these streams (for example, the relative positioning of the grouped streams, stream mixing, Z-ordering, and all other compositional issues) by a mechanism that is outside the scope of this file specification. This determination may be based on “out-of-band” techniques such as end-user input, the client environment itself, or information contained within the media streams themselves (for example, MPEG-4 or streaming Dynamic HTML content).
It is anticipated that several different composition approaches can coexist and leverage the same piece of ASF content. An example is a TV scenario in which two video streams are grouped separately. One contains a large image of the anchorperson against a backdrop, and the other contains smaller footage of a news story. While the size of each rendering site could be calculated based on the natural size of each video stream in the group, the fact that the news story should be overlaid on the top right corner of the anchorperson video cannot be determined without external composition information.
This section defines the various objects that comprise the ASF Header Object.
Mandatory: Yes
Quantity: 1 only
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
Notes:
The Header Object is a container that can hold any combination of the following standard objects. Only the File Properties Object and the Stream Properties Object are required to be present. In addition, (non-standard) header objects that conform to the ASF Object Structure (see Section 3.1) may also be optionally defined and used as extension mechanisms for local implementations. Unlike the standard header objects defined below, there is no guarantee that the non-standard objects will be interpretable across vendor implementations. Implementations should ignore any non-standard object that they do not understand.
Mandatory: Yes
Quantity: 1 only
This object defines the global characteristics of the combined media streams found within the Data Object.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| File ID | GUID | 128 |
| Creation Date | FILETIME | 64 |
| Content Expiration Date | FILETIME | 64 |
| Last Send Time | UINT | 64 |
| Play Duration | UINT | 64 |
| Flags | UINT | 32 |
|   Live Flag | | 1 (LSB) |
|   Huge Data Units Flag | | 1 |
|   Reserved | | 30 |
| Minimum Bit Rate | UINT | 32 |
| Maximum Bit Rate | UINT | 32 |
| Average Data Unit Size | UINT | 32 |
| Maximum Data Unit Size | UINT | 32 |
| Total Data Units | UINT | 32 |
| Stream Count | UINT | 16 |
Notes:
The Object ID field is the GUID for the File Properties Object (see Appendix A). The Object Size field is the size (in bytes) of the File Properties Object.
The value of the File ID field should be regenerated every time the file is edited. It provides a unique identification for this ASF file.
The Creation Date contains the date and time of the initial creation of the file.
Content Expiration Date indicates the date after which the author doesn’t want the file to be used. A value of zero means “never”.
Both the Last Send Time (formerly known as Send Duration) and the Play Duration fields have millisecond granularity. Both fields are invalid if the Live Flag bit is set. Last Send Time is the send time of the last data unit within the file. Play Duration is the maximum End Time (of any of the SPOs) minus the minimum Start Time (of any of the SPOs).
The following are the meanings of the Flags:
Minimum Bit Rate is in bits per second and indicates the sum of the average bandwidths of all the mandatory streams.
Maximum Bit Rate is in bits per second and indicates the sum of the maximum bandwidths of all the non-excluded streams.
The Average Data Unit Size is in bytes. This field is invalid if the Live Flag is set.
The Maximum Data Unit Size is in bytes and indicates the size of the largest ASF Data Unit within the Data Object. This field is invalid if the Live Flag is set.
The Total Data Units field contains the number of ASF Data Unit entries that exist within the Data Object. This field is invalid if the Live Flag is set.
Stream Count field indicates the number of Stream Properties Objects (SPOs) that exist in this file. Each media stream is required to have its own SPO.
Invalid fields should have a value of zero for writing and should be ignored when reading.
Quantity: 1 per media stream
This object defines the specific properties and characteristics of a media stream. It defines how a multimedia stream within the Data Object is to be interpreted as well as the specific format (of elements) of the ASF Data Unit itself (see Section 6.1) for that media stream. One instance of this object is required for each media stream in the file, including each of the separate streams formed by a scalable media type.
Unlike most other ASF objects, the Stream Properties Object (SPO) is a "container object": it can optionally include additional ASF Objects (see Section 3.1) within itself in a manner similar to the Header Object. The size of these objects is included within the Object Size field, and contained objects, if any, are appended after the Type-Specific Data field within the object structure below. This provision greatly enhances the scalability and extensibility of ASF, since it permits the rapid introduction of innovations and support for technology evolution. Currently, only one ASF Object targeted to be optionally contained within the SPO has been defined within this specification: the Data Unit Extension Object (see Section 5.3.1). Other ASF objects (for example, alternative approaches to scalable media, a QoS (RSVP) information object, extra RTP information, or MPEG-4 enhancements) may subsequently be defined and included within the SPO as needed. In this way the SPO can be enhanced over time to embrace new technologies and innovations.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Stream Type | GUID | 128 |
| Start Time | UINT | 64 |
| End Time | UINT | 64 |
| Average Bit Rate | UINT | 32 |
| Maximum Bit Rate | UINT | 32 |
| Average Data Unit Size | UINT | 32 |
| Maximum Data Unit Size | UINT | 32 |
| Preroll | UINT | 32 |
| Flags | UINT | 32 |
|   Reliable Flag | | 1 (LSB) |
|   Recordable Flag | | 1 |
|   Seekable Flag | | 1 |
|   Presentation Time Flags | | 2 |
|   Reserved | | 27 |
| Presentation Time Delta | UINT | 0 or 32 |
| Presentation Time Numerator | UINT | 0 or 32 |
| Presentation Time Denominator | UINT | 0 or 32 |
| Stream Number | UINT | 16 |
| Stream Language ID Index | UINT | 16 |
| Stream Name Count | UINT | 16 |
| Stream Names | See below | ? |
| MIME Type Length | UINT | 8 |
| MIME Type | ASCII (UINT8) | ? |
| Type-Specific Data Length | UINT | 16 |
| Type-Specific Data | UINT8 | ? |
Stream Name:
| Field Name | Field Type | Size (bits) |
| Language ID Index | UINT | 16 |
| Stream Name Length | UINT | 16 |
| Stream Name | Unicode (UINT16) | ? |
Notes:
The Object ID field is the GUID for the Stream Properties Object (see Appendix A). The Object Size field is the size (in bytes) of this Stream Properties Object instance (including the sizes of all contained objects).
Start Time and End Time are presentation times with millisecond granularity. Both fields are invalid if the Live Flag of the File Properties Object is set. The Start Time is the presentation time of the first object. The End Time is the presentation time of the last object plus its duration of play. Both times are relative to the ASF file’s timeline, so these fields indicate where this media stream is located within the timeline of the file as a whole.
Invalid fields should have a value of 0 (zero) for writing and should be ignored when reading.
The Average Bit Rate and the Maximum Bit Rate are in bits per second. Both fields refer solely to this media stream’s bit rates. The Maximum Bit Rate is computed by identifying the maximum rate in any one-second period, and the bit rate for this stream must never exceed this value. This may be thought of as running a one-second “sliding window” over this media stream’s contents and noting the one-second interval in which the greatest number of bits occurred. This value must be non-zero. The Average Bit Rate is the approximation obtained by dividing the total bits sent within this media stream by the time (in seconds) during which those bits are being sent (that is, one plus the send time of the last Data Unit of that stream minus the send time of its first Data Unit).
The Average Data Unit Size and the Maximum Data Unit Size are in bytes and refer to this media stream’s ASF Data Units within the Data Object. The Average Data Unit Size is computed by dividing the total size of all of the ASF Data Units of that stream by the number of ASF Data Units of that stream. The Maximum Data Unit Size is the size in bytes of the largest ASF Data Unit for this media stream. A value of zero means “unknown”. These values help the server make network fragmentation and packetization decisions.
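The statistics described above can be computed from a list of (send time, size) pairs for one stream. In this Python sketch, the maximum bit rate is the largest number of bits whose send times fall in any half-open one-second window [t, t + 1000 ms). It is enough to test windows that start at a data unit's send time, because the busiest window can always be slid forward to such a point without losing any units. The averaging interval reads the "one plus" in the Average Bit Rate definition as one millisecond, which is an assumption. The names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

WINDOW_MS = 1000  # one-second sliding window


def stream_stats(units: List[Tuple[int, int]]) -> Dict[str, float]:
    """units: (send_time_ms, size_bytes) for every ASF Data Unit of one stream."""
    if not units:
        return {"avg_bit_rate": 0, "max_bit_rate": 0,
                "avg_data_unit_size": 0, "max_data_unit_size": 0}
    units = sorted(units)
    times = [t for t, _ in units]
    prefix = [0]                        # prefix[i] = bits in units[:i]
    for _, size in units:
        prefix.append(prefix[-1] + 8 * size)
    max_bits = 0
    for i, t in enumerate(times):
        j = bisect_left(times, t + WINDOW_MS)   # first unit at or after t + 1 s
        max_bits = max(max_bits, prefix[j] - prefix[i])
    # Assumption: "one plus" in the Average Bit Rate definition is one millisecond.
    span_s = (times[-1] - times[0] + 1) / 1000
    total_bytes = sum(size for _, size in units)
    return {
        "avg_bit_rate": prefix[-1] / span_s,
        "max_bit_rate": max_bits,       # bits in the busiest 1 s window = bits/s
        "avg_data_unit_size": total_bytes / len(units),
        "max_data_unit_size": max(size for _, size in units),
    }
```

For units of 1000, 1000 and 500 bytes sent at 0, 500 and 1200 ms, the busiest window is [0, 1000 ms). That window holds 16000 bits, giving a Maximum Bit Rate of 16000 bits per second.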
Preroll is the minimum delay, in milliseconds, that a client should use between starting a particular stream and starting the clock for the client’s timeline. It is used to compute the buffering requirements at the client in order to mitigate network jitter. Specifically, when a data unit is received whose send time value is greater than the preroll value for that stream, the client’s timeline clock is started. Rendering is then governed by each Data Unit’s presentation time on that (the client’s) timeline.
The default preroll value is zero.
The following is the significance of the various flags in the Flags field:
| Value | Meaning | Explanation |
| 00 | Not Used | The Presentation Time field is not used within the ASF Data Unit (see Section 6.1) for this media stream. The Presentation Time Delta, Presentation Time Numerator, and Presentation Time Denominator fields are also not used within this object. |
| 01 | Fixed Delta | The Presentation Time field is not used within the ASF Data Unit (see Section 6.1) for this media stream. However, the presentation time is known to be a fixed delta (in Rational Units) from the send time. This delta is established by the Presentation Time Delta field within this object (this is the only case in which the Presentation Time Delta field is used within this object). |
| 10 | Delta in Data Units | A 16-bit Presentation Time field (in Rational Units) is used within the ASF Data Unit (see Section 6.1) for this media stream. That field gives the presentation time as a delta from the send time. The Presentation Time Delta field is not used within this object. |
| 11 | Full Data Unit Presentation Time | A 32-bit Presentation Time field (in Rational Units) is used within the ASF Data Unit (see Section 6.1) for this media stream. That field gives the actual presentation time for that data unit. The Presentation Time Delta field is not used within this object. |
The Presentation Time Delta, Presentation Time Numerator, and Presentation Time Denominator fields do not exist if the Presentation Time Flags have a zero value. The Presentation Time Delta field also does not exist if the Presentation Time Flags have 10 or 11 values (in other words, it only exists if the flags have an 01 value; see above). Otherwise these fields are 32 bits long.
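Because the Presentation Time fields are optional, a reader has to inspect the Flags field before it knows where the Stream Number field starts. This Python sketch makes three assumptions:

- The flag bits are packed from the LSB in table order: Reliable is bit 0, Recordable bit 1, Seekable bit 2, and the Presentation Time Flags occupy bits 3 and 4.
- The table's binary values, such as 10, are read as the two-bit integer (here 2).
- Absent Numerator and Denominator fields take the default values 1 and 1000.

It returns the decoded values together with the offset of the Stream Number field. All names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional, Tuple


class SpoFlags(NamedTuple):
    reliable: bool
    recordable: bool
    seekable: bool
    pt_flags: int               # 0b00, 0b01, 0b10 or 0b11
    pt_delta: Optional[int]     # present only when pt_flags == 0b01
    pt_numerator: int
    pt_denominator: int


def parse_spo_flags(buf: bytes, offset: int) -> Tuple[SpoFlags, int]:
    """Decode Flags plus the optional Presentation Time fields at offset.

    Returns the decoded values and the offset of the Stream Number field.
    """
    (flags,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    pt_flags = (flags >> 3) & 0b11   # assumed bit position: bits 3 and 4
    delta = None
    numerator, denominator = 1, 1000  # defaults when the fields are absent
    if pt_flags == 0b01:              # Fixed Delta: delta field is present
        (delta,) = struct.unpack_from("<I", buf, offset)
        offset += 4
    if pt_flags != 0b00:              # Numerator and Denominator are present
        numerator, denominator = struct.unpack_from("<II", buf, offset)
        offset += 8
    return SpoFlags(bool(flags & 0x1), bool(flags & 0x2), bool(flags & 0x4),
                    pt_flags, delta, numerator, denominator), offset
```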
Presentation Time Delta is in Rational Time Units. It indicates that a fixed time delta (in Rational Units) between the presentation time and the send time should be applied to the entirety of this stream’s data units (see the ASF Data Unit definition in Section 6.1). The Presentation Time flags determine whether or not this field is used.
Rational Time Units signify a media-stream specific time unit within the ASF file’s intrinsic timeline. Rational Time Units are for Presentation Times only. They are determined by dividing the Presentation Time Numerator by the Presentation Time Denominator. The default Presentation Time Numerator value is 1 and the default Presentation Time Denominator value is 1000. Therefore, the default Rational Time Units are in milliseconds.
The Stream Number identifies which media stream (through the ASF Data Unit’s Stream Number field) is defined by a given Stream Properties Object instance. Zero is an invalid stream number, because other header objects use stream number zero to refer to the file as a whole rather than to a specific media stream within the file.
The Stream Language ID Index field refers to the contents of the stream itself (in other words, the language, if any, which the stream uses/assumes).
Please see the Language List Object (Section 5.16) for the details concerning how the Stream Language ID Index and Language ID Index fields should be used.
The Stream Name Count field indicates how many Stream Names are present. Each Stream Name instance is potentially a localization into a specific language. The Language ID Index field indicates the language in which that (Unicode) Stream Name is written.
The Stream Name Length field indicates the number of Unicode “characters” that are found within the Stream Name field. The MIME Type Length field indicates the number of bytes found within the MIME Type field.
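The difference between character counts and byte counts is easy to get wrong. This Python sketch parses the fields from Stream Name Count through Type-Specific Data. It assumes each Unicode character is a little-endian 16-bit unit, so a Stream Name occupies twice its Stream Name Length in bytes. It also assumes the MIME type is plain ASCII. The function names are hypothetical.

```python
import struct
from typing import List, Tuple


def _take(buf: bytes, offset: int, n: int) -> bytes:
    """Return n bytes at offset, refusing to read past the end of buf."""
    if offset + n > len(buf):
        raise ValueError("field extends past end of buffer")
    return buf[offset:offset + n]


def parse_names_and_mime(buf: bytes, offset: int
                         ) -> Tuple[List[Tuple[int, str]], str, bytes, int]:
    """Return ([(language_id_index, name), ...], MIME type, type-specific data, end offset)."""
    (count,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    names: List[Tuple[int, str]] = []
    for _ in range(count):
        lang_index, n_chars = struct.unpack_from("<HH", buf, offset)
        offset += 4
        # Stream Name Length counts 16-bit characters, not bytes.
        names.append((lang_index, _take(buf, offset, 2 * n_chars).decode("utf-16-le")))
        offset += 2 * n_chars
    (mime_len,) = struct.unpack_from("<B", buf, offset)  # MIME Type Length counts bytes
    offset += 1
    mime = _take(buf, offset, mime_len).decode("ascii")
    offset += mime_len
    (tsd_len,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    tsd = _take(buf, offset, tsd_len)
    offset += tsd_len
    return names, mime, tsd, offset
```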
The Stream Name, MIME Type, and Stream Type are each mechanisms to identify the Media Stream (in Unicode, MIME type, and GUID, respectively).
The structure of the Type-Specific Data field varies by media type. Its structure for the Standard ASF Media Types is detailed in Section 8.
5.3.1 Data Unit Extension Object
Mandatory: No
Quantity: 0 - n
The Data Unit Extension Object is an optional provision to include application (or implementation)-specific data within each ASF Data Unit (see Section 6.1) instance of this media stream.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Extension System | GUID | 128 |
| Data Unit Extension Size | UINT | 16 |
| Extension System Info Size | UINT | 32 |
| Extension System Info | UINT8 | ? |
Notes:
The Extension System field is a GUID that identifies the type of information stored within the Extension Data field of the ASF Data Unit (see Section 6.1).
The Data Unit Extension Size field indicates the number of bytes of extension information that are present within the Extension Data field of the ASF Data Unit (see Section 6.1) for this media stream. If the Data Unit Extension Size field has a value of 0xFFFF (65535 decimal), then the Extension Data field is variable length and the first byte of the Extension Data field gives the length of the (following) extension data for that particular ASF Data Unit instance. For example, if the first byte of a variable sized entry has the value of “2,” then two additional extension data bytes will be present in that instance of the Extension Data field.
The number, order, and size of the data elements within the ASF Data Unit's Extension Data field directly correspond to the order in which the Data Unit Extension Objects occur within the Stream Properties Object (SPO) for this media stream. For example, assume that three Data Unit Extension Objects are included within a stream's SPO: the first specifies a fixed length of 4 bytes, the second a variable length, and the third a fixed length of 2 bytes. The Extension Data field of each ASF Data Unit for this stream then consists of 4 bytes (extension #1), followed by 1 length byte plus up to 255 data bytes (extension #2), and finally 2 bytes (extension #3).
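The layout rule above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the caller has already collected the Data Unit Extension Size values from the stream's Data Unit Extension Objects, in the order in which they appear in the SPO.

```python
from typing import List

VARIABLE_LENGTH = 0xFFFF  # Data Unit Extension Size value meaning "variable length"


def split_extension_data(ext_data: bytes, sizes: List[int]) -> List[bytes]:
    """Split one ASF Data Unit's Extension Data field into per-extension payloads.

    `sizes` holds the Data Unit Extension Size values in the order in which the
    Data Unit Extension Objects appear in the stream's Stream Properties Object.
    """
    payloads = []
    pos = 0
    for size in sizes:
        if size == VARIABLE_LENGTH:
            # Variable-length entry: the first byte gives the length of the data that follows.
            if pos >= len(ext_data):
                raise ValueError("Extension Data ends before a length byte")
            size = ext_data[pos]
            pos += 1
        if pos + size > len(ext_data):
            raise ValueError("Extension Data is shorter than its declared sizes")
        payloads.append(ext_data[pos:pos + size])
        pos += size
    return payloads
```

Applied to the three-extension example (sizes 4, 0xFFFF, 2), the length byte of the variable entry is consumed and only the data bytes that follow it are returned for extension #2.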
The Extension System Info field is optional and provides additional definitions or parameters (if any) for the Extension System. Its length is given by the Extension System Info Size field.
5.4 Content Description Object
Mandatory: No
Quantity: 0 or 1
This object permits authors to record human-readable, pertinent data about the file and its contents. This content is readily expandable to satisfy varying bibliographic needs. Authors can supplement (or ignore) the “standard” bibliographic information (for example, title, author, copyright, and description) with content designations of their own choosing. Each individual field name and value can be stored in as many different languages as are preferred by the author, and can be stream-specific or pertinent to the whole file.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Description Record Count | UINT | 16 |
| Description Records | See below | ? |
Description Record:
| Field Name | Field Type | Size (bits) |
| Field Type | UINT | 8 |
| Language ID Index | UINT (see Section 5.16) | 16 |
| Stream Number | UINT | 16 |
| Name Length | UINT | 16 |
| Value Length | UINT | 16 |
| Name | Unicode (UINT16) | ? |
| Value | Unicode (UINT16) | ? |
Notes:
The Object ID field contains the GUID for the Content Description Object (see Appendix A). The Object Size is the length in bytes of this object.
Description Record Count indicates the number of Description Records.
The Field Type field contains an unsigned integer code identifying the kind of information stored in the Description Record.
Please consult references [5], [6], and [7] for the meanings of these field types.
The values of the Field Type field are:
| 1 = Author | 2 = Title | 3 = Copyright | 4 = Description | 5 = Tool Name |
| 6 = Tool Version | 7 = Tool GUID | 8 = Date of Last Modification | 9 = Original Date Created | 10 = ISRC |
| 11 = ISWC | 12 = UPC/EAN | 13 = LCCN (10) | 14 = ISBN (20) | 15 = ISSN (22) |
| 16 = Cataloging Source, Leader (40) | 17 = Main Entry – Personal Name (100) | 18 = Main Entry – Corporate Name (110) | 19 = Edition Statement (250) | 20 = Main Uniform Title (130) |
| 21 = Uniform Title (240) | 22 = Title Statement (245) | 23 = Varying Form Title (246) | 24 = Publication, Distribution, and so on (260) | 25 = Physical Description (300) |
| 26 = Added Entry Title (440) | 27 = Series Statement (490) | 28 = General Note (500) | 29 = Bibliography Note (504) | 30 = Contents Note (505) |
| 31 = Creation Credit (508) | 32 = Citation (510) | 33 = Participant (511) | 34 = Summary (520) | 35 = Target Audience (521) |
| 36 = Added Form Available (530) | 37 = System Details (538) | 38 = Awards (586) | 39 = Added Entry Personal Name (600) | 40 = Added Entry Topical Term (650) |
| 41 = Added Entry Geographic (651) | 42 = Index Term, Genre (655) | 43 = Tag Index Term, Curriculum (658) | 44 = Added Entry Uniform Title (730) | 45 = Added Entry Related (740) |
| 46 = Series Statement Personal Name (800) | 47 = Series Statement Uniform Title (830) | 48 = Electronic Location and Access (856) | 49 = Added Entry – Personal Name (700) | 50 = Coverage |
| 51 = Date | 52 = Resource Type | 53 = Format | 54 = Resource Identifier | 55 = Source |
| 56 = Language | 57 = Relation | 58 = Coverage | 59 = Subject | 60 = Contributor |
| 61 = CNAME | 62 = NAME | 63 = EMAIL | 64 = PHONE | 65 = LOC |
| 66 = TOOL | 67 = NOTE | 68 = PRIV | 69 = APP | 70 = SSRC |
| 71 = Initial RTP Timestamp value | 72 = Initial RTP Sequence Number | 73 = RTP Version Number |
Values between 74 and 99 (inclusive) are reserved. Values >= 100 are user-defined.
The Stream Number indicates whether the entry applies to a specific media stream or whether it applies to the whole file. A value of zero in this field indicates that it applies to the whole file; otherwise, the entry applies only to the indicated stream number.
Name is in Unicode. This field may be empty if the Field Type value is less than 100 (a predefined type), unless the author explicitly wants to localize the name of the field type.
The Name Length field indicates the number of Unicode “characters” that are found within Name field. The Value Length field indicates the number of Unicode “characters” that are found within Value field.
As a space optimization, a 16-bit Language ID Index field has been used. See the Language List Object (Section 5.16) for more details.
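To make the record layout concrete, here is a Python sketch that reads one Description Record. The names are hypothetical, and it assumes little-endian integers packed without padding and UTF-16LE text (the byte order is not stated in this section; see Appendix B), with Name Length and Value Length counted in 16-bit characters as described above.

```python
import struct

# Fixed-size part of a Description Record: Field Type (8 bits), then
# Language ID Index, Stream Number, Name Length, Value Length (16 bits each).
RECORD_HEADER = struct.Struct("<BHHHH")


def _read_utf16(buf, pos, char_count):
    """Read `char_count` 16-bit Unicode characters starting at `pos`."""
    end = pos + 2 * char_count
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[pos:end].decode("utf-16-le"), end


def parse_description_record(buf, offset=0):
    """Parse one Description Record at `offset`; returns (record, next_offset)."""
    field_type, lang_index, stream_number, name_len, value_len = \
        RECORD_HEADER.unpack_from(buf, offset)
    name, pos = _read_utf16(buf, offset + RECORD_HEADER.size, name_len)
    value, pos = _read_utf16(buf, pos, value_len)
    return {
        "field_type": field_type,          # e.g. 2 = Title
        "language_id_index": lang_index,   # see Section 5.16
        "stream_number": stream_number,    # 0 = applies to the whole file
        "name": name,                      # may be empty for predefined types
        "value": value,
    }, pos
```

Returning the next offset lets a caller loop Description Record Count times over the records that follow the count field.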
5.5 Script Command Object
Mandatory: No
Quantity: 0 or 1
This object provides a list of Type/Parameter pairs of Unicode strings that are synchronized to the ASF file’s timeline. Types can include “URL” or “FILENAME”; the semantics and use of these types are identical to those of the Command Media Type (see Section 8.7). Other Type values may also be freely defined and used; the semantics and treatment of these additional Types are defined by local implementations. The Parameter value (referred to as “Commands” below) is specific to the Type. This Type/Parameter pairing can be used for many purposes, including sending URLs to be “launched” by a client into an HTML frame (the “URL” type) or launching another ASF file for chained “continuous play” audio or video presentations (the “FILENAME” type). This object can also be used as an alternative method of streaming text (in addition to the Text Media Type) and to provide “script commands” that control elements within the client environment.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Type Count | UINT | 16 |
| Command Count | UINT | 16 |
| Types | See below | ? |
| Commands | See below | ? |
Type:
| Field Name | Field Type | Size (bits) |
| Type Name Length | UINT | 16 |
| Type Name | Unicode (UINT16) | ? |
Command:
| Field Name | Field Type | Size (bits) |
| Presentation Time | UINT | 32 |
| Type Index | UINT | 16 |
| Command Name Length | UINT | 16 |
| Command Name | Unicode (UINT16) | ? |
Notes:
Presentation Time is given in milliseconds.
Types are stored as an array of Unicode strings, since they will typically be reused. Commands specify their type using a zero-based index into the array of Types.
The Type Name Length field indicates the number of Unicode “characters” that are found within the Type Name field. The Command Name Length field indicates the number of Unicode “characters” that are found within the Command Name field.
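The sketch below shows how the shared Types array and the zero-based Type Index fit together when reading this object. It is a hypothetical Python helper that takes `body`, the bytes following Object ID and Object Size, and assumes little-endian integers and UTF-16LE strings.

```python
import struct


def _read_utf16(buf, pos, char_count):
    """Read `char_count` 16-bit Unicode characters starting at `pos`."""
    end = pos + 2 * char_count
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[pos:end].decode("utf-16-le"), end


def parse_script_commands(body):
    """Return a list of (presentation_time_ms, type_name, command) tuples."""
    type_count, command_count = struct.unpack_from("<HH", body, 0)
    pos = 4
    types = []
    for _ in range(type_count):
        (name_len,) = struct.unpack_from("<H", body, pos)
        name, pos = _read_utf16(body, pos + 2, name_len)
        types.append(name)
    commands = []
    for _ in range(command_count):
        # Presentation Time (32 bits), Type Index (16), Command Name Length (16).
        time_ms, type_index, cmd_len = struct.unpack_from("<IHH", body, pos)
        command, pos = _read_utf16(body, pos + 8, cmd_len)
        # Type Index is a zero-based index into the Types array.
        commands.append((time_ms, types[type_index], command))
    return commands
```

Resolving each Type Index against the Types array once, at parse time, keeps the per-command storage small, which is the point of storing types separately.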
5.6 Marker Object
Mandatory: No
Quantity: 0 or 1
This object contains a small, specialized index which is used to provide named “jump points” within a file. This allows a content author to divide a piece of content into logical sections such as song boundaries in an entire CD or topic changes during a long presentation, and to assign a human-readable name to each section of a file. This index information is then available to the client to permit the user to “jump” directly to those points within the presentation.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Index Specifier Count | UINT | 16 |
| Marker Count | UINT | 16 |
| Index Specifiers | See Section 5.14 | ? |
| Markers | See below | ? |
Marker:
| Field Name | Field Type | Size (bits) |
| Presentation Time | UINT | 32 |
| Offsets | UINT64 | ? |
| Marker Name Count | UINT | 16 |
| Marker Names | See below | ? |
Marker Name:
| Field Name | Field Type | Size (bits) |
| Language ID Index | UINT | 16 |
| Marker Name Length | UINT | 16 |
| Marker Name | Unicode (UINT16) | ? |
Notes:
The Index Specifiers are defined within the Index Parameters Object (Section 5.14).
The Presentation Time is given in milliseconds. This 32-bit value does not wrap around, which means that markers can only refer to the first 49.7 days of content within an ASF file.
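The 49.7-day figure follows directly from the width of the 32-bit millisecond field:

$$\frac{2^{32}\ \text{ms}}{86\,400\,000\ \text{ms/day}} = \frac{4\,294\,967\,296}{86\,400\,000}\ \text{days} \approx 49.71\ \text{days}$$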
A Marker structure may contain multiple Offsets entries: there must be exactly one Offsets entry in each Marker structure for each Index Specifier entry. Thus, the total size in bits of the Marker’s Offsets field is 64 times the value of the Index Specifier Count field. An offset value of 0xFFFFFFFFFFFFFFFF signifies that the entry contains an invalid offset.
As a space optimization, a 16-bit Language ID Index field has been used. See the Language List Object (Section 5.16) for more details.
The Marker Name Length field indicates the number of Unicode “characters” which are found within Marker Name field.
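Because the size of each Marker depends on the Index Specifier Count of the enclosing object, a reader has to carry that count into the per-marker parse. The Python sketch below illustrates this; the names are hypothetical, little-endian integers and UTF-16LE names are assumed, and invalid offsets are reported as `None`.

```python
import struct

INVALID_OFFSET = 0xFFFFFFFFFFFFFFFF  # marks an Offsets entry with no valid offset


def _read_utf16(buf, pos, char_count):
    """Read `char_count` 16-bit Unicode characters starting at `pos`."""
    end = pos + 2 * char_count
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[pos:end].decode("utf-16-le"), end


def parse_marker(buf, pos, index_specifier_count):
    """Parse one Marker structure starting at `pos`; returns (marker, next_pos)."""
    (time_ms,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    # One 64-bit offset per Index Specifier entry.
    raw = struct.unpack_from("<%dQ" % index_specifier_count, buf, pos)
    pos += 8 * index_specifier_count
    offsets = [None if o == INVALID_OFFSET else o for o in raw]
    (name_count,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    names = []
    for _ in range(name_count):
        lang_index, name_len = struct.unpack_from("<HH", buf, pos)
        name, pos = _read_utf16(buf, pos + 4, name_len)
        names.append((lang_index, name))
    return {"presentation_time_ms": time_ms, "offsets": offsets, "names": names}, pos
```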
5.7 Component Download Object
Mandatory: No
Quantity: 0 or 1
This object provides a list of components (including version information) required for the proper rendering of each stream in the file. Each listed component has a human-readable name, a category identifying the component type (which is usually either “codec” or “renderer”), a component ID used to uniquely identify a specific component, and version information for that component.
This object presupposes that the Component ID will be the primary mechanism used to find the proper component to download. This object purposefully does not use URLs to find these objects, for the following reasons:
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Component Count | UINT | 32 |
| Component Records | See below | ? |
Component Record:
| Field Name | Field Type | Size (bits) |
| Category | GUID | 128 |
| Component ID | GUID | 128 |
| Version | UINT | 64 |
| Stream Number | UINT | 16 |
| Component Name Length | UINT | 16 |
| Component Name | Unicode (UINT16) | ? |
Notes:
The Component ID is a GUID; for example, GUID mappings can be used to identify ACM and VCM codecs.
The Version field stores a “dotted quad” version stamp using the highest 16 bits for the product version, the next 16 bits for the incremental version, the next 16 bits for the revision, and the lowest 16 bits for the build number. The value 0.0.0.0 should be used for the versions of ACM and VCM codecs. This value means “any version” and is needed because there are no valid versioning numbers for ACM/VCM codecs, since the “versioning information” is actually contained within the Component ID’s GUID value itself for these codec types. Other entities that do not have valid version numbers should also use 0.0.0.0 in this field.
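The dotted-quad layout of the Version field can be made concrete with a pair of helper functions (hypothetical names). This is plain integer arithmetic on the 64-bit value, so it does not depend on how the field is byte-ordered on disk.

```python
def pack_version(product, incremental, revision, build):
    """Pack a dotted-quad version into the 64-bit Version field (product in the top 16 bits)."""
    if any(not 0 <= part <= 0xFFFF for part in (product, incremental, revision, build)):
        raise ValueError("each version component must fit in 16 bits")
    return (product << 48) | (incremental << 32) | (revision << 16) | build


def unpack_version(value):
    """Split a 64-bit Version field into (product, incremental, revision, build)."""
    return ((value >> 48) & 0xFFFF, (value >> 32) & 0xFFFF,
            (value >> 16) & 0xFFFF, value & 0xFFFF)


ANY_VERSION = pack_version(0, 0, 0, 0)  # 0.0.0.0, "any version" (e.g. ACM/VCM codecs)
```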
Stream Number identifies the multimedia stream associated with this component. A 0 (zero) value means “all streams.”
The Component Name is a human-readable display name for this component.
5.8 Stream Group Object
Mandatory: No
Quantity: 0 or 1
This object provides lists of “associated” streams that are grouped into related presentation contexts. Each of these contexts contains a Group Name by which it may be referenced. This permits the client to make implementation-specific composition and rendering decisions affecting those streams. For associated image/video streams, these decisions can include the number, size, and location of image/video rendering windows, and their relative positions in three-dimensional space. For audio streams, these decisions affect the potential mixing of associated audio streams that play simultaneously (stream start and end times can be determined using the Stream Properties Object).
The following are additional examples of potential uses of this object:
The default behavior if no Stream Group Object is present within the File Header (and therefore no stream groups are defined) is to assume that all streams are grouped together.
Object Structure:
List of stream groupings, each of which contains a list of stream numbers for that grouping. Each stream grouping is optionally assigned a Group Name that can serve as a “handle” by which the group as a whole may be referenced. This name may be localized into different languages.
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Stream Group Count | UINT | 16 |
| Stream Groups | See below | ? |
Stream Group:
| Field Name | Field Type | Size (bits) |
| Group Name Count | UINT | 16 |
| Group Names | See below | ? |
| Stream Count | UINT | 16 |
| Stream Numbers | UINT16 | ? |
Group Name:
| Field Name | Field Type | Size (bits) |
| Language ID Index | UINT | 16 |
| Group Name Length | UINT | 16 |
| Group Name | Unicode (UINT16) | ? |
Notes:
See the Language List Object (Section 5.16) for more details concerning how to use the Language ID Index field.
Media streams are assigned to a named logical group by enumerating their stream numbers in the Stream Numbers field. The Stream Count field indicates how many media streams are enumerated within the Stream Numbers field.
The Group Name Length field indicates the number of Unicode “characters” that are found within Group Name field.
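The nesting of Group Names inside each Stream Group can be walked as in the following Python sketch. The function name is hypothetical; it takes `body`, the bytes after Object ID and Object Size, and assumes little-endian integers and UTF-16LE names.

```python
import struct


def _read_utf16(buf, pos, char_count):
    """Read `char_count` 16-bit Unicode characters starting at `pos`."""
    end = pos + 2 * char_count
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[pos:end].decode("utf-16-le"), end


def parse_stream_groups(body):
    """Return a list of (group_names, stream_numbers) pairs.

    group_names is a list of (language_id_index, name) localizations.
    """
    (group_count,) = struct.unpack_from("<H", body, 0)
    pos = 2
    groups = []
    for _ in range(group_count):
        (name_count,) = struct.unpack_from("<H", body, pos)
        pos += 2
        names = []
        for _ in range(name_count):
            lang_index, name_len = struct.unpack_from("<HH", body, pos)
            name, pos = _read_utf16(body, pos + 4, name_len)
            names.append((lang_index, name))
        (stream_count,) = struct.unpack_from("<H", body, pos)
        pos += 2
        streams = list(struct.unpack_from("<%dH" % stream_count, body, pos))
        pos += 2 * stream_count
        groups.append((names, streams))
    return groups
```

If the object is absent, a reader following the default described above would simply treat every stream as a member of one group.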
5.9 Scalable Object
Mandatory: No
Quantity: 0 - n
This object stores the dependency relationships between all of the media streams that comprise logical bands of the same scalable media. It can be used for scalable audio and video, as well as other types of scalable streams. Along with the dependency relationships among the streams, this object stores a default sequence in which the streams should be used when implementations are doing dynamic bandwidth scaling.
Object Structure:
The object consists of a list of Dependency Info “structures” for each stream that comprises a logical band of the same scalable stream.
A Dependency Info “structure” (in other words, the Dependency Record) contains:
The object also contains an author-determined default sequence (in other words, the Default Sequence Record) that indicates the preferential order in which the streams should be used (in other words, items listed first should, by default, be used first). Each entry in this list consists of the following two fields:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Record Count | UINT | 16 |
| Default Sequence Records | See below | ? |
| Dependency Records | See below | ? |
Default Sequence Record:
| Field Name | Field Type | Size (bits) |
| Stream Number | UINT | 16 |
| Enhancement Type | GUID | 128 |
Dependency Record:
| Field Name | Field Type | Size (bits) |
| Stream Number | UINT | 16 |
| Dependent Stream Count | UINT | 16 |
| Dependent Stream Numbers | UINT16 | ? |
The Record Count field stores both the number of Default Sequence Records and the number of Dependency Records (in other words, the same number of each). This number is equal to the number of streams involved in this scalability relationship.
Possible Enhancement Type GUID values are None, Unknown, Temporal, Spatial, Quality, Stereo (Audio), and Frequency Response (Audio).
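Because a single Record Count sizes both arrays, a reader consumes all Default Sequence Records first and then the same number of Dependency Records. The Python sketch below illustrates this with hypothetical names; it assumes little-endian integers and the Windows GUID byte layout (`uuid.UUID(bytes_le=...)`), and it does not resolve Enhancement Type GUIDs to names, since their values are listed in Appendix A rather than here.

```python
import struct
import uuid


def parse_scalable(body):
    """Parse the Scalable Object fields that follow Object ID and Object Size.

    Returns (default_sequence, dependencies):
      default_sequence: [(stream_number, enhancement_type_uuid), ...] in preferred order
      dependencies:     [(stream_number, [stream numbers it depends on]), ...]
    """
    (count,) = struct.unpack_from("<H", body, 0)
    pos = 2
    default_sequence = []
    for _ in range(count):
        (stream,) = struct.unpack_from("<H", body, pos)
        enhancement = uuid.UUID(bytes_le=bytes(body[pos + 2:pos + 18]))
        pos += 18
        default_sequence.append((stream, enhancement))
    dependencies = []
    for _ in range(count):  # the same Record Count applies to both arrays
        stream, n = struct.unpack_from("<HH", body, pos)
        depends_on = list(struct.unpack_from("<%dH" % n, body, pos + 4))
        pos += 4 + 2 * n
        dependencies.append((stream, depends_on))
    return default_sequence, dependencies
```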
5.10 Prioritization Object
Mandatory: No
Quantity: 0 or 1
This object indicates the author’s intentions as to which streams should or should not be dropped in response to varying network congestion situations. There may be special cases where this preferential order may be ignored (for example, the user hits the “mute” button). However, generally it is expected that implementations will try to honor the author’s preference.
Priority determinations are made solely with reference to base streams (that is, non-scalable streams and only the base layer of scalable streams). The author can indicate what should happen to enhancement layer streams by means of the Bandwidth Restriction field.
The priority of each stream is indicated by how early in the list that stream’s stream number is listed (in other words, the list is ordered in terms of decreasing priority). Two additional fields provide associated information:
Streams in a mutual exclusion relationship with each other (for example, alternate languages) should all be listed in adjacent order (in other words, priority n, n+1, n+2, and so on), sorted in decreasing order of maximum stream bandwidth. When bandwidth calculations are made, only the bandwidth used by the selected stream in a mutual exclusion relationship is counted; each non-selected stream in such a relationship is ignored. Combining prioritization with mutual exclusion in this way lets an author create scalable content without scalable codecs, by creating multiple distinct media stream instances of the “same content,” each at a different bandwidth.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Priority Record Count | UINT | 16 |
| Priority Records | See below | ? |
Priority Record:
| Field Name | Field Type | Size (bits) |
| Stream Number | UINT | 16 |
| Priority Flags | UINT | 16 |
| Priority Flags: Mandatory | | 1 (LSB) |
| Priority Flags: Reserved | | 15 |
| Bandwidth Restriction | UINT | 32 |
Notes:
Priority Records are listed in order of decreasing priority.
The Stream Number should only specify the base stream (if it is scalable).
Bandwidth Restriction is in bits per second. A value of 0 (zero) indicates “no restriction.”
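A Python sketch of decoding the Priority Records follows. The names are hypothetical and little-endian integers are assumed. It only decodes the fields (list position as priority, the Mandatory bit, and a zero Bandwidth Restriction reported as "no restriction"); how a client acts on the Mandatory flag is not defined in this section, so no selection policy is implied.

```python
import struct

PRIORITY_RECORD = struct.Struct("<HHI")  # Stream Number, Priority Flags, Bandwidth Restriction
MANDATORY_FLAG = 0x0001  # bit 0 (LSB) of Priority Flags; bits 1-15 are reserved


def parse_priority_records(body):
    """Parse the Prioritization Object fields that follow Object ID and Object Size.

    Records come back in file order, which is decreasing priority.
    """
    (count,) = struct.unpack_from("<H", body, 0)
    records = []
    for i in range(count):
        stream, flags, bandwidth = PRIORITY_RECORD.unpack_from(body, 2 + i * PRIORITY_RECORD.size)
        records.append({
            "priority": i,  # 0 = highest priority
            "stream_number": stream,
            "mandatory": bool(flags & MANDATORY_FLAG),
            "bandwidth_restriction_bps": bandwidth or None,  # 0 means "no restriction"
        })
    return records
```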
5.11 Mutual Exclusion Object
Mandatory: No
Quantity: 0 - n
This object identifies streams that have a mutual exclusion relationship to each other (in other words, only one of the streams within such a relationship can be streamed; the rest are ignored). There should be one instance of this object for each set of streams that share a mutual exclusion relationship. The Exclusion Type is used so that implementations can allow user selection of common choices, such as language.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Exclusion Type | GUID | 128 |
| Stream Number Count | UINT | 16 |
| Stream Numbers | UINT16 | ? |
Notes:
The Exclusion Type identifies the nature of that mutual exclusion relationship (for example, language).
The Stream Number Count indicates how many Stream Numbers are in the Stream Numbers list. Each of the media streams in this list is in a mutual exclusion relationship with the others.
5.12 Inter-Media Dependency Object
Mandatory: No
Quantity: 0 or 1
This object provides the capability for an author to identify dependencies between different media types. An example of such a relationship would be to specify that a video effects stream will be presented only if a certain enhancement layer of a video codec is also currently being presented. Another example is binding a timecode media stream to another media stream to provide alternate timecodes for that other stream’s data.
Object Structure:
List of Dependency Info “structures” for any stream involved in an inter-media dependency relationship.
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Dependency Record Count | UINT | 16 |
| Dependency Records | See Section 5.9 | ? |
Notes:
The Dependency Record structure is given in Section 5.9.
The Dependency Record Count indicates the number of Dependency Records present.
Should multiple dependencies be listed within the Dependent Stream Numbers fields of a single Dependency Record, these dependencies are in a Boolean AND relationship to each other (in other words, the stream number is dependent upon x AND y). Boolean OR relationships (in other words, the stream number is dependent upon x OR y) are indicated by having multiple Dependency Record entries, each having the same Stream Number value in the Stream Number field of the Dependency Record. Streams that are dependent upon either one stream or another, or optionally both, are said to be in an OR dependency relationship.
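The AND/OR rule above amounts to an "OR of ANDs" over the Dependency Records for a stream. A minimal Python sketch, with a hypothetical function name, assuming the records have already been parsed into (stream_number, dependent_stream_numbers) pairs; treating a stream that has no Dependency Record as unconditionally presentable is also an assumption.

```python
def is_presentable(stream, records, active):
    """Return True if `stream` may be presented while the streams in `active` are presented.

    `records` is a list of (stream_number, dependent_stream_numbers) pairs taken
    from Dependency Records. Stream numbers within one record are ANDed together;
    separate records for the same stream are ORed.
    """
    alternatives = [deps for number, deps in records if number == stream]
    if not alternatives:
        # Assumption: a stream with no Dependency Record has no dependencies.
        return True
    return any(all(d in active for d in deps) for deps in alternatives)
```

For example, with records `[(3, [1, 2]), (3, [4])]`, stream 3 depends on (1 AND 2) OR 4: it is presentable while streams 1 and 2 are active, or while stream 4 is active, but not with stream 1 alone.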
5.13 Rating Object
Mandatory: No
Quantity: 0 or 1
This object contains W3C-defined Platform for Internet Content Selection (PICS) information (see references [1] and [2]). PICS establishes Internet conventions for label formats. It thus provides a basis for specifying the rating of the multimedia content within an ASF file. This object does not mandate a particular rating service. The content creator is consequently able to use the rating service of their choice, as long as its labels follow the PICS conventions.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| PICS Data | UINT8 | ? |
Note:
PICS information is stored as opaque data in an RFC 822-conformant format (see reference [3]).
5.14 Index Parameters Object
Mandatory: Yes if an index is present in the file; otherwise, no.
Quantity: 0 or 1
This object supplies a sufficient amount of information to regenerate the index for an ASF file should the original index have been omitted or deleted. It includes only information about those streams that are actually indexed (there must be at least one stream in an index).
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Index Entry Time Interval | UINT | 32 |
| Index Specifier Count | UINT | 16 |
| Index Specifiers | See below | ? |
Index Specifier:
| Field Name | Field Type | Size (bits) |
| Stream Number | UINT | 16 |
| Index Type | UINT | 16 |
Notes:
The Index Entry Time Interval is in milliseconds.
The Index Specifier Count field identifies how many Index Specifier entries exist within the Index Specifiers field.
Every Index Type requires each index entry offset to point to a data unit boundary of an ASF Data Unit containing data for the specified Stream Number. In addition, the send time of that data unit must not exceed the time of the index entry, which is a presentation time.
Index Type values are as follows: 1 = Nearest Data Unit, 2 = Nearest Object, and 3 = Nearest Clean Point. The Nearest Data Unit indexes point to the data unit whose presentation time is closest to the index entry time. The Nearest Object indexes point to the closest data unit containing an entire object or first fragment of an object. The Nearest Clean Point indexes point to the closest data unit containing an entire object (or first fragment of an object) that has the Clean Point Flag set.

5.15 Color Table Object
Mandatory: No
Quantity: 0 - n
This object contains a color table that is used by one or more media streams. Each color table is given a unique identifier (the Color Table ID) by which it can be referenced.
Object Structure:
| Field Name | Field Type | Size (bits) |
| Object ID | GUID | 128 |
| Object Size | UINT | 64 |
| Color Table ID | GUID | 128 |
| Color Table Record Count | UINT | 16 |
| Color Table Record | See below | ? |
Color Table Record:
| Field Name | Field Type | Size (bits) |
| Red | UINT | 8 |
| Green | UINT | 8 |
| Blue | UINT | 8 |