Audio::Scan - Fast Perl XS metadata and tag reader for all common audio file formats

nabertrand/Audio-Scan

NAME
    Audio::Scan - Fast C metadata and tag reader for all common audio file
    formats

SYNOPSIS
        use Audio::Scan;

        my $data = Audio::Scan->scan('/path/to/file.mp3');

        # Just file info
        my $info = Audio::Scan->scan_info('/path/to/file.mp3');

        # Just tags
        my $tags = Audio::Scan->scan_tags('/path/to/file.mp3');
    
        # Scan without reading (possibly large) artwork into memory.
        # The size of the artwork is returned in place of the binary artwork data.
        {
            local $ENV{AUDIO_SCAN_NO_ARTWORK} = 1;
            my $data = Audio::Scan->scan('/path/to/file.mp3');
        }

        # Scan a filehandle
        open my $fh, '<', 'my.mp3' or die "Cannot open my.mp3: $!";
        my $fh_data = Audio::Scan->scan_fh( mp3 => $fh );
        close $fh;

        # Scan and compute an audio MD5 checksum
        my $md5_data = Audio::Scan->scan( '/path/to/file.mp3', { md5_size => 64 * 1024 } );
        my $md5 = $md5_data->{info}->{audio_md5};

DESCRIPTION
    Audio::Scan is a C-based scanner for audio file metadata and tag
    information. It currently supports MP3, MP4, AAC (ADTS), Ogg Vorbis,
    FLAC, ASF, WAV, AIFF, Musepack, Monkey's Audio, and WavPack.

    See below for specific details about each file format.

METHODS
  scan( $path, [ \%OPTIONS ] )
    Scans $path for both metadata and tag information. The type of scan
    performed is determined by the file's extension. Supported extensions
    are:

        MP3:            mp3, mp2
        MP4:            mp4, m4a, m4b, m4p, m4v, m4r, k3g, skm, 3gp, 3g2, mov
        AAC (ADTS):     aac
        Ogg:            ogg, oga
        FLAC:           flc, flac, fla
        ASF:            wma, wmv, asf
        Musepack:       mpc, mpp, mp+
        Monkey's Audio: ape, apl
        WAV:            wav
        AIFF:           aiff, aif
        WavPack:        wv

    This method returns a hashref containing two other hashrefs: info and
    tags. The contents of the info and tags hashes vary by file format; see
    below for details.

    An optional hashref of options may be provided. Currently it supports
    one option:

        md5_size => $audio_bytes_to_checksum

    An MD5 checksum is computed over the first N bytes of audio data. Any
    tags in the file are automatically skipped, so this is a useful way to
    determine whether a file's audio content is unchanged even if its tags
    have been edited. The hex MD5 value is returned in the
    $info->{audio_md5} key. Computing the checksum reduces performance, so
    choose the smallest size that works for you; as a rule of thumb, avoid
    values larger than 64K.
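    Conceptually, the checksum covers only the audio bytes, so tag blocks
    before or after the audio do not affect it. The sketch below illustrates
    the idea with the core Digest::MD5 module and the audio_offset and
    audio_size keys from $info. It is an assumption-laden illustration, not
    Audio::Scan's implementation: the helper audio_md5_sketch is
    hypothetical, and the exact byte range Audio::Scan hashes may differ,
    so the result is not guaranteed to equal $info->{audio_md5}.

```perl
use strict;
use warnings;
use Digest::MD5;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not part of Audio::Scan): MD5 of the first $md5_size
# bytes of audio data, given audio_offset and audio_size as reported in $info.
sub audio_md5_sketch {
    my ( $fh, $audio_offset, $audio_size, $md5_size ) = @_;

    # Never read past the end of the audio data (e.g. into trailing tags)
    my $want = $md5_size < $audio_size ? $md5_size : $audio_size;

    seek( $fh, $audio_offset, SEEK_SET ) or die "seek failed: $!";

    my $ctx = Digest::MD5->new;
    while ( $want > 0 ) {
        my $chunk = $want < 65536 ? $want : 65536;
        my $read  = read( $fh, my $buf, $chunk );
        die "read failed: $!" unless defined $read;
        last if $read == 0;    # unexpected end of file
        $ctx->add($buf);
        $want -= $read;
    }
    return $ctx->hexdigest;
}
```

    With a real file, open it in binary mode ('<:raw') and pass the
    audio_offset and audio_size values from a previous scan_info() call.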

  scan_info( $path, [ \%OPTIONS ] )
    If you only need file metadata and don't care about tags, you can use
    this method.

  scan_tags( $path, [ \%OPTIONS ] )
    If you only need the tags and don't care about the metadata, use this
    method.

  scan_fh( $type => $fh, [ \%OPTIONS ] )
    Scans a filehandle. $type is the type of file to scan it as, e.g. "mp3"
    or "ogg". Note that FLAC does not support reading from a filehandle.

  find_frame( $path, $timestamp_in_ms )
    Returns the byte offset to the first audio frame starting from the given
    timestamp (in milliseconds).

    MP3, Ogg, FLAC, ASF, MP4
        The byte offset to the data packet containing this timestamp will be
        returned. For file formats that don't provide timestamp information,
        such as MP3, the best estimate for the location of the timestamp
        will be returned. The estimate is more accurate if, for example, the
        file has a Xing header or is CBR.

    WAV, AIFF, Musepack, Monkey's Audio, WavPack
        Not yet supported by find_frame.
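    For a constant-bitrate MP3, a best-effort estimate is plain arithmetic:
    at bitrate bits per second, t milliseconds of audio occupy
    bitrate / 8 * t / 1000 bytes after audio_offset. The hypothetical helper
    below sketches only that CBR case, using the audio_offset, audio_size,
    and bitrate (bps) keys described under MP3 INFO. It is not
    Audio::Scan's find_frame, which also uses Xing data and may refine the
    result further.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: estimate the byte offset of $ms milliseconds into a
# CBR MP3, assuming $bitrate is in bits per second (as in $info->{bitrate}).
sub cbr_offset_estimate {
    my ( $audio_offset, $audio_size, $bitrate, $ms ) = @_;

    # bytes per millisecond = bitrate / 8 / 1000
    my $rel = floor( $bitrate / 8 * $ms / 1000 );

    # Clamp so the estimate never points past the end of the audio data
    $rel = $audio_size if $rel > $audio_size;
    return $audio_offset + $rel;
}
```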

  find_frame_return_info( $mp4_path, $timestamp_in_ms )
    The header of an MP4 file contains various metadata that refers to the
    structure of the audio data, making seeking more difficult to perform.
    This method will return the usual $info hash with 2 additional keys:

        seek_offset - The seek offset in bytes
        seek_header - A rewritten MP4 header that can be prepended to the audio data
                      found at seek_offset to construct a valid bitstream. Specifically,
                      the following boxes are rewritten: stts, stsc, stsz, stco

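    For example, to seek 30 seconds into a file and write out a new MP4 file
    that starts at this point:

        use Fcntl qw(SEEK_SET);

        my $info = Audio::Scan->find_frame_return_info( $file, 30000 );

        # Open the source in binary mode and jump to the computed offset
        open my $f, '<:raw', $file or die "Cannot open $file: $!";
        sysseek( $f, $info->{seek_offset}, SEEK_SET )
            or die "Cannot seek in $file: $!";

        # Write the rewritten header, then copy the remaining audio data
        open my $fh, '>:raw', 'seeked.m4a' or die "Cannot open seeked.m4a: $!";
        print $fh $info->{seek_header};

        while ( sysread( $f, my $buf, 65536 ) ) {
            print $fh $buf;
        }

        close $f;
        close $fh or die "Cannot close seeked.m4a: $!";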

  find_frame_fh( $type => $fh, $offset )
    Same as "find_frame", but with a filehandle.

  find_frame_fh_return_info( $type => $fh, $offset )
    Same as "find_frame_return_info", but with a filehandle.

  has_flac()
    Deprecated. Always returns 1 now that FLAC is always enabled.

  is_supported( $path )
    Returns 1 if the given path can be scanned by Audio::Scan, or 0 if not.

  get_types()
    Returns an array of strings of the file types supported by Audio::Scan.

  extensions_for( $type )
    Returns an array of strings of the file extensions that are considered
    to be the file type *$type*.

  type_for( $extension )
    Returns the file type for a given extension. Returns *undef* for
    unsupported extensions.
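    The extension table under scan() can be pictured as a simple lookup. The
    sketch below mirrors that table in a hash to show how type_for,
    extensions_for, and is_supported relate. It is a hypothetical
    re-implementation for illustration only: the sketch_* names are made up,
    and the type labels are placeholders that may not match the strings
    returned by get_types(). Real code should call the Audio::Scan methods.

```perl
use strict;
use warnings;

# Hypothetical mirror of the extension table; the type labels are
# placeholders, not necessarily the names returned by get_types().
my %EXTENSIONS = (
    mp3  => [qw(mp3 mp2)],
    mp4  => [qw(mp4 m4a m4b m4p m4v m4r k3g skm 3gp 3g2 mov)],
    aac  => [qw(aac)],
    ogg  => [qw(ogg oga)],
    flac => [qw(flc flac fla)],
    asf  => [qw(wma wmv asf)],
    mpc  => [qw(mpc mpp mp+)],
    ape  => [qw(ape apl)],
    wav  => [qw(wav)],
    aiff => [qw(aiff aif)],
    wv   => [qw(wv)],
);

# Invert the table into extension => type
my %TYPE_FOR;
for my $type ( keys %EXTENSIONS ) {
    $TYPE_FOR{$_} = $type for @{ $EXTENSIONS{$type} };
}

sub sketch_type_for {
    my ($ext) = @_;
    return $TYPE_FOR{ lc $ext };    # undef for unsupported extensions
}

sub sketch_extensions_for {
    my ($type) = @_;
    return @{ $EXTENSIONS{$type} || [] };
}

sub sketch_is_supported {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/ or return 0;
    return defined sketch_type_for($ext) ? 1 : 0;
}
```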

MP3
  INFO
    The following metadata about a file may be returned:

        id3_version (e.g. "ID3v2.4.0")
        song_length_ms (duration in milliseconds)
        layer (e.g. 3)
        stereo
        samples_per_frame
        padding
        audio_size (size of all audio frames)
        audio_offset (byte offset to first audio frame)
        bitrate (in bps, determined using Xing/LAME/VBRI if possible, or average in the worst case)
        samplerate (in kHz)
        vbr (1 if file is VBR)

        If a Xing header is found:
        xing_frames
        xing_bytes
        xing_quality

        If a VBRI header is found:
        vbri_delay
        vbri_frames
        vbri_bytes
        vbri_quality

        If a LAME header is found:
        lame_encoder_version
        lame_tag_revision
        lame_vbr_method
        lame_lowpass
        lame_replay_gain_radio
        lame_replay_gain_audiophile
        lame_encoder_delay
        lame_encoder_padding
        lame_noise_shaping
        lame_stereo_mode
        lame_unwise_settings
        lame_source_freq
        lame_surround
        lame_preset
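    When no Xing, LAME, or VBRI header is available, an average bitrate
    follows from the size and duration of the audio: total bits divided by
    seconds. The hypothetical helper below shows that arithmetic using the
    audio_size and song_length_ms keys; Audio::Scan's own fallback may round
    or compute it differently.

```perl
use strict;
use warnings;

# Hypothetical helper: average bitrate in bps from the audio payload size
# (bytes) and the duration (milliseconds), rounded to the nearest integer.
sub average_bitrate {
    my ( $audio_size, $song_length_ms ) = @_;
    return 0 unless $song_length_ms;    # avoid division by zero
    return int( $audio_size * 8 * 1000 / $song_length_ms + 0.5 );
}
```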

  TAGS
    Raw tags are returned largely as found, except that older tags such as
    ID3v1 and ID3v2.2/v2.3 are converted to ID3v2.4 tag names. Multiple
    instances of a tag in a file are returned as arrays, and complex tags
    such as APIC and COMM are returned as arrays of their fields. All tag
    keys are converted to upper-case, and all text is converted to UTF-8.

    Sample tag data:

        tags => {
              ALBUMARTISTSORT => "Solar Fields",
              APIC => [ "image/jpeg", 3, "", <binary data snipped> ],
              CATALOGNUMBER => "INRE 017",
              COMM => ["eng", "", "Amazon.com Song ID: 202981429"],
              "MUSICBRAINZ ALBUM ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "MUSICBRAINZ ALBUM ID" => "282143c9-6191-474d-a31a-1117b8c88cc0",
              "MUSICBRAINZ ALBUM RELEASE COUNTRY" => "FR",
              "MUSICBRAINZ ALBUM STATUS" => "official",
              "MUSICBRAINZ ALBUM TYPE" => "album",
              "MUSICBRAINZ ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "REPLAYGAIN_ALBUM_GAIN" => "-2.96 dB",
              "REPLAYGAIN_ALBUM_PEAK" => "1.045736",
              "REPLAYGAIN_TRACK_GAIN" => "+3.60 dB",
              "REPLAYGAIN_TRACK_PEAK" => "0.892606",
              TALB => "Leaving Home",
              TCOM => "Magnus Birgersson",
              TCON => "Ambient",
              TCOP => "2005 ULTIMAE RECORDS",
              TDRC => "2004-10",
              TIT2 => "Home",
              TPE1 => "Solar Fields",
              TPE2 => "Solar Fields",
              TPOS => "1/1",
              TPUB => "Ultimae Records",
              TRCK => "1/11",
              TSOP => "Solar Fields",
              UFID => [
                    "http://musicbrainz.org",
                    "1084278a-2254-4613-a03c-9fed7a8937ca",
              ],
        },
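    Because a text tag may hold either a single string or an array reference
    (when it occurs more than once), calling code usually normalizes values
    before use. The sketch below shows one way to do that for simple text
    frames, plus splitting a "position/total" value such as TRCK or TPOS.
    Both helper names are hypothetical and not part of Audio::Scan.

```perl
use strict;
use warnings;

# Hypothetical helper: return a tag's values as a list, whether the tag holds
# a single scalar or an array reference (repeated text frames). Not suitable
# for complex frames like APIC or COMM, whose arrays hold individual fields.
sub tag_values {
    my ( $tags, $key ) = @_;
    my $v = $tags->{$key};
    return ()    unless defined $v;
    return @{$v} if ref $v eq 'ARRAY';
    return ($v);
}

# Hypothetical helper: split "1/11" into (1, 11); the total may be absent.
sub split_position {
    my ($value) = @_;
    my ( $num, $total ) = ( $value // '' ) =~ m{^\s*(\d+)\s*(?:/\s*(\d+))?};
    return ( $num, $total );
}
```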

MP4
  INFO
    The following metadata about a file may be returned:

        audio_offset (byte offset to start of mdat)
        audio_size
        compatible_brands
        file_size
        leading_mdat (if file has mdat before moov)
        major_brand
        minor_version
        song_length_ms
        timescale
        tracks (array of tracks in the file)
            Each track may contain:
        
            audio_type
            avg_bitrate
            bits_per_sample
            channels
            duration
            encoding
            handler_name
            handler_type
            id
            max_bitrate
            samplerate
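    Since tracks is an array that can mix audio and video tracks, picking
    the audio track means filtering on a per-track key. The sketch below
    assumes that audio tracks report handler_type 'soun' (the ISO base media
    file format handler type for sound tracks); that value, the helper name,
    and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first track whose handler_type marks it
# as a sound track ('soun' in the ISO base media file format), or undef.
sub first_audio_track {
    my ($info) = @_;
    for my $track ( @{ $info->{tracks} || [] } ) {
        return $track if ( $track->{handler_type} // '' ) eq 'soun';
    }
    return;
}

# Illustrative data only; real values come from Audio::Scan->scan_info().
my $info = {
    tracks => [
        { id => 1, handler_type => 'vide' },
        { id => 2, handler_type => 'soun', avg_bitrate => 128000, channels => 2 },
    ],
};
my $audio = first_audio_track($info);
```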

  TAGS
    Tags are returned in a hash with all keys converted to upper-case. Keys
    starting with 0xA9 (copyright symbol) will have this character stripped
    out. Sample tag data:

        tags => {
           AART              => "Album Artist",
           ALB               => "Album",
           ART               => "Artist",
           CMT               => "Comments",
           COVR              => <binary data snipped>,
           CPIL              => 1,
           DAY               => 2009,
           DESC              => "Video Description",
           DISK              => "1/2",
           "ENCODING PARAMS" => "vers\0\0\0\1acbf\0\0\0\2brat\0\1w\0cdcv\0\1\6\5",
           GNRE              => "Jazz",
           GRP               => "Grouping",
           ITUNNORM          => " 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
           ITUNSMPB          => " 00000000 00000840 000001E4 00000000000001DC 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
           LYR               => "Lyrics",
           NAM               => "Name",
           PGAP              => 1,
           SOAA              => "Sort Album Artist",
           SOAL              => "Sort Album",
           SOAR              => "Sort Artist",
           SOCO              => "Sort Composer",
           SONM              => "Sort Name",
           SOSN              => "Sort Show",
           TMPO              => 120,
           TOO               => "iTunes 8.1.1, QuickTime 7.6",
           TRKN              => "1/10",
           TVEN              => "Episode ID",
           TVES              => 12,
           TVSH              => "Show",
           TVSN              => 12,
           WRT               => "Composer",
        },
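    The key normalization described above amounts to dropping a leading
    0xA9 byte from the four-character atom name and upper-casing the rest.
    A minimal sketch, assuming the raw atom name is available as a byte
    string; the helper is hypothetical, since Audio::Scan performs this
    step in its C code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw iTunes atom name such as "\xA9nam" or
# "aART" into the key style used in the tag hash ("NAM", "AART").
sub mp4_tag_key {
    my ($atom) = @_;
    ( my $key = $atom ) =~ s/^\xA9//;    # strip the leading copyright byte
    return uc $key;
}
```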

AAC (ADTS)
  INFO
    The following metadata about a file is returned:

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        file_size
        profile (Main, LC, or SSR)
        samplerate (in kHz)
        song_length_ms (duration in milliseconds)
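    Each ADTS frame starts with a header from which the profile, sample
    rate, and channel count can be read directly. The sketch below decodes
    those fields from the first four header bytes, following the ADTS
    layout of the AAC specifications (12-bit syncword, then ID, layer,
    protection flag, 2-bit profile, 4-bit sampling-frequency index, private
    bit, 3-bit channel configuration). It is an illustration, not
    Audio::Scan's parser, and the helper name is hypothetical. Its
    sample-rate table is in Hz.

```perl
use strict;
use warnings;

my @ADTS_RATES = ( 96000, 88200, 64000, 48000, 44100, 32000, 24000,
                   22050, 16000, 12000, 11025, 8000, 7350 );

# Profile 3 is reserved in MPEG-2 ADTS (AAC LTP in MPEG-4 ADTS)
my @ADTS_PROFILES = ( 'Main', 'LC', 'SSR', 'Reserved' );

# Hypothetical helper: decode the first bytes of an ADTS frame header.
# Returns undef if the 12-bit syncword (0xFFF) is missing.
sub parse_adts_header {
    my ($bytes) = @_;
    return if length($bytes) < 4;
    my @b = unpack 'C4', $bytes;

    # Syncword: all of byte 0 plus the top four bits of byte 1
    return unless $b[0] == 0xFF && ( $b[1] & 0xF0 ) == 0xF0;

    my $profile  = ( $b[2] >> 6 ) & 0x03;                       # 2 bits
    my $sf_index = ( $b[2] >> 2 ) & 0x0F;                       # 4 bits
    my $channels = ( ( $b[2] & 0x01 ) << 2 ) | ( $b[3] >> 6 );  # 3 bits

    return {
        profile    => $ADTS_PROFILES[$profile],
        samplerate => $ADTS_RATES[$sf_index],    # undef for reserved indexes
        channels   => $channels,
    };
}
```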

OGG VORBIS
  INFO
    The following metadata about a file is returned:

        version
        channels
        stereo
        samplerate (in kHz)
        bitrate_average (in bps)
        bitrate_upper
        bitrate_nominal
        bitrate_lower
        blocksize_0
        blocksize_1
        audio_offset (byte offset to audio)
        audio_size
        song_length_ms (duration in milliseconds)

  TAGS
    Raw Vorbis comments are returned. All comment keys are converted to
    upper-case.
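    A Vorbis comment is a NAME=value string whose field name is
    case-insensitive, which is why upper-casing the keys yields a
    consistent hash. The sketch below shows that normalization on a list of
    raw comment strings. Collecting repeated names into array references is
    an assumption made here for illustration and may not match how
    Audio::Scan represents repeated Vorbis fields; the helper name is
    hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a tag hash from raw "NAME=value" comments.
# Keys are upper-cased; repeated names are collected into an array reference.
sub vorbis_comments_to_hash {
    my (@comments) = @_;
    my %tags;
    for my $comment (@comments) {
        my ( $name, $value ) = split /=/, $comment, 2;    # split on the first '=' only
        next unless defined $value;                       # skip malformed entries
        my $key = uc $name;
        if ( !exists $tags{$key} ) {
            $tags{$key} = $value;
        }
        elsif ( ref $tags{$key} eq 'ARRAY' ) {
            push @{ $tags{$key} }, $value;
        }
        else {
            $tags{$key} = [ $tags{$key}, $value ];
        }
    }
    return \%tags;
}
```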

FLAC
  INFO
    The following metadata about a file is returned:

        channels
        samplerate (in kHz)
        bitrate (in bps)
        file_size
        audio_offset (byte offset to first audio frame)
        audio_size
        song_length_ms (duration in milliseconds)
        bits_per_sample
        frames
        minimum_blocksize
        maximum_blocksize
        minimum_framesize
        maximum_framesize
        md5
        total_samples

  TAGS
    Raw FLAC comments are returned. All comment keys are converted to
    upper-case. Some data returned is special:

    APPLICATION

        Each application block is returned in the APPLICATION tag keyed by application ID.

    CUESHEET_BLOCK

        The CUESHEET_BLOCK tag is an array containing each line of the cue sheet.

    ALLPICTURES

        Embedded pictures are returned in an ALLPICTURES array.  Each picture has the following metadata:
    
            mime_type
            description
            width
            height
            depth
            color_index
            image_data
            picture_type
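    picture_type uses the same numbering as ID3v2 APIC frames, where 3
    means the front cover. The short sketch below chooses a front cover from
    ALLPICTURES and falls back to the first picture; the helper name and the
    fallback policy are assumptions, not Audio::Scan behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the front cover (picture_type 3) from the
# ALLPICTURES array, falling back to the first picture if none is marked.
sub front_cover {
    my ($tags) = @_;
    my @pictures = @{ $tags->{ALLPICTURES} || [] };
    for my $pic (@pictures) {
        return $pic if ( $pic->{picture_type} // -1 ) == 3;
    }
    return $pictures[0];    # undef when there are no pictures
}
```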

ASF (Windows Media Audio/Video)
  INFO
    The following metadata about a file may be returned. Reading the ASF
    spec is encouraged if you want to find out more about any of these
    values.

        audio_offset (byte offset to first data packet)
        audio_size
        broadcast (boolean, whether the file is a live broadcast or not)
        codec_list (array of information about codecs used in the file)
        creation_date (UNIX timestamp when file was created)
        data_packets
        drm_key
        drm_license_url
        drm_protection_type
        drm_data
        file_id (unique file ID)
        file_size
        index_blocks
        index_entry_interval (in milliseconds)
        index_offsets (byte offsets for each second of audio, per stream. Useful for seeking)
        index_specifiers (indicates which stream a given index_offset points to)
        language_list (array of languages referenced by the file's metadata)
        lossless (boolean)
        max_bitrate
        max_packet_size
        min_packet_size
        mutex_list (mutually exclusive stream information)
        play_duration_ms
        preroll
        script_commands
        script_types
        seekable (boolean, whether the file is seekable or not)
        send_duration_ms
        song_length_ms (the actual length of the audio, in milliseconds)
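    The index fields support seeking: with one entry every
    index_entry_interval milliseconds, the entry for a timestamp is found by
    integer division. The sketch below assumes the offsets for the stream of
    interest are available as a flat array reference of byte offsets; the
    exact nesting of index_offsets in the returned hash, and the helper
    name, are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: byte offset for $ms, given the index interval (ms)
# and an array reference of offsets for one stream, one entry per interval.
sub asf_index_offset {
    my ( $interval_ms, $offsets, $ms ) = @_;
    return unless $interval_ms && @{$offsets};
    my $i = int( $ms / $interval_ms );
    $i = 0             if $i < 0;
    $i = $#{$offsets}  if $i > $#{$offsets};    # clamp to the last entry
    return $offsets->[$i];
}
```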

    STREAMS

    Each entry in the streams array contains metadata about an individual
    stream within the file. The following metadata may be returned:

        DeviceConformanceTemplate
        IsVBR
        alt_bitrate
        alt_buffer_fullness
        alt_buffer_size
        avg_bitrate (most accurate bitrate for this stream)
        avg_bytes_per_sec (audio only)
        bitrate
        bits_per_sample (audio only)
        block_alignment (audio only)
        bpp (video only)
        buffer_fullness
        buffer_size
        channels (audio only)
        codec_id (audio only)
        compression_id (video only)
        encode_options
        encrypted (boolean)
        error_correction_type
        flag_seekable (boolean)
        height (video only)
        index_type
        language_index (offset into language_list array)
        max_object_size
        samplerate (in kHz) (audio only)
        samples_per_block
        stream_number
        stream_type
        super_block_align
        time_offset
        width (video only)

  TAGS
    Raw tags are returned. Tags that occur more than once are returned as
    arrays. In contrast to the other formats, tag keys are NOT capitalized.
    There is one special key:

    WM/Picture

    Pictures are returned as a hash with the following keys:

        image_type (numeric type, same as ID3v2 APIC)
        mime_type
        description
        image
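    Because ASF tag keys keep their original case (for example
    WM/AlbumTitle), code that handles several formats may want a
    case-insensitive lookup. A minimal sketch; the helper name and the
    example keys are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: look up a tag regardless of key case, so the same
# code can read "WM/AlbumTitle" (ASF) and upper-cased keys from other formats.
sub tag_ci {
    my ( $tags, $wanted ) = @_;
    return $tags->{$wanted} if exists $tags->{$wanted};    # exact match first
    my $lc = lc $wanted;
    for my $key ( keys %{$tags} ) {
        return $tags->{$key} if lc $key eq $lc;
    }
    return;
}
```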

WAV
  INFO
    The following metadata about a file may be returned:

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        file_size
        format (WAV format code, 1 == PCM)
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms
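    For uncompressed PCM (format 1), the bitrate, block alignment, and
    duration all follow from the format fields. The sketch below shows that
    arithmetic; it takes the sample rate in Hz (samples per second per
    channel), assumes bits_per_sample is a multiple of 8, and uses a
    hypothetical helper name. Compressed WAV formats do not follow these
    formulas.

```perl
use strict;
use warnings;

# Hypothetical helper: derived values for PCM WAV data.
# $rate_hz is samples per second per channel; $audio_size is in bytes.
# Assumes $bits_per_sample is a multiple of 8.
sub pcm_wav_stats {
    my ( $rate_hz, $channels, $bits_per_sample, $audio_size ) = @_;
    my $block_align   = $channels * $bits_per_sample / 8;       # bytes per sample frame
    my $bitrate       = $rate_hz * $channels * $bits_per_sample;  # bps
    my $bytes_per_sec = $rate_hz * $block_align;
    my $length_ms     = int( $audio_size * 1000 / $bytes_per_sec );
    return {
        block_align    => $block_align,
        bitrate        => $bitrate,
        song_length_ms => $length_ms,
    };
}
```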

  TAGS
    WAV files can contain several different types of tags. "Native" WAV tags
    found in a LIST block may include these and others:

        IARL - Archival Location
        IART - Artist
        ICMS - Commissioned
        ICMT - Comment
        ICOP - Copyright
        ICRD - Creation Date
        ICRP - Cropped
        IENG - Engineer
        IGNR - Genre
        IKEY - Keywords
        IMED - Medium
        INAM - Name (Title)
        IPRD - Product (Album)
        ISBJ - Subject
        ISFT - Software
        ISRC - Source
        ISRF - Source Form
        TORG - Label
        LOCA - Location
        TVER - Version
        TURL - URL
        TLEN - Length
        ITCH - Technician
        TRCK - Track
        ITRK - Track

    ID3v2 tags can also be embedded within WAV files. These are returned
    exactly as for MP3 files.
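    Native tags live in a LIST chunk of type INFO, made up of subchunks that
    each carry a four-character ID (such as INAM or IART), a little-endian
    32-bit size, and the data, padded to an even length. The sketch below
    parses the payload of such a LIST chunk (the bytes after the "INFO"
    type). It is a simplified illustration of the RIFF layout, not
    Audio::Scan's parser, and the helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the subchunks of a RIFF LIST/INFO payload
# (the bytes following the "INFO" list type) into an ID => string hash.
sub parse_riff_info {
    my ($payload) = @_;
    my %tags;
    my $pos = 0;
    while ( $pos + 8 <= length $payload ) {
        my ( $id, $size ) = unpack 'a4 V', substr( $payload, $pos, 8 );
        $pos += 8;
        last if $pos + $size > length $payload;    # truncated chunk

        my $value = substr( $payload, $pos, $size );
        $value =~ s/\0+\z//;                       # strip NUL terminator(s)
        $tags{$id} = $value;

        $pos += $size + ( $size % 2 );             # skip the pad byte on odd sizes
    }
    return \%tags;
}
```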

AIFF
  INFO
    The following metadata about a file may be returned:

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        compression_name (if AIFC)
        compression_type (if AIFC)
        file_size
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms
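    AIFF stores the sample rate in the COMM chunk as a big-endian 80-bit
    IEEE 754 extended-precision number, which has to be decoded before use.
    The sketch below converts those ten bytes into a Perl number; the helper
    name is hypothetical, and it does not handle infinities or NaNs.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a big-endian 80-bit IEEE 754 extended value
# (1 sign bit, 15 exponent bits, 64-bit mantissa with explicit integer bit).
sub ieee_extended_to_number {
    my ($bytes) = @_;
    my ( $sign_exp, $hi, $lo ) = unpack 'n N N', $bytes;

    my $sign     = ( $sign_exp & 0x8000 ) ? -1 : 1;
    my $exponent = $sign_exp & 0x7FFF;
    return 0 if $exponent == 0 && $hi == 0 && $lo == 0;

    # value = mantissa * 2**(exponent - 16383 - 63), computed in two halves
    # so the 64-bit mantissa never has to fit into one integer
    my $shift = $exponent - 16383;
    return $sign * ( $hi * 2**( $shift - 31 ) + $lo * 2**( $shift - 63 ) );
}
```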

  TAGS
    ID3v2 tags can be embedded within AIFF files. These are returned exactly
    as for MP3 files.

MONKEY'S AUDIO (APE)
  INFO
    The following metadata about a file may be returned:

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        compression
        file_size
        samplerate (in kHz)
        song_length_ms
        version

  TAGS
    APEv2 tags are returned as a hash of key/value pairs.
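    An APEv2 tag normally sits at the end of the file and ends with a
    32-byte footer: the preamble APETAGEX, then little-endian 32-bit fields
    for the version (2000 for APEv2), the tag size (items plus footer,
    excluding any header), the item count, and flags, followed by 8 reserved
    bytes. The sketch below decodes such a footer. The helper name is
    hypothetical, and it does not look for an ID3v1 tag that may follow the
    APE tag.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a 32-byte APE tag footer, or return undef
# if the bytes do not start with the "APETAGEX" preamble.
sub parse_ape_footer {
    my ($footer) = @_;
    return unless length($footer) == 32;
    my ( $preamble, $version, $tag_size, $item_count, $flags ) =
        unpack 'a8 V V V V', $footer;
    return unless $preamble eq 'APETAGEX';
    return {
        version    => $version,       # 2000 for APEv2, 1000 for APEv1
        tag_size   => $tag_size,      # item data + footer, excluding header
        item_count => $item_count,
        flags      => $flags,
    };
}
```

    With a real file, the last 32 bytes can be read after
    seek( $fh, -32, SEEK_END ) on a handle opened in binary mode.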

MUSEPACK
  INFO
    The following metadata about a file may be returned:

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        encoder
        file_size
        profile
        samplerate (in kHz)
        song_length_ms

  TAGS
    Musepack uses APEv2 tags. They are returned as a hash of key/value
    pairs.

WAVPACK
  INFO
    The following metadata about a file may be returned:

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        channels
        encoder_version
        file_size
        hybrid (1 if file is lossy) (v4 only)
        lossless (1 if file is lossless) (v4 only)
        samplerate
        song_length_ms
        total_samples

  TAGS
    WavPack uses APEv2 tags. They are returned as a hash of key/value pairs.


THANKS
    Some code from the Rockbox project was very helpful in implementing ASF
    and MP4 seeking.

    Some of the file format parsing code was derived from the mt-daapd
    project, and adapted by Netgear. It has been heavily rewritten to fix
    bugs and add more features.

    The source to the original Netgear C scanner for SqueezeCenter is
    located at
    <http://svn.slimdevices.com/repos/slim/7.3/trunk/platforms/readynas/contrib/scanner>

    The audio MD5 feature uses an MD5 implementation by L. Peter Deutsch,
    <[email protected]>.

SEE ALSO
    ASF Spec
    <http://www.microsoft.com/windows/windowsmedia/forpros/format/asfspec.aspx>

    MP4 Info:
    <http://standards.iso.org/ittf/PubliclyAvailableStandards/c051533_ISO_IEC_14496-12_2008.zip>
    <http://www.geocities.com/xhelmboyx/quicktime/formats/mp4-layout.txt>

AUTHORS
    Andy Grundman, <[email protected]>

    Dan Sully, <[email protected]>

COPYRIGHT AND LICENSE
    Copyright (C) 2010 Logitech, Inc.

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.
