Development

Basics

Listing all the files in a directory

The first task is to list all the files in a directory and their sizes:

Step 1. get an IMG_INFO object

img = pytsk3.Img_Info(url)

Step 2. Open the file system

fs = pytsk3.FS_Info(img)

Step 3. Open the directory node. This opens the node by path or by inode, whichever is specified.

directory = fs.open_dir(path=path, inode=inode)

Step 4. Iterate over all files in the directory and print their sizes and names. Each iteration yields a proxy object for the TSK_FS_FILE struct, which you can dereference further into the TSK_FS_NAME and TSK_FS_META structs.

for f in directory:
    print(f.info.meta.size, f.info.name.name)

The specified url can be any URL that TSK understands. Note that TSK automatically recognizes EWF files and regular dd files. See the section "Extending Img_Info" below to support other image types.

The directory can be opened by either path or inode. If path is None (or unspecified), the inode is used. An inode has to be an integer; all the bound methods implement sanity checking and raise an exception if you provide arguments of the wrong type.

You can iterate over the directory to receive all the File objects within it. Each File object is just a proxy for a TSK_FS_FILE struct, which can be obtained through the "info" member. The TSK_FS_FILE struct contains links to the TSK_FS_META and TSK_FS_NAME structs; the example prints only specific members of these structs.
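
Some entries, for example unallocated ones, may lack the TSK_FS_META or TSK_FS_NAME struct, in which case the print statement in step 4 would raise. The following is a minimal sketch of a tolerant variant. The helper name list_directory is hypothetical and not part of pytsk3; it only assumes objects shaped like the File objects described above.

```python
def list_directory(directory):
    """Return (size, name) tuples for the entries of an opened directory.

    `directory` is any iterable of objects shaped like pytsk3 File
    objects, e.g. the result of fs.open_dir(path="/").
    """
    entries = []
    for f in directory:
        info = getattr(f, "info", None)
        name_struct = getattr(info, "name", None)
        meta = getattr(info, "meta", None)
        if name_struct is None:
            continue  # nothing to report without a TSK_FS_NAME
        # The size is only known when a TSK_FS_META is attached.
        size = meta.size if meta is not None else None
        entries.append((size, name_struct.name))
    return entries
```

With a real image this would be called as list_directory(fs.open_dir(path=path)); entries without metadata are reported with a size of None.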

Reading a file

Now we want to read a file out and write it to stdout (basically the same as icat).

Step 1: get an IMG_INFO object

img = pytsk3.Img_Info(url)

Step 2: Open the file system

fs = pytsk3.FS_Info(img)

Step 3: Open the file using the inode

f = fs.open_meta(inode = inode)

Step 4: Read all the data and print to stdout

import sys

offset = 0
size = f.info.meta.size
BUFF_SIZE = 1024 * 1024

while offset < size:
    available_to_read = min(BUFF_SIZE, size - offset)
    data = f.read_random(offset, available_to_read)
    if not data:
        break

    offset += len(data)
    # Write the raw bytes; print() would show the bytes repr on Python 3.
    sys.stdout.buffer.write(data)

Note that we go to some length to avoid reading the slack here. This is due to an early bug in TSK which should be fixed by now.
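
The read loop in step 4 can also be wrapped in a generator so that the same logic serves both printing and copying. This is only a sketch: iter_file_data and copy_to are hypothetical helper names, and the only pytsk3 call they rely on is read_random(offset, length) as used above.

```python
def iter_file_data(f, size, chunk_size=1024 * 1024):
    """Yield the first `size` bytes of `f` in chunks of at most `chunk_size`.

    `f` is any object with a read_random(offset, length) method, such as
    the object returned by fs.open_meta(). `size` would normally be
    f.info.meta.size, so that the slack is not read.
    """
    offset = 0
    while offset < size:
        data = f.read_random(offset, min(chunk_size, size - offset))
        if not data:
            break  # short read: stop instead of looping forever
        offset += len(data)
        yield data


def copy_to(f, size, out):
    """Write the file content to a binary stream; return the byte count."""
    written = 0
    for chunk in iter_file_data(f, size):
        out.write(chunk)
        written += len(chunk)
    return written

# Intended use with a real file object (requires `import sys`):
#   copy_to(f, f.info.meta.size, sys.stdout.buffer)
```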

List all the blocks allocated for a file

We want to list all the blocks that a file allocates (kind of like istat).

Step 1: Get an IMG_INFO object (url can be any URL that TSK understands)

img = pytsk3.Img_Info(url)

Step 2: Open the file system

fs = pytsk3.FS_Info(img)

Step 3: Open the file using the inode

f = fs.open_meta(inode = inode)

Step 4: List all blocks allocated by this file. Note that in some file systems each file has several attributes and each can allocate multiple blocks. So we really need to iterate over all attributes of each file:

for attr in f:
    print("Attribute %s, type %s, id %s" % (attr.info.name,
                                            attr.info.type,
                                            attr.info.id))
    for run in attr:
        print("   Blocks %s to %s (%s blocks)" % (run.addr, run.addr + run.len, run.len))

Example output:

Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_SI, id 0
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 3
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 2
Attribute $Data, type TSK_FS_ATTR_TYPE_NTFS_DATA, id 4
   Blocks 89471 to 89477 (6 blocks)
   Blocks 89487 to 89493 (6 blocks)
   Blocks 90023 to 90076 (53 blocks)
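
The per-run lengths can be summed to get the number of blocks per attribute; for the $Data attribute in the example output that is 6 + 6 + 53 = 65 blocks. The sketch below uses a hypothetical helper, total_blocks, and only the members used in step 4 (attr.info.type, attr.info.id and run.len). It counts every run TSK reports and makes no attempt to distinguish sparse or filler runs.

```python
def total_blocks(file_object):
    """Return {(attribute type, attribute id): number of blocks}.

    `file_object` is any iterable of attributes, each of which is an
    iterable of runs with a `len` member, e.g. the result of
    fs.open_meta(inode=inode).
    """
    totals = {}
    for attr in file_object:
        key = (str(attr.info.type), attr.info.id)
        # Attributes without runs (such as resident ones) total 0.
        totals[key] = sum(run.len for run in attr)
    return totals
```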

Extending Img_Info

Sometimes we want to use image formats that are not available to TSK natively. We have seen that in order to obtain the FS_Info object, we must supply it with a valid Img_Info object. It is possible to extend TSK's support for different image formats by creating a different Img_Info object that TSK can use when opening a file system on it.

The Python wrappers are fully extensible. For example, the following implements an AFF4 image class:

## This is the AFF4 resolver we will use
oracle = pyaff4.Resolver()

class AFF4ImgInfo(pytsk3.Img_Info):
    def __init__(self, url):
        ## Open the image using the AFF4 library
        urn = pyaff4.RDFURN(url)
        self.fd = oracle.open(urn, 'r')
        if not self.fd:
            raise IOError("Unable to open %s" % url)

        ## Call the base class with an empty URL
        pytsk3.Img_Info.__init__(self, '')

    def get_size(self):
        """ This function returns the size of the image """
        return self.fd.size.value

    def read(self, off, length):
        """ This returns byte ranges from the image """
        self.fd.seek(off)
        return self.fd.read(length)

    def close(self):
        """ This is called when we want to close the image """
        self.fd.close()

Step 1: get an IMG_INFO object (url can be any URL that AFF4 can handle)

img = AFF4ImgInfo(url)

Step 2: Open the file system

fs = pytsk3.FS_Info(img, offset=options.offset)

...

As can be seen, an Img_Info class only needs to define the read and get_size methods to be fully functional. We then instantiate this class and pass the object to FS_Info, which automatically uses the Python implementation to access the image.

In this way we can provide the SleuthKit with a virtualized image format, allowing for multiple format support.

Obtaining the version

pytsk3 (as of February 17, 2014) comes with 2 different version numbers:

the SleuthKit version it was built against;
the version of the pytsk3 code.

To obtain the SleuthKit version:

import pytsk3

print(pytsk3.TSK_VERSION_STR)

To obtain the pytsk3 version:

import pytsk3

print(pytsk3.get_version())
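
pytsk3 also exposes the SleuthKit version as an integer, pytsk3.TSK_VERSION_NUM, which this page compares against 0x040200ff in the timestamps section. The sketch below assumes the layout used in the SleuthKit headers: one byte each for the major, minor and patch number, followed by a final byte that is 0xff for non-beta releases. split_version_num is a hypothetical helper, not a pytsk3 function.

```python
def split_version_num(version_num):
    """Split a TSK_VERSION_NUM-style integer into its four bytes."""
    major = (version_num >> 24) & 0xFF
    minor = (version_num >> 16) & 0xFF
    patch = (version_num >> 8) & 0xFF
    # Assumed to be 0xff on final releases and a counter on betas.
    release = version_num & 0xFF
    return major, minor, patch, release
```

Under that assumption, split_version_num(0x040200ff) gives (4, 2, 0, 255), i.e. SleuthKit 4.2.0.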

Iterating volumes

This is an example of how to replicate the mmls functionality.

import pytsk3

img = pytsk3.Img_Info(url="/path/to/image/file")
volume = pytsk3.Volume_Info(img)

for part in volume:
    print(part.addr, part.desc.decode('utf-8'), part.start, part.len)

Note: at the moment the volume object is an iterator that changes state when accessed.

To retain the state, e.g. in a nested construction, use:

list(volume)

Note that part.desc contains a UTF-8 encoded byte string.
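
Both notes can be handled in one place by copying the partitions into a plain list with the description already decoded. This is a sketch under the assumptions stated on this page (part.addr, part.desc, part.start and part.len exist; desc is UTF-8 encoded bytes); describe_partitions is a hypothetical helper name.

```python
def describe_partitions(volume):
    """Return a list of (addr, description, start, length) tuples.

    `volume` is any iterable of partition objects shaped like those
    yielded by pytsk3.Volume_Info. The result is a plain list, so it
    can be iterated repeatedly or inside nested loops.
    """
    partitions = []
    for part in volume:
        desc = part.desc
        if isinstance(desc, bytes):
            # errors="replace" keeps going on malformed descriptions.
            desc = desc.decode("utf-8", errors="replace")
        partitions.append((part.addr, desc, part.start, part.len))
    return partitions
```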

Accessing a file system

To access a file system via pytsk3 run:

import pytsk3

img = pytsk3.Img_Info(url="/path/to/image/file")
fs = pytsk3.FS_Info(img)

pytsk3.FS_Info expects the file system to start at the beginning of the image object (img). If it does not, for example because the image contains a partition table, pass pytsk3.FS_Info the byte offset at which the volume containing the file system starts, e.g.

fs_offset = 2048 * 512
fs = pytsk3.FS_Info(img, offset=fs_offset)
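
The offset in the example is a start sector (2048) multiplied by a sector size (512 bytes). The sketch below makes that arithmetic explicit. The helper name partition_offset is hypothetical, and the 512-byte default is an assumption that does not hold for every image.

```python
SECTOR_SIZE = 512  # assumption; use the image's real sector size if it differs


def partition_offset(start_sector, sector_size=SECTOR_SIZE):
    """Return the byte offset of a partition that starts at `start_sector`."""
    if start_sector < 0 or sector_size <= 0:
        raise ValueError("start_sector must be >= 0 and sector_size > 0")
    return start_sector * sector_size

# Intended use, with `part` taken from iterating a pytsk3.Volume_Info:
#   fs = pytsk3.FS_Info(img, offset=partition_offset(part.start))
```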

Getting a file entry

The term "file entry" is used to indicate any "entry" the file system can define, e.g. a file, a directory, a symbolic link, etc.

To access a specific file entry within a file system by path:

file_entry = fs.open("/Windows/MyFile.txt")

A file entry can also be accessed based on its "inode":

file_entry = fs.open_meta(inode=15)

Note that the term inode applies to the abstraction the SleuthKit provides, which can also apply to file systems that do not define inodes, e.g. NTFS.

Note that there is a difference between open and open_meta. From http://www.sleuthkit.org/sleuthkit/docs/api-docs/fspage.html:

The tsk_fs_file_open_meta() function takes a metadata address as an argument and returns a TSK_FS_FILE structure. The TSK_FS_FILE::name pointer will be NULL because the file name was not used to open the file and, for efficiency, TSK does not search the directory tree to locate the file name that points to the metadata address.

Determining the file entry type

The file entry type is stored in the attribute:

file_entry.info.meta.type

Note that not every file entry necessarily has the .info or .info.meta property.

This attribute contains a value of pytsk3.TSK_FS_META_TYPE_ENUM. For example, to determine if a file entry is a "regular" file:

if file_entry.info.meta.type == pytsk3.TSK_FS_META_TYPE_REG:
    print("A file")
else:
    print("Not a file")

Determining the file entry address

The file entry address is stored in the attribute:

file_entry.info.meta.addr

Note that not every file entry necessarily has the .info or .info.meta property.

The address is also referred to as the inode by the SleuthKit.
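
Because .info or .info.meta can be missing, reading the type or address directly can raise. The following is a minimal sketch of a defensive accessor. get_meta_value is a hypothetical helper name, and it relies only on the attribute layout described above.

```python
def get_meta_value(file_entry, name, default=None):
    """Return file_entry.info.meta.<name>, or `default` if any part is missing."""
    info = getattr(file_entry, "info", None)
    meta = getattr(info, "meta", None)
    if meta is None:
        return default
    return getattr(meta, name, default)

# Intended use:
#   entry_type = get_meta_value(file_entry, "type")
#   address = get_meta_value(file_entry, "addr")
```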

Determining the file entry timestamps

The timestamp values contain the number of seconds since January 1, 1970 00:00:00. They are often normalized to UTC, but this is not guaranteed. TODO check if the SleuthKit has an option to pass a timezone.

The file entry access time is stored in the attribute:

file_entry.info.meta.atime

The file entry change time (or entry modification time) is stored in the attribute:

file_entry.info.meta.ctime

The file entry modification time is stored in the attribute:

file_entry.info.meta.mtime

The file entry creation time (or birth time) is stored in the attribute:

file_entry.info.meta.crtime

For file systems that provide a finer granularity, a nano-attribute is available, e.g. for the atime:

file_entry.info.meta.atime_nano

Note that in SleuthKit 4.2.0 the granularity of the nano-attribute was changed from 100 nanoseconds to 1 nanosecond.

nano_value = file_entry.info.meta.atime_nano
if pytsk3.TSK_VERSION_NUM < 0x040200ff:
  nano_value *= 100

Note that the SleuthKit can have conversion issues with timestamps outside the range 1970 - 2038 for NTFS.

Also see: https://github.com/sleuthkit/sleuthkit/pull/323
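
To turn these values into something readable, the seconds and the nanosecond part can be combined into a Python datetime. This sketch assumes the timestamp really is in UTC, which, as noted above, is not guaranteed, and that the nanosecond value has already been normalized to 1 nanosecond units. to_datetime is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(seconds, nanoseconds=0):
    """Convert a TSK timestamp to an aware datetime, assuming it is UTC.

    datetime only has microsecond resolution, so the nanosecond part is
    truncated to whole microseconds.
    """
    return EPOCH + timedelta(seconds=seconds, microseconds=nanoseconds // 1000)

# Intended use:
#   to_datetime(file_entry.info.meta.atime, file_entry.info.meta.atime_nano)
```

Adding a timedelta to the epoch, rather than calling datetime.fromtimestamp(), avoids platform limits on the range of values the C library accepts.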

Accessing a directory

for directory_entry in fs.open_dir(path="/Windows"):
    name = directory_entry.info.name.name
    try:
        print(name.decode("utf8"))
    except UnicodeError:
        pass

Note that not every directory entry necessarily has the .info or .info.name property.

Note that not every directory entry is allocated. Unallocated directory entries either have no .info.meta member or the .info.name.flags member has TSK_FS_NAME_FLAG_UNALLOC set.

Note that (as of January 19, 2014) directory_entry.info.name.name contains a UTF-8 encoded byte string.

Note that this will include the directory entries "." (self), ".." (parent) and entries recovered by the SleuthKit.

Note that the SleuthKit will also expose virtual directories like "/$OrphanFiles".
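
The notes above can be combined into a filter that skips ".", ".." and, optionally, unallocated entries. This is a sketch only. iter_entry_names is a hypothetical helper, and unalloc_flag is expected to be pytsk3.TSK_FS_NAME_FLAG_UNALLOC converted with int(); passing it in as a parameter is a choice made here so that the sketch does not depend on pytsk3 being importable.

```python
def iter_entry_names(directory, unalloc_flag, include_unallocated=False):
    """Yield decoded names of directory entries, skipping "." and "..".

    `directory` is any iterable shaped like the result of fs.open_dir().
    """
    for entry in directory:
        info = getattr(entry, "info", None)
        name_struct = getattr(info, "name", None)
        name = getattr(name_struct, "name", None)
        if not name:
            continue  # no TSK_FS_NAME, or an empty name
        if isinstance(name, bytes):
            name = name.decode("utf-8", errors="replace")
        if name in (".", ".."):
            continue
        flags = int(getattr(name_struct, "flags", 0))
        # Unallocated: no metadata at all, or the UNALLOC name flag is set.
        unallocated = (getattr(info, "meta", None) is None
                       or bool(flags & unalloc_flag))
        if unallocated and not include_unallocated:
            continue
        yield name

# Intended use:
#   flag = int(pytsk3.TSK_FS_NAME_FLAG_UNALLOC)
#   for name in iter_entry_names(fs.open_dir(path="/Windows"), flag):
#       print(name)
```

Virtual directories such as "/$OrphanFiles" are not filtered out by this sketch.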

Accessing an alternate data stream (NTFS)

ads_attribute = None
for attribute in file_entry:
  if attribute.info.name == name_ads:
    ads_attribute = attribute
    break

if ads_attribute:
  file_entry.read_random(
      offset, size, ads_attribute.info.type, ads_attribute.info.id)

The underlying behavior of the SleuthKit for file_entry.read_random() seems to be to read the first available $DATA NTFS attribute in case no default (nameless) $DATA NTFS attribute is available e.g. $Extend$UsnJrnl.

Accessing symbolic links

The symbolic link target of a file entry is stored in the attribute:

file_entry.info.meta.link

It appears that the SleuthKit does not expose NTFS IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as a link.

Also see

import pytsk3

help(pytsk3)
help(pytsk3.Img_Info)
help(pytsk3.Volume_Info)
help(pytsk3.FS_Info)
