Development
The first task is to list all the files in a directory and their sizes:
Step 1. get an IMG_INFO object
img = pytsk3.Img_Info(url)
Step 2. Open the file system
fs = pytsk3.FS_Info(img)
Step 3. Open the directory node. This opens the node by path or by inode, whichever is specified.
directory = fs.open_dir(path=path, inode=inode)
Step 4. Iterate over all files in the directory and print their size and name. Each iteration yields a proxy object for the TSK_FS_FILE struct, which you can dereference further into its TSK_FS_NAME and TSK_FS_META structs.
for f in directory:
    print(f.info.meta.size, f.info.name.name)
The specified url can be any URL that TSK understands. Note that TSK automatically recognizes EWF files and regular dd files. See the section below on "Extending Img_Info" to support other image types.
The directory can be opened by either path or inode. If path is None (or unspecified), the inode is used. An inode has to be an integer (all the bound methods implement sanity checking and will raise an exception if you provide arguments of the wrong type).
You can iterate over the directory to receive all the File objects within it. Each File object is just a proxy for the TSK_FS_FILE struct, which can be obtained through the "info" member. Note that the TSK_FS_FILE struct contains links to TSK_FS_META and TSK_FS_NAME structs. We just pick specific members of these structs to print.
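The listing loop above can be wrapped in a small helper that tolerates entries without metadata. This is only a sketch: `iter_name_and_size` is a hypothetical name, and the helper assumes nothing beyond the `info.name.name` and `info.meta.size` members described above.

```python
def iter_name_and_size(directory):
    """Yield (name, size) per entry; size is None when metadata is missing."""
    for entry in directory:
        info = getattr(entry, "info", None)
        name_struct = getattr(info, "name", None)
        if name_struct is None:
            # Without a name struct there is nothing meaningful to report.
            continue
        meta = getattr(info, "meta", None)
        size = meta.size if meta is not None else None
        yield name_struct.name, size
```

With a directory opened as in step 3, `for name, size in iter_name_and_size(directory): print(size, name)` would produce the same listing while skipping entries that cannot be dereferenced.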
Now we want to read a file out and write it to stdout (basically the same as icat).
Step 1: get an IMG_INFO object
img = pytsk3.Img_Info(url)
Step 2: Open the file system
fs = pytsk3.FS_Info(img)
Step 3: Open the file using the inode
f = fs.open_meta(inode = inode)
Step 4: Read all the data and print to stdout
import sys

offset = 0
size = f.info.meta.size
BUFF_SIZE = 1024 * 1024
while offset < size:
    available_to_read = min(BUFF_SIZE, size - offset)
    data = f.read_random(offset, available_to_read)
    if not data:
        break
    offset += len(data)
    # Write the raw bytes; print() would show the bytes repr in Python 3.
    sys.stdout.buffer.write(data)
Note that we go to some length to avoid reading the slack here. This is due to an early bug in TSK, which should be fixed by now.
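The read loop can also be expressed as a generator, which keeps the "never read past the file size" rule in one place. This is a minimal sketch: `iter_file_chunks` is a hypothetical helper, and it relies only on the `read_random(offset, length)` call used above.

```python
def iter_file_chunks(file_object, size, chunk_size=1024 * 1024):
    """Yield successive byte chunks, never reading past `size` bytes."""
    offset = 0
    while offset < size:
        to_read = min(chunk_size, size - offset)
        data = file_object.read_random(offset, to_read)
        if not data:
            # Short read: stop instead of looping forever.
            break
        offset += len(data)
        yield data
```

A caller would pass `f` and `f.info.meta.size` from the steps above and write each chunk to a binary stream.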
We want to list all the blocks that a file allocates (kind of like istat).
Step 1: get an IMG_INFO object (url can be any URL that TSK can handle)
img = pytsk3.Img_Info(url)
Step 2: Open the file system
fs = pytsk3.FS_Info(img)
Step 3: Open the file using the inode
f = fs.open_meta(inode = inode)
Step 4: List all blocks allocated by this file. Note that in some file systems each file has several attributes and each can allocate multiple blocks. So we really need to iterate over all attributes of each file:
for attr in f:
    print("Attribute %s, type %s, id %s" % (attr.info.name,
                                            attr.info.type,
                                            attr.info.id))
    for run in attr:
        print(" Blocks %s to %s (%s blocks)" % (run.addr, run.addr + run.len, run.len))
Example output:
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_SI, id 0
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 3
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 2
Attribute $Data, type TSK_FS_ATTR_TYPE_NTFS_DATA, id 4
Blocks 89471 to 89477 (6 blocks)
Blocks 89487 to 89493 (6 blocks)
Blocks 90023 to 90076 (53 blocks)
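To work with the runs programmatically rather than print them, the same nested iteration can collect them into a structure. The sketch below uses the hypothetical name `summarize_runs` and assumes only the `attr.info.id`, `run.addr` and `run.len` members shown above.

```python
def summarize_runs(file_entry):
    """Return a list of (attribute id, [(start, length), ...], total blocks)."""
    summary = []
    for attr in file_entry:
        runs = [(run.addr, run.len) for run in attr]
        total = sum(length for _, length in runs)
        summary.append((attr.info.id, runs, total))
    return summary
```

Attributes that yield no runs, like the first three in the example output, end up with an empty run list and a total of 0.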
Sometimes we want to use image formats that are not available to TSK natively. We have seen that in order to obtain the FS_Info object, we must supply it with a valid Img_Info object. It is possible to extend TSK's support for different image formats by creating a different Img_Info object that TSK can use when opening a file system on it.
The Python wrappers are fully extensible. For example, the following implements an AFF4 image class:
## This is the AFF4 resolver we will use
oracle = pyaff4.Resolver()

class AFF4ImgInfo(pytsk3.Img_Info):
    def __init__(self, url):
        ## Open the image using the AFF4 library
        urn = pyaff4.RDFURN(url)
        self.fd = oracle.open(urn, 'r')
        if not self.fd:
            raise IOError("Unable to open %s" % url)

        ## Call the base class with an empty URL
        pytsk3.Img_Info.__init__(self, '')

    def get_size(self):
        """ This function returns the size of the image """
        return self.fd.size.value

    def read(self, off, length):
        """ This returns byte ranges from the image """
        self.fd.seek(off)
        return self.fd.read(length)

    def close(self):
        """ This is called when we want to close the image """
        self.fd.close()
Step 1: get an IMG_INFO object (url can be any URL that AFF4 can handle)
img = AFF4ImgInfo(url)
Step 2: Open the file system
fs = pytsk3.FS_Info(img, offset=options.offset)
...
As can be seen, an Img_Info class only needs to define the read and get_size methods to be fully functional. We then instantiate this object and pass it to FS_Info, which automatically uses the Python implementation to access the image.
In this way we can provide the SleuthKit with a virtualized image format, allowing for multiple format support.
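The contract described above (read, get_size and close) can be illustrated without any TSK or AFF4 dependency. The class below is a sketch with the hypothetical name `FileObjectImage`; it wraps any seekable Python file object. To hand it to FS_Info it would additionally have to derive from pytsk3.Img_Info and call the base-class constructor as in the AFF4 example, which is deliberately left out here.

```python
import io
import os


class FileObjectImage(object):
    """Serve image data from a seekable file object (illustration only)."""

    def __init__(self, file_object):
        self._fd = file_object

    def get_size(self):
        """Return the size of the image in bytes."""
        current = self._fd.tell()
        self._fd.seek(0, os.SEEK_END)
        size = self._fd.tell()
        self._fd.seek(current)  # restore the previous position
        return size

    def read(self, off, length):
        """Return up to `length` bytes starting at byte offset `off`."""
        self._fd.seek(off)
        return self._fd.read(length)

    def close(self):
        """Release the underlying file object."""
        self._fd.close()


image = FileObjectImage(io.BytesIO(b"0123456789"))
first_bytes = image.read(0, 4)
```

Keeping the three methods free of format-specific logic makes it straightforward to swap the backing store, for example an in-memory buffer during testing.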
pytsk3 (as of February 17, 2014) comes with 2 different version numbers:
- the SleuthKit version it was built against;
- the version of the pytsk3 code.
To obtain the SleuthKit version:
import pytsk3
print(pytsk3.TSK_VERSION_STR)
To obtain the pytsk3 version:
import pytsk3
print(pytsk3.get_version())
This is an example of how to replicate the mmls functionality.
import pytsk3
img = pytsk3.Img_Info(url="/path/to/image/file")
volume = pytsk3.Volume_Info(img)
for part in volume:
    print(part.addr, part.desc.decode('utf-8'), part.start, part.len)
Note: at the moment the volume object is an iterator that changes state when accessed.
To retain the state, e.g. in a nested construction, use:
list(volume)
Note that part.desc contains a UTF-8 encoded byte string.
To access a file system via pytsk3 run:
import pytsk3
img = pytsk3.Img_Info(url="/path/to/image/file")
fs = pytsk3.FS_Info(img)
When you try to access a file system, make sure that the image object (img) contains a file system at its start. If it does not, you can pass pytsk3.FS_Info the byte offset at which the volume containing the file system starts, e.g.
fs_offset = 2048 * 512
fs = pytsk3.FS_Info(img, offset=fs_offset)
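The byte offset can be derived from the partition listing shown earlier. The helper below is a sketch under two assumptions: `partition_byte_offsets` is a hypothetical name, and `part.start` is taken to count sectors of `sector_size` bytes (512 matches the `2048 * 512` example above, but the real sector size depends on the image).

```python
def partition_byte_offsets(volume, sector_size=512):
    """Return [(addr, description, byte offset)] for every partition."""
    offsets = []
    # Take a snapshot first, since iterating the volume object changes its state.
    for part in list(volume):
        desc = part.desc
        if isinstance(desc, bytes):
            desc = desc.decode("utf-8", errors="replace")
        offsets.append((part.addr, desc, part.start * sector_size))
    return offsets
```

The third element of a returned tuple is the kind of value that would be passed as `offset` to pytsk3.FS_Info.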
The term "file entry" is used to indicate any "entry" the file system can define, e.g. a file, a directory, a symbolic link, etc.
To access a specific file entry within a file system by path:
file_entry = fs.open("/Windows/MyFile.txt")
A file entry can also be accessed based on its "inode":
file_entry = fs.open_meta(inode=15)
Note that the term inode applies to the abstraction the SleuthKit provides, which can also apply to file systems that do not define inodes, e.g. NTFS.
Note that there is a difference between open and open_meta. From http://www.sleuthkit.org/sleuthkit/docs/api-docs/fspage.html:
The tsk_fs_file_open_meta() function takes a metadata address as an argument and returns a TSK_FS_FILE structure. The TSK_FS_FILE::name pointer will be NULL because the file name was not used to open the file and, for efficiency, TSK does not search the directory tree to locate the file name that points to the metadata address.
The file entry type is stored in the attribute:
file_entry.info.meta.type
Note that not every file entry necessarily has the .info or .info.meta property.
This attribute contains a value of pytsk3.TSK_FS_META_TYPE_ENUM. For example, to determine if a file entry is a "regular" file:
if file_entry.info.meta.type == pytsk3.TSK_FS_META_TYPE_REG:
    print("A file")
else:
    print("Not a file")
The file entry address is stored in the attribute:
file_entry.info.meta.addr
Note that not every file entry necessarily has the .info or .info.meta property.
The address is also referred to as the inode by the SleuthKit.
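Because `.info` or `.info.meta` may be missing, attribute access is safer behind a small guard. The helpers below are a sketch with hypothetical names (`get_meta`, `get_address`); they cover both a missing attribute and one that is present but None.

```python
def get_meta(file_entry):
    """Return file_entry.info.meta, or None if either level is missing."""
    info = getattr(file_entry, "info", None)
    return getattr(info, "meta", None)


def get_address(file_entry):
    """Return the metadata address (inode), or None if unavailable."""
    meta = get_meta(file_entry)
    return meta.addr if meta is not None else None
```

The same pattern applies to `meta.type` and the timestamp members.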
The timestamp values contain the number of seconds since January 1, 1970 00:00:00. They are often normalized to UTC, but this is not guaranteed. TODO check if the SleuthKit has an option to pass a timezone.
The file entry access time is stored in the attribute:
file_entry.info.meta.atime
The file entry change time (or entry modification time) is stored in the attribute:
file_entry.info.meta.ctime
The file entry modification time is stored in the attribute:
file_entry.info.meta.mtime
The file entry creation time (or birth time) is stored in the attribute:
file_entry.info.meta.crtime
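These second counts can be turned into Python datetime objects. The sketch below uses a hypothetical helper, `to_datetime`, and assumes the value really is UTC, which is not guaranteed. Adding a timedelta to the epoch avoids platform limits that `datetime.fromtimestamp` can hit for unusual values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(seconds):
    """Convert seconds since January 1, 1970 to a datetime, assuming UTC."""
    return EPOCH + timedelta(seconds=seconds)
```

For example, `to_datetime(file_entry.info.meta.mtime)` would give the modification time as a timezone-aware datetime.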
For file systems that provide a finer granularity, there is a nano-attribute available, e.g. for the atime:
file_entry.info.meta.atime_nano
Note that in SleuthKit 4.2.0 the granularity of the nano-attribute was changed from 100 nanoseconds to 1 nanosecond.
nano_value = file_entry.info.meta.atime_nano
if pytsk3.TSK_VERSION_NUM < 0x040200ff:
    nano_value *= 100
Note that the SleuthKit can have conversion issues with timestamps outside the range 1970 - 2038 for NTFS.
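The version check can be folded into a pure function so that the rest of the code always works in nanoseconds. This is a sketch: the function names are hypothetical, the version number is passed in explicitly (a caller would supply pytsk3.TSK_VERSION_NUM), and the threshold is the one used in the check above.

```python
TSK_4_2_0 = 0x040200ff  # threshold taken from the version check above


def nano_to_nanoseconds(nano_value, tsk_version_num):
    """Normalize a *_nano value to nanoseconds for the given TSK version."""
    if tsk_version_num < TSK_4_2_0:
        # Older versions report the fraction in 100 nanosecond intervals.
        return nano_value * 100
    return nano_value


def timestamp_in_nanoseconds(seconds, nano_value, tsk_version_num):
    """Combine whole seconds and the fractional part into one integer."""
    fraction = nano_to_nanoseconds(nano_value, tsk_version_num)
    return seconds * 1000000000 + fraction
```

Keeping the result as an integer avoids the precision loss a float would introduce at nanosecond resolution.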
Also see: https://github.com/sleuthkit/sleuthkit/pull/323
for directory_entry in fs.open_dir(path="/Windows"):
    directory_entry = directory_entry.info.name.name
    try:
        print(directory_entry.decode("utf8"))
    except UnicodeError:
        pass
Note that not every directory entry necessarily has the .info or .info.name property.
Note that not every directory entry is allocated. Unallocated directory entries either have no .info.meta member or the .info.name.flags member has TSK_FS_NAME_FLAG_UNALLOC set.
Note that (as of January 19, 2014) directory_entry.info.name.name contains a UTF-8 encoded byte string.
Note that this will include the directory entries "." (self), ".." (parent) and entries recovered by the SleuthKit.
Note that the SleuthKit will also expose virtual directories like "/$OrphanFiles"
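The notes above can be combined into one filtering generator. This is a sketch under stated assumptions: `iter_entry_names` is a hypothetical helper, the caller passes the flag constant (for example pytsk3.TSK_FS_NAME_FLAG_UNALLOC), and both the flag and `info.name.flags` are assumed to convert cleanly with `int()`.

```python
def iter_entry_names(directory, unalloc_flag, include_unallocated=False):
    """Yield decoded entry names, skipping '.', '..' and unusable entries."""
    for entry in directory:
        info = getattr(entry, "info", None)
        name_struct = getattr(info, "name", None)
        if name_struct is None:
            continue
        name = name_struct.name
        if isinstance(name, bytes):
            name = name.decode("utf-8", errors="replace")
        if name in (".", ".."):
            continue
        # Unallocated: no metadata, or the name flags carry the UNALLOC bit.
        flags = int(getattr(name_struct, "flags", 0))
        unallocated = (getattr(info, "meta", None) is None
                       or bool(flags & int(unalloc_flag)))
        if unallocated and not include_unallocated:
            continue
        yield name
```

Virtual directories such as "/$OrphanFiles" are not filtered out by this sketch; a caller that wants to skip them would need an additional name check.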
ads_attribute = None
for attribute in file_entry:
    if attribute.info.name == name_ads:
        ads_attribute = attribute
        break
if ads_attribute:
    file_entry.read_random(
        offset, size, ads_attribute.info.type, ads_attribute.info.id)
The underlying behavior of the SleuthKit for file_entry.read_random() seems to be to read the first available $DATA NTFS attribute when no default (nameless) $DATA NTFS attribute is available, e.g. for $Extend$UsnJrnl.
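Since the fallback behavior is implicit, selecting the attribute explicitly makes the read unambiguous. The helpers below are a sketch with hypothetical names. They assume that `attribute.info.name` may be a byte string, a text string or None depending on the Python and pytsk3 versions, and they use the four-argument `read_random` call shown above.

```python
def find_attribute(file_entry, name):
    """Return the first attribute whose name matches `name`, else None."""
    wanted = name.encode("utf-8") if isinstance(name, str) else name
    for attribute in file_entry:
        attr_name = attribute.info.name
        if isinstance(attr_name, str):
            attr_name = attr_name.encode("utf-8")
        if attr_name == wanted:
            return attribute
    return None


def read_attribute(file_entry, attribute, offset, size):
    """Read `size` bytes at `offset` from one specific attribute."""
    return file_entry.read_random(
        offset, size, attribute.info.type, attribute.info.id)
```

A caller would check the result of `find_attribute` for None before reading, mirroring the `if ads_attribute:` guard above.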
The file entry symbolic link is stored in the attribute:
file_entry.info.meta.link
It appears that the SleuthKit does not expose NTFS IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as a link.
import pytsk3
help(pytsk3)
help(pytsk3.Img_Info)
help(pytsk3.Volume_Info)
help(pytsk3.FS_Info)