PDB support #17
Comments
I need PDB support. |
Would the |
I will look into it. |
Waiting for getsentry/pdb#27 |
Has anything improved on this front? |
@novacrazy I still have no idea how PDB works and I didn't have time to investigate it. Patches are welcome. |
is your pdb dumper from llvm ? |
@serak probably. I'm simply running |
Might be a bit offtopic, but is there a reason for not supporting Windows even if I want to analyze an ELF binary? I wanted to use it in an embedded project that cross-compiles a binary for ARM, but it seems like a no-go. |
@albru123 This can be implemented. |
@albru123 Done. |
MinGW target is supported now. MSVC is still not possible, because PDB isn't documented and I have no idea how to retrieve the symbol sizes. |
I believe the officially supported way to examine PDB contents is through DIA: https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/debug-interface-access-sdk |
@smmalis37 The problem is not to parse/query PDB, but to figure out what symbols are actually required and how to calculate their sizes. Basically, I have to figure out how to split the |
I just kind of stumbled on this ticket, but have a lot of experience with PE files and debugging APIs in Windows. The DIA Symbol interface should have the information you would need, unless I'm misunderstanding this ticket. For example, here is the COM interface for a symbol in DIA. It has a lot of information about the symbol, including the virtual address offset, as well as the length in bits/bytes. You can even traverse a symbol and visit its children, which might be useful for your scenario, if I understand it correctly. Get-VA method (To find the symbol in the .text section of the binary). Get length of symbol in bits/bytes |
@johnhe-indeed Hi! If I understand correctly, this is a C++ API to work with PDB files. So to use it in this project I have to write bindings first, which complicates things a bit. And I'm not sure if you can build it directly, without installing the SDK. I will try to write a demo app in C++ first, but I prefer a pure Rust implementation. |
Hey @RazrFalcon, it's actually a COM api. It can be called from C or C++ and it wouldn't require installing anything afaik. The objects should be present on the default install of windows. The winapi rust crate does provide wrappers to make calling COM objects easier, so that may be useful and would avoid having to write bindings to C/C++ directly. Additionally, the WinAPI crate also has bindings to the dbghelp.dll, which could be an alternative approach. The COM APIs just use this DLL under the hood, iirc. I may take a stab at this, if you think it would be useful for your project (Or I may just contribute to the PDB Crate, depending on what makes more sense after I conduct more research). I'm interested in learning how to call COM/WinAPIs from rust for another project and this would be good practice. |
@johnhe-indeed I would not mind some help. I have a prototype that uses the The task itself is pretty straightforward: split the
|
Okay, I spent a few hours this morning playing with this. I ended up just using the PDB rust library, mentioned earlier in this thread. It should meet all your requirements and is cross-platform. I used the latest main branch, but this should work using 0.5.0, with slightly different type names. The sample, pdb_symbols has all the parts you really need to make this work. Couple of notes, you'll want to walk the private modules dictionary to get the information you're after with the procedure symbols. In the sample, he walks them using this snippet:
walk_symbols just dumps the symbols from the modules. For procedures it's pretty straightforward to get the size and the offset from the section. Note that the symbol may be in a different section, so you will need to check that the section number on the symbol points to .text. I don't know where that map is off the top of my head, but this library seems to have all the pieces. In most cases symbols will be in .text, but I think exception handlers are put in .rdata. To get the offset of the various sections such as .text in the actual .exe binary, I saw a PE header def in the library. The actual bytes for the symbol will be at the section offset + the symbol's offset within the section. See below:
I did a bunch of checks as well to validate the offsets and lengths using other tools. To validate that the length was correct, I just loaded the binary in windbg with the symbols, did an x /f <program_name>!, which prints the same length in hex and matches the symbols dumped. To validate the offset, I opened the .exe in IDA Pro, navigated to the function's bytes, and pulled enough of the function bytes to create a unique signature. I then went into 010 Hex editor, searched for the function bytes, and got the raw offset inside the file. I also ran the PE header template to get the .text offset. Then it was just a matter of adding the offset generated by the PDB rust library to the .text section offset, and the result matched the function bytes at the expected offset. Hope this helps! |
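The offset arithmetic described in this comment can be sketched in Rust. This is only a minimal sketch under assumptions: `SectionHeader` and `file_offset` are hypothetical names, not types from the pdb crate or binfarce, the two fields mirror the PE header's `PointerToRawData` and `SizeOfRawData`, and section numbers are assumed to be 1-based, as PDB section indices usually are.

```rust
/// Hypothetical, simplified view of one PE section header.
#[derive(Debug, Clone, Copy)]
struct SectionHeader {
    /// File offset of the section's first byte (PE `PointerToRawData`).
    pointer_to_raw_data: u32,
    /// Number of bytes the section occupies in the file (PE `SizeOfRawData`).
    size_of_raw_data: u32,
}

/// Map a (section number, offset within section) pair to an offset in the
/// .exe file. Section numbers are assumed to be 1-based; returns `None` if
/// the section does not exist or the offset lies outside its raw data.
fn file_offset(sections: &[SectionHeader], section: u16, offset: u32) -> Option<u64> {
    let index = usize::from(section).checked_sub(1)?;
    let header = sections.get(index)?;
    if offset >= header.size_of_raw_data {
        return None;
    }
    Some(u64::from(header.pointer_to_raw_data) + u64::from(offset))
}

fn main() {
    // Made-up section layout, purely for illustration.
    let sections = [
        SectionHeader { pointer_to_raw_data: 0x400, size_of_raw_data: 0x1000 },
        SectionHeader { pointer_to_raw_data: 0x1400, size_of_raw_data: 0x200 },
    ];
    println!("{:?}", file_offset(&sections, 1, 0x10));
}
```

Checking the section number first is what keeps symbols from other sections (such as .rdata) from being counted against .text.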
Yeah, this is basically how it works right now. I do filter by section, etc. The hardest part is to actually write and debug it. Also, PDB stores demangled function names, which is problematic, because I have to parse them again and they can have different versions (legacy or v0). I'll push a PDB branch later, so you can play with actual code. PS: I don't need the absolute offset. |
@RazrFalcon in the case of an unsupported format, could this be checked prior to running |
I don't think so. I don't know what cargo would be building beforehand. Unless we can somehow check which target will be used. |
@johnhe-indeed Any update on this? :) |
Any updates? |
Unless someone has time for a pull request, this will stay the same. I simply don't have time to work on it. |
Okay, what would have to be implemented? I'll be more than happy to integrate this. |
This was already discussed in this issue, see: #17 (comment) The task is "simple": get a list of all functions and their sizes from the binary. |
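To make the "list of functions and their sizes" task concrete, here is a rough Rust sketch of one common fallback for when a symbol source records only start addresses: treat each function as ending where the next one begins. This is an assumption for illustration, not how cargo-bloat is known to handle PDB. `sizes_from_starts` is a hypothetical helper, and padding between functions gets attributed to the preceding function.

```rust
/// Infer symbol sizes from start addresses alone by assuming each symbol
/// ends where the next one starts. The last symbol ends at `section_end`.
/// Symbols sharing a start address are collapsed to the first one seen.
fn sizes_from_starts(mut starts: Vec<(u64, String)>, section_end: u64) -> Vec<(String, u64)> {
    // Stable sort, so symbols with equal addresses keep their input order.
    starts.sort_by_key(|(addr, _)| *addr);
    starts.dedup_by_key(|(addr, _)| *addr);

    let mut out = Vec::with_capacity(starts.len());
    for i in 0..starts.len() {
        // End of this symbol: the next start, or the end of the section.
        let end = starts.get(i + 1).map_or(section_end, |(next, _)| *next);
        out.push((starts[i].1.clone(), end.saturating_sub(starts[i].0)));
    }
    out
}

fn main() {
    // Made-up addresses, purely for illustration.
    let starts = vec![
        (0x20, "b".to_string()),
        (0x00, "a".to_string()),
        (0x50, "c".to_string()),
    ];
    for (name, size) in sizes_from_starts(starts, 0x80) {
        println!("{name}: {size} bytes");
    }
}
```

When the debug info carries an explicit length, as the procedure records discussed in this thread do, that length should be preferred over this kind of inference.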
Cool! I'll use the PDB crate. Hope I can get this working! 🎉 |
Hi @Milo123459 , any update on your progress using the |
I tried. It's hard |
Hi, I wrote a test program to generate (length, demangled_name, Option(mangled_name)) tuples using the pdb crate. I also tried the I cannot find mangled names for all symbols, since to find them I'm trying to find a matching PublicSymbol record with the mangled name, and it seems rustc is not generating those for all functions. Here's the output for cargo-bloat itself compiled in debug mode to get an idea of the ratios:
And in release mode:
code:
use pdb::FallibleIterator;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = std::path::Path::new("D:\\your\\path\\to\\pdb\\folder");
    let file_name = "cargo-bloat";
    let exe_path = dir.join(file_name).with_extension("exe");
    let exe_size = std::fs::metadata(&exe_path)?.len();
    let (_, text_size) = binfarce::pe::parse(&std::fs::read(&exe_path).unwrap())?.symbols()?;
    let pdb_path = dir.join(file_name.replace("-", "_")).with_extension("pdb");
    let file = std::fs::File::open(&pdb_path)?;
    let mut pdb = pdb::PDB::open(file)?;
    let dbi = pdb.debug_information()?;
    let symbol_table = pdb.global_symbols()?;
    let mut total_parsed_size = 0usize;
    let mut demangled_total_parsed_size = 0usize;
    let mut out_symbols = vec![];

    // Collect the PublicSymbols
    let mut public_symbols = vec![];
    let mut symbols = symbol_table.iter();
    while let Ok(Some(symbol)) = symbols.next() {
        match symbol.parse() {
            Ok(pdb::SymbolData::Public(data)) => {
                if data.code || data.function {
                    public_symbols.push((data.offset, data.name.to_string().into_owned()));
                }
                if data.name.to_string().contains("try_small_punycode_decode") {
                    dbg!(&data);
                }
            }
            _ => {}
        }
    }
    let mut modules = dbi.modules()?;
    while let Some(module) = modules.next()? {
        let info = match pdb.module_info(&module)? {
            Some(info) => info,
            None => continue,
        };
        let mut symbols = info.symbols()?;
        while let Some(symbol) = symbols.next()? {
            if let Ok(pdb::SymbolData::Public(data)) = symbol.parse() {
                if data.code || data.function {
                    public_symbols.push((data.offset, data.name.to_string().into_owned()));
                }
                if data.name.to_string().contains("try_small_punycode_decode") {
                    dbg!(&data);
                }
            }
        }
    }
    let cmp_offsets = |a: &pdb::PdbInternalSectionOffset, b: &pdb::PdbInternalSectionOffset| {
        a.section.cmp(&b.section).then(a.offset.cmp(&b.offset))
    };
    public_symbols.sort_unstable_by(|a, b| cmp_offsets(&a.0, &b.0));

    // Now find the Procedure symbols in all modules
    // and if possible the matching PublicSymbol record with the mangled name
    let mut handle_proc = |proc: pdb::ProcedureSymbol| {
        let mangled_symbol = public_symbols
            .binary_search_by(|probe| {
                let low = cmp_offsets(&probe.0, &proc.offset);
                let high = cmp_offsets(&probe.0, &(proc.offset + proc.len));
                use std::cmp::Ordering::*;
                match (low, high) {
                    // Less than the low bound -> less
                    (Less, _) => Less,
                    // More than the high bound -> greater
                    (_, Greater) => Greater,
                    _ => Equal,
                }
            })
            .ok()
            .map(|x| &public_symbols[x]);
        // Uncomment to verify binary search isn't screwing up anything
        /*
        let mangled_symbol = public_symbols
            .iter()
            .filter(|probe| probe.0 >= proc.offset && probe.0 <= (proc.offset + proc.len))
            .take(1)
            .next();
        */
        let demangled_name = proc.name.to_string().into_owned();
        out_symbols.push((proc.len as usize, demangled_name, mangled_symbol));
        total_parsed_size += proc.len as usize;
        if mangled_symbol.is_some() {
            demangled_total_parsed_size += proc.len as usize;
        }
    };
    let mut symbols = symbol_table.iter();
    while let Ok(Some(symbol)) = symbols.next() {
        if let Ok(pdb::SymbolData::Procedure(proc)) = symbol.parse() {
            handle_proc(proc);
        }
    }
    let mut modules = dbi.modules()?;
    while let Some(module) = modules.next()? {
        let info = match pdb.module_info(&module)? {
            Some(info) => info,
            None => continue,
        };
        let mut symbols = info.symbols()?;
        while let Some(symbol) = symbols.next()? {
            if let Ok(pdb::SymbolData::Procedure(proc)) = symbol.parse() {
                handle_proc(proc);
            }
        }
    }
    println!(
        "exe size:{}\ntext size:{}\nsize of fns found: {}\nratio:{}\nsize of fns with mangles found: {}\nratio:{}",
        exe_size,
        text_size,
        total_parsed_size,
        total_parsed_size as f32 / text_size as f32,
        demangled_total_parsed_size,
        demangled_total_parsed_size as f32 / text_size as f32
    );
    /*
    // Test for the symbolic crate. It turns out it gives all the function names unmangled,
    // which is the opposite of what we need
    use symbolic::common::{ByteView, Language, Name, NameMangling};
    use symbolic::debuginfo::{Function, Object};
    use symbolic::demangle::{Demangle, DemangleOptions};
    let view = ByteView::open(&pdb_path).expect("failed to open file");
    let object = Object::parse(&view).expect("failed to parse file");
    let session = object.debug_session().expect("failed to process file");
    for function in session.functions() {
        if let Ok(function) = function {
            if function.name.mangling() == symbolic::common::NameMangling::Unmangled {
                dbg!(function);
            }
        }
    }
    */
    Ok(())
}
I might try to add this to Related: getsentry/pdb#107 |
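The range lookup used in `handle_proc` above can be isolated into a small, dependency-free Rust sketch. Assumptions: `Loc` and `find_in_range` are hypothetical stand-ins (a plain `(section, offset)` tuple instead of `pdb::PdbInternalSectionOffset`), and the range here is half-open, `[start, start + len)`. That differs from the posted code, which also accepts a symbol sitting exactly at `start + len`, a position that would normally belong to the next function.

```rust
use std::cmp::Ordering;

/// (section, offset) pair, standing in for a PDB section offset.
/// Tuples compare lexicographically: by section first, then by offset.
type Loc = (u16, u32);

/// Find a symbol located in the half-open range [start, start + len) of the
/// same section. `symbols` must already be sorted by location.
fn find_in_range(symbols: &[(Loc, String)], start: Loc, len: u32) -> Option<&(Loc, String)> {
    let end = (start.0, start.1.saturating_add(len));
    symbols
        .binary_search_by(|(loc, _)| {
            if *loc < start {
                // Probe lies before the range: search to the right.
                Ordering::Less
            } else if *loc >= end {
                // Probe lies at or past the end of the range: search to the left.
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| &symbols[i])
}

fn main() {
    // Made-up symbol table, purely for illustration.
    let symbols = vec![
        ((1, 0x00), "a".to_string()),
        ((1, 0x40), "b".to_string()),
        ((2, 0x00), "c".to_string()),
    ];
    println!("{:?}", find_in_range(&symbols, (1, 0x40), 0x10));
    println!("{:?}", find_in_range(&symbols, (1, 0x10), 0x30));
}
```

If several symbols fall inside the range, `binary_search_by` may return any one of them, so this only answers "is there a matching record", which is what the test program needs.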
Is there a way to run cargo-bloat while developing? I'm getting "Error: can be run only via |
I just commented out the error lines. I got cargo-bloat to output this on itself:
D:\dev\cargo-bloat [master ≡ +0 ~3 -0 !]> cargo run ; cargo run --release -- --release
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target\debug\cargo-bloat.exe`
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
[src\main.rs:912] &pdb_path = "D:\\dev\\cargo-bloat\\target\\debug\\cargo_bloat.pdb"
Analyzing target\debug\cargo-bloat.exe
 File  .text     Size Crate        Name
 1.1%   1.4%  28.1KiB json         json::parser::Parser::parse
 0.5%   0.7%  13.2KiB json         json::util::print_dec::write
 0.5%   0.6%  11.7KiB json         json::codegen::Generator::write_json
 0.3%   0.4%   8.0KiB pdb          <pdb::symbol::SymbolData as scroll::ctx::TryFromCtx>::try_from_ctx
 0.3%   0.4%   7.0KiB regex_syntax <regex_syntax::hir::translate::TranslatorI as regex_syntax::ast::visitor::Visitor>::visit_class_set_item_post
 0.3%   0.3%   6.6KiB regex_syntax <regex_syntax::hir::translate::TranslatorI as regex_syntax::ast::visitor::Visitor>::visit_post
 0.2%   0.3%   5.3KiB regex        regex::exec::ExecBuilder::build
 0.2%   0.3%   5.3KiB std          core::num::flt2dec::strategy::dragon::format_shortest
 0.2%   0.2%   4.7KiB regex_syntax alloc::str::join_generic_copy
 0.2%   0.2%   4.5KiB binfarce     <binfarce::demangle::legacy::Demangle as core::fmt::Display>::fmt
 0.2%   0.2%   4.4KiB binfarce     binfarce::pe::Pe::symbols
 0.2%   0.2%   4.4KiB std          core::num::flt2dec::strategy::dragon::format_exact
 0.2%   0.2%   4.0KiB json         json::object::Object::insert_index
 0.1%   0.2%   3.7KiB pdb          <pdb::common::Error as core::fmt::Debug>::fmt
 0.1%   0.2%   3.5KiB regex_syntax <regex_syntax::error::Formatter<E> as core::fmt::Display>::fmt
 0.1%   0.2%   3.5KiB regex_syntax <regex_syntax::error::Formatter<E> as core::fmt::Display>::fmt
 0.1%   0.2%   3.5KiB binfarce     binfarce::elf64::Elf64::symbols
 0.1%   0.2%   3.4KiB binfarce     binfarce::macho::Macho::find_section
 0.1%   0.2%   3.3KiB std          rustc_demangle::demangle
 0.1%   0.2%   3.2KiB cargo_bloat? <cargo_bloat::table::Table as core::fmt::Display>::fmt
47.3%  59.3%   1.1MiB              And 8776 smaller methods. Use -n N to show more.
79.6% 100.0%   1.9MiB              .text section size, the file size is 2.4MiB
Finished release [optimized] target(s) in 0.02s
Running `target\release\cargo-bloat.exe --release`
Finished release [optimized] target(s) in 0.02s
[src\main.rs:912] &pdb_path = "D:\\dev\\cargo-bloat\\target\\release\\cargo_bloat.pdb"
Analyzing target\release\cargo-bloat.exe
 File  .text     Size Crate Name
 0.6%   0.7%   5.3KiB std   core::num::flt2dec::strategy::dragon::format_shortest
 0.5%   0.6%   4.4KiB std   core::num::flt2dec::strategy::dragon::format_exact
 0.3%   0.4%   3.3KiB std   rustc_demangle::demangle
 0.3%   0.4%   3.1KiB std   <rustc_demangle::legacy::Demangle as core::fmt::Display>::fmt
 0.3%   0.4%   3.0KiB std   std::process::Child::wait_with_output
 0.3%   0.3%   2.5KiB std   <std::env::Args as core::iter::traits::iterator::Iterator>::next
 0.2%   0.3%   2.3KiB std   core::num::flt2dec::strategy::grisu::format_shortest_opt
 0.2%   0.2%   1.5KiB std   <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
 0.1%   0.2%   1.4KiB std   core::str::pattern::StrSearcher::new
 0.1%   0.2%   1.3KiB std   core::fmt::Formatter::pad_integral
 0.1%   0.2%   1.2KiB std   core::num::flt2dec::strategy::grisu::format_exact_opt
 0.1%   0.2%   1.2KiB std   std::path::Path::components
 0.1%   0.2%   1.2KiB std   <rustc_demangle::v0::Ident as core::fmt::Display>::fmt
 0.1%   0.2%   1.2KiB std   core::fmt::Formatter::pad
 0.1%   0.2%   1.2KiB std   <str as core::fmt::Debug>::fmt
 0.1%   0.2%   1.2KiB std   core::str::slice_error_fail
 0.1%   0.1%   1.1KiB std   core::num::bignum::Big32x40::mul_pow2
 0.1%   0.1%   1.1KiB std   <std::path::Components as core::iter::traits::iterator::Iterator>::next
 0.1%   0.1%   1.0KiB std   <std::io::error::Error as core::fmt::Display>::fmt
 0.1%   0.1%   1.0KiB std   <core::str::lossy::Utf8LossyChunksIter as core::iter::traits::iterator::Iterator>::next
 5.3%   6.8%  51.0KiB       And 260 smaller methods. Use -n N to show more.
78.3% 100.0% 748.0KiB       .text section size, the file size is 955.5KiB |
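For anyone reading the tables above, the "File" and ".text" columns appear to be each function's size as a share of the whole executable and of the .text section respectively. A minimal Rust sketch of that arithmetic follows. The helper name `percent` and the byte counts are made up for illustration and are not taken from cargo-bloat's source or from the output above.

```rust
/// Share of `part` in `whole`, as a percentage. Returns 0.0 for an empty
/// whole so that a missing section does not produce NaN or infinity.
fn percent(part: u64, whole: u64) -> f64 {
    if whole == 0 {
        0.0
    } else {
        part as f64 / whole as f64 * 100.0
    }
}

fn main() {
    // Illustrative sizes in bytes (hypothetical, not measured).
    let function_size = 1024u64;
    let text_size = 4096u64;
    let file_size = 8192u64;

    println!(
        "{:>5.1}% {:>5.1}% {}B",
        percent(function_size, file_size),
        percent(function_size, text_size),
        function_size
    );
}
```

Because the file is always at least as large as its .text section, the "File" percentage for a function can never exceed its ".text" percentage, which matches every row in both tables.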
Sent a pull request, here's the current output on cargo-bloat itself:
|
It appears like goblin has support for PE files, so could windows support be added?