Comments (7)
Here are the test binary and my libc: test-binaries.zip. I get the same behavior with the Rust version.
from blazesym.
Should be fixed now @TimPushkin. Let me know if you see any more issues. Thanks again for the report!
from blazesym.
So @osandov helped me better understand the original issue (thanks!). Basically, the library is stripped and the code in question (an address of which is contained in the backtrace) belongs to such a stripped symbol. __libc_init_first just happens to be a dynsym and, thus, cannot be stripped.
As a result, strictly speaking, not symbolizing is correct: there is no matching symbol. However, we do report the offset from the start of the symbol, so it's not truly a misreporting (but still somewhat arbitrary) -- and in any case most of the symbolization is best effort. I also vaguely recall having seen this behavior elsewhere in the wild (though I can't put my finger on where exactly).
Overall, I am still leaning towards reporting the symbol as we currently do (in part because, as elf(5) states, sizes may not be exact (though only when zero)), but perhaps we can improve the documentation to point out this possibility better, and we could also consider exposing the symbol's size. With size and offset available, users could clearly tell whether it was a perfect match or not.
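To illustrate how a caller could use size and offset together, here is a minimal sketch. It assumes a symbol size were exposed next to the offset, which is only a proposal in this comment; `MatchQuality` and `match_quality` are hypothetical names for illustration and not part of blazesym.

```rust
/// How well a reported symbol covers the symbolized address.
/// Hypothetical type for illustration; not a blazesym API.
#[derive(Debug, PartialEq, Eq)]
pub enum MatchQuality {
    /// The address lies inside the symbol's recorded extent.
    Exact,
    /// The symbol records no size, so its extent is unknown.
    UnknownSize,
    /// The address lies beyond the recorded size (best-effort report).
    BestEffort,
}

/// Classify a result from `offset` (distance of the address from the
/// symbol start) and `size` (the symbol's `st_size`, assumed to be exposed).
pub fn match_quality(offset: u64, size: u64) -> MatchQuality {
    if size == 0 {
        // A zero-sized symbol only matches its own start address exactly.
        if offset == 0 {
            MatchQuality::Exact
        } else {
            MatchQuality::UnknownSize
        }
    } else if offset < size {
        MatchQuality::Exact
    } else {
        MatchQuality::BestEffort
    }
}
```

For the case in this issue, the offset is 0x90 and the size is 5, so a caller would see the result classified as best effort rather than exact.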
from blazesym.
Thanks for the detailed report! Will take a look next week.
from blazesym.
So I looked at it a bit and unfortunately cannot seem to reproduce. I did come up with an exhaustive symbol lookup test, and that suggests that the core algorithm works -- at least for the test binary I tested on. Would you be able to provide:
- the generated test binary
- a copy of your glibc (which should contain the __libc_init_first symbol)?
I'll do some more digging, but that will hopefully help me better understand what could be going on.
from blazesym.
Also, please use the Rust version of profile if you can, as it has better error reporting.
from blazesym.
Thanks!
Okay, this is an interesting one. With 1b68789 we honor the size of the ELF symbol to decide whether or not it matches. Back then I believed -- and I still do -- that it was wrong not to do so (which was what the original algorithm did).
In this instance, the size of __libc_init_first is five bytes. That's what blazesym infers and it's also what readelf reports:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   [...]
   660: 0000000000029d00     5 FUNC    GLOBAL DEFAULT   15 __libc_init_first@@GLIBC_2.2.5
   [...]
If we look at the successful symbolization you pasted, we see:
1 [<00007f3953829d90>] __libc_init_first+0x7f3953800090
Here, the normalized address (or file address) is:
>>> hex(0x00007f3953829d90 - 0x7f3953800090)
'0x29d00'
(that matches the Value column entry, as it should)
But, we have to keep in mind that the instruction in the backtrace is at some offset into the function. In our case this offset is 0x90 (the 0x7f3953800000 part is from relocation, randomization, and whatever else is happening inside the process, and can just be masked out). 0x90 is clearly larger than 5. So that's why we don't symbolize: there is no matching symbol based on the ELF information.
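The arithmetic above can be sketched in a few lines. This is a standalone illustration, not blazesym code: `file_addr` and `sym_contains` are hypothetical helpers, the load bias is simply the 0x7f3953800000 part mentioned above, and the containment check mirrors the size-aware condition discussed in this thread.

```rust
/// Convert a runtime address into a file address by subtracting the load
/// bias. Hypothetical helper; the bias is assumed to be known already.
pub fn file_addr(runtime_addr: u64, load_bias: u64) -> u64 {
    runtime_addr - load_bias
}

/// Size-aware match: a zero-sized symbol matches only its exact start
/// address; otherwise the address must lie in [st_value, st_value + st_size).
pub fn sym_contains(st_value: u64, st_size: u64, addr: u64) -> bool {
    if st_size == 0 {
        st_value == addr
    } else {
        // Written as a subtraction so that the check cannot overflow.
        addr >= st_value && addr - st_value < st_size
    }
}
```

With the values from this issue (st_value 0x29d00, st_size 5), the file address of the recorded frame is 0x29d90, which lies 0x90 bytes past the symbol start, so `sym_contains` rejects it while still accepting 0x29d00 itself.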
We can reproduce with:
#[test]
fn lookup_libc_first() {
    let bin_name = Path::new("/tmp/test/libc.so.6");
    let parser = ElfParser::open(bin_name.as_ref()).unwrap();

    // The symbol's start address resolves as expected.
    let addr = 0x29d00;
    let (found_sym, _found_addr) = parser.find_sym(addr, STT_FUNC).unwrap().unwrap();
    assert_eq!(found_sym, "__libc_init_first");

    // An address 0x90 bytes past the start lies outside the 5 byte symbol.
    let addr = 0x29d90;
    let (found_sym, _found_addr) = parser.find_sym(addr, STT_FUNC).unwrap().unwrap(); // <- panics
    assert_eq!(found_sym, "__libc_init_first");
}
If we apply:
--- src/elf/parser.rs
+++ src/elf/parser.rs
@@ -427,9 +427,7 @@ impl ElfParser {
                     }
                     let addr = addr as u64;
-                    if (sym.st_size == 0 && sym.st_value == addr)
-                        || (sym.st_size != 0
-                            && (sym.st_value..sym.st_value + sym.st_size).contains(&addr))
+                    if true
                     {
                         let name = match symbol_name(strtab, sym) {
                             Ok(name) => name,
Then the test passes.
Interestingly, if we look at the objdump output, the function does seem to be larger:
0000000000029d00 <__libc_init_first@@GLIBC_2.2.5>:
29d00: f3 0f 1e fa endbr64
29d04: c3 ret
29d05: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
29d0c: 00 00 00
29d0f: 90 nop
29d10: 50 push %rax
29d11: 58 pop %rax
29d12: 48 81 ec 98 00 00 00 sub $0x98,%rsp
29d19: 48 89 7c 24 08 mov %rdi,0x8(%rsp)
29d1e: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
29d23: 89 74 24 14 mov %esi,0x14(%rsp)
29d27: 48 89 54 24 18 mov %rdx,0x18(%rsp)
29d2c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
29d33: 00 00
29d35: 48 89 84 24 88 00 00 mov %rax,0x88(%rsp)
29d3c: 00
29d3d: 31 c0 xor %eax,%eax
29d3f: e8 9c 84 01 00 call 421e0 <_setjmp@@GLIBC_2.2.5>
29d44: f3 0f 1e fa endbr64
29d48: 85 c0 test %eax,%eax
29d4a: 75 4b jne 29d97 <__libc_init_first@@GLIBC_2.2.5+0x97>
29d4c: 64 48 8b 04 25 00 03 mov %fs:0x300,%rax
29d53: 00 00
29d55: 48 89 44 24 68 mov %rax,0x68(%rsp)
29d5a: 64 48 8b 04 25 f8 02 mov %fs:0x2f8,%rax
29d61: 00 00
29d63: 48 89 44 24 70 mov %rax,0x70(%rsp)
29d68: 48 8d 44 24 20 lea 0x20(%rsp),%rax
29d6d: 64 48 89 04 25 00 03 mov %rax,%fs:0x300
29d74: 00 00
29d76: 48 8b 05 3b f2 1e 00 mov 0x1ef23b(%rip),%rax # 218fb8 <__environ@@GLIBC_2.2.5-0x8248>
29d7d: 8b 7c 24 14 mov 0x14(%rsp),%edi
29d81: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
29d86: 48 8b 10 mov (%rax),%rdx
29d89: 48 8b 44 24 08 mov 0x8(%rsp),%rax
29d8e: ff d0 call *%rax <-------------- this is the frame being recorded
29d90: 89 c7 mov %eax,%edi
29d92: e8 59 b8 01 00 call 455f0 <exit@@GLIBC_2.2.5>
29d97: e8 d4 78 06 00 call 91670 <__pthread_get_minstack@@GLIBC_PRIVATE+0x40>
29d9c: f0 ff 0d 05 f5 1e 00 lock decl 0x1ef505(%rip) # 2192a8 <__nptl_nthreads@@GLIBC_PRIVATE>
29da3: 0f 94 c0 sete %al
29da6: 84 c0 test %al,%al
29da8: 75 0e jne 29db8 <__libc_init_first@@GLIBC_2.2.5+0xb8>
29daa: ba 3c 00 00 00 mov $0x3c,%edx
29daf: 90 nop
29db0: 31 ff xor %edi,%edi
29db2: 89 d0 mov %edx,%eax
29db4: 0f 05 syscall
29db6: eb f8 jmp 29db0 <__libc_init_first@@GLIBC_2.2.5+0xb0>
29db8: 31 ff xor %edi,%edi
29dba: eb d6 jmp 29d92 <__libc_init_first@@GLIBC_2.2.5+0x92>
29dbc: 0f 1f 40 00 nopl 0x0(%rax)
I haven't found anything in elf(5) that could explain what is going on. So based on that I am inclined to say that the ELF symbol's st_size is set to the wrong value. But... it's conceivable that 5 is the correct value at compile time. After all, if we look at address 29d0c in the above dump it seems kind of bogus: a bunch of null bytes. So perhaps there is some patching happening at runtime. I briefly looked at https://github.com/bminor/glibc/blob/master/csu/init-first.c and didn't see anything standing out, but that doesn't rule anything out.
Why are we seeing objdump report more stuff? Well, it may not care about ELF symbol size: it may just disassemble starting at the section boundary and keep going, labeling addresses with a name wherever they match an ELF symbol.
But back to blazesym: ELF generation bug or not, we may want to err on the side of reporting more, especially in cases where, according to ELF, an address does not map to any function but lies between two symbols in the file. So I think we can do something along the lines of:
--- src/elf/parser.rs
+++ src/elf/parser.rs
@@ -378,26 +378,31 @@ impl ElfParser {
         match find_match_or_lower_bound_by_key(symtab, addr, |sym| sym.st_value as Addr) {
             None => Ok(None),
-            Some(idx) => symtab[idx..]
-                .iter()
-                .find_map(|sym| {
+            Some(idx) => {
+                let mut syms = symtab[idx..].iter().peekable();
+
+                while let Some(sym) = syms.next() {
                     if sym.st_shndx == SHN_UNDEF || sym.type_() != st_type {
-                        return None
+                        continue
                     }
                     let addr = addr as u64;
                     if sym.contains(addr) {
-                        let name = match symbol_name(strtab, sym) {
-                            Ok(name) => name,
-                            Err(err) => return Some(Err(err)),
-                        };
+                        let name = symbol_name(strtab, sym)?;
                         let addr = sym.st_value as Addr;
-                        Some(Ok((name, addr)))
+                        return Ok(Some((name, addr)))
                     } else {
-                        None
+                        if let Some(next_sym) = syms.peek() {
+                            if !next_sym.contains(addr) {
+                                let name = symbol_name(strtab, sym)?;
+                                let addr = sym.st_value as Addr;
+                                return Ok(Some((name, addr)))
+                            }
+                        }
                     }
-                })
-                .transpose(),
+                }
+                Ok(None)
+            }
         }
     }
Which results in symbolization of 0x29d90 and hopefully won't cause reporting of false positives.
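To make the idea easier to follow outside of the parser, here is a simplified, self-contained sketch of a lookup that prefers a size-aware match and otherwise falls back to the closest preceding symbol. It is not the patch above: `Sym` and `find_sym` are hypothetical stand-ins, symbol type and section filtering are left out, and the returned flag marking exact matches is an assumption added for illustration.

```rust
/// Minimal stand-in for an ELF symbol; hypothetical, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sym {
    pub name: &'static str,
    pub value: u64,
    pub size: u64,
}

impl Sym {
    /// Size-aware containment; zero-sized symbols match only their start.
    pub fn contains(&self, addr: u64) -> bool {
        if self.size == 0 {
            self.value == addr
        } else {
            addr >= self.value && addr - self.value < self.size
        }
    }
}

/// Look up `addr` in `syms`, which must be sorted by `value`.
/// Returns the symbol and whether the match honored the symbol's size.
pub fn find_sym(syms: &[Sym], addr: u64) -> Option<(&Sym, bool)> {
    // Number of symbols that start at or before `addr`.
    let end = syms.partition_point(|sym| sym.value <= addr);
    let candidates = &syms[..end];

    // Prefer a symbol whose recorded extent covers the address.
    if let Some(sym) = candidates.iter().rev().find(|sym| sym.contains(addr)) {
        return Some((sym, true));
    }
    // Otherwise report the closest preceding symbol on a best-effort basis.
    candidates.last().map(|sym| (sym, false))
}
```

With a table that holds __libc_init_first at 0x29d00 with size 5, a lookup of 0x29d90 would return that symbol with the exact flag unset, while an address below the first symbol yields no result. The backwards scan is linear in the worst case, which is acceptable for a sketch but not necessarily for a real symbol table.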
from blazesym.