Git Product home page Git Product logo

Comments (14)

mumbel avatar mumbel commented on June 19, 2024 3

No comment on SP, but I do have some suggestions which might improve your diassembler and decompiler output.

  • All your jumps are inverted logic, this adds an unnecessary goto to the instruction's PCODE.
  • off in the jumps could be a constructor to remove the temp calculation in every instruction and more info in disassembly: example in your screenshot JGT R1,R2,0x71 becomes JGT R1,R2,LBB0_11
:JEQ dst, imm, off   is imm & off & dst & op_alu_jmp_opcode=0x1 & op_alu_jmp_source=0 & op_insn_class=0x5 {	
	temp:8=inst_next + off * 8;  
	if (dst!=imm) goto <label>;
	goto [temp];	
	<label>	   
}
joff: reloc is off [ reloc = inst_next + off * 8; ] { export *:8 reloc; }

:JEQ dst, imm, joff   is imm & joff & dst & op_alu_jmp_opcode=0x1 & op_alu_jmp_source=0 & op_insn_class=0x5 {	
	if (dst==imm) goto joff;
}

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024 1

The main issue is the dataflow analysis isn't the same in the decompiler for CALLOTHER user pcodeops, and CALL pcodeop. By specifying all the calls with pcodeop's in the sleigh processor you will get the name in the decompilation, but not the good data flow. However you will still need to apply good function signatures with types, which isn't easily done of CALLOTHER pcodeops until 9.1 (current master) is released. 9.1 allows overriding the CALLOTHER to any other place by putting a CALL_OTHER_OVERRIDE reference to any other address. The decompiler will then pickup the function signature from that location.

The following should fix your issue in the current 9.1, without doing anything fancy with the new call-override references. There are other ways to accomplish this, but this is probably the easiest.

Create a eBPFHelper syscall address space, and then make calls directly to it. You can allocate the space in the .pspec file. Then at each location you can create a 1 byte function that has the correct function signature. The issue with this is you get bad bookmarks because of the disassembly, so I chose this as the lesser of evils. We should probably change the code so that you can have references to an uninitialized space that doesn't cause disassembly errors (next release...)

The calling convention you have already specified will allow the decompiler to figure out the parameters and return.

The reason you need the latest 9.1 or master release is to get the better pointer aliasing. The stack variable is initialized, but the decompiler won't see that it could be changed by one of the calls that takes a pointer to the initialized stack in 9.0.4. The decompiler has an improved value-set analysis for the stack in 9.1.

The other issue is in the .pspec, since you are loading an ELF, normally the .pspec symbols don't get loaded, so when you import an elf, you have to check the in the loader options to apply processor specific symbols. This is hard to remember to do, so we either need to allow the default to be overriden, or provide another mechanism to force the processor specific symbols, possibly a tag on the symbol entries themselves. The symbols could get applied in another manner, but I wanted to get you this mechanism since it has been a while since you posed the question.

I looked around for good header file definitions for the functions, but didn't find some easily parse-able ones. Either a pre-parsed .gdt file, or a header file that is parsed at runtime with the CParser could be added. A header file parsed is probably better in the long run because it is easy to change and human readable.

I also changed the calling convention slightly.
eBPF.cspec.txt
eBPF.pspec.txt
eBPF.sinc.txt

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024 1

Done!
image

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

Thank you, your comment is correct and competent. I will create new commit and fix all jumps in few days.
And it's interesting to solve the problem with the stack.

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024

Can you give me an example that doesn't decompile?

Are you using stock 9.0.4, or are you using Ghidra main dev branch? There have been some quite significant changes made to the decompiler.

Since your functions don't appear to manipulate the stack depth, changing register R11, the stack depth field would always display zero. I do see passing of pointers on the stack, where the pointer "allocates" memory on the stack by assigning a parameter variable from the value in R10, and subtracting some value, but the stack is never changed here (R10 or R11).

The same thing would occur in a program that never changed the RSP, but assigned other registers to the value of RSP and used offsets from it.

There are other languages where functions essentially have their own "stack" and can't access outside of their stack space. Java essentially has a local stack, and you can't access a global stack.

Be happy to take a look at the samples if you can point out locations that you expect different results, especially if the decompiler is not working on them.

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

@emteere, yes, R10 is unchangeable and the only register which can access the stack in eBPF program. But, as I said, calling it «stackpointer» is absolutely wrong but no choice.

I use 9.0.4 and also checked on the latest build of Ghidra. The same problem.

Take a look on this screenshot of eBPF program bpf_lxc.o. As you can see, STXDW (“store double word”) doesn’t decompile if using R10. Otherwise, all successful.

No decomp

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024

I'll take a look at the example. Most likely a pcode intricacy that isn't readily apparent or something in the .cspec.

The processor is just a model, and for all intents and purposes that is a stack for the function.
There are ways to have pcode injected at the start of your functions, but I don't think it is worth the complexity here. That might change depending on inter function calls.

Are you getting all the parameters correct for your calls, or is there some sort of lookup scheme (as in the const-pool in Java bytecode)?

BTW, nice work on the processor. I haven't done a full review, but getting this far on it is great.

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024

The code is actually decompiling correctly, the initializations are seen as setting local variables, and since the value is never seen as changed, the decompiler gets rid of it. That is also why if statements are disappearing because they are seen as dead code. The value they are testing that was initialized on the stack is never seen to change.

This is a pointer aliasing problem. I believe can be fixed, but I'm still poking at it.

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

BTW, nice work on the processor. I haven't done a full review, but getting this far on it is great.

Thank you.

Are you getting all the parameters correct for your calls, or is there some sort of lookup scheme (as in the const-pool in Java bytecode)?

Yes, helper calls work correctly. I created a small eBPF program without any in-kernel calls (only arithmetic tasks and access the stack) and i found the same situation when "stack-accessing" instructions don't decompile.

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024

OK, I'll take a look. I believe I have a way you can implement this, and get datatype information into the in-kernel calls. I'll spec it out as soon as I get a chance and get it to you.

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

@emteere Thank you

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

@emteere, Thanks for support!
Problem with ghidra's deadcode is resolved, stack-accessing instructions are successfully decompiled.

I decided to go a bit straight through and use power of Ghidra api, so custom eBPFAnalyzer was written.

Now eBPF helpers are recognized as functions, the separate memory segment (syscall as you defined) is allocated for them, you can take a look at what signatures they have, the disassembler and decompiler got a number of nice changes.

HelpersAnalyzer

A little effort, and the new release will come out soon. Thanks to you!

from ebpf-for-ghidra.

emteere avatar emteere commented on June 19, 2024

Nice, and you are welcome. Will take a closer look.

Curious where you are storing and applying the data types and function signatures.

Sorry for the red bookmarks. That's a side-effect of the disassembler trying to be helpful with code that doesn't disassemble that is called. There are ways that I thought of to stop it, but it was more trouble than it was worth at the time and I wanted to get you a solution. I suppose your eBPFanalyzer could get rid of the disassembly errors, since it knows better that they are not issues.

from ebpf-for-ghidra.

Nalen98 avatar Nalen98 commented on June 19, 2024

Curious where you are storing and applying the data types and function signatures.

Analyzer parses Symbol Table (where specified "syscall" in location field). You prompted the body of pspec file, where the memory addresses for each function were defined. I rely on this.

Analyzer determines, which function is to this address, and creates cmds for applying, which contain type of the return value, the name of function and which arguments are used (their type, name, index, etc).

It would be nice to solve the problem with bad bookmarks and i think it can be solved.

from ebpf-for-ghidra.

Related Issues (4)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.