
Comments (9)

justfoxing avatar justfoxing commented on June 19, 2024 1

from jfx_bridge.

justfoxing avatar justfoxing commented on June 19, 2024 1

Hah! Just enough for me to see what's going on. Looks like halfway through a message being received, a second message is jumping in. The unicode error is being caused by the \x00\x00\x00\xaa bytes of the second message size, and there's 0xaa bytes of second message JSON before the initial message resumes. I'm going to have to go hunt through the network dispatching code to see why that can happen...


justfoxing avatar justfoxing commented on June 19, 2024 1

Transferred this issue over to jfx-bridge (the underlying comms beneath ghidra_bridge), because I'm pretty sure that's where the problem is.

Here's a braindump of what I think has happened - you can skip down to the bottom for how to upgrade and hopefully fix the issue if you want, this is mostly for historical record.

The problem probably lies in the potential for messages being sent across the bridge to become interleaved. It wouldn't happen often, because most socket.send() calls will drop into native code and dispatch the message in one hit, but for very large messages there's the potential for it to send only part of the message before it returns to python and loops around to send the rest. If there's another thread waiting with a message when that happens, and python decides to swap threads, the first message will be incomplete when the second message (including its size header) gets put on the wire. Eventually, when control returns to the first thread, it'll finish sending its message, but the damage is already done.

On the receiving end, it'll see the first message's size header and try to read that many bytes - which will include reading the second message's size header and data, and lose some of the end of the first message. When this gets fed into a unicode decode it'll probably fail with invalid bytes when it hits the binary size header - even if it didn't somehow, the JSON structure would almost certainly be broken, so the json.loads() would fail in the next step.
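As a rough illustration of that failure mode (a sketch only, not the actual jfx_bridge code), the snippet below builds an interleaved byte stream by hand and reads it back the way a length-prefixed receiver would. The 4-byte big-endian size header matches the struct.pack("!I", ...) framing mentioned in this thread; the helper name frame, the payload contents and the split point of 200 bytes are all made up for the example.

```python
import json
import struct


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte big-endian size header (assumed wire format)."""
    return struct.pack("!I", len(payload)) + payload


# Two hypothetical JSON messages. The second is sized to exactly 0xaa bytes,
# so its size header is b"\x00\x00\x00\xaa", as seen in the bad message.
first = json.dumps({"data": "A" * 400}).encode("utf-8")
second = json.dumps({"data": "B" * (0xAA - 12)}).encode("utf-8")
assert len(second) == 0xAA

# Simulate the interleaving: part of the first frame goes out, then the whole
# second frame jumps in, then the rest of the first frame follows.
first_frame = frame(first)
wire = first_frame[:200] + frame(second) + first_frame[200:]

# The receiver trusts the first size header and reads that many bytes,
# swallowing the second message's header and JSON along the way.
(size,) = struct.unpack("!I", wire[:4])
data = wire[4:4 + size]

error = None
try:
    json.loads(data.decode("utf-8"))
except UnicodeDecodeError as exc:
    error = exc

# The decode fails on the 0xaa byte of the embedded size header.
print(error)
```

The three \x00 bytes decode fine as NUL characters, which is why the reported failure lands on the 0xaa byte rather than at the very start of the embedded header.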

I've addressed this by gating all the places where data gets written to the socket through a lock. However, I haven't been able to build a testcase that actually replicates the problem, so it's all a guess as to whether this actually fixes your issue. If you did end up with code that reliably replicated the problem, that'd be nice to have so I could try turning it into a testcase to avoid regressions.
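A minimal sketch of what gating socket writes through a lock can look like is below. It is an assumption about the general shape of such a fix, not a copy of the jfx_bridge change: the class name LockedSender and the helpers recv_exact and recv_message are hypothetical. The key point is that the size header and payload are written while holding one lock, and socket.sendall() is used so the whole frame goes out before another thread can start writing.

```python
import socket
import struct
import threading


class LockedSender:
    """Hypothetical wrapper that serialises whole-message writes to one socket."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._lock = threading.Lock()

    def send_message(self, payload: bytes) -> None:
        header = struct.pack("!I", len(payload))
        # Hold the lock for the entire frame. sendall() loops internally until
        # every byte is written, so no other thread can slip a frame in between.
        with self._lock:
            self._sock.sendall(header + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, looping over short reads."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message from the socket."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)
```

With several threads sharing one LockedSender, each call to recv_message() on the other end should return a complete message from a single sender, even when the payloads are large enough to need multiple underlying writes.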

TL;DR - I've released version 0.9.1 of jfx-bridge with a fix that I think might sort the problem. Upgrade with pip install ghidra_bridge --upgrade --force-reinstall to get the latest jfx_bridge component, then re-install the server scripts with python -m ghidra_bridge.install_server <script location>. Note that you'll need to make sure you restart the ghidra_bridge server after re-installing the server scripts, since this bug is most likely triggering on the ghidra-side (look for INFO:jfx_bridge.bridge:serving! (jfx_bridge v0.9.1 in the ghidra console to make sure it's upgraded correctly).

Please let me know if you think it's solved the issue, or if it keeps occurring.


justfoxing avatar justfoxing commented on June 19, 2024

Yeah, there's a 2^32 byte limit on how much can be sent in a message - I didn't think people would be shipping more than 4 GB (and if you are coming close to this limit, you may want to try to identify alternatives to doing that - ghidra_bridge is definitely not going to perform well under those conditions, in either memory or network speed).

But! I don't think that's the problem here. The error message shows trouble decoding the received JSON, with the decode failing at byte 196607 - nowhere near a 2^32 limit (but still a way big message - at least 191 KB :o ). Additionally, packing the message on the server would have failed - struct.pack("!I", 2**32) detects the overflow and throws an exception - so it would never even have sent the message out.
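The overflow behaviour is easy to check in isolation. This is just a quick demonstration of the standard library call quoted above; the variable names are made up for the example.

```python
import struct

# The largest size that fits in a 4-byte unsigned header.
MAX_SIZE = 2**32 - 1
header = struct.pack("!I", MAX_SIZE)

# One more than that cannot be packed: struct raises instead of wrapping.
try:
    struct.pack("!I", 2**32)
    overflowed = False
except struct.error:
    overflowed = True

print(header, overflowed)
```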

Tracking down invalid unicode always sucks - it'd help to be able to see the code for your "my_function".


maxeisele avatar maxeisele commented on June 19, 2024

Indeed, that is a lot of data that gets serialized to JSON, and definitely not the most efficient way. What's actually being done is passing edges from the control flow graph of the program. However, the error only occurs once in a while, but it is always byte 0xaa, at different positions. If I find a way to reproduce it deterministically, I will let you know.

What's your recommendation on passing larger data back to python? Maybe a pipe or so?


justfoxing avatar justfoxing commented on June 19, 2024

Additionally, if you want, you could try patching the jfx_bridge/bridge.py on the receiving side to log the message when it hits the decode issue. This could be helpful in tracking down the source of the issue even if you can't reproduce it deterministically.

This would look something like replacing the line msg_dict = json.loads(data.decode("utf-8")) in BridgeReceiverThread.run() with something like the following:

                try:
                    msg_dict = json.loads(data.decode("utf-8"))
                except Exception:
                    with open("bad_message.bin", "wb") as output:
                        output.write(data)
                    raise


maxeisele avatar maxeisele commented on June 19, 2024

I have written out the file, as you suggested. It really does contain non-Unicode bytes. I have attached a shortened version for you.
shorted_bad_message.txt


maxeisele avatar maxeisele commented on June 19, 2024

Wow, that was fast. For now, the error has not occurred again, so I guess it is fixed. Thanks a lot!


justfoxing avatar justfoxing commented on June 19, 2024

Sweet! I'll close this now, but if it does reoccur, feel free to reopen.

