
Comments (2)

hello-adam commented on June 3, 2024

For the "sequence alignment" do you think something like the Header Framer would help? Instead of breaking on patterns, it could probably be modified to break on parsed KSY sections.

I was supposed to add diffing to hobbits a long time ago. I will try to think of ways to make it happen nicely.

from hobbits.

KOLANICH commented on June 3, 2024

> do you think something like the Header Framer would help

Probably, but only as a workaround.

For the diffing plugin, I suppose the following workflow should work. It is just a concept and completely untested: I have never used such an algorithm, because for reverse engineering I have only used primitive tools like diff that ship with distros, never sequence alignment libraries with a tunable objective.

  1. A user specifies the constraints, thereby incorporating expert knowledge. One of the most important constraints is block size and alignment (granularity). For example, if a block is 4 bytes, then a match must be a multiple of 4 bytes long and must be aligned; an unaligned "match" is not a match. This is needed because some formats have a clear structure: they are composed of words of n bytes. For example, both the lto format and the git commit graph format look like they are made of u4s (in KS notation; in C++, a uint32_t), so anything unaligned is likely a false match. Also, the larger the block, the more efficient the alignment. Constraints of this kind are specific to a frame, so they are available to other plugins too. For example, a hex editor can draw a visible grid and allow selection only in blocks of the required size and alignment (trying to select or hover any byte in a block should act as if the whole block was clicked or hovered).
  2. A user specifies the areas within the file that are "atomic", i.e. that cannot be split by operations that cut the binaries (this would probably require a "Cutter" abstract base class for plugins). In the git format example, the atoms are the commit hashes. It should be possible to create "atoms" via other plugins, such as Kaitai, and via bulk processing. If constraints are violated, the plugin should do something about it.
  3. A user selects multiple frames. This would probably require a special subsystem within the core, because the selected frames would acquire additional metadata linking them to each other and to the operation; I guess we need a separate issue for such a subsystem.
  4. A user initiates the operation. The frames get linked to each other and to the operation.
  5. Then the sequence alignment is done with an objective function that heavily penalises matches crossing atom boundaries, and also somehow penalises reordering of atoms. Perhaps the boundary penalty should be exponential in the size of the atom, because the probability of such an atom occurring by chance decreases exponentially with its size, assuming all symbols are uniformly distributed.
  6. Common blocks identified by sequence alignment in all the selected frames are added as new atoms, with metadata relating them to the operation. The process is repeated until convergence, i.e. until no new atoms are found. I am not sure there will be more than one iteration, but adding atoms changes the objective function, so the next iteration need not give the same result.
  7. Widgets for the blocks are added, most likely highlights, but we need the metadata and some additional GUI to make it clear that they originate from the diff plugin.
  8. Then a user works with the blocks. For example, they can split an "atom" into multiple "atoms" if they see enough evidence for that.
  9. Then a user tries to encode the obtained knowledge into Kaitai Struct and YARA (probably we need a YARA plugin too?). It should be easy to copy basic things like the sizes and offsets of highlights with a right click. Extraction into their own subframes should also be possible. And maybe autogeneration of YARA templates from multiple similar highlight areas?
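
To make the granularity constraint from step 1 concrete, here is a minimal, untested-in-hobbits sketch in Python. It is not a proposal for the actual plugin API: `to_blocks` and `aligned_matches` are hypothetical names, and the standard library's `difflib.SequenceMatcher` stands in for a real sequence alignment library with a tunable objective. The idea is only that comparing sequences of whole blocks instead of bytes makes every match aligned and a multiple of the block size by construction.

```python
from difflib import SequenceMatcher


def to_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into full blocks; a trailing partial block is dropped."""
    last_start = len(data) - block_size + 1
    return [data[i:i + block_size] for i in range(0, last_start, block_size)]


def aligned_matches(a: bytes, b: bytes, block_size: int = 4):
    """Return (offset_a, offset_b, length) triples, all in bytes.

    Matching is done on whole blocks, so every reported match is
    aligned to block_size and its length is a multiple of block_size.
    """
    blocks_a = to_blocks(a, block_size)
    blocks_b = to_blocks(b, block_size)
    # autojunk=False disables difflib's "popular element" heuristic,
    # which would otherwise ignore frequently repeated blocks.
    matcher = SequenceMatcher(None, blocks_a, blocks_b, autojunk=False)
    return [
        (m.a * block_size, m.b * block_size, m.size * block_size)
        for m in matcher.get_matching_blocks()
        if m.size  # the final entry is a zero-length sentinel
    ]


# Two inputs that share two aligned 4-byte words.
print(aligned_matches(b"AAAABBBBCCCC", b"XXXXBBBBCCCC"))
# The same content shifted by one byte is deliberately not a match.
print(aligned_matches(b"AAAABBBB", b"xAAAABBBB"))
```

The second call returns an empty list: the shared bytes are there, but they are not aligned to the 4-byte grid, which is exactly the kind of "match" the constraint is meant to reject.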

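The atom penalty from step 5 could look roughly like the following sketch. Everything here is an assumption for illustration: the names `crosses` and `score`, the half-open `(start, end)` span representation, and the choice of base 2 for the exponential. It only covers the boundary-crossing penalty; the reordering penalty is left out because its form is still open.

```python
def crosses(span, atom):
    """True if the span overlaps the atom without fully containing it.

    Both arguments are half-open (start, end) byte ranges.
    """
    s, e = span
    a_s, a_e = atom
    overlaps = s < a_e and a_s < e
    contains = s <= a_s and a_e <= e
    return overlaps and not contains


def score(span, atoms, base=2):
    """Score a candidate match: its length minus the crossing penalties.

    Each atom cut by a span boundary costs base ** atom_size, so cutting
    a large atom (for example a 20-byte hash) is practically forbidden.
    """
    s, e = span
    penalty = sum(
        base ** (a_e - a_s)
        for (a_s, a_e) in atoms
        if crosses(span, (a_s, a_e))
    )
    return (e - s) - penalty


atoms = [(4, 24)]  # hypothetical 20-byte hash stored at offset 4
print(score((0, 24), atoms))   # contains the whole atom: no penalty
print(score((0, 10), atoms))   # ends inside the atom: heavy penalty
print(score((24, 30), atoms))  # does not touch the atom: no penalty
```

With integer arithmetic the penalty never overflows, but for long atoms it dwarfs any realistic match length, so in practice it acts as a hard constraint rather than a soft one.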
