Git Product home page Git Product logo

Comments (1)

kupsch avatar kupsch commented on July 19, 2024

There are a couple of problems with the current LineInformation implementation.

The first one that you suspected is the issue of address ranges with the same starting and ending value (an empty range if it is half closed). The records produced by the DWARF state machine used to encode the line map table are defined such that the address for a line is inclusive as are any addresses up to but not including the address of the next line in the map (if the address is larger). This is easily fixed by changing ranges [x, x) to [x, x+1) as this matches DWARF semantics.

The second problem is that the implementation assumes non-overlapping ranges. If ranges overlap then the results of an address query to a statement will in most cases will be incorrect. When I create a pathological line maps containing all different types of overlaps and query addresses between and at all boundaries the results are correct for the lowest address and then everything else is incorrect with statements missing that should have been included as their range contains the address, and statements included that should not have been as their range does not contain the address, or both.

I see two ways to fix this.

  1. use an index sorted by the address range pair (this is what is currently done, along with another index sorted on the high address which can go away). To query would entail starting at the lowest pair, stopping when the low address of the pair is greater than the address. The time to create this data and the memory used would be less than the current implementation, but the query time becomes O(#line map entries for the module).
  2. switch to a real interval data structure, which is more complicated to create in terms of time and memory as you now have a data structure where each interval is a vector of line map entries. The intervals cover distinct ranges and must be split is new entries overlap and do not have the exact same range. On the plus side has performance of address queries is O(log #intervals).

The trade off is then how many queries are going to be performed vs the extra expense in memory and CPU to create the interval data structure instead of the simpler map. If the number of line map intervals is large and the number of bytes in a module is large, and the number of queries is large, then options 2 would be called for, otherwise it may be best to stick with option 1.

@mplegendre do you feel your use case would benefit by option 2?

from dyninst.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.