One of the example projects in the repository is dng_sdk (https://android.googlesource.com/platform/external/dng_sdk/+/refs/heads/master/).
dng_sdk is a C++ library, and the current reachability extraction is sub-optimal on it. For example, the call to the ParseIFD
function at https://android.googlesource.com/platform/external/dng_sdk/+/refs/heads/master/source/dng_info.cpp#1971 is not included in the callgraph.
The problem is that CFG extraction at the IR level is hard for C++ programs: ParseIFD is a virtual function, so the call shows up in the IR as an indirect call through the vtable.
The call on that line is lowered to the following LLVM IR.
Early in the function, the "this" pointer is loaded:
%this1 = load %class.dng_info*, %class.dng_info** %this.addr, align 8
....
....
....
; Cast "this" to a pointer to the vtable pointer
%43 = bitcast %class.dng_info* %this1 to void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)***, !dbg !4560
; Load the vtable pointer
%vtable32 = load void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)**,
void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)*** %43,
align 8, !dbg !4560
; Compute the address of slot 8 in the vtable
%vfn33 = getelementptr inbounds void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)*,
void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)** %vtable32,
i64 8, !dbg !4560
; Load the actual function pointer from the slot
%44 = load void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)*,
void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)** %vfn33,
align 8, !dbg !4560
%45 = ptrtoint void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)* %44 to i64, !dbg !4560
call void @__sanitizer_cov_trace_pc_indir(i64 %45), !dbg !4560
; Call the function
call void %44(%class.dng_info* nonnull dereferenceable(332) %this1,
%class.dng_host* nonnull align 8 dereferenceable(54) %38,
%class.dng_stream* nonnull align 8 dereferenceable(104) %39,
%class.dng_exif* %call24,
%class.dng_shared* %call26,
%class.dng_ifd* %call29,
i64 %add,
i64 %42,
i32 0) #11, !dbg !4560
This means we should be able to identify the target function by looking at the type of the `this` pointer together with the index into the vtable. In this case, the type is `dng_info` and the index is 8.
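The instruction pattern above (bitcast of `this`, load of the vtable pointer, `getelementptr` with a constant index, load of the slot, indirect call) can be matched mechanically. The following is a minimal sketch in Python that works on textual IR with regular expressions; a real implementation would more likely walk the instructions inside an LLVM pass. The sample IR is an abbreviated version of the listing above (the function signature is shortened, debug attachments are dropped), and `find_vtable_calls` and the regex names are hypothetical helpers, not part of any existing tool.

```python
import re

# Abbreviated version of the IR above; the signature is shortened (assumption).
SAMPLE_IR = """
%43 = bitcast %class.dng_info* %this1 to void (%class.dng_info*, i32)***
%vtable32 = load void (%class.dng_info*, i32)**,
    void (%class.dng_info*, i32)*** %43, align 8
%vfn33 = getelementptr inbounds void (%class.dng_info*, i32)*,
    void (%class.dng_info*, i32)** %vtable32, i64 8
%44 = load void (%class.dng_info*, i32)*,
    void (%class.dng_info*, i32)** %vfn33, align 8
%45 = ptrtoint void (%class.dng_info*, i32)* %44 to i64
call void @__sanitizer_cov_trace_pc_indir(i64 %45)
call void %44(%class.dng_info* nonnull %this1, i32 0)
"""

VAL = r"(%[\w.]+)"  # an SSA value name such as %44 or %vtable32
CAST_RE = re.compile(VAL + r" = bitcast %class\.([\w.]+)\* " + VAL + r" to ")
LOAD_RE = re.compile(VAL + r" = load [^=]*? " + VAL + r", align")
GEP_RE = re.compile(
    VAL + r" = getelementptr inbounds [^=]*? " + VAL + r", i64 (\d+)"
)
# Indirect call: the callee is an SSA value, not an @global.
CALL_RE = re.compile(r"\bcall [^=@]*?" + VAL + r"\(")


def find_vtable_calls(ir_text):
    """Return (class_name, vtable_index) for each call made through a vtable."""
    text = " ".join(ir_text.split())  # join instructions that span lines
    casts = {m.group(1): m.group(2) for m in CAST_RE.finditer(text)}
    loads = {m.group(1): m.group(2) for m in LOAD_RE.finditer(text)}
    geps = {
        m.group(1): (m.group(2), int(m.group(3)))
        for m in GEP_RE.finditer(text)
    }
    results = []
    for m in CALL_RE.finditer(text):
        slot_ptr = loads.get(m.group(1))                 # %44 -> %vfn33
        base, index = geps.get(slot_ptr, (None, None))   # -> (%vtable32, 8)
        cast = loads.get(base)                           # %vtable32 -> %43
        cls = casts.get(cast)                            # %43 -> dng_info
        if cls is not None:
            results.append((cls, index))
    return results


print(find_vtable_calls(SAMPLE_IR))
```

On the sample this yields the pair `("dng_info", 8)`. SSA names are local to a function, so the text would have to be split per function before matching, and IR comments would have to be stripped first.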
I am not entirely sure how to extract the vtables, but it looks like debug information can help. First, identify the class in the metadata:
!1749 = distinct !DICompositeType(tag: DW_TAG_class_type, name: "dng_info", file: !1750, line: 39, size: 2688, flags: DIFlagTypePassByReference | DIFlagNonTrivial, elements: !1751, vtableHolder: !1749)
Then correlate the scope `!1749` and `virtualIndex: 8` to get the `ParseIFD` function:
!1920 = !DISubprogram(name: "ParseIFD", linkageName: "_ZN8dng_info8ParseIFDER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdmlj", scope: !1749, file: !1750, line: 114, type: !1921, scopeLine: 114, containingType: !1749, virtualIndex: 8, flags: DIFlagProtected | DIFlagPrototyped, spFlags: DISPFlagVirtual)
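The correlation between the class node and the `virtualIndex` field can be sketched as a small lookup table built from the textual metadata. This is only an illustration under the assumption that each metadata node sits on a single line, as in the two excerpts above; `field` and `virtual_methods` are hypothetical helper names.

```python
import re

SAMPLE_MD = '''
!1749 = distinct !DICompositeType(tag: DW_TAG_class_type, name: "dng_info", file: !1750, line: 39, size: 2688, flags: DIFlagTypePassByReference | DIFlagNonTrivial, elements: !1751, vtableHolder: !1749)
!1920 = !DISubprogram(name: "ParseIFD", linkageName: "_ZN8dng_info8ParseIFDER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdmlj", scope: !1749, file: !1750, line: 114, type: !1921, scopeLine: 114, containingType: !1749, virtualIndex: 8, flags: DIFlagProtected | DIFlagPrototyped, spFlags: DISPFlagVirtual)
'''

# One metadata node per line: "!N = [distinct ]!Kind(field: value, ...)".
NODE_RE = re.compile(r'^(!\d+) = (?:distinct )?!(\w+)\((.*)\)$', re.M)


def field(body, key):
    """Return the value of `key: value` inside a node body, or None."""
    m = re.search(r'\b%s: (?:"([^"]*)"|([^,)]+))' % key, body)
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)


def virtual_methods(md_text):
    """Map (class name, virtualIndex) -> linkage name of the virtual method."""
    classes, methods = {}, []
    for node_id, kind, body in NODE_RE.findall(md_text):
        if kind == "DICompositeType":
            classes[node_id] = field(body, "name")
        elif kind == "DISubprogram" and field(body, "virtualIndex") is not None:
            methods.append((
                field(body, "scope"),
                int(field(body, "virtualIndex")),
                field(body, "linkageName"),
            ))
    # Keep only methods whose scope is a composite type we have seen.
    return {(classes[s], i): n for s, i, n in methods if s in classes}


table = virtual_methods(SAMPLE_MD)
print(table[("dng_info", 8)])
```

The lookup key is the pair recovered from the call site (class `dng_info`, index 8). This route only works when the module is built with debug information.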
Actually, the entire vtable is specified as a global variable in the LLVM IR module, namely:
[15 x i8*] [i8* null,
i8* bitcast ({ i8*, i8* }* @_ZTI8dng_info to i8*),
i8* bitcast (void (%class.dng_info*)* @_ZN8dng_infoD1Ev to i8*),
i8* bitcast (void (%class.dng_info*)* @_ZN8dng_infoD0Ev to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*)* @_ZN8dng_info5ParseER8dng_hostR10dng_stream to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*)* @_ZN8dng_info9PostParseER8dng_host to i8*),
i8* bitcast (i1 (%class.dng_info*)* @_ZN8dng_info10IsValidDNGEv to i8*),
i8* bitcast (void (%class.dng_info*)* @_ZN8dng_info13ValidateMagicEv to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i32, i32, i32, i32, i64, i64)* @_ZN8dng_info8ParseTagER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdjjjjml to i8*),
i8* bitcast (i1 (%class.dng_info*, %class.dng_stream*, i64, i64)* @_ZN8dng_info11ValidateIFDER10dng_streamml to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, %class.dng_exif*, %class.dng_shared*, %class.dng_ifd*, i64, i64, i32)* @_ZN8dng_info8ParseIFDER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdmlj to i8*),
i8* bitcast (i1 (%class.dng_info*, %class.dng_host*, %class.dng_stream*, i64, i64, i64, i64, i64, i32)* @_ZN8dng_info17ParseMakerNoteIFDER8dng_hostR10dng_streammmlmmj to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, i32, i64, i64, i64, i64)* @_ZN8dng_info14ParseMakerNoteER8dng_hostR10dng_streamjmlmm to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*, i64, i64, i64)* @_ZN8dng_info20ParseSonyPrivateDataER8dng_hostR10dng_streammmm to i8*),
i8* bitcast (void (%class.dng_info*, %class.dng_host*, %class.dng_stream*)* @_ZN8dng_info19ParseDNGPrivateDataER8dng_hostR10dng_stream to i8*)] },
This makes it a whole lot easier. Notice that `_ZTV8dng_info` is "vtable for dng_info" when demangled. For easy index calculation, take the index from the `gep` instruction above and skip the first two elements of the vtable global: index 8 in the `gep` corresponds to element 10 of the array, which is `ParseIFD`. This is great because we don't have to rely on debug symbols or anything like that. We simply need to:
- identify that a call is a vtable call
- trace back to capture the struct/class type
- identify the index in the vtable
- find the relevant global variable representing the vtable
- get the function pointer.
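The last two steps can be sketched as follows, again in Python over textual IR. Several things here are assumptions: the `@_ZTV8dng_info = ...` header line (the excerpt above starts after it), the shortened `void ()*` function types in the sample, the fixed offset of 2, and the helper names `vtable_symbol`, `vtable_entries` and `resolve_virtual_call`. The name construction only covers plain, non-nested, non-template class names.

```python
import re

# Abbreviated vtable global: header assumed, function types shortened.
SAMPLE_GLOBALS = """
@_ZTV8dng_info = unnamed_addr constant { [15 x i8*] } { [15 x i8*] [i8* null,
 i8* bitcast ({ i8*, i8* }* @_ZTI8dng_info to i8*),
 i8* bitcast (void ()* @_ZN8dng_infoD1Ev to i8*),
 i8* bitcast (void ()* @_ZN8dng_infoD0Ev to i8*),
 i8* bitcast (void ()* @_ZN8dng_info5ParseER8dng_hostR10dng_stream to i8*),
 i8* bitcast (void ()* @_ZN8dng_info9PostParseER8dng_host to i8*),
 i8* bitcast (void ()* @_ZN8dng_info10IsValidDNGEv to i8*),
 i8* bitcast (void ()* @_ZN8dng_info13ValidateMagicEv to i8*),
 i8* bitcast (void ()* @_ZN8dng_info8ParseTagER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdjjjjml to i8*),
 i8* bitcast (void ()* @_ZN8dng_info11ValidateIFDER10dng_streamml to i8*),
 i8* bitcast (void ()* @_ZN8dng_info8ParseIFDER8dng_hostR10dng_streamP8dng_exifP10dng_sharedP7dng_ifdmlj to i8*),
 i8* bitcast (void ()* @_ZN8dng_info17ParseMakerNoteIFDER8dng_hostR10dng_streammmlmmj to i8*),
 i8* bitcast (void ()* @_ZN8dng_info14ParseMakerNoteER8dng_hostR10dng_streamjmlmm to i8*),
 i8* bitcast (void ()* @_ZN8dng_info20ParseSonyPrivateDataER8dng_hostR10dng_streammmm to i8*),
 i8* bitcast (void ()* @_ZN8dng_info19ParseDNGPrivateDataER8dng_hostR10dng_stream to i8*)] }, align 8
@other = global i32 0
"""

# Each array element is either "i8* null" or "i8* bitcast (<type> @sym to i8*)".
ENTRY_RE = re.compile(r"i8\* (?:null|bitcast \(.*? @([\w.$]+) to i8\*\))")
ADDRESS_POINT = 2  # leading elements to skip (assumed, see the notes below)


def vtable_symbol(class_name):
    """Mangled "vtable for X" name for a plain class name (no namespaces)."""
    return "_ZTV%d%s" % (len(class_name), class_name)


def vtable_entries(module_text, class_name):
    """Return the vtable elements as symbol names (None for null entries)."""
    pattern = r"^@%s = (.*?)(?=^@|\Z)" % re.escape(vtable_symbol(class_name))
    m = re.search(pattern, module_text, re.M | re.S)
    if m is None:
        return None
    body = " ".join(m.group(1).split())
    return [e.group(1) for e in ENTRY_RE.finditer(body)]


def resolve_virtual_call(module_text, class_name, gep_index):
    """Translate (class, gep index) into the mangled name of the callee."""
    entries = vtable_entries(module_text, class_name)
    if entries is None:
        return None
    slot = ADDRESS_POINT + gep_index
    return entries[slot] if slot < len(entries) else None


print(resolve_virtual_call(SAMPLE_GLOBALS, "dng_info", 8))
```

For `dng_info` and index 8 this selects element 10 of the array, the `ParseIFD` symbol. It resolves only the implementation in the static type's own vtable, not overrides in derived classes.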
We may have to do more in terms of capturing all implementations of a virtual function - let's deal with that afterwards.
- I am not sure if we can always rely on the global variables being named "vtable for ..." or if we should do some analysis of the constructor methods of each type.
- The reason we should skip the first two elements in a given vtable global variable is that the constructors for the given type take the address of the vtable at index 2 and store it in the new object. I am not sure if this is an absolute rule or if it varies; however, in the (few) examples I have looked at it is always true. If it does vary, an option is to look at the constructor code for a given type and see which index is used in the `GEP` instruction that assigns the vtable for the given object.
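The constructor-based check could look roughly like the sketch below. The `store` instruction in the sample is an assumed shape for typed-pointer IR and is not taken from the excerpts above, and `address_points` is a hypothetical helper. As background, under the Itanium C++ ABI the two leading slots of a simple class's vtable are the offset-to-top and the RTTI pointer, which matches the index 2 seen so far; classes with virtual bases or multiple inheritance have more elaborate layouts, which is why reading the index from the constructor is the safer route.

```python
import re

# Assumed shape of the vtable store in a constructor (typed-pointer IR).
SAMPLE_CTOR = """
store i32 (...)** bitcast (i8** getelementptr inbounds ({ [15 x i8*] },
    { [15 x i8*] }* @_ZTV8dng_info, i32 0, inrange i32 0, i32 2) to i32 (...)**),
    i32 (...)*** %0, align 8
"""

# Constant GEP into a vtable global: struct index 0, array number, element index.
ADDR_POINT_RE = re.compile(
    r"getelementptr inbounds \([^@]*@(_ZTV\w+), "
    r"i32 0, (?:inrange )?i32 (\d+), i32 (\d+)\)"
)


def address_points(ir_text):
    """Map vtable symbol -> set of (array number, element index) pairs."""
    text = " ".join(ir_text.split())  # join instructions that span lines
    points = {}
    for sym, array_no, elem in ADDR_POINT_RE.findall(text):
        points.setdefault(sym, set()).add((int(array_no), int(elem)))
    return points


print(address_points(SAMPLE_CTOR))
```

The element index found this way (2 in the sample) would replace the hard-coded offset when translating a `GEP` index from a call site into an element of the vtable global.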