softsec-kaist / binkit: Binary Code Similarity Analysis (BCSA) Benchmark
License: MIT License
Thank you for providing this comprehensive dataset. It is the best one I have found for binary code analysis!
I downloaded the dataset sizeopt_dataset.7z and tried to disassemble some of the .elf files using the objdump -d <file> command. It turns out that many of these .elf files could be disassembled successfully (some could not). For example, for a2ps/a2ps-4.14_gcc-8.2.0_x86_32_Os_a2ps.elf, I get output that looks like the following (more than 50,000 lines in total):
a2ps-4.14_gcc-8.2.0_x86_32_Os_a2ps.elf: file format elf32-i386
Disassembly of section .init:
08049624 <_init>:
8049624: 53 push %ebx
8049625: 83 ec 08 sub $0x8,%esp
8049628: e8 93 0c 00 00 call 804a2c0 <__x86.get_pc_thunk.bx>
804962d: 81 c3 d3 49 04 00 add $0x449d3,%ebx
8049633: 8b 83 fc ff ff ff mov -0x4(%ebx),%eax
8049639: 85 c0 test %eax,%eax
804963b: 74 05 je 8049642 <_init+0x1e>
804963d: e8 fe 05 00 00 call 8049c40 <__gmon_start__@plt>
8049642: e8 22 0d 00 00 call 804a369 <frame_dummy>
8049647: e8 07 b8 02 00 call 8074e53 <__do_global_ctors_aux>
804964c: 83 c4 08 add $0x8,%esp
804964f: 5b pop %ebx
8049650: c3 ret
Disassembly of section .plt:
08049660 <.plt>:
8049660: ff 35 04 e0 08 08 pushl 0x808e004
8049666: ff 25 08 e0 08 08 jmp *0x808e008
804966c: 00 00 add %al,(%eax)
...
I am not very familiar with hardware and low-level details. Is there any way to map each part of this assembly code back to the source files I found for a2ps, listed below?
buffer.c lexps.c main.h read.h ssheet.h
buffer.h lexps.h Makefile.am regex.c sshread.c
delegate.c lexps.l Makefile.in regex.h sshread.h
delegate.h lexssh.c parsessh.c select.c version-etc.c
ffaces.c lexssh.l parsessh.h select.h version-etc.h
ffaces.h long-options.c parsessh.output sheets-map.c versions.c
generate.c long-options.h parsessh.y sheets-map.l versions.h
generate.h main.c read.c ssheet.c yy2ssh.h
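As a rough starting point for this kind of mapping, the sketch below groups `objdump -d` text by the symbol headers it prints (for example `08049624 <_init>:`), so each block can then be looked up by function name in the .c files. It is only a sketch: the helper name `split_by_symbol` and the `SAMPLE` text are made up for illustration, it assumes the binary is not stripped, and symbols such as `_init` or `.plt` come from the toolchain rather than from the a2ps sources. If the binaries carry debug information (not verified here), `objdump -d -l` can additionally print source file and line numbers.

```python
import re

# Header lines in `objdump -d` output look like "08049624 <_init>:".
FUNC_HEADER = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")
# Section lines look like "Disassembly of section .init:".
SECTION_HEADER = re.compile(r"^Disassembly of section (\S+):\s*$")


def split_by_symbol(disassembly: str) -> dict:
    """Group disassembly text into {(section, symbol): [instruction lines]}."""
    blocks = {}
    section = None
    current = None
    for raw in disassembly.splitlines():
        line = raw.strip()
        match = SECTION_HEADER.match(line)
        if match:
            section = match.group(1)
            current = None
            continue
        match = FUNC_HEADER.match(line)
        if match:
            current = (section, match.group(2))
            blocks[current] = []
            continue
        # Anything else is treated as an instruction of the current symbol.
        if current is not None and line:
            blocks[current].append(line)
    return blocks


# Hypothetical sample, shortened from the output quoted above.
SAMPLE = """\
Disassembly of section .init:
08049624 <_init>:
8049624: 53 push %ebx
804963b: 74 05 je 8049642 <_init+0x1e>
8049650: c3 ret
Disassembly of section .plt:
08049660 <.plt>:
8049660: ff 35 04 e0 08 08 pushl 0x808e004
"""

if __name__ == "__main__":
    for (section, symbol), lines in split_by_symbol(SAMPLE).items():
        print(section, symbol, len(lines))
```

Function names recovered this way can then be searched for in files such as main.c or generate.c; inlined or static functions may not appear as separate symbols, especially at -Os.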
Thank you. I am looking forward to your reply.
Cheers,
Yui
Hi, I am trying to build BinKit.
When I executed "setup_gcc", the following error occurred:
[00:10] / gmake: *** [/mnt/nfs/repo/BinKit/tools//crosstool-ng/ct-ng:261:build] error 1
/mnt/nfs/repo/BinKit//ctng_conf/5.5/mipseb_32
[ERROR] isl: download failed
According to the config file, CT_ISL_MIRRORS=http://isl.gforge.inria.fr/
However, this source seems to have been down for a long time.
Is there any solution? Thanks a lot.
Hi, I am testing my method on BinKit. The dataset is too large; could you provide the dataset as pickle files with the features already extracted by TikNib?
I have successfully cross-compiled most of the GNU software following the workflow. However, for some packages, a "branch out of range" error occurs when compiling for mips_64 with clang-4.0 (as well as clang-obfus-xxx).
For example, when compiling tar-1.30 with clang-4.0 (O0) to mips_64, it shows:
...
Making install in gnu
make[1]: Entering directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
make install-recursive
make[2]: Entering directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
make[3]: Entering directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
CC parse-datetime.o
/tmp/parse-datetime-41f91c.s: Assembler messages:
/tmp/parse-datetime-41f91c.s:44857: Error: branch out of range
/tmp/parse-datetime-41f91c.s:44960: Error: branch out of range
/tmp/parse-datetime-41f91c.s:45061: Error: branch out of range
/tmp/parse-datetime-41f91c.s:47476: Error: branch out of range
clang-4.0: error: assembler command failed with exit code 1 (use -v to see invocation)
Makefile:1897: recipe for target 'parse-datetime.o' failed
make[3]: *** [parse-datetime.o] Error 1
make[3]: Leaving directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
Makefile:1922: recipe for target 'install-recursive' failed
make[2]: *** [install-recursive] Error 1
make[2]: Leaving directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
Makefile:2076: recipe for target 'install' failed
make[1]: *** [install] Error 2
make[1]: Leaving directory '/home/binkit/dataset/gnu/sources/tar/tar-1.30_clang-4.0_mips_64_O0_normal/gnu'
Makefile:1388: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1
I have searched extensively but have not found a solution.
Have you encountered a similar problem before, and do you have any suggestions for solving it?
Hello,
You mention that BinKit 2.0 has 371,928 binaries; however, the zip file downloaded from the drive contains ~213K files. Could you please clarify?
Thank you
Could you please release a dataset of all the source code? Manually downloading every package at the right version is time-consuming and error-prone.
By the way, releasing such a complete binary code similarity detection dataset is outstanding work.
Hi, I am trying to compile Coreutils-8.30 with clang-6.0 for x86_64 at O2, but I get errors. The log file coreutils-8.30_clang-6.0_x86_64_O2_normal_CCTARGET_AUTO_install_error.log contains:
lib/libcoreutils.a(randint.o): In function `explicit_bzero':
/root/tools//x86_64-ubuntu-linux-gnu-8.2.0/x86_64-ubuntu-linux-gnu/sysroot/usr/include/bits/string_fortified.h:83: undefined reference to `__explicit_bzero_chk'
/root/tools//x86_64-ubuntu-linux-gnu-8.2.0/x86_64-ubuntu-linux-gnu/sysroot/usr/include/bits/string_fortified.h:83: undefined reference to `__explicit_bzero_chk'
lib/libcoreutils.a(randread.o): In function `explicit_bzero':
/root/tools//x86_64-ubuntu-linux-gnu-8.2.0/x86_64-ubuntu-linux-gnu/sysroot/usr/include/bits/string_fortified.h:83: undefined reference to `__explicit_bzero_chk'
clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/cp] Error 1
make[1]: *** [install-recursive] Error 1
make: *** [install] Error 2
Do you know how to solve this problem? Waiting for your reply.
I am trying to set up gcc, but I ran into an issue, which I reported to crosstool-ng:
crosstool-ng/crosstool-ng#1842 (comment)
Is it mandatory to use the same glibc version (2.26) that is specified in the setup_gcc.sh script?
Any suggestions?
Greetings. Thank you for your fundamental work. I have run into a problem that confuses me.
The README states that some binaries are compiled by gcc-6.4.0. However, after I downloaded the BinKit 2.0 dataset and checked its contents, I could not find any binaries generated by gcc-6.4.0. Is the README wrong?
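A check like this can be scripted, for example with the sketch below, which tallies the compiler field across file names in an extracted copy of the dataset. The field layout (package, compiler, architecture, bits, optimization level, binary name, separated by underscores) is an assumption inferred from names such as a2ps-4.14_gcc-8.2.0_x86_32_Os_a2ps.elf; the helper names `parse_name` and `tally_compilers` and the directory path are hypothetical, and names whose package part contains an underscore would be parsed incorrectly.

```python
from collections import Counter
from pathlib import Path


def parse_name(filename: str):
    """Split a BinKit-style file name into fields (assumed layout)."""
    stem = filename[:-4] if filename.endswith(".elf") else filename
    # Assumed: package_compiler_arch_bits_opt_binary; the binary name is
    # kept whole even if it contains further underscores.
    parts = stem.split("_", 5)
    if len(parts) != 6:
        return None
    package, compiler, arch, bits, opt, binary = parts
    return {
        "package": package,
        "compiler": compiler,
        "arch": arch,
        "bits": bits,
        "opt": opt,
        "binary": binary,
    }


def tally_compilers(root) -> Counter:
    """Count .elf files under `root` per compiler field."""
    counts = Counter()
    for path in Path(root).rglob("*.elf"):
        info = parse_name(path.name)
        if info is not None:
            counts[info["compiler"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical location of the extracted dataset.
    for compiler, count in sorted(tally_compilers("BinKit_dataset").items()):
        print(compiler, count)
```

If gcc-6.4.0 does not appear among the printed keys, the archive at hand contains no file names with that compiler field, and the sum of the counts gives the number of binaries actually present.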