at51

A collection of applications for reverse engineering 8051 firmware. Currently, there are four applications:

  • stat, which gives blockwise statistical information about how similar a given file's opcode distribution is to normal 8051 code
  • base, which determines the load address of an 8051 firmware file
  • libfind, which reads library files and scans the firmware file for routines from those files
  • kinit, which reads a specific init data structure generated by the C51 compiler

The output of each subcommand can also be used in other programs via JSON.

Installation

Downloadable releases should be available on the releases page of the GitHub repository.

To compile manually, only cargo is needed, which can be installed with rustup. With cargo, at51 can be installed with cargo install at51.

Alternatively, to install from the repository source, run

git clone 'https://github.com/8051Enthusiast/at51.git'
cargo install --path at51

stat

This subprogram is useful for determining which regions of a file are probably 8051 code. If you want to determine the architecture of a file in general, a useful tool might be cpu_rec.

This subcommand does some statistics on the firmware. It steps through the file as if it were a continuous instruction stream and runs some tests on those instructions. The image is divided into equal-sized blocks (512 bytes by default) and the value of the test is reported for each block. That means it is normally better suited to bigger images (in this context, something like >4kB) where you want to know which regions are probably 8051 code and which are data.

By default, it calculates the aligned jump test, which gives the fraction of relative jump instructions whose target is not at the start of an instruction. The value ranges from 0 to 1, where 0 is better. It generally works well, but yields a lot of NaN values on streams of 0s and similar repeated instructions, as there are no jumps in those blocks. If a block is entirely 8051 code, it should have a value of 0 (although someone might do some hacks with unaligned jumps). Blocks that contain small jump tables are sometimes not exactly zero, but the value should still be fairly low (<0.1).

One can additionally show the number of jumps used with the -n flag to judge how certain the value is. Two further flags, -A and -O, exist: -A also includes absolute jumps in the calculation (useful if the file is already aligned and there are not enough jumps), and -O counts jumps to outside the firmware image as misses (useful with -A if one knows there is no code outside the firmware and the firmware file does not cover the whole address space).
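The aligned jump test can be illustrated with a minimal Python sketch. It is not at51's implementation: it assumes a hypothetical disassembler has already produced a list of (address, jump target) pairs, with None as the target for non-jump instructions, and it assumes that jumps leaving the image are simply skipped unless count_outside is set (loosely mirroring -O).

```python
import math

def aligned_jump_scores(instructions, image_size, block_size=512, count_outside=False):
    """Per-block fraction of jumps whose target is not an instruction start.

    instructions: list of (address, target) pairs from a hypothetical
    disassembler; target is None for instructions that are not jumps.
    """
    starts = {addr for addr, _ in instructions}
    n_blocks = (image_size + block_size - 1) // block_size
    totals = [0] * n_blocks
    misses = [0] * n_blocks
    for addr, target in instructions:
        if target is None:
            continue
        outside = not 0 <= target < image_size
        if outside and not count_outside:
            continue  # assumption: jumps leaving the image are ignored by default
        block = addr // block_size
        totals[block] += 1
        if outside or target not in starts:
            misses[block] += 1
    # Blocks without any jumps have no defined score, hence NaN.
    return [m / t if t else math.nan for m, t in zip(misses, totals)]

# Toy stream: one aligned jump, one unaligned jump, one jump out of the image.
toy = [(0, None), (1, 5), (3, None), (5, None), (6, 4), (8, 2000)]
scores = aligned_jump_scores(toy, image_size=1024)
```

In this toy input the first block scores 0.5 (one of two counted jumps misses) and the second block is NaN because it contains no jumps at all.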

It can also do a blockwise Kullback-Leibler divergence on the distribution of the opcodes, which gives each block a value from 0 to 1, 0 being the most 8051-like. A default distribution derived from a corpus I assembled is included (the corpus itself I can probably not publish due to copyright issues), but you can set your own corpus with the -c option. With that metric, <0.06 usually means it is 8051 code, while 0.06-0.12 means it is probably either 8051 code with some data in it (like a jump table) or unusual code (maybe a small set of instructions repeated many times). Note that random data only reaches roughly 0.25, so the Kullback-Leibler divergence might not be very reliable.
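For readers unfamiliar with the metric, here is a minimal Python sketch of a Kullback-Leibler divergence between a block's byte histogram and a reference distribution. It makes several simplifying assumptions: every byte is counted as an opcode (at51 steps through instructions instead), the reference is a plain list of 256 probabilities, and the logarithm is base 2. Whatever scaling at51 applies to arrive at its 0-to-1 range is not reproduced, so absolute values from this sketch are not comparable to the thresholds above.

```python
import math
from collections import Counter

def kl_divergence(block, reference, eps=1e-9):
    """D(P || Q) in bits, where P is the block's byte histogram and Q the reference."""
    total = len(block)
    divergence = 0.0
    for opcode, count in Counter(block).items():
        p = count / total
        q = max(reference[opcode], eps)  # guard against zero reference probabilities
        divergence += p * math.log2(p / q)
    return divergence

uniform = [1 / 256] * 256
same_as_reference = kl_divergence(bytes(range(256)), uniform)
all_zero_bytes = kl_divergence(bytes(16), uniform)
```

A block that matches the reference exactly gives 0, while a block made of a single repeated byte diverges strongly from a uniform reference.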

An alternative is a chi-squared test on the distribution of opcodes, whose value is unbounded and can exceed 1. As a downside, it is harder to say which ranges usually indicate 8051 code, as that changes, for example, with the block size. It is useful for comparing the 8051-ness of different blocks and is normally more reliable than the Kullback-Leibler divergence in that case. Also note that I have no experience in statistics, so I may be doing things wrong.

One can also set the default metric that is used when no option is given via the config field stat_mode, with one of AlignedJump, SquareChi or KullbackLeibler.
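The dependence on block size can be seen in a small Python sketch of Pearson's chi-squared statistic. This is the textbook formula, not necessarily what at51 computes; as a simplification, every byte is treated as an opcode and the reference is a list of 256 probabilities.

```python
from collections import Counter

def chi_squared(block, reference):
    """Pearson's chi-squared statistic of a block's byte counts against a reference."""
    total = len(block)
    counts = Counter(block)
    statistic = 0.0
    for opcode, q in enumerate(reference):
        expected = q * total
        if expected > 0:  # opcodes the reference never expects are skipped
            statistic += (counts[opcode] - expected) ** 2 / expected
    return statistic

# Reference that expects only the bytes 0x00 and 0x01, equally often.
reference = [0.5, 0.5] + [0.0] * 254
small_block = chi_squared(bytes([0, 0, 0, 1]), reference)
large_block = chi_squared(bytes([0, 0, 0, 1] * 2), reference)
```

Both blocks have the same proportions of bytes, yet the statistic doubles from 1.0 to 2.0 when the block is twice as long, which is why a fixed threshold is hard to give.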

I normally do not need the second or third option (Kullback-Leibler or chi squared) and they exist mostly because I didn't implement the first test until later.

One can use the output as the input for gnuplot, for example with

at51 stat path/to/firmware | gnuplot -p -e "plot '-' with lines"

base

This application tries to determine the load address of a firmware image (which in the best case only includes the actual firmware that will be on the device). It loads the first 64k of a given file and, for each offset from 0 to 0x10000, determines how many ljmps/lcalls jump right behind ret instructions, as that is where new functions normally start. The offset can also be interpreted cyclically inside the 16-bit space (with -c), which means that at offset 0xffe0, the first 0x20 bytes are loaded at 0xffe0-0xffff and the rest is loaded at the start of the address space. The likeliness in the output is the number of jumps and calls that target instructions right behind rets, as in this example:

Index by likeliness:
	1: 0x3fe0 with 218
	2: 0xc352 with 89
	3: 0xd096 with 87

Here the most likely load location is 0x3fe0, as it has 218 fitting ljmp/lcall instructions, in contrast to only 89 and 87 for the second and third candidates. In this example, the load address of 0x3fe0 is caused by a 0x20 byte header; the code itself starts at 0x4000.

Normally, acall/ajmp are ignored, since they introduce a lot of noise from non-code data (1/16th of the 8051's opcodes are acall/ajmp). They can be enabled with the -a flag, but make sure that noisy/non-8051 parts of the file (as detectable with entropy and the stat application) are zeroed out.
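The scoring idea can be sketched in a few lines of Python. This is a brute-force illustration under simplifying assumptions, not at51's algorithm: every byte equal to 0x02 (ljmp) or 0x12 (lcall) is treated as an instruction without any disassembly, only the plain ret opcode 0x22 is considered, and the function name score_load_addresses is made up for this example.

```python
from collections import Counter

LJMP, LCALL, RET = 0x02, 0x12, 0x22

def score_load_addresses(image, cyclic=False):
    """Count, per candidate load address, the ljmp/lcall targets that land right behind a ret."""
    # File offsets directly following a ret byte.
    after_ret = [i + 1 for i, b in enumerate(image) if b == RET and i + 1 < len(image)]
    # 16-bit big-endian operands of everything that looks like ljmp/lcall.
    targets = [
        (image[i + 1] << 8) | image[i + 2]
        for i in range(len(image) - 2)
        if image[i] in (LJMP, LCALL)
    ]
    scores = Counter()
    for target in targets:
        for offset in after_ret:
            base = target - offset  # load address that would make this target fit
            if cyclic:
                scores[base & 0xFFFF] += 1
            elif base >= 0:
                scores[base] += 1
    return scores

# lcall 0x4004; ret; ljmp 0x4008; ret; nop; ret
image = bytes([0x12, 0x40, 0x04, 0x22, 0x02, 0x40, 0x08, 0x22, 0x00, 0x22])
best_base, best_score = score_load_addresses(image).most_common(1)[0]
```

For this tiny image, both the call and the jump fit only when the image is loaded at 0x4000, so that address gets the highest score of 2.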

One can also use multiple firmware images where one knows that they are loaded at the same location (useful for smaller images where also different revisions exist), in which case the arithmetic mean of the fitting instructions on each offset is calculated.

libfind

This application loads some libraries given by the user and tries to find the standard library functions inside the firmware. Right now, OMF-51 libraries from C51 (which is the compiler of most firmwares in my experience) and sdld libraries from sdcc are supported.

In general, library files contain some bytes of the library functions and then some "fixup" locations which are changed at linking time and are often targets of jumps. They are normally divided into different segments; each segment can have public symbols defined for itself, and each fixup location can reference other segments by id or public symbol.

For each segment, its occurrences are found by comparing the bytes of the non-fixup locations against each possible location in the firmware. The program then tries to verify that each occurrence is actually the segment by following the fixups (which can be done by reading the values in the firmware at the fixup locations) and determining whether the referenced segments are at the targets referenced by the firmware.

The public symbols of each matching segment are then output, along with their location and sometimes a description. If a segment references another segment that is not there, it is output in square brackets to signify that. Conversely, if a segment is referenced by another segment but is not actually there itself, it is output in parentheses (this is mostly useful for finding main, as it cannot be included in the libraries, but is referenced). If there are multiple segments matching at the same location, but some match better (nothing > square brackets > parentheses), only the ones that match best are output.

To illustrate this, consider these three segments:

segment 0: 01 23 45 XX XX 67
           public symbol: "sym1"
           fixup XX XX: 16-bit absolute code reference to segment 1
segment 1: 89 AB CD EF
segment 2: 01 23 45 00 08
           public symbol: "sym2"

And then the code

      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000: 02 25 54 01 23 45 00 12 67 52 36 14 46 39 45 23
0010: 00 00 89 AB CD EF 33 01 23 45 00 08 67 25 34 12

The program would search for the segments and would find segment 0 at locations 0x03 and 0x17, segment 1 at location 0x12 and segment 2 at location 0x17. It would then verify the fixups for all segment occurrences:

  • The segment 0 at location 0x03 has 00 12 at the fixup location, which interpreted as an absolute 16-bit address points to 0x0012, where segment 1 is. Thus it is valid.
  • The segment 0 at location 0x17 points to 0x0008, however there is no occurrence of segment 1 there, so it is put in brackets.
  • The segment 1 is valid, but has no public symbol and thus is not output. This is mostly the case with auxiliary segments inside a module, and outputting them would not really give any insight.
  • The segment 2 is valid and has sym2 as public symbol. It overshadows the occurrence of segment 0 at the same location, as that one does not have valid references.

The output would then be

Address | Name                 | Description
0x0003    sym1
0x0017    sym2
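The worked example above can be reproduced with a short Python sketch. It is only an illustration of the matching and fixup verification steps, not the actual libfind code: the dictionary layout for segments is made up, only 16-bit absolute fixups are handled, and the parentheses case (referenced but missing segments) is left out.

```python
code = bytes.fromhex(
    "02 25 54 01 23 45 00 12 67 52 36 14 46 39 45 23"
    "00 00 89 AB CD EF 33 01 23 45 00 08 67 25 34 12"
)

# None marks a fixup byte; fixups are (offset in segment, referenced segment id).
segments = {
    0: {"pattern": [0x01, 0x23, 0x45, None, None, 0x67], "symbol": "sym1", "fixups": [(3, 1)]},
    1: {"pattern": [0x89, 0xAB, 0xCD, 0xEF], "symbol": None, "fixups": []},
    2: {"pattern": [0x01, 0x23, 0x45, 0x00, 0x08], "symbol": "sym2", "fixups": []},
}

def find(pattern, data):
    """All offsets where the non-fixup bytes of pattern match data."""
    return [
        i for i in range(len(data) - len(pattern) + 1)
        if all(p is None or data[i + j] == p for j, p in enumerate(pattern))
    ]

def report(data, segs):
    occurrences = {sid: find(seg["pattern"], data) for sid, seg in segs.items()}
    best = {}  # address -> (rank, names); rank 0 = valid, 1 = square brackets
    for sid, seg in segs.items():
        if seg["symbol"] is None:
            continue  # segments without a public symbol are not output
        for addr in occurrences[sid]:
            valid = all(
                int.from_bytes(data[addr + off:addr + off + 2], "big") in occurrences[ref]
                for off, ref in seg["fixups"]
            )
            rank = 0 if valid else 1
            name = seg["symbol"] if valid else f"[{seg['symbol']}]"
            current = best.get(addr)
            if current is None or rank < current[0]:
                best[addr] = (rank, [name])  # a better match overshadows worse ones
            elif rank == current[0]:
                current[1].append(name)
    return {addr: names for addr, (_, names) in sorted(best.items())}

result = report(code, segments)
```

Here result maps 0x0003 to sym1 and 0x0017 to sym2; the bracketed [sym1] at 0x0017 is dropped because sym2 matches better at the same address.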

For C51, the relevant libraries are of the form C51*.LIB (not C[XHD]51*.LIB) and can currently be found on the internet just by searching for them (one name that might pop up is C51L.LIB), but you can of course also try to download the trial version of C51 to get the libraries from there.

When searching for functions in a C51-compiled firmware, one thing that will often pop up is a [?C_START] and a (MAIN). This is because the compiler inserts a function called ?C_START before main, which loads variable values from a data structure that can be read by at51 kinit. ?C_START is in square brackets because it references MAIN, which of course is not a library function; for the same reason, (MAIN) is in parentheses.

For sdcc, the relevant libraries are normally found at /usr/share/sdcc/lib/{small,small-stack-auto,medium,large,huge}/ if you have a Linux sdcc installation. Note that noise with sdcc libraries might be higher, as the fixup locations in the library files do not specify whether the target is in the code, imem etc. address space.

It is recommended to align the file to its load address before using this, since absolute locations may fail to verify otherwise. Segments shorter than 4 bytes are not output, since they produce a lot of noise and don't really add any information.

A list of libraries to use if no others are given as argument can be specified in the config using the field "libraries" containing a list of library paths.

Example (on some random wifi firmware)

With at51 libfind some_random_firmware /path/to/lib/dir/:

Address | Name                 | Description
0x4220    ?C?CLDOPTR             char (8-bit) load from general pointer with offset
0x424d    ?C?CSTPTR              char (8-bit) store to general pointer
0x425f    ?C?CSTOPTR             char (8-bit) store to general pointer with offset
0x4281    ?C?IILDX              
0x4297    ?C?ILDPTR              int (16-bit) load from general pointer
0x42c2    ?C?ILDOPTR             int (16-bit) load from general pointer with offset
0x42fa    ?C?ISTPTR              int (16-bit) store to general pointer
0x4319    ?C?ISTOPTR             int (16-bit) store to general pointer with offset
0x4346    ?C?LOR                 long (32-bit) bitwise or
0x4353    ?C?LLDXDATA            long (32-bit) load from xdata
0x435f    ?C?OFFXADD            
0x436b    ?C?PLDXDATA            general pointer load from xdata
0x4374    ?C?PLDIXDATA           general pointer post-increment load from xdata
0x438b    ?C?PSTXDATA            general pointer store to xdata
0x4394    ?C?CCASE              
0x43ba    ?C?ICASE              
0x46f5    [?C_START]            
0x50e1    (MAIN)                

Descriptions are available for some symbol names that follow a general form.

kinit

This application is very specific to C51-generated code in that it decodes a specific data structure used to initialize memory values on startup. The structure is read by the ?C_START procedure, and the location of the structure can therefore usually be found by running libfind and looking at the two bytes after the start of ?C_START (since it starts with a mov dptr, #structure_address). When ?C_START is shown in parentheses, as (?C_START), this is probably not the case, as ?C_START is referenced by the ljmp at location 0 in the Keil libraries, which happens to be the instruction at the start of most 8051 firmwares even if there is no ?C_START function.
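Reading those two bytes can be sketched in Python as follows. The sketch assumes the firmware file is aligned so that file offsets equal code addresses, and relies on mov dptr, #imm16 being encoded as the opcode 0x90 followed by the high and low byte of the immediate; the function name and the sample bytes are made up for illustration.

```python
MOV_DPTR_IMM16 = 0x90  # 8051 opcode of "mov dptr, #imm16"

def init_struct_address(firmware, c_start):
    """Read the structure address from the mov dptr at the start of ?C_START."""
    if firmware[c_start] != MOV_DPTR_IMM16:
        raise ValueError("?C_START does not begin with mov dptr, #imm16")
    # The 16-bit immediate is stored high byte first.
    return int.from_bytes(firmware[c_start + 1:c_start + 3], "big")

# Hypothetical firmware: four filler bytes, then "mov dptr, #0x1234" at offset 4.
firmware = bytes([0x00, 0x00, 0x00, 0x00, 0x90, 0x12, 0x34])
address = init_struct_address(firmware, 4)
```

With a real firmware, c_start would be the address that libfind reports for [?C_START], and the resulting address is the location of the init structure.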

Example

With at51 kinit -o offset some_random_firmware:

bit 29.6 = 0
idata[0x5a] = 0x00
xdata[0x681] = 0x00
xdata[0x67c] = 0x00
xdata[0x692] = 0x00
xdata[0x6aa] = 0x01
xdata[0x46f] = 0x00
bit 27.2 = 0
bit 27.0 = 0
bit 26.3 = 0
bit 26.1 = 0
xdata[0x47d] = 0x00
xdata[0x40c] = 0x00
bit 25.3 = 0
xdata[0x46d] = 0x00
idata[0x5c] = 0x00
xdata[0x403..0x40a] = [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
xdata[0x467] = 0x00

Config

A (rudimentary) config file in JSON format can be created at $CONFIG_PATH/at51/config.json, where $CONFIG_PATH depends on the OS. The following paths are normally used:

  • ~/.config for Linux
  • ~/Library/Preferences for macOS
  • ~/AppData/Roaming for Windows

Example config:

{
	"libraries": [
		"/usr/share/sdcc/lib/small",
		"/usr/share/sdcc/lib/medium",
		"/usr/share/sdcc/lib/large",
		"/usr/share/sdcc/lib/huge",
		"/opt/C51/LIB"
	],
	"stat_mode": "AlignedJump"
}
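Before saving a config, it can be handy to check it with a few lines of Python. This sketch only validates the two fields mentioned in this README; the function name parse_config is made up, and falling back to AlignedJump when stat_mode is missing is an assumption based on it being the default test.

```python
import json

VALID_STAT_MODES = {"AlignedJump", "SquareChi", "KullbackLeibler"}

def parse_config(text):
    """Return (libraries, stat_mode) from a config.json string, or raise ValueError."""
    config = json.loads(text)
    libraries = config.get("libraries", [])
    stat_mode = config.get("stat_mode", "AlignedJump")  # assumed fallback
    if not all(isinstance(path, str) for path in libraries):
        raise ValueError("libraries must be a list of path strings")
    if stat_mode not in VALID_STAT_MODES:
        raise ValueError(f"unknown stat_mode: {stat_mode}")
    return libraries, stat_mode

example = '{"libraries": ["/usr/share/sdcc/lib/small", "/opt/C51/LIB"], "stat_mode": "SquareChi"}'
libraries, stat_mode = parse_config(example)
```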

at51's Issues

error: could not compile `at51`.

Hello,

I get an error when I do the cargo build:

Compiling at51 v0.4.2 (/home/toto/at51)
error: no rules expected the token `,`
   --> src/stat.rs:133:80
    |
133 |                 InsType::LJMP | InsType::LCALL | InsType::AJMP | InsType::ACALL,
    |                                                                                ^ no rules expected this token in macro call

error: aborting due to previous error

error: could not compile `at51`.

I think we just have to remove the comma in src/stat.rs at line 133, and it works.

BR,

Use another name than area8051

I just noticed that the name "area8051" is already taken (by another rust 8051 project using a backronym too, what are the chances).

Fails to build on Debian 10.3

ii cargo 0.35.0-2 amd64 Rust package manager
ii rustc 1.34.2+dfsg1-1 amd64 Rust systems programming language

$ cargo build
Updating crates.io index
Downloaded clap v2.33.0
Downloaded bitflags v1.2.1
Downloaded rustfft v3.0.1
Downloaded serde_json v1.0.48
Downloaded dirs v2.0.2
Downloaded nom v5.1.1
Downloaded num-traits v0.2.11
Downloaded lazy_static v1.4.0
Downloaded serde v1.0.104
Downloaded itoa v0.4.5
Downloaded version_check v0.9.1
Downloaded atty v0.2.14
Downloaded memchr v2.3.3
Downloaded strength_reduce v0.2.3
Downloaded num-integer v0.1.42
Downloaded unicode-width v0.1.7
Downloaded ryu v1.0.3
Downloaded ansi_term v0.11.0
Downloaded dirs-sys v0.3.4
Downloaded transpose v0.1.0
Downloaded textwrap v0.11.0
Downloaded strsim v0.8.0
Downloaded vec_map v0.8.1
Downloaded num-complex v0.2.4
Downloaded ar v0.8.0
Downloaded libc v0.2.67
Downloaded autocfg v1.0.0
Downloaded serde_derive v1.0.104
Downloaded quote v1.0.3
Downloaded cfg-if v0.1.9
Downloaded syn v1.0.16
Downloaded proc-macro2 v1.0.9
Downloaded lexical-core v0.6.7
Downloaded rustc_version v0.2.3
Downloaded arrayvec v0.4.12
Downloaded unicode-xid v0.2.0
Downloaded semver v0.9.0
Downloaded semver-parser v0.7.0
Downloaded nodrop v0.1.14
Downloaded static_assertions v0.3.4
Compiling proc-macro2 v1.0.9
Compiling unicode-xid v0.2.0
Compiling semver-parser v0.7.0
Compiling autocfg v1.0.0
Compiling syn v1.0.16
Compiling ryu v1.0.3
Compiling libc v0.2.67
Compiling arrayvec v0.4.12
Compiling bitflags v1.2.1
Compiling version_check v0.9.1
Compiling memchr v2.3.3
Compiling serde v1.0.104
Compiling nodrop v0.1.14
Compiling static_assertions v0.3.4
Compiling cfg-if v0.1.9
Compiling unicode-width v0.1.7
Compiling ansi_term v0.11.0
Compiling strsim v0.8.0
Compiling vec_map v0.8.1
Compiling itoa v0.4.5
Compiling strength_reduce v0.2.3
Compiling transpose v0.1.0
Compiling ar v0.8.0
Compiling lazy_static v1.4.0
Compiling semver v0.9.0
Compiling nom v5.1.1
Compiling num-traits v0.2.11
Compiling num-complex v0.2.4
Compiling num-integer v0.1.42
Compiling textwrap v0.11.0
Compiling rustc_version v0.2.3
Compiling lexical-core v0.6.7
Compiling dirs-sys v0.3.4
Compiling atty v0.2.14
Compiling clap v2.33.0
Compiling quote v1.0.3
Compiling dirs v2.0.2
Compiling rustfft v3.0.1
Compiling serde_derive v1.0.104
Compiling serde_json v1.0.48
Compiling at51 v0.4.0 (/ssdhome/ranma/src/at51)
error[E0277]: the trait bound std::iter::Take<std::slice::Iter<'_, num_complex::Complex<f32>>>: std::iter::DoubleEndedIterator is not satisfied
--> src/base.rs:93:18
|
93 | .rev()
| ^^^ the trait std::iter::DoubleEndedIterator is not implemented for std::iter::Take<std::slice::Iter<'_, num_complex::Complex<f32>>>

error[E0599]: no method named cycle found for type std::iter::Rev<std::iter::Take<std::slice::Iter<'_, num_complex::Complex<f32>>>> in the current scope
--> src/base.rs:94:18
|
94 | .cycle()
| ^^^^^
|
= note: the method cycle exists but the following trait bounds were not satisfied:
&mut std::iter::Rev<std::iter::Take<std::slice::Iter<'_, num_complex::Complex<f32>>>> : std::iter::Iterator
std::iter::Rev<std::iter::Take<std::slice::Iter<'_, num_complex::Complex<f32>>>> : std::iter::Iterator

error[E0277]: the trait bound std::string::String: std::convert::From<&std::string::String> is not satisfied
--> src/libfind/mod.rs:116:23
|
116 | name: String::from(s),
| ^^^^^^^^^^^^ the trait std::convert::From<&std::string::String> is not implemented for std::string::String
|
= help: the following implementations were found:
<std::string::String as std::convert::From<&'a str>>
<std::string::String as std::convert::From<std::borrow::Cow<'a, str>>>
<std::string::String as std::convert::From<std::boxed::Box>>
= note: required by std::convert::From::from

error: aborting due to 3 previous errors

Some errors occurred: E0277, E0599.
For more information about an error, try rustc --explain E0277.
error: Could not compile at51.

How to build/compile/use Linux?

I've tried both downloading the application, to which I get:

./at51: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./at51)

And git cloning but can't find instructions to build. How can I use this tool?

Additional documentation please

Hi! I'm very pleased to see this tool. Thank you for making it.

Unfortunately I am unclear on how to interpret the results. For example, the stat command says it "Shows statistical information about 8051 instruction frequency". There is also something in the README that explains that lower numbers are more indicative of 8051. But I'm still not sure what the two columns mean. I believe the first is an opcode, and maybe the second gives the chi-square value vs. some "expected" distribution? What sort of distributions would make you feel confident it was 8051 code? An example might help.

I have a similar concern about the libfind subcommand, where I am getting some output that I assume means you've detected calls to library functions in the .LIB files I supplied. But I don't have the experience to determine whether it's accidental... maybe an example there? And/or add call counts - I think if I saw (MAIN) being called once but ?C?ULCMP multiple times I might assume it was legitimate 8051 code and not noise... Thank you!

Weird results on RTD2719W firmware

at51 libfind is one of my standard tools when reverse-engineering 8051 firmwares. When working on a firmware binary for RTD2719W, I get strange results, though.

Binary (well, actually just the first bank):
rtd_000000.bin.zip

Running libfind with Keil's C51.LIB produces good results:

$ at51 libfind rtd_split/rtd_000000.bin C51C.LIB
Address | Name                 | Description
0x00fe    ?C?COPY               
...
0x079d   [?C_START]             
0x0a7b   (MAIN)                 
0x42da    ?C?PLDXDATA            general pointer load from xdata
0x5be6    ?C?PLDCODE             general pointer load from code space

Then I built a project with a veeeery similar codebase (and many matching functions) as a C51 library using Keil UV5: linsn_2796Code
Resulting lib:
RL6193_Project.LIB.zip

Running libfind produces repeating blocks of a bunch of functions and only very few good matches: (full output: libfind.txt)

$ at51 libfind rtd_split/rtd_000000.bin RL6193_Project.LIB
Address | Name                 | Description
0x0006   (RTDFACTORYOSDFUNCDISABLEOSD) 
0x0006   (SCALEROSDENABLEOSD)   
0x0006   (SCALERSYNCTMDSHPDTOGGLEPROC) 
0x0006   (USERCOMMONEEPROMSAVEMODEUSERDATA) 
0x0006   (USERDDCCIHANDLER)     
0x0006   (_SCALERCOLORDCCNORMALIZEFACTORADJUST) 
0x0006   (_SCALERCOLORSIXCOLORINITIALNORMAL) 
0x0006   (_SCALERMCUUARTWRITE)  
0x0006   (_SCALEROSDSETTRANSPARENCY) 
0x0006   (_SCALEROSDWINDOWDISABLE) 
0x0016   (SCALERAUDIODIGITALAUDIOINITIAL) 
0x0016   (SCALERFRCINITIAL)     
0x0016   (USERCOMMONAUTOCONFIG) 
0x00fe   (RTDFACTORYOSDFUNCDISABLEOSD) 
0x00fe   (SCALEROSDENABLEOSD)   
0x00fe   (SCALERSYNCTMDSHPDTOGGLEPROC) 
0x00fe   (USERCOMMONEEPROMSAVEMODEUSERDATA) 
0x00fe   (USERDDCCIHANDLER)     
0x00fe   (_SCALERCOLORDCCNORMALIZEFACTORADJUST) 
...
0x5dbe    SCALERTMDSRX1TMDSVIDEODETECTEVENT  
0x5dbe    SCALERTMDSRX2TMDSVIDEODETECTEVENT  
0x5dbe    SCALERTMDSRX4TMDSVIDEODETECTEVENT  
0x5dbe    SCALERTMDSRX5TMDSVIDEODETECTEVENT  
0x5e80   (SCALERAUDIODIGITALAUDIOINITIAL) 
0x5e80   (SCALERFRCINITIAL)     
0x5e80   (USERCOMMONAUTOCONFIG) 
0x5f4c    _SCALERGLOBALCRYSTALCLKSEL  
0x5f5d    SCALERAUDIODACCLRSTATECHANGE  
0x5f7e   (SCALERPINSHAREPOWERONRESET) 
0x5fa6    GETINPUTCAPFOUNTION   
0x5fa6    SCALERCOLORPCMGETTABLEBANK  
0x5fa6    SCALERDEBUGGETDDCCIDEBUGMODE  
0x5fa6    SYSMODEGETDISPLAYMODE  
0x5fa6    USERINTERFACEGETSHARPNESSCOEFBANKNUM  
0x5fac    SCALERMCUCACHEINITIAL  
0x5fbb    SCALERDPTXHDCPTIMEOUTFORVREADYEVENT  
0x5fc3    _SCALERTIMERDELAYXMS  
...

Any idea, what's wrong here?
