This repository contains the open-source AutoFirm tool and the artifacts of our paper "AutoFirm: Automatically Identifying Reused Libraries inside IoT Firmware at Large-Scale".
The repository is organized as follows:
AutoFirm/
├── README.md
├── tool/
│   ├── main.py
│   ├── scan.py
│   ├── path_search.py
│   ├── linux_shell.py
│   ├── common.py
│   ├── bin_analysis.py
│   ├── bin_process.py
│   ├── decompress.py
│   ├── filesystem.py
│   └── regex/
│       ├── arch.json
│       ├── black_list.csv
│       ├── special_command.csv
│       ├── special_regex.csv
│       └── special_ver.csv
└── artifact/
    ├── firmware/
    ├── library/
    ├── vulnearbility/
    └── RQ/
The AutoFirm/ folder has two top-level directories: tool/ and artifact/.
The tool/ folder contains the source code of AutoFirm, which automatically downloads and decompresses firmware, identifies libraries and their versions, and detects vulnerable versions.
The tool/regex/ folder contains the regex rules for the firmware architecture, black list, special commands, special regexes, and special versions.
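As a rough illustration of how regex rules of this kind could be applied, the sketch below extracts a version string from the text a binary prints. It is a minimal sketch under stated assumptions, not AutoFirm's actual rule format: the names GENERIC_VERSION, SPECIAL_REGEX, and extract_version, and the patterns themselves, are hypothetical and are not taken from the files in tool/regex/.

```python
import re

# Hypothetical generic rule: a dotted version such as 1.30.1, optionally prefixed with "v".
GENERIC_VERSION = re.compile(r"(?<![\w.])v?(\d+(?:\.\d+){1,3}[a-z]?)")

# Hypothetical per-binary override, in the spirit of a "special regex" rule.
SPECIAL_REGEX = {
    "openssl": re.compile(r"OpenSSL\s+(\d+\.\d+\.\d+[a-z]*)"),
}


def extract_version(binary_name, output):
    """Return the first version string found in `output`, or None if nothing matches."""
    # Prefer a binary-specific rule when one exists, else fall back to the generic rule.
    pattern = SPECIAL_REGEX.get(binary_name, GENERIC_VERSION)
    match = pattern.search(output)
    return match.group(1) if match else None
```

For example, extract_version("busybox", "BusyBox v1.30.1 (2019-05-01) multi-call binary.") yields "1.30.1" with the generic rule, while the date is skipped because it contains no dotted number.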
The artifact/ folder contains the firmware, library, and vulnerability datasets, along with the research question (RQ) materials for our paper.
1. Iterate over the vendors.
2. Read vendor.json to get the firmware information (the pairing of each firmware's real name and its local storage name).
3. Use binwalk to extract the filesystem.
4. Check whether step 3 succeeded.
5. Iterate over the successfully extracted filesystems.
6. Search the /bin and /sbin directories and collect the executable binaries.
7. Use the Linux file command to get the firmware architecture.
8. Iterate over the binaries from step 6.
9. Use QEMU to emulate each binary and get its version information.
10. Write the results to an Excel file named after the vendor.
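The steps above can be sketched in Python roughly as follows. This is a simplified sketch under stated assumptions, not the code in tool/: every function name is hypothetical, the success check in find_rootfs is a guessed heuristic, the architecture-to-emulator mapping covers only a few cases, and "--version" is an assumed default flag. Reading vendor.json and writing the Excel file are omitted.

```python
import os
import subprocess


def extract_firmware(firmware_path, out_dir):
    """Step 3: run binwalk in extraction mode (-e), writing into out_dir (-C)."""
    result = subprocess.run(["binwalk", "-e", "-C", out_dir, firmware_path],
                            capture_output=True, text=True)
    return result.returncode == 0


def find_rootfs(extract_dir):
    """Step 4 (assumed heuristic): success means some directory holds bin/ or sbin/."""
    for dirpath, dirnames, _ in os.walk(extract_dir):
        if "bin" in dirnames or "sbin" in dirnames:
            return dirpath
    return None


def find_executables(rootfs):
    """Step 6: collect regular files with an execute bit under bin/ and sbin/."""
    found = []
    for sub in ("bin", "sbin"):
        folder = os.path.join(rootfs, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path) and os.stat(path).st_mode & 0o111:
                found.append(path)
    return found


def qemu_for(file_output):
    """Step 7: map the text printed by `file` to a QEMU user-mode emulator name."""
    little = "LSB" in file_output
    if "aarch64" in file_output:
        return "qemu-aarch64"
    if "ARM" in file_output:
        return "qemu-arm" if little else "qemu-armeb"
    if "MIPS" in file_output:
        return "qemu-mipsel" if little else "qemu-mips"
    return None  # architecture not covered by this sketch


def version_command(qemu, rootfs, binary, flag="--version"):
    """Step 9: build the emulation command; -L points QEMU at the firmware's libraries."""
    return [qemu, "-L", rootfs, binary, flag]


def probe_binary(rootfs, binary, timeout=10):
    """Steps 7 to 9 for one binary: detect the architecture, emulate, return raw output."""
    described = subprocess.run(["file", "-b", binary],
                               capture_output=True, text=True).stdout
    qemu = qemu_for(described)
    if qemu is None:
        return None
    try:
        done = subprocess.run(version_command(qemu, rootfs, binary),
                              capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return None
    return done.stdout + done.stderr
```

Keeping the architecture mapping and command construction as pure functions makes them easy to check without binwalk or QEMU installed; only extract_firmware and probe_binary call external tools.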
This dataset is released under the Apache-2.0 license. You're welcome to use it with attribution.
We will regularly add new packages to the dataset.
Every vulnerable library version should be matched with a CVE ID, and each one has been manually triaged.
At this time, the repository is not accepting contributions.