
archie's People

Contributors

aewag, berkayurun, e-shreve-ti, elfalko, kag499, ks0777, lukasauer, tibersam


archie's Issues

Understanding trigger address setting and handling faulted instruction addresses that occur at the beginning of a TB

I'm seeing issues when the instruction to be faulted occurs at the start of a TB and a negative trigger_address is specified for the fault. In that case, ARCHIE very frequently (perhaps always) either fails to find a trigger instruction or finds one that executes after the instruction to be faulted.

It appears to me that search_for_fault_location() searches in the wrong direction. goldenrun_tb_exec and goldenrun_tb_info are sorted such that the first index (index 0) is the last TB executed. Thus, to search earlier in the execution, the index should be increased when the TB at the current index doesn't contain the trigger location. However, on lines 62 and 88 the index is decreased.

Also, in case the trigger location is not found, the while loop should first check that idx does not equal or exceed the number of rows in the goldenrun TB data frames (or a try block should catch the ValueError raised when idx is used in the call to find_tb_info_row).
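The direction fix described above can be sketched as follows (function and column names are assumptions based on the issue text, not ARCHIE's actual code):

```python
# Hypothetical sketch: goldenrun_tb_exec is sorted newest-first, so index 0
# is the LAST TB executed and larger indices are EARLIER in the execution.
def search_backwards(goldenrun_tb_exec, start_idx, trigger_address):
    idx = start_idx
    # Bound the loop so a missing trigger cannot run past the data frame
    while idx < len(goldenrun_tb_exec):
        tb = goldenrun_tb_exec[idx]
        if tb["tb_start"] <= trigger_address < tb["tb_end"]:
            return idx
        idx += 1  # INCREASE the index to move earlier in the execution
    return None  # trigger location not found
```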

I can open a PR, but wanted to have a discussion first in case I am misunderstanding something.

Test ARCHIE as part of CI

  • Golden HDF5 result files based on generic test firmware
    • ARM based
    • RISC-V based
  • Unit tests (e.g. fault config parsing)
  • Benchmarks
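For the "unit tests" bullet, a minimal pytest-style sketch of a fault-config parsing test might look like this (parse_fault_config is a hypothetical name, not ARCHIE's actual API):

```python
import json

def parse_fault_config(text):
    # Hypothetical parser stand-in: load the JSON and check required keys
    config = json.loads(text)
    assert "faults" in config, "fault config must define 'faults'"
    return config

def test_parse_minimal_config():
    cfg = parse_fault_config('{"max_instruction_count": 100, "faults": []}')
    assert cfg["max_instruction_count"] == 100
    assert cfg["faults"] == []
```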

Use of instruction overwrite to replace an instruction before it is executed, with trigger address set to -1

I've been unable to create a fault matching the description above. It seems that setting trigger_address to [-1] together with trigger_counter to [0] is not a supported combination? Is there a way to do this? Here is an example fault description:

{
    "fault_address"   : [134217962, 134217972, 1],
    "fault_type"      : "instruction",
    "fault_model"     : "overwrite",
    "fault_livespan"  : [3],
    "fault_mask"      : [191],
    "trigger_address" : [-1],
    "trigger_counter" : [0],
    "num_bytes"       : [2]
}

Use explicit data types with pandas

Pandas may silently convert large integers (close to the 64-bit integer limit) to floats, losing precision. This is very common in the register dataset on 64-bit architectures and was fixed by #41. Similar situations elsewhere in the code base should be fixed as well. The problem is also likely to occur if a target binary is mapped to a negative memory address (typical for kernels).
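The precision loss is easy to reproduce; a short sketch of the problem and the explicit-dtype fix:

```python
import pandas as pd

# Addresses near the 64-bit limit lose precision once pandas falls back to
# float64, which has only 53 mantissa bits.
addr = 2**63 - 1
s_float = pd.Series([addr], dtype="float64")
assert int(s_float.iloc[0]) != addr  # precision lost

# Pinning an explicit integer dtype keeps the value exact.
s_int = pd.Series([addr], dtype="uint64")
assert int(s_int.iloc[0]) == addr

# For negative (kernel) addresses, a signed dtype keeps the sign intact.
s_neg = pd.Series([-0x40000000], dtype="int64")
assert s_neg.iloc[0] == -0x40000000
```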

Allow to define a timeout for each experiment

Currently ARCHIE determines the end of an experiment by looking at the number of instructions that are encountered ("max_instruction_count") or by reaching a defined address ("end").

However, there are conditions under which neither "max_instruction_count" nor the address defined in "end" is reached:

  • A bug in the QEMU machine model is triggered by the fault injection. If the bug leads to an endless loop in the host process, ARCHIE will never finish the experiments.
  • I encountered a case where an injected fault caused the value "0xFFFFFFFD" to be written to the SCTLR register (ARMv7). QEMU continued to run, but the guest code was no longer executing (probably because paging was enabled but no valid page table was present). Since QEMU did not execute any instructions, "max_instruction_count" was never reached and the experiment never finished.
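A per-experiment timeout could be bounded at the process level; a minimal sketch, assuming each experiment runs QEMU as a subprocess (qemu_cmd stands in for ARCHIE's actual invocation):

```python
import subprocess

def run_experiment(qemu_cmd, timeout_s=60):
    # Bound the experiment with a wall-clock timeout so a hung guest
    # (endless loop, MMU enabled with no valid page table, ...) cannot
    # stall the whole campaign.
    try:
        return subprocess.run(qemu_cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; mark the run as hung
        return None
```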

Regards,

Robert

Missing *analysis* folder

From the hdf5-readme.md:

Analysis
An exemplary analysis script of the hdf5 output for an AES round skip and differential fault analysis can be found in the folder analysis.

but there is no analysis folder in the repo.

Optimized filtering

Current approach: iterate through TBs sorted by size. A more efficient approach is needed.

Filter during runtime to reduce memory usage, see #11
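One possible "more efficient approach": sort the TB start addresses once and binary-search each fault address instead of scanning every TB. A hypothetical sketch, not ARCHIE's actual filtering code:

```python
import bisect

def build_index(tbs):
    # tbs: list of (start, end) address tuples; sort once up front
    tbs = sorted(tbs)
    starts = [start for start, _ in tbs]
    return starts, tbs

def contains(index, addr):
    # Binary-search for the last TB starting at or before addr,
    # then check whether addr falls inside it.
    starts, tbs = index
    i = bisect.bisect_right(starts, addr) - 1
    return i >= 0 and tbs[i][0] <= addr < tbs[i][1]
```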

Inconsistency between behavior and documentation: fault_livespan = 0

According to https://github.com/Fraunhofer-AISEC/archie/blob/master/fault-readme.md#fault_lifespan , "trigger_address" + "fault_lifespan" must be larger than or equal to 0 if "trigger_address" is negative; otherwise ARCHIE will issue a warning and remove the fault:

If the trigger_address is set to a negative number, trigger_address + fault_lifespan must be larger or equal to 0. Otherwise ARCHIE will remove the fault configuration with a warning. In addition, the fault_lifespan is automatically reduced if the trigger address is calculated to be before the start point.
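The documented rule can be expressed as a small validation sketch (field names follow the fault.json keys; the function itself is an assumption, not ARCHIE's code):

```python
def validate_fault(fault):
    # A negative trigger_address means "N instructions before the fault
    # address", so the lifespan must cover at least that distance.
    trigger = fault["trigger_address"]
    lifespan = fault["fault_lifespan"]
    if trigger < 0 and trigger + lifespan < 0:
        print("Warning: removing fault, trigger_address + fault_lifespan < 0")
        return False
    return True
```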

However, with this fault.json I don't see any warning although "trigger_address" is negative and "trigger_address" + "fault_lifespan" < 0:

{
    "max_instruction_count" : 1000,
    "start" : {
        "address" : 0xffff0000,
        "counter" : 0
    },
    "end" : {
        "address" : 0xffff06b0,
        "counter" : 0
    },
    "faults" : [
        [
            {
                "fault_address"   : [0xffff47C8, 0xffff4888, 4],
                "fault_type"      : "instruction",
                "fault_model"     : "overwrite",
                "fault_lifespan"  : [0],
                // fault_mask = bytes to insert: ARMv7 NOP
                "fault_mask"      : [0xE320F000],
                "trigger_address" : [-1],
                "trigger_counter" : [0],
                "num_bytes"       : [4]
            }
        ]
    ]
}

Also, both "fault_lifespan" and "fault_livespan" appear throughout the code, documentation, and examples, which is rather confusing.

Regards,

Robert

Faultconfig is not filtered if end is not set

If the "end" keyword is not used in the fault.json, which is valid behavior as defined in the fault-readme.md, the filtering of the faultconfig at [1] is not executed.

I am not sure if this is intentional, but I would guess filtering is wanted as well when only the "start" and "max_instruction_count" values are used.

EDIT: I may be misunderstanding how this works, as I also don't see how the goldenrun_data is created in some cases: (a) only "start" and "max_instruction_count" defined, or (b) only "max_instruction_count" defined. The max_instruction_count set by the user is not used; instead, a huge predefined number is.

[1] https://github.com/Fraunhofer-AISEC/archie/blob/master/goldenrun.py#L89
