
DF-SCA-Dynamic-Frequency-Side-Channel-Attacks-are-Practical

We present DF-SCA, a software-based dynamic frequency side-channel attack on Linux and Android OS devices. We show that the Dynamic Voltage and Frequency Scaling (DVFS) feature in modern systems can be exploited to perform website fingerprinting attacks against the Google Chrome and Tor browsers on modern Intel, AMD, and ARM architectures. Moreover, we extract keystroke patterns from frequency readings, reaching 95% accuracy in distinguishing keystrokes from other activities on Android phones.

Experimental Setup:

  • Intel Comet Lake Microarchitecture

    • CPU Model: Intel(R) Core (TM) i7-10610U CPU @ 1.80GHz
    • OS: Ubuntu 20.04 LTS
    • Linux Kernel: 5.11.0-46-generic
    • Google Chrome version 85.0.4183.102
    • Tor browser version 10.5.10
  • Intel Tiger Lake Microarchitecture

    • CPU Model: Intel(R) Core (TM) i7-1165G7 @ 2.80GHz
    • OS: Ubuntu 20.04.4 LTS
    • Linux Kernel: 5.13.0-44-generic
    • Google Chrome version 101.0.4951.64
    • Tor browser version 10.5.10
  • AMD Ryzen 5 Microarchitecture

    • CPU Model: AMD Ryzen 5 5500U CPU @ 1.70GHz
    • OS: Ubuntu 20.04.4 LTS
    • Linux Kernel: 5.13.0-44-generic
    • Google Chrome version 101.0.4951.64
    • Tor browser version 10.5.10
  • ARM Cortex-A Microarchitecture

    • CPU Model: Four ARM Cortex-A53 and Four ARM Cortex-A73 cores
    • OS: Android 9
    • Google Chrome version 97.0.4692.98
    • Bank of America application version 21.11.04
  • Additional

    • Nvidia GeForce RTX 3090 GPU card
    • Software: MATLAB R2021a

Data Collection:

To collect CPU frequency traces for different websites, follow the steps below:

  • Check the current scaling governor on the victim's device:
    cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

  • For the offline phase, the attacker can change the scaling governor on their own device to match the victim's device and collect data that will be used to train the ML model. The command for changing the current scaling governor to ondemand mode:
    echo "ondemand" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  • Data collection for Chrome Browser: In this study, we collected 100 CPU frequency measurements for each of 100 different websites (see the website list). We record 100 individual measurements for 20 websites in a single run and save them in a single csv file. The data_chrome1.sh script records data for the first 20 websites and saves it in web1_20.csv; similarly, data_chrome2.sh collects data for the next 20 websites and saves it in web21_40.csv. By following the commands below, five csv files will be created in this directory; we refer to them as the raw data. (A minimal sketch of the underlying frequency-polling loop is given at the end of this section.)

    cd ~
    cd "Data Collection"/
    script -a web1_20.csv
    ./data_chrome1.sh
    exit
    script -a web21_40.csv
    ./data_chrome2.sh
    exit
    script -a web41_60.csv
    ./data_chrome3.sh
    exit
    script -a web61_80.csv
    ./data_chrome4.sh
    exit
    script -a web81_100.csv
    ./data_chrome5.sh
    exit

  • Preprocessing: Once data collection is complete, the next step is to pre-process the data. Each csv file contains data for 20 websites, and each website has 100 measurements with 1000 samples per measurement. Data_process.m reads the collected raw data, performs the pre-processing, and splits it into training, validation, and test datasets. By running Data_process.m in MATLAB, the processed data will be saved in the Data directory. This processed data is fed into the prediction models in the later experiments.

  • Data collection for Tor Browser: The procedure remains the same for the Tor Browser. For the Tor Browser scenario, follow the commands below:
    cd ~
    cd "Data Collection"/
    script -a tor1_20.csv
    ./data_tor1.sh
    exit
    script -a tor21_40.csv
    ./data_tor2.sh
    exit
    script -a tor41_60.csv
    ./data_tor3.sh
    exit
    script -a tor61_80.csv
    ./data_tor4.sh
    exit
    script -a tor81_100.csv
    ./data_tor5.sh
    exit

Note that we collected 3000 samples per measurement for the Tor Browser, since webpages take comparatively more time to load in the Tor Browser. The website list for the Tor Browser scenario also differs from the Chrome list, because some webpages do not work in the Tor Browser due to various restrictions.

  • Preprocessing: To preprocess the raw data, run the Data_process_tor.m file in MATLAB, which will create the processed data and save it in the Data directory as before.

After preprocessing the data, the final data for different devices are allocated to the appropriate folders.
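
For reference, each collection script essentially polls the unprivileged cpufreq sysfs interface while the target activity runs. The following is a minimal Python sketch of such a polling loop; it only illustrates the idea behind the data_chrome*.sh / data_tor*.sh scripts, and the core index, sample count, polling interval, and output file name are placeholders rather than the values used in the artifact.

    #!/usr/bin/env python3
    # Minimal illustrative DVFS trace collector: repeatedly reads the per-core
    # scaling_cur_freq file and appends one comma-separated trace to a csv file.
    import time

    FREQ_FILE = "/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq"  # frequency in kHz
    NUM_SAMPLES = 1000   # samples per measurement (placeholder; the Tor setup used 3000)
    INTERVAL = 0.01      # polling period in seconds (placeholder)

    def read_freq_khz():
        with open(FREQ_FILE) as f:
            return int(f.read().strip())

    def collect_trace(out_path="trace.csv"):
        samples = []
        for _ in range(NUM_SAMPLES):
            samples.append(read_freq_khz())
            time.sleep(INTERVAL)
        with open(out_path, "a") as out:
            out.write(",".join(str(s) for s in samples) + "\n")

    if __name__ == "__main__":
        collect_trace()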

Website Fingerprinting:

Website fingerprinting is carried out on four different devices; the procedure and explanations are the same for all of them. As an example, we explain how to reproduce the results for AMD Ryzen 5. The directories are named after the different microarchitectures: AMD Ryzen 5, Intel Comet Lake, Intel Tiger Lake, and ARM Cortex A-73.

The Data Collection directory contains the relevant scripts for collecting CPU frequency data based on Algorithm 1 presented in the paper.

The Universal ML directory contains the code used to generate Table 5, which demonstrates the universal ML concept explained in Section 9: Discussion.

In the following section, we explain the case study for AMD Ryzen 5. The procedure is the same for the other microarchitectures, as mentioned earlier.

cd AMD_Ryzen_5/

Google-Chrome Scenario:

Website fingerprinting is tested with the six different scaling governors available on our device. The default scaling governor of the AMD Ryzen 5 is ondemand. The folders are named after the scaling governors. As an example, we explain the scenario of the default ondemand governor; the explanation applies to the other governors as well.

  • ondemand
    • Data: The final preprocessed data, split into training, validation, and test sets.

      • X_train_100.csv: CPU frequency traces of 80 measurements from each of the 100 websites, which constitute the training dataset. Each row holds the 1000 samples of one measurement (Shape: 8000x1000).
      • Y_train_100.csv: Labels of the training data (Shape: 8000x100).
      • X_val_100.csv: CPU frequency traces of 10 measurements from each of the 100 websites, which constitute the validation dataset. Each row holds the 1000 samples of one measurement (Shape: 1000x1000).
      • Y_val_100.csv: Labels of the validation data (Shape: 1000x100).
      • X_test_100.csv: CPU frequency traces of the remaining 10 measurements from each of the 100 websites, which constitute the test dataset used in the online phase. Each row holds the 1000 samples of one measurement (Shape: 1000x1000).
      • Y_test_100.csv: Labels of the test data (Shape: 1000x100).
    • CNN_1D.py: The CNN model trained during the offline phase for website fingerprinting. It takes the training and validation data as input and makes the predictions. After training completes, the model is saved as Model1.h5. (An illustrative sketch of such a model is given after this list.)

    • Restored_model_val.py: Loads the pretrained CNN model saved as Model1.h5 and makes predictions on the validation dataset.

    • Restored_model_test.py: Loads the pretrained CNN model saved as Model1.h5 and makes predictions on the test dataset. This script is used to evaluate the performance of the prediction model during the online phase.

    • knn.py, rf.py, svm.py: The performance of the prediction model is compared against other ML algorithms: k-nearest neighbours (KNN), random forest (RF), and support vector machine (SVM).
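
As a point of reference, the following is a minimal Keras sketch of a 1D CNN that matches the data shapes above (1000 samples per trace, 100 website classes). It is only an illustration of the offline training step; the exact architecture, hyperparameters, and preprocessing in CNN_1D.py may differ, and the layer sizes below are placeholders.

    # Illustrative 1D-CNN training sketch (not the exact CNN_1D.py architecture).
    import numpy as np
    import pandas as pd
    from tensorflow import keras
    from tensorflow.keras import layers

    # Load the preprocessed traces (assumes the csv files contain no header row).
    X_train = pd.read_csv("Data/X_train_100.csv", header=None).values   # (8000, 1000)
    Y_train = pd.read_csv("Data/Y_train_100.csv", header=None).values   # (8000, 100) one-hot
    X_val = pd.read_csv("Data/X_val_100.csv", header=None).values       # (1000, 1000)
    Y_val = pd.read_csv("Data/Y_val_100.csv", header=None).values       # (1000, 100)

    # Conv1D expects a channel dimension: (measurements, samples, 1).
    X_train = X_train[..., np.newaxis]
    X_val = X_val[..., np.newaxis]

    model = keras.Sequential([
        layers.Conv1D(32, 8, activation="relu", input_shape=(1000, 1)),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, 8, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(100, activation="softmax"),   # one output per website class
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=30, batch_size=64)
    model.save("Model1.h5")   # the repository already ships a pretrained Model1.h5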

Instructions to run:

  • Offline phase:
    • The CNN_1D.py is executed to train a model using the data collected from AMD Ryzen 5. The pretrained model is saved as Model1.h5.
    • To check the accuracy on the validation dataset, run Restored_model_val.py. Note that GPU support with an appropriate environment (TensorFlow backend with Keras) is required to execute the code.
      • cd Chrome/ondemand/
      • ./Restored_model_val.py
  • Online phase:
    • Run Restored_model_test.py to obtain the accuracy on the test dataset. This accuracy is reported in Table 2 of the paper.
      • ./Restored_model_test.py
    • To check the performance of models other than our proposed CNN, run the following (an illustrative sketch of this evaluation step is given after this list):
      • ./knn.py
      • ./svm.py
      • ./rf.py
  • If you want to create a new pretrained model on a similar setup, run CNN_1D.py after updating the model name, then update the name of the loaded model in Restored_model_test.py. Note that the accuracy of a newly created pretrained model will differ slightly from the one reported in the paper. To reproduce the exact accuracy, we provide the pretrained model Model1.h5.
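
As a rough illustration of the online phase, the sketch below reloads a saved model and reports test accuracy, followed by a simple KNN baseline. It approximates what Restored_model_test.py and knn.py do; the actual scripts may use different preprocessing and parameters, and the value of k below is a placeholder.

    # Illustrative online-phase evaluation (approximates Restored_model_test.py / knn.py).
    import numpy as np
    import pandas as pd
    from tensorflow import keras
    from sklearn.neighbors import KNeighborsClassifier

    X_test = pd.read_csv("Data/X_test_100.csv", header=None).values   # (1000, 1000)
    Y_test = pd.read_csv("Data/Y_test_100.csv", header=None).values   # (1000, 100) one-hot
    y_true = Y_test.argmax(axis=1)

    # CNN: reload the pretrained model and score the test traces.
    model = keras.models.load_model("Model1.h5")
    y_pred = model.predict(X_test[..., np.newaxis]).argmax(axis=1)
    print(f"CNN test accuracy: {np.mean(y_pred == y_true):.3f}")

    # Classical baseline: fit a KNN on the training traces and score the test traces.
    X_train = pd.read_csv("Data/X_train_100.csv", header=None).values
    Y_train = pd.read_csv("Data/Y_train_100.csv", header=None).values
    knn = KNeighborsClassifier(n_neighbors=5)   # k is a placeholder
    knn.fit(X_train, Y_train.argmax(axis=1))
    print(f"KNN test accuracy: {knn.score(X_test, y_true):.3f}")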

Tor Browser Scenario:

For the Tor Browser scenario, the explanation remains the same, as the files are named in the same manner as in the Chrome Browser scenario for convenience. Although the test accuracy for the Tor browser is comparatively lower than in the Google Chrome scenario, we also provide the top-5 accuracy for the Tor browser scenario, i.e., the rate at which the correct website appears among the top 5 predictions of the ML model, as presented in Table 2. For this part, we modified Restored_model_test.py so that it automatically saves the raw predictions for the different classes. Hence, similar to Google Chrome, run the following commands to reproduce the result:

  • cd Tor/
  • ./Restored_model_val.py
    After running this Python script, the test accuracy for the Tor Browser will be printed. In addition, the raw predictions for the different classes will be saved as Raw_Prediction_cnn.txt. Later, to obtain the top-5 score reported in Table 2, run the MATLAB script confidance_score.m.

For ML models other than the CNN (for example, SVM), run ./svm.py. This will print the accuracy and save the raw predictions as Raw_prediction_svm.txt. Update confidance_score.m by replacing the Raw_Prediction_cnn.txt file with Raw_Prediction_svm.txt, then run confidance_score.m to get the top-5 score for the Tor Browser.
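
The top-5 score itself is computed by confidance_score.m; for orientation, the following is a hedged Python sketch of the same computation, assuming the raw prediction file holds one row of class scores per test measurement (the exact file layout may differ).

    # Illustrative top-5 accuracy computation (the artifact uses confidance_score.m for this).
    import numpy as np
    import pandas as pd

    scores = np.loadtxt("Raw_Prediction_cnn.txt")                     # (measurements, classes), assumed layout
    Y_test = pd.read_csv("Data/Y_test_100.csv", header=None).values   # one-hot labels
    y_true = Y_test.argmax(axis=1)

    # A prediction counts as correct if the true class is among the 5 highest-scoring classes.
    top5 = np.argsort(scores, axis=1)[:, -5:]
    top5_acc = np.mean([y_true[i] in top5[i] for i in range(len(y_true))])
    print(f"Top-5 accuracy: {top5_acc:.3f}")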

Note that, for the Tor Browser scenario, we carried out the experiment only with the default scaling governor of the device.

Universal ML Model for different CPU models:

In the previous experiments, we trained separate ML models for the Intel, AMD, and ARM architectures to obtain the highest website fingerprinting accuracy. However, it is still unclear whether the individual ML models can be replaced with a universal ML model trained on CPU frequency data from several microarchitectures. With such a model, an attacker could perform website fingerprinting with a single combined model without knowing the exact target microarchitecture. For this purpose, we first combined the CPU frequency data collected with the powersave governor on both the Intel Tiger Lake and Intel Comet Lake architectures to train a universal CNN model, and evaluated its performance on the test data. The Intel_combined directory contains the relevant data and code.

cd Universal_ML/Intel_combined/
./Restored_model_test.py

Later, we added the CPU frequency dataset from the AMD Ryzen 5 architecture, collected with the ondemand governor, and created a universal cross-architecture ML model. The Intel_amd_combined directory contains the relevant data and code.

cd Universal_ML/Intel_amd_combined/
./Restored_model_test.py

All the results are reported in Table 5 of the paper.
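
Conceptually, building the universal model only requires concatenating the per-architecture training sets before running the same CNN training as above. The sketch below illustrates that step; the directory paths are assumptions for illustration, while the actual combined data and pretrained models live in the Universal_ML directories.

    # Illustrative construction of a cross-architecture training set (paths are assumptions).
    import numpy as np
    import pandas as pd

    def load_split(data_dir):
        X = pd.read_csv(f"{data_dir}/X_train_100.csv", header=None).values
        Y = pd.read_csv(f"{data_dir}/Y_train_100.csv", header=None).values
        return X, Y

    # Stack the traces from two (or more) microarchitectures into one training set.
    X_tiger, Y_tiger = load_split("Intel_Tiger_Lake/Chrome/powersave/Data")   # assumed path
    X_comet, Y_comet = load_split("Intel_Comet_Lake/Chrome/powersave/Data")   # assumed path
    X_combined = np.concatenate([X_tiger, X_comet], axis=0)
    Y_combined = np.concatenate([Y_tiger, Y_comet], axis=0)
    # X_combined / Y_combined can then be fed to the same 1D-CNN training code sketched earlier.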

Password Detection:

In the password detection scenario, we assume that a phone user enters her password to log into her account in a banking application. Our goal is not to outperform existing work in the keystroke-attack literature, but rather to demonstrate that the DF-SCA attack has sufficient resolution and accuracy to perform a password detection attack. The Bank of America (BoA) mobile application is chosen as the target.

The CPU frequency traces for the different password profiles mentioned in Table 9 of the paper are made available in mat_files_password. Based on character length, individual frequency traces of the different passwords are collected, preprocessed, and saved as .mat files. Artifact_dataset_model_Code_Keystroke includes the data split into training and test sets, along with the proposed KNN model (a rough Python analogue of this step is sketched after the commands below).

  • cd ARM_Cortex_A-73/Keystroke/Artifact_dataset_model_Code_Keystroke/
  • Run knn_all.m in MATLAB
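
The keystroke classifier itself is the MATLAB script knn_all.m. The sketch below is only a rough Python analogue of a KNN over fixed-length keystroke frequency traces; the .mat file names and field names are hypothetical and do not describe the actual artifact layout.

    # Rough Python analogue of the keystroke KNN (the artifact uses knn_all.m in MATLAB).
    import numpy as np
    from scipy.io import loadmat
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical .mat layout: fixed-length frequency traces X and integer password labels y.
    train = loadmat("keystroke_train.mat")   # placeholder file name
    test = loadmat("keystroke_test.mat")     # placeholder file name
    X_train, y_train = train["X"], train["y"].ravel()
    X_test, y_test = test["X"], test["y"].ravel()

    knn = KNeighborsClassifier(n_neighbors=3)   # k is a placeholder
    knn.fit(X_train, y_train)
    print(f"Keystroke classification accuracy: {knn.score(X_test, y_test):.3f}")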

