In this project directory you will find a c++ application that displays CPU-based Concurrency and Parallelism within a Blockchain Simulator that uses the SHA-256 algorithm for hashing.
The aim of this coursework dealt with improving the performance of a blockchain simulator via parallel techniques such as OpenMP, algorithmic skeletons or my chosen implementation multi-threading.
Blockchain is a current technology that is trying to solve the problem with sending money digitally. Currently, transactions can take days to process along with a fee, this is where Blockchain comes in. Blockchain allows for transactions to be sent and received instantly with transactions being able to be tracked by anyone via a "public ledger" to look for false transactions.
The steps of how a Blockchain Simulator would work are as followed:
- The Difficulty of the Blockchain is established.
- An Initial Block (Genesis Block) is added to the chain and its content is hashed.
- When another Block is added to the chain it points back to the previous Block's hash in-order to check for any unauthorised changes, but before that is must be "signed".
- In-order for a Block to be signed (hash starts with the correct number of 0's, established via the Difficulty) a new value must be created ("Nonce").
- The Nonce allows for the hash of the Block to be changed without the data being affected, this random value is incremented and the Block re-hashed until it is regarded as signed.
- Once the Block has been signed, it can successfully be added to the chain.
For each iteration of the mining process the function mine_block is called, this deals with dividing the mining process among multiple threads, timing said process and retrieving the correct hash value from the mine_hash using Promises and Futures:
void block::mine_block(uint32_t difficulty) noexcept
{
// Creating a Promise/Future to get the correct Hash value from the mine_hash function.
promise<string> p_hash;
future<string> f_hash = p_hash.get_future();
// Creation of Vector for threads
vector<thread> threads;
// Creating the Atomic Bool and setting it to False
auto found = make_shared<atomic_bool>(false);
// Starts timer
auto start = system_clock::now();
// For loop to start n threads
for (size_t i = 0; i < num_threads; ++i)
{
threads.push_back(thread(&block::mine_hash, this, difficulty, 0, &p_hash, found));
}
// Join all the threads
for (auto &t : threads)
{
t.join();
}
// Ends timer
auto end = system_clock::now();
// Works out duration
duration<double> diff = end - start;
// Gets the future value for the hash
_hash = f_hash.get();
cout << "Block mined: " << _hash << " in " << diff.count() << " seconds " << "using " << num_threads << " Thread/s " << endl;
}
The mine_hash function is built around a for loop that compares the current calculated hash to a set variable that determines how many zeros (established by the difficulty) the hash should start with:
void block::mine_hash(uint32_t difficulty, uint64_t nonce, std::promise<std::string> * p_hash, std::shared_ptr<std::atomic_bool> found)
{
// Used for comparing hash
string str(difficulty, '0');
// Used so each thread can calculate their own hash (independantly)
string t_hash;
for (int i = 0; !*found; i += num_threads)
{
// If the current hash is correct
if (t_hash.substr(0, difficulty) == str)
{
// Set Found to True and Pass the Value (hash)
(*found) = true;
(*p_hash).set_value(t_hash);
}
else
{
// Increment Nonce and re-calc hash.
++_nonce;
t_hash = calculate_hash(_nonce);
}
}
}
An Atomic Boolean (found) is used to halt threads when the correct hash has been found, if the current hash in rotation is not current the nonce value will be incremented, and the hash will be recalculated. Each thread increments the nonce and re-calculates the hash independently, this allows for not just a faster computation time but also helps to make sure that each thread is not doing work that has already been done by another thread (stepped by the number of threads the given machine has).
The increase in performance after the implementation of parallel techniques is clear, both sets of tests were running on the same data and difficulty (6).
Method | Genesis Block Time (s) | Ten Block Time (s) |
---|---|---|
Parallel | 2.25 | 230.33 |
Sequential | 7.53 | 704.37 |
When we reach a high thread count, we can see a small flat line in the performance, however it is clear that there is a relationship between the number of threads and the speed-up the application gains in comparison to the base readings (sequential):
Thread Count | Genesis Block Time (s) | Ten Block Time (s) |
---|---|---|
8 | 2.25 | 230.33 |
7 | 2.29 | 237.60 |
6 | 2.30 | 245.01 |
5 | 2.56 | 265.88 |
4 | 2.82 | 293.81 |
3 | 3.16 | 345.02 |
2 | 3.78 | 456.61 |
1 | 7.53 | 704.37 |
The flat line in performance may be due to the application reaching the potential cap of speed-up increase (3.06).
During my investigation it became apparent that that the given blockchain algorithm scaled well with parallel techniques, the progression in speed-up scaled nicely with maximum performance providing over three times more speed in comparison to the base application.
- Optimisation of the SHA-256 algorithm implementation.
- OpenMP Approach.