I wrote the following code to simulate and simplify what is happening in my application.
In this simplification, I have the if and else branch that are executing the same code, but writing in a different portion of memory. So I thought to use an array of two entries, and depending the statement condition the first or the second entry is updated.
This solution brings the expected speed up.
But when during the execution we have a random access for each iteration, the improvement almost disappears. To show this bizarre behavior, I have used the template to activate or deactivate the use of if-statement and random access, i.e..
useif: true when the if-statement is used, false when the memory access is used.
rand_access: true when we have a random access for each iteration, false otherwise.
#include <chrono>
#include <iostream>
#include <vector>
#include <numeric>
#include <algorithm>
#define N 1000000000
using namespace std;
using namespace std::chrono;
template <bool useif, bool rand_access>
void exec(vector<int>& V, vector<bool>& B) {
auto start = high_resolution_clock::now();
int sum[2], sum1 = 0, sum2 = 0;
sum[0] = 0; sum[1] = 0;
for ( int i = 0; i < N; i++ ) {
const int index = (rand_access) ? V[i] : i;
if ( useif ){
if ( B[index] ) sum2 += V[i];
else sum1 += V[i];
} else
sum[B[index]] += V[i];
}
auto t = std::chrono::duration_cast<milliseconds>(high_resolution_clock::now() - start);
std::cout << "Time useif="<<useif<<", rand_access="<<rand_access<<" : " << t.count() << " ms" << std::endl;
std::cout << (sum1+sum2+sum[0]+sum[1]) << std::endl;
}
int main() {
vector<int> V(N);
vector<bool> B(N, false);
iota( V.begin(), V.end(), 0 );
random_shuffle( V.begin(), V.end() );
fill( B.begin(), B.begin() + B.size()/2, true);
random_shuffle( B.begin(), B.end() );
exec<false, false>(V, B);
exec<false, true>(V, B);
exec<true, false>(V, B);
exec<true, true>(V, B);
return 0;
}
On my machine, I obtain the following results:
Time useif=0, rand_access=0 : 1518 ms
Time useif=0, rand_access=1 : 10791 ms
Time useif=1, rand_access=0 : 4384 ms
Time useif=1, rand_access=1 : 12214 ms
So, there is a speed up of 2.8 substituting the if-statement with a memory access, when there is NOT a random access involved, otherwise the performance are really close (1.1 speed up).
I don't understand why this is happening and how can I deal with it, i.e. how can I optimize the if-statement knowing that the if and else branch are executing the same code?
Aucun commentaire:
Enregistrer un commentaire