The code performs a number (ITERS) of iterations of the Bailey s 6-step FFT algorithm (following the ideas in the CMU Task parallel suite). 1.- Generates an input signal vector (dgen) with size n=n1xn2 stored in row major order In this code the size of the input signal is NN=NxN (n=NN, n1=n2=N) 2.- Transpose (tpose) A to have it stored in column major order 3.- Perform independent FFTs on the rows (cffts) 4.- Scale each element of the resulting array by a factor of w[n]**(p*q) 5.- Transpose (tpose) to prepair it for the next step 6.- Perform independent FFTs on the rows (cffts) 7.- Transpose the resulting matrix The code requires nested Parallelism.