I have an i9-9920X running at up to 4.8GHz on all cores, with 32GB of quad-channel 3600MHz CL16 RAM. memcpy copies at a rate of just 11GB/s, which is clearly limited by the CPU cores rather than by the memory. I could write a multithreaded memcpy routine for large copies, but I would like to know whether there is a more elegant solution, perhaps using DMA, that fully utilizes the memory bandwidth without messy multithreaded memcpy code.
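
A minimal sketch of how a figure like this can be measured, assuming a single-threaded copy between two large heap buffers. The helper names `now_seconds` and `memcpy_gbps` are illustrative only, and this is not necessarily the benchmark that produced the 11GB/s number above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get (not monotonic, but
 * adequate for a rough measurement). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: copies `bytes` bytes `iters` times on the calling
 * thread and returns the best observed rate in GB/s (1e9 bytes/s) of
 * payload copied. Returns -1.0 if allocation fails or the copy was too
 * fast for the clock to resolve. */
double memcpy_gbps(size_t bytes, int iters) {
    assert(bytes > 0 && iters > 0);

    /* Call memcpy through a volatile pointer so the compiler cannot
     * inline or drop the copy. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 1, bytes);
    memset(dst, 0, bytes);

    double best = -1.0;
    for (int i = 0; i < iters; i++) {
        double t0 = now_seconds();
        copy(dst, src, bytes);
        double elapsed = now_seconds() - t0;
        if (elapsed > 0.0 && (best < 0.0 || elapsed < best)) {
            best = elapsed;
        }
    }

    free(src);
    free(dst);
    return best > 0.0 ? (double)bytes / best / 1e9 : -1.0;
}
```

Calling `memcpy_gbps((size_t)1 << 30, 5)` from a `main` function would time five 1 GiB copies and report the fastest. The result counts only the payload bytes, so the actual memory traffic is at least twice that (one read plus one write per byte). For comparison, if "3600MHz" means 3600 MT/s, the theoretical peak of four 64-bit channels is 4 × 8 B × 3600 MT/s = 115.2 GB/s.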