Running a light dummy load on the GPU in parallel with your real work makes it go faster

Most AMD and Intel GPUs will erratically ramp their clock speed down if they are not fed work fast enough. When more work arrives, the firmware imposes a delay before the GPU reaches its full clock speed again, which can significantly degrade overall performance. You can defeat this power management behavior by running a constant dummy load on the GPU that yields regularly, so that new work always takes priority while the GPU otherwise stays busy.

If you cannot feed the GPU work fast enough to keep its clocks up, a Vulkan compute shader like this should do the job:

#version 460

layout (local_size_x = 16, local_size_y = 8, local_size_z = 1) in;
layout (binding = 0) buffer Work {
    uint data[];
};

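// Flatten the 2D invocation ID into a row-major index.
// The row stride is the total number of invocations along x.
uint globalIndex() {
    return gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * gl_WorkGroupSize.x * gl_NumWorkGroups.x;
}

void main() {
    // Cheap atomic increment: just enough work to keep the GPU busy.
    atomicAdd(data[globalIndex() % uint(data.length())], 1u);
}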

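Then, on a separate thread and in a separate logical device, dispatch this shader in an infinite loop: a (64, 64, 1) dispatch over a 2 MiB device-local buffer, waiting for each submission to complete before issuing the next. Waiting after every submission is what makes the load yield: at most one small dispatch is in flight at a time, so real work never queues behind much of it.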
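The dispatch and buffer sizes above are chosen to match. As a rough sanity check (a sketch, not part of the original setup), the Python below redoes the arithmetic: it assumes 4-byte uint elements and a row-major index whose row stride is the total invocation count along x. The helper names invocation_grid and row_major_index are made up for illustration.

```python
# Sizes taken from the shader layout and the dispatch described above.
LOCAL_SIZE = (16, 8, 1)           # local_size_x, local_size_y, local_size_z
GROUP_COUNT = (64, 64, 1)         # group counts passed to the dispatch
BUFFER_BYTES = 2 * 1024 * 1024    # 2 MiB buffer
UINT_BYTES = 4                    # assumption: each element is a 32-bit uint


def invocation_grid(local_size, group_count):
    """Total invocations along each axis: local size times group count."""
    return tuple(l * g for l, g in zip(local_size, group_count))


def row_major_index(x, y, width):
    """Flatten a 2D invocation ID, using the grid's x extent as the row stride."""
    return x + y * width


width, height, depth = invocation_grid(LOCAL_SIZE, GROUP_COUNT)
total_invocations = width * height * depth
element_count = BUFFER_BYTES // UINT_BYTES

# Collect every index one dispatch touches (depth is 1, so 2D is enough).
indices = {row_major_index(x, y, width)
           for y in range(height)
           for x in range(width)}

print("grid:", width, "x", height)
print("invocations:", total_invocations, "elements:", element_count)
print("each element touched exactly once:", len(indices) == element_count)
```

Under these assumptions, one dispatch issues exactly one atomic increment per buffer element, so the modulo in the shader only acts as a guard if the buffer or dispatch size is changed later.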

Note 1: If you are using OpenGL, the core idea should be similar, but the dummy load needs its own context. A thread can have only one context current at a time, so run the load on a dedicated thread with its own context or, more simply, in a separate process.

Note 2: Clock ramping is rarely a problem on Nvidia GPUs.