cuda - How many cores/threads does cublas_sgemm use?
I am new to GPU and parallel programming. I want to execute a function 'a' in parallel on different data x1, x2, x3, .... 'a' calls the function 'cublas_sgemm'.
Do I then have to care about the implementation of cublas_sgemm?
You don't need to be concerned with the implementation of cublasSgemm. It makes as much use of the device as is possible for the problem size. For reasonably large matrices, it will utilize the whole device. If a function already utilizes the whole device, you are not likely to observe any improvement in performance by trying to add additional parallelism (vs. issuing the gemm functions in sequence, with appropriate use of overlap of copy and compute).
For small matrices, there is a batched gemm function that should work better than trying to manage the parallelism yourself.