Allocated Fused Linear
This file hosts the high-performance kernel that computes RMSNorm(hidden) @ wQKV.
This implementation uses the direct allocation API to achieve better performance.
Allocated kernel that computes RMSNorm(hidden) @ wQKV.
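
To make the computed quantity concrete, here is a minimal, unfused reference sketch of RMSNorm(hidden) @ wQKV in plain Python. It is not the kernel itself and does not use the direct allocation API. The function names `rms_norm` and `fused_linear_reference`, the `eps` value, and the omission of a learned RMSNorm gain vector are all assumptions made for illustration; the real kernel may differ on each point.

```python
import math


def rms_norm(row, eps=1e-6):
    """Scale a vector by the reciprocal of its root-mean-square.

    Assumption: no learned gain vector is applied, and eps=1e-6
    is an illustrative default rather than the kernel's value.
    """
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in row]


def fused_linear_reference(hidden, w_qkv, eps=1e-6):
    """Unfused reference for RMSNorm(hidden) @ wQKV (hypothetical helper).

    hidden: list of rows, shape [tokens, d_model]
    w_qkv:  list of rows, shape [d_model, d_out]
    """
    d_out = len(w_qkv[0])
    out = []
    for row in hidden:
        normed = rms_norm(row, eps)
        # Plain matrix-vector product of the normalized row with w_qkv.
        out.append([
            sum(normed[k] * w_qkv[k][j] for k in range(len(normed)))
            for j in range(d_out)
        ])
    return out


hidden = [[3.0, 4.0], [1.0, 1.0]]
w_qkv = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
result = fused_linear_reference(hidden, w_qkv)
```

A fused kernel would produce the same values while avoiding a separate pass that writes the normalized activations back to memory; a reference like this is mainly useful for checking the kernel's output numerically.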