Allocated Fused Linear
This file hosts the high-performance kernel that computes RMSNorm(hidden) @ wQKV. This implementation uses the direct allocation API to achieve better performance.
Allocated kernel that computes RMSNorm(hidden) @ wQKV.
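As a reading aid, the computation this kernel targets can be written out in plain Python. This sketch is not the kernel and does not use the direct allocation API. It assumes the common RMSNorm definition, x / sqrt(mean(x^2) + eps) * gamma, with a learnable per-feature weight gamma and a small eps, and it treats wQKV as the Q, K and V projection matrices concatenated column-wise. The helper names rmsnorm, reference_qkv and fused_qkv are hypothetical. The sketch also illustrates one reason these two operations fuse well: RMSNorm scales each row by a single scalar 1/rms, so that scale can be applied after the matrix product, and the sum of squares and the product can be accumulated in the same pass over the row. Whether the actual kernel orders its work this way is not stated here.

```python
import math

def rmsnorm(row, gamma, eps=1e-6):
    """RMSNorm of one row (assumed definition): row / sqrt(mean(row^2) + eps) * gamma."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
    return [v * inv_rms * g for v, g in zip(row, gamma)]

def matmul(a, b):
    """Naive (m x k) @ (k x n) product on nested lists."""
    n = len(b[0])
    return [[sum(a_row[t] * b[t][j] for t in range(len(b))) for j in range(n)]
            for a_row in a]

def reference_qkv(hidden, gamma, w_qkv, eps=1e-6):
    """Unfused reference: normalize every row first, then project with wQKV."""
    normed = [rmsnorm(row, gamma, eps) for row in hidden]
    return matmul(normed, w_qkv)

def fused_qkv(hidden, gamma, w_qkv, eps=1e-6):
    """Hypothetical fused form: one pass over each row accumulates both the
    sum of squares and (row * gamma) @ wQKV; the per-row 1/rms scale is
    applied to the accumulator at the end."""
    k = len(w_qkv)      # model width (rows of wQKV)
    n = len(w_qkv[0])   # combined Q + K + V output width
    out = []
    for row in hidden:
        acc = [0.0] * n
        sum_sq = 0.0
        for t in range(k):
            x = row[t]
            sum_sq += x * x
            xg = x * gamma[t]
            w_row = w_qkv[t]
            for j in range(n):
                acc[j] += xg * w_row[j]
        inv_rms = 1.0 / math.sqrt(sum_sq / k + eps)
        out.append([a * inv_rms for a in acc])
    return out

# Toy example: 2 tokens, model width 4, and Q, K, V each of width 2.
hidden = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
gamma = [1.0, 0.5, 2.0, 1.0]
w_qkv = [[0.1 * (i + j) for j in range(6)] for i in range(4)]

ref = reference_qkv(hidden, gamma, w_qkv)
fused = fused_qkv(hidden, gamma, w_qkv)

# Split the combined output back into Q, K and V column blocks.
q = [r[0:2] for r in fused]
k = [r[2:4] for r in fused]
v = [r[4:6] for r in fused]
```

Both functions compute the same values up to floating-point rounding; the fused form avoids materializing the normalized rows, which is the kind of saving a fused kernel is after.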