Allocated Fused Linear

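This file hosts a high-performance kernel that computes RMSNorm(hidden) @ wQKV: it applies RMS normalization to the hidden states and multiplies the result by the fused query/key/value weight matrix in a single kernel. The implementation uses the direct allocation API, which lets the kernel control where its tensors are placed in on-chip memory, to achieve better performance.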

allocated_fused_rms_norm_qkv

Kernel that computes RMSNorm(hidden) @ wQKV using the direct allocation API.
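To make the computed quantity concrete, here is a minimal pure-Python reference sketch of RMSNorm(hidden) @ wQKV, useful for checking a kernel's output numerically. It assumes the common RMSNorm definition x / sqrt(mean(x^2) + eps), optionally scaled by a per-feature weight gamma; the names reference_rms_norm_qkv, eps, and gamma are hypothetical and are not taken from the kernel's actual signature.

```python
import math


def rms_norm(row, eps=1e-6, gamma=None):
    # Assumed RMSNorm definition: x / sqrt(mean(x^2) + eps), times optional gamma.
    mean_sq = sum(x * x for x in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if gamma is None:
        gamma = [1.0] * len(row)  # no learned scale: plain normalization
    return [x * scale * g for x, g in zip(row, gamma)]


def matmul(a, b):
    # a has shape [m][k], b has shape [k][n]; the result has shape [m][n].
    k = len(b)
    n = len(b[0])
    return [[sum(r[i] * b[i][j] for i in range(k)) for j in range(n)] for r in a]


def reference_rms_norm_qkv(hidden, w_qkv, eps=1e-6, gamma=None):
    # Hypothetical reference: normalize each token's hidden vector,
    # then project it with the fused QKV weight matrix.
    normed = [rms_norm(row, eps, gamma) for row in hidden]
    return matmul(normed, w_qkv)


if __name__ == "__main__":
    hidden = [[1.0, 1.0], [2.0, 2.0]]
    w_qkv = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(reference_rms_norm_qkv(hidden, w_qkv, eps=0.0))
```

With eps=0.0 both example rows normalize to [1.0, 1.0], so each output row is [5.0, 7.0, 9.0]. A hardware kernel would tile this computation rather than loop in Python; the sketch only defines the expected result.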