Allocated Attention
This file hosts the high-performance reference implementation of the attention blocks used in Stable Diffusion models. The implementation uses the direct allocation API to achieve better performance.
Allocated fused self-attention kernel for Stable Diffusion workloads with a small head size.
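For orientation, the computation that a fused self-attention kernel performs can be sketched as an unfused NumPy reference. This is a minimal sketch only, assuming the standard scaled dot-product formulation for a single head. The function name `reference_self_attention`, the default scale of `1/sqrt(d_head)`, and the tensor shapes are illustrative assumptions and are not taken from the kernel in this file. The sketch does not use the direct allocation API.

```python
import numpy as np

def reference_self_attention(q, k, v, scale=None):
    """Unfused single-head reference: softmax(q @ k.T * scale) @ v.

    Hypothetical helper for illustration; not the kernel's actual signature.
    q, k, v have shape (seq_len, d_head).
    """
    d_head = q.shape[-1]
    if scale is None:
        # Assumed default: the usual 1/sqrt(d_head) softmax scale.
        scale = 1.0 / np.sqrt(d_head)
    scores = (q @ k.T) * scale
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Small head size chosen only as an example.
rng = np.random.default_rng(0)
seq_len, d_head = 16, 64
q = rng.standard_normal((seq_len, d_head)).astype(np.float32)
k = rng.standard_normal((seq_len, d_head)).astype(np.float32)
v = rng.standard_normal((seq_len, d_head)).astype(np.float32)
out = reference_self_attention(q, k, v)
```

A reference like this is mainly useful for checking the numerical output of an optimized kernel on small inputs. A fused kernel computes the same result without writing the full score matrix back to main memory between steps.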