NKI Samples
nki_samples.reference
All kernels located in this folder have numeric accuracy tests and performance benchmarks defined in the test directory. We also demonstrate using these kernels end-to-end in our integration tests.
You are welcome to customize them to fit your own workloads and to contribute to the repository by opening a PR.
Note that these kernels are already deployed as part of the Neuron stack. For example,
compiling Llama models with transformers-neuronx
automatically invokes the flash_fwd kernel listed here, so replacing the framework operators with these
NKI kernels is unlikely to yield any additional performance benefit.
Please see the README of the nki-samples GitHub repository for more details.
For NKI documentation, please refer to the main Neuron SDK documentation page.
Relationship to neuronxcc.nki.kernels
The kernels under the reference folder are also available in the neuronxcc.nki.kernels namespace. The
kernels in neuronxcc are synced with this repository on every Neuron SDK release.
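Because the same kernel can come from either namespace, a script may want to check which copy is importable in the current environment. The following is a minimal sketch, not part of the repository: `resolve_kernel` is a hypothetical helper, and the two module paths are assumptions about where the attention kernels live, so verify them against your installed Neuron SDK and your nki-samples checkout.

```python
import importlib

# Assumed module paths, for illustration only; the actual layout may differ
# between Neuron SDK releases and nki-samples revisions.
CANDIDATE_MODULES = (
    "neuronxcc.nki.kernels.attention",  # copy shipped with the Neuron SDK (assumed path)
    "nki_samples.reference.attention",  # copy from this repository (assumed path)
)


def resolve_kernel(name, modules=CANDIDATE_MODULES):
    """Return (module_path, kernel) for the first module exposing `name`, else None.

    Hypothetical helper: it only probes which namespace is importable and does
    not compare the two copies or run the kernel.
    """
    for module_path in modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Package not installed in this environment; try the next candidate.
            continue
        kernel = getattr(module, name, None)
        if kernel is not None:
            return module_path, kernel
    return None


if __name__ == "__main__":
    found = resolve_kernel("flash_fwd")
    if found is None:
        print("flash_fwd was not found in any candidate module")
    else:
        print(f"flash_fwd resolved from {found[0]}")
```

Listing the SDK namespace first prefers the released copy. Reversing the order would prefer a local checkout, which may be ahead of the last SDK sync.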
nki_samples.tutorial
Please refer to this page for the tutorials. The code associated with the tutorials can be found at nki-samples/src/tutorials.