1. Simple NKIPy Tutorial
This tutorial uses a simple softmax kernel to walk through how NKIPy works.
We will cover:
Defining a NKIPy kernel
Running it as a NumPy function
Tracing it and running it in simulation mode
Compiling it and running it on Trainium hardware
import numpy as np
from nkipy.core.trace import NKIPyKernel
from nkipy.core.compile import lower_to_nki
from nkipy.runtime.execute import simulate_traced_kernel, baremetal_run_traced_kernel
1.1. Defining a NKIPy Kernel
A NKIPy kernel looks like an ordinary NumPy function. It supports a subset of NumPy and Python syntax.
def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x
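The kernel subtracts the row-wise maximum before exponentiating. This is the standard way to keep softmax numerically stable, and it does not change the result mathematically. The sketch below uses plain NumPy only and compares the kernel with a hypothetical `naive_softmax` (not part of the tutorial) on large inputs, where the naive form overflows in float32.

```python
import numpy as np

def softmax_kernel(x):
    # Same kernel as in the tutorial: shift by the row maximum first.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

def naive_softmax(x):
    # Hypothetical comparison function: no max subtraction.
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

big = np.array([[1000.0, 1001.0]], dtype=np.float32)

# exp(1000) overflows to inf in float32, so the naive version yields inf / inf.
with np.errstate(over="ignore", invalid="ignore"):
    naive_out = naive_softmax(big)

stable_out = softmax_kernel(big)

print("naive :", naive_out)
print("stable:", stable_out)
```

The shifted inputs are at most 0, so `np.exp` never overflows and every row still sums to 1.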
1.2. Running a NKIPy Kernel as a NumPy Function
# NKIPy is NumPy-like and, in most cases, NumPy-compatible,
# so we can run a NKIPy kernel directly as a NumPy function
x = np.random.rand(2, 2).astype(np.float32)
print(f"Input is {x}")
out_numpy = softmax_kernel(x)
print(f"NumPy output is {out_numpy}")
Input is [[0.9495564 0.39231038]
[0.05852599 0.9262922 ]]
NumPy output is [[0.635815 0.36418492]
[0.2957193 0.7042807 ]]
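Because the kernel runs as plain NumPy, its basic properties can be checked before any tracing or compilation. The following is a minimal sketch of such a check. The input shape, the seed, and the specific checks are illustrative choices, not part of NKIPy.

```python
import numpy as np

def softmax_kernel(x):
    # Same kernel as in the tutorial.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

# A seeded generator keeps the check reproducible.
rng = np.random.default_rng(0)
x = rng.random((4, 8), dtype=np.float32)
out = softmax_kernel(x)

# Softmax preserves shape and dtype, produces values in (0, 1],
# and normalizes each row along the last axis.
row_sums = out.sum(axis=-1)
shape_ok = out.shape == x.shape
dtype_ok = out.dtype == np.float32
range_ok = bool(np.all(out > 0) and np.all(out <= 1))
sums_ok = bool(np.allclose(row_sums, 1.0, atol=1e-6))

print(shape_ok, dtype_ok, range_ok, sums_ok)
```

Checks like these are cheap to run on any machine, which makes the NumPy path a convenient first debugging step.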
1.3. Tracing a NKIPy Kernel
# To run a NKIPy kernel on Trainium, we first need to trace it as a NKIPyKernel with the `trace` wrapper
traced_kernel = NKIPyKernel.trace(softmax_kernel)
1.4. Running the Traced Kernel with Simulation
out_nkipy = simulate_traced_kernel(traced_kernel, x)
print(f"Is the simulated output the same as NumPy? {np.allclose(out_nkipy, out_numpy)}")
Is the simulated output the same as NumPy? True
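`np.allclose` uses `rtol=1e-05` and `atol=1e-08` by default. When a comparison against the NumPy reference fails, it helps to see how large the difference is. The helper below, `compare_outputs`, is a hypothetical utility written for this illustration and is not part of NKIPy. A float16 round trip stands in for a backend that computes at lower precision; whether a real backend behaves this way is an assumption.

```python
import numpy as np

def compare_outputs(actual, expected, rtol=1e-5, atol=1e-8):
    """Return (is_close, max_abs_err) for two array-like outputs."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    max_abs_err = float(np.max(np.abs(actual - expected)))
    is_close = bool(np.allclose(actual, expected, rtol=rtol, atol=atol))
    return is_close, max_abs_err

# Reference values taken from the NumPy output shown earlier.
expected = np.array([[0.635815, 0.36418492]], dtype=np.float32)

# Stand-in for a lower-precision result (assumption, for illustration only).
approx = expected.astype(np.float16).astype(np.float32)

ok_default, err = compare_outputs(approx, expected)
ok_loose, _ = compare_outputs(approx, expected, rtol=1e-2, atol=1e-3)

print(f"default tolerances: {ok_default}, max abs error: {err:.2e}")
print(f"loose tolerances:   {ok_loose}")
```

In the tutorial, the same helper could be called as `compare_outputs(out_nkipy, out_numpy)` in place of the bare `np.allclose` call.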
1.5. Running It on Trainium Hardware
# A NKIPy kernel can be compiled to a binary (NEFF) and executed on real hardware.
# The baremetal wrapper executes the compiled binary on Trainium hardware
# in baremetal mode (without framework support)
out_baremetal = baremetal_run_traced_kernel(traced_kernel, x)
print(f"Is the output the same as NumPy? {np.allclose(out_baremetal, out_numpy)}")
Is the output the same as NumPy? True
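The steps in this tutorial can be combined into one script that degrades gracefully on machines without NKIPy or a Trainium device. This is a rough sketch: `run_on_available_backends` is a hypothetical helper, it uses only the NKIPy calls shown in this tutorial, and it assumes that a failed hardware run raises an exception.

```python
import numpy as np

def softmax_kernel(x):
    # Same kernel as in the tutorial.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

def run_on_available_backends(kernel, x):
    """Run `kernel` as NumPy, then in simulation and on hardware if possible."""
    results = {"numpy": kernel(x)}
    try:
        from nkipy.core.trace import NKIPyKernel
        from nkipy.runtime.execute import (
            baremetal_run_traced_kernel,
            simulate_traced_kernel,
        )
    except ImportError:
        # NKIPy is not installed: only the NumPy result is available.
        return results

    traced = NKIPyKernel.trace(kernel)
    results["simulation"] = simulate_traced_kernel(traced, x)
    try:
        results["baremetal"] = baremetal_run_traced_kernel(traced, x)
    except Exception as err:
        # Assumption: a failure here means no usable Trainium device.
        print(f"Skipping hardware run: {err}")
    return results

x = np.random.rand(2, 2).astype(np.float32)
results = run_on_available_backends(softmax_kernel, x)
for name, out in results.items():
    matches = np.allclose(out, results["numpy"])
    print(f"{name}: matches NumPy? {matches}")
```

Catching a broad `Exception` keeps the sketch short; a real script would narrow this to the errors the runtime actually raises.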