1. Simple NKIPy Tutorial
This tutorial walks through how NKIPy works, using a simple softmax kernel as the example.
We will cover:
Defining a NKIPy kernel
Running it as a NumPy function on the CPU
Tracing it to validate HLO generation
Compiling it and running it on Trainium hardware
import numpy as np
from nkipy.core.trace import NKIPyKernel
from nkipy.runtime.execute import baremetal_run_traced_kernel
1.1. Defining a NKIPy Kernel
A NKIPy kernel looks like an ordinary NumPy function. NKIPy supports a subset of NumPy and Python syntax.
def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x
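The kernel subtracts the row-wise maximum before exponentiating. The sketch below illustrates why, using plain NumPy on the CPU only. `naive_softmax` and the input `x_large` are hypothetical names introduced for this illustration and are not part of NKIPy.

```python
import numpy as np


def softmax_kernel(x):
    # Same kernel as above: shift by the row maximum, then normalize.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x


def naive_softmax(x):
    # Hypothetical variant without the max subtraction (illustration only).
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)


# The first row is the second row shifted by 1000, so both rows
# should produce the same softmax.
x_large = np.array([[1000.0, 1001.0], [0.0, 1.0]], dtype=np.float32)

# exp(1000) overflows float32, so the naive version yields inf / inf = nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive_out = naive_softmax(x_large)

stable_out = softmax_kernel(x_large)

print(f"Naive output is {naive_out}")
print(f"Stable output is {stable_out}")
```

Because softmax is invariant to adding a constant to every element of a row, the shifted version gives the same mathematical result while keeping every exponent at or below zero.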
1.2. Running a NKIPy Kernel as a Python Function on CPU
x = np.random.rand(2, 2).astype(np.float32)
print(f"Input is {x}")
# A NKIPy kernel can be called directly as a NumPy function on the CPU
out_numpy = softmax_kernel(x)
print(f"NumPy output is {out_numpy}")
Input is [[0.2657556 0.00210309]
[0.03143163 0.05532647]]
NumPy output is [[0.56553394 0.43446603]
[0.49402657 0.5059734 ]]
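As a quick sanity check, the output above can be reproduced from the printed input. This is a minimal sketch that assumes the input values exactly as displayed (they are rounded for printing), so the comparison uses a small tolerance rather than exact equality. The names `x_printed` and `expected` are introduced here for illustration.

```python
import numpy as np


def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x


# Input and output copied from the printed results above.
x_printed = np.array(
    [[0.2657556, 0.00210309], [0.03143163, 0.05532647]], dtype=np.float32
)
expected = np.array(
    [[0.56553394, 0.43446603], [0.49402657, 0.5059734]], dtype=np.float32
)

out = softmax_kernel(x_printed)

# Each row of a softmax output is a probability distribution.
row_sums = out.sum(axis=-1)

print(f"Matches printed output? {np.allclose(out, expected, atol=1e-5)}")
print(f"Row sums are {row_sums}")
```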
1.3. Tracing a NKIPy Kernel
# To run a NKIPy kernel on Trainium, we first need to trace it as a NKIPyKernel
traced_kernel = NKIPyKernel.trace(softmax_kernel)
1.4. Running It on Trainium Hardware
# The NKIPy kernel is now compiled to a binary (NEFF) and executed on real hardware.
# The baremetal wrapper executes the compiled binary on Trainium hardware
# in baremetal mode (without framework support)
out_baremetal = baremetal_run_traced_kernel(traced_kernel, x)
print(f"Is the output the same as NumPy? {np.allclose(out_baremetal, out_numpy)}")
Is the output the same as NumPy? True
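`np.allclose` returns only a boolean (with default tolerances `rtol=1e-05`, `atol=1e-08`), which says little when a comparison fails. The sketch below shows a more informative check using `np.testing.assert_allclose` and the maximum absolute error. It runs on the CPU only: `out_other` is a stand-in for a hardware result, computed here in float64 and cast back to float32, and the tolerances are illustrative assumptions rather than values recommended by NKIPy.

```python
import numpy as np


def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x


rng = np.random.default_rng(0)
x = rng.random((2, 2), dtype=np.float32)
out_numpy = softmax_kernel(x)

# Stand-in for a result from another backend (assumption): the same
# kernel evaluated in float64, then cast back to float32.
out_other = softmax_kernel(x.astype(np.float64)).astype(np.float32)

# Report how far apart the two results are.
max_abs_err = float(np.max(np.abs(out_other - out_numpy)))
print(f"Max absolute error is {max_abs_err:.3e}")

# Raises AssertionError with a detailed mismatch report on failure.
np.testing.assert_allclose(out_other, out_numpy, rtol=1e-5, atol=1e-6)
```

With an actual hardware result, `out_other` would be replaced by `out_baremetal` from the step above.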