1. Simple NKIPy Tutorial

This tutorial walks through how NKIPy works using a simple softmax kernel.

We will cover:

  • Defining a NKIPy kernel

  • Running it as a NumPy function

  • Tracing it and running it in simulation mode

  • Compiling it and running it on Trainium hardware

import numpy as np

from nkipy.core.trace import NKIPyKernel
from nkipy.core.compile import lower_to_nki
from nkipy.runtime.execute import simulate_traced_kernel, baremetal_run_traced_kernel

1.1. Defining a NKIPy Kernel

A NKIPy kernel looks like an ordinary NumPy function. It supports a subset of NumPy APIs and Python syntax.

def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x
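Subtracting the row maximum before calling np.exp does not change the result, because exp(x - c) = exp(x) * exp(-c) and the common factor exp(-c) cancels in the division. What it does change is numerical behavior: every exponent becomes at most 0, so np.exp cannot overflow. The plain-NumPy comparison below illustrates this; naive_softmax and stable_softmax are illustrative helpers written for this example, not part of NKIPy.

```python
import numpy as np


def naive_softmax(x):
    # Exponentiates the raw inputs; float32 exp overflows to inf once x exceeds about 88.7.
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)


def stable_softmax(x):
    # Same computation as softmax_kernel: shift each row so its maximum is 0 first.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)


x = np.array([[1000.0, 1000.0], [1.0, 2.0]], dtype=np.float32)

# Silence the overflow (exp -> inf) and invalid (inf / inf -> nan) warnings for the naive version.
with np.errstate(over="ignore", invalid="ignore"):
    naive_out = naive_softmax(x)
stable_out = stable_softmax(x)

print(naive_out)   # first row is [nan nan]
print(stable_out)  # first row is [0.5 0.5]
```

On the second row, where no overflow occurs, both versions agree, which is the shift invariance at work.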

1.2. Running a NKIPy Kernel as a NumPy Function

# NKIPy is NumPy-like and, in most cases, NumPy-compatible,
# so we can call the NKIPy kernel directly as a NumPy function
x = np.random.rand(2, 2).astype(np.float32)
print(f"Input is {x}")

out_numpy = softmax_kernel(x)
print(f"NumPy output is {out_numpy}")
Input is [[0.9495564  0.39231038]
 [0.05852599 0.9262922 ]]
NumPy output is [[0.635815   0.36418492]
 [0.2957193  0.7042807 ]]
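Because both reductions use axis=-1 with keepdims=True, the reduced axis is kept with size 1 and broadcasts back against the input. Run as a plain NumPy function, the same kernel therefore accepts inputs with any number of leading dimensions, and every slice along the last axis should sum to 1. The quick sanity check below uses pure NumPy; the (3, 4, 5) shape and the seed are arbitrary choices for illustration, and whether NKIPy tracing accepts such shapes is not covered by this tutorial.

```python
import numpy as np


def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x


rng = np.random.default_rng(0)
x = rng.random((3, 4, 5)).astype(np.float32)

# keepdims=True keeps the reduced axis with size 1: (3, 4, 1) broadcasts against (3, 4, 5).
row_max = np.max(x, axis=-1, keepdims=True)
print(row_max.shape)  # (3, 4, 1)

out = softmax_kernel(x)
print(out.shape)  # (3, 4, 5)

# Each slice along the last axis is a probability distribution.
print(np.allclose(out.sum(axis=-1), 1.0))  # True
```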

1.3. Tracing a NKIPy Kernel

# To run a NKIPy kernel on Trainium, we first trace it into an NKIPyKernel with `NKIPyKernel.trace`
traced_kernel = NKIPyKernel.trace(softmax_kernel)
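This tutorial does not show how NKIPyKernel.trace works internally. As a rough mental model only, tracing in general means calling the Python function once on placeholder inputs that record each operation instead of computing it, yielding a graph that a compiler can lower. The toy below illustrates that general idea with NumPy's own override protocols (__array_ufunc__ for ufuncs such as np.exp, __array_function__ for functions such as np.max and np.sum). TraceValue and toy_trace are hypothetical names invented for this sketch; this is not NKIPy's tracer and makes no claim about its data structures.

```python
import numpy as np


class TraceValue:
    """Hypothetical symbolic placeholder that records NumPy calls made on it.

    Illustrative only; this is not NKIPy's tracer.
    """

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph  # shared list of (output, op, inputs, kwargs) nodes

    def _record(self, op_name, inputs, kwargs):
        # Create a fresh named value and append one node describing how it was produced.
        out = TraceValue(f"v{len(self.graph)}", self.graph)
        arg_names = [a.name if isinstance(a, TraceValue) else repr(a) for a in inputs]
        self.graph.append((out.name, op_name, arg_names, dict(kwargs)))
        return out

    # NumPy routes ufunc calls (np.exp, np.subtract, np.divide, ...) here.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return self._record(ufunc.__name__, inputs, kwargs)

    # NumPy routes other overridable functions (np.max, np.sum, ...) here.
    def __array_function__(self, func, types, args, kwargs):
        return self._record(func.__name__, args, kwargs)

    # Python operators are forwarded to the matching ufuncs so they are recorded too.
    def __sub__(self, other):
        return np.subtract(self, other)

    def __truediv__(self, other):
        return np.divide(self, other)


def toy_trace(fn):
    """Run fn once on a placeholder input; return the recorded nodes and the output name."""
    graph = []
    result = fn(TraceValue("x", graph))
    return graph, result.name


def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x


graph, output = toy_trace(softmax_kernel)
for node in graph:
    print(node)
print("output:", output)
```

The recorded graph has five nodes (max, subtract, exp, sum, divide). The exact names printed for the max and divide nodes depend on the NumPy version.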

1.4. Running the Traced Kernel with Simulation

out_nkipy = simulate_traced_kernel(traced_kernel, x)
print(f"Is the simulated output the same as NumPy? {np.allclose(out_nkipy, out_numpy)}")
Is the simulated output the same as NumPy? True
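The checks here use np.allclose rather than exact equality because a simulator or hardware backend may evaluate exp, sum, and division differently (for example, in a different order), which can change the last few bits of a float32 result. np.allclose(a, b) returns True when |a - b| <= atol + rtol * |b| holds element-wise, with defaults rtol=1e-05 and atol=1e-08. The sketch below spells that criterion out; allclose_manual and the perturbed array are made up for illustration and do not come from an actual NKIPy run.

```python
import numpy as np


def allclose_manual(a, b, rtol=1e-05, atol=1e-08):
    # Element-wise criterion documented for np.allclose: |a - b| <= atol + rtol * |b|.
    # Note that it is asymmetric: b is treated as the reference.
    a = np.asarray(a)
    b = np.asarray(b)
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))


reference = np.array([0.635815, 0.36418492], dtype=np.float32)
# A hypothetical result that differs from the reference only in its last couple of bits.
perturbed = reference + np.float32(1e-7)

print(np.array_equal(reference, perturbed))    # False: exact comparison fails
print(allclose_manual(reference, perturbed))   # True
print(np.allclose(reference, perturbed))       # True
print(allclose_manual(reference, reference + np.float32(1e-3)))  # False: beyond tolerance
```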

1.5. Running It on Trainium Hardware

# A NKIPy kernel can be compiled to a binary (NEFF) and executed on real hardware.
# The baremetal wrapper executes the compiled binary on Trainium hardware
# in baremetal mode, that is, without any ML framework in the loop
out_baremetal = baremetal_run_traced_kernel(traced_kernel, x)
print(f"Is the output the same as NumPy? {np.allclose(out_baremetal, out_numpy)}")
Is the output the same as NumPy? True