1. Simple NKIPy Tutorial

This tutorial uses a simple softmax kernel to walk through how NKIPy works.

We will cover:

  • Defining a NKIPy kernel

  • Running it as a NumPy function on the CPU

  • Tracing it to validate HLO generation

  • Compiling it and running it on Trainium hardware

import numpy as np
from nkipy.core.trace import NKIPyKernel
from nkipy.runtime.execute import baremetal_run_traced_kernel

1.1. Defining a NKIPy Kernel

A NKIPy kernel looks like an ordinary NumPy function. NKIPy supports a subset of NumPy and Python syntax.

def softmax_kernel(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x
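Subtracting the row maximum before exponentiating is a common trick for numerical stability; it does not change the mathematical result. The sketch below illustrates this in plain NumPy. The `naive_softmax` helper and the input values are made up for this illustration and are not part of NKIPy.

```python
import numpy as np

def naive_softmax(x):
    # Hypothetical helper: softmax without subtracting the row maximum
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def softmax_kernel(x):
    # Same kernel as in the tutorial
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

# Large inputs overflow float32 when exponentiated directly
x_large = np.array([[1000.0, 1001.0]], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(x_large)   # exp overflows to inf, and inf / inf gives nan

stable = softmax_kernel(x_large)     # exponents are at most 0, so no overflow

print(f"Naive result:  {naive}")
print(f"Stable result: {stable}")
```

With the maximum subtracted, every exponent is at most zero, so `np.exp` stays within the range of `float32`.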

1.2. Running a NKIPy Kernel as a Python Function on CPU

x = np.random.rand(2, 2).astype(np.float32)
print(f"Input is {x}")

# A NKIPy kernel can be run directly as a NumPy function on the CPU
out_numpy = softmax_kernel(x)
print(f"NumPy output is {out_numpy}")
Input is [[0.2657556  0.00210309]
 [0.03143163 0.05532647]]
NumPy output is [[0.56553394 0.43446603]
 [0.49402657 0.5059734 ]]
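Because the kernel is plain NumPy on the CPU, its output can be sanity-checked before any tracing or compilation. The following is a minimal sketch of such checks; the shapes, seed, and tolerance are arbitrary choices for illustration.

```python
import numpy as np

def softmax_kernel(x):
    # Same kernel as in the tutorial
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
checked_shapes = []

for shape in [(2, 2), (4, 8), (2, 3, 5)]:
    x = rng.random(shape, dtype=np.float32)
    out = softmax_kernel(x)

    # Softmax preserves the input shape
    assert out.shape == x.shape
    # Each slice along the last axis is a probability distribution
    assert np.allclose(out.sum(axis=-1), 1.0, atol=1e-6)
    assert np.all((out > 0) & (out < 1))

    checked_shapes.append(shape)

print(f"Checked shapes: {checked_shapes}")
```

Since the reduction uses `axis=-1` with `keepdims=True`, the same kernel works for inputs with any number of leading dimensions.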

1.3. Tracing a NKIPy Kernel

# To run a NKIPy kernel on Trainium, first trace it into a NKIPyKernel
traced_kernel = NKIPyKernel.trace(softmax_kernel)
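To give an intuition for what tracing means in general, here is a toy sketch that records the operations of `softmax_kernel` instead of computing them. It relies only on NumPy's `__array_ufunc__` and `__array_function__` override protocols. The `ToyTracer` class is hypothetical, and this is not how NKIPy's tracer is implemented; NKIPy produces HLO, whereas this sketch only builds a list of operation names.

```python
import numpy as np

class ToyTracer:
    """Hypothetical placeholder for a tensor: records operations instead of computing."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def _record(self, op, inputs, kwargs=None):
        out = ToyTracer(f"t{len(self.log)}", self.log)
        input_names = tuple(getattr(a, "name", a) for a in inputs)
        self.log.append((out.name, op, input_names, kwargs or {}))
        return out

    # Intercepts NumPy ufuncs such as np.exp
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return self._record(ufunc.__name__, inputs, kwargs)

    # Intercepts other NumPy functions such as np.max and np.sum
    def __array_function__(self, func, types, args, kwargs):
        return self._record(func.__name__, args, kwargs)

    # Intercepts the Python operators used by the kernel
    def __sub__(self, other):
        return self._record("subtract", (self, other))

    def __truediv__(self, other):
        return self._record("divide", (self, other))

def softmax_kernel(x):
    # Same kernel as in the tutorial
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

log = []
result = softmax_kernel(ToyTracer("x", log))

for out_name, op, input_names, kwargs in log:
    print(f"{out_name} = {op}{input_names} {kwargs}")
```

The kernel body is unchanged; only the type of the argument differs. This is the general idea that lets the same Python function run eagerly on NumPy arrays and also be captured as a graph of operations.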

1.4. Running It on Trainium Hardware

# The traced kernel is now compiled to a binary (NEFF) and executed on real hardware.
# The baremetal wrapper executes the compiled binary on Trainium hardware
# in baremetal mode (without framework support)
out_baremetal = baremetal_run_traced_kernel(traced_kernel, x)
print(f"Is the output the same as NumPy? {np.allclose(out_baremetal, out_numpy)}")
Is the output the same as NumPy? True
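When comparing results from two backends, it can help to report the largest deviation and to state the tolerances explicitly rather than rely on the `np.allclose` defaults. The sketch below runs without Trainium hardware: a float64 NumPy run stands in for the second backend, and the tolerance values are illustrative assumptions, not values recommended by NKIPy.

```python
import numpy as np

def softmax_kernel(x):
    # Same kernel as in the tutorial
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sum_x = np.sum(exp_x, axis=-1, keepdims=True)
    return exp_x / sum_x

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.random((2, 2), dtype=np.float32)

out_numpy = softmax_kernel(x)                          # float32 CPU result
out_reference = softmax_kernel(x.astype(np.float64))   # stand-in for a second backend

# Largest element-wise deviation between the two results
max_abs_err = float(np.max(np.abs(out_numpy - out_reference)))

# Explicit tolerances (assumed values for this sketch)
matches = bool(np.allclose(out_numpy, out_reference, rtol=1e-5, atol=1e-6))

print(f"Max absolute error: {max_abs_err:.3e}")
print(f"Within tolerance? {matches}")
```

In the tutorial flow, `out_baremetal` would take the place of `out_reference`.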