PyTorch Example
MNIST Model Compilation Process
This document walks through the end-to-end process of compiling a PyTorch model for the NPU, using the MNIST model as an example: model construction, training, and validation; inference model construction; ScriptModule static model export; and model compilation.
1. Model Construction, Training, and Validation
Model Construction
Construct a custom MNIST model by inheriting from the nn.Module base class. The model architecture includes convolutional layers, pooling layers, activation layers, and fully connected layers.
import torch
from torch import nn
from torch.nn import functional as F

class MNISTModel(nn.Module):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 1, 1)
        self.conv2 = nn.Conv2d(8, 32, 1, 1)
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.maxpool2 = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(1568, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
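The in_features of fc1 follows from the tensor shapes: the 1x1 convolutions preserve the 28x28 spatial size, each 2x2 max-pool halves it, so the flattened feature is 32 channels x 7 x 7 = 1568. A quick shape check (a minimal sketch, not part of the original tutorial; check_model is an illustrative name):

# Verify the model produces a [1, 10] output for one 28x28 grayscale image
check_model = MNISTModel()
with torch.no_grad():
    out = check_model(torch.randn(1, 1, 28, 28))
print(out.shape)  # expected: torch.Size([1, 10])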
Model Training
Load the MNIST training dataset using the torchvision library, convert it to tensors, normalize it with the dataset mean and standard deviation, and feed it into the network for training.
Training parameters include:

- Batch size: 64
- Number of epochs: 3
- Optimizer: SGD
- Learning rate: 0.01
- Momentum: 0.5
- Loss function: negative log likelihood (NLL) loss, which pairs with the model's log_softmax output
from torch import optim
from torchvision import datasets, transforms

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

# Data loading and preprocessing
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=64, shuffle=True)

model = MNISTModel()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(3):
    train(model, train_loader, optimizer, epoch)
torch.save(model.state_dict(), "./mnist.pth")
Model Validation
After training, load the MNIST validation dataset to validate the model. Output the average loss and accuracy.
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up the batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=1000, shuffle=True)

test_model = MNISTModel()
test_model.load_state_dict(torch.load("./mnist.pth"))
test(test_model, test_loader)
2. Exporting the Model in a Specific Format
Inference Model Construction
Exporting a model as a static ScriptModule file has limitations: when the model structure contains certain control flow patterns, the user must fix a single computational flow for inference.
Specifically, these control flow patterns are conditional statements such as "if" and "assert" in the forward method of an nn.Module subclass, whose conditions can only be resolved at inference time.
In such cases, the user must choose one definite computational path in the original model structure and remove these control flow operators.
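For illustration only (a hypothetical module, not part of this example): tracing records just the branch taken for the example input, so a data-dependent branch like the one below must be replaced with a fixed path before export.

# Hypothetical model with data-dependent control flow (cannot be traced faithfully)
class BranchyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.path_a = nn.Linear(10, 10)
        self.path_b = nn.Linear(10, 10)
    def forward(self, x):
        if x.sum() > 0:  # condition is only known at inference time
            return self.path_a(x)
        return self.path_b(x)

# Inference version: one computational flow is fixed and the branch is removed
class BranchyModelInference(nn.Module):
    def __init__(self):
        super().__init__()
        self.path_a = nn.Linear(10, 10)
    def forward(self, x):
        return self.path_a(x)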
In this example, the MNIST model contains no control flow structures, so no such modification is needed. Only the dropout operator, which plays no part in inference, is removed. The NPU compiler handles dropout automatically, so deleting it is optional and does not affect model compilation.
The NPU compiler currently does not support the softmax operator. Therefore, in the inference model, npu_softmax is used as a replacement.
For the construction method of npu_softmax, please refer to the Softmax section.
class MNISTModelInference(nn.Module):
    def __init__(self):
        super(MNISTModelInference, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 1, 1)
        self.conv2 = nn.Conv2d(8, 32, 1, 1)
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.maxpool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(1568, 32)
        self.fc2 = nn.Linear(32, 10)
        # partitions for npu_softmax (see the Softmax section)
        self.partitions = split_and_factorize(10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = npu_softmax(x, self.partitions)
        return x
ScriptModule Static Model Export
Export Process:

- Step 1: Create a PyTorch Module based on the inference model class and load the trained model weights.
- Step 2: Convert the PyTorch Module to a TorchScript Module using tracing.
- Step 3: Serialize and save the TorchScript Module to the "mnist_trace.pth" file.
inference_model = MNISTModelInference()
inference_model.load_state_dict(torch.load("./mnist.pth"))
input_tensor = torch.randn([1, 1, 28, 28])
traced_model = torch.jit.trace(inference_model, [input_tensor])
traced_model.save("./mnist_trace.pth")
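Before compiling, it can be worth checking that the traced module matches the eager inference model on a sample input (a minimal sanity check, not part of the original flow):

# Compare eager and traced outputs on the same input
inference_model.eval()
with torch.no_grad():
    eager_out = inference_model(input_tensor)
    traced_out = traced_model(input_tensor)
print(torch.allclose(eager_out, traced_out))  # expected: True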
3. Writing the Compilation Configuration File
Before reading this section, please familiarize yourself with the PyTorch configuration options in the NPU Compiler usage document.
CORENAME
For the GX8002, this is fixed to GRUS.
NPU_UNIT
For the GX8002, this is fixed to NPU32.
FRAMEWORK
In this example, the configuration is set to PT as it is a PyTorch model.
MODEL_FILE
As described in ScriptModule Static Model Export, the model file required by the NPU compiler is "./mnist_trace.pth".
OUTPUT_TYPE
For the GX8002, this is fixed to c_code.
OUTPUT_FILE
In this example, the output file is named "mnist.h".
INPUT_NCX_TO_NXC
In this example, no data layout conversion (e.g., NCHW to NHWC) is applied to the input tensor.
INPUT_OPS
The shape of the input tensor during MNIST model inference is [1, 1, 28, 28].
FP16_OUT_OPS
In this example, since there are no state output tensors, this parameter does not need to be configured.
FUSE_BN, COMPRESS, CONV2D_COMPRESS
In this example, BN fusion is not enabled, while weight compression is enabled for fully connected and convolutional layers.
EXCLUDE_COMPRESS_OPS, WEIGHT_MIN_MAX, WEIGHT_CACHE_SIZE
These parameters are not used in the current scenario.
The complete configuration file, saved as mnist_config.yaml, is as follows:
CORENAME: GRUS
NPU_UNIT: NPU32
FRAMEWORK: PT
MODEL_FILE: mnist_trace.pth
OUTPUT_TYPE: c_code
OUTPUT_FILE: mnist.h
INPUT_NCX_TO_NXC: []
INPUT_OPS:
    0: [1, 1, 28, 28]
FP16_OUT_OPS: []
FUSE_BN: false
COMPRESS: true
CONV2D_COMPRESS: true
4. Model Compilation
Compile the model using the gxnpuc tool.
$ gxnpuc mnist_config.yaml
This generates the NPU file mnist.h and prints the memory usage information for the model:
------------------------
Memory allocation info:
Mem0(ops): 0
Mem1(data): 28224
Mem2(instruction): 836
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 56052
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 85112
Total Memory Size: 86720
------------------------
Compile OK.
Explanation of each memory region:
| Memory Region | Description |
| --- | --- |
| Mem0(ops) | Not in use |
| Mem1(data) | Intermediate data memory |
| Mem2(instruction) | Instruction memory |
| Mem3(in) | Input data memory |
| Mem4(out) | Output data memory |
| Mem5(tmp content) | SRAM weight memory |
| Mem6(weights) | Weight memory |