PyTorch Example
MNIST Model Compilation Process
This document walks through the end-to-end process of compiling a PyTorch model for the NPU, using the MNIST model as an example: model construction, training, and validation; inference model construction; ScriptModule static model export; and model compilation.
1. Model Construction, Training, and Validation
Model Construction
Construct a custom MNIST model by inheriting from the nn.Module base class. The model architecture includes convolutional layers, pooling layers, activation layers, and fully connected layers.
import torch
from torch import nn
from torch.nn import functional as F

class MNISTModel(nn.Module):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 1, 1)
        self.conv2 = nn.Conv2d(8, 32, 1, 1)
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.maxpool2 = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(1568, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
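The in_features of fc1 follows from the tensor shapes: the 1x1 convolutions preserve the 28x28 spatial size, each 2x2 max-pool halves it, so the flattened feature is 32 channels x 7 x 7 = 1568. A quick shape check (a minimal sketch, not part of the original tutorial; check_model is an illustrative name):

# Verify the model produces a [1, 10] output for one 28x28 grayscale image
check_model = MNISTModel()
with torch.no_grad():
    out = check_model(torch.randn(1, 1, 28, 28))
print(out.shape)  # expected: torch.Size([1, 10])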
Model Training
Load the MNIST training dataset using the torchvision library, convert it to tensors, normalize it with the dataset mean and standard deviation, and feed it into the network for training.
Training parameters include:

- Batch size: 64
- Number of epochs: 3
- Optimizer: SGD
- Learning rate: 0.01
- Momentum: 0.5
- Loss function: negative log likelihood (NLL) loss, which pairs with the model's log_softmax output
from torch import optim
from torchvision import datasets, transforms

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

# Data loading and preprocessing
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=64, shuffle=True)

model = MNISTModel()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(3):
    train(model, train_loader, optimizer, epoch)
torch.save(model.state_dict(), "./mnist.pth")
Model Validation
After training, load the MNIST validation dataset to validate the model. Output the average loss and accuracy.
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up the batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=1000, shuffle=True)

test_model = MNISTModel()
test_model.load_state_dict(torch.load("./mnist.pth"))
test(test_model, test_loader)
2. Exporting the Model in a Specific Format
Inference Model Construction
Exporting a model as a static ScriptModule file has limitations: when the model structure contains certain control flow patterns, the user must fix a single computational flow for inference.
Specifically, these control flow patterns are conditional statements such as "if" and "assert" in the forward method of an nn.Module subclass, whose conditions can only be resolved at inference time.
In such cases, the user must choose one definite computational path in the original model structure and remove these control flow operators.
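For illustration only (a hypothetical module, not part of this example): tracing records just the branch taken for the example input, so a data-dependent branch like the one below must be replaced with a fixed path before export.

# Hypothetical model with data-dependent control flow (cannot be traced faithfully)
class BranchyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.path_a = nn.Linear(10, 10)
        self.path_b = nn.Linear(10, 10)
    def forward(self, x):
        if x.sum() > 0:  # condition is only known at inference time
            return self.path_a(x)
        return self.path_b(x)

# Inference version: one computational flow is fixed and the branch is removed
class BranchyModelInference(nn.Module):
    def __init__(self):
        super().__init__()
        self.path_a = nn.Linear(10, 10)
    def forward(self, x):
        return self.path_a(x)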
In this example, the MNIST model contains no control flow structures, so no such modification is needed. Only the dropout operator, which plays no part in inference, is removed. The NPU compiler handles dropout automatically, so deleting it is optional and does not affect model compilation.
The NPU compiler currently does not support the softmax operator. Therefore, in the inference model, npu_softmax is used as a replacement.
For the construction method of npu_softmax, please refer to the Softmax section.
class MNISTModelInference(nn.Module):
    def __init__(self):
        super(MNISTModelInference, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 1, 1)
        self.conv2 = nn.Conv2d(8, 32, 1, 1)
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.maxpool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(1568, 32)
        self.fc2 = nn.Linear(32, 10)
        # partitions for npu_softmax (see the Softmax section)
        self.partitions = split_and_factorize(10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = npu_softmax(x, self.partitions)
        return x
ScriptModule Static Model Export
Export Process:

- Step 1: Create a PyTorch Module based on the inference model class and load the trained model weights.
- Step 2: Convert the PyTorch Module to a TorchScript Module using tracing.
- Step 3: Serialize and save the TorchScript Module to the "mnist_trace.pth" file.
inference_model = MNISTModelInference()
inference_model.load_state_dict(torch.load("./mnist.pth"))
input_tensor = torch.randn([1, 1, 28, 28])
traced_model = torch.jit.trace(inference_model, [input_tensor])
traced_model.save("./mnist_trace.pth")
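Before compiling, it can be worth checking that the traced module matches the eager inference model on a sample input (a minimal sanity check, not part of the original flow):

# Compare eager and traced outputs on the same input
inference_model.eval()
with torch.no_grad():
    eager_out = inference_model(input_tensor)
    traced_out = traced_model(input_tensor)
print(torch.allclose(eager_out, traced_out))  # expected: True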
3. Writing the Compilation Configuration File
Before reading this section, please familiarize yourself with the PyTorch configuration options in the NPU Compiler usage document.
CORENAME
For the GX8002, this is fixed to GRUS.
NPU_UNIT
For the GX8002, this is fixed to NPU32.
FRAMEWORK
In this example, the configuration is set to PT as it is a PyTorch model.
MODEL_FILE
As described in ScriptModule Static Model Export, the model file required by the NPU compiler is "./mnist_trace.pth".
OUTPUT_TYPE
For the GX8002, this is fixed to c_code.
OUTPUT_FILE
In this example, the output file is named "mnist.h".
INPUT_NCX_TO_NXC
In this example, no data layout conversion (e.g., NCHW to NHWC) is applied to the input tensor.
INPUT_OPS
The shape of the input tensor during MNIST model inference is [1, 1, 28, 28].
FP16_OUT_OPS
In this example, since there are no state output tensors, this parameter does not need to be configured.
FUSE_BN, COMPRESS, CONV2D_COMPRESS
In this example, BN fusion is not enabled, while weight compression is enabled for fully connected and convolutional layers.
EXCLUDE_COMPRESS_OPS, WEIGHT_MIN_MAX, WEIGHT_CACHE_SIZE
These parameters are not used in the current scenario.
The complete configuration file, saved as mnist_config.yaml, is as follows:
CORENAME: GRUS
NPU_UNIT: NPU32
FRAMEWORK: PT
MODEL_FILE: mnist_trace.pth
OUTPUT_TYPE: c_code
OUTPUT_FILE: mnist.h
INPUT_NCX_TO_NXC: []
INPUT_OPS:
    0: [1, 1, 28, 28]
FP16_OUT_OPS: []
FUSE_BN: false
COMPRESS: true
CONV2D_COMPRESS: true
4. Model Compilation
Compile the model using the gxnpuc tool.
$ gxnpuc mnist_config.yaml
This generates the NPU file mnist.h and prints the memory usage information for the model:
------------------------
Memory allocation info:
Mem0(ops): 0
Mem1(data): 28224
Mem2(instruction): 836
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 56052
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 85112
Total Memory Size: 86720
------------------------
Compile OK.
Explanation of each memory region:
| Memory Region | Description |
| --- | --- |
| Mem0(ops) | Not in use |
| Mem1(data) | Intermediate data memory |
| Mem2(instruction) | Instruction memory |
| Mem3(in) | Input data memory |
| Mem4(out) | Output data memory |
| Mem5(tmp content) | SRAM weight memory |
| Mem6(weights) | Weight memory |