TensorFlow Usage Example1*
MNIST Model Compilation Process*
MNIST is an introductory computer vision dataset, where the input consists of 28x28-pixel handwritten digit images, and the output is the probability of the image corresponding to a digit from 0 to 9. Below, we illustrate the usage of the gxnpuc toolchain and API with the TensorFlow built-in mnist model (TensorFlow v1.15).
1. Generate NPU Files*
The MNIST model in this example is straightforward and can be represented by the formula y = x * W + b. (During training, softmax is calculated, but since we only need to retrieve the index of the maximum value in the output, and softmax is a monotonically increasing function, we can omit it without affecting the results.) Here, x represents the input data, y represents the output data, and W and b are the trained parameters. The training process involves continuously adjusting W and b based on the calculated y and the expected y_. On the NPU, we only need to use the trained W and b without the training process.
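To see why dropping softmax is safe when only the predicted class is needed, here is a minimal sketch in plain Python. It uses made-up toy values for x, W, and b (not trained MNIST parameters), and the helper names `linear`, `softmax`, and `argmax` are illustrative only. It computes y = x * W + b and checks that the index of the maximum is the same before and after softmax.

```python
import math

def linear(x, W, b):
    """Compute y = x * W + b for one input row x (length n), W (n x m), b (length m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(y):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

# Toy values (assumptions for illustration, not trained parameters).
x = [1.0, 2.0, 3.0]
W = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b = [0.05, -0.1]

y = linear(x, W, b)      # raw outputs, as computed on the NPU
probs = softmax(y)       # probabilities, as used during training

# Softmax preserves the ordering of its inputs, so the predicted class is unchanged.
assert argmax(y) == argmax(probs)
```

For the real model, x has 784 entries (a flattened 28x28 image) and y has 10 entries, one per digit.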
1.1 Generate CKPT and PB Files*
First, we need to generate CKPT and PB files. Additionally, to conveniently specify the input and output nodes of the model during NPU compilation, we give names to the input and output nodes. See the commented tf.placeholder and tf.identity lines in the main function.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.client import timeline

# FLAGS (data_dir, xla) is assumed to be populated by the script's
# argument parser, which is not shown here.

def main(_):
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir)

    # Create the model
    x = tf.placeholder(tf.float32, [None, 784], name="input_x")  # Specify the input name as input_x for easy use in the compilation configuration script
    w = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, w) + b
    y = tf.identity(y, name="result")  # Specify the output name as result for easy use in the compilation configuration script

    # Define loss and optimizer
    y_ = tf.placeholder(tf.int64, [None])

    # The raw formulation of cross-entropy,
    #
    #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.math.log(tf.nn.softmax(y)),
    #                                 reduction_indices=[1]))
    #
    # can be numerically unstable.
    #
    # So here we use tf.compat.v1.losses.sparse_softmax_cross_entropy on the raw
    # logit outputs of 'y', and then average across the batch.
    cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    config = tf.ConfigProto()
    jit_level = 0
    if FLAGS.xla:
        # Turns on XLA JIT compilation.
        jit_level = tf.OptimizerOptions.ON_1

    config.graph_options.optimizer_options.global_jit_level = jit_level
    run_metadata = tf.RunMetadata()
    sess = tf.compat.v1.Session(config=config)
    tf.global_variables_initializer().run(session=sess)

    # Train
    train_loops = 1000
    for i in range(train_loops):
        batch_xs, batch_ys = mnist.train.next_batch(100)

        # Create a timeline for the last loop and export to json to view with
        # chrome://tracing/.
        if i == train_loops - 1:
            sess.run(train_step,
                     feed_dict={x: batch_xs,
                                y_: batch_ys},
                     options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
                     run_metadata=run_metadata)
            trace = timeline.Timeline(step_stats=run_metadata.step_stats)
            with open('/tmp/timeline.ctf.json', 'w') as trace_file:
                trace_file.write(trace.generate_chrome_trace_format())
        else:
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Test trained model
    correct_prediction = tf.equal(tf.argmax(y, 1), y_)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy,
                   feed_dict={x: mnist.test.images,
                              y_: mnist.test.labels}))

    # Generate CKPT and PB files
    saver = tf.train.Saver()
    saver.save(sess, "./mnist.ckpt")
    tf.train.write_graph(sess.graph_def, "./", "mnist.pb")
    sess.close()
After running the program, the mnist.ckpt.* and mnist.pb files will be generated in the current directory.
Note
Remember to note the op names and their shapes for the model's input and output nodes, as they will be required in the gxnpuc compilation configuration YAML file.
1.2 Merge CKPT and PB Files into FROZEN_PB File*
Use the freeze_graph.py script to merge the mnist.ckpt.* and mnist.pb files into a single pb file.
Note
The freeze_graph.py script may vary between TensorFlow versions.
Execute the command:
$ python freeze_graph.py --input_graph=mnist.pb --input_checkpoint=./mnist.ckpt --output_graph=mnist_with_ckpt.pb --output_node_names=result
Here, --input_graph is followed by the input PB name, --input_checkpoint is followed by the input CKPT name, --output_graph is followed by the name of the merged FROZEN_PB file, and --output_node_names is followed by the output node name. If there are multiple output nodes, separate them with commas.
After completion, you'll find the mnist_with_ckpt.pb file in the current directory.
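When the freeze step is scripted, the argument list can be assembled in Python. The following is a minimal sketch, not part of the toolchain: `build_freeze_command` is a hypothetical helper that only builds the command line from the four flags described above and joins multiple output node names with commas.

```python
def build_freeze_command(input_graph, input_checkpoint, output_graph,
                         output_nodes, script="freeze_graph.py"):
    """Return the argv list for a freeze_graph.py call (hypothetical helper)."""
    if not output_nodes:
        raise ValueError("at least one output node name is required")
    return [
        "python", script,
        "--input_graph=" + input_graph,
        "--input_checkpoint=" + input_checkpoint,
        "--output_graph=" + output_graph,
        # Multiple output nodes are separated by commas.
        "--output_node_names=" + ",".join(output_nodes),
    ]

cmd = build_freeze_command("mnist.pb", "./mnist.ckpt",
                           "mnist_with_ckpt.pb", ["result"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a directory that contains freeze_graph.py.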
If the model is saved in saved_model format, use the following command to generate the FROZEN_PB file:
$ python freeze_graph.py --input_saved_model_dir=./saved_model_dir --output_graph=mnist_with_ckpt.pb --output_node_names=result
1.3 Edit NPU Configuration File*
Edit the mnist_config.yaml file as follows; the comments explain each option.
CORENAME: GRUS # Chip model
MODEL_FILE: mnist_with_ckpt.pb # Input PB file
OUTPUT_FILE: mnist.h # Output NPU file name
NPU_UNIT: NPU32 # NPU device type
COMPRESS: true # Compress FC weights
CONV2D_COMPRESS: false # Do not compress Conv2D weights
OUTPUT_TYPE: c_code # NPU file type
INPUT_OPS:
    input_x: [1, 784] # Input node name and data dimension, each input data has a size of 1x784, representing one image
OUTPUT_OPS: [result] # Output node name
FP16_OUT_OPS: [] # No OPs need to be output as float16
FUSE_BN: true # Fuse BN parameters into Conv2D (if applicable)
# WEIGHT_CACHE_SIZE: 0 # Weight does not need to be placed separately, no need to set
Note
The input_x and result here must match the op names of the model's input and output nodes, respectively.
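A name mismatch can be caught before compiling with a small check. The sketch below assumes the configuration has already been loaded into a Python dict (the loading step is not shown) and that the graph's input and output names are the ones noted in step 1.1. `check_io_names` is a hypothetical helper, not a gxnpuc feature.

```python
def check_io_names(config, graph_inputs, graph_outputs):
    """Return the config op names that do not exist in the graph (hypothetical helper)."""
    missing = [name for name in config.get("INPUT_OPS", {})
               if name not in graph_inputs]
    missing += [name for name in config.get("OUTPUT_OPS", [])
                if name not in graph_outputs]
    return missing

# Hand-written mirror of the relevant part of mnist_config.yaml.
config = {
    "INPUT_OPS": {"input_x": [1, 784]},
    "OUTPUT_OPS": ["result"],
}

# Names given to the nodes in the training script.
missing = check_io_names(config, {"input_x"}, {"result"})
assert missing == []

# The input shape holds one flattened 28x28 image.
batch, features = config["INPUT_OPS"]["input_x"]
assert batch * features == 28 * 28
```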
1.4 Compilation*
Compile using the gxnpuc tool:
$ gxnpuc mnist_config.yaml
This generates the NPU file mnist.h and prints the memory information required by the model:
------------------------
Memory allocation info:
Mem0(ops): 0
Mem1(data): 40
Mem2(instruction): 140
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 8990
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 9170
Total Memory Size: 10778
------------------------
Compile OK.
Explanation of each memory area:
Memory Area | Description |
---|---|
Mem0(ops) | Not used |
Mem1(data) | Intermediate data memory |
Mem2(instruction) | Instruction memory |
Mem3(in) | Input data memory |
Mem4(out) | Output data memory |
Mem5(tmp content) | SRAM weight memory |
Mem6(weights) | Weights memory |
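The totals in the report can be cross-checked from the individual areas. The sketch below parses the report text shown above with a pair of regular expressions and verifies both totals. `parse_report` is a hypothetical helper, and the observation that Total Memory Size equals the sum of all seven areas is drawn from this one report rather than from a documented rule.

```python
import re

REPORT = """\
Mem0(ops): 0
Mem1(data): 40
Mem2(instruction): 140
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 8990
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 9170
Total Memory Size: 10778
"""

def parse_report(text):
    """Split the report into per-area sizes and reported totals (hypothetical helper)."""
    areas, totals = {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(Mem\d)\([^)]*\):\s*(\d+)$", line)
        if m:
            areas[m.group(1)] = int(m.group(2))
            continue
        m = re.match(r"(Total [A-Za-z ]+?)(?:\s*\(.*\))?:\s*(\d+)$", line)
        if m:
            totals[m.group(1)] = int(m.group(2))
    return areas, totals

areas, totals = parse_report(REPORT)

# Total NPU Size excludes the input (Mem3) and output (Mem4) areas.
npu_size = sum(areas[k] for k in ("Mem0", "Mem1", "Mem2", "Mem5", "Mem6"))
assert npu_size == totals["Total NPU Size"]

# In this report, Total Memory Size matches the sum of all seven areas.
assert sum(areas.values()) == totals["Total Memory Size"]
```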
2. Execute NPU File*
After generating the NPU file, you need to deploy the model to the GX8002 development board to run it. For detailed deployment instructions, please refer to the NPU Model Deployment Guide.