TensorFlow Usage Example 1

MNIST Model Compilation Process

MNIST is an introductory computer vision dataset. The input consists of 28x28-pixel handwritten digit images, and the output is the probability that the image corresponds to each digit from 0 to 9. Below, we illustrate the usage of the gxnpuc toolchain and API with the TensorFlow built-in MNIST model (TensorFlow v1.15).

1. Generate NPU Files

The MNIST model in this example is straightforward and can be represented by the formula y = x * W + b. Here, x represents the input data, y represents the output data, and W and b are the trained parameters. Training applies a softmax to y, but on the NPU we only need the index of the maximum output value. Because softmax preserves the ordering of its inputs, it can be omitted without affecting the result. The training process continuously adjusts W and b based on the calculated y and the expected y_. On the NPU, we only use the trained W and b; no training takes place there.
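The claim that softmax can be dropped at inference time can be checked numerically. The sketch below uses only the Python standard library; the logit values are made up for illustration and do not come from the trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits y = x * W + b for one image (10 classes).
logits = [-1.2, 0.3, 4.1, 0.0, -3.5, 2.2, 0.7, -0.4, 1.9, 0.1]
probs = softmax(logits)

# Softmax preserves ordering, so the predicted digit is the same either way.
print(argmax(logits), argmax(probs))
```

Both indices printed are identical, which is why the compiled model can stop at the raw output y.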

1.1 Generate CKPT and PB Files

First, we need to generate the CKPT and PB files. To make it easy to specify the model's input and output nodes during NPU compilation, we also give those nodes explicit names. See the lines that pass name= in the main function below.

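import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.client import timeline

# FLAGS (with data_dir and xla attributes) is assumed to be parsed elsewhere,
# for example with argparse in the script's entry point.


def main(_):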
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir)

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784], name="input_x") # Specify the input name as input_x for easy use in the compilation configuration script
  w = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, w) + b
  y = tf.identity(y, name="result") # Specify the output name as result for easy use in the compilation configuration script

  # Define loss and optimizer
  y_ = tf.placeholder(tf.int64, [None])

  # The raw formulation of cross-entropy,
  #
  #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.math.log(tf.nn.softmax(y)),
  #                                 reduction_indices=[1]))
  #
  # can be numerically unstable.
  #
  # So here we use tf.compat.v1.losses.sparse_softmax_cross_entropy on the raw
  # logit outputs of 'y', and then average across the batch.
  cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

  config = tf.ConfigProto()
  jit_level = 0
  if FLAGS.xla:
    # Turns on XLA JIT compilation.
    jit_level = tf.OptimizerOptions.ON_1

  config.graph_options.optimizer_options.global_jit_level = jit_level
  run_metadata = tf.RunMetadata()
  sess = tf.compat.v1.Session(config=config)
  tf.global_variables_initializer().run(session=sess)
  # Train
  train_loops = 1000
  for i in range(train_loops):
    batch_xs, batch_ys = mnist.train.next_batch(100)

    # Create a timeline for the last loop and export to json to view with
    # chrome://tracing/.
    if i == train_loops - 1:
      sess.run(train_step,
               feed_dict={x: batch_xs,
                          y_: batch_ys},
               options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
               run_metadata=run_metadata)
      trace = timeline.Timeline(step_stats=run_metadata.step_stats)
      with open('/tmp/timeline.ctf.json', 'w') as trace_file:
        trace_file.write(trace.generate_chrome_trace_format())
    else:
      sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

  # Test trained model
  correct_prediction = tf.equal(tf.argmax(y, 1), y_)
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy,
                 feed_dict={x: mnist.test.images,
                            y_: mnist.test.labels}))

  # Generate CKPT and PB files
  saver = tf.train.Saver()
  saver.save(sess, "./mnist.ckpt")
  tf.train.write_graph(sess.graph_def, "./", "mnist.pb")

  sess.close()

After running the program, the mnist.ckpt.* and mnist.pb files will be generated in the current directory.

Note

Remember to note the op names and their shapes for the model's input and output nodes, as they will be required in the gxnpuc compilation configuration YAML file.

1.2 Merge CKPT and PB Files into FROZEN_PB File

Use the freeze_graph.py script to merge the mnist.ckpt.* and mnist.pb files into a single pb file.

Note

The freeze_graph.py script may vary for different versions of TensorFlow.

Execute the command:

$ python freeze_graph.py --input_graph=mnist.pb --input_checkpoint=./mnist.ckpt --output_graph=mnist_with_ckpt.pb --output_node_names=result

This generates the mnist_with_ckpt.pb file in the current directory. Here, --input_graph specifies the input PB file, --input_checkpoint specifies the input CKPT name, --output_graph specifies the name of the merged FROZEN_PB file, and --output_node_names specifies the output node name. If there are multiple output nodes, separate them with commas.
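When a model has several output nodes, assembling the command programmatically helps avoid separator mistakes. The following is a minimal sketch and not part of the toolchain: build_freeze_command is a hypothetical helper, and it uses only the four flags shown in the command above.

```python
import shlex


def build_freeze_command(input_graph, input_checkpoint, output_graph, output_nodes):
    """Return the freeze_graph.py argument list for the given files.

    build_freeze_command is a hypothetical helper; it only uses the flags
    that appear in the documented command.
    """
    if not output_nodes:
        raise ValueError("at least one output node name is required")
    return [
        "python", "freeze_graph.py",
        "--input_graph=" + input_graph,
        "--input_checkpoint=" + input_checkpoint,
        "--output_graph=" + output_graph,
        # Multiple output nodes are passed as one comma-separated value.
        "--output_node_names=" + ",".join(output_nodes),
    ]


cmd = build_freeze_command("mnist.pb", "./mnist.ckpt", "mnist_with_ckpt.pb", ["result"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to execute the script from Python instead of the shell.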

If the model is saved in saved_model format, use the following command to generate the FROZEN_PB file:

$ python freeze_graph.py --input_saved_model_dir=./saved_model_dir --output_graph=mnist_with_ckpt.pb --output_node_names=result

1.3 Edit NPU Configuration File

Edit the mnist_config.yaml file as follows; the comments explain each field.

mnist_config.yaml

CORENAME: GRUS # Chip model
MODEL_FILE: mnist_with_ckpt.pb # Input PB file
OUTPUT_FILE: mnist.h # Output NPU file name
NPU_UNIT: NPU32 # NPU device type
COMPRESS: true # Compress FC weights
CONV2D_COMPRESS: false # Do not compress Conv2D weights
OUTPUT_TYPE: c_code # NPU file type

INPUT_OPS:
    input_x: [1, 784] # Input node name and data dimension, each input data has a size of 1x784, representing one image

OUTPUT_OPS: [result] # Output node name
FP16_OUT_OPS: [] # No OPs need to be output as float16

FUSE_BN: true # Fuse BN parameters into Conv2D (if applicable)
# WEIGHT_CACHE_SIZE: 0 # Weight does not need to be placed separately, no need to set

Note

The input_x and result here must match the op names of the model's input and output nodes, respectively.
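The [1, 784] shape given for input_x corresponds to one 28x28 image flattened into a single row. As a rough illustration (plain Python, using a made-up image rather than real MNIST data, and assuming row-by-row flattening), the reshaping looks like this:

```python
ROWS, COLS = 28, 28

# Hypothetical 28x28 grayscale image; real data would come from the dataset.
image = [[0.0 for _ in range(COLS)] for _ in range(ROWS)]
image[3][5] = 1.0  # mark one pixel so its flattened position can be checked


def flatten(img):
    """Flatten a 2-D image row by row into a [1, rows*cols] batch."""
    return [[pixel for row in img for pixel in row]]


batch = flatten(image)
shape = [len(batch), len(batch[0])]
print(shape)  # [1, 784]
```

The pixel at row 3, column 5 ends up at index 3 * 28 + 5 of the flattened row.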

1.4 Compilation

Compile using the gxnpuc tool:

$ gxnpuc mnist_config.yaml

This will generate the NPU file mnist.h and print the memory information required by the model:

------------------------
Memory allocation info:
Mem0(ops): 0
Mem1(data): 40
Mem2(instruction): 140
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 8990
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 9170
Total Memory Size: 10778
------------------------
Compile OK.

Explanation of each memory area:

Memory Area	Description
Mem0(ops)	Not used
Mem1(data)	Intermediate data memory
Mem2(instruction)	Instruction memory
Mem3(in)	Input data memory
Mem4(out)	Output data memory
Mem5(tmp content)	SRAM weight memory
Mem6(weights)	Weights memory
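The two totals in the compiler output can be reproduced from the per-area figures. The sketch below parses the report text copied from above and recomputes both. The formula for Total NPU Size is the one printed by the compiler; treating Total Memory Size as the sum of all seven areas is an assumption that happens to match the numbers shown.

```python
import re

REPORT = """\
Mem0(ops): 0
Mem1(data): 40
Mem2(instruction): 140
Mem3(in): 1568
Mem4(out): 40
Mem5(tmp content): 0
Mem6(weights): 8990
"""

# Map "Mem0".."Mem6" to the size printed by the compiler.
sizes = {
    m.group(1): int(m.group(2))
    for m in re.finditer(r"^(Mem\d)\([^)]*\): (\d+)$", REPORT, re.MULTILINE)
}

# Total NPU Size leaves out the input and output areas (Mem3, Mem4).
npu_size = sum(sizes[k] for k in ("Mem0", "Mem1", "Mem2", "Mem5", "Mem6"))
# Assumption: Total Memory Size is the sum of all seven areas.
total_size = sum(sizes.values())

print(npu_size, total_size)  # 9170 10778
```

A check like this can be useful when comparing memory use across different COMPRESS settings or model variants.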

2. Execute NPU File

After generating the NPU file, you need to deploy the model to the GX8002 development board to run it. For detailed deployment instructions, please refer to the NPU Model Deployment Guide.