TensorFlow Usage Example
Here is a TensorFlow usage example. The source code can be downloaded through the following link.
1. Generate NPU Files
1.1 Write the Inference Model
In general, the inference model differs slightly from the training model: operations needed only during training, such as Dropout, are removed, and the CKPT file generated during training is loaded to produce the PB file of the inference model.
Refer to the code snippet below for details:
self.inputs = tf.placeholder(tf.float32, [1, 1, 256, 1], name='Feats')  # Specify the input feats name
self.state_in = tf.placeholder(tf.float32, [1, 64], name='State_c0')    # Specify the input state0 name
...
self.net_output, self.state = eval_network(self.inputs, self.state_in)
self.predProb = tf.reshape(self.net_output, [1], name='probOut')
self.state_out = tf.identity(self.state, name='State_c0_out')           # Specify the output state0 name
...
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=100)
    ckpt = './ckpt_dir/tfMask-8000'
    saver.restore(sess, ckpt)  # Load the CKPT file generated during training
    ...
    with tf.gfile.FastGFile('./model.pb', mode='wb') as f:
        f.write(constant_graph.SerializeToString())  # Generate the PB file of the inference model
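The snippet above writes `constant_graph` without showing where it comes from. A minimal, self-contained sketch of the usual freezing step is given below, using the TF1 `convert_variables_to_constants` API (through `tf.compat.v1` so it also runs under TensorFlow 2). The tiny graph here (one placeholder, one variable) is a stand-in for the real inference model; the names `Feats` and `probOut` follow the snippet above, everything else is illustrative:

```python
# Sketch: freeze a graph into `constant_graph` before writing model.pb.
# The graph below is a toy stand-in for the real inference model.
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

x = tf1.placeholder(tf.float32, [1, 1, 256, 1], name='Feats')
w = tf1.get_variable('w', shape=[1], dtype=tf.float32)
y = tf1.identity(tf.reshape(x * w, [-1])[0:1], name='probOut')

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    # Replace variables with constants so the graph is self-contained.
    constant_graph = tf1.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['probOut'])
    with tf1.gfile.FastGFile('./model.pb', mode='wb') as f:
        f.write(constant_graph.SerializeToString())
```

In a real model, `['probOut']` would list every output node name (including the state outputs), and the session would first restore the trained CKPT as shown above.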
1.2 Generate PB Files
Refer to Generating PB Files and Merging CKPT and PB Files into a FROZEN_PB File.
1.3 Edit the NPU Configuration File
Edit the config.yaml file as explained in the comments:
CORENAME: LEO
PB_FILE: ./model.pb
OUTPUT_FILE: model.c
SECURE: false # true/false
NPU_UNIT: NPU32 # NPU16/NPU32/NPU64
COMPRESS: true # true/false
COMPRESS_QUANT_BITS: 8
COMPRESS_TYPE: LINEAR
OUTPUT_TYPE: c_code # c_code/raw
#DEBUG_INFO_ENABLE: false # true/false
CONV2D_COMPRESS: true # true/false
INPUT_OPS:
Feats: [1, 1, 256, 1]
State_c0: [1, 64]
OUTPUT_OPS: [model/State_c0_out, model/probOut]
FP16_OUT_OPS: [model/State_c0_out]
#FUSE_BN: true
DUMP_OPS_TIME: false
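The shapes declared under INPUT_OPS must equal the shapes of the corresponding placeholders in the inference model (here, Feats is [1, 1, 256, 1] and State_c0 is [1, 64]). A minimal sketch of such a consistency check — plain dicts stand in for the parsed YAML and the model graph, and `check_input_ops` is a hypothetical helper:

```python
# Hypothetical check: shapes under INPUT_OPS in config.yaml must equal
# the shapes of the matching tf.placeholder ops in the inference model.
config_input_ops = {            # as declared in config.yaml
    'Feats': [1, 1, 256, 1],
    'State_c0': [1, 64],
}
model_placeholders = {          # as defined in the inference model
    'Feats': [1, 1, 256, 1],
    'State_c0': [1, 64],
}

def check_input_ops(config_ops, placeholders):
    """Return the op names whose shapes disagree (empty list = OK)."""
    return [name for name, shape in config_ops.items()
            if placeholders.get(name) != shape]

assert check_input_ops(config_input_ops, model_placeholders) == []
```

A mismatch here (a wrong name or shape) would otherwise surface only as a compilation or runtime error from the NPU toolchain.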
Note
The names Feats, State_c0, State_c0_out, and probOut here must match the op names of the model's input and output nodes.
Note
In the OUTPUT_OPS configuration, the state output nodes must be placed before the model's actual output nodes.
1.4 Compilation
Compile using the gxnpuc tool:
$ gxnpuc config.yaml
This generates model.c and prints the memory information required by the model:
------------------------
Memory allocation info:
Mem0(ops): 6148
Mem1(data): 16312
Mem2(instruction): 181536
Mem3(in): 640
Mem4(out): 130
Mem5(tmp content): 9216
Total Memory Size: 213982
------------------------
Compile OK.
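As a quick sanity check, the per-section sizes in the printout add up to the reported total. A small sketch, with the numbers copied from the compiler output above:

```python
# Memory sections reported by gxnpuc (bytes), copied from the printout above.
mem = {
    'ops': 6148,
    'data': 16312,
    'instruction': 181536,
    'in': 640,
    'out': 130,
    'tmp content': 9216,
}
total = sum(mem.values())
assert total == 213982  # matches "Total Memory Size" in the compiler output
```

The Mem3(in)/Mem4(out) sections correspond to the model's input and output buffers, so their sizes change if the INPUT_OPS/OUTPUT_OPS shapes change.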