TensorFlow Usage Example 2*

FSMN Structure Model Compilation Process*

FSMN (Feedforward Sequential Memory Network) is a structure commonly used in recent years in speech recognition models. It contains a memory module in its hidden layers. The following example demonstrates how gxnpuc supports this type of model structure with a memory module.

1. Generate NPU Files*

1.1 Write the Inference Model*

In general, the inference model differs slightly from the training model: operations needed only during training, such as Dropout, are removed, and the CKPT file generated during training is loaded to produce the PB file of the inference model.

For the FSMN model, since it contains a memory module and the NPU itself is stateless, the model's memory must be treated as the model's input state and output state. When running on the NPU, the current output state is fed back as the next input state.

Refer to the code snippet below for details:

    inputs = tf.placeholder(tf.float32, [batch_size, frame_len, feat_dim*win_len], name="Feats") # Specify the input feats name
    state1_in = tf.placeholder(tf.float32, [batch_size, 3, linear_size[0]], name="State_c0") # Specify the input state0 name
    state2_in = tf.placeholder(tf.float32, [batch_size, 4, linear_size[1]], name="State_c1") # Specify the input state1 name
    state3_in = tf.placeholder(tf.float32, [batch_size, 5, linear_size[2]], name="State_c2") # Specify the input state2 name
    states = (state1_in, state2_in, state3_in)

    ...

    cnn_outputs = cnn_layer(inputs, seq_len, cnn_info, tf.nn.relu, fusedbn)
    outputs, states = fsmn_layer(cnn_outputs, memory_size, linear_size, hidden_size, states, tf.nn.relu, keep_prob)

    ...

    state1_out = tf.identity(states[0], name="State_c0_out") # Specify the output state0 name
    state2_out = tf.identity(states[1], name="State_c1_out") # Specify the output state1 name
    state3_out = tf.identity(states[2], name="State_c2_out") # Specify the output state2 name
    phone_prob = tf.identity(outputs, name="phone_prob") # Specify the final output name

    ...

    with tf.compat.v1.Session() as sess:
        tf.global_variables_initializer().run(session=sess)
        saver = tf.train.Saver()
        saver.restore(sess, "model.ckpt") # Load the CKPT file generated during training

        ...

        tf.train.write_graph(sess.graph_def, "./", "model.pb") # Generate the PB file of the inference model
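The state feedback described above (current output state becomes the next input state) can be sketched as a plain streaming loop. This is a minimal illustration, not the NPU runtime API: `run_model` is a hypothetical stand-in for one inference call, the state shapes follow the placeholders above with `batch_size=1` and `linear_size = [64, 64, 64]` assumed from the config, and the phone count of 61 is an assumption for the sketch.

```python
import numpy as np

# Hypothetical stand-in for one inference call (session.run or the NPU runtime):
# it takes the current feature frame plus the three input states and returns the
# phone probabilities plus the three updated output states.
def run_model(feats, state1, state2, state3):
    phone_prob = np.random.rand(1, 61).astype(np.float32)  # 61 phones is an assumed size
    return phone_prob, state1, state2, state3  # a real model would update the states here

# Initial input states are all zeros, matching State_c0 / State_c1 / State_c2.
state1 = np.zeros((1, 3, 64), dtype=np.float32)
state2 = np.zeros((1, 4, 64), dtype=np.float32)
state3 = np.zeros((1, 5, 64), dtype=np.float32)

outputs = []
for _ in range(5):  # stream 5 feature frames
    feats = np.random.rand(1, 784).astype(np.float32)  # shape matches Feats: [1, 784]
    # Feed the previous output states back in as the next input states.
    phone_prob, state1, state2, state3 = run_model(feats, state1, state2, state3)
    outputs.append(phone_prob)
```

The key point is only the feedback wiring: the three state tensors produced by one call are passed unchanged as the state inputs of the next call.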

1.2 Generate CKPT and PB Files, and Merge CKPT and PB Files into FROZEN_PB File*

Refer to Generating CKPT and PB Files and Merging CKPT and PB Files into FROZEN_PB File.

1.3 Edit NPU Configuration File*

Edit the config.yaml file, as explained in the comments.

config.yaml
CORENAME: GRUS # Chip model
MODEL_FILE: model_with_ckpt.pb # Input PB file
OUTPUT_FILE: model.h # Output NPU file name
NPU_UNIT: NPU32 # NPU device type
COMPRESS: true # Compress fully connected weights
CONV2D_COMPRESS: false # Do not compress convolutional weights
OUTPUT_TYPE: c_code # NPU file type
INPUT_OPS:
    Feats: [1, 784]
    State_c0: [1, 3, 64]
    State_c1: [1, 4, 64]
    State_c2: [1, 5, 64]
OUTPUT_OPS: [State_c0_out,State_c1_out,State_c2_out,phone_prob] # Output node names, put state nodes in front
FP16_OUT_OPS: [State_c0_out,State_c1_out,State_c2_out] # Directly assign current output state to the next input state without converting to fp32
FUSE_BN: true # Fuse BN parameters into Conv2D (if applicable)
# WEIGHT_CACHE_SIZE: 0 # Weight does not need to be placed separately, no need to set

Note

The Feats, State_c0, State_c1, State_c2, State_c0_out, State_c1_out, State_c2_out, phone_prob here must match the op names of the model's input and output nodes, respectively.

Note

In the OUTPUT_OPS configuration, the state output nodes need to be placed in front of the actual output nodes of the model.
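The ordering rule above can be checked programmatically before compiling. A minimal sketch, with the `OUTPUT_OPS` list copied from the config above and the `_out` suffix used as the state-node naming convention from this example:

```python
# OUTPUT_OPS from config.yaml; state output nodes must precede the model's
# actual output nodes (here, phone_prob).
output_ops = ["State_c0_out", "State_c1_out", "State_c2_out", "phone_prob"]

# State nodes are identified here by the "_out" suffix used in this example.
state_ops = [op for op in output_ops if op.endswith("_out")]

# The rule holds if all state nodes occupy the leading positions of the list.
order_ok = output_ops[:len(state_ops)] == state_ops
```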

1.4 Compilation*

Compile using the gxnpuc tool:

$ gxnpuc config.yaml

This will generate the NPU file model.h and print the memory information required by the model:

------------------------
Memory allocation info:
Mem0(ops): 0
Mem1(data): 15520
Mem2(instruction): 468
Mem3(in): 3104
Mem4(out): 1796
Mem5(tmp content): 0
Mem6(weights): 84052
Total NPU Size (Mem0+Mem1+Mem2+Mem5+Mem6): 100040
Total Memory Size: 104940
------------------------
Compile OK.
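The totals in the printout can be reproduced by hand: the NPU size excludes the in/out buffers (Mem3 and Mem4), and the total memory size adds them back. As a hedged observation, the input element count from `INPUT_OPS` (784 + 3×64 + 4×64 + 5×64 = 1552) times 2 bytes equals Mem3(in) = 3104, which appears consistent with 16-bit input buffers, though the byte width is not stated in the printout.

```python
# Memory sections as reported by gxnpuc for this example, in bytes.
mem = {
    "Mem0(ops)": 0,
    "Mem1(data)": 15520,
    "Mem2(instruction)": 468,
    "Mem3(in)": 3104,
    "Mem4(out)": 1796,
    "Mem5(tmp content)": 0,
    "Mem6(weights)": 84052,
}

# Total NPU Size = Mem0 + Mem1 + Mem2 + Mem5 + Mem6 (in/out buffers excluded).
npu_size = sum(mem[k] for k in
               ("Mem0(ops)", "Mem1(data)", "Mem2(instruction)",
                "Mem5(tmp content)", "Mem6(weights)"))

# Total Memory Size adds the input and output buffers back in.
total_size = npu_size + mem["Mem3(in)"] + mem["Mem4(out)"]

# Input elements from INPUT_OPS: Feats [1, 784] plus the three state tensors.
in_elems = 784 + 3 * 64 + 4 * 64 + 5 * 64  # 1552 elements
```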

2. Execute NPU File*

For detailed deployment instructions, please refer to the NPU Model Deployment Guide.