Chinese Journal of Phonetics (中国语音学报), Vol. 11

4. Results

To determine the number of hidden units in each layer, we first conduct an experiment on a neural network with one hidden layer, varying the number of hidden units from 50 to 1600. The results indicate that the network with 400 hidden units achieves the best performance. Therefore, we construct a deep neural network with 6 hidden layers, each containing 400 ReLU units. The momentum d in Equation 5-8 is 0.8. The initial learning rate is set to 0.0004 and decays by a factor of 0.9. Each mini-batch contains 1024 examples. The maximum number of training epochs is 50.
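The configuration described above can be summarized in a short sketch. This is an illustration only, not the authors' code: the names `TrainConfig`, `layer_shapes`, and `learning_rate` are hypothetical, the input dimension of 39 in the usage note is a placeholder because the acoustic feature size is not stated in this section, and applying the 0.9 decay once per epoch is an assumption, since the text does not say when the decay is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the text (field names are hypothetical)."""
    n_hidden_layers: int = 6
    hidden_units: int = 400      # best single-layer size in the 50..1600 sweep
    momentum: float = 0.8
    initial_lr: float = 0.0004
    lr_decay: float = 0.9
    batch_size: int = 1024
    max_epochs: int = 50


def layer_shapes(cfg, input_dim, output_dim):
    """Return (fan_in, fan_out) for every weight matrix of the network."""
    sizes = [input_dim] + [cfg.hidden_units] * cfg.n_hidden_layers + [output_dim]
    return list(zip(sizes[:-1], sizes[1:]))


def learning_rate(cfg, epoch):
    """Learning rate at a zero-based epoch index.

    Assumption: the rate is multiplied by lr_decay once per epoch.
    """
    return cfg.initial_lr * cfg.lr_decay ** epoch
```

For example, `layer_shapes(TrainConfig(), input_dim=39, output_dim=14)` lists seven weight matrices: one from the input to the first hidden layer, five between hidden layers, and one to the 14 articulatory channels.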

The RMSE is calculated for the coil coordinates estimated from the acoustic signal, as shown in Table 1. The first two columns give the RMSE on the test set of the DNN trained with cost functions L1 and L2, respectively. The last two columns give the correlation coefficients between the estimated trajectory and the ground truth. Bold italic numbers in the second and fourth columns indicate that, for the corresponding articulatory channel, the RMSE between the estimate and the ground truth is reduced or the correlation is improved. Performance improves in either RMSE or correlation for 11 of the 14 articulatory channels. Among these, the RMSE decreases for 9 channels and the correlation increases for 9 channels. For 7 channels, both RMSE and correlation improve.

Table 1. Experimental results.

The average RMSE and correlation over all coil coordinates are about 1.092 mm and 0.8894, respectively, when the L1 cost function is used. This result is better than those reported by other studies based on the MOCHA database [13,27]. When the L2 cost function is used, the average RMSE and correlation are about 1.0841 mm and 0.8909, respectively. Both the position and shape similarities between the estimated trajectory and the ground truth are improved.
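The two metrics reported above can be computed per articulatory channel and then averaged, as in the minimal sketch below. It assumes the standard definitions of RMSE and the Pearson correlation coefficient; the function names are hypothetical and the code is not taken from the paper.

```python
import math


def rmse(estimate, truth):
    """Root-mean-square error between two equal-length trajectories."""
    if len(estimate) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return math.sqrt(squared / len(truth))


def pearson(estimate, truth):
    """Pearson correlation coefficient between two trajectories."""
    if len(estimate) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    n = len(truth)
    mean_e = sum(estimate) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimate, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_e == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_e * var_t)


def mean_over_channels(values):
    """Average a per-channel metric over all articulatory channels."""
    return sum(values) / len(values)
```

The two measures capture different things: an estimate that is shifted by a constant offset keeps a correlation of 1 (same shape) but has a non-zero RMSE (wrong position), which is why both are reported.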