2. Method
In the past few years, deep neural networks have become popular due to their success in many fields, and they have been applied to speech inversion [22,23]. Wu et al. [23] tried a general linear model (GLM), a Gaussian mixture model (GMM), an artificial neural network (ANN), and a deep neural network (DNN) with sigmoid hidden units to estimate articulatory positions from synchronized speech features on MNGU0, an English articulatory-acoustic corpus. Their results demonstrate that the DNN performs best.
A traditional DNN is obtained by stacking a series of trained restricted Boltzmann machines (RBMs) layer by layer, where the hidden layer of the preceding RBM serves as the visible layer of the following RBM. At the top of the stacked RBMs, a regression layer with linear units is added. This method is simple and effective in many applications. In a DNN, the input of each neuron can be formulated as:

$$I^{(n)}_i = \sum_j w^{(n)}_{ij}\, o^{(n-1)}_j + b^{(n)}_i$$

where $I^{(n)}_i$ is the input of the $i$th unit in the $n$th layer, $o^{(n-1)}_j$ is the output of the $j$th neuron in the $(n-1)$th layer, $w^{(n)}_{ij}$ is the weight connecting the $i$th unit in the $n$th layer to the $j$th unit in the $(n-1)$th layer, and $b^{(n)}_i$ is the bias of the $i$th unit in the $n$th layer. $f(x)$ is the activation function of the corresponding neuron, so that $o^{(n)}_i = f(I^{(n)}_i)$; it is usually a sigmoid function.
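For concreteness, a minimal NumPy sketch of this layer computation is given below; the function and variable names are illustrative assumptions rather than part of any particular toolkit.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), the usual sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(o_prev, W, b):
    """Forward pass of one fully connected layer.

    o_prev : outputs o^(n-1)_j of the previous layer, shape (units_prev,)
    W      : weights w^(n)_ij, shape (units, units_prev)
    b      : biases b^(n)_i of the current layer, shape (units,)
    """
    I = W @ o_prev + b   # I^(n)_i = sum_j w^(n)_ij * o^(n-1)_j + b^(n)_i
    o = sigmoid(I)       # o^(n)_i = f(I^(n)_i)
    return I, o
```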
However, training a traditional DNN with sigmoid hidden units is slow, and its performance tends to be affected by vanishing/exploding gradient problems. Moreover, the distribution of $I^{(n)}_i$ in the hidden layers changes during training as the parameters of the previous layers change [26], which slows down training by requiring lower learning rates and careful parameter initialization, and makes it hard to train models with saturating nonlinearities.
Figure 1 Structure of the batch-normalized feedforward neural network
In this study, the batch normalization technique is implemented to normalize each training mini-batch (the yellow blocks shown in Figure 1), and the ReLU activation function is used for the neurons of the hidden layers. The batch normalization process can be formulated as:

$$\mu^{(n)}_i = \frac{1}{m}\sum_{l=1}^{m} x^{(n)}_{i,l}, \qquad
\sigma^{2(n)}_i = \frac{1}{m}\sum_{l=1}^{m}\big(x^{(n)}_{i,l}-\mu^{(n)}_i\big)^2$$

$$\hat{x}^{(n)}_{i,l} = \frac{x^{(n)}_{i,l}-\mu^{(n)}_i}{\sqrt{\sigma^{2(n)}_i+\epsilon}}, \qquad
y^{(n)}_{i,l} = \gamma^{(n)}_i\, \hat{x}^{(n)}_{i,l} + \beta^{(n)}_i$$
where $\mu^{(n)}_i$ and $\sigma^{2(n)}_i$ are the mean and variance of $x^{(n)}_i$, and $\gamma^{(n)}_i$ and $\beta^{(n)}_i$ are the scaling and shifting parameters applied to the normalized value $\hat{x}^{(n)}_i$ so as to keep the representation capability of the layer. These parameters are optimized with the momentum gradient method:

$$\Delta\theta_{t+1} = d\,\Delta\theta_t - \eta\,\frac{\partial L}{\partial \theta}, \qquad \theta_{t+1} = \theta_t + \Delta\theta_{t+1}, \qquad \theta \in \big\{W^{(n)},\, b^{(n)},\, \gamma^{(n)}_i,\, \beta^{(n)}_i\big\}$$
where $L$ is the loss over the training set, $d$ is the momentum, and $\eta$ is the learning rate. The partial derivatives $\partial L/\partial W^{(n)}$, $\partial L/\partial b^{(n)}$, $\partial L/\partial \gamma^{(n)}_i$, and $\partial L/\partial \beta^{(n)}_i$ of each layer can be calculated by the backpropagation algorithm (shown in Equations 12-22).
where $L^{(l)}$ is the loss over the $l$th example, $m$ is the number of training examples, $x^{(n-1)}_l$ is the input to the batch normalization blocks of the $(n-1)$th layer corresponding to the $l$th input example, $I^{(n)}$ is the input of the $n$th layer, $I^{(n)}_{i,l}$ is the input to the $i$th neuron of the $n$th layer corresponding to the $l$th input example, $\hat{x}^{(n)}_{i,l}$ is the batch-normalized input to the $i$th batch normalization block of the $n$th layer corresponding to the $l$th input example, $o^{(n)}$ is the output of the $n$th layer, $W^{(n)}$ is the connection weight matrix linking neurons in the $(n-1)$th layer and the $n$th layer, and $b^{(n)}$ is the bias of the $n$th layer.
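For concreteness, a minimal NumPy sketch of the batch normalization block with ReLU hidden units and of one momentum update step is given below; the $\epsilon$ term, the default hyperparameter values, and the function names are assumptions made for this sketch.

```python
import numpy as np

def batchnorm_relu_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch and apply ReLU hidden units.

    x     : inputs to the BN block, shape (batch_size, units)
    gamma : scaling parameters gamma^(n)_i, shape (units,)
    beta  : shifting parameters beta^(n)_i, shape (units,)
    """
    mu = x.mean(axis=0)                      # mini-batch mean mu^(n)_i
    var = x.var(axis=0)                      # mini-batch variance sigma^2(n)_i
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized value x_hat^(n)_i
    y = gamma * x_hat + beta                 # scale and shift
    return np.maximum(y, 0.0)                # ReLU activation

def momentum_step(param, velocity, grad, d=0.9, eta=0.01):
    """One momentum gradient update for any parameter (W, b, gamma, or beta)."""
    velocity = d * velocity - eta * grad     # accumulate the update direction
    return param + velocity, velocity        # apply the update

# Example usage with random values standing in for real features and gradients:
x = np.random.randn(32, 64)                  # one mini-batch of layer inputs
gamma, beta = np.ones(64), np.zeros(64)
out = batchnorm_relu_forward(x, gamma, beta)
gamma, v_gamma = momentum_step(gamma, np.zeros(64), grad=np.random.randn(64))
```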