2. Method
In the past few years, deep neural networks have become popular due to their success in many fields, and they have been applied to speech inversion [22,23]. Wu et al. [23] tried a general linear model (GLM), a Gaussian mixture model (GMM), an artificial neural network (ANN), and a deep neural network (DNN) with sigmoid hidden units to estimate articulatory positions from synchronized speech features on MNGU0, an English articulatory-acoustic corpus. Their results demonstrate that the DNN performs best.
A traditional DNN is obtained by stacking a series of trained restricted Boltzmann machines (RBMs) layer by layer, where the hidden layer of the preceding RBM serves as the visible layer of the following RBM. At the top of the stacked RBMs, a regression layer with linear units is added. This method is simple and effective in many applications. In a DNN, the input of each neuron can be formulated as:

$$I^{(n)}_i = \sum_j w^{(n)}_{ij}\, o^{(n-1)}_j + b^{(n)}_i$$

where $I^{(n)}_i$ is the input of the $i$th unit in the $n$th layer, $o^{(n-1)}_j$ is the output of the $j$th neuron in the $(n-1)$th layer, $w^{(n)}_{ij}$ is the weight connecting the $i$th unit in the $n$th layer to the $j$th unit in the $(n-1)$th layer, and $b^{(n)}_i$ is the bias of the $i$th unit in the $n$th layer. $f(x)$ is the activation function of the corresponding neuron, so that $o^{(n)}_i = f(I^{(n)}_i)$; it is usually a sigmoid function.
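For concreteness, a minimal NumPy sketch of this layer computation is given below; the function and variable names are illustrative assumptions rather than part of any particular toolkit.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), the usual sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(o_prev, W, b):
    """Forward pass of one fully connected layer.

    o_prev : outputs o^(n-1)_j of the previous layer, shape (units_prev,)
    W      : weights w^(n)_ij, shape (units, units_prev)
    b      : biases b^(n)_i of the current layer, shape (units,)
    """
    I = W @ o_prev + b   # I^(n)_i = sum_j w^(n)_ij * o^(n-1)_j + b^(n)_i
    o = sigmoid(I)       # o^(n)_i = f(I^(n)_i)
    return I, o
```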
However, training a traditional DNN with sigmoid hidden units is slow, and its performance tends to be affected by vanishing/exploding gradient problems. Moreover, the distribution of $I^{(n)}_i$ in the hidden layers changes during training as the parameters of the previous layers change [26], which slows down training by requiring lower learning rates and careful parameter initialization, and makes it hard to train models with saturating nonlinearities.
Figure 1 Structure of the batch-normalized feedforward neural network
In this study, the batch normalization technique is implemented to normalize each training mini-batch (the yellow blocks shown in Figure 1), and the ReLU activation function is used for the neurons of the hidden layers. The batch normalization process can be formulated as:

$$\mu^{(n)}_i = \frac{1}{m}\sum_{l=1}^{m} x^{(n)}_{i,l}, \qquad
\sigma^{2(n)}_i = \frac{1}{m}\sum_{l=1}^{m}\big(x^{(n)}_{i,l}-\mu^{(n)}_i\big)^2$$

$$\hat{x}^{(n)}_{i,l} = \frac{x^{(n)}_{i,l}-\mu^{(n)}_i}{\sqrt{\sigma^{2(n)}_i+\epsilon}}, \qquad
y^{(n)}_{i,l} = \gamma^{(n)}_i\, \hat{x}^{(n)}_{i,l} + \beta^{(n)}_i$$
where $\mu^{(n)}_i$ and $\sigma^{2(n)}_i$ are the mean and variance of $x^{(n)}_i$, and $\gamma^{(n)}_i$ and $\beta^{(n)}_i$ are the scaling and shifting parameters applied to the normalized value $\hat{x}^{(n)}_i$ so as to keep the representation capability of the layer. These parameters are optimized with the momentum gradient method:

$$\Delta\theta_{t+1} = d\,\Delta\theta_t - \eta\,\frac{\partial L}{\partial \theta}, \qquad \theta_{t+1} = \theta_t + \Delta\theta_{t+1}, \qquad \theta \in \big\{W^{(n)},\, b^{(n)},\, \gamma^{(n)}_i,\, \beta^{(n)}_i\big\}$$
where $L$ is the loss over the training set, $d$ is the momentum, and $\eta$ is the learning rate. The partial derivatives $\partial L/\partial W^{(n)}$, $\partial L/\partial b^{(n)}$, $\partial L/\partial \gamma^{(n)}_i$, and $\partial L/\partial \beta^{(n)}_i$ of each layer can be calculated by the backpropagation algorithm (shown in Equations 12-22).
where $L^{(l)}$ is the loss over the $l$th example, $m$ is the number of training examples, $x^{(n-1)}_l$ is the input to the batch normalization blocks of the $(n-1)$th layer corresponding to the $l$th input example, $I^{(n)}$ is the input of the $n$th layer, $I^{(n)}_{i,l}$ is the input to the $i$th neuron of the $n$th layer corresponding to the $l$th input example, $\hat{x}^{(n)}_{i,l}$ is the batch-normalized input to the $i$th batch normalization block of the $n$th layer corresponding to the $l$th input example, $o^{(n)}$ is the output of the $n$th layer, $W^{(n)}$ is the connection weight matrix linking neurons in the $(n-1)$th layer and the $n$th layer, and $b^{(n)}$ is the bias of the $n$th layer.
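For concreteness, a minimal NumPy sketch of the batch normalization block with ReLU hidden units and of one momentum update step is given below; the $\epsilon$ term, the default hyperparameter values, and the function names are assumptions made for this sketch.

```python
import numpy as np

def batchnorm_relu_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch and apply ReLU hidden units.

    x     : inputs to the BN block, shape (batch_size, units)
    gamma : scaling parameters gamma^(n)_i, shape (units,)
    beta  : shifting parameters beta^(n)_i, shape (units,)
    """
    mu = x.mean(axis=0)                      # mini-batch mean mu^(n)_i
    var = x.var(axis=0)                      # mini-batch variance sigma^2(n)_i
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized value x_hat^(n)_i
    y = gamma * x_hat + beta                 # scale and shift
    return np.maximum(y, 0.0)                # ReLU activation

def momentum_step(param, velocity, grad, d=0.9, eta=0.01):
    """One momentum gradient update for any parameter (W, b, gamma, or beta)."""
    velocity = d * velocity - eta * grad     # accumulate the update direction
    return param + velocity, velocity        # apply the update

# Example usage with random values standing in for real features and gradients:
x = np.random.randn(32, 64)                  # one mini-batch of layer inputs
gamma, beta = np.ones(64), np.zeros(64)
out = batchnorm_relu_forward(x, gamma, beta)
gamma, v_gamma = momentum_step(gamma, np.zeros(64), grad=np.random.randn(64))
```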