The Cross-Entropy Equation for Neural Networks

There was a hard-to-follow passage in the deep learning book I'm currently reading, so I had ChatGPT fill in the details. The formatting got a bit garbled when I copy-pasted the answer, but the equations seem to display fine.

Question: What would be the equation for E(W, b), the cross-entropy error across all the training examples (n)?

The equation for the cross-entropy error across all training examples ($n$) is given by:

$$E(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij})$$

Explanation of the terms:

  1. $n$: The total number of training examples.

  2. $C$: The number of classes in the classification task.

  3. $y_{ij}$: A binary indicator (1 or 0) of whether the $i$-th example belongs to class $j$ (from the ground truth/label). This corresponds to a one-hot encoding of the true label.

  4. $\hat{y}_{ij}$: The predicted probability that the $i$-th example belongs to class $j$, as computed by the model (often after applying a softmax activation function).

  5. $\frac{1}{n} \sum_{i=1}^{n}$: Averages the loss across all training examples, ensuring that the error is normalized.

  6. $-\sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij})$: The cross-entropy loss for a single training example $i$.
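
To make the double sum concrete, here is a minimal NumPy sketch (my own addition, not from the book or the original answer; the toy values for y and y_hat are made up):

```python
import numpy as np

# Toy data, assumed for illustration: n = 3 examples, C = 4 classes.
# y holds one-hot ground-truth labels; y_hat holds model outputs after
# softmax, so each row sums to 1.
y = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])
y_hat = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.20, 0.60],
])

n = y.shape[0]

# E(W, b) = -(1/n) * sum_i sum_j y_ij * log(y_hat_ij)
E = -np.sum(y * np.log(y_hat)) / n
print(E)  # ~0.3635, the average cross-entropy over the 3 examples
```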

Key Points:

  • The equation computes the average cross-entropy loss across all training examples in the dataset.
  • If there is only one correct class for each training example, the term $y_{ij} \log(\hat{y}_{ij})$ simplifies to only the correct class, because $y_{ij} = 1$ for the correct class and $y_{ij} = 0$ for all other classes.
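
A quick numeric check of that simplification, again with made-up values:

```python
import numpy as np

# One example (toy values): predicted probabilities and a one-hot label
# whose correct class is index 1.
y_hat_i = np.array([0.10, 0.70, 0.10, 0.10])
y_i = np.array([0, 1, 0, 0])

# The full inner sum over all C classes ...
full_sum = -np.sum(y_i * np.log(y_hat_i))
# ... reduces to just the correct-class term, since y_ij = 0 elsewhere.
correct_only = -np.log(y_hat_i[1])

print(full_sum, correct_only)  # both ~0.3567
```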

Simplified Version (if $y_i$ is a scalar class label):

If the ground truth $y_i$ is not one-hot encoded but instead directly represents the class index (e.g., $y_i \in \{1, 2, \dots, C\}$), the equation becomes:

$$E(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \log(\hat{y}_{i, y_i})$$

where:

  • $\hat{y}_{i, y_i}$ is the predicted probability for the true class $y_i$ of the $i$-th training example.
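
Here is the same toy computation in the class-index form, a sketch assuming 0-based indices as NumPy uses them:

```python
import numpy as np

# Same toy predictions as before, but labels given as class indices
# (0-based here, whereas the formula above uses 1-based indices).
y_hat = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.20, 0.60],
])
labels = np.array([1, 0, 3])

n = len(labels)

# E(W, b) = -(1/n) * sum_i log(y_hat[i, labels[i]])
E = -np.mean(np.log(y_hat[np.arange(n), labels]))
print(E)  # ~0.3635, identical to the one-hot version
```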

The class-index form is commonly used in implementations where the true labels are given as class indices (e.g., PyTorch's CrossEntropyLoss).
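
As a cross-check, here is a small PyTorch sketch with made-up logits showing that CrossEntropyLoss matches the simplified equation; note that it takes raw logits rather than probabilities:

```python
import torch
import torch.nn as nn

# Made-up logits for 3 examples and 4 classes. CrossEntropyLoss expects
# raw logits (pre-softmax scores), not probabilities: it applies
# log-softmax internally.
logits = torch.tensor([
    [0.2, 2.1, 0.2, 0.2],
    [2.5, 0.4, 0.1, 0.1],
    [0.3, 1.4, 1.7, 2.8],
])
labels = torch.tensor([1, 0, 3])  # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)  # averages over the batch by default

# Manual computation of the simplified equation for comparison.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), labels].mean()

print(loss.item(), manual.item())  # the two values match
```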