cross entropy softmax softmax with cross entropy cross entropy \[\begin{aligned} L =& -\frac{\sum t \log y}{B} = -\frac{\sum_{ij}t_{ij}\log y_{ij}}{B} \\ \Rightarrow \frac{\partial L}{\partial y_{ij}} =& -\frac{t_{ij}}{B*y_{ij}} \\ \Rightarrow \frac{\partial L}{\partial y} =& -\frac {t}{B*y}\\ \end{aligned}\] softmax \[\begin{aligned} y =& \frac{e^x}{\sum e^x} \Rightarrow y_{ij} = \frac{e^{x_{ij}}}{\sum_j {e^{x_{ij}}}} \\ \Rightarrow \frac{\partial {y_{ij}}}{\partial {x_{pq}}} =& \frac{e^{x_{ij}}*\frac{\partial x_{ij}}{\partial x_{pq}}*\sum_j e^{x_{ij}}-e^{x_{ij}}*e^{x_{iq}}*\frac{\partial x_{iq}}{\partial x_{pq}}}{(\sum_j{e^{x_{ij}}})^2} \\ =& \begin{cases} \frac{e^{x_{ij}}*(\sum_je^{x_{ij}}-e^{x_{ij}})}{(\sum_j{e^{x_{ij}}})^2} = y_{ij} * (1-y_{ij})& p=i, q = j \\ \frac{-e^{x_{ij}}*e^{x_{iq}}}{(\sum_j{e^{x_{ij}}})^2} = -y_{ij} * y_{iq}& p = i, q\neq j\\ 0& p\neq j \end{cases} \end{aligned}\] softmax with cross entropy \[\begin{aligned} \frac{\partial L}{\partial x_{pq}} =& \sum_{ij} (\frac {\partial L}{\partial y_{ij}} * \frac{\partial y_{ij}}{\partial x_{pq}}) \\ =& \sum_{j}(-\frac{t_{pj}}{B*y_{pj}}*{\frac{\partial y_{pj}}{\partial x_{pq}}}) \\ =& \sum_{j\neq q} (\frac{t_{pj}}{B*y_{pj}}*y_{pj}*y_{pq}) - \frac{t_{pq}}{B*y_{pq}}*y_{pq}*(1-y_{pq}) \\ =& \sum_{j\neq q} \frac{t_{pj}y_{pq}}{B} - \frac{t_{pq}(1-y_{pq})}{B} \\ =& \frac{(\sum_{q}t_{pq})*y_{pq}-t_{pq}}{B} \\ =& \frac{y_{pq}-t_{pq}}{B} \\ \Rightarrow \frac{\partial L}{\partial x} =& \frac{y-t}{B} \end{aligned}\]