Generally, this is about what the cross-entropy function measures. At its core, it takes two probability distributions and estimates the "distance" between them, which lets you drive one closer to the other during training. In this case, one distribution is the softmax of the logits (the raw, unnormalized network outputs), and the other is the one-hot label. I'm not sure whether y_conv or y_ holds the logits or the one-hot labels, so I made an assumption in the code.
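To make that concrete, here is a minimal pure-Python sketch (the logit and label values are made up for illustration) of applying softmax to the logits and then computing the cross entropy against a one-hot label:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs):
    # H(p, q) = -sum_i p_i * log(q_i); with a one-hot p this reduces
    # to -log of the predicted probability of the true class.
    return -sum(p * math.log(q) for p, q in zip(one_hot, probs))

logits = [2.0, 1.0, 0.1]   # raw, unnormalized scores
label = [1.0, 0.0, 0.0]    # one-hot ground truth
probs = softmax(logits)
loss = cross_entropy(label, probs)
```

The loss shrinks as the softmax distribution puts more mass on the labeled class, which is exactly what gradient descent exploits.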
The most common use case is logits and labels of shape [batch_size, num_classes], but higher ranks are supported, with the axis argument specifying the class dimension.
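As a sketch of that shape convention (plain Python standing in for TensorFlow; the function name and values are mine), a batched version takes [batch_size, num_classes] inputs, reduces over the last axis, and returns one loss per example:

```python
import math

def batched_softmax_cross_entropy(labels, logits):
    """labels, logits: nested lists of shape [batch_size, num_classes].
    The class dimension is the last axis, mirroring axis=-1."""
    losses = []
    for one_hot, row in zip(labels, logits):
        m = max(row)                          # stabilize the exponentials
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Cross entropy over the class axis: -sum_i p_i * log(q_i)
        losses.append(-sum(p * math.log(q)
                           for p, q in zip(one_hot, probs)))
    return losses                             # shape [batch_size]

labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
logits = [[2.0, 1.0, 0.1], [0.1, 1.0, 2.0]]
losses = batched_softmax_cross_entropy(labels, logits)
```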
Backpropagation will happen into both logits and labels. To disallow backpropagation into labels, pass the label tensors through tf.stop_gradient before feeding them to this function.
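A minimal TensorFlow 2 sketch of that (the tensor values are made up for illustration):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw scores for one example
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot ground truth

# Wrapping labels in tf.stop_gradient treats them as constants,
# so backpropagation flows only into the logits.
loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.stop_gradient(labels), logits=logits)
```

With hard one-hot labels this changes nothing numerically, but it matters when the labels are themselves produced by a trainable op (e.g. soft labels from another network).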
This function computes the softmax cross entropy between logits and labels. Its signature is:
`tf.nn.softmax_cross_entropy_with_logits(labels, logits, axis=-1, name=None)`
After changing to what you suggested, we get this error: `ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'Variable:0' shape=(32,) dtype=float32_ref`.