It's a difficult problem to explain unless you've taken the higher math courses, but basically the number of layers the neural network actually uses is one level lower than what is needed to properly record the shapes, which is why you see these artifacts cropping up. It's also the same reason you get people with missing or extra arms.