Machine-based image recognition and generation can already produce fantastic results, but machines cannot always handle data perfectly.
Possible sources of error take root early, at the stage where automated programs are designed. Large numbers of microjobbers in Africa, India, or China spend their time trawling through thousands of images and labeling them precisely so that neural networks can be trained on them. Both the quality and the quantity of these training samples matter, because a network can only learn from what it has been shown. The images may also contain features that we humans overlook but that the machine treats as a basis for its decisions. Computer vision programs cannot freely interpret what they see, as humans do: they have only learned to recognize certain elements and patterns and to draw conclusions from those.
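How an overlooked feature can become the basis for a decision can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the two features (`shape`, which matches the label 90% of the time, and `watermark`, an incidental mark that happens to match the label perfectly in the training set), the toy learner that simply keeps whichever single feature scores best on the training data, and the sample sizes. It is not how a real neural network is trained, only a minimal analogy for the same failure.

```python
# Toy illustration: a learner latches onto an incidental feature.
# All names and numbers here are hypothetical, chosen for illustration.

def make_samples(n, watermark_follows_label):
    """Build (features, label) pairs; features = (shape, watermark)."""
    samples = []
    for i in range(n):
        label = i % 2
        # "shape" is the meaningful cue, wrong for every 10th sample.
        shape = label if i % 10 != 0 else 1 - label
        # "watermark" matches the label only if the data set was
        # collected that way; otherwise it is unrelated to the label.
        watermark = label if watermark_follows_label else (i // 2) % 2
        samples.append(((shape, watermark), label))
    return samples

def accuracy(samples, feature_index):
    """Share of samples where the chosen feature equals the label."""
    hits = sum(1 for features, label in samples
               if features[feature_index] == label)
    return hits / len(samples)

def train(samples):
    """Keep the single feature that best predicts the training labels."""
    n_features = len(samples[0][0])
    return max(range(n_features), key=lambda k: accuracy(samples, k))

train_set = make_samples(200, watermark_follows_label=True)
test_set = make_samples(200, watermark_follows_label=False)

chosen = train(train_set)                 # index 1 is the watermark
train_acc = accuracy(train_set, chosen)
test_acc = accuracy(test_set, chosen)
shape_test_acc = accuracy(test_set, 0)    # what the meaningful cue scores

print("chosen feature index:", chosen)
print("training accuracy:", train_acc)
print("test accuracy:", test_acc)
print("test accuracy of shape feature:", shape_test_acc)
```

In this sketch the learner picks the watermark because it looks flawless on the training images, then falls to chance level on new images where the watermark no longer tracks the label, while the feature a human would have considered meaningful would still have been right nine times out of ten.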