Return to Article Details A Picture is Worth a Thousand Words: Using Cross-Modal Transformers and Variational AutoEncoders to Generate Images from Text Download Download PDF