Student project for the lecture Deep Vision summer term 2019.
From left to right: he, former president of Zimbabwe, Africa, young; he, Swiss businessman and actor, Europe, young; she, American congresswoman from 1967–1998, Canada, middle; he, Japanese professor for socio-economics, Asia, middle.
This project translates text, in the form of single-sentence human-written descriptions from Wikipedia articles, into high-resolution images using a conditional ProGAN (by akanimax, thanks!). We use a self-crawled dataset of (image, text) pairs obtained via the Wikidata Query Service. The training data are only weakly correlated: only very salient attributes of the persons in the images (such as age, gender, or origin) are mentioned in the texts, while the rest of each text is noise irrelevant both to the image and to a learning model.
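The conditioning idea can be sketched as follows. This is a minimal NumPy illustration, not the project's actual code: the latent size (512), the conditioning size (128), and the random projection standing in for a learned layer are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioned_latent(sentence_embedding, latent_dim=512, cond_dim=128):
    """Compress a sentence embedding and concatenate it with noise.

    Hypothetical setup: InferSent v2 yields 4096-dim sentence embeddings;
    a ProGAN generator typically consumes a ~512-dim latent vector.
    """
    # Stand-in for a learned linear projection 4096 -> cond_dim.
    proj = rng.standard_normal((sentence_embedding.shape[0], cond_dim))
    cond = sentence_embedding @ proj                  # (cond_dim,)
    z = rng.standard_normal(latent_dim - cond_dim)    # pure noise part
    return np.concatenate([cond, z])                  # (latent_dim,)

emb = rng.standard_normal(4096)   # stand-in for one InferSent embedding
latent = conditioned_latent(emb)
print(latent.shape)               # (512,)
```

The generator then receives a latent vector whose first entries carry the text information and whose remaining entries supply sampling diversity.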
To obtain the dataset, change to the data/ directory. There we provide the JSON file containing the entities we crawled from Wikipedia.
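A crawl like this can be reproduced against the public Wikidata Query Service. The sketch below is illustrative only: the SPARQL query (properties P31/P18, the `LIMIT`, the English label service) and the user-agent string are assumptions, not the project's actual crawl configuration.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: humans (Q5) that have an image (P18),
# with English descriptions supplied by the label service.
QUERY = """
SELECT ?person ?personDescription ?image WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P18 ?image .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

def fetch_entities(query=QUERY):
    """Send the SPARQL query and return the parsed JSON result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "deep-vision-crawl-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    for row in fetch_entities():
        print(row["image"]["value"])
```

Each binding pairs an image URL with a short description, which is the (image, text) structure the dataset needs.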
Download the pretrained InferSent model and the GloVe word vectors:

```shell
wget https://dl.fbaipublicfiles.com/infersent/infersent2.pkl
wget http://nlp.stanford.edu/data/glove.840B.300d.zip && unzip glove.840B.300d.zip && rm glove.840B.300d.zip
```
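With those files in place, captions can be turned into sentence embeddings. This is a hedged sketch following the public InferSent repository's usage pattern (it assumes the repo's `models.py` is importable and that PyTorch is installed); the parameter values are the defaults from that repository and may differ from what this project uses.

```python
def encode_captions(sentences, model_path="infersent2.pkl",
                    w2v_path="glove.840B.300d.txt"):
    """Encode a list of caption strings into 4096-dim InferSent vectors."""
    import torch
    from models import InferSent  # provided by the InferSent repository

    # Default hyperparameters from the InferSent README (assumption).
    params = {"bsize": 64, "word_emb_dim": 300, "enc_lstm_dim": 2048,
              "pool_type": "max", "dpout_model": 0.0, "version": 2}
    model = InferSent(params)
    model.load_state_dict(torch.load(model_path))
    model.set_w2v_path(w2v_path)
    model.build_vocab(sentences, tokenize=True)
    return model.encode(sentences, tokenize=True)  # shape (n, 4096)
```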
See requirements.txt
For evaluation we provide a Jupyter notebook and pretrained weights in the Evaluator/ folder. For more details, have a look at the notebook.
The code is partially based on akanimax's ProGAN implementation.