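VGGPT-2 has been trained to interpret your question and focus its attention on the relevant regions of the input image to generate an answer. Because it is an auto-regressive model, producing the answer one token at a time conditioned on the tokens it has already generated, it can produce free-form answers rather than only choosing from a fixed list.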
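The two ideas above, attending over image regions and decoding auto-regressively, can be sketched in a few lines. This is a toy illustration only, not VGGPT-2's actual implementation: `attend`, `greedy_decode` and `toy_step` are hypothetical names, the region features are made-up vectors, and `toy_step` stands in for a real model that would return next-token probabilities given the question, the image and the answer prefix.

```python
import math
from typing import Callable, Dict, List, Sequence


def attend(query: Sequence[float], regions: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of one query over image-region features."""
    d = len(query)
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d) for region in regions]
    m = max(scores)  # subtract the max score for numerical stability before exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # weights sum to 1; larger = more attention


def greedy_decode(
    step: Callable[[List[str]], Dict[str, float]],
    max_len: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    """Auto-regressive greedy decoding: each token depends on the tokens before it."""
    tokens: List[str] = []
    for _ in range(max_len):
        probs = step(tokens)  # hypothetical model call: prefix -> {token: probability}
        next_tok = max(probs, key=probs.get)  # pick the most probable next token
        if next_tok == eos:
            break
        tokens.append(next_tok)
    return tokens


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model; follows a fixed script."""
    script = ["two", "dogs", "<eos>"]
    nxt = script[len(prefix)] if len(prefix) < len(script) else "<eos>"
    return {nxt: 0.9, "cat": 0.1}


# The query vector aligns with the first region, so that region gets more weight.
weights = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
answer = greedy_decode(toy_step)
print(weights, answer)
```

Greedy decoding is used here only for simplicity; a real system might sample or use beam search, and the answer length is not fixed in advance, which is what allows free-form answers.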
The example images on this page are randomly selected from 40k samples taken from the VQAv2 validation set.