CS5720 - Week 6

One-to-Many RNN: Image Captioning

One-to-Many RNN Architecture

[Diagram] A single image 🖼️, encoded as CNN features, is the lone input; it initializes a chain of RNN steps (RNN₁ → RNN₂ → RNN₃ → … → RNNₙ) that each emit one word: "A", "cat", "sits", …
Multiple sequential outputs from a single image input
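A minimal sketch of the one-to-many pattern in the diagram, assuming PyTorch and greedy decoding; the layer sizes, vocabulary size, and token ids are illustrative placeholders, not values from the slide.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """One-to-many decoder: one image feature vector in, a word sequence out."""
    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)     # CNN features -> initial hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)  # previous word -> embedding
        self.rnn = nn.RNNCell(embed_dim, hidden_dim)      # one recurrent step per output word
        self.out = nn.Linear(hidden_dim, vocab_size)      # hidden state -> word scores

    def generate(self, image_features, start_id=1, end_id=2, max_len=20):
        h = torch.tanh(self.init_h(image_features))       # single input: the image features
        word = torch.tensor([start_id])
        caption = []
        for _ in range(max_len):                          # multiple sequential outputs
            h = self.rnn(self.embed(word), h)
            word = self.out(h).argmax(dim=-1)             # greedy choice of the next word
            if word.item() == end_id:
                break
            caption.append(word.item())
        return caption

# Hypothetical usage: in practice the features would come from a pretrained CNN
# (e.g. a ResNet pooling layer), not random noise.
# feats = torch.randn(1, 2048)
# token_ids = CaptionDecoder().generate(feats)
```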

Key Concepts

  • 🖼️ Image Encoding
    CNN extracts visual features as initial context
  • 📝 Sequential Generation
    RNN generates words one at a time
  • 🎯 Visual Conditioning
    Text generation conditioned on image content
  • 👁️ Attention Mechanism
    Focus on different image regions while generating each word (see the sketch after this list)
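A minimal sketch of the visual-conditioning and attention idea, assuming PyTorch and additive (Bahdanau-style) attention over a grid of CNN region features; the module name and dimensions are illustrative, not from the slide.

```python
import torch
import torch.nn as nn

class VisualAttention(nn.Module):
    """Score each image region against the decoder state and return a focused context."""
    def __init__(self, region_dim=512, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.proj_regions = nn.Linear(region_dim, attn_dim)
        self.proj_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, regions, hidden):
        # regions: (batch, num_regions, region_dim), e.g. a 7x7 CNN map flattened to 49 regions
        # hidden:  (batch, hidden_dim), the decoder state before predicting the next word
        energy = self.score(torch.tanh(self.proj_regions(regions)
                                       + self.proj_hidden(hidden).unsqueeze(1)))
        weights = torch.softmax(energy, dim=1)        # where to "look" for this word
        context = (weights * regions).sum(dim=1)      # weighted sum of region features
        return context, weights.squeeze(-1)

# Hypothetical usage inside the decoding loop: the context vector is concatenated with
# the word embedding before the RNN step, so each generated word attends to different regions.
```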

Real-World Applications

  • ♿ Accessibility
    Automatic alt-text for visually impaired users
  • 📁 Content Management
    Automatic tagging and description of media
  • 📱 Social Media
    Suggested captions for user posts
  • 🏥 Medical Imaging
    Automated report generation from scans

Image Captioning Examples

🏖️
"A beautiful beach with clear blue water and white sand under a sunny sky"
🏙️
"A busy city street with tall buildings and people walking on the sidewalk"
🐕
"A golden retriever playing with a ball in a green park"
🍝
"A plate of spaghetti with tomato sauce and fresh basil leaves"

Interactive Image Captioning Demo

Explore how one-to-many RNNs generate captions from visual input

Prepared by Dr. Gorkem Kar