Presentation link - https://youtu.be/8ChOLvpfkzc
Problem Statement - Image captioning is the automatic generation of one or more natural language sentences that describe an image. In recent years the field has advanced rapidly, moving from early template-based models to deep neural network approaches. This report gives an overview of recent image captioning research, focusing on models that combine convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. We examine the merits of two strategies, attention mechanisms and word embeddings, and review the evaluation metrics and datasets commonly used in the field to assess them.