Inspired by NicsMeme®
"In the future memes will be able to generate themselves and propagate automatically"
See also the Dead Internet Theory.
For a complete presentation of this project, see hypermeme.ipynb.
The aim of this project is to classify memes into four topics:
- pol: politics
- ent: entertainment
- sport: sports
- oth: other
Classification is based on each meme's visual content: the text extracted from the image via OCR and an automatically generated caption.
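To make the idea concrete, here is a minimal sketch of how a caption and OCR text could be merged and mapped to one of the four topic codes. It is not the model shipped with this repo: the keyword lists and the `build_text` and `classify` helpers are hypothetical, and a plain keyword count stands in for the trained classifier.

```python
import re
from collections import Counter

# Topic codes and names as listed in this README.
TOPICS = {"pol": "politics", "ent": "entertainment", "sport": "sports", "oth": "other"}

# Hypothetical keyword lists, for illustration only (not taken from the repo).
KEYWORDS = {
    "pol": {"election", "president", "vote", "senate"},
    "ent": {"movie", "netflix", "singer", "celebrity"},
    "sport": {"football", "goal", "match", "olympics"},
}


def build_text(caption: str, ocr_lines: list[str]) -> str:
    """Join the generated caption and the OCR lines into one lowercase string."""
    return " ".join([caption, *ocr_lines]).lower()


def classify(caption: str, ocr_lines: list[str]) -> str:
    """Return the topic code with the most keyword hits, or "oth" if there are none."""
    words = re.findall(r"[a-z]+", build_text(caption, ocr_lines))
    scores = Counter()
    for topic, vocab in KEYWORDS.items():
        scores[topic] = sum(word in vocab for word in words)
    best, hits = scores.most_common(1)[0]
    return best if hits > 0 else "oth"


if __name__ == "__main__":
    code = classify("a man holding a football", ["GOAL!!!"])
    print(code, TOPICS[code])
```

Falling back to `oth` when nothing matches mirrors the role of the "other" topic as a catch-all.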
- Make sure Docker and wget are installed on your machine; both are required for this project.
- Download Kafka

      cd ./kafka/setup
      wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz
- Build containers

      # if you want gpu acceleration
      docker compose -f gpu_compose.yaml build
      # otherwise (cpu)
      docker compose build
- Start the pipeline (see quickstart)
- Import the dashboard+data_view.ndjson file from the /kibana directory into Kibana
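Before running the setup steps, it can help to confirm that the required tools are actually on the `PATH`. The sketch below is one way to do that from Python; the `missing_tools` helper is hypothetical and not part of this repo.

```python
import shutil

# Tools this README lists as required.
REQUIRED_TOOLS = ("docker", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```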
# if you want gpu acceleration
docker compose -f gpu_compose.yaml --profile pipeline up
# otherwise (cpu)
docker compose --profile pipeline up
docker compose --profile download_dataset up
# A pretrained model is already included in this repo. To rebuild it using your own data, run:
docker compose --profile build_model up
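The commands above all follow the same pattern: an optional `-f gpu_compose.yaml`, then `--profile <name> up`. If you want to script them, a small wrapper along these lines may be convenient. The `compose_command` and `run_profile` helpers are hypothetical, and whether the GPU compose file applies to every profile is an assumption; this README only shows it with the `pipeline` profile.

```python
import subprocess


def compose_command(profile: str, gpu: bool = False) -> list[str]:
    """Build the docker compose argument list for one of the profiles in this README."""
    cmd = ["docker", "compose"]
    if gpu:
        # Assumption: the GPU compose file is selected the same way for any profile.
        cmd += ["-f", "gpu_compose.yaml"]
    cmd += ["--profile", profile, "up"]
    return cmd


def run_profile(profile: str, gpu: bool = False) -> subprocess.CompletedProcess:
    """Run the command and raise CalledProcessError if docker exits with an error."""
    return subprocess.run(compose_command(profile, gpu), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try anywhere.
    for name in ("pipeline", "download_dataset", "build_model"):
        print(" ".join(compose_command(name)))
```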
Container | URL | Description |
---|---|---|
kafka-UI | http://localhost:8080 | Open kafka UI |
kibana | http://localhost:5601 | Kibana base URL |
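Once the pipeline is up, a quick way to see whether both web interfaces respond is to probe the URLs from the table. This is a minimal sketch using only the standard library; the `is_up` helper is hypothetical, and it treats any HTTP response (including error statuses) as "the service is reachable".

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the table above.
SERVICES = {
    "kafka-UI": "http://localhost:8080",
    "kibana": "http://localhost:5601",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, False if nothing is reachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed response, and so on.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {url} -> {'up' if is_up(url) else 'not reachable'}")
```

Kibana in particular can take a while to start, so a "not reachable" result right after `up` does not necessarily mean something is broken.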
- Ingestion:
- Image data extraction
  - Image captioning model: blip-image-captioning-large on Hugging Face
  - EasyOCR
- Streaming:
- Processing:
- Indexing:
- Visualization:
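The image data extraction step listed under Ingestion could look roughly like the sketch below. It assumes the `transformers`, `torch`, `Pillow`, and `easyocr` packages are installed and that the captioning checkpoint is `Salesforce/blip-image-captioning-large` on Hugging Face; the function names are hypothetical and this is not the repo's actual ingestion code.

```python
def clean_lines(lines):
    """Strip whitespace from OCR lines and drop the empty ones."""
    return [line.strip() for line in lines if line.strip()]


def caption_image(image_path, model_id="Salesforce/blip-image-captioning-large"):
    """Generate a caption for one image with BLIP (assumed checkpoint id)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def read_text(image_path, languages=("en",)):
    """Extract the text printed on the image with EasyOCR."""
    import easyocr

    reader = easyocr.Reader(list(languages))
    # detail=0 makes readtext return plain strings instead of boxes and scores.
    return clean_lines(reader.readtext(image_path, detail=0))


if __name__ == "__main__":
    import sys

    path = sys.argv[1]  # path to a meme image
    print("caption:", caption_image(path))
    print("ocr:", read_text(path))
```

Loading the model and the OCR reader inside each call keeps the sketch short; in a streaming pipeline you would normally create them once and reuse them for every image.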