-
Notifications
You must be signed in to change notification settings - Fork 602
Importing into PostgreSQL
The instructions below assume you have PostgreSQL up and running. There are many good guides on the Internet about that, this is one of them.
To import the annotations into PostgreSQL, connect to the Postgres server with psql and run the following commands:
Note: this is a work in progress, and it's known to be incomplete and suboptimal. Suggestions to improve it are very welcome!
Note an alternative to using a local database would be to use BigQuery that already has the annotations hosted as a part of Public Datasets program.
postgres=# CREATE DATABASE openimages;
CREATE DATABASE
postgres=# \c openimages
You are now connected to database "openimages" as user "username".
openimages=> CREATE TABLE Images (
ImageID CHAR(16),
Subset VARCHAR,
OriginalURL VARCHAR,
OriginalLandingURL VARCHAR,
License VARCHAR,
AuthorProfileURL VARCHAR,
Author VARCHAR,
Title VARCHAR,
OriginalSize BIGINT,
OriginalMD5 VARCHAR,
Thumbnail300KURL VARCHAR,
PRIMARY KEY(ImageID)
);
openimages=> CREATE TABLE Dict (
LabelName VARCHAR,
DisplayLabelName VARCHAR,
PRIMARY KEY (LabelName)
);
CREATE TABLE
openimages=> CREATE TABLE Labels (
ImageID CHAR(16) REFERENCES Images(ImageID),
Source VARCHAR,
LabelName VARCHAR REFERENCES Dict(LabelName),
Confidence REAL,
PRIMARY KEY(ImageID, Source, LabelName)
);
Now, let's load the data from the CSV files:
openimages=> \COPY Images FROM 'images_2016_08/validation/images.csv' DELIMITER ',' CSV HEADER
COPY 167056
openimages=> \COPY Images FROM 'images_2016_08/train/images.csv' DELIMITER ',' CSV HEADER;
COPY 9011219
openimages=> \COPY Dict FROM 'dict.csv' DELIMITER ',' CSV;
COPY 7881
openimages=> \COPY Labels FROM 'human_ann_2016_08/validation/labels.csv' DELIMITER ',' CSV HEADER;
COPY 1741384
openimages=> \COPY Labels FROM 'machine_ann_2016_08/validation/labels.csv' DELIMITER ',' CSV HEADER;
COPY 2060220
openimages=> \COPY Labels FROM 'machine_ann_2016_08/train/labels.csv' DELIMITER ',' CSV HEADER;
COPY 79196415
The last command above will take more than 25 minutes. It's normal for the number of rows being inserted.
You can now verify that import was successful:
openimages=> SELECT Subset, count(*) FROM Images GROUP BY Subset;
subset | count
------------+---------
train | 9011219
validation | 167056
(2 rows)
Which labels are in the dataset?
openimages=> SELECT * FROM dict LIMIT 10;
labelname | labeldisplayname
--------------+---------------------------
/g/1172hxxrk | traditional sport
/g/11jwzh3_l | volcanic landform
/g/11jxkqbpp | mountainous landforms
/g/11jy976z6 | small terrier
/g/11k2xz7mr | meteorological phenomenon
/g/11k38drqt | grilled food
/g/121sl9wl | jaguar s-type
/g/12233hjw | classic
/g/1261x0jn3 | baby products
/g/1q5yy2zkf | hairless cat
(10 rows)
Which labels have “bus” in their display names?
openimages=> SELECT * FROM dict WHERE LabelDisplayName LIKE '%bus%' LIMIT 20;
labelname | labeldisplayname
------------+----------------------
/m/012t_z | businessperson
/m/015z8s | airbus a320 family
/m/015zbk | airbus a330
/m/015zfz | airbus
/m/016_bh | shadbush
/m/018rl2 | airbus a380
/m/01bjv | bus
/m/01jw_1 | bus stop
/m/01kqwy | business jet
/m/02539r | sorbus
/m/0283szz | flxible new look bus
/m/02w11w8 | tour bus service
/m/02yvhj | school bus
/m/03_k0c | busy lizzie
/m/03n9vx | double-decker bus
/m/03qk36c | airport bus
/m/045jsc | minibus
/m/04yqq2 | bust
/m/05jlh5 | saltbush
/m/0c5q0q | mi rebus
(20 rows)
How many images of a trolleybus are in the dataset?
openimages=> SELECT count(*) FROM Labels
INNER JOIN Images ON Labels.ImageID = Images.ImageID
WHERE Labels.LabelName='/m/0f6pl' AND Labels.Confidence > 0.5;
count
-------
3550
(1 row)
What are some landing pages of images with a trolleybus?
openimages=> SELECT OriginalLandingURL, Confidence FROM Labels
INNER JOIN Images ON Labels.ImageID = Images.ImageID
WHERE LabelName='/m/0f6pl' AND Confidence = 1 AND Subset='validation' LIMIT 10;
originallandingurl | confidence
----------------------------------------------------------+------------
https://www.flickr.com/photos/tadokoro/8615989093 | 1
https://www.flickr.com/photos/daveiam/3492373572 | 1
https://www.flickr.com/photos/sergejf/8706867707 | 1
https://www.flickr.com/photos/metrocincinnati/4400806591 | 1
https://www.flickr.com/photos/metrocincinnati/4400806389 | 1
https://www.flickr.com/photos/cityoftoronto/10732215443 | 1
https://www.flickr.com/photos/jmv/9332891476 | 1
https://www.flickr.com/photos/ociosa/167986014 | 1
https://www.flickr.com/photos/tjc/165330995 | 1
https://www.flickr.com/photos/hisgett/3453032426 | 1
(10 rows)
Which images with cherries are in the training set?
openimages=> SELECT Images.ImageID, OriginalURL, Confidence FROM Labels
INNER JOIN Images ON Labels.ImageID = Images.ImageID
WHERE LabelName='/m/0f8sw' AND Confidence >= 0.85 AND Subset='train' LIMIT 10;
imageid | originalurl | confidence
------------------+------------------------------------------------------------------+------------
24ed494492b89654 | https://farm5.staticflickr.com/3045/2639644807_ff2180f547_o.jpg | 0.9
26ed578010c2126b | https://c4.staticflickr.com/8/7557/15584458587_beeaf99d1d_o.jpg | 0.9
275344e5e05fbd55 | https://farm6.staticflickr.com/5515/11645877016_a813d091c1_o.jpg | 0.9
3a70463d87017ae4 | https://farm5.staticflickr.com/5496/14225297485_94265d55cd_o.jpg | 0.9
442fd9b4ae0a56e4 | https://farm5.staticflickr.com/6230/6415233549_eb8a7eab74_o.jpg | 0.9
466bc960160c8f04 | https://farm5.staticflickr.com/2936/14373355594_a1e5e37316_o.jpg | 0.9
47e895a6a68d2adc | https://c5.staticflickr.com/3/2390/2481908048_a761c1b7a6_o.jpg | 0.9
572df253194d60ec | https://c5.staticflickr.com/3/2465/3715205628_1b7367290a_o.jpg | 0.9
5b2c5f8cc98b135f | https://c8.staticflickr.com/2/1083/944423396_16a7ef9cc0_o.jpg | 0.9
5e655374ee5fd3ff | https://c8.staticflickr.com/4/3930/15263226238_01a625f6c2_o.jpg | 0.9
(10 rows)