A demonstration of how to create tables in PostgreSQL and import data for analysis.
In this project, I use PostgreSQL with PgAdmin to demonstrate how to create tables from a dataset, review common data types and constraints, and import data via two different methods (the GUI vs. SQL commands).
Language and Utilities Used:
- Language: SQL
- DBMS: PostgreSQL 14
- GUI: PgAdmin 4 on Windows 10
📍 Table of Contents 📍
- Create Database (PgAdmin)
- Syntax for Creating Tables
- From CSV to SQL Table
- Importing Data into Tables (2 Methods)
- In PgAdmin, create your database first by right-clicking Databases and creating a new database.
- Name your DB and click Save. If you don’t immediately see your DB, right-click Databases again and click Refresh.
- Query your DB by right-clicking its name and selecting the Query Tool. This will bring up the Query Panel and then you can begin interacting with it via SQL.
Note: If you are working with multiple databases at once, you can keep track of which one you are currently connected to via the name in the Query Panel. The panel title shows which DB you are in: YourDatabaseName / PostgreSQL Version #
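If you prefer, the database can also be created with SQL instead of the GUI. A minimal sketch (the database name here is just a placeholder):

```sql
--- Create a database from the Query Tool or psql
--- (the name below is only an example)
CREATE DATABASE internet_data;
```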
As a best practice, before creating tables that are based on a dataset in any SQL database:
🔎 Review your headers and data types in your dataset 🔎
- The column headers from your dataset must match your SQL table headers, and your data types must also agree, in order to import into the table.
- Data type issue: a column is assigned a `VARCHAR` data type when the corresponding source values are `INTEGER`s.
- Null issue: be familiar with your dataset. You might discover there are null strings represented by 'NA' in a column that is assigned an integer type (which cannot read strings of text). I demonstrate how to handle nulls in the import section.
For this project, I will use a dataset from Kaggle.com (USA People Without Internet in 2016).
From the CSV file, I will create two tables in PostgreSQL:
- A county population table (county_pop) which will have the population and racial data with the percentage of each county with no internet access.
- An education level and income level table (education_income) that has the education, median age, and income to compare to the population table.
This is the general syntax for creating a table:

```sql
CREATE TABLE table_name (
	column_name TYPE column_constraint,
	column_name TYPE column_constraint,
	column_name TYPE column_constraint
);
--- Don't forget the commas between your column definitions; no comma after the last one
```
- A DATA TYPE specifies the pattern (text, number, ...) of the data and how the value is stored. The values must adhere to the requirements of the type for PostgreSQL to accept them.
- Common data types:
  - True or False: `boolean`
  - Character: `char`, `varchar`, and `text`
  - Numeric: `integer` and `float`
  - Temporal: `date`, `time`, `timestamp`

PostgreSQL has many data types; for more details on each type (and storage limitations) refer to this handy documentation: https://www.postgresql.org/docs/current/datatype.html
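To see these common types side by side, here is a small throwaway table (the table and column names are purely illustrative):

```sql
--- Illustrative table with one column of each common type
CREATE TABLE type_demo (
	is_active    boolean,       -- True or False
	short_code   char(3),       -- fixed-length character
	name         varchar(100),  -- variable-length character, capped at 100
	notes        text,          -- unlimited-length character
	quantity     integer,       -- whole numbers
	price        float,         -- floating-point numbers
	created_date date,          -- calendar date
	created_time time,          -- time of day
	created_at   timestamp      -- date and time combined
);
```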
- A column CONSTRAINT is an additional requirement or condition for the values in that column.
- Common column constraints:
  - `NOT NULL`: ensures there is never null data (an absence of data). Very useful when importing new data into a table. For example, if you are importing new data into a customer sales table and company policy is to always require an email address, you would set the email_address column constraint as `NOT NULL`.
  - `UNIQUE`: ensures that each value in the column is unique (not repeated). For example, you may want to ensure each row of a phone number column is unique per person.
  - `PRIMARY KEY`: an assigned value used to uniquely identify each row in a table. This allows you to target, retrieve, or modify a row based on its specific PK (primary key).
  - `SERIAL`: a way of automatically generating new PK integers (adds +1) as new row data is entered into a table.

For more details on constraints: https://www.postgresql.org/docs/14/ddl-constraints.html
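A quick sketch combining these constraints in one table (the table and column names, and the email-address policy, are hypothetical):

```sql
--- Illustrative table combining the common constraints
CREATE TABLE customers (
	id            SERIAL PRIMARY KEY,    -- auto-incrementing unique row id
	email_address VARCHAR(255) NOT NULL, -- required by (hypothetical) company policy
	phone_number  VARCHAR(20) UNIQUE     -- no two rows may share a number
);
```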
Upon reviewing the headers and data types of my CSV dataset below, I can build my SQL query to create the table structure.

```sql
--- First table (county_pop)
--- Contains counties, states, total population numbers (indicated with 'P_'), race numbers, and the percentage of each county without internet access.
CREATE TABLE county_pop(
	id_pop SERIAL PRIMARY KEY,
	county VARCHAR(150) NOT NULL,
	state VARCHAR(2) NOT NULL,
	P_total INT,
	P_white INT,
	P_black INT,
	P_asian INT,
	P_native INT,
	P_hawaiian INT,
	P_others INT,
	percent_no_internet decimal
);
```
```sql
--- Second table (education_income)
--- Contains population numbers for education levels and poverty levels, plus the median age, income, and rent per county and state.
CREATE TABLE education_income(
	id SERIAL PRIMARY KEY,
	county VARCHAR(150) NOT NULL,
	state VARCHAR(2) NOT NULL,
	P_below_middle_school INT,
	P_some_high_school INT,
	P_high_school_equivalent INT,
	P_some_college INT,
	P_bachelor_and_above INT,
	P_below_poverty INT,
	median_age decimal,
	median_household_income decimal,
	median_rent_per_income decimal
);
```
Tip: If you make a mistake when assigning a data type or constraint, use the `ALTER` statement.

```sql
--- Changing the data type of a column
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE your_new_data_type;

--- Removing a NOT NULL constraint from a column
ALTER TABLE table_name
ALTER COLUMN column_name
DROP NOT NULL;

--- Removing a named constraint (to add one, use ADD CONSTRAINT instead)
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
```
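As a concrete example against the county_pop table, the percentage column could be tightened to a fixed precision (the precision chosen here is an assumption about the data, not part of the original design):

```sql
--- Change percent_no_internet from decimal to a fixed precision:
--- 5 digits total, 2 after the decimal point (an assumed precision)
ALTER TABLE county_pop
ALTER COLUMN percent_no_internet
TYPE numeric(5,2);
```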
Now that we have tables in the database, we need to insert values into those tables. Here are two ways to accomplish this:
1. Using PgAdmin (GUI method)
- This is the easiest method and does not require special file permissions.
- Right-click on your database ➡️ go to Schemas ➡️ Tables ➡️ Right-click on your table and select Import/Export Data
On the options side, locate the file path of your CSV dataset and toggle headers ON:
On the columns side, select all columns in the file to import (minus the ID primary key column we created for the table).
Note: If your dataset includes null strings (as in, nulls are represented by NA or some other string in your data), you need to specify what PostgreSQL should do when it encounters ‘NA’ in the data. For my example, I have the NULL Strings as ‘NA’.
2. Using the SQL COPY command
- Requires special permissions for PostgreSQL to read/write files to the local PC (if that’s where you’re pulling data).
General COPY command syntax:

```sql
COPY table_name (column_name, column_name, ...)
FROM 'C:\Users\Name\Location.csv'
DELIMITER ',' --- Since we are using a CSV (comma-separated values) file format, the DELIMITER is ','
NULL 'NA'     --- We are declaring that any null data is represented by the string 'NA'
CSV HEADER;   --- Tells PostgreSQL the file's first row is a header, so it is skipped on import
```
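Putting this together for the county_pop table above (the file path is a placeholder for wherever your CSV actually lives):

```sql
--- Import the CSV into county_pop, skipping the SERIAL id_pop column
--- (the file path below is hypothetical)
COPY county_pop (county, state, P_total, P_white, P_black, P_asian,
                 P_native, P_hawaiian, P_others, percent_no_internet)
FROM 'C:\data\internet_2016.csv'
DELIMITER ','
NULL 'NA'
CSV HEADER;
```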
If you get a permission error from PostgreSQL, similar to:
ERROR: could not open file "YourFile.csv" for reading: Permission denied
There are a couple of ways to approach this, but for my purposes I changed the permission settings for my specific file.
- Go to the file/folder location you are using for your dataset
- Right click the file/folder
- Give Access to ➡️ Specific People
- Give Read/Write permissions to Everyone
- Giving the Write permission will allow you to export from PostgreSQL later for sharing or for data visualizations.
For both of these import methods, the Messages box in your Query Panel will return something like:

COPY 820
Query returned successfully in 46 msec.

The 820 in this example is the total number of rows successfully imported.
✔️ Testing my imports with a simple `SELECT` all (`*`) statement to confirm data was added correctly.
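A minimal check along these lines might look like:

```sql
--- Preview the first few rows of each table to eyeball the import
SELECT * FROM county_pop LIMIT 5;
SELECT * FROM education_income LIMIT 5;

--- Confirm the row count matches the number reported by COPY
SELECT COUNT(*) FROM county_pop;
```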
In summary, we have:
- Created a simple database using PgAdmin
- Created Tables
- Basic Table Syntax
- Covered Common Data Types
- Covered Common Constraints
- Used a Dataset (CSV file) to Create a Table in the Database
- Imported Values
- Graphic Interface Method
- COPY Command Method
Now we can begin analyzing the data with SQL commands to answer questions about our dataset.
This concludes this chapter of the project and we will move to analyzing in the next section (I will update with a link once the write-up is complete). Thanks for reading! 👋