PostgreSQL

PostgreSQL is an open-source relational database.

Installation

Read the official documentation for installation on Ubuntu or other OS.

System architecture

Role

A role is like an account. PostgreSQL by default creates a role, postgres, with superuser privilege. The superuser privilege lets it bypass all safety checks so using it comes with high risk, as we may accidentally delete things. It is recommended to use the postgres role to create a custom role with the least privilege needed and use the custom role to interact with PostgreSQL.

Role creation

Start a PostgreSQL interactive terminal, psql, and create the target role using Structured Query Language (SQL).

sudo -i -u postgres psql # Execute psql as user postgres and database postgres.
postgres=# CREATE ROLE <ROLE> WITH LOGIN PASSWORD <PASSWORD> CREATEROLE CREATEDB;
postgres=# \du # List all roles.
postgres=# \q # Exit psql.

Add an entry to pg_hba.conf (/etc/postgresql/<VERSION>/main/pg_hba.conf for ubuntu) to enable the connection to server.

sudo vim /etc/postgresql/<VERSION>/main/pg_hba.conf
# Append "local <DATABASE> <ROLE> scram-sha-256" to the end of the file.
sudo systemctl reload-or-restart postgresql

Role deletion

We drop the objects owned by the target role in each database.

psql -U <ROLE> -d <DATABASE>
<DATABASE>=# DROP OWNED BY <ROLE>;

Then, we drop the target role.

sudo -i -u postgres psql
postgres=# DROP ROLE <ROLE>;

Role management tree

flowchart TB postgres --> project-admin project-admin --> project-app

PSQL commands

\h # Get help.
\du # List all roles.
\l # List all databases.
\l+ # List all databases with extra details.
\dn # List all schemas in the current database.
\dn+ # List all schemas in the current database with privilege details.
\d # List all tables and sequences in the current database.
\dt # List all tables in the current database.
\z <TABLE> # List privileges of the table.
\z public.* # List privileges for all tables in the public schema.
\df # List all functions in the current database.
\df+ # List all functions in the current database with privilege details.
\i commands.sql # Read and execute the commands.sql file.
\q # Exit psql.

Database

Database is like a container with multiple layers to store and organize our data. Let's see what's inside a database.

First we encounter schemas. We can think of schemas as directories in a file system, isolating a group of files from other groups. Inside a schema we see tables, which are like spreadsheet files that actually store our data.

Database creation

We can start a psql terminal to create a database. A database name must start with an alphabetic character and is limited to 63 bytes in length.

psql -U <ROLE> -d postgres
postgres=# CREATE DATEBASE <DATABASE>;

Or we can use the binary provided by PostgreSQL to create a database.

createdb <DATABASE>

Database deletion

We can start a psql terminal to delete a database.

psql -U <ROLE> -d postgres
postgres=# DROP DATEBASE <DATABASE>;

Or we can use the binary provided by PostgreSQL to delete a database.

dropdb <DATABASE>

Schema

Table

Column

Row

Row level security

INSERT

When the SQL contains RETURNING, PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, and the USING clause of the SELECT policy. It applies this filter first and then finds the rows to be returned.

SELECT

PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, and the USING clause of the SELECT policy. It applies this filter first and then processes the WHERE clause in the SQL to find the rows to be selected.

Update

A UPDATE operation is a two-step process behind the scenes.

  1. PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, the USING clause of the SELECT policy, and the USING clause of the UPDATE policy. It applies this filter first and then processes the WHERE clause in the SQL to find the rows to be updated.
  2. Once a row is found, it validates the new data with the WITH CHECK clause of the UPDATE policy to determine if the update is accepted.
Delete

A DELETE operation is a two-step process behind the scenes.

  1. PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, the USING clause of the SELECT policy, and the USING clause of the DELETE policy. It applies this filter first and then processes the WHERE clause in the SQL to find the rows to be deleted.
  2. Once a row is found, it deletes the row.

Data name

A table has its name. A column also has its name.

In PostgreSQL, using unquoted names is a best practice for naming.

Unquoted names must begin with a letter or an underscore. Subsequent characters can be letters, digits, underscores, or dollar signs. Unquoted names are automatically folded to lowercase so it is recommended to just use lowercase letters.

We can use arbitrary characters like uppercase letter, spaces and hyphens for naming if we enclose the name in double quotes. However, this is a poor practice because we then have to use double quotes everywhere.

Double quotes are used for identifiers like table and column names that contain reserved keywords, special characters or spaces.

Single quotes are used for string literals.

Data types

Each column has its own data type, and here is a list of the standard SQL data types.

Space in SQL command

SQL commands ignore extra spaces.

SQL comment

Use -- before a comment and everything is ignored up to the end of the line.

Table creation

Assume we want to create a gym user table.

CREATE TABLE user (
    id int generated by default as identity primary key,
    name varchar(100),
    email varchar(254) unique,
    height_m real,
    weight_kg real,
    creation_date date default current_date
);

Table primary key

We usually use a surrogate key like an auto-increment integer ID as the table primary key, rather than using a natural key such as a string. This avoids potential technical debt because the value of a surrogate key never changes while the value of a natural key may change in the future. If we choose the natural key path and then a value change happens, we will need to update everything related to it, from primary key in the original table, foreign key in other tables, and indexes.

Table deletion

DROP TABLE user;

Row creation

-- Explicit columns
-- The column order can be altered.
-- Any unrequired column can be omitted.
INSERT INTO user (name, email, height_m, weight_kg, creation_date) VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');

-- Shorter version
INSERT INTO user VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');

We can use the COPY keyword to load data from text files too.

COPY user FROM '/tmp/user.txt';

Temporary relational data creation in Go

When we need a temporary data structure to hold some relational data, we can do either Common Table Expression (CTE) with unnest or temporary table with CopyFrom.

Row read

We use the SELECT statement to query data from a table. The SELECT statement has 3 parts.

-- Select all columns from the user table without row filtering criteria.
SELECT * FROM user;

-- Select the name column and bmi expression from the user table without row filtering criteria.
SELECT name, weight_kg/(height_m*height_m) AS bmi FROM user;

-- Select the name and height_m column from the user table with a qualification specifying that only rows with height_m > 1.8 and weight_kg < 70 are wanted. Remove rows with duplicate names. The result should be ordered by name in descending order.
SELECT DISTINCT name, height_m FROM user WHERE height_m > 1.8 AND weight_kg < 70 ORDER BY name DESC;

Table join

Assume we also have a gym course enrollment table that links a course to each enrolled user.

CREATE TABLE course_enrollment (
    id int generated by default as identity primary key,
    course_name varchar(100),
    user_name varchar(100),
    creation_date date default current_date
);

Then we can make join queries to access multiple tables at once, or access the same table in such a way that multiple rows of the table are processed at the same time.

In a join query, it is good practice to qualify all column names so that the query won't fail if a duplicate column name exists in the query. We can relabel the tables with aliases to save some typing.

Self join

A self join is joining a table against itself.

SELECT u1.name, u1.height_m AS u1_height, u2.name, u2.height_m AS u2_height FROM user u1 JOIN user u2 ON u1.height_m < u2.height_m;

Inner join

We can inner join the user table and the course enrollment table in a SQL query. Only the rows that have matching values in the user_name/name column in the course_enrollment/user table are returned.

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce JOIN user u ON ce.user_name = u.name;

Full Outer join

For outer joins, columns with missing data are substituted with null values.

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name;

Exclusive outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL OR user.name IS NULL;

Left outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name;

Exclusive left outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name WHERE user.name IS NULL;

Right outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name;

Exclusive right outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL;

Aggregate function

An aggregate function computes a single result from multiple input rows. Here are some common functions.

-- Get the max height from the user table.
SELECT max(height_m) FROM user;

Subquery

-- Get the user who has the max height using a subquery.
SELECT name FROM user WHERE height_m = (SELECT max(height_m) FROM user)

Group by having

We can use GROUP BY to group multiple rows into one.

-- Get the user name and number of course enrollments for each user.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name;

We can further filter the grouped rows with HAVING.

-- Get the user name and number of course enrollments for users who have more than 2 course enrollments.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name HAVING count(*) > 2;

Filter

The FILTER removes rows from the input of the particular aggregate function that it is attached to.

-- Get the user name and number of course enrollments created after 2025-12-24 for each user.
SELECT user_name, count(*) FILTER (WHERE creation_date > '2025-12-24') FROM course_enrollment GROUP BY user_name;

Row update

UPDATE course_enrollment SET creation_date = '2025-12-24' WHERE creation_date = '2025-12-25';

Row deletion

DELETE FROM course_enrollment WHERE course_name = 'cardio';

Note that deletion without a qualification results in the deletion of all rows in the table.

-- This delete all rows in the course enrollment table.
DELETE FROM course_enrollment;

View

We can create a view based on a query for reuse.

CREATE VIEW course_enrollment_email_view AS
    SELECT ce.course_name, u.name, u.email
        FROM course_enrollment ce
        JOIN user u
        ON ce.user_name = u.name;

SELECT * FROM course_enrollment_email_view

Foreign key referential integrity

Often times we want to make sure that a foreign key column of a table do have a matching entry in that foreign key table. This is called the referential integrity.

CREATE TABLE user (
    id int generated by default as identity primary key,
    name varchar(100),
    email varchar(254) unique,
    height_m real,
    weight_kg real,
    creation_date date default current_date
);

CREATE TABLE course_enrollment (
    id int generated by default as identity primary key,
    course_name varchar(100),
    user_id int references user(id),
    user_name varchar(100),
    creation_date date default current_date
);

Transaction

A transaction is a single, all-or-nothing operation that consists of multiple statements. The intermediate state made by each statement of a transaction is invisible to other transactions. If any statement fails, the entire operation is canceled and no change is applied to the database, leaving the database intact.

This is known as the atomic property. Here is an example of bundling multiple statements into one transaction block.

BEGIN;
-- first statement
UPDATE user SET creation_date = '2025-12-24'
    WHERE creation_date = '2025-12-25';
-- another one
COMMIT;

If we don't want to commit the ongoing transaction, we can use ROLLBACK instead of COMMIT, and then all updates will be canceled.

PostgreSQL treats every SQL statement as being executed within a transaction by implicitly wrapping every command with a BEGIN and, if successful, a COMMIT, or a ROLLBACK otherwise.

We can use SAVEPOINT and ROLLBACK TO to keep the changes earlier than the savepoint and discard the rest. A savepoint can be released if it is no longer needed. Releasing or rolling back to a savepoint automatically releases all savepoints defined after it.

Window function

A window function performs a calculation across a set of rows called the window frame. If the PARTITION BY clause is supplied, the partition is the frame unless ORDER BY is supplied. If ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that have value equal to the current row based on the ORDER BY clause.

Unlike an aggregate function, a window function doesn't group the frame rows into one row.

The OVER clause determines how the input rows are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into partitions. We can omit the PARTITION BY to operate on all rows. We can control the order of rows processed by the window function using ORDER BY within OVER.

Window functions are permitted only in the SELECT list and the ORDER BY clause and are forbidden elsewhere because they execute after other clauses like WHERE or GROUP BY.

We can use a sub-select if we need to filter rows after the window function's calculation.

When we need to put multiple window functions inside a query, it is recommended to name each windowing in a WINDOW clause and then referenced in OVER.

-- Compare each user's weight with the average weight of the users with the same height.
SELECT id, name, weight_kg, avg(weight_kg), row_number() OVER (PARTITION BY height_m ORDER BY weight_kg) FROM user;

Execution order

  1. WHERE selects the rows.
  2. Aggregate function operate on group rows obtained from GROUP BY.
  3. HAVING selects the group rows, often using aggregate functions.
  4. Window function

Inheritance

Inheritance lets an inheriting table inherit columns from an inherited table.

CREATE TABLE cities (
    name text,
    population int,
    elevation int
);

CREATE TABLE capitals (
    state char(2) UNIQUE NOT NULL
) INHERITS (cities);

The date type of the column name is text, a native type for variable length character strings.

If we want the query to be performed on just the inherited table, we can do the following.

SELECT name, elevation
    FROM ONLY cities
    WHERE elevation > 500;

References

←Previous Next→