teyuanliu.github.io

PostgreSQL is an open-source relational database.

Server
- Root server process
  - This is the manager process that creates a connection server process for each client connection and administers its lifecycle.
- Connection server process
  - This is the process that communicates with a client via a TCP/IP network connection, managing database files, and performs database actions on behalf of the client.
Client
- A client can be a text-oriented tool, a graphical application, or a web server.

Naming rule

In PostgreSQL, using unquoted names is a best practice for naming.

Unquoted names must begin with a letter or an underscore. Subsequent characters can be letters, digits, underscores, or dollar signs. Unquoted names are automatically folded to lowercase so it is recommended to just use lowercase letters.

We can use arbitrary characters like uppercase letter, spaces and hyphens for naming if we enclose the name in double quotes. However, this is a poor practice because we then have to use double quotes everywhere.

Double quotes are often used for identifiers like table and column names that contain reserved keywords, special characters or spaces.

Single quotes are used for string literals.

Role

A role is like an account. PostgreSQL by default creates a role, postgres, with superuser privilege. The superuser privilege lets it bypass all safety checks so using it comes with high risk, as we may accidentally delete things. It is recommended to use the postgres role to create a custom role with the least privilege needed and use the custom role to interact with PostgreSQL.

Role management tree

flowchart TB postgres --> project-1-admin project-1-admin --> project-1-app postgres --> project-2-admin project-2-admin --> project-2-app

postgres
- Create and manage the project-admin role for each project.
- Have privileges like superuser, createrole, createdb, replication, bypass Row Level Security (RLS).
project-admin
- Create and manage the project-app role, database, schema, table, column privilege grant, RLS.
- Have privileges like createrole, createdb.
project-app
- Provide the role for application to connect and interact with the database.

Structured Query Language (SQL) rule

SQL command ignores extra spaces.
Use -- before a comment and everything is ignored up to the end of the line.

Role creation

Start a PostgreSQL interactive terminal, psql, and create the target role using SQL.

sudo -i -u postgres psql # Execute psql as user postgres and database postgres.
postgres=# CREATE ROLE <ROLE> WITH LOGIN PASSWORD <PASSWORD> CREATEROLE CREATEDB;
postgres=# \du # List all roles.
postgres=# \q # Exit psql.

Add an entry to pg_hba.conf (/etc/postgresql/<VERSION>/main/pg_hba.conf for ubuntu) to enable the connection to server.

sudo vim /etc/postgresql/<VERSION>/main/pg_hba.conf
# Append "local <DATABASE> <ROLE> scram-sha-256" to the end of the file.
sudo systemctl reload-or-restart postgresql

Role deletion

We drop the objects owned by the target role in each database.

psql -U <ROLE> -d <DATABASE>
<DATABASE>=# DROP OWNED BY <ROLE>;

Then, we drop the target role.

sudo -i -u postgres psql
postgres=# DROP ROLE <ROLE>;

PSQL command

\h # Get help.
\du # List all roles.
\l # List all databases.
\l+ # List all databases with extra details.
\dn # List all schemas in the current database.
\dn+ # List all schemas in the current database with privilege details.
\d # List all tables and sequences in the current database.
\dt # List all tables in the current database.
\z <TABLE> # List privileges of the table.
\z public.* # List privileges for all tables in the public schema.
\df # List all functions in the current database.
\df+ # List all functions in the current database with privilege details.
\i commands.sql # Read and execute the commands.sql file.
\q # Exit psql.

Database

Database is like a container with multiple layers to store and organize our data. Let's see what's inside a database.

First we encounter schemas. We can think of schemas as directories in a file system, isolating a group of files from other groups. Inside a schema we see tables, which are like spreadsheet files that actually store our data.

Schema
- A schema groups tables together and provides isolation.
Table
- A table is used to store relational data.
Column
- A column has a specific data name and data type. A table has a set of columns that group data with different name and type together. Taking a user data table for example, one column can be name and another column can be email. Note that the columns of a table have a fixed order.
Row
- A row is like a record in the table. In the user data table example, a row represents a user, and those column entries of that row are the name, the email, and other information of that user. Note that the rows of a table aren't guaranteed to have a fixed order.

Database creation

We can start a psql terminal to create a database. A database name must start with an alphabetic character and is limited to 63 bytes in length.

psql -U <ROLE> -d postgres
postgres=# CREATE DATEBASE <DATABASE>;

Or we can use the binary provided by PostgreSQL to create a database.

createdb <DATABASE>

Database deletion

We can start a psql terminal to delete a database.

psql -U <ROLE> -d postgres
postgres=# DROP DATEBASE <DATABASE>;

Or we can use the binary provided by PostgreSQL to delete a database.

dropdb <DATABASE>

Schema

Table

Table creation

Assume we want to create a gym user table.

CREATE TABLE user (
    id int generated by default as identity primary key,
    name varchar(100),
    email varchar(254) unique,
    height_m real,
    weight_kg real,
    creation_date date default current_date
);

Table primary key

We usually use a surrogate key like an auto-increment integer ID as the table primary key, rather than using a natural key such as a string. This avoids potential technical debt because the value of a surrogate key never changes while the value of a natural key may change in the future. If we choose the natural key path and then a value change happens, we will need to update everything related to it, from primary key in the original table, foreign key in other tables, and indexes.

Table foreign key referential integrity

Often times we want to make sure that a foreign key column of a table do have a matching entry in that foreign key table. This is called the referential integrity.

CREATE TABLE user (
    id int generated by default as identity primary key,
    name varchar(100),
    email varchar(254) unique,
    height_m real,
    weight_kg real,
    creation_date date default current_date
);

CREATE TABLE course_enrollment (
    id int generated by default as identity primary key,
    course_name varchar(100),
    user_id int references user(id),
    user_name varchar(100),
    creation_date date default current_date
);

Table inheritance

Inheritance lets an inheriting table inherit columns from an inherited table.

CREATE TABLE cities (
    name text,
    population int,
    elevation int
);

CREATE TABLE capitals (
    state char(2) UNIQUE NOT NULL
) INHERITS (cities);

The date type of the column name is text, a native type for variable length character strings.

If we want the query to be performed on just the inherited table, we can do the following.

SELECT name, elevation
    FROM ONLY cities
    WHERE elevation > 500;

Table deletion

DROP TABLE user;

Column

Column data type

Each column has its own data type, and here is a list of the standard SQL data types.

smallint (2 bytes)
integer/int (4 bytes)
bigint (8 bytes)
real (4 bytes)
- single-precision floating-point number
double precision (8 bytes)
- double-precision floating-point number
char(N) (1 byte length metadata overhead + actual data length)
- string with a fixed length of N
- padding with blank
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
varchar(N) (1 byte length metadata overhead + actual data length)
- variable-length string up to N characters in length
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
text
- string with no length limit
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
date (4 bytes)
- date only
time (8 bytes)
- time of day only
timestamp (8 bytes)
- date and time without time zone
timestamptz (8 bytes)
- date and time with time zone
interval (16 bytes)
- 3 fields of months, days, seconds

Column privilege

The column privilege system lets us grant read and write privilege to role by column.

GRANT
    INSERT(<COLUMN>),
    SELECT(<COLUMN>),
    UPDATE(<COLUMN>),
    DELETE
ON <SCHEMA>.<TABLE> TO <ROLE>;

We can also revoke column privileges.

REVOKE
    INSERT,
    SELECT,
    UPDATE,
    DELETE
ON <SCHEMA>.<TABLE> FROM <ROLE>;

Row

Row creation

-- Explicit columns
-- The column order can be altered.
-- Any unrequired column can be omitted.
INSERT INTO user (name, email, height_m, weight_kg, creation_date) VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');

-- Shorter version
INSERT INTO user VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');

We can use the COPY keyword to load data from text files too.

COPY user FROM '/tmp/user.txt';

Temporary relational data creation in Go

When we need a temporary data structure to hold some relational data, we can do either Common Table Expression (CTE) with unnest or temporary table with CopyFrom.

CTE with unnest
- Faster for 10 K rows or below due to no Data Definition Language (DDL) overhead
```
SELECT unnest($1::int[]) AS data
```
Temporary table with CopyFrom
- Faster for 10 K rows or above due to higher throughput from binary copy
- Analyzable and indexable by query planner
- Need ON COMMIT DROP in the query to delete the temporary table when the transaction finishes. Otherwise, the database connection keeps the table and will throw relation already exists error if the connection is used to run the same query.

Row read

We use the SELECT statement to query data from a table. The SELECT statement has 3 parts.

Select list
- List the columns or expressions to be returned.
- Parsed from the entries after the keyword SELECT
Table list
- List the tables from which to retrieve the data.
- Parsed from the entries after the keyword FROM
Row qualification
- Specify any row filtering criteria.
- Parsed from the entries after the keyword WHERE
- Optional

-- Select all columns from the user table without row filtering criteria.
SELECT * FROM user;

-- Select the name column and bmi expression from the user table without row filtering criteria.
SELECT name, weight_kg/(height_m*height_m) AS bmi FROM user;

-- Select the name and height_m column from the user table with a qualification specifying that only rows with height_m > 1.8 and weight_kg < 70 are wanted. Remove rows with duplicate names. The result should be ordered by name in descending order.
SELECT DISTINCT name, height_m FROM user WHERE height_m > 1.8 AND weight_kg < 70 ORDER BY name DESC;

Use * to select all columns.
Use AS to relabel a output column.
Use AND, OR, and NOT in the WHERE qualification expression.
Use ORDER by to sort the returned result.

Search argument

Search argument (SARG) is the condition in a query, typically in a WHERE or HAVING clause, that PostgreSQL can look up efficiently using an index. We describe this kind of query as sargable.

Sargable

WHERE date >= '2026-01-01' AND date <= '2026-12-31'
WHERE number = 10

Non-sargable
```
WHERE YEAR(date) = 2026
WHERE number + 5 = 15
```
PostgreSQL has to apply the YEAR function/math operation to every row before the date/number can be checked.

Table join

Assume we also have a gym course enrollment table that links a course to each enrolled user.

CREATE TABLE course_enrollment (
    id int generated by default as identity primary key,
    course_name varchar(100),
    user_name varchar(100),
    creation_date date default current_date
);

Then we can make join queries to access multiple tables at once, or access the same table in such a way that multiple rows of the table are processed at the same time.

In a join query, it is good practice to qualify all column names so that the query won't fail if a duplicate column name exists in the query. We can relabel the tables with aliases to save some typing.

Self join

A self join is joining a table against itself.

SELECT u1.name, u1.height_m AS u1_height, u2.name, u2.height_m AS u2_height FROM user u1 JOIN user u2 ON u1.height_m < u2.height_m;

Inner join

We can inner join the user table and the course enrollment table in a SQL query. Only the rows that have matching values in the user_name/name column in the course_enrollment/user table are returned.

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce JOIN user u ON ce.user_name = u.name;

Full Outer join

For outer joins, columns with missing data are substituted with null values.

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name;

Exclusive outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL OR user.name IS NULL;

Left outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name;

Exclusive left outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name WHERE user.name IS NULL;

Right outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name;

Exclusive right outer join

SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL;

Aggregate function

An aggregate function computes a single result from multiple input rows. Here are some common functions.

count
sum
avg
min
max

-- Get the max height from the user table.
SELECT max(height_m) FROM user;

Subquery

-- Get the user who has the max height using a subquery.
SELECT name FROM user WHERE height_m = (SELECT max(height_m) FROM user)

Group by having

We can use GROUP BY to group multiple rows into one.

-- Get the user name and number of course enrollments for each user.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name;

We can further filter the grouped rows with HAVING.

-- Get the user name and number of course enrollments for users who have more than 2 course enrollments.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name HAVING count(*) > 2;

Filter

The FILTER removes rows from the input of the particular aggregate function that it is attached to.

-- Get the user name and number of course enrollments created after 2025-12-24 for each user.
SELECT user_name, count(*) FILTER (WHERE creation_date > '2025-12-24') FROM course_enrollment GROUP BY user_name;

View

We can create a view based on a query for reuse.

CREATE VIEW course_enrollment_email_view AS
    SELECT ce.course_name, u.name, u.email
        FROM course_enrollment ce
        JOIN user u
        ON ce.user_name = u.name;

SELECT * FROM course_enrollment_email_view

Window function

A window function performs a calculation across a set of rows called the window frame. If the PARTITION BY clause is supplied, the partition is the frame unless ORDER BY is supplied. If ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that have value equal to the current row based on the ORDER BY clause.

Unlike an aggregate function, a window function doesn't group the frame rows into one row.

The OVER clause determines how the input rows are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into partitions. We can omit the PARTITION BY to operate on all rows. We can control the order of rows processed by the window function using ORDER BY within OVER.

Window functions are permitted only in the SELECT list and the ORDER BY clause and are forbidden elsewhere because they execute after other clauses like WHERE or GROUP BY.

We can use a sub-select if we need to filter rows after the window function's calculation.

When we need to put multiple window functions inside a query, it is recommended to name each windowing in a WINDOW clause and then referenced in OVER.

-- Compare each user's weight with the average weight of the users with the same height.
SELECT id, name, weight_kg, avg(weight_kg), row_number() OVER (PARTITION BY height_m ORDER BY weight_kg) FROM user;

Execution order

WHERE selects the rows.
Aggregate function operate on group rows obtained from GROUP BY.
HAVING selects the group rows, often using aggregate functions.
Window function

Row update

UPDATE course_enrollment SET creation_date = '2025-12-24' WHERE creation_date = '2025-12-25';

Row deletion

DELETE FROM course_enrollment WHERE course_name = 'cardio';

Note that deletion without a qualification results in the deletion of all rows in the table.

-- This delete all rows in the course enrollment table.
DELETE FROM course_enrollment;

Row Level Security (RLS)

RLS lets us add extra WHERE filters to our INSERT, SELECT, UPDATE, and DELETE query. It gives a row read filter via the USING clause and a row write filter via the WITH CHECK clause. Each kind of query can use either one or both filters.

INSERT
- WITH CHECK
- USING (only when the query contains RETURNING)
SELECT
- USING
UPDATE
- USING
- WITH CHECK
DELETE
- USING

We can turn on RLS and by default it uses a deny-all policy.

ALTER TABLE <SCHEMA>.<TABLE> ENABLE ROW LEVEL SECURITY;

We can turn it off if not needed.

ALTER TABLE <SCHEMA>.<TABLE> DISABLE ROW LEVEL SECURITY;

INSERT

When we create an INSERT policy, its WITH CHECK clause is applied to the new data we are trying to write.

CREATE POLICY <POLICY> ON <SCHEMA>.<TABLE> FOR INSERT TO <ROLE> WITH CHECK (TRUE);

An INSERT policy doesn't need the USING clause except for INSERT queries that contain RETURNING.

CREATE POLICY <POLICY> ON <SCHEMA>.<TABLE> FOR INSERT TO <ROLE> WITH CHECK (TRUE) USING (TRUE);

We can delete a policy if not needed.

DROP POLICY IF EXISTS <POLICY> ON <SCHEMA>.<TABLE>;

SELECT

When we create a SELECT policy, its USING clause is applied to the data we are trying to read.

CREATE POLICY <POLICY> ON <SCHEMA>.<TABLE> FOR SELECT TO <ROLE> USING (TRUE);

UPDATE

When we create a UPDATE policy, its USING clause is applied to the row we are trying to read, and its WITH CHECK clause is applied to the new data we are trying to write.

Note that PostgreSQL uses AND to join the UPDATE policy's USING clause, with the SELECT policy's USING clause. Therefore, if the UPDATE policy has USING (TRUE) but the SELECT policy has USING (current_setting('app.user_session_token', TRUE) = token), then the result row read filter will be USING (TRUE) AND USING (current_setting('app.user_session_token', TRUE) = token) and constraint by the SELECT policy one.

CREATE POLICY <POLICY> ON <SCHEMA>.<TABLE> FOR UPDATE TO <ROLE> USING (TRUE) WITH CHECK (TRUE);

We can run the query with EXPLAIN to know whether the row read filter is blocking our query.

EXPLAIN <QUERY>

DELETE

When we create a DELETE policy, its USING clause combined with the SELECT policy's USING clause is applied to the row we are trying to read.

CREATE POLICY <POLICY> ON <SCHEMA>.<TABLE> FOR DELETE TO <ROLE> USING (TRUE);

Transaction

A transaction is a single, all-or-nothing operation that consists of multiple statements. The intermediate state made by each statement of a transaction is invisible to other transactions. If any statement fails, the entire operation is canceled and no change is applied to the database, leaving the database intact.

This is known as the atomic property. Here is an example of bundling multiple statements into one transaction block.

BEGIN;
-- first statement
UPDATE user SET creation_date = '2025-12-24'
    WHERE creation_date = '2025-12-25';
-- another one
COMMIT;

If we don't want to commit the ongoing transaction, we can use ROLLBACK instead of COMMIT, and then all updates will be canceled.

PostgreSQL treats every SQL statement as being executed within a transaction by implicitly wrapping every command with a BEGIN and, if successful, a COMMIT, or a ROLLBACK otherwise.

We can use SAVEPOINT and ROLLBACK TO to keep the changes earlier than the savepoint and discard the rest. A savepoint can be released if it is no longer needed. Releasing or rolling back to a savepoint automatically releases all savepoints defined after it.

References

PostgreSQL tutorial

←Previous Next→