- Installation
- System architecture
- Role
- PSQL commands
-
Database
- Database creation
- Database deletion
- Schema
- Table
- Column
- Row
- Data name
- Data types
- Space in SQL command
- SQL comment
- Table creation
- Table deletion
- Row creation
- Row read
- Table join
- Aggregate function
- Subquery
- Group by having
- Filter
- Row update
- Row deletion
- View
- Foreign key referential integrity
- Transaction
- Window function
- Execution order
- Inheritance
- References
PostgreSQL is an open-source relational database.
Installation
Read the official documentation for installation on Ubuntu or other OS.
System architecture
- Server
- Root server process
- This is the manager process that creates a connection server process for each client connection and administers its lifecycle.
- Connection server process
- This is the process that communicates with a client via a TCP/IP network connection, managing database files, and performs database actions on behalf of the client.
- Root server process
- Client
- A client can be a text-oriented tool, a graphical application, or a web server.
Role
A role is like an account. PostgreSQL by default creates a role, postgres, with superuser privilege. The superuser privilege lets it bypass all safety checks so using it comes with high risk, as we may accidentally delete things. It is recommended to use the postgres role to create a custom role with the least privilege needed and use the custom role to interact with PostgreSQL.
Role creation
Start a PostgreSQL interactive terminal, psql, and create the target role using Structured Query Language (SQL).
sudo -i -u postgres psql # Execute psql as user postgres and database postgres.
postgres=# CREATE ROLE <ROLE> WITH LOGIN PASSWORD <PASSWORD> CREATEROLE CREATEDB;
postgres=# \du # List all roles.
postgres=# \q # Exit psql.
Add an entry to pg_hba.conf (/etc/postgresql/<VERSION>/main/pg_hba.conf for ubuntu) to enable the connection to server.
sudo vim /etc/postgresql/<VERSION>/main/pg_hba.conf
# Append "local <DATABASE> <ROLE> scram-sha-256" to the end of the file.
sudo systemctl reload-or-restart postgresql
Role deletion
We drop the objects owned by the target role in each database.
psql -U <ROLE> -d <DATABASE>
<DATABASE>=# DROP OWNED BY <ROLE>;
Then, we drop the target role.
sudo -i -u postgres psql
postgres=# DROP ROLE <ROLE>;
Role management tree
- postgres
- Create and manage the project-admin role for each project.
- Have privileges like superuser, createrole, createdb, replication, bypass Row-Level Security (RLS).
- project-admin
- Create and manage the project-app role, database, schema, table, column privilege grant, RLS.
- Have privileges like createrole, createdb.
- project-app
- Provide the role for application to connect and interact with the database.
PSQL commands
\h # Get help.
\du # List all roles.
\l # List all databases.
\l+ # List all databases with extra details.
\dn # List all schemas in the current database.
\dn+ # List all schemas in the current database with privilege details.
\d # List all tables and sequences in the current database.
\dt # List all tables in the current database.
\z <TABLE> # List privileges of the table.
\z public.* # List privileges for all tables in the public schema.
\df # List all functions in the current database.
\df+ # List all functions in the current database with privilege details.
\i commands.sql # Read and execute the commands.sql file.
\q # Exit psql.
Database
Database is like a container with multiple layers to store and organize our data. Let's see what's inside a database.
First we encounter schemas. We can think of schemas as directories in a file system, isolating a group of files from other groups. Inside a schema we see tables, which are like spreadsheet files that actually store our data.
- Schema
- A schema groups tables together and provides isolation.
- Table
- A table is used to store relational data.
- Column
- A column has a specific data name and data type. A table has a set of columns that group data with different name and type together. Taking a user data table for example, one column can be name and another column can be email. Note that the columns of a table have a fixed order.
- Row
- A row is like a record in the table. In the user data table example, a row represents a user, and those column entries of that row are the name, the email, and other information of that user. Note that the rows of a table aren't guaranteed to have a fixed order.
Database creation
We can start a psql terminal to create a database. A database name must start with an alphabetic character and is limited to 63 bytes in length.
psql -U <ROLE> -d postgres
postgres=# CREATE DATEBASE <DATABASE>;
Or we can use the binary provided by PostgreSQL to create a database.
createdb <DATABASE>
Database deletion
We can start a psql terminal to delete a database.
psql -U <ROLE> -d postgres
postgres=# DROP DATEBASE <DATABASE>;
Or we can use the binary provided by PostgreSQL to delete a database.
dropdb <DATABASE>
Schema
Table
Column
Row
Row level security
INSERT
When the SQL contains RETURNING, PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, and the USING clause of the SELECT policy. It applies this filter first and then finds the rows to be returned.
SELECT
PostgreSQL creates a filter by using AND to group the USING clause of the FOR ALL policy, and the USING clause of the SELECT policy. It applies this filter first and then processes the WHERE clause in the SQL to find the rows to be selected.
Update
A UPDATE operation is a two-step process behind the scenes.
- PostgreSQL creates a filter by using
ANDto group theUSINGclause of theFOR ALLpolicy, theUSINGclause of theSELECTpolicy, and theUSINGclause of theUPDATEpolicy. It applies this filter first and then processes theWHEREclause in the SQL to find the rows to be updated. - Once a row is found, it validates the new data with the
WITH CHECKclause of theUPDATEpolicy to determine if the update is accepted.
Delete
A DELETE operation is a two-step process behind the scenes.
- PostgreSQL creates a filter by using
ANDto group theUSINGclause of theFOR ALLpolicy, theUSINGclause of theSELECTpolicy, and theUSINGclause of theDELETEpolicy. It applies this filter first and then processes theWHEREclause in the SQL to find the rows to be deleted. - Once a row is found, it deletes the row.
Data name
A table has its name. A column also has its name.
In PostgreSQL, using unquoted names is a best practice for naming.
Unquoted names must begin with a letter or an underscore. Subsequent characters can be letters, digits, underscores, or dollar signs. Unquoted names are automatically folded to lowercase so it is recommended to just use lowercase letters.
We can use arbitrary characters like uppercase letter, spaces and hyphens for naming if we enclose the name in double quotes. However, this is a poor practice because we then have to use double quotes everywhere.
Double quotes are used for identifiers like table and column names that contain reserved keywords, special characters or spaces.
Single quotes are used for string literals.
Data types
Each column has its own data type, and here is a list of the standard SQL data types.
- smallint (2 bytes)
- integer/int (4 bytes)
- bigint (8 bytes)
- real (4 bytes)
- single-precision floating-point number
- double precision (8 bytes)
- double-precision floating-point number
- char(N) (1 byte length metadata overhead + actual data length)
- string with a fixed length of N
- padding with blank
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
- varchar(N) (1 byte length metadata overhead + actual data length)
- variable-length string up to N characters in length
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
- text
- string with no length limit
- Each character can use 1 to 4 bytes.
- data size limit is 1 GB
- date (4 bytes)
- date only
- time (8 bytes)
- time of day only
- timestamp (8 bytes)
- date and time without time zone
- timestamptz (8 bytes)
- date and time with time zone
- interval (16 bytes)
- 3 fields of months, days, seconds
Space in SQL command
SQL commands ignore extra spaces.
SQL comment
Use -- before a comment and everything is ignored up to the end of the line.
Table creation
Assume we want to create a gym user table.
CREATE TABLE user (
id int generated by default as identity primary key,
name varchar(100),
email varchar(254) unique,
height_m real,
weight_kg real,
creation_date date default current_date
);
Table primary key
We usually use a surrogate key like an auto-increment integer ID as the table primary key, rather than using a natural key such as a string. This avoids potential technical debt because the value of a surrogate key never changes while the value of a natural key may change in the future. If we choose the natural key path and then a value change happens, we will need to update everything related to it, from primary key in the original table, foreign key in other tables, and indexes.
Table deletion
DROP TABLE user;
Row creation
-- Explicit columns
-- The column order can be altered.
-- Any unrequired column can be omitted.
INSERT INTO user (name, email, height_m, weight_kg, creation_date) VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');
-- Shorter version
INSERT INTO user VALUES ('Bob', 'bob@gmail.com', 1.8, 70, '2024-12-23');
We can use the COPY keyword to load data from text files too.
COPY user FROM '/tmp/user.txt';
Temporary relational data creation in Go
When we need a temporary data structure to hold some relational data, we can do either Common Table Expression (CTE) with unnest or temporary table with CopyFrom.
- CTE with
unnest- Faster for 10 K rows or below due to no Data Definition Language (DDL) overhead
SELECT unnest($1::int[]) AS data - Temporary table with
CopyFrom- Faster for 10 K rows or above due to higher throughput from binary copy
- Analyzable and indexable by query planner
- Need
ON COMMIT DROPin the query to delete the temporary table when the transaction finishes. Otherwise, the database connection keeps the table and will throwrelation already existserror if the connection is used to run the same query.
Row read
We use the SELECT statement to query data from a table. The SELECT statement has 3 parts.
- Select list
- List the columns or expressions to be returned.
- Parsed from the entries after the keyword
SELECT
- Table list
- List the tables from which to retrieve the data.
- Parsed from the entries after the keyword
FROM
- Row qualification
- Specify any row filtering criteria.
- Parsed from the entries after the keyword
WHERE - Optional
-- Select all columns from the user table without row filtering criteria.
SELECT * FROM user;
-- Select the name column and bmi expression from the user table without row filtering criteria.
SELECT name, weight_kg/(height_m*height_m) AS bmi FROM user;
-- Select the name and height_m column from the user table with a qualification specifying that only rows with height_m > 1.8 and weight_kg < 70 are wanted. Remove rows with duplicate names. The result should be ordered by name in descending order.
SELECT DISTINCT name, height_m FROM user WHERE height_m > 1.8 AND weight_kg < 70 ORDER BY name DESC;
- Use
*to select all columns. - Use
ASto relabel a output column. - Use
AND,OR, andNOTin theWHEREqualification expression. - Use
ORDER byto sort the returned result.
Table join
Assume we also have a gym course enrollment table that links a course to each enrolled user.
CREATE TABLE course_enrollment (
id int generated by default as identity primary key,
course_name varchar(100),
user_name varchar(100),
creation_date date default current_date
);
Then we can make join queries to access multiple tables at once, or access the same table in such a way that multiple rows of the table are processed at the same time.
In a join query, it is good practice to qualify all column names so that the query won't fail if a duplicate column name exists in the query. We can relabel the tables with aliases to save some typing.
Self join
A self join is joining a table against itself.
SELECT u1.name, u1.height_m AS u1_height, u2.name, u2.height_m AS u2_height FROM user u1 JOIN user u2 ON u1.height_m < u2.height_m;
Inner join
We can inner join the user table and the course enrollment table in a SQL query. Only the rows that have matching values in the user_name/name column in the course_enrollment/user table are returned.
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce JOIN user u ON ce.user_name = u.name;
Full Outer join
For outer joins, columns with missing data are substituted with null values.
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name;
Exclusive outer join
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce FULL OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL OR user.name IS NULL;
Left outer join
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name;
Exclusive left outer join
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce LEFT OUTER JOIN user u ON ce.user_name = u.name WHERE user.name IS NULL;
Right outer join
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name;
Exclusive right outer join
SELECT ce.course_name, u.name, u.email FROM course_enrollment ce RIGHT OUTER JOIN user u ON ce.user_name = u.name WHERE course_enrollment.user_name IS NULL;
Aggregate function
An aggregate function computes a single result from multiple input rows. Here are some common functions.
- count
- sum
- avg
- min
- max
-- Get the max height from the user table.
SELECT max(height_m) FROM user;
Subquery
-- Get the user who has the max height using a subquery.
SELECT name FROM user WHERE height_m = (SELECT max(height_m) FROM user)
Group by having
We can use GROUP BY to group multiple rows into one.
-- Get the user name and number of course enrollments for each user.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name;
We can further filter the grouped rows with HAVING.
-- Get the user name and number of course enrollments for users who have more than 2 course enrollments.
SELECT user_name, count(*) FROM course_enrollment GROUP BY user_name HAVING count(*) > 2;
Filter
The FILTER removes rows from the input of the particular aggregate function that it is attached to.
-- Get the user name and number of course enrollments created after 2025-12-24 for each user.
SELECT user_name, count(*) FILTER (WHERE creation_date > '2025-12-24') FROM course_enrollment GROUP BY user_name;
Row update
UPDATE course_enrollment SET creation_date = '2025-12-24' WHERE creation_date = '2025-12-25';
Row deletion
DELETE FROM course_enrollment WHERE course_name = 'cardio';
Note that deletion without a qualification results in the deletion of all rows in the table.
-- This delete all rows in the course enrollment table.
DELETE FROM course_enrollment;
View
We can create a view based on a query for reuse.
CREATE VIEW course_enrollment_email_view AS
SELECT ce.course_name, u.name, u.email
FROM course_enrollment ce
JOIN user u
ON ce.user_name = u.name;
SELECT * FROM course_enrollment_email_view
Foreign key referential integrity
Often times we want to make sure that a foreign key column of a table do have a matching entry in that foreign key table. This is called the referential integrity.
CREATE TABLE user (
id int generated by default as identity primary key,
name varchar(100),
email varchar(254) unique,
height_m real,
weight_kg real,
creation_date date default current_date
);
CREATE TABLE course_enrollment (
id int generated by default as identity primary key,
course_name varchar(100),
user_id int references user(id),
user_name varchar(100),
creation_date date default current_date
);
Transaction
A transaction is a single, all-or-nothing operation that consists of multiple statements. The intermediate state made by each statement of a transaction is invisible to other transactions. If any statement fails, the entire operation is canceled and no change is applied to the database, leaving the database intact.
This is known as the atomic property. Here is an example of bundling multiple statements into one transaction block.
BEGIN;
-- first statement
UPDATE user SET creation_date = '2025-12-24'
WHERE creation_date = '2025-12-25';
-- another one
COMMIT;
If we don't want to commit the ongoing transaction, we can use ROLLBACK instead of COMMIT, and then all updates will be canceled.
PostgreSQL treats every SQL statement as being executed within a transaction by implicitly wrapping every command with a BEGIN and, if successful, a COMMIT, or a ROLLBACK otherwise.
We can use SAVEPOINT and ROLLBACK TO to keep the changes earlier than the savepoint and discard the rest. A savepoint can be released if it is no longer needed. Releasing or rolling back to a savepoint automatically releases all savepoints defined after it.
Window function
A window function performs a calculation across a set of rows called the window frame. If the PARTITION BY clause is supplied, the partition is the frame unless ORDER BY is supplied. If ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that have value equal to the current row based on the ORDER BY clause.
Unlike an aggregate function, a window function doesn't group the frame rows into one row.
The OVER clause determines how the input rows are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into partitions. We can omit the PARTITION BY to operate on all rows. We can control the order of rows processed by the window function using ORDER BY within OVER.
Window functions are permitted only in the SELECT list and the ORDER BY clause and are forbidden elsewhere because they execute after other clauses like WHERE or GROUP BY.
We can use a sub-select if we need to filter rows after the window function's calculation.
When we need to put multiple window functions inside a query, it is recommended to name each windowing in a WINDOW clause and then referenced in OVER.
-- Compare each user's weight with the average weight of the users with the same height.
SELECT id, name, weight_kg, avg(weight_kg), row_number() OVER (PARTITION BY height_m ORDER BY weight_kg) FROM user;
Execution order
WHEREselects the rows.- Aggregate function operate on group rows obtained from
GROUP BY. HAVINGselects the group rows, often using aggregate functions.- Window function
Inheritance
Inheritance lets an inheriting table inherit columns from an inherited table.
CREATE TABLE cities (
name text,
population int,
elevation int
);
CREATE TABLE capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
The date type of the column name is text, a native type for variable length character strings.
If we want the query to be performed on just the inherited table, we can do the following.
SELECT name, elevation
FROM ONLY cities
WHERE elevation > 500;