Diary of a Devman: October 2023

Yesterday, I was at an event, talking about Transactions, ACID, Isolation phenomena, etc., in Postgresql.

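In Postgresql or any other DBMS, Transactions can run concurrently, and the Isolation level set determines how much of each other's work they get to see. At the strictest level (SERIALIZABLE), the result must be the same as if they had run in sequence. Each level is implemented by DBMS vendors in their own way.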

A deliberately simple database table, accounts, with its two records of bank account holders and their balances, will be used for this article.

The focus of the discussion is on concurrent Transactions with the READ COMMITTED Isolation Level (default in Postgresql), but it also demonstrates two Transactions running side by side, one SERIALIZABLE and the other READ COMMITTED.

READ COMMITTED vs READ COMMITTED

Two transactions were started. Observe the CLOCK icons on PgAdmin, indicating that a Transaction is in progress (on each Query Tool window)...

On the left: the balance of Steve Jobs was already updated to 1100

On the right: Steve Jobs still has a balance of 1000. Why?

Postgresql's default Isolation level is READ COMMITTED, which does not allow the DIRTY READ phenomenon (data that has not been committed yet by one Transaction being visible to other Transactions).

Adding More Money

On the left: the Transaction hasn't finished yet.

On the right: more money to Steve Jobs' account, but the query doesn't finish...

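The row of Steve Jobs in the accounts table is locked by the Transaction on the left, which has updated it (the lock is on the row, not on the whole table). Only when this Transaction finishes (COMMIT or ROLLBACK) will the Transaction on the right be able to add 400 bucks more to Steve Jobs' account.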
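
To check who is waiting for whom while the query on the right hangs, a third session can ask Postgresql directly. This is only a sketch, assuming a third window or psql session connected to the same database; the file name is hypothetical, while pg_stat_activity and pg_blocking_pids() are part of Postgresql 12.

```shell
# Third window: list sessions that are waiting on a lock, and who blocks them.
cat << 'EOF' > who_is_waiting.sql
-- Run while the UPDATE on the right is hanging.
SELECT pid,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
EOF
```

The script can be pasted into a Query Tool window or run with psql -U postgres -h 127.0.0.1 -f who_is_waiting.sql.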

On the left: the Transaction has finished and 100 bucks were deposited to Steve Jobs' account.

On the right: still in progress, but since the Transaction on the left has finished (row lock released), the Transaction on the right sees a balance of 1100, adding 400 to it.
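
The whole exchange can be sketched as two scripts, to be run step by step in two separate windows. File names and step comments are assumptions for illustration; the amounts are the ones used in this walkthrough.

```shell
# Left window: deposit 100, keep the transaction open for a while, then commit.
cat << 'EOF' > left_deposit.sql
-- step 1: deposit 100 and keep the transaction open
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE holder = 'Steve Jobs';
-- step 3: finishing the transaction lets the waiting UPDATE proceed
COMMIT;
EOF

# Right window: deposit 400 on the same account while the left one is open.
cat << 'EOF' > right_deposit.sql
-- step 2: this UPDATE waits until the left transaction commits or rolls back
BEGIN;
UPDATE accounts SET balance = balance + 400 WHERE holder = 'Steve Jobs';
-- step 4: once unblocked, the UPDATE was applied on top of the committed 1100
SELECT holder, balance FROM accounts WHERE holder = 'Steve Jobs';
COMMIT;
EOF
```

Starting from a balance of 1000, the SELECT in step 4 should show 1500, matching the final state described in this section.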

But why is data committed by one Transaction visible to another Transaction that is still in progress?

At the READ COMMITTED Isolation level, each statement of a Transaction can read whatever data was COMMITTED by the time that statement starts. That being said, one phenomenon that might happen is the so-called NON-REPEATABLE READ, where changes committed by one Transaction to a row are VISIBLE to other in-progress Transactions with Isolation Level READ COMMITTED.

If these Transactions read the data again, they might get a different value (NON-REPEATABLE), which might lead to some data inconsistencies during READ and WRITE operations of concurrent Transactions, without, of course, affecting the ACID property of "Consistency" (transactions must respect database constraints, e.g. balance >= 0).
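
A minimal sketch of a NON-REPEATABLE READ, assuming the accounts table of this article and two windows running with the default READ COMMITTED level. The file names and step comments are hypothetical, chosen only for illustration.

```shell
# Right window: reads the same row twice inside one transaction.
cat << 'EOF' > right_non_repeatable.sql
-- step 1: first read inside the transaction
BEGIN;
SELECT balance FROM accounts WHERE holder = 'Steve Jobs';
-- step 3: same query, same transaction, after the left window has committed
SELECT balance FROM accounts WHERE holder = 'Steve Jobs';
COMMIT;
EOF

# Left window: commits a change between the two reads.
cat << 'EOF' > left_non_repeatable.sql
-- step 2: a separate transaction updates the row and commits
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE holder = 'Steve Jobs';
COMMIT;
EOF
```

The second SELECT on the right returns a balance that is 100 higher than the first one, even though both reads belong to the same Transaction.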

At the end, the Transaction on the right has finished, adding 400 bucks more to Steve Jobs' account and leaving it with a balance of 1500, which is also what the left window shows, where no Transaction is in progress.

SERIALIZABLE vs READ COMMITTED

Just as before, two Transactions are in progress, the first one with Isolation Level SERIALIZABLE.

Records are still the same as at the end of the previous example.

Just as the specification says, the PHANTOM READ phenomenon doesn't happen with a Transaction running with Isolation Level SERIALIZABLE...

...even when the READ COMMITTED Transaction finishes.

Doesn't matter: the SERIALIZABLE Transaction will see what's in its snapshot of the accounts table, and nothing else that is happening in other Transactions.
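
A minimal sketch of this scenario as two scripts, to be run step by step in two windows. The file names, the step comments and the inserted holder 'Ken Thompson' are assumptions for illustration; the article does not say which statement the READ COMMITTED Transaction actually ran.

```shell
# Left window: SERIALIZABLE transaction reading the table twice.
cat << 'EOF' > left_serializable.sql
-- step 1: the snapshot is taken by the first query of the transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM accounts;
-- step 3: same rows as in step 1, even though the right window has committed
SELECT * FROM accounts;
COMMIT;
-- step 4: outside the old snapshot, the newly committed row shows up
SELECT * FROM accounts;
EOF

# Right window: READ COMMITTED is the default, so a plain BEGIN is enough.
cat << 'EOF' > right_read_committed.sql
-- step 2: insert a new (hypothetical) account holder and commit
BEGIN;
INSERT INTO accounts (holder, balance) VALUES ('Ken Thompson', 1000);
COMMIT;
EOF
```

The row added in step 2 would be a phantom if it appeared in step 3; with SERIALIZABLE it only becomes visible after the left Transaction ends.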

Some Reflections

Weaker isolation levels give better performance (less overhead) at the cost of temporarily inconsistent reads.

Which Isolation Level suits best really depends on the Business Rules that govern the design of a Backend service. Maybe a PHANTOM READ or a NON-REPEATABLE READ won't cause any trouble in terms of UX for an application that performs operations on the Stock Market (prices fluctuating from second to second). For an E-commerce solution, though, having the price of a product change all of a sudden while it is under a buy/sell transaction with a User is not the best scenario: it would be bad to start the purchase of a product that costs US$ 100.00 and, in the middle of the transaction (before paying), see the price increase to US$ 125.00.
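
As a sketch of the E-commerce case, assume a hypothetical products table (id, name, price), which is not part of this article's database. Running the purchase at REPEATABLE READ is one possible way of keeping the price first read by the Transaction stable until it ends; the file name and the product id are made up for illustration.

```shell
# One window: a purchase that must keep seeing the price it started with.
cat << 'EOF' > purchase.sql
-- Hypothetical schema: products(id, name, price).
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- first read: the snapshot of this transaction is taken here
SELECT price FROM products WHERE id = 1;
-- meanwhile, another transaction may commit a new price for the product
-- second read: still the price from the snapshot, not the new one
SELECT price FROM products WHERE id = 1;
COMMIT;
EOF
```

Whether the old price should be honored at checkout is, again, a Business Rule; the Isolation Level only guarantees what the Transaction gets to see.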

If you have any questions, I'd be glad to address them.

I just want to have a Docker Image of Postgresql on Docker Hub, with preloaded data saved, so I can pull it and create a container whenever I want, and even share the image with other people for testing general stuff (just like a code repo on Github). I also want to update the preloaded data with data that I'm currently working on for personal projects (never including sensitive data).

Hope these instructions help you too, Developer or SysAdmin.

Requirements

Of course, a Docker Hub account, an installed Docker engine and an authenticated Docker client are necessary in order to follow along. Besides that, without an image repo created on Docker Hub, there's no place to upload locally built images.

This is the repo which I created: ivanlmj/postgres

Postgresql-based Container

First, I need to create a container based on the official Postgres image repo (version 12).

I used to be more detail-oriented when creating containers (docker pull, docker create and docker start), but for general operations, docker run is way more practical:

$ docker run -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres:12
Unable to find image 'postgres:12' locally
12: Pulling from library/postgres
a803e7c4b030: Pull complete 
89aefdc7e2ee: Pull complete 
9ed34906f4b2: Pull complete 
731d5308799a: Pull complete 
91955551cb08: Pull complete 
f3e0af904c74: Pull complete 
906febd4182a: Pull complete 
e7f594a72a36: Pull complete 
0857ffe75e3a: Pull complete 
29faa49de6a6: Pull complete 
de18a1e43334: Pull complete 
caec1f1d9715: Pull complete 
c75a666c6c4d: Pull complete 
Digest: sha256:81ab067d5573cdd38f6dac3de637aa1d6e923c3504bd93a79f1fb397675342f4
Status: Downloaded newer image for postgres:12
22f9858b88b5c8fca20bf7263aca18a4335836b6519bdebfa50d99823979113d

Creating Some Data

By the way, I'm taking a course on advanced DBMS topics, and for the tests that I need right now, this is enough:

$ psql -U postgres -h 127.0.0.1
Password for user postgres: 
psql (12.16 (Ubuntu 12.16-0ubuntu0.20.04.1))
Type "help" for help.

postgres=# CREATE TABLE accounts (id SERIAL PRIMARY KEY UNIQUE NOT NULL, holder VARCHAR(64) NOT NULL, balance INTEGER NOT NULL);
CREATE TABLE
postgres=# INSERT INTO accounts (holder, balance) VALUES ('Dennis Ritchie', 1000), ('Steve Jobs', 1000);
INSERT 0 2
postgres=# SELECT * FROM accounts;
 id |     holder     | balance 
----+----------------+---------
  1 | Dennis Ritchie |    1000
  2 | Steve Jobs     |    1000
(2 rows)

postgres=# exit

Preloaded Image with Dump

The custom Postgresql image that I'm going to create will require this dump in order to preload the containers (create the table + insert the records):

$ pg_dump -U postgres -h 127.0.0.1 > postgres_db_dump.sql

Building Custom Image

I used the postgres:12 image before to create the container where the dump was generated. Since the image is locally cached, it won't need to be downloaded again while building the custom image.

Besides storing the dump in the /docker-entrypoint-initdb.d/ directory (the Postgres image executes the .sql files inside of it when a container initializes an empty data directory), the environment variable POSTGRES_PASSWORD will also be set at build time, so that the password of the postgres user doesn't have to be passed when creating containers based on this custom Postgres image:

$ cat << EOF > Dockerfile
> FROM postgres:12
> ENV POSTGRES_PASSWORD=postgres
> COPY postgres_db_dump.sql /docker-entrypoint-initdb.d/
> EOF

An image built from a Dockerfile needs the name of the repo and should get a tag (version), otherwise Docker falls back to the latest tag.

I'm gonna use the version of the Postgresql engine (12), since later on I might want to have another custom Postgresql image for a newer version, say 16:

$ docker image build . -t ivanlmj/postgres:12
[+] Building 0.0s (7/7) FINISHED                                          
 => [internal] load build definition from Dockerfile                      0.0s
 => => transferring dockerfile: 140B                                      0.0s
 => [internal] load .dockerignore                                         0.0s
 => => transferring context: 2B                                           0.0s
 => [internal] load metadata for docker.io/library/postgres:12            0.0s
 => [internal] load build context                                         0.0s
 => => transferring context: 42B                                          0.0s
 => [1/2] FROM docker.io/library/postgres:12                              0.0s
 => CACHED [2/2] COPY postgres_db_dump.sql /docker-entrypoint-initdb.d/   0.0s
 => exporting to image                                                    0.0s
 => => exporting layers                                                   0.0s
 => => writing image sha256:a6e4657bdcaadf7536bead9eb....                 0.0s
 => => naming to docker.io/ivanlmj/postgres:12                            0.0s
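
For the newer version mentioned above, one option is to turn the base image version into a build argument, so the same Dockerfile serves both builds. This is only a sketch: the PG_VERSION argument name is an assumption of mine, and a ivanlmj/postgres:16 tag does not exist in this article.

```shell
# Same Dockerfile as before, with the Postgres version as a build argument.
cat << 'EOF' > Dockerfile
# Default to the version used in this article.
ARG PG_VERSION=12
FROM postgres:${PG_VERSION}
ENV POSTGRES_PASSWORD=postgres
COPY postgres_db_dump.sql /docker-entrypoint-initdb.d/
EOF
```

Building with docker image build . --build-arg PG_VERSION=16 -t ivanlmj/postgres:16 would then produce the second image, provided the dump loads cleanly on the newer engine, which is worth testing before pushing.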

Pushing to Repo

$ docker push ivanlmj/postgres:12
The push refers to repository [docker.io/ivanlmj/postgres]
3ab18e48bca6: Layer already exists 
f807adebe19a: Layer already exists 
19b47a8895e5: Layer already exists 
8b806c56785b: Layer already exists 
cf3b6ab8b6e9: Layer already exists 
c3d26035203f: Layer already exists 
9bb81d691b8a: Layer already exists 
572272d8013b: Layer already exists 
78f5cd522124: Layer already exists 
4e9de6962b22: Layer already exists 
6722c8542886: Layer already exists 
20fde8a4b2e7: Layer already exists 
8fb3f4d77297: Layer already exists 
d310e774110a: Layer already exists 
12: digest: sha256:fa9cf0626b33882c2d71b4e8fffec87691b...  size: 3247

Container Based on Custom Image (testing)

Since the image was built with the environment variable holding the password of the postgres user, there's no need to set it when creating a container:

$ docker run -p 5432:5432 -d ivanlmj/postgres:12
b8de9890aed0600ba53cfbe5d3b4efdada6460b669b19373406cf36a379d621f

$ docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED         STATUS        PORTS                                       NAMES
b8de9890aed0   ivanlmj/postgres:12   "docker-entrypoint.s…"   2 seconds ago   Up 1 second   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   admiring_mclean

Connecting via host client to the published port 5432 works just fine, and the data loaded from the dump is present in the container database:

$ psql -U postgres -h 127.0.0.1
Password for user postgres: 
psql (12.16 (Ubuntu 12.16-0ubuntu0.20.04.1))
Type "help" for help.

postgres=# SELECT * FROM accounts;
 id |     holder     | balance 
----+----------------+---------
  1 | Dennis Ritchie |    1000
  2 | Steve Jobs     |    1000
(2 rows)

$ man doadm

Occasionally writing about my experiences and part of the truth.

Mountaineer, immigrant, developing software mainly in Python (web services, web applications, ETL/ELT pipelines, data analytics and engineering, automation), former GNU/Linux SysAdmin.

14 years between Infrastructure and Software Engineering, most of the time working in Startup environments from LATAM to Europe.

Travelling and learning from different cultures and languages is one of my passions.

Not afraid of saying that the emperor has no clothes.

Steering Council Member of XalapaCode community, since 2019.

P.S.: hard to keep this space updated, to be honest.

Regards,
