System Design

Say we want to build a software system, for example a web application. Where should we start?

Well, let's first think about the functional and non-functional requirements of our product. This gives us a list of questions to answer.

Functional requirements (scope)

Non-functional requirements

System layout

After clarifying the functional and non-functional requirements of the system, we move on to sketching out the system layout.

Unless the non-functional requirements demand a system that supports millions or even billions of concurrent users and requests upfront, we want to start with a minimal design.

Forget about microservices, load balancing, auto-scaling, message queues, async processing, distributed storage, and big data. We simply don't need them yet.

When deciding whether to add a specific component to our system, we should keep asking ourselves: will anything break if we don't add it? If nothing actually breaks, we don't need to include it.

This practice minimizes system complexity and thus operational cost.

Minimal design

Start with one server and one database.

In general, CRUD applications are I/O-bound, and a single server can handle around 10 K requests per second (RPS).

If the application is CPU-bound, concurrency is limited by the number of logical cores on the server (CPU socket count x physical cores per socket x logical cores per physical core). An enterprise-grade server can have up to 1 K logical cores (8 x 64 x 2).
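
As a sanity check, the formula above can be evaluated at runtime. A minimal sketch using only the Python standard library; the 8 x 64 x 2 figure is the enterprise-grade example from the text:

```python
import os

# Logical cores the OS reports for this machine.
logical_cores = os.cpu_count()

# The enterprise-grade example from the text:
# 8 sockets x 64 physical cores per socket x 2 hardware threads per core.
enterprise_cores = 8 * 64 * 2  # = 1024, i.e. about 1 K logical cores
```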

The basic system design layout is a single server in front of a single database, with clients talking to the server over the network.

Client-server communication

We use Representational State Transfer (REST) to build the Application Programming Interface (API) for client-server communication. It is easy to write, read, and maintain. Handle timeouts and errors with retries using jittered exponential backoff.
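
The retry policy can be sketched as follows. This is a minimal full-jitter backoff helper, assuming the caller passes in any function that raises on failure; the names (`call_with_retry`, `base_delay`, etc.) are illustrative, not from a particular library:

```python
import random
import time

def call_with_retry(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Call fn(), retrying on failure with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Full jitter: sleep a random amount up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, cap))
```

Jitter spreads out the retries from many clients so they don't all hammer the server at the same instant after a failure.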

Server bottleneck

As the application gains more users, the server struggles to handle that many concurrent requests, and users start to notice increased response latency. Concurrency becomes the bottleneck of our system, typically due to restrictive OS settings or a shortage of system memory.

OS setting customization

If the bottleneck is caused by the OS settings, we can tune those settings, such as the open file descriptor limit and the TCP listen backlog. For example, we can raise the open file descriptor limit from 1 K to 100 K.
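
On a Unix system, the per-process file descriptor limit can be inspected and raised from within the application itself via Python's standard `resource` module. A minimal sketch; note that an unprivileged process can only raise the soft limit up to the hard limit, while raising the hard limit requires root (or an OS-level change such as editing limits.conf):

```python
import resource

# Current soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward 100 K, capped at the hard limit.
if hard == resource.RLIM_INFINITY:
    target = 100_000
else:
    target = min(100_000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```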

Server vertical scale up

If the bottleneck is caused by the system memory shortage, we can upgrade the server to a larger instance with more system memory.

Server horizontal scale out

If OS setting customization and server vertical scale up don't fully resolve the concurrency bottleneck, we then need to horizontally scale out the server to a server cluster. We use a load balancer to distribute the request traffic among the servers.
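
The load balancer's core job can be sketched as a round-robin picker over the server pool. This is a toy illustration with hypothetical server names; in practice the role is filled by software like NGINX or HAProxy, or a cloud load balancer:

```python
import itertools

class RoundRobinBalancer:
    """Distribute incoming requests across a server pool in rotation."""

    def __init__(self, servers):
        self._pool = itertools.cycle(servers)

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._pool)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
```

Real load balancers add health checks so a dead server is taken out of the rotation, and often weight servers by capacity.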

Database bottleneck

As more data is stored in the database and more transactions happen at the same time, the database transaction latency rises to an unacceptable level and becomes the bottleneck of our system.

Database setting customization

Customizing database settings is a good way to start optimizing a database. In PostgreSQL, for example, we can increase max_connections from 100 to 200, shared_buffers from 128 MB to 4 GB, and so on. Tuning a handful of parameters (max_connections, shared_buffers, min_wal_size, max_wal_size, random_page_cost, effective_cache_size, maintenance_work_mem, work_mem, max_parallel_workers_per_gather) can yield around a 30% performance boost; fine-tuning all parameters can reach 2x or even 3x.
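
Put together, the tuned parameters might look like the following postgresql.conf fragment. The values are illustrative starting points for a dedicated database server with around 16 GB of RAM and SSD storage, not universal recommendations:

```ini
# postgresql.conf -- illustrative values for ~16 GB RAM, SSD storage
max_connections = 200
shared_buffers = 4GB                  # ~25% of RAM is a common starting point
effective_cache_size = 12GB           # planner hint: memory available for caching
work_mem = 16MB                       # per sort/hash operation, per connection
maintenance_work_mem = 1GB            # for VACUUM, CREATE INDEX, etc.
min_wal_size = 1GB
max_wal_size = 4GB
random_page_cost = 1.1                # SSDs make random reads nearly as cheap as sequential
max_parallel_workers_per_gather = 4
```

Note that work_mem is allocated per operation per connection, so raising it together with max_connections multiplies the worst-case memory use.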

Database vertical scale up

We can upgrade the database server to a larger instance with more system memory.

Database caching

Adding a database cache, Content Delivery Network (CDN), and client-side cache can increase read performance.
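
The database-cache read path is usually the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal in-process sketch with a TTL; in production the store would be something like Redis or Memcached rather than a dict:

```python
import time

class CacheAside:
    """Cache-aside reads: check cache, fall back to the database, populate cache."""

    def __init__(self, db_read, ttl_seconds=60.0):
        self._db_read = db_read        # slow path: key -> value from the database
        self._ttl = ttl_seconds
        self._store = {}               # key -> (value, expiry timestamp)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]              # cache hit, database untouched
        value = self._db_read(key)     # cache miss: read from the database
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

The TTL bounds how stale a cached value can get; writes that must be seen immediately also need explicit invalidation.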

Database sharding

The most effective way to boost write performance is database sharding, which partitions the data across multiple database servers so writes are spread among them.
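
The core of sharding is a deterministic mapping from a record's key to a shard. A minimal hash-based sketch; note that simple modulo hashing reshuffles most keys whenever the shard count changes, which is why production systems often use consistent hashing or a directory service instead:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index deterministically via a stable hash."""
    # sha256 is stable across processes and machines, unlike Python's hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every reader and writer applies the same function, so all traffic for a given key lands on the same shard.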

See more about read and write transaction scaling in the database post.

Availability and reliability bottleneck

As our monolithic system grows bigger, it will face availability and reliability challenges. We can refactor it into a microservice architecture: an API gateway filters requests and routes them to different services, and each service sits behind a load balancer that distributes traffic among its replicas.
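
The gateway's routing step can be sketched as longest-prefix matching on the request path. The service names and routes below are hypothetical; real gateways (e.g. Kong, Envoy, or a cloud API gateway) add authentication, rate limiting, and request filtering on top of this:

```python
class ApiGateway:
    """Route requests to backend services by longest matching path prefix."""

    def __init__(self, routes):
        # routes: path prefix -> service name. Sort longest-first so the
        # most specific prefix wins.
        self._routes = sorted(routes.items(), key=lambda kv: -len(kv[0]))

    def route(self, path):
        for prefix, service in self._routes:
            if path.startswith(prefix):
                return service
        return None  # no matching service: the gateway rejects the request

gw = ApiGateway({
    "/orders": "order-service",
    "/orders/export": "export-service",
    "/users": "user-service",
})
```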

External API rate limit bottleneck

If a service depends on an external API that enforces a rate limit, we can put outgoing requests on a message queue and drain it at a controlled rate, so the limit is respected in a centralized fashion.
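
The queue-draining side can be sketched as a token bucket: requests accumulate in the queue, and a sender only releases them as tokens refill at the external API's allowed rate. Class and parameter names are illustrative; in production the queue would be external (e.g. RabbitMQ or SQS) and shared by all service replicas:

```python
import time
from collections import deque

class RateLimitedSender:
    """Drain a queue of pending requests no faster than `rate` per second."""

    def __init__(self, rate, burst=1):
        self._rate = rate              # sustained sends per second
        self._burst = burst            # max tokens (short bursts allowed)
        self._tokens = float(burst)
        self._last = time.monotonic()
        self._queue = deque()

    def enqueue(self, request):
        self._queue.append(request)

    def drain(self, send):
        """Send as many queued requests as the current token budget allows."""
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at the burst size.
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now
        sent = []
        while self._queue and self._tokens >= 1:
            self._tokens -= 1
            request = self._queue.popleft()
            send(request)
            sent.append(request)
        return sent
```

A worker loop would call drain() periodically; because the queue is the single choke point, the external API never sees more than the allowed rate regardless of how many internal callers enqueue requests.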

Tradeoff

Next step

Once we have decided how we structure the system, we can go ahead with the development cycle.

