A distributed system is a system consisting of many services running independently on different servers (physical machines, VMs, or containers) and connected via a network. Although its services are distributed, it appears as a single entity to the end user.
Tradeoffs
In the system design post, we noted that a distributed system offers good:
- Availability
- Scalability
- Flexibility
But falls short at:
- Development
- Testing
- Deployment
- Monitoring
- Maintenance
- Performance
- Data consistency
CAP theorem
The CAP theorem describes a similar tradeoff. It says a system can achieve only two of the following three guarantees at any given time:
- Consistency (C)
- Every read receives the most recent write (or an error otherwise).
- Availability (A)
- Every request receives a non-error response (though the data it contains may be stale).
- Partition Tolerance (P)
- The system continues to operate even when network partitions occur, i.e. some servers lose their connection to others.
In a distributed system, partition tolerance is a requirement because network failures happen all the time, so in practice we have to choose between consistency and availability (illustrated in the sketch after this list):
- CP
- Prefer consistency over availability: requests hitting the isolated servers fail with an error because the system cannot guarantee that it returns the latest data.
- AP
- Prefer availability over consistency: requests hitting the isolated servers receive a response that may contain obsolete data.
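To make the CP/AP choice concrete, here is a minimal Python sketch. The `Replica` class, its `mode` flag, and the partition simulation are hypothetical, for illustration only:

```python
class Replica:
    """A toy replica that may be cut off from the primary by a partition."""
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical flag)
        self.data = {"x": 1}      # last value received from the primary
        self.partitioned = False  # True once the network splits

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse to answer with possibly stale data.
            raise ConnectionError("partitioned: cannot guarantee latest data")
        # Availability first: answer, but the value may be obsolete.
        return self.data[key]

cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True  # simulate a network partition

print(ap.read("x"))  # 1, possibly stale (AP)
try:
    cp.read("x")
except ConnectionError as e:
    print(e)         # partitioned: cannot guarantee latest data (CP)
```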
Scaling
Here are a few things we need to know about scaling a distributed system.
- Stateless service
- Services don't store user state such as sessions on their own servers; that state lives in centralized storage (e.g. a cluster of RAIDs). Users therefore keep the same session no matter which server handles their requests, which makes services easier to scale horizontally.
- Routing
- Route requests to different services based on request type.
- Vertical scaling (scale-up)
- Upgrade to a more powerful server.
- Horizontal scaling (scale-out)
- Use more servers to run more copies of the same service.
- Load balancing
- Distribute requests among multiple servers using methods like round-robin (see the round-robin sketch after this list).
- Auto scaling
- Scale horizontally in a dynamic way based on real-time traffic.
- Caching
- Store request-response pairs so that repeated requests are served quickly, without making the same expensive API or database calls over and over (see the cache-aside sketch after this list).
- Client-side cache
- CDN cache
- Load balancer cache
- Message queue cache
- Service
- CPU cache
- RAM cache
- Disk cache
- In-memory cache
- Database cache
- Database selection
- Choose SQL databases for consistency or NoSQL databases for availability and scalability.
- Database replication
- Replicate the same database across multiple servers.
- Primary-replica
- We have one primary database and multiple replica databases. The primary database handles write requests and propagates the latest data to replica databases. The replica databases take care of read requests.
- This increases read speed.
- Database sharding
- Split the existing database into multiple smaller databases, each handling a subset of the data based on certain criteria, such as a hash of the key (see the sharding sketch after this list).
- This increases both read and write speed.
- Asynchronous processing
- Run time-consuming and resource-intensive tasks during off-peak hours or on background servers, typically via a message queue (sketched after this list).
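Here is a minimal sketch of round-robin load balancing. The `RoundRobinBalancer` class and the server addresses are made up for illustration:

```python
import itertools

class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotating order."""
    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._servers)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(5):
    print(lb.pick())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, 10.0.0.2
```

Real load balancers offer other methods too, such as least-connections or weighted round-robin.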
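And here is a cache-aside sketch for the caching item above. The plain dictionary stands in for a real cache such as Redis, and `fetch_from_database` is a hypothetical stand-in for an expensive call:

```python
cache = {}

def fetch_from_database(key):
    # Hypothetical stand-in for an expensive database or API call.
    return f"value-for-{key}"

def get(key):
    # Cache-aside: serve from the cache when possible; otherwise
    # load from the database and remember the result.
    if key in cache:
        return cache[key]
    value = fetch_from_database(key)
    cache[key] = value
    return value

print(get("user:42"))  # miss: hits the database and fills the cache
print(get("user:42"))  # hit: served straight from the cache
```

A production cache also needs eviction (e.g. LRU) and expiry so stale entries don't linger.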
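The sharding item above can be sketched as routing each key to one of several smaller databases. The shard names and the `shard_for` helper are hypothetical:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(key: str) -> str:
    # Hash the key so rows spread roughly evenly across shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for user_id in ("alice", "bob", "carol"):
    print(user_id, "->", shard_for(user_id))
```

Note that plain modulo hashing reshuffles most keys whenever the number of shards changes; consistent hashing is the usual remedy.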
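Finally, a minimal sketch of asynchronous processing, using Python's in-process `queue` module as a stand-in for a real message broker such as RabbitMQ or Kafka:

```python
import queue
import threading
import time

tasks = queue.Queue()

def worker():
    # Background worker: drains slow tasks off the request path.
    while True:
        job = tasks.get()
        if job is None:   # sentinel: stop the worker
            break
        time.sleep(0.1)   # stand-in for slow work (e.g. video encoding)
        print("processed", job)
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

# The request handler only enqueues and returns immediately.
for job_id in range(3):
    tasks.put(f"job-{job_id}")

tasks.join()      # wait for the backlog to drain (demo only)
tasks.put(None)   # stop the worker
```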
Scaling example
Here is a simulated example of scaling a client-server application.
- Initially we launch the application with a minimal setup:
- A client application
- An application server
- A database server
- As more and more user requests hit the application server, it fails to handle them all, typically due to CPU, memory, or connection limits.
- We can upgrade the application server to a more powerful machine (vertical scaling).
- If that's not enough, we can set up more application servers and a load balancer to handle increasing traffic (horizontal scaling and load balancing).
- We can also set up a CDN to cache static assets (caching).
- As more and more user data is stored in our database, queries slow down.
- We can upgrade the database server to a more powerful machine (vertical scaling).
- Read
- If that's not enough, we can put a cache between the application servers and the database to speed up database read operations (caching).
- If that's not enough, we can add read replicas (database replication; see the read-routing sketch after this walkthrough).
- Write
- If that's not enough, we can shard the database into multiple smaller ones to accelerate write operations (database sharding).
- As we add more and more features, the application server fails to handle them all, typically due to CPU and memory limits or dependency conflicts.
- We can upgrade the application server to a more powerful machine (vertical scaling).
- If that's not enough, we can split the application into multiple services and run each of them on a separate server. We also use an API gateway (or even a cluster of gateways) to route the requests to the different services (routing).
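Here is a minimal sketch of the read/write split behind database replication, referenced in the read step above. `RoutingConnection` is hypothetical and only prints where each statement would go:

```python
import random

class RoutingConnection:
    """Send writes to the primary, spread reads across replicas."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql: str):
        is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        target = self.primary if is_write else random.choice(self.replicas)
        print(f"routing to {target}: {sql}")

db = RoutingConnection("primary-db", ["replica-1", "replica-2"])
db.execute("SELECT * FROM users WHERE id = 42")            # goes to a replica
db.execute("UPDATE users SET name = 'Ada' WHERE id = 42")  # goes to the primary
```

One caveat: replicas receive writes asynchronously, so a read right after a write may return stale data (the consistency tradeoff from the CAP section).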
The scaled client-server application has the following final layout:
- The client application that serves as the user interface.
- A CDN that serves static assets quickly.
- A load balancer that distributes traffic across multiple API gateway instances running on different servers to maximize availability and performance.
- TLS termination
- Load balancing
- A cluster of API gateways that control access and route requests to APIs:
- Rate limiting (see the token-bucket sketch after this layout)
- DDoS protection
- Logging
- Authentication/authorization
- Routing
- A message broker or message queue for event-driven processing
- A distributed cache
- A collection of services that takes care of:
- User management
- Business logic
- Observability
- Logging
- Metric monitoring
- Alerting
- Analytics
- A group of databases that addresses:
- Latency
- Throughput
- Availability
- Reliability
- Scalability
- A set of communication methods built on the HTTP, TCP, or UDP protocols:
- REST
- WebSocket
- RPC
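As an example of what the gateway's rate limiting might look like, here is a token-bucket sketch. It is single-process and hypothetical; a real gateway would keep bucket state per client in a shared store such as Redis:

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
print(sum(bucket.allow() for _ in range(20)))  # about 10: the burst capacity
```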
Security
System security has become increasingly important.