Distributed Systems: a quick introduction to the basics you should know
What is a Distributed System?
Have you ever wondered how Facebook stores and serves thousands of petabytes of user content such as photos, videos, and likes? Have you ever been intrigued by how you can upload a photo or video to Instagram and your followers can see it on their feeds almost instantly? The answer to these questions is distributed systems!
So what exactly is a distributed system? You have probably heard the term many times in MOOC videos or read it in articles and books without being sure what it means.
Wikipedia defines distributed systems as:
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another to achieve a common goal.
In simple terms, a distributed system is a group of computers working concurrently to accomplish a common goal.
Machines that can be a part of a distributed system:
- Physical servers
- Virtual Machines
- Any node that can connect to a network and communicate by passing messages
There is no global clock in a distributed system: each node has its own clock, so the nodes cannot rely on a shared notion of time. The components also fail independently of each other.
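Since each node only has its own clock, one common way to order events across nodes is a logical clock. The sketch below is a minimal Lamport-style clock; it is an illustration chosen here, not something the definition above prescribes, and the names `LamportClock`, `node_a`, and `node_b` are made up for the example.

```python
class LamportClock:
    """Logical clock: a counter that orders events without wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message (sending counts as an event)."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp, then count the receive event."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Two hypothetical nodes, each with its own independent counter.
node_a, node_b = LamportClock(), LamportClock()
node_a.tick()            # a local event on node A
stamp = node_a.send()    # A sends a message carrying its timestamp
node_b.receive(stamp)    # B moves its clock ahead of the message
print(stamp, node_b.time)  # 2 3
```

The receive event always gets a larger timestamp than the send event, even though the two nodes never compared real clocks.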
Importance of Distributed Systems
Building and maintaining a system that handles a large amount of data on a single machine can be complex and a hassle. This is where distributed systems come into play: they handle heavy load by scaling. Two types of scaling are possible: vertical and horizontal. In vertical scaling, the existing machine is upgraded with a faster CPU and more memory, but beyond a certain point this can no longer be done at low cost. In horizontal scaling, a completely new machine is added to the network, and the load is then balanced across all the machines.
The initial costs of horizontal scaling tend to be higher, but after a particular point it becomes much more efficient than vertical scaling, whose cost rises sharply after that point.
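To make "add a machine and balance the load" concrete, here is a minimal sketch of a round-robin load balancer. Round-robin is only one of several balancing strategies, and the class name `RoundRobinBalancer` and the server names are assumptions made up for this example.

```python
class RoundRobinBalancer:
    """Hands each incoming request to the next server in the pool."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._index = 0  # counts how many requests have been routed

    def add_server(self, name):
        """Horizontal scaling: grow the pool by one machine."""
        self.servers.append(name)

    def route(self):
        """Pick the server for the next request."""
        server = self.servers[self._index % len(self.servers)]
        self._index += 1
        return server


balancer = RoundRobinBalancer(["server-1", "server-2"])
print([balancer.route() for _ in range(4)])
# ['server-1', 'server-2', 'server-1', 'server-2']

balancer.add_server("server-3")  # scale out under load
print([balancer.route() for _ in range(3)])
# ['server-2', 'server-3', 'server-1']
```

After the third server joins, new requests are spread over three machines instead of two, with no change to the existing ones.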
Different Architectures of Distributed Systems
The client-server model is a popular networked model consisting of three components.
- Service: The service is the task that a particular machine can perform.
- Server: A server is a machine that performs the task.
- Client: A client is a machine that is requesting the service.
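The three roles above can be sketched in a few lines. This is a toy example under stated assumptions: both ends run in one process and talk over `socket.socketpair()` instead of a real network, the "service" is simply upper-casing text, and the function names are made up for illustration.

```python
import socket
import threading


def serve_once(conn):
    """Server: read one request, perform the service, send the reply."""
    with conn:
        request = conn.recv(1024).decode()
        reply = request.upper()  # the service: upper-case the text
        conn.sendall(reply.encode())


def call_service(conn, text):
    """Client: send a request and wait for the server's reply."""
    with conn:
        conn.sendall(text.encode())
        return conn.recv(1024).decode()


def demo(text):
    """Connect one client to one server and return the reply."""
    client_end, server_end = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server_end,))
    worker.start()
    reply = call_service(client_end, text)
    worker.join()
    return reply


print(demo("hello"))  # HELLO
```

The client never performs the task itself. It only sends a message and waits for the answer, which is the essence of the model.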
In a peer-to-peer architecture, no additional machines are used to provide services or manage resources. The load is distributed evenly among the machines in the system, known as peers, each of which can serve as either client or server.
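A minimal sketch of peers that act as both client and server could look like this. Everything runs in one process, and the `Peer` class, the peer names, and the file names are hypothetical.

```python
class Peer:
    """A node that can act as a client and as a server."""

    def __init__(self, name):
        self.name = name
        self.files = {}      # content this peer can serve to others
        self.neighbors = []  # other peers it knows about

    def serve(self, filename):
        """Server role: return the file if this peer holds it."""
        return self.files.get(filename)

    def fetch(self, filename):
        """Client role: ask each known peer until one has the file."""
        for peer in self.neighbors:
            data = peer.serve(filename)
            if data is not None:
                return peer.name, data
        return None


alice, bob = Peer("alice"), Peer("bob")
alice.files["a.txt"] = "from alice"
bob.files["b.txt"] = "from bob"
alice.neighbors, bob.neighbors = [bob], [alice]

print(alice.fetch("b.txt"))  # ('bob', 'from bob')
print(bob.fetch("a.txt"))    # ('alice', 'from alice')
```

Note that there is no dedicated server object anywhere: the same class plays both roles depending on who is asking.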
In a three-tier architecture, the clients no longer need to be intelligent and can rely on a middle tier to do the processing and decision making. The middle tier could be called an agent: it receives requests from clients (which could be stateless), processes the data, and then forwards it on to the servers.
N-tier, or multi-tier, architectures were first created by enterprise web services. They popularized application servers, which contain the business logic and interact with both the data tier and the presentation tier.
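Here is a small sketch of a client that relies on a middle tier for its decisions. The class names, the `can_afford` rule, and the account balances are all invented for the example; a real middle tier would sit on its own machine between the clients and the servers.

```python
class DataServer:
    """Back-end server: only stores and returns data."""

    def __init__(self):
        self._balances = {"alice": 120, "bob": 40}  # made-up sample data

    def read_balance(self, user):
        return self._balances[user]


class MiddleTier:
    """Agent that holds the processing and decision-making logic."""

    def __init__(self, server):
        self.server = server

    def can_afford(self, user, price):
        return self.server.read_balance(user) >= price


def thin_client(agent, user, price):
    """Client: forwards the input and reports the agent's decision."""
    return "approved" if agent.can_afford(user, price) else "declined"


agent = MiddleTier(DataServer())
print(thin_client(agent, "alice", 100))  # approved
print(thin_client(agent, "bob", 100))    # declined
```

The client contains no business rule at all, so the rule can change in the middle tier without touching any client.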
Benefits of Distributed Systems
- Scalability: Scaling according to the load in distributed systems is easy and generally inexpensive.
- Reliability: Distributed systems can be fault-tolerant, i.e. the system doesn't experience disruptions if a single machine fails, because the load is rebalanced among the remaining machines.
- Performance: These systems can achieve high throughput because workloads can be divided and sent to multiple machines.
- Resource sharing: Hardware resources like printers can be shared among multiple nodes rather than being restricted to just one.
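Two of the benefits above, dividing a workload across machines and rebalancing it when one machine fails, can be sketched with a single helper. This is a simplified illustration: the `assign` function, the task names, and the node names are hypothetical, and real systems usually try to move as few tasks as possible when rebalancing.

```python
def assign(tasks, nodes):
    """Spread tasks over the given nodes in round-robin order."""
    if not nodes:
        raise RuntimeError("no healthy nodes left")
    plan = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        plan[nodes[i % len(nodes)]].append(task)
    return plan


tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]
nodes = ["node-a", "node-b", "node-c"]

before = assign(tasks, nodes)
print(before)
# {'node-a': ['t1', 't4'], 'node-b': ['t2', 't5'], 'node-c': ['t3', 't6']}

# node-b fails: rebalance the same workload over the remaining machines.
healthy = [node for node in nodes if node != "node-b"]
after = assign(tasks, healthy)
print(after)
# {'node-a': ['t1', 't3', 't5'], 'node-c': ['t2', 't4', 't6']}
```

Every task still has an owner after the failure; the remaining machines simply carry a larger share each.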
Challenges of Distributed Systems
- Scheduling: Deciding which job should run, when it should run, and where it should run is hard, and poor decisions lead to under-utilized hardware and unpredictable runtimes.
- Latency: Because the nodes are widely distributed, communication between them incurs more latency.
- Security: With a large number of nodes, it is difficult to provide adequate security in distributed systems because all the nodes, as well as the connections, need to be secured.
- Losing data: Sometimes, messages and data can get lost in the network while moving from one machine to another.
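A common way to cope with lost messages is to resend until the receiver acknowledges. The sketch below simulates that with a hypothetical `LossyLink` that deliberately drops the first few messages; the class, the function `send_with_retry`, and the `max_attempts` parameter are assumptions made for illustration.

```python
class LossyLink:
    """Hypothetical link that drops the first `drops` messages it is given."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        """Return True (an acknowledgement) only if the message arrived."""
        if self.drops > 0:
            self.drops -= 1
            return False
        self.delivered.append(message)
        return True


def send_with_retry(link, message, max_attempts=5):
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if link.send(message):
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


link = LossyLink(drops=2)
print(send_with_retry(link, "hello"))  # 3
print(link.delivered)                  # ['hello']
```

In a real network the acknowledgement itself can also get lost, so the receiver may see the same message twice and has to be prepared to ignore duplicates.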
With advances in technology and the extremely large volume of data being processed these days, distributed systems are often the most practical way to meet such requirements.
Thanks for reading!