The idea here is to see how Apache Curator, a high-level API for ZooKeeper, can be used for service discovery and load balancing, whether it should be used that way, and what the benefits and pitfalls of such an approach are.
Apache Curator is primarily a high-level ZooKeeper API for Java. To make its functionality available to languages other than Java, it offers two options: the Service Discovery Server and Curator RPC. In this implementation, I am using Curator RPC. Curator RPC is a small Curator client written in Java that exposes the Curator APIs through an Apache Thrift interface. Through it, any language supported by Apache Thrift can use Curator's functionality.
The service discovery part of Curator is straightforward, as it provides direct APIs for it. A service registers with ZooKeeper through the Curator API, supplying its name and the URL at which it can be reached. The service then has to keep polling Curator RPC at the required interval to signal that it is still available; otherwise, Curator RPC deletes the service entry from ZooKeeper.
To discover a service, a client that wants to use it calls the Curator RPC Thrift API to get the list of available instances registered under a given service name.
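The register, poll, and discover cycle described above can be sketched without a running ZooKeeper. The snippet below is only an illustration under assumptions: `InMemoryRegistry`, its method names, and the TTL value are hypothetical stand-ins for what Curator RPC and ZooKeeper do, not part of any Curator API.

```python
import time
from dataclasses import dataclass


@dataclass
class Instance:
    service_name: str
    url: str
    last_seen: float


class InMemoryRegistry:
    """Hypothetical stand-in for ZooKeeper behind Curator RPC (not a real Curator API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._instances = {}  # (service_name, url) -> Instance

    def register(self, service_name, url):
        """Record a service instance under its name and URL."""
        self._instances[(service_name, url)] = Instance(service_name, url, self.clock())

    def heartbeat(self, service_name, url):
        """Refresh an instance; returns False if it is unknown or already expired."""
        self._evict_expired()
        inst = self._instances.get((service_name, url))
        if inst is None:
            return False
        inst.last_seen = self.clock()
        return True

    def discover(self, service_name):
        """Return the URLs of all live instances registered under service_name."""
        self._evict_expired()
        return sorted(
            i.url for i in self._instances.values() if i.service_name == service_name
        )

    def _evict_expired(self):
        # Drop every instance that has not polled within the TTL.
        now = self.clock()
        expired = [
            key for key, inst in self._instances.items()
            if now - inst.last_seen > self.ttl_seconds
        ]
        for key in expired:
            del self._instances[key]
```

A service would call `register` once and `heartbeat` on a timer, while a client calls `discover` with the service name. An instance that stops polling disappears from the results once the TTL has passed.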
Apache Curator doesn't support the load balancing part out of the box. Therefore, we need to use the features provided in the service discovery API of Apache Curator to implement the load balancing part.
This is where the advantages and pitfalls of using Apache Curator for load balancing come into play. Along with the registration of a service in ZooKeeper, Curator allows you to store an additional field that can be used for any purpose. Using this field, each service instance can, let's say, periodically publish the processor usage of the machine on which it is running. A client that needs the service can then pick the instance with the lowest processor usage during service discovery.
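A minimal sketch of the client-side selection step follows. It assumes the additional field arrives as a string holding the CPU usage in percent; `ServiceInstance` and `pick_least_loaded` are illustrative names and not Curator types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    url: str
    payload: str  # the free-form extra field; assumed here to hold CPU usage in percent


def pick_least_loaded(instances):
    """Return the instance whose payload reports the lowest CPU usage."""
    usable = []
    for inst in instances:
        try:
            usable.append((float(inst.payload), inst))
        except ValueError:
            continue  # skip instances that have not published a valid metric yet
    if not usable:
        raise LookupError("no instance with a usable metric")
    return min(usable, key=lambda pair: pair[0])[1]
```

Instances with an empty or malformed field are skipped rather than treated as idle, so a freshly registered service that has not yet reported a value does not attract all the traffic.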
Instead of processor usage, technically any metric of our choice can be used for load balancing. This gives enormous flexibility over the metric used, i.e. different service types can be balanced on different metrics.
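One way to picture per-service metrics is to let each instance publish a small JSON document in the extra field, and to keep a table that says which metric each service type balances on. Everything in this sketch is an assumption made for illustration: the service names, the metric keys, the JSON encoding, and the rule that a lower value wins.

```python
import json

# Hypothetical mapping: which metric each service type is balanced on.
METRIC_FOR_SERVICE = {
    "image-resizer": "cpu_percent",
    "report-builder": "queue_length",
}


def choose(service_name, instances):
    """Pick a URL from (url, payload_json) pairs; the lowest metric value wins."""
    metric = METRIC_FOR_SERVICE[service_name]
    scored = []
    for url, payload_json in instances:
        metrics = json.loads(payload_json)
        if metric in metrics:
            scored.append((metrics[metric], url))
    if not scored:
        raise LookupError(f"no instance of {service_name!r} reports {metric!r}")
    # Tuples compare by metric value first, then by URL to break ties.
    return min(scored)[1]
```

With the same two instances, a CPU-bound service and a queue-bound service can end up routed to different hosts, which is the flexibility the paragraph above describes.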
The disadvantage of this approach is that it is up to the service to fetch this information and update it in ZooKeeper. This means that a portion of the load balancing rests on the shoulders of the services, which is not an ideal solution. This could be alleviated by building a common metric-fetching component into Curator RPC. Unfortunately, due to time constraints, that part won't be implemented in this implementation.
The primary advantage of this approach is flexibility, i.e. it allows you to load balance on whatever parameters you choose, using different approaches for different service types.
Even though load balancing with Apache Curator gives you immense flexibility, it also has several disadvantages. The first one is that the approach I have used requires at least two requests to access a microservice: one to discover an instance and one to call it. This can be avoided by routing the requests directly through the load balancer.
The biggest problem is that it requires code changes in the microservices, both to fetch the chosen load-balancing metric and to register the service. Even this problem can be alleviated by building an agent that does all of this and runs separately on the machine where the service resides. That, in turn, creates the overhead of running the agent itself.
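The agent idea could look roughly like the loop below. This is a sketch under assumptions and not the implementation from the repository: `sample_metric` and `publish` are hypothetical hooks that a real agent would wire to a system metric reader and to the registry update call.

```python
import time


def run_agent(sample_metric, publish, interval_seconds, iterations, sleep=time.sleep):
    """Sample a metric and publish it on a fixed interval.

    sample_metric: callable returning the current metric value (hypothetical hook)
    publish: callable that writes the value to the registry (hypothetical hook)
    iterations: number of samples to publish before returning
    """
    for i in range(iterations):
        publish(sample_metric())
        if i < iterations - 1:
            sleep(interval_seconds)  # wait before taking the next sample
```

Because the agent only needs these two callables, the service code itself stays unchanged. The cost is one more process to deploy and monitor next to each service.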
Everything considered, using ZooKeeper through Apache Curator is not the best approach for either load balancing or service discovery.
Source Code: Load Balancing and Service Discovery using Apache Curator Github
Wiki: Github Wiki