While working with Kafka consumer, we came across a few issues:
  • Continues rebalancing for a Kafka topic.
  • The duplicate message is being read by different consumers belonging to the same consumer group.
  • Consumer Group command is returning no Active Consumers, even if 5-different consumers are consuming messages.
  • Consumers stop consuming messages intermittently and we need to restart the service to bring message consumption back to normal.

Root-Cause

During investigation I came across a solution, that increase in session timeout can resolve this problem, but unfortunately increasing session timeout did not help. Further digging in the library code base unwrapped few assumptions, which intern became multiple areas to be improved in our code-base. I am writing this blog keeping in mind that this can help someone else as well.
  • Compatibility issue between the Sarama and Sarama Cluster versions
  • Handling of Partition channel exposed by the Sarama Cluster
  • Handling of Error channel exposed by the Sarama Cluster
  • Handling of Notification channel exposed by Sarama Cluster
Let's discuss all of these one by one and understand:

Compatibility Issue Between the Sarama and Sarama Cluster Versions

Latest version of Sarama Cluster does not work with the latest version of the Sarama. Sarama Cluster is deprecated since long and there is no further development to support newer version of Sarama.
Solution
Use latest Sarama Cluster with the Sarama version 1.0.17.

Handling of Partition Channel Exposed by The Sarama Cluster

Sarama Cluster send subscribed partition details on this blocking channel while subscribing to a partition in Kafka cluster. If we block this subscription to other partition does not happen on the Kafka with-in the specified session timeout as a result further rebalancing triggered or consumer won't be able to participate in the further rebalancing events and Kafka moves this out of Group.
Solution
Release this partition channel as soon as possible, so that we can subscribe to all the partitions with-in the specified session timeout.

Handling of Error Channel Exposed by The Sarama Cluster

Sarama Cluster send subscribed partition details on this blocking channel is there is any error. If we block this error channel on the Kafka with-in the specified session timeout as a result further rebalancing triggered.
Solution
Release this error channel as soon as possible, so that we can multiple errors with-in the specified session timeout.

Handling of Notification Channel Exposed by Sarama Cluster

Sarama Cluster send subscribed partition details on this blocking channel upon rebalancing. If we block this notification channel on the Kafka with-in the specified session timeout as a result consumer won't be able to participate in the further rebalancing events and Kafka moves this out of Group.
Solution
Release this notification channel as soon as possible, so that we can complete rebalancing with-in the specified session timeout and next rebalancing event can be consumed by consumer.
Sarama Consumer Implementation