The Storm topology is not processing any data from Kafka even though it was working before, OR topologies go stale after a few hours of processing.
A Storm topology is composed of multiple spouts and bolts that form a processing pipeline. A bottleneck in any one of them can stall the entire pipeline and leave it unable to process messages.
In the incidents we have seen with Storm topologies, the issue has usually not been with the Apache Storm project itself; rather, the resolution has required tuning on the customer's part to get the best out of their Storm cluster.
Here is a very comprehensive article on Storm Tuning: https://community.hortonworks.com/articles/62852/feed-the-hungry-squirrel-series-storm-topology-tun.html
Open the Storm UI and navigate to the topology that is not processing messages under Topology Summary.
Topology Summary (Uptime) - The Uptime column shows how long the topology has been running. If it is only a few minutes, the customer has just redeployed the topology and it will take some time to catch up, so wait an hour or so before making any judgment about the state of the cluster and what is happening.
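The same information is available from the Storm UI REST API, which is handy when you only have shell access. A minimal sketch, assuming the default HDP Storm UI port 8744 and a placeholder hostname:

# List all topologies with their id, status (ACTIVE, INACTIVE, ...) and uptime
curl -s http://ui-host.example.com:8744/api/v1/topology/summary
# Per-bolt stats, including capacity, for a topology id taken from the list above
curl -s http://ui-host.example.com:8744/api/v1/topology/[topology id]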
If the Capacity of a bolt is higher than 1 - Ask the customer to look at code optimizations to reduce it - Suggest increasing parallelism so the problematic bolt gets more executors (see the sketch below).
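If the customer wants to add executors without changing code, the storm rebalance command can do it, as long as the new executor count does not exceed the number of tasks configured when the topology was submitted. A sketch using a placeholder topology name and a hypothetical bolt id decoder-bolt; the real component ids are visible on the topology page in the Storm UI:

cd /usr/hdp/[VERSION]/storm
# Pause for 10 seconds, then redistribute the topology across 4 workers
# with 8 executors for the hypothetical bolt "decoder-bolt"
./bin/storm rebalance cy17-binary-decoder -w 10 -n 4 -e decoder-bolt=8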
SSH into a worker node to run these commands.
Here are the steps to turn DEBUG logging ON and OFF.
Please replace [VERSION], [topology name], [logger name], [LEVEL], and [TIMEOUT] with real values corresponding to the customer's cluster.
cd /usr/hdp/[VERSION]/storm
./bin/storm set_log_level [topology name] -l [logger name]=[LEVEL]:[TIMEOUT]
Example:
cd /usr/hdp/2.6.2.2-5/storm
# Turn DEBUG on for the ROOT logger of topology cy17-binary-decoder for 30 seconds
./bin/storm set_log_level cy17-binary-decoder -l ROOT=DEBUG:30
# Turn DEBUG off again by setting the ROOT logger back to INFO
./bin/storm set_log_level cy17-binary-decoder -l ROOT=INFO:30
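With DEBUG on, tail the worker log for the topology to see what the spouts and bolts are doing. A sketch, assuming the Ambari default log directory /var/log/storm; the exact topology id (it carries a numeric suffix) and worker port will differ per cluster:

# Find the topology id under the workers-artifacts directory
ls /var/log/storm/workers-artifacts/
# Tail the log of the worker running on port 6700 (hypothetical topology id shown)
tail -f /var/log/storm/workers-artifacts/cy17-binary-decoder-1-1507000000/6700/worker.log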
A redeploy of the topology usually mitigates the problem. However, you should strongly consider tuning the topology so that these issues do not recur.
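For reference, a redeploy is a kill followed by a fresh submit. A minimal sketch with placeholder values; the jar path, main class, and arguments are whatever the customer originally submitted with:

cd /usr/hdp/[VERSION]/storm
# Kill the topology, giving in-flight tuples 30 seconds to drain
./bin/storm kill cy17-binary-decoder -w 30
# Resubmit with the customer's original jar, main class, and arguments
./bin/storm jar /path/to/topology.jar com.example.DecoderTopology cy17-binary-decoder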