HDI 3.3 for Windows initially ran a version of Storm that did not have HA support for Nimbus. To provide HA, we deployed a custom component that copies topology artifacts from the active nimbus to the other nimbus every 5 minutes. In very rare cases, where a nimbus node reboots shortly after a new topology is submitted (and before the scheduled copy task can complete), both nimbus nodes may end up without the topology code locally under C:\hdistorm\stormdist. The symptoms are as follows:
2017-07-29 17:40:52.302 b.s.zookeeper [INFO] headnode0.*****.a5.internal.chinacloudapp.cn gained leadership, checking if it has all the topology code locally.
2017-07-29 17:40:52.309 b.s.zookeeper [INFO] active-topology-ids [TOPOLOGY-NAME-1498805833,TOPOLOGY-2-NAME-1500962748] local-topology-ids [TOPOLOGY-3-NAME-1498553914,TOPOLOGY-4-NAME-1498614910] diff-topology [TOPOLOGY-3-NAME-1498805833,TOPOLOGY-NAME-4-1500962748]
2017-07-29 17:40:52.309 b.s.zookeeper [INFO] code for all active topologies not available locally, giving up leadership.
The active topology IDs reported in the log correspond to the entries under /storm/storms in ZooKeeper:
[zk: localhost:2181(CONNECTED) 5] get /storm/storms
[TOPOLOGY-NAME-1498805833, TOPOLOGY-2-NAME-1500962748]
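The check in the log above compares the active topology IDs with the IDs that have code on the local disk. As a minimal sketch, assuming the log line keeps the format shown above, the following Python helper extracts the active topologies that have no local code. The function names and the sample IDs are hypothetical and are not part of Storm or HDInsight.

```python
import re

# Matches the three bracketed lists in the "active-topology-ids ..." log line.
LINE_RE = re.compile(
    r"active-topology-ids \[(?P<active>[^\]]*)\]\s+"
    r"local-topology-ids \[(?P<local>[^\]]*)\]\s+"
    r"diff-topology \[(?P<diff>[^\]]*)\]"
)


def split_ids(field):
    """Split a comma-separated ID list; an empty field yields an empty list."""
    return [item.strip() for item in field.split(",") if item.strip()]


def missing_topologies(log_line):
    """Return active topology IDs with no local code, or None if the line does not match."""
    match = LINE_RE.search(log_line)
    if match is None:
        return None
    local = set(split_ids(match.group("local")))
    return [tid for tid in split_ids(match.group("active")) if tid not in local]


# Hypothetical sample line in the same format as the nimbus log above.
sample = (
    "2017-07-29 17:40:52.309 b.s.zookeeper [INFO] "
    "active-topology-ids [topo-a-1,topo-b-2] "
    "local-topology-ids [topo-c-3] "
    "diff-topology [topo-a-1,topo-b-2]"
)
print(missing_topologies(sample))  # ['topo-a-1', 'topo-b-2']
```

Any ID returned here is a topology whose artifacts need to be restored on the nimbus nodes.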
To mitigate this, copy the artifacts for each missing topology from the stormdist folder on one of the supervisor nodes to both nimbus nodes. Once the code is available locally, the nimbus log shows it accepting leadership:
2017-08-01 02:10:26.040 b.s.zookeeper [INFO] headnode0.cmi-cdp-storm-cd-prd.a5.internal.chinacloudapp.cn gained leadership, checking if it has all the topology code locally.
2017-08-01 02:10:26.043 b.s.zookeeper [INFO] active-topology-ids [TOPOLOGY-NAME-1498805833,TOPOLOGY-NAME-1500962748] local-topology-ids [TOPOLOGY-NAME-1498805833,TOPOLOGY-NAME-1498553914,TOPOLOGY-NAME-6-1498614910,TOPOLOGY-NAME-1500962748] diff-topology []
2017-08-01 02:10:26.043 b.s.zookeeper [INFO] Accepting leadership, all active topology found localy.
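The copy step in the mitigation can be scripted. The following is only a sketch, assuming each topology's artifacts sit in a subfolder of stormdist named after the topology ID, and that the supervisor and nimbus stormdist folders are reachable from the machine running the script (for example through file shares). The function name and all paths passed to it are hypothetical; verify the actual locations on your cluster before copying anything.

```python
import shutil
from pathlib import Path


def copy_topology_code(topology_id, supervisor_stormdist, nimbus_stormdists):
    """Copy one topology's folder from a supervisor stormdist to each nimbus stormdist.

    Assumption: artifacts live in <stormdist>/<topology_id>/ on both node types.
    Returns the list of target folders that were created.
    """
    source = Path(supervisor_stormdist) / topology_id
    if not source.is_dir():
        raise FileNotFoundError(
            f"no artifacts for {topology_id} under {supervisor_stormdist}"
        )
    copied = []
    for target_root in nimbus_stormdists:
        target = Path(target_root) / topology_id
        if target.exists():
            continue  # leave an existing copy untouched
        shutil.copytree(source, target)
        copied.append(target)
    return copied


# Example call (placeholders only, replace with real locations):
# copy_topology_code(
#     "<topology-id>",
#     r"<supervisor stormdist folder>",
#     [r"<nimbus 0 stormdist folder>", r"<nimbus 1 stormdist folder>"],
# )
```

The helper skips any nimbus node that already has a folder for the topology, so it does not overwrite a copy that is already present.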