Symptoms:
- A warning or critical "Ambari Server Performance" alert is raised in Ambari.
- Ambari is slow or behaves erratically.
- Scaling up the cluster and/or adding edge nodes fails.
- Ambari and YARN keep logging users out.
Debug:
- Check whether jobs are being submitted through the Informatica Blaze engine.
- Check whether the Informatica Blaze engine was shut down abruptly.
- Confirm that the App Timeline Server (ATS) is throttled: check for many "Cannot get a connection, pool error Timeout waiting for idle object" warnings in /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-*.log on both head nodes.
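The ATS log check in the Debug step can be automated with a small script that counts the pool-timeout warnings in each timeline server log. This is a minimal sketch, not an Ambari or HDInsight tool. The helper names (`count_pool_warnings`, `scan_logs`) are hypothetical, and the script assumes the default log path given in the Debug step and read access to those files (for example, running it with sudo on each head node). The Debug step does not define how many warnings count as "many", so compare the counts against your cluster's normal baseline.

```python
import glob
import sys

# Warning text quoted in the Debug step.
POOL_WARNING = "Cannot get a connection, pool error Timeout waiting for idle object"

# Default ATS log location on a head node (assumed; pass another glob as argv[1]).
DEFAULT_GLOB = "/var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-*.log"


def count_pool_warnings(lines):
    """Count the lines that contain the ATS connection-pool warning."""
    return sum(1 for line in lines if POOL_WARNING in line)


def scan_logs(pattern=DEFAULT_GLOB):
    """Return {log path: warning count} for every file matching the glob pattern."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log contains undecodable bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            counts[path] = count_pool_warnings(f)
    return counts


if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_GLOB
    results = scan_logs(pattern)
    if not results:
        print(f"No log files matched {pattern}")
    for path, count in results.items():
        print(f"{path}: {count} pool timeout warnings")
```

Run it on each head node separately, because each head node keeps its own copy of the timeline server logs.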
Cause:
This issue is caused by a bug in the Informatica Blaze engine that continuously polls the ATS database for ATS entities. Regardless of the database SKU, DTU utilization reaches 100%, which degrades every other operation that depends on the database.
Solution:
Informatica has a hotfix (EBF-14471) for this issue. Customers facing this problem can contact the Informatica support team for more details on the bug and its fix in Blaze.