- Specifying resource limits for Kubernetes pods is really important. Had a faulty chart be rescheduled to all nodes in the cluster and eat 100% CPU and memory, effectively killing the whole cluster node by node.
- cert-manager will sometimes not recover if one of its pods is killed mid-work. Removing a Certificate resource and readding it will fix the issue.
- A source in Java Reactor using
groupBywill get stuck if number of groups exceeds