Key Responsibilities
• The Monitoring & Observability Platform team is a global team ensuring the design, development, delivery & support of the bank’s central monitoring and observability services for all TTO teams (technology domains).
• As the Lead, Cloud & Container, Observability, Central Platform Development, you will play a crucial role in ensuring the stability and reliability of our applications and platform integrations and in applying machine learning to them, enabling our organization to deliver predictive observability services to internal stakeholders in line with the Enterprise SDLC (eSDLC) framework and guidelines.
• Interpret the Group’s technical and security (ICS) control requirements to identify potential risks and key issues, and put in place appropriate controls and measures to mitigate or minimize risk to the central monitoring & observability platform delivery.
Qualifications
• Our ideal candidate should have at least 8 years of overall IT experience.
• Bachelor’s degree in Computer Science or Information Systems, or equivalent applicable experience.
• Proven experience (4 years) working as a Container Observability, Cloud Observability and OpenTelemetry Lead, Data Transformation Lead or in a similar role, with a strong focus on applying OpenTelemetry techniques to real-world problems.
• Design and develop AI-powered solutions for IT operations (AIOps) using machine learning techniques and appropriately chosen models.
• Must have experience mentoring a team in bringing structure to the book of work and in maintaining an organised product backlog.
• Participation in weekend releases and overnight major incidents, to help teams enable the Observability Predictive Capability, is a must, as this is a key capability for the role.
• Hands-on experience with observability backends (e.g., Grafana Tempo, Grafana Loki, Grafana Mimir, VictoriaMetrics) and proficiency in languages and data tools such as Python, Hive and Spark.
• Must have working experience with Grizzly and with observability using models.
• Enable the responsible use of AI, applying AI and ML technologies to identify historical trends, perform dynamic baselining and drive root cause analysis actions.
• Proven ability to address problems through risk management and contingency planning.
• Knowledge of the software development life cycle across the analysis, development and testing phases.