Site Reliability Engineer
· Integrate systems using a wide variety of protocols and data formats such as REST, SOAP, MQ, TCP/IP, JSON and others
· Design and build automated code deployment systems that simplify development and make our work more consistent and predictable. You'll do this by orchestrating environment deployment from the OS all the way through the application layers of a solution, using tools such as Kubernetes, Docker, SaltStack, Jenkins and many others
· Exhibit a deep understanding of server virtualization, networking and storage, ensuring that the solution scales and performs with high availability and uptime
· Create mechanisms and architectures that enable rapid recovery, repair and cleanup of faulty migrations, drawing on a good understanding of fault tolerance and failure domains
· Identify opportunities to deliver self-service capabilities for the most common infrastructure and application management tasks
· Build platform automation to create and manage clusters, and deployment automation for all CI/CD pipelines
· Provide deep, detailed monitoring and alerting capabilities at every layer of the application
· Mentor other engineers and technical analysts
· Plan sprints within your project team to keep yourself and the team moving forward
· Move fast, break things, and determine how to fix them
· Help development teams design, capacity-plan and deploy large-scale distributed systems
· Map and maintain dependencies, and understand the implications of service disruptions for overall system health
· Lead incident response to bring services back online quickly with minimal disruption
· Effectively and readily assess customer impact post-incident
Sthree US is acting as an Employment Agency in relation to this vacancy.