Site Reliability Engineer (Middle/Senior) ID38916
AgileEngine
About
- AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
- WHY JOIN US
- If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
- WHAT YOU WILL DO
- - Shift: Monday – Thursday 8AM – 7PM PST (11AM – 10PM EST) with rotating on-call;
- - On-call shifts: every 6 weeks, one week as primary responder followed by one week as secondary;
- - Manage alerts daily, check systems, and escalate issues as needed;
- - Be part of a team that provides 24×7 on-call support for critical SaaS events;
- - Be available in case of emergencies when team members are not available or need help;
- - Document issues and remediation steps;
- - Proactively create appropriate monitors in the EKS/K8s ecosystem;
- - Deploy to EKS/K8s cluster using Terraform and Helm;
- - Learn and maintain existing infrastructure running under Docker Swarm;
- - Improve existing infrastructure health by implementing checks and scripts to correct known issues;
- - Maintain and develop deployment code;
- - Automate manual tasks;
- - Implement/integrate new technologies in our Cloud Infrastructure;
- - Collaborate with other teams and departments to provide the highest level of support and assistance;
- - Keep the customer front of mind when planning deployments and updates, and consider the impact on them before making changes;
- - Work closely with the Support, Customer Success, Migration, and Professional Services teams on solutions that deliver a best-in-class SaaS service to our customers;
- - Perform RCA and take necessary corrective actions to prevent the recurrence of issues;
- - Create and assign alert-related actions to the appropriate team after the investigation;
- - Handle support requests for environment-specific actions;
- - Identify and provide automation requirements to improve RCA.
- MUST HAVES
- - 2+ years of professional experience;
- - Experience working with Datadog;
- - Hands-on experience as an AWS Cloud Engineer;
- - Working knowledge of EKS/Terraform/Helm;
- - Working experience with Docker and Docker Swarm;
- - Good understanding of AWS IAM roles and policies;
- - Experience logging and monitoring AWS resources using CloudWatch Logs;
- - Experience working in a Linux environment;
- - Proficient in Bash and/or Python scripting;
- - A strong understanding of web technologies such as REST APIs;
- - Working experience with monitoring solutions such as Grafana and Prometheus;
- - Excellent oral and written communication skills;
- - Customer-facing communication skills to explain issues and RCAs to customers effectively;
- - Experience in Product/Application Support for SaaS-based products;
- - Understanding of APIs, Databases, Systems Architecture, and Design;
- - Experience designing, implementing, and operating in a DevSecOps environment;
- - Excellent communication skills, both written and verbal;
- - Ability to work independently as well as within a collaborative environment;
- - A technical aptitude with the desire to learn new and evolving technologies;
- - Upper-Intermediate English level.
- NICE TO HAVES
- - Experience with GCP or Azure;
- - Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty.
- PERKS AND BENEFITS
- - Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- - Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- - A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
- - Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whichever makes you happiest and most productive.




