Job Description
Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Incident & Request Manager (Non-Production Environments)
Location: Atlanta, GA / Bellevue, WA
Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Onsite
Job Description:
- The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance).
- Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement.
- The Incident & Request Manager directly manages a team of Incident Analysts and SREs, partners with DevOps teams to automate detection and response, and works closely with Environment and Change Managers to reduce the recurrence of issues.
Key Responsibilities:
Incident Management:
- Own the incident lifecycle: detection, triage, response, resolution, and closure.
- Act as the primary escalation point for project/product delivery teams during NPE incidents.
- Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
- Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
- Track and improve incident metrics and service levels (MTTR, MTTD, availability SLOs).
Request Management:
- Own request fulfilment for project/product delivery teams (e.g., access, entitlements, environment service requests).
- Standardize and automate common request types in collaboration with Intake and DevOps teams.
- Ensure requests are logged, prioritized, and fulfilled within SLA.
- Provide transparency to stakeholders on request status.
Team Leadership:
- Manage and mentor Incident Analysts and SREs.
- Ensure follow-the-sun coverage via offshore/onshore teams.
- Build a culture of blameless incident management, automation-first practices, and continuous learning.
Governance & RCA:
- Ensure all incidents have documented Root Cause Analysis (RCA).
- Track corrective and preventive actions, and feed them into Change and Environment management processes.
- Provide trend reporting and insights to leadership.
SRE & DevOps Alignment:
- Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
- Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.
Stakeholder Communication:
- Provide timely updates during incidents and whenever request fulfilment is delayed.
- Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
- Maintain trust with project/product delivery teams by ensuring transparent communication.
Required Skills & Experience:
- 8-10 years in Incident Management, Service Operations, or SRE leadership.
- Experience managing Incident Analysts and SRE teams.
- Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
- Deep understanding of ITIL Incident, Problem, and Request Management processes.
- Excellent crisis management, communication, and stakeholder engagement skills.
TekWissen Group is an equal opportunity employer supporting workforce diversity.