Senior Cloud Engineer – Observability & Performance Engineering
Location: Washington, DC 20549
Work Arrangement: Fully Onsite
Clearance Requirement: Ability to obtain and maintain Public Trust

Position Overview
We are seeking a highly experienced Cloud Engineer (Observability) to lead the engineering, optimization, and operational maturity of enterprise observability platforms across hybrid cloud and containerized environments.
This role is ideal for a hands-on engineer with deep expertise in Datadog, distributed tracing, APM, cloud monitoring, performance engineering, and site reliability practices. The successful candidate will partner with infrastructure, cloud, platform, and application teams to improve operational visibility, reduce alert fatigue, accelerate incident resolution, and drive data-informed operational decisions.

Key Responsibilities
Observability Platform Engineering
<ul>
<li>Engineer and operate enterprise observability solutions including:
<ul>
<li>Metrics</li>
<li>Logs</li>
<li>Distributed tracing</li>
<li>APM</li>
<li>Real User Monitoring (RUM)</li>
<li>Synthetic monitoring</li>
<li>Network monitoring</li>
</ul>
</li>
<li>Build and optimize dashboards, alerts, SLOs, and SLIs</li>
<li>Implement OpenTelemetry and language-specific instrumentation</li>
<li>Integrate observability tooling with ServiceNow, CI/CD pipelines, and incident management workflows</li>
<li>Establish and maintain telemetry tagging standards and governance</li>
</ul>
Cloud & Container Monitoring
<ul>
<li>Design monitoring solutions for Azure and AWS workloads</li>
<li>Implement observability for:
<ul>
<li>Serverless services</li>
<li>Managed databases</li>
<li>Networking</li>
<li>Identity services</li>
<li>Cloud-native platforms</li>
</ul>
</li>
<li>Support Kubernetes and OpenShift monitoring including clusters, nodes, workloads, and service mesh environments</li>
<li>Develop reusable observability modules using Infrastructure-as-Code</li>
</ul>
Performance Engineering & Reliability
<ul>
<li>Lead investigation and remediation of performance, latency, reliability, and capacity issues</li>
<li>Utilize APM, profiling, distributed tracing, and database analytics to identify bottlenecks</li>
<li>Define trace-based alerting and deployment correlation strategies</li>
<li>Support major incident response activities and root cause analysis efforts</li>
</ul>
Capacity Planning & Operational Excellence
<ul>
<li>Analyze telemetry and capacity trends to identify risks and opportunities</li>
<li>Develop reporting and dashboards for leadership and engineering teams</li>
<li>Improve alert quality, monitoring coverage, and operational maturity</li>
<li>Support enterprise SLA, KPI, and availability objectives</li>
</ul>

Required Qualifications
Bachelor's degree in Information Technology, Computer Science, Engineering, or a related field
8+ years of experience in infrastructure, platform, cloud, or operations engineering
5+ years of experience focused on:
<ul>
<li>Observability</li>
<li>Site Reliability Engineering (SRE)</li>
<li>Performance Engineering</li>
<li>Application Performance Monitoring (APM)</li>
</ul>
Experience administering and optimizing observability platforms such as:
<ul>
<li>Datadog</li>
<li>Dynatrace</li>
<li>New Relic</li>
<li>Splunk Observability</li>
<li>Grafana/Prometheus</li>
</ul>
Strong experience with:
<ul>
<li>OpenTelemetry</li>
<li>Distributed tracing</li>
<li>Performance tuning</li>
<li>APM engineering</li>
<li>Cloud-native monitoring</li>
</ul>
Experience supporting Azure, AWS, and containerized platforms
Proven ability to troubleshoot complex performance and reliability issues
Ability to obtain and maintain Public Trust clearance

Preferred Qualifications
Experience supporting federal or regulated environments
Experience with:
<ul>
<li>Kubernetes</li>
<li>OpenShift</li>
<li>Terraform</li>
<li>ARM</li>
<li>Bicep</li>
</ul>
Strong understanding of:
<ul>
<li>SLO/SLI engineering</li>
<li>Incident management</li>
<li>Capacity planning</li>
<li>Operational analytics</li>
</ul>
Experience integrating observability platforms with ServiceNow and CI/CD tooling

<h3 class="rh-display-3--rich-text">Technology Doesn't Change the World, People Do.®</h3> Robert Half is the world’s first and largest specialized talent solutions firm that connects highly qualified job seekers to opportunities at great companies. We offer contract, temporary and permanent placement solutions for finance and accounting, technology, marketing and creative, legal, and administrative and customer support roles. Robert Half works to put you in the best position to succeed. We provide access to top jobs, competitive compensation and benefits, and free online training. Stay on top of every opportunity - whenever you choose - even on the go. <a href="https://www.roberthalf.com/us/en/mobile-app" target="_blank">Download the Robert Half app</a> and get 1-tap apply, notifications of AI-matched jobs, and much more. All applicants applying for U.S. job openings must be legally authorized to work in the United States. Benefits are available to contract/temporary professionals, including medical, vision, dental, and life and disability insurance. Hired contract/temporary professionals are also eligible to enroll in our company 401(k) plan. Visit <a href="https://roberthalf.gobenefits.net/" target="_blank">roberthalf.gobenefits.net</a> for more information. © 2025 Robert Half. An Equal Opportunity Employer. M/F/Disability/Veterans. By clicking “Apply Now,” you’re agreeing to Robert Half’s <a href="https://www.roberthalf.com/us/en/terms">Terms of Use</a> and <a href="https://www.roberthalf.com/us/en/privacy">Privacy Notice</a>.

Washington, DC
onsite
Temporary / Contract
55 - 60 USD / Hourly
2026-07-13T00:00:00Z
