Oracle Senior Principal Service Reliability Engineer in Sandy, Utah
Analyze, design, develop, troubleshoot and debug software programs for commercial or end-user applications. Write code, complete programming and perform testing and debugging of applications.
As a member of the software engineering division, you will specify, design and implement major changes to existing software architecture. Create new architecture for a moderate size product or a portion of a major product. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to ensure consistency, testability and portability across products in general.
Provide leadership and expertise in the development of new products/services/processes, frequently operating at the leading edge of technology. Recommend and justify major changes to existing products/services/processes. BS or MS degree or equivalent experience relevant to functional area. 8 or more years of software engineering or related experience.
Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veterans status, age, or any other characteristic protected by law.
The Oracle ERP Cloud Operations team is looking for passionate, innovative, high-caliber, team-oriented superstars who want to be a major part of a transformative revolution in the development of modern, cloud-based business applications. As part of the market-leading ERP Cloud, Oracle ERP Cloud Operations offers a broad suite of modules and capabilities designed to empower the development organization with world-class service reliability engineering disciplines and deliver customer success with streamlined processes, increased productivity, and improved business decisions.
Oracle, the world leader in Enterprise Cloud, is hiring the best and brightest technologists in the industry as we continue to add customer-centric, world-class, leading-edge, secure, hyper-scale solutions throughout all levels of the cloud stack. Oracle’s cloud ecosystem is the only complete business cloud platform on the planet, with market-leading and business-transforming solutions spanning SaaS, DaaS, PaaS and IaaS. Oracle’s cloud applications, such as Enterprise Resource Management, Customer Experience Management, Human Capital Management and Supply Chain Management, are used by thousands of customers across the globe and are the broadest and most innovative in the industry, providing businesses with adaptive intelligence, standardized business processes and competitive advantages at low cost.
Key Tasks and Responsibilities
Service Ownership– You will be part of the SRE team, whose mission is the shared full stack ownership of a collection of services, with our Service Development and Operations SRE partners.
Ownership Scope– You will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you own. In partnership with your Service Development and Operations SRE partners, you will have the responsibility to ensure that services are designed and delivered to be mission critical with focus on monitoring, telemetry, security, resiliency, scale and performance.
Service Requirements– You will provide direction and prioritization to service Product Management and Service Development teams to engineer and add premier SRE capabilities to the Oracle SaaS/ERP services.
Incident Response– You will be the primary author of technical content for both customer and internal communications used throughout the incident response process, e.g. postmortem/root cause analysis, end-to-end repair item definition, and fixes in production.
Prevention– Using data-driven incident findings, you will work on solutions that ultimately prevent the incident or problem from recurring, and develop interim solutions that resolve the problem more quickly if it does recur.
Evangelize and Educate– You will play a critical role in driving the cultural transformation to an SRE mindset within the Service Development organization. You will be responsible for evangelizing and educating Service Product Management and Service Development on the service-centric, full tech stack approach and principles of SRE, as well as the architectures and solutions used for Oracle SaaS/ERP services.
Service Performance– You will work with SaaS Operations and Product Development teams to triage performance issues (both reactive and proactive). You will work with central teams to define and drive monitoring tooling and process enhancements, including identification of service metrics to enhance performance issue triage, diagnostics and improvements.
Service Health Reviews– You will represent ERP Development in periodic cross-organizational service health reviews. You will help to identify patterns that influence service performance and/or reliability. You will lead efforts to eliminate process deficiencies and drive simplification into processes and procedures.
Automation– Our goal is to eliminate human intervention wherever possible. You will be responsible for driving automation into our monitoring and recovery processes, code delivery procedures and issue resolution processes.
Skills and Qualifications
Minimum of 10 years of software development experience and demonstrated knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, code reviews, source control, build and release processes, continuous deployment, and test suite development and maintenance.
Problem solving skills with abilities in analysis, problem identification and resolution.
Experience with enterprise system components, architecture and deployments.
Experience in deploying and running large-scale online systems built on cloud platforms such as Oracle Cloud, AWS, Azure, Google Cloud Platform and/or OpenStack.
Experience in performance analysis and tuning of enterprise applications.
Experience with monitoring and alerting using technologies like Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, Datadog, and/or PagerDuty.
Experience with Oracle Linux, Red Hat Linux, Ubuntu, CentOS, CoreOS, and/or Amazon Linux.
Experience in designing and building automated tools and solutions, including programming and data model design skills.
Hands-on with web protocols and Linux/Unix tools and architecture, from kernel to shell, file systems, and client-server protocols.
Experience with solutions for platform and application layer telemetry, monitoring, scalability, performance and reliability.
Working experience with systems and network administration, application security, DevOps and/or Site Reliability Engineering is highly preferred.
Excellent written and verbal technical communication skills with technical and non-technical peers, customers and, at times, executive leadership.
Proven success in contributing in a collaborative, team-oriented environment, with the ability to establish and nurture relationships at all levels.
BS in Computer Science or related field and 10 years relevant experience.
Job: Product Development
Title: Senior Principal Service Reliability Engineer
Location: United States
Requisition ID: 20000R92