Site Reliability Engineer - Data Operations

Posted 11 December 2018 by Sudhindra MG Rao (@sudhindramgrao1)

acceldata.io https://hasjob.co/nlogn.io/zzb5j, Bangalore · acceldata.io · Full-time employment · Programming

Responsibilities

Our site reliability engineers work on improving the availability, scalability, performance, and reliability of production services.
You will use your expertise to improve the reliability and performance of Hadoop clusters and data management services.
We work with open-source technologies and get involved with the SRE and Hadoop communities.
You will test, monitor, administer, and operate multiple clusters across data centers, working primarily in Python and Java.
Troubleshoot issues across the entire stack - hardware, software, application, and network.
Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences.
Must demonstrate exceptional troubleshooting and strong architectural skills, and describe them clearly and effectively both verbally and in writing.

Requirements

Customer-focused, self-driven, and motivated, with a strong work ethic and a passion for problem-solving.
5+ years of experience managing services in a distributed, Internet-scale, Unix and public cloud environment.
Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
Experience with HDFS, YARN, and related Hadoop technologies is a must.
Willingness to participate in an on-call rotation, and working knowledge of industry best practices for information security.
BS or MS degree in Computer Science or Engineering, or equivalent experience.

It is NOT OK for recruiters, HR consultants, and other intermediaries to contact this employer.