South Asia Data Engineer

OCCRP · Remote, located in the South Asia region · Freelance or consulting · Programming

Application Deadline: April 4th, 2021

The Organized Crime and Corruption Reporting Project (OCCRP) is a growing, global nonprofit media organization that is reinventing investigative journalism for the public good. By developing and equipping a global network of investigative journalists and publishing their stories, we expose crime and corruption so the public can hold power to account. We see a future where organized crime and corruption are drastically reduced and democracy is strengthened. Our global team includes editors, researchers, data engineers, security specialists, administrators, technologists, and strategists, each with areas of in-depth expertise.

Position Overview

The Data Desk helps journalists get and make sense of the data they need to do their reporting. You will join a team of data specialists who use a variety of techniques to acquire, munge and analyse the data needed for follow-the-money investigations. As the OCCRP network expands into South Asia, we are seeking an experienced data engineer to support our journalists, partners and projects in the region.

As a Data Engineer on our Data Desk, you will scrape company registries, access APIs, ingest leaks and manipulate SQL databases. A large part of the role is mapping data into Aleph, our data platform, which houses thousands of datasets (leaks, company and property registries, and public records) that our team has collected to help our reporters follow the money.

You will clean and process the data, making it available and searchable on Aleph. You will help journalists to use Aleph. You will work with journalists from within OCCRP and from our wider network of member centres to understand and make sense of the data. This could mean running cross-references to find overlaps with other data, or doing statistical analysis and calculations to give reporters the information they need. This work is invaluable to our reporting: you will be an essential player in finding leads and supporting stories that hold those in power to account.

You will be trained to work in-depth with Aleph and the data team’s ecosystem of tooling. Security is of the utmost importance given the sensitivity of some of the data we work with, and you will be required to adhere to our strict security standards.

Job Description

The Data Desk is responsible for helping OCCRP’s staff and member centres work with data in their investigations. Your tasks, as directed from time to time, may include:

  • Supporting journalists to work with data in their investigations, including handling communications back and forth in easy-to-understand language, tracking queries and documenting your work.
  • Building data extraction and cleaning solutions for specific datasets using web crawlers, and cleaning and importing leaked datasets.
  • Assisting in maintaining and improving existing crawlers and datasets as needed.
  • Analysing and summarising new leaks, finding angles for stories, exploring and communicating key connections and leads, suggesting other relevant resources.
  • Liaising with the Aleph product team to coordinate on investigative needs and with the Research Team on overlapping research and stories.
  • Exploring improved ways for journalists to understand and interface with OCCRP’s diverse data sources.

Person Description

Essential Skills

  • Adept at basic statistical methods
  • Passion for writing clean and maintainable code in Python and documenting work
  • Comfortable working with SQL databases (we use PostgreSQL)
  • Comfortable with version control (Git)
  • Skilled with Linux: bash, basic networking
  • Expert in web scraping: good understanding of HTTP, HTML, CSS and JavaScript
  • Strong grasp of operational security
  • Ability to communicate effectively and express ideas in a clear and concise manner. Proficiency in English and in at least one regional South Asian language.
  • Ability to manage expectations in a fluid environment
  • Ability to work well with people from a range of countries, languages and backgrounds, online and in person.

Skills Desirable

  • Familiar with Docker and Kubernetes
  • Practiced in Elasticsearch, Kibana, Logstash
  • Competent with Google Cloud

Experience Essential

  • At least 5 years working on data-centric projects
  • At least 3 years of web scraping experience

Experience Desirable

  • Experience managing data projects with a diverse team
  • Experience teaching data skills to non-technical people
  • Experience working on data journalism projects or with journalists on investigations or with activists/NGOs on data-driven projects
  • Experience managing data projects using GKE and Google Cloud
  • Familiarity with finding and working with public records and registers in the region (specifically India, Nepal, Bhutan, Sri Lanka, Maldives, Pakistan and Bangladesh)

As an equal opportunity employer, OCCRP values having a diverse workforce and continuously strives to maintain an inclusive and equitable workplace. We offer competitive compensation and benefits and encourage people with a diverse range of backgrounds to apply. We do not discriminate against any person based upon race, religion, color, national origin, sex, medical conditions, family status, sexual orientation, gender identity, gender expression, age, disability, genetic information, or any other legally protected characteristics. If you are a qualified applicant requiring assistance or an accommodation to complete any step of the application process, please contact hr[at]

It is NOT OK for recruiters, HR consultants, and other intermediaries to contact this employer