Responsibilities

  • Design and implement data pipelines on the Hadoop platform
  • Understand business requirements and solution designs to develop and implement solutions that adhere to big data architectural guidelines and address business needs
  • Fine-tune new and existing data pipelines
  • Schedule and maintain data pipelines
  • Drive optimization, testing and tooling to improve data quality
  • Assemble large, complex data sets that meet functional / non-functional business requirements
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, etc.
  • Build robust and scalable data infrastructure (both batch processing and real-time) to support the needs of internal and external users
  • Work with data scientists and the business analytics team to assist with data ingestion and data-related technical issues

Requirements

  • Bachelor’s degree in IT, Computer Science, Software Engineering, Business Analytics or equivalent.
  • Minimum of 4 years of experience in data warehousing / distributed systems such as Hadoop
  • Experience with relational SQL and NoSQL databases
  • Experience in building and optimizing ‘big data’ data pipelines, architectures and data sets
  • Strong proficiency in Scala or Python
  • Experience with ETL and/or data wrangling tools for big data environments
  • Ability to troubleshoot and resolve complex query performance issues on the Spark platform
  • Knowledgeable in structured and unstructured data design / modeling, data access, and data storage techniques
  • Experience working in a DevOps environment
  • Highly organized, self-motivated, proactive, and able to plan ahead
  • Ability to analyze and understand complex problems
  • Ability to explain technical information in business terms
  • Ability to communicate clearly and effectively, both verbally and in writing
  • Strong in user requirements gathering, maintenance, and support
  • Experience managing users and vendors
  • Familiar with Agile methodology