Responsibilities

  • Design and architect data storage solutions, including databases, data lakes, and data warehouses, using AWS services such as Amazon S3 and Amazon RDS, along with Databricks' Delta Lake.
  • Integrate Informatica IDMC for metadata management and data cataloging.
  • Create, manage, and optimize data pipelines for ingesting, processing, and transforming data, using AWS services such as AWS Lambda, along with Databricks for advanced data processing and Informatica IDMC for data integration and quality.
  • Integrate data from internal and external sources into AWS and Databricks environments, ensuring data consistency and quality, while leveraging Informatica IDMC for data integration, transformation, and governance.
  • Develop ETL (Extract, Transform, Load) processes that cleanse, transform, and enrich data for analytical use, using Databricks' Spark capabilities and Informatica IDMC for data transformation and quality (a minimal illustrative sketch follows this list).
  • Monitor and optimize data processing and query performance in both AWS and Databricks environments, making adjustments as needed to meet performance and scalability requirements; use Informatica IDMC to optimize data workflows.
  • Implement security best practices and data encryption to protect sensitive data in both AWS and Databricks, ensuring compliance with data privacy regulations, and employ Informatica IDMC for data governance and compliance.
  • Automate routine tasks such as data ingestion, transformation, and monitoring, using AWS services such as AWS Lambda, along with Databricks Jobs and Informatica IDMC for workflow automation.
  • Maintain clear and comprehensive documentation of data infrastructure, pipelines, and configurations in both AWS and Databricks environments, with metadata management facilitated by Informatica IDMC.
  • Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and deliver appropriate solutions across AWS, Databricks, and Informatica IDMC.
  • Identify and resolve data-related issues and provide support to ensure data availability and integrity across AWS, Databricks, and Informatica IDMC environments.
  • Optimize AWS, Databricks, and Informatica resource usage to control costs while meeting performance and scalability requirements.
  • Stay up to date with AWS, Databricks, and Informatica IDMC services, as well as data engineering best practices, to recommend and implement new technologies and techniques.
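
For context only: the ETL duty above often resembles the minimal PySpark sketch below. This is purely illustrative; the bucket names, paths, and column names are hypothetical assumptions, not details of this role's environment.

    # Illustrative PySpark ETL sketch (hypothetical paths and columns).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Extract: read raw JSON landed in S3 (path is an assumption)
    raw = spark.read.json("s3://example-raw-bucket/orders/")

    # Transform: cleanse, deduplicate, and enrich for analytics
    cleaned = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("order_id").isNotNull())
           .withColumn("order_date", F.to_date("order_ts"))
           .withColumn("ingested_at", F.current_timestamp())
    )

    # Load: write a Delta Lake table, partitioned to aid query performance
    (cleaned.write.format("delta")
            .mode("append")
            .partitionBy("order_date")
            .save("s3://example-curated-bucket/orders_delta/"))

On Databricks the Delta format is available by default; outside Databricks the same code would need the Delta Lake packages configured.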

Requirements

  • Bachelor’s or master’s degree in computer science, data engineering, or a related field.
  • A minimum of 5-7 years of experience in data engineering, with expertise in AWS services, Databricks, and/or Informatica IDMC.
  • Hands-on data migration experience (e.g., code migration from on-premises Informatica PowerCenter to the cloud version, and data migration from Teradata/Oracle/PostgreSQL to Databricks).
  • Proficiency in programming languages such as Python or Java for building data pipelines.
  • Ability to evaluate potential technical solutions and make recommendations to resolve data issues, especially performance assessment of complex data transformations and long-running data processes.
  • Strong knowledge of SQL and NoSQL databases.
  • Familiarity with data modeling and schema design.
  • Excellent problem-solving and analytical skills.
  • Strong communication and collaboration skills.
  • AWS certifications (e.g., AWS Certified Data Analytics - Specialty), Databricks certifications, and Informatica certifications are a plus.
 
Shortlisted candidates will be offered a 6-month Agency Contract.