Responsibilities
- Design and architect data storage solutions, including databases, data lakes, and data warehouses, using AWS services such as Amazon S3 and Amazon RDS, along with Databricks Delta Lake.
- Integrate Informatica IDMC for metadata management and data cataloging.
- Create, manage, and optimize data pipelines for ingesting, processing, and transforming data, using AWS services such as AWS Lambda, along with Databricks for advanced data processing and Informatica IDMC for data integration and quality.
- Integrate data from various sources, both internal and external, into AWS and Databricks environments, ensuring data consistency and quality, while leveraging Informatica IDMC for data integration, transformation, and governance.
- Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and enrich data for analytical use, leveraging Databricks Spark capabilities and Informatica IDMC for data transformation and quality.
- Monitor and optimize data processing and query performance in both AWS and Databricks environments, making adjustments as needed to meet performance and scalability requirements. Utilize Informatica IDMC to optimize data workflows.
- Implement security best practices and data encryption methods to protect sensitive data in both AWS and Databricks, while ensuring compliance with data privacy regulations.
- Employ Informatica IDMC for data governance and compliance.
- Implement automation for routine tasks, such as data ingestion, transformation, and monitoring, using AWS services such as AWS Lambda, along with Databricks Jobs and Informatica IDMC for workflow automation.
- Maintain clear and comprehensive documentation of data infrastructure, pipelines, and configurations in both AWS and Databricks environments, with metadata management facilitated by Informatica IDMC.
- Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and deliver appropriate solutions across AWS, Databricks, and Informatica IDMC.
- Identify and resolve data-related issues and provide support to ensure data availability and integrity across AWS, Databricks, and Informatica IDMC environments.
- Optimize AWS, Databricks, and Informatica resource usage to control costs while meeting performance and scalability requirements.
- Stay up to date with AWS, Databricks, and Informatica IDMC services and data engineering best practices to recommend and implement new technologies and techniques.
Requirements
- Bachelor’s or master’s degree in computer science, data engineering, or a related field.
- A minimum of 5 to 7 years of experience in data engineering, with expertise in AWS services, Databricks, and/or Informatica IDMC.
- Hands-on data migration experience (e.g., migrating code from on-premises Informatica PowerCenter to its cloud version, and migrating data from Teradata/Oracle/PostgreSQL to Databricks).
- Proficiency in programming languages such as Python or Java for building data pipelines.
- Ability to evaluate potential technical solutions and make recommendations to resolve data issues, especially performance assessment of complex data transformations and long-running data processes.
- Strong knowledge of SQL and NoSQL databases.
- Familiarity with data modeling and schema design.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- AWS certifications (e.g., AWS Certified Data Analytics - Specialty), Databricks certifications, and Informatica certifications are a plus.
Shortlisted candidates will be offered a 6-month agency contract.