Job Summary:
As the Lead Data Engineer, you will lead the data engineering team in building, testing, and optimizing sustainable data pipelines, architectures, and data lake platforms. You will collaborate closely with data architects and other stakeholders to ensure the seamless integration and transformation of data from various sources. Your role will be critical in setting the strategic direction for data engineering operations, ensuring data reliability, quality, and availability for business operations and advanced analytics.
Duties and Responsibilities
- Lead the design, development, and optimization of data engineering pipelines, integrating and streaming data from diverse sources using ETL/ELT processes.
- Oversee the implementation of methods to improve data reliability and quality, ensuring that raw data is transformed into consistent, machine-readable formats.
- Architect and develop data systems that enable efficient data extraction and transformation for predictive and prescriptive modelling.
- Define and implement development, testing, release, update, and support processes for data engineering operations, including troubleshooting and bug fixing.
- Develop and validate big data models and use cases, ensuring they are ready for data operations and accessible to end users.
- Lead efforts to create and execute queries on structured and unstructured data sources to identify process issues or perform mass updates.
- Ensure that batch production scheduling and report distribution processes are accurate, timely, and efficient.
- Build and maintain processes that support data transformation, data structures, metadata management, and workload management.
- Manage and prioritize ad hoc requests from users, including data research, file manipulation/transfer, and process issue resolution.
Job Requirements
Technical Skills
- Extensive hands-on experience in selecting and deploying appropriate CI/CD tools.
- Strong expertise in building continuous integration and continuous delivery/deployment (CI/CD) pipelines.
- Advanced problem-solving, troubleshooting, and process-oriented skills.
- Proficiency in object-oriented and functional scripting languages such as Python, Java, C++, Scala, etc.
- Extensive experience with ETL/ELT technologies, including both streaming and batch processing.
- Proven experience in building and optimizing big data pipelines, architectures, and datasets.
- Hands-on experience with big data tools such as Hadoop, Hive, Impala, Spark, Kafka, etc.
- Expertise in stream-processing systems such as Storm and Spark Streaming.
- Knowledge of the Cloudera ecosystem is an advantage.
- Understanding of data models, data mining, and segmentation techniques is beneficial.
Qualifications and Experience
- Bachelor’s degree in Computer Science, Computer Engineering, or a related field.
- Fluent in both written and spoken Arabic and English.
- Experience leading or participating in large-scale Big Data projects is preferred.
- 7+ years of experience in data warehousing (DWH), ETL, and data engineering, with demonstrated leadership experience.