The Senior Data Engineer is responsible for building and testing sustainable data pipelines, data architecture, and big data / data lake platforms and solutions that make the organization's data easy to search and retrieve. The role works in close collaboration with the Data Architect to build an enterprise data architecture.
Duties and Responsibilities
1- Design, build, and optimize data engineering pipelines that integrate and stream data from different sources, and implement data ingestion routines using ETL/ELT processes.
2- Implement methods to improve data reliability and quality; combine raw data from different sources into consistent, machine-readable formats.
3- Develop and test architectures that enable data extraction and transformation for predictive or prescriptive modeling.
4- Define and establish development, test, release, update, and support processes for data engineering operations, including troubleshooting techniques and bug fixes.
5- Develop big data models and use cases based on the architecture and make them ready for data operations end users.
6- Create and execute queries against structured and unstructured data sources to identify process issues or to perform mass updates (preferred).
7- Ensure that batch production scheduling and report distribution are accurate and timely.
8- Build processes supporting data transformation, data structures, metadata, dependency management, and workload management.
9- Perform ad hoc requests from users, such as data research, file manipulation and transfer, and investigation of process issues.
Technical Skills
§ Hands-on experience selecting and deploying appropriate CI/CD tools.
§ Strive for continuous improvement and build continuous integration, continuous delivery, and continuous deployment (CI/CD) pipelines.
§ Highly developed, process-oriented skills for troubleshooting and problem resolution.
§ Experience with object-oriented and functional scripting languages: Python, Java, C++, Scala, etc.
§ Hands-on experience with ETL/ELT technologies, including both streaming and batch processing.
§ Experience building and optimizing big data pipelines, architectures, and data sets.
§ Experience with big data tools like Hadoop, Hive, Impala, Spark, Kafka, etc.
§ Experience with stream-processing systems: Storm, Spark Streaming, etc.
§ Knowledge of Cloudera Ecosystem will be an advantage.
§ Knowledge of data models, data mining, and segmentation techniques will be an advantage.