We are a growing grocery app looking for an experienced Data Engineer to join our data team. In this role, you'll build and maintain our data infrastructure, optimize our data processes, and ensure data quality and accessibility across the organization.
Key Responsibilities:
1. Advanced Airflow Environment Development
- Build and maintain a robust Airflow environment to run scripts for creating datamarts and automating data logic
- Implement best practices, including CI/CD, lint checks, secrets management, and Airflow Variables
- Set up and manage Docker-based Airflow deployments for improved isolation and portability
- Configure and optimize the Kubernetes executor for scalable, efficient task execution
- Implement advanced scheduling techniques, including dynamic task generation and cross-DAG dependencies (a minimal DAG sketch follows this list)
- Set up comprehensive monitoring and alerting for the Airflow environment
- Implement effective logging strategies for improved debuggability
- Ensure high availability and fault tolerance of the Airflow cluster
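To illustrate the kind of DAG this environment will run, here is a minimal sketch (assuming Airflow 2.4+ with the TaskFlow API; the Variable key, table names, and task bodies are hypothetical) that reads a list of source tables from an Airflow Variable and fans out one load task per table via dynamic task mapping:

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task
from airflow.models import Variable


@dag(
    schedule="0 6 * * *",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    tags=["datamart"],
)
def build_orders_datamart():
    @task
    def list_source_tables() -> list[str]:
        # Hypothetical Airflow Variable holding a JSON list of source tables
        return Variable.get("orders_source_tables", deserialize_json=True)

    @task
    def load_table(table: str) -> str:
        # Placeholder for the real extract/load logic for one source table
        print(f"loading {table}")
        return table

    # Dynamic task mapping: one mapped load task per source table
    load_table.expand(table=list_source_tables())


build_orders_datamart()
```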
2. Data Pipeline Migration and Optimization
- Migrate existing data team scripts from the old Airflow environment to the new one
- Improve script quality and optimize performance during the migration process
- Implement data quality checks and SLAs for critical pipelines (see the row-count check sketched below)
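One lightweight way to express such a check during migration is a plain row-count guard against the warehouse; a minimal sketch, assuming BigQuery via the google-cloud-bigquery client (the table name and threshold are placeholders):

```python
from google.cloud import bigquery


def check_row_count(table: str, min_rows: int) -> None:
    """Fail the pipeline if a freshly rebuilt table looks empty or truncated."""
    client = bigquery.Client()
    rows = client.query(f"SELECT COUNT(*) AS n FROM `{table}`").result()
    n = next(iter(rows)).n
    if n < min_rows:
        raise ValueError(f"{table} has {n} rows, expected at least {min_rows}")


check_row_count("my-project.analytics.fct_orders", min_rows=10_000)
```

In Airflow, the same guard can run as its own task with an `sla=timedelta(...)` argument so that misses surface in SLA reports and alerting.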
3. Looker Governance and Optimization
- Manage our Looker Enterprise implementation
- Develop and implement a strategy for granting each team tailored access to specific Looker explores (a role-audit sketch follows this list)
- Optimize Looker performance and ensure proper data governance
- Set up and maintain LookML CI/CD pipelines
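A common first step for the access strategy is auditing which LookML models each role currently exposes. A minimal sketch using the official looker_sdk Python client (credentials come from a looker.ini file or LOOKERSDK_* environment variables; exact attribute names may vary by SDK version):

```python
import looker_sdk

# Initialise the Looker API 4.0 client from looker.ini / LOOKERSDK_* env vars
sdk = looker_sdk.init40()

# List each role and the LookML models its model set exposes, as a starting
# point for mapping teams to the explores they should be able to see.
for role in sdk.all_roles():
    models = role.model_set.models if role.model_set else []
    print(role.name, list(models))
```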
4. Web Scraping Projects
- Design and implement various web scraping projects to collect relevant external data
- Ensure the quality and reliability of scraped data
- Implement robust error handling and retry mechanisms for web scraping pipelines (see the retry sketch below)
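As an example of the retry behaviour we expect, here is a minimal sketch using requests with urllib3's Retry for exponential backoff on transient HTTP errors (the target URL and User-Agent string are placeholders):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session() -> requests.Session:
    """HTTP session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=5,
        backoff_factor=1,  # waits 1s, 2s, 4s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.headers.update({"User-Agent": "grocery-data-team/1.0"})  # placeholder
    return session


# Hypothetical target page; real projects define their own URLs and parsers
response = build_session().get("https://example.com/prices", timeout=30)
response.raise_for_status()
```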
5. General Data Engineering Tasks
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements
- Design, build, and maintain scalable data pipelines and ETL processes
- Build and maintain cloud-based data warehouses (e.g., BigQuery), including schema design, optimization, and management (a partitioned-table sketch follows this list)
- Implement data modeling best practices for efficient querying and analysis
- Ensure data quality, reliability, and accessibility across the organization
- Optimize data warehouse performance and cost-efficiency
- Develop and maintain data documentation and metadata management systems
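To give a flavour of the schema-design and cost-optimization work, here is a minimal sketch (project, dataset, and column names are hypothetical) that creates a partitioned, clustered BigQuery fact table with the google-cloud-bigquery client:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical daily-partitioned, clustered fact table for order-level analysis
schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("store_id", "STRING"),
    bigquery.SchemaField("order_ts", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("total_amount", "NUMERIC"),
]
table = bigquery.Table("my-project.analytics.fct_orders", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="order_ts"
)
table.clustering_fields = ["store_id"]
client.create_table(table, exists_ok=True)
```

Partitioning on the event timestamp and clustering on a frequently filtered column keeps most queries scanning only the slices they need, which is the main lever for both performance and cost in BigQuery.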