Connect and model complex distributed data sets to build repositories such as data warehouses and data lakes, using appropriate technologies.
Manage data-related work across small to large data sets and structured, unstructured, or streaming data: extraction, transformation, curation, modelling, building data pipelines, selecting the right tools, and writing SQL/Java/Scala code.
• Create and maintain optimal data pipeline architecture
• Assemble large, complex data sets that meet functional and non-functional business requirements
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using ‘big data’ technologies
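As an illustration of the extract-transform-load pattern referenced above, here is a minimal sketch in Java (one of the languages named in the role). The class name `EtlSketch`, the CSV-style input, and the in-memory map standing in for a warehouse table are all hypothetical; real pipelines would read from and write to external systems.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical minimal ETL sketch: the data format and targets are assumptions.
public class EtlSketch {

    // Extract: parse raw CSV-style lines into rows (source format is assumed).
    static List<String[]> extract(List<String> rawLines) {
        return rawLines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.toList());
    }

    // Transform: drop malformed rows and normalize the name field.
    static List<String[]> transform(List<String[]> rows) {
        return rows.stream()
                .filter(r -> r.length == 2 && !r[1].isBlank())
                .map(r -> new String[]{ r[0].trim(), r[1].trim().toLowerCase() })
                .collect(Collectors.toList());
    }

    // Load: here, into an in-memory map standing in for a warehouse table.
    static Map<String, String> load(List<String[]> rows) {
        return rows.stream().collect(Collectors.toMap(r -> r[0], r -> r[1]));
    }

    public static void main(String[] args) {
        List<String> raw = List.of("1, Alice", "2, BOB", "bad-row");
        Map<String, String> table = load(transform(extract(raw)));
        System.out.println(table.get("2")); // prints "bob"
    }
}
```

In practice each stage would be backed by dedicated tooling (e.g. a distributed processing engine and an orchestrator), but the stage boundaries shown here are the same ones the pipeline architecture work above is concerned with.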