Required Skills/Responsibilities:
- Expertise in and knowledge of Cloudera Data Platform, Oozie, Hive, Spark, Spark Streaming, and Presto
Data Pipeline Development:
- Design, develop, and implement scalable data pipelines using Cloudera tools like Hadoop, Spark, Hive, Impala, and HDFS.
- Write and optimize ETL processes to extract, transform, and load data into data lakes or warehouses.
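As a rough illustration of the extract–transform–load shape these pipelines follow, here is a minimal stdlib-only Python sketch; in a real CDP pipeline the same stages would be Spark DataFrame operations reading from and writing to HDFS or Hive, and all names below are hypothetical:

```python
import csv
import io

def extract(raw_csv: str):
    """Extract: parse raw CSV rows into dicts (stand-in for reading from HDFS)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: drop invalid records and normalize fields."""
    out = []
    for r in rows:
        if not r.get("amount"):
            continue  # skip rows missing the amount field
        out.append({"user": r["user"].strip().lower(),
                    "amount": float(r["amount"])})
    return out

def load(rows, sink):
    """Load: append cleaned rows to a sink (stand-in for a warehouse table)."""
    sink.extend(rows)
    return len(rows)

raw = "user,amount\nAlice ,10.5\nbob,\nCarol,7"
table = []
loaded = load(transform(extract(raw)), table)
```

The three stages stay separable so each can be optimized or swapped independently, which is the property that matters when the same pattern is scaled out on Spark.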
Big Data Application Development:
- Develop applications to process large datasets efficiently using frameworks such as Apache Spark and MapReduce.
- Build solutions for batch and real-time data processing.
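The batch vs. real-time distinction above can be sketched in plain Python: Spark Streaming handles "real-time" data as a sequence of micro-batches, so the same processing logic applies to one large batch or to incremental chunks. This is a stdlib-only illustration of that model, not Spark API code:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an unbounded record stream into fixed-size micro-batches,
    the model Spark Streaming uses for near-real-time processing."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch):
    """One batch job: aggregate a chunk of records (here, a simple sum)."""
    return sum(batch)

# Batch mode: one job over the full dataset.
batch_total = process_batch(range(10))

# Streaming mode: the same logic applied incrementally per micro-batch.
stream_totals = [process_batch(b) for b in micro_batches(range(10), 4)]
```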
Cluster Management:
- Work with Cloudera Manager for cluster setup, configuration, monitoring, and performance optimization.
- Ensure high availability and scalability of Cloudera clusters.
- System dimensioning (compute resources, storage, and networking).
- System reconfiguration after hardware extension and/or replacement.
- OS and Cloudera software upgrades.
- Cloudera software vulnerability and patch management.
- Access and permission management.
- Installation of additional Cloudera applications as needed.
Data Storage and Management:
- Design and implement data storage strategies using HDFS, HBase, and other Cloudera-supported tools.
- Optimize data storage and retrieval processes to improve performance.
Performance Tuning:
- Monitor and optimize the performance of Hadoop and Spark jobs.
- Troubleshoot and resolve performance bottlenecks in data pipelines.
- Assist in designing scalable architectures for high-volume data.
- Ensure end-to-end (E2E) pipeline stability for existing and future use cases.
- Tune the performance of Spark workflows.
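For context, Spark tuning of this kind typically starts from a handful of configuration knobs. The keys below are standard Spark settings; the values are workload-dependent placeholders, not recommendations:

```
# spark-defaults.conf — illustrative tuning knobs (values are placeholders)
spark.executor.memory           8g     # per-executor heap; size against YARN container limits
spark.executor.cores            4      # tasks per executor; balance parallelism vs. GC pressure
spark.sql.shuffle.partitions    400    # shuffle parallelism; the default of 200 is often too low at scale
spark.serializer                org.apache.spark.serializer.KryoSerializer  # faster serialization
spark.dynamicAllocation.enabled true   # let YARN scale executor count with load
```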
Integration and Collaboration:
- Integrate Cloudera solutions with external systems, databases, and APIs.
- Collaborate with data scientists, analysts, and other teams to understand requirements and deliver data solutions.