Niranjan Fartare

Data Engineer with 3.5 years of experience in AWS, PySpark, and Spark. Built scalable ETL pipelines for the Analytics, Finance, and Telecom domains using EMR, Glue, and Redshift. Skilled in big data processing and cloud platforms.

Experience

Data Engineer

SourceFuse Inc, Remote

September 2025 – Present

Built scalable ETL pipelines using PySpark and Spark to process large datasets from AWS S3
Designed and maintained data ingestion and transformation workflows across the data lake
Deployed and managed Spark jobs on EMR for high-performance distributed processing
Migrated datasets and tables to modern storage formats like Apache Iceberg
Applied data validation and quality checks to ensure accurate and reliable outputs
Improved job performance through Spark tuning and optimized resource usage
Automated end-to-end data workflows using Apache Airflow for scheduling and monitoring
Ensured secure and compliant access to data using Apache Ranger policies
Skills: Airflow, Bash, Bitbucket, EMR, Git, Hive, Iceberg, PySpark, Python, Ranger, RDS, S3, Spark, SQL

Data Engineer

ERP Consulting, Remote

July 2023 – August 2025

Built scalable ETL pipelines using PySpark and Spark to process thousands of daily credit card transactions from AWS S3
Ingested and enriched raw transaction data from S3 using distributed Spark jobs on EMR
Deployed Spark applications on AWS EMR for high-performance processing of large datasets
Extracted reference data from RDS to enhance transaction datasets
Implemented data validation and quality checks to ensure the accuracy of transaction data
Optimized Spark jobs to cut processing time and costs
Automated workflows with Apache Airflow for scheduling and monitoring
Managed version control with Git (GitHub/Bitbucket)
Skills: Airflow, EMR, Git, Hive, PySpark, Python, RDS, Spark, S3