Niranjan Fartare

Data Engineer with 3.5 years of experience in AWS, PySpark & Spark. Built scalable ETL pipelines for Analytics, Finance & Telecom domains using EMR, Glue, and Redshift. Skilled in big data processing and cloud platforms.

Experience

Data Engineer

SourceFuse Inc, Remote

September 2025 – Present
  • Built scalable ETL pipelines using PySpark and Spark to process large datasets from AWS S3
  • Designed and maintained data ingestion and transformation workflows across the data lake
  • Deployed and managed Spark jobs on EMR for high-performance distributed processing
  • Migrated datasets and tables to modern storage formats like Apache Iceberg
  • Applied data validation and quality checks to ensure accurate and reliable outputs
  • Improved job performance through Spark tuning and optimized resource usage
  • Automated end-to-end data workflows using Apache Airflow for scheduling and monitoring
  • Ensured secure and compliant access to data using Apache Ranger policies
  • Skills: Airflow, Bash, Bitbucket, EMR, Git, Hive, Iceberg, PySpark, Python, Ranger, RDS, S3, Spark, SQL

Data Engineer

ERP Consulting, Remote

July 2023 – August 2025
  • Built scalable ETL pipelines using PySpark and Spark to process thousands of daily credit card transactions from AWS S3
  • Ingested and enriched raw transaction data from S3 using distributed Spark jobs on EMR
  • Deployed Spark applications on AWS EMR for high-performance processing of large datasets
  • Extracted reference data from RDS to enhance transaction datasets
  • Implemented data validation and quality checks to ensure accurate outputs
  • Optimized Spark jobs to cut processing time and costs
  • Automated workflows with Apache Airflow for scheduling and monitoring
  • Managed version control with Git (GitHub/Bitbucket)
  • Skills: Airflow, EMR, Git, Hive, PySpark, Python, RDS, Spark, S3

Data Engineer Intern

ERP Consulting, Remote

June 2022 – June 2023
  • Built a centralized data warehouse in AWS Redshift for telecom customer data
  • Processed usage, billing, and service data from AWS S3
  • Developed automated ETL pipelines with AWS Glue
  • Managed schemas with AWS Glue Data Catalog
  • Queried data with AWS Athena for analytics
  • Set up CloudWatch monitoring and alerts
  • Optimized Redshift table design and query performance
  • Skills: Athena, CloudWatch, Glue, RDS, Redshift, S3

Skills

Big Data Processing: Apache Spark, Databricks, Hadoop, HDFS, Hive, PySpark

Cloud Services: Athena, EC2, EMR, Glue, RDS, Redshift, S3

Programming Languages: Bash, Java, Python, SQL

Project Management Tools: Confluence, Jira, SharePoint

Soft Skills: Adaptability, Attention to Detail, Problem Solving, Quick Learner

System & Infrastructure: Cloud infrastructure management, Distributed computing environments, Linux administration, Shell Scripting

Tools: DBeaver, Jupyter Notebooks, MySQL Workbench, PyCharm, PuTTY, VS Code

Version Control & Code Management: Bitbucket, Git, GitHub

Workflow & Orchestration: Apache Airflow
