M S Dillibabu – Medium

Alteryx to Spark Converter Tool

Tool link: https://alteryx2spark-jxbkewujwixs7547z4iaru.streamlit.app/

4 min read · Apr 1, 2023

Spark df Upsert (SCD-1 and SCD-2) records to RDBMS

Through Spark code we can find upsert records by comparing source data with the target data (if you want to know about finding the upsert…

2 min read · Oct 12, 2022
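Below is a plain-Python sketch of the source-vs-target comparison. It is not Spark code. It assumes each record is a dict keyed by a hypothetical business key column `id`. A source row whose key is missing from the target counts as an insert. A row whose key exists but whose attributes differ counts as an update.

```python
# Plain-Python model of upsert detection (not Spark code).
# Assumption: records are dicts sharing a hypothetical business key "id".

def find_upserts(source, target, key="id"):
    """Split source rows into inserts (key absent from target) and updates
    (key present but some attribute differs). Unchanged rows are dropped."""
    target_by_key = {row[key]: row for row in target}
    inserts, updates = [], []
    for row in source:
        existing = target_by_key.get(row[key])
        if existing is None:
            inserts.append(row)       # brand-new key
        elif existing != row:
            updates.append(row)       # same key, changed values
    return inserts, updates


target = [
    {"id": 1, "name": "Alice", "city": "Chennai"},
    {"id": 2, "name": "Bob", "city": "Pune"},
]
source = [
    {"id": 1, "name": "Alice", "city": "Chennai"},    # unchanged
    {"id": 2, "name": "Bob", "city": "Bengaluru"},    # changed -> update
    {"id": 3, "name": "Carol", "city": "Delhi"},      # new key -> insert
]

inserts, updates = find_upserts(source, target)
print("inserts:", inserts)
print("updates:", updates)
```

In Spark, the same split is usually done with a left outer join of the source DataFrame to the target on the key. You then filter on whether the joined target columns are null (inserts) or differ from the source (updates).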

SCD Type-1 (Upsert) implementation in Spark Scala

If the source doesn't have a date column, below is the implementation of attaching a date column with the current date for each run.

3 min read · Sep 7, 2022
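Here is a minimal plain-Python sketch of the idea. It assumes records are dicts with a hypothetical key column `id`, plus a hypothetical `load_date` column stamped on each run. It is not the author's Spark Scala implementation. SCD Type 1 keeps no history: the latest source row simply overwrites the target row with the same key.

```python
from datetime import date

def add_load_date(rows, run_date=None):
    """Return copies of rows with a (hypothetical) load_date column set to the run date."""
    if run_date is None:
        run_date = date.today()  # current date for this run
    return [{**row, "load_date": run_date} for row in rows]

def scd1_merge(target, source, key="id"):
    """SCD Type 1: the source row overwrites the target row with the same key
    (no history kept); rows with new keys are inserted."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda row: row[key])

target = add_load_date(
    [{"id": 1, "city": "Chennai"}, {"id": 2, "city": "Pune"}],
    run_date=date(2022, 9, 1),
)
source = add_load_date(
    [{"id": 2, "city": "Bengaluru"}, {"id": 3, "city": "Delhi"}],
    run_date=date(2022, 9, 7),
)
result = scd1_merge(target, source)
for row in result:
    print(row)
```

After the merge, `load_date` shows when each key was last written. That is the only trace of change an SCD Type 1 table keeps.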

Important concept in Spark: Hash Partitioning, Range Partitioning and Custom Partitioning

Partitioning means dividing the data into smaller parts and storing them across a distributed system so they can be processed in parallel.

3 min read · Jun 28, 2022
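The plain-Python model below shows how each of the three schemes maps a key to a partition number. It is not Spark's API. `zlib.crc32` stands in for the hash function Spark uses. The range bounds are computed from all keys rather than a sample. The "IN" prefix rule in the custom partitioner is a made-up example.

```python
import bisect
import zlib

def hash_partition(key, num_partitions):
    # Stand-in for Spark's hash: crc32 is deterministic across runs,
    # unlike Python's built-in hash() on strings. Result is in [0, n).
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def range_bounds(keys, num_partitions):
    # Pick num_partitions - 1 upper bounds from the sorted keys so each
    # partition gets a contiguous, roughly equal-sized key range.
    ordered = sorted(keys)
    step = len(ordered) / num_partitions
    return [ordered[max(int(step * i) - 1, 0)] for i in range(1, num_partitions)]

def range_partition(key, bounds):
    # Binary search: keys <= bounds[0] go to partition 0, and so on.
    return bisect.bisect_left(bounds, key)

def custom_partition(key, num_partitions):
    # Hypothetical business rule: keys starting with "IN" get partition 0;
    # every other key is hashed over the remaining partitions.
    if str(key).startswith("IN"):
        return 0
    return 1 + hash_partition(key, num_partitions - 1)

def assign(keys, partitioner, num_partitions):
    # Place every key into the partition chosen by the partitioner.
    parts = [[] for _ in range(num_partitions)]
    for key in keys:
        parts[partitioner(key)].append(key)
    return parts

keys = list(range(1, 13))
bounds = range_bounds(keys, 3)
print(bounds)  # [4, 8]
print(assign(keys, lambda k: range_partition(k, bounds), 3))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(assign(keys, lambda k: hash_partition(k, 3), 3))
print(assign(["IN-01", "US-07", "IN-22", "UK-03"], lambda k: custom_partition(k, 3), 3))
```

Spark provides each scheme directly. `DataFrame.repartition(n, col)` hash-partitions by the given columns. `DataFrame.repartitionByRange(n, col)` range-partitions using sampled boundaries. For custom logic on pair RDDs, you subclass `Partitioner` (implementing `numPartitions` and `getPartition`) and pass it to `partitionBy`.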

Spark optimization in-depth, part 2

Spark optimization techniques, part 2

8 min read · Jun 3, 2022

Spark Optimization techniques

1. Don’t use collect(). Use take() instead.

7 min read · May 29, 2022
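The reasoning behind tip 1 can be shown with a toy plain-Python model (not Spark). `collect()` materializes every row on the driver, while `take(n)` stops as soon as it has `n` rows. The `LazyDataset` class is hypothetical and only counts how many rows had to be produced.

```python
class LazyDataset:
    """Toy stand-in for a distributed dataset whose rows are produced lazily."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.rows_produced = 0  # how many rows were actually materialized

    def _rows(self):
        for i in range(self.n_rows):
            self.rows_produced += 1
            yield {"id": i}

    def collect(self):
        # Like collect(): every row is pulled back into driver memory.
        return list(self._rows())

    def take(self, n):
        # Like take(n): stop producing rows as soon as n are available.
        result = []
        if n <= 0:
            return result
        for row in self._rows():
            result.append(row)
            if len(result) == n:
                break
        return result


small = LazyDataset(100_000)
first_rows = small.take(5)
print(len(first_rows), small.rows_produced)  # 5 5

full = LazyDataset(100_000)
all_rows = full.collect()
print(len(all_rows), full.rows_produced)  # 100000 100000
```

Spark's real `take(n)` also avoids a full scan. It reads one partition first and scans more only if it still needs rows. `collect()` always brings the whole dataset to the driver and can exhaust its memory on large data.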

Best notebook for testing Spark code on Google Cloud clusters

I have prepared a POC-style comparison to find the best notebook among Jupyter, Databricks and Zeppelin notebooks on top of a Dataproc cluster…

2 min read · May 24, 2022

Easiest way of running spark code in jupyter notebook without heavy installation/configuration

1. First you will need to install Docker Desktop. Go to Docker’s website and download Docker Desktop as shown in the screenshot below and…

2 min read · May 9, 2022

Important interview point on coalesce() vs repartition() performance evaluation

We may think that coalesce is the best approach for reducing the number of partitions compared with repartition. Yes, but not in all…

2 min read · Apr 4, 2022
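Here is a simplified plain-Python model of the trade-off; it is not Spark's implementation. `coalesce` merges whole existing partitions without moving individual rows, so there is no shuffle, but any skew survives. `repartition` moves every row, which costs a shuffle but yields balanced partitions. Grouping neighbouring partitions and dealing rows round-robin are simplifying assumptions.

```python
def coalesce(partitions, n):
    # Merge whole existing partitions into n groups; rows never leave
    # their original partition, so no shuffle, but sizes can stay uneven.
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i * n // len(partitions)].extend(part)
    return groups

def repartition(partitions, n):
    # Redistribute every row round-robin (a full shuffle): balanced output.
    groups = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        groups[i % n].append(row)
    return groups

# Four skewed input partitions holding 6, 6, 1 and 1 rows.
partitions = [list(range(0, 6)), list(range(6, 12)), [12], [13]]
print([len(p) for p in coalesce(partitions, 2)])     # [12, 2]
print([len(p) for p in repartition(partitions, 2)])  # [7, 7]
```

Spark's documentation for `coalesce` also warns about drastic reductions such as `coalesce(1)`. These can make the upstream computation run on very few nodes. `repartition` adds a shuffle step but keeps the upstream stage running in parallel.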

SQL Interview tips

SQL order of execution

2 min read · Mar 1, 2022
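The commonly taught logical order of execution is FROM/JOIN, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, LIMIT. The plain-Python sketch below evaluates one query clause by clause in that order, on a made-up `employees` table. It shows why, in standard SQL, WHERE cannot refer to a SELECT alias while ORDER BY can. Real query optimizers may run steps in a different physical order as long as the result is the same.

```python
from collections import defaultdict

# Made-up table for illustration.
employees = [
    {"name": "Asha", "dept": "eng", "salary": 100},
    {"name": "Ravi", "dept": "eng", "salary": 120},
    {"name": "Meena", "dept": "eng", "salary": 40},
    {"name": "John", "dept": "sales", "salary": 60},
    {"name": "Priya", "dept": "sales", "salary": 90},
    {"name": "Kiran", "dept": "hr", "salary": 200},
    {"name": "Arun", "dept": "ops", "salary": 80},
]

# SQL being modelled:
#   SELECT dept, SUM(salary) AS total
#   FROM employees
#   WHERE salary > 50
#   GROUP BY dept
#   HAVING SUM(salary) > 150
#   ORDER BY total DESC
#   LIMIT 2

def run_query(table):
    rows = list(table)                                  # 1. FROM
    rows = [r for r in rows if r["salary"] > 50]        # 2. WHERE: alias "total" doesn't exist yet
    groups = defaultdict(list)                          # 3. GROUP BY
    for r in rows:
        groups[r["dept"]].append(r)
    groups = {d: g for d, g in groups.items()           # 4. HAVING: filter groups on an aggregate
              if sum(r["salary"] for r in g) > 150}
    out = [{"dept": d, "total": sum(r["salary"] for r in g)}  # 5. SELECT defines "total"
           for d, g in groups.items()]
    # (a DISTINCT would be applied here, after SELECT)
    out.sort(key=lambda r: r["total"], reverse=True)    # 6. ORDER BY can use the alias
    return out[:2]                                      # 7. LIMIT

print(run_query(employees))
# [{'dept': 'eng', 'total': 220}, {'dept': 'hr', 'total': 200}]
```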

M S Dillibabu

Data Engineer @walmartlabs. LinkedIn: https://www.linkedin.com/in/minnald/
