An easy way to run Spark code in a Jupyter notebook on your laptop/PC, without heavy installation or configuration

M S Dillibabu
May 9, 2022

1. First, install Docker Desktop. Go to Docker’s website, download Docker Desktop, and install it.

2. Next, open the Docker app on your Mac and wait for it to indicate “Docker Desktop is running” at the top, with a green dot beside it.
3. Next, open up your terminal and simply type:

docker run -p 8888:8888 jupyter/pyspark-notebook
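If you also want your notebooks to survive container restarts, a common (optional) variant mounts a local folder into the container. In the Jupyter Docker images the notebook working directory is /home/jovyan/work; the local path below is just an example:

docker run -p 8888:8888 -v "$PWD/notebooks":/home/jovyan/work jupyter/pyspark-notebook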
4. If it succeeds, you will see output containing HTTP URLs (including a login token). Copy and paste the last HTTP URL into your browser and press Enter.
5. And there you go! It’s as easy as that! You now have a JupyterLab environment with PySpark pre-configured, running in a Docker container. Feel free to create a Jupyter notebook and start playing around with some PySpark commands! (Note that you will need to keep the terminal open for as long as you use your Jupyter environment.) A quick sanity check is sketched below.
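As a minimal smoke test (assuming nothing beyond the PySpark that ships with the image), you can run a cell like this:

from pyspark.sql import SparkSession

# Attach to a local Spark session; the image already has PySpark installed.
spark = SparkSession.builder.appName("smoke-test").getOrCreate()

# Build a tiny DataFrame and run a trivial transformation to confirm Spark works.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.filter(df.age > 30).show()

If a one-row table (alice) comes back, Spark is working.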

6. If you want to install extra packages or poke around inside the container, first open a new terminal window and run docker container ls . This lists the Docker containers that are currently running. Copy the Container ID beside the jupyter/pyspark-notebook image and then run:

docker exec -it [container_id] bash

or

docker exec -it -u root [container_id] bash

Replace [container_id] with the Container ID you copied earlier. You are now in a Bash shell inside the Docker container, where you can pip install packages or even start a PySpark shell in the terminal by typing:

/usr/local/spark/bin/pyspark
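For example (pandas here is purely illustrative, not required by anything above):

pip install pandas
/usr/local/spark/bin/pyspark

At the >>> prompt, the pre-created spark session should answer a trivial query such as spark.range(5).count() with 5.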

7. If you want to use Spark with Scala instead of PySpark, run the following (typically inside the container shell from step 6, so the kernel lands where Jupyter runs):

pip install spylon-kernel

python -m spylon_kernel install

Then, back in Jupyter, create a new notebook and choose the spylon-kernel kernel. A tiny example follows below.
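In a spylon-kernel notebook the spark session is created lazily on first use (so the first cell may take a moment); a minimal Scala sanity check might look like:

// spark is provided by spylon-kernel; referencing it triggers initialization
val df = spark.range(5)
println(df.count())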
