Easy way of running Spark code in a Jupyter notebook on your laptop/PC without heavy installation/configuration
1.
First you will need to install Docker Desktop. Go to Docker’s website, download Docker Desktop for your operating system, and install it.
2.
Next, open the Docker Desktop app on your Mac and wait for it to indicate “Docker Desktop is running” at the top, with a green dot beside it.
3.
Next, open up your terminal and simply type:
docker run -p 8888:8888 jupyter/pyspark-notebook
The first run will download the image, which may take a few minutes. The -p 8888:8888 flag publishes the container’s Jupyter port on the same port of your machine.
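By default, notebooks you create live inside the container and are lost when the container is removed. If you want them to persist on your machine, one option (assuming the Jupyter Docker Stacks convention that the image’s work directory is /home/jovyan/work) is to mount your current directory into it:
docker run -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/pyspark-notebook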
4.
If it succeeded, the output will end with a couple of HTTP URLs containing an access token (something like http://127.0.0.1:8888/lab?token=...). Copy and paste the last HTTP URL into your browser and press Enter.
5.
And there you go! It’s as easy as that! You now have a Jupyter Lab environment with PySpark pre-configured, running in a Docker container. Feel free to create a Jupyter notebook and start playing around with some PySpark commands! (Note that you will need to keep the terminal open for as long as you are using your Jupyter environment; closing it stops the container.)
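To confirm everything works, you can run a minimal PySpark snippet in a new notebook. This is just a sketch; the app name “quickstart” and the sample rows are arbitrary:
from pyspark.sql import SparkSession

# Create (or reuse) a local Spark session inside the container
spark = SparkSession.builder.appName("quickstart").getOrCreate()

# Build a tiny DataFrame and display it to verify Spark is working
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.show()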
6.
Next, open a new terminal window and run docker container ls . This will list the Docker containers that are currently running. Copy the Container ID shown beside the jupyter/pyspark-notebook image and then run:
docker exec -it [container_id] bash
or, if you need a root shell (for example, to install system packages):
docker exec -it -u root [container_id] bash
Replace [container_id] with the Container ID you copied earlier. Now you are in a Bash shell within the Docker container. You can pip install packages, or even start a PySpark shell in the terminal by typing:
/usr/local/spark/bin/pyspark
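Once the PySpark shell starts, it pre-creates the SparkContext as sc (and the session as spark), so you can try a quick sanity check like the following (the numbers are arbitrary):
# sc is pre-defined by the PySpark shell
rdd = sc.parallelize(range(10))
print(rdd.map(lambda x: x * x).sum())  # 285, the sum of squares of 0..9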
7.
If you want to use Spark with Scala instead of PySpark, follow the steps below (run them in the container’s Bash shell from step 6):
pip install spylon-kernel
python -m spylon_kernel install
Then refresh Jupyter and select spylon-kernel when creating a new notebook to write your Spark code in Scala.