This post walks through how to download and set up VirtualBox with Ubuntu. We will then install Spark, Python, and the Jupyter Notebook on that Ubuntu virtual machine.
First you will need to download (just click on the names):
VirtualBox lets you run a virtual computer on your own physical computer.
Open the download page and you will see several download options. Choose the package that matches your host, that is, the operating system of the machine you are using.
Double-click the downloaded file and follow the instructions, keeping all the default settings.
Once you have downloaded VirtualBox, download Ubuntu. Go to the Ubuntu website; there are several download options, but we need the Ubuntu Desktop version.
Once you have opened the VirtualBox Manager, click New in the top left corner. It will ask for a name and an operating system. We will call the machine myspark. Change the type to Linux and the version to Ubuntu (64-bit).
Click Next and choose the memory size. This depends on how much RAM your computer has; depending on the applications you plan to run, we suggest 4-8 GB.
Next comes the hard disk. We are going to create a virtual disk. Choose the VDI (VirtualBox Disk Image) type. A fixed-size disk may take longer to create on some systems but is often faster to use, which is why we will choose it. Give it 20 GB and click Create.
Double-click the virtual machine you created. You will see a pop-up that says Select start-up disk; point it to the Ubuntu image that you downloaded before.
You will then see a pop-up offering either Try Ubuntu or Install Ubuntu. We want to install Ubuntu; it will only be installed on your virtual machine. Choose to download updates while installing Ubuntu and click Continue. On the next page, select Erase disk and install Ubuntu (this only affects the virtual disk, not your physical one). Then select your time zone (or any other), select the keyboard layout, and enter your credentials. And voilà, Ubuntu is installed.
Python and Spark
The first thing we want to do is confirm that Python 3.5 (or later) is already on Ubuntu. Open a terminal and type python3 at the ~$ prompt; you should see Python 3.5…
Now we are going to install the Jupyter Notebook system. To do this, run the following command:
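As a small sketch of the same check done from inside Python, the snippet below compares the running interpreter against the 3.5 minimum mentioned above. The names MIN_VERSION and python_is_recent_enough are made up for this illustration.

```python
import sys

# Minimum version named in this post; adjust if your setup needs something newer.
MIN_VERSION = (3, 5)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Recent enough for this walkthrough:", python_is_recent_enough())
```

You can save this to a file and run it with python3, or paste it into the interactive prompt.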
pip3 install jupyter
If it says that pip3 is not installed, run the following command:
sudo apt install python3-pip
Then try the previous command again to install Jupyter Notebook. Once this is done, type jupyter notebook at the ~$ prompt and the Notebook system opens automatically. Copy the link that appears in the terminal and paste it into your browser.
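If you want to double-check that the installation landed in the same Python you are using, a rough sketch like the following can help. It assumes that pip3 install jupyter makes the notebook and jupyter_core modules importable, which may differ between Jupyter versions; is_installed is a helper name invented for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names below are assumptions about what the jupyter package pulls in.
    for name in ("jupyter_core", "notebook"):
        status = "found" if is_installed(name) else "missing"
        print(name, "->", status)
```

If a module shows up as missing, pip3 may have installed Jupyter for a different Python than the one you started with python3.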
To download Spark, open the Apache Spark website and go to the download menu. Choose the same options as in the screenshot below. If a later version is available, you are free to choose it.
We want the package in the right location, so open the file explorer, cut the package, and paste it into your home folder.
Then go to your command line and type the following, pressing Tab after spark to complete the file name:
sudo tar -zxvf spark
This is going to unzip it for us.
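For readers curious what the tar command does, here is a minimal Python sketch of the same unpacking step using the standard tarfile module. It is not what this post uses; extract_archive is a hypothetical helper, and the file name in the usage comment is a placeholder for whatever you downloaded.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return its top-level entries."""
    dest = Path(dest_dir)
    # Newer Python versions support extraction filters; use the safe one if present.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive_path), "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=str(dest), **kwargs)
    # Collapse member paths such as "pkg/bin/tool" to their first component.
    return sorted({name.split("/")[0] for name in names})


# Example usage (placeholder file name, run from your home folder):
# print(extract_archive("your-spark-package.tgz", "."))
```

The returned list should contain a single folder name, which is the directory you will point SPARK_HOME at in the next step.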
Now we want to tell Python where to find Spark by setting the following environment variables (adjust the path if your user name or Spark version differs):
export SPARK_HOME='/home/ubuntu/spark-2.1.0-bin-hadoop2.7'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
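Typos in these variables are easy to make and hard to spot, so here is a hedged sketch of a sanity check you could run with python3 in the same terminal. The function name check_spark_env and the exact checks are assumptions made for this example; it only looks at the values named in this post and does not start Spark.

```python
import os
from pathlib import Path

# Values this post sets for the PySpark driver variables.
EXPECTED = {
    "PYSPARK_DRIVER_PYTHON": "jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    "PYSPARK_PYTHON": "python3",
}


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        home = Path(spark_home)
        if not home.is_absolute():
            problems.append("SPARK_HOME is not an absolute path: " + spark_home)
        elif not (home / "python").is_dir():
            problems.append("No python/ directory under SPARK_HOME: " + spark_home)
    for name, expected in EXPECTED.items():
        actual = env.get(name)
        if actual != expected:
            problems.append("{} is {!r}, expected {!r}".format(name, actual, expected))
    return problems


if __name__ == "__main__":
    found = check_spark_env()
    for problem in found:
        print("Problem:", problem)
    if not found:
        print("Environment looks consistent with this walkthrough.")
```

An empty result only means the variables are consistent with this walkthrough; it does not prove that Spark itself runs.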