Saturday 31 March 2018

BigDL Spark deep learning library VM now available on Microsoft Azure Marketplace

BigDL deep learning library is a Spark-based framework for creating and deploying deep learning models at scale. While it has previously been deployed on Azure HDInsight and Data Science VM, it is now also available on Azure Marketplace as a fixed VM image, representing a further step by Intel to reduce deployment complexity for users.

Because BigDL is an integral part of Spark, a user does not need to explicitly manage distributed computations. A BigDL application provides high-level control “knobs”, such as the number of compute nodes, cores, and batch size, a BigDL application also leverages stable Spark infrastructure for node communications and resource management during its execution. BigDL applications can be written in either Python or Scala and achieve high performance through both algorithm optimization and taking advantage of close integration with Intel’s Math Kernel Library (MKL).

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

What is the Microsoft Azure Marketplace? The Azure Marketplace is an online applications and services marketplace that enables start-ups, independent software vendors (ISVs), and MSP/SIs to offer their Azure-based solutions or services to customers around the world. 

Introduction


This post describes two use case scenarios to deploy BigDL_v0.4 in Azure VMs:

◈ First scenario: Deploying an Azure VM with a pre-built BigDL_v0.4 image and running a basic deep learning example.

◈ Second scenario: Deploying BigDL_v0.4 on a bare-bones Ubuntu VM (for advanced users).

First scenario: Deploying a pre-built BigDL_v0.4 VM image:


Log in to your Microsoft Azure account. BigDL requires you to have an Azure subscription. You can get a free trial by visiting BigDL offering on Azure Marketplace and clicking Get it now.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

You should see the following page. Click on the Create button at the bottom.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Enter the requested information in the fields at the prompts. Note that Azure imposes syntax limitations on some of the fields (such as using only alphanumeric characters and no CAPS). Use lowercase letters and digits and you will be fine. Use the following three screenshots for guidance.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Spark is memory-intensive, so select a machine with a larger amount of RAM. Note that not all VM types and sizes are available in certain regions. For simple tasks and testing, the virtual machine displayed in the following screenshot will meet requirements:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

After the VM is provisioned, copy its public IP address. Note that this public IP address will change every time you stop and restart your VM. Keep this in mind if you are thinking of BigDL automation.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

After deployment, you can modify the IP address provided in the resource group and set it up as a static IP address:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

You are now ready to SSH into your BigDL VM. You can use your favorite SSH client. For this example, MobaXterm is used.

Enter the IP address and the username you selected when creating the VM.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Check the versions of installed dependencies:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Before using pre-installed BigDL, you will need to change ownership of the directory.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

BigDL was pre-installed into the bigdlazrmktplc directory. Yet ‘testuser’ does not have full privileges to access it.

To change this, type:

$sudo chown -R testuser:testuser bigdlazrmktplc

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Now ‘testuser’ owns the bigdlazrmktplc directory.

Finally, test that BigDL actually works in this VM by entering the following commands:

$cd bigdlazrmktplc/BigDL
$export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
$export BIGDL_HOME=/home/bigdlazrmktplc/BigDL
$BigDL/bin/pyspark-with-bigdl.sh --master local[*]

If the commands are successful you will see the following:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

At the command prompt, copy and paste the following example code, the source can be found on GitHub

from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version
# create sparkcontext with bigdl configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine() # prepare the bigdl environment
bigdl.version.__version__ # Get the current BigDL version
linear = Linear(2, 3) # Try to create a Linear layer

If the commands are successful, you will see the following:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

BigDL is now ready for you to use.

Second Scenario: Deploying BigDL_v0.4 on a bare-bones Ubuntu Azure VM


First, you will need to create an Azure subscription. You can get a free trial by navigating to BigDL offering on Azure Marketplace and clicking Get it now.

Log in to the Azure Portal, go to New, and select Ubuntu server 16.04 LTS VM (LTS = Long Term Support).

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Enter the basic VM attributes using only lower-case letters and numbers.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

For Spark jobs you want to select VMs with a large amount of RAM available.

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

After your VM has been created, you can SSH into it using the username and password which you created previously.

Copy the Public IP address of the VM:

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

This creates a very basic Ubuntu machine, so you must install the following addtional components to run BigDL:

◈ Java Runtime Environment (JRE)
◈ Scala
◈ Spark
◈ Python packages
◈ BigDL

Installing the Java Runtime Environment (JRE)


At the command prompt, type the following commands:

$sudo add-apt-repository ppa:webupd8team/java
$sudo apt-get update
$sudo apt-get install oracle-java8-installer
$sudo apt-get install oracle-java8-set-default

Confirm the installation and JRE version by typing

$java -version

Installing Scala and confirming version


At the command prompt, type the following commands:

$sudo apt-get install scala
$scala -version

Installing Spark 2.2.x


At the command prompt, type the following commands:

$sudo wget http://mirrors.ibiblio.org/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
$sudo tar xvzf spark-2.2.0-bin-hadoop2.7.tgz
$rm spark-2.2.0-bin-hadoop2.7.tgz
$sudo mkdir /usr/local/spark
$sudo mv spark-2.2.0-bin-hadoop2.7 /usr/local/spark
Verify Spark installation:
$cd  /usr/local/spark/spark-2.2.0-bin-hadoop2.7/
$./bin/spark-submit –version

Microsoft Azure BigDL, Microsoft Azure Certifications, Microsoft Azure Learning, Microsoft Azure Guides, Microsoft Azure VM

Installing BigDL


The main repo for BigDL downloadable releases.

For Spark 2.2.0 and Scala 2.11.x, select Dist-spark-2.2.0-scala-2.11.8-all-0.4.0-dist.zip

At the command prompt, type the following commands:

$cd ~
$mkdir BigDL
$cd BigDL
$sudo wget https://s3-ap-southeast-1.amazonaws.com/bigdl-download/dist-spark-2.2.0-scala-2.11.8-all-0.4.0-dist.zip

 $sudo apt-get install unzip
 $unzip dist-spark-2.2.0-scala-2.11.8-all-0.4.0-dist.zip
 $rm dist-spark-2.2.0-scala-2.11.8-all-0.4.0-dist.zip

Installing Python 2.7 packages


Ubuntu 16x on Azure comes with pre-installed python 2.7. However, there are a couple of additional packages that must be installed.
At the command prompt, type the following commands:

$sudo apt-get install python-numpy
$sudo apt-get install python-six

Update all packages and dependencies by typing

$sudo apt-get update

Verify BigDL installation


The instructions to verify that BigDL was installed correctly are available.

At the command prompt, type the following commands:

$export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
$export BIGDL_HOME=/home/bigdlazrmktplc/BigDL
Launch PySpark (from BigDL directory)

$bin/pyspark-with-bigdl.sh --master local[*]

At the prompt, copy and paste the following code, this code can also be found at Github.

from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version
# create sparkcontext with bigdl configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine() # prepare the bigdl environment
bigdl.version.__version__ # Get the current BigDL version
linear = Linear(2, 3) # Try to create a Linear layer

You should see the following:

creating: createLinear
cls.getname: com.intel.analytics.bigdl.python.api.Sample
BigDLBasePickler registering: bigdl.util.common  Sample
cls.getname: com.intel.analytics.bigdl.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.util.common  EvaluatedResult
cls.getname: com.intel.analytics.bigdl.python.api.JTensor
BigDLBasePickler registering: bigdl.util.common  JTensor
cls.getname: com.intel.analytics.bigdl.python.api.JActivity
BigDLBasePickler registering: bigdl.util.common  JActivity
>>>

Finally, install Maven to allow you to build BigDL applications by typing the following:

$sudo apt-get install maven

Your VM is now ready for running deep learning examples at scale.

Related Posts

0 comments:

Post a Comment