Thursday, 30 August 2018

Multi-member consortium support with Azure Blockchain Workbench 1.3.0

Continuing our monthly release cadence for Azure Blockchain Workbench, we’re excited to announce the availability of version 1.3.0. You can either deploy a new instance of Workbench through the Azure Portal or upgrade your existing deployment to 1.3.0 using our upgrade script.

This update includes the following improvements:

Faster and more reliable deployment


We look at telemetry every day to identify issues that affect our customers, and as a result we've made changes that make deploying Workbench not only more reliable, but faster as well.

Better transaction reliability


Continuing from the monitoring improvements we made as part of 1.1.0, we’ve made reliability improvements to the DLT Watcher and DLT Consumer microservices. Practically speaking, you’ll notice fewer errors saying, “It looks like something went wrong …”

Ability to deploy Workbench in a multi-member Ethereum PoA consortium


With release 1.2.0 you could deploy Workbench and connect that deployment to an existing Ethereum-based network. This past week we announced the availability of a new standalone Ethereum PoA solution, which can be deployed across members within a consortium. With these two updates, you can deploy Workbench in three different configurations:

1. Single-Member System: The default configuration of Workbench, where Workbench is deployed in a blockchain network with only one member.

2. Multi-Member System Deployed in One Member’s Subscription: You can use the new multi-member PoA consortium solution to deploy a blockchain network across several members. Then, you can deploy Workbench in one member’s subscription. Everyone who wants to use Workbench will go through that member’s Workbench deployment. This topology can be useful for PoCs and initial pilot deployments.

3. Multi-Member System Deployed in One Shared Subscription: This configuration is similar to the topology described above, except that Workbench is deployed in a shared subscription. Think of this shared subscription as the operator subscription for the consortium.

We are investigating other topologies, such as one where Workbench is deployed into each member’s subscription.

Simpler pre-deployment script for AAD


We know going through Workbench deployment instructions can feel like a lot of work, especially setting up AAD and registering an app for the Workbench API. To make things easier, we’ve created a new PowerShell script, which automates most of the AAD setup steps and outputs the parameters you need for Workbench deployment.

Sample code and tool for working with the Workbench API


Some of you have asked for more sample code and tools related to generating authentication bearer tokens for Workbench API. We’re excited to announce a new tool, which you can use to generate tokens for your Workbench instance. Source code is also available and can be used to create your own client authentication experience. Try it out by cloning the repo, running the Web page, and plugging in your Application Id.

Tuesday, 28 August 2018

Respond to threats faster with Security Center’s Confidence Score

Azure Security Center provides you with visibility across all your resources running in Azure and alerts you to potential or detected issues. The volume of alerts can be challenging for a security operations team to address individually, so analysts have to prioritize which alerts they investigate. Investigating alerts can be complex and time consuming, and as a result some alerts are ignored.

Security Center can help your team triage and prioritize alerts with a new capability called Confidence Score. The Confidence Score automatically investigates alerts by applying industry best practices, intelligent algorithms, and processes used by analysts to determine whether a threat is legitimate and provides you with meaningful insights.

How is the Azure Security Center Confidence Score triggered?


Alerts are generated when suspicious processes are detected running on your virtual machines. Security Center reviews and analyzes these alerts on Windows virtual machines running in Azure. It performs automated checks and correlations using advanced algorithms across multiple entities and data sources spanning the organization and all your Azure resources.

Results of Azure Security Center Confidence Score


The Confidence Score ranges from 1 to 100 and represents the confidence that the alert should be investigated. The higher the score, the higher the confidence that the alert indicates true malicious activity. Additionally, the Confidence Score provides a list of the top reasons why the alert received its score. This makes it easier for security analysts to prioritize their response to alerts and address the most pressing attacks first, ultimately reducing the amount of time it takes to respond to attacks and breaches.

You can find the Confidence Score in the Security alerts blade, where the alerts and incidents are ordered based on Security Center’s confidence that they are legitimate threats. In this example, the incident Suspicious screensaver process execution received a confidence score of 91.

When you drill down in the Security alert blade, the Confidence section lists the observations that contributed to the confidence score, giving you more insight into the nature of the activities that caused the alert.

Use Security Center’s Confidence Score to prioritize alert triage in your environment. The confidence score saves you time and effort by automatically investigating alerts, applying industry best practices and intelligent algorithms, and acting as a virtual analyst to determine which threats are real and where you need to focus your attention.

Saturday, 25 August 2018

How to enhance HDInsight security with service endpoints

HDInsight enterprise customers work with some of the most sensitive data in the world, and they want to be able to lock down access to that data at the networking layer as well. However, while service endpoints have been available for Azure data sources, HDInsight customers couldn’t leverage this additional layer of security for their big data pipelines due to the lack of interoperability between HDInsight and other data stores. As we recently announced, HDInsight now supports service endpoints for Azure Blob Storage, Azure SQL Database, and Azure Cosmos DB.

With this enhanced level of security at the networking layer, customers can now lock down their big data storage accounts to their specified Virtual Networks (VNETs) and still use HDInsight clusters seamlessly to access and process that data.

In the rest of this post we will explore how to enable service endpoints and point out important HDInsight configurations for Azure Blob Storage, Azure SQL DB, and Azure Cosmos DB.

Azure Blob Storage


When using Azure Blob Storage with HDInsight, you can configure selected VNETs in the blob storage firewall settings. This ensures that only traffic from those subnets can access the storage account.

It is important to check "Allow trusted Microsoft services to access this storage account." This ensures that the HDInsight service can access the storage account and provision the cluster seamlessly.

If the storage account is in a different subscription than the HDInsight cluster, please make sure that the HDInsight resource provider is registered with the storage subscription. If the resource provider is not registered properly, you might see an error, which can be resolved by registering the resource provider.

NOTE: The HDInsight cluster must be deployed into one of the subnets allowed in the blob storage firewall. This ensures that traffic from the cluster VMs can reach the storage account.

Azure SQL DB


If you are using an external SQL DB for the Hive or Oozie metastore, you can configure service endpoints there as well. “Allow access to Azure services” is not a required step from HDInsight’s point of view, since these databases are accessed only after the cluster is created and the VMs are injected into the VNET.

NOTE: The HDInsight cluster must be deployed into one of the subnets allowed in the SQL DB firewall. This ensures that traffic from the cluster VMs can reach the SQL DB.

Azure Cosmos DB


If you are using the Spark connector for Azure Cosmos DB, you can enable service endpoints in the Cosmos DB firewall settings and seamlessly connect to it from the HDInsight cluster.

NOTE: The HDInsight cluster must be deployed into one of the VNETs allowed in the Cosmos DB firewall. This ensures that traffic from the cluster VMs can reach Cosmos DB.

About HDInsight


Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Today, we are excited to announce several new capabilities across a wide range of OSS frameworks.

Azure HDInsight powers some of our top customers’ mission-critical applications across a wide variety of sectors, including manufacturing, retail, education, nonprofit, government, healthcare, media, banking, telecommunications, and insurance, with use cases ranging from ETL to data warehousing and from machine learning to IoT.

Thursday, 23 August 2018

Logic Apps, Flow connectors will make Automating Video Indexer simpler than ever

Video Indexer recently released a new and improved Video Indexer V2 API. This RESTful API supports both server-to-server and client-to-server communication and enables Video Indexer users to integrate video and audio insights easily into their application logic, unlocking new experiences and monetization opportunities.
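
For teams that prefer to call the API directly from code rather than through the connectors, a server-to-server call is typically a two-step affair: obtain an account access token, then upload a video with a callback URL. The following is only a rough TypeScript sketch (Node 18+), and the routes, query parameters, and placeholder values below are assumptions based on the public Video Indexer V2 API reference; verify them against the current documentation before use.

// Illustrative only: account details and routes are placeholders/assumptions.
const location = "trial";                      // or the Azure region of a paid account
const accountId = "<your-account-id>";
const apiKey = "<your-api-key>";               // subscription key from the Video Indexer developer portal

async function uploadAndIndex(videoUrl: string, callbackUrl: string): Promise<string> {
  // Step 1: get an account access token (assumed route).
  const tokenRes = await fetch(
    `https://api.videoindexer.ai/auth/${location}/Accounts/${accountId}/AccessToken?allowEdit=true`,
    { headers: { "Ocp-Apim-Subscription-Key": apiKey } }
  );
  const accessToken = (await tokenRes.json()) as string;

  // Step 2: upload the video by URL and register the callback for when indexing completes.
  const params = new URLSearchParams({ accessToken, name: "my-video", videoUrl, callbackUrl });
  const uploadRes = await fetch(
    `https://api.videoindexer.ai/${location}/Accounts/${accountId}/Videos?${params}`,
    { method: "POST" }
  );
  const video = (await uploadRes.json()) as { id: string };
  return video.id;   // keep the id to fetch the generated insights later
}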

To make the integration even easier, we also added new Logic Apps and Flow connectors that are compatible with the new API. Using the new connectors, you can now set up custom workflows to effectively index and extract insights from a large amount of video and audio files, without writing a single line of code! Furthermore, using the connectors for your integration gives you better visibility on the health of your flow and an easy way to debug it.

To help you get started quickly with the new connectors, we’ve added Microsoft Flow templates that use the new connectors to automate extraction of insights from videos. In this blog, we will walk you through those example templates.

Upload and index your video automatically


This scenario is composed of two flows that work together. The first flow is triggered when a new file is added to a designated folder in a OneDrive account. It uploads the new file to Video Indexer with a callback URL to send a notification once the indexing operation completes. The second flow is triggered by that callback and saves the extracted insights back to a JSON file in OneDrive. Two flows are used so that the upload and indexing of larger files can happen asynchronously.

Setting up the file upload flow


Navigate to the first template page.

To set up this flow, you will need to provide your Video Indexer API Key and OneDrive credentials.

Once both keys are provided, green check marks will appear near your accounts, and you can continue to the flow itself and configure it for your needs.

Select a folder that you will place videos in:

Fill in your account Location and ID to get the Video Indexer account token and call the file upload request.

For file upload, you can decide to use the default values, or click on the connector to add additional settings. Notice that you will leave the callback URL empty for now; you’ll add it only after finishing the second template, where the callback URL is created.

Click “Save flow,” and let’s move on to configure the second flow, which extracts the insights once the upload completes.

Setting up the JSON extraction flow


Navigate to the second template page.

To set up this flow, you will need to provide your Video Indexer API Key and OneDrive credentials. You will need to update the same parameters as you did for the first flow.

Then, continue to configure the flow.

Fill in your account Location and ID to get your Video Indexer Account Token and the Indexing result.

Then, select the OneDrive folder to save the insights to. You can also edit the default parameter if you would like to change the name of the JSON file containing the insights.

Click “Save flow.”

Once the flow is saved, a URL is created in the trigger. Copy the URL from the trigger:

Now, go back to the first flow created and paste the URL in the “Upload video and index” operation under the Callback URL parameter:

Make sure both templates are saved, and you’re good to go!

Try out your newly created flow by adding a video to your OneDrive folder, and go back a few minutes later to see that the insights appear in the destination folder.

Endless integration possibilities

And this is just one example! You can use the new connector for any API call provided by Video Indexer: upload and retrieve insights, translate the results, get delightful embeddable widgets, and even customize your models. Additionally, you can choose to trigger those actions based on different sources, like updates to file repositories or emails sent, and have the results update your relevant infrastructure or application, or even generate any number of action items. And you can do all that for a large number of files, without coding or repetitive manual work. Go ahead and try it now for any flow that works for your business needs; it's easy.

Tuesday, 21 August 2018

Azure Block Blob Storage Backup

Azure Blob Storage is Microsoft's massively scalable cloud object store. Blob Storage is ideal for storing any unstructured data such as images, documents and other file types.

The data in Azure Blob Storage is always replicated to ensure durability and high availability. Azure Storage replication copies your data so that it is protected from planned and unplanned events, including transient hardware failures, network or power outages, and massive natural disasters. You can choose to replicate your data within the same data center, across zonal data centers within the same region, or even across regions.

Although Blob storage supports replication out of the box, it's important to understand that this replication does not protect against application errors. Any problems at the application layer are also committed to the replicas that Azure Storage maintains. For this reason, it can be important to maintain backups of blob data in Azure Storage.

Currently Azure Blob Storage doesn’t offer an out-of-the-box solution for backing up block blobs. In this blog post, I will design a back-up solution that can be used to perform weekly full and daily incremental back-ups of storage accounts containing block blobs for any create, replace, and delete operations. The solution also walks through storage account recovery should it be required.

The solution makes use of the following technologies to achieve this back-up functionality:

◈ Azure Storage Queues – In our scenario, we will publish storage events to Azure Storage Queues to support daily incremental back-ups.

◈ Azcopy – AzCopy is a command-line utility designed for copying data to/from Microsoft Azure Blob, File, and Table storage, using simple commands designed for optimal performance. You can copy data between a file system and a storage account, or between storage accounts. In our scenario we will use AzCopy to achieve full back-up functionality and will use it to copy the content from one storage account to another storage account.

◈ Event Grid – Azure Storage events allow applications to react to the creation and deletion of blobs without the need for complicated code or expensive and inefficient polling services. Instead, events are pushed through Azure Event Grid to subscribers such as Azure Functions, Azure Logic Apps, or Azure Storage Queues.

◈ Event Grid extension – Used to store the storage events in Azure Queue storage. At the time of writing this blog, this feature is in preview. To use it, you must install the Event Grid extension for Azure CLI. You can install it with az extension add --name eventgrid.

◈ Docker Container – Hosts the listener that reads the events from Azure Queue Storage. Please note the sample code provided with this blog is a .NET Core application that can be hosted on a platform of your choice; it has no hard dependency on Docker containers.

◈ Azure Table Storage – Stores the event metadata for incremental back-ups and is used while performing a restore. Please note, you can store the event metadata in a database of your choice, such as Azure SQL Database or Cosmos DB; changing the database will require code changes in the sample solution.

Introduction


Based on my experience in the field, I have noticed that most customers require full and incremental backups taken on specific schedules. Let’s say you have a requirement to have weekly full and daily incremental backups. In the case of a disaster, you need a capability to restore the blobs using the backup sets.

High Level architecture/data flow


Here is the high-level architecture and data flow of the proposed solution to support incremental back-up.

Here is the detailed logic followed by the .NET Core-based listener while copying the data for an incremental backup from the source storage account to the destination storage account.

While performing the back-up operation, the listener performs the following steps:

1. Creates a new blob container in the destination storage account for every year, like “2018”.

2. Creates a logical sub folder for each week under the year container, like “wk21”. If no files are created or deleted in wk21, no logical folder will be created. CalendarWeekRule.FirstFullWeek is used to determine the week number.

3. Creates a logical sub folder for each day of the week under the year and week container, like dy0, dy1, dy2. If no files are created or deleted on a given day, no logical folder will be created for that day.

4. While copying the files, the listener changes the source container names to logical folder names in the destination storage account.

Example:

SSA1 (Source Storage Account) -> Images (Container) –> Image1.jpg

Will move to:

DSA1 (Destination Storage Account) -> 2018 (Container)-> WK2 (Logical Folder) -> dy0 (Logical Folder) -> Images (Logical Folder) –> Image1.jpg
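
A minimal TypeScript sketch of that container-to-logical-folder mapping is shown below. This is an illustration of the naming convention only, not the actual .NET listener from the sample, and the week calculation here is a simplified assumption rather than the CalendarWeekRule.FirstFullWeek rule the sample uses.

// Compute the destination blob path for an incremental backup copy.
// Example: ("Images", "Image1.jpg", new Date("2018-01-09")) -> "2018/wk2/dy2/Images/Image1.jpg"
function destinationPath(sourceContainer: string, blobName: string, eventTime: Date): string {
  const year = eventTime.getUTCFullYear();

  // Simplified week number: days elapsed in the year divided by 7.
  const startOfYear = Date.UTC(year, 0, 1);
  const dayOfYear = Math.floor((eventTime.getTime() - startOfYear) / 86_400_000);
  const week = Math.floor(dayOfYear / 7) + 1;

  const dayOfWeek = eventTime.getUTCDay();   // 0 = Sunday ... 6 = Saturday

  // Year becomes the destination container; week, day, and source container become logical folders.
  return `${year}/wk${week}/dy${dayOfWeek}/${sourceContainer}/${blobName}`;
}

// The subject of an Event Grid blob event looks like
// "/blobServices/default/containers/<container>/blobs/<path>", so the source container
// and blob name can be parsed straight from the queued event.
function pathFromEventSubject(subject: string, eventTime: Date): string {
  const match = subject.match(/containers\/([^/]+)\/blobs\/(.+)$/);
  if (!match) throw new Error(`Unexpected event subject: ${subject}`);
  const [, container, blob] = match;
  return destinationPath(container, blob, eventTime);
}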

Here are the high-level steps to configure incremental backup

1. Create a new storage account (destination) where you want to take the back-up.
2. Create an event grid subscription for the storage account (source) to store the create/replace and delete events into Azure Storage queue. The command to set up the subscription is provided on the samples site.
3. Create a table in Azure Table storage where the event grid events will finally be stored by the .Net Listener.
4. Configure the .Net Listener (backup.utility) to start taking the incremental backup. Please note there can be as many instances of this listener as needed to perform the backup, based on the load on your storage account.

Here are the high-level steps to configure full backup

1. Schedule AzCopy at the start of the week, i.e., Sunday 12:00 AM, to move the complete data from the source storage account to the destination storage account.

2. Use AzCopy to move the data into a logical folder named “fbkp” under the corresponding year container and week folder in the destination storage account.

3. You can schedule AzCopy on a VM, as a Jenkins job, etc., depending on your technology landscape.

In case of a disaster, the solution provides an option to restore the storage account by choosing one weekly full back-up as a base and applying the changes from the incremental back-ups on top of it. Please note this is just one of the possible approaches: you may choose to restore by applying only the logs from the incremental backups, but that can take longer depending on the period being restored.

Here are the high-level steps to configure restore

1. Create a new storage account (destination) where the data needs to be restored.

2. Move the data from the full backup folder “fbkp” to the destination storage account using AzCopy.

3. Initiate the incremental restore process by providing the start date and end date to restore.utility. Details on the configuration are provided on the samples site.

For example, the restore process reads the data from Table storage for the period 01/08/2018 to 01/10/2018 sequentially to perform the restore.

For each record read, the restore process adds, updates, or deletes the corresponding file in the destination storage account.
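
Conceptually, the incremental restore is a sequential replay of those logged events. The TypeScript sketch below illustrates that loop only; the record shape and the copyBlob/deleteBlob helpers are hypothetical, and the sample solution implements this in .NET against Azure Table storage.

// Assumed shape of a logged event for this sketch.
interface BackupLogRecord {
  timestamp: Date;
  operation: "create" | "replace" | "delete";
  container: string;
  blobName: string;
  backupPath: string;   // where the incremental backup copy lives, e.g. "2018/wk2/dy2/Images/Image1.jpg"
}

// Replay all logged operations between start and end, in order, against the destination account.
async function replayIncrementalLog(
  records: BackupLogRecord[],
  start: Date,
  end: Date,
  copyBlob: (from: string, to: string) => Promise<void>,    // hypothetical helper
  deleteBlob: (path: string) => Promise<void>                // hypothetical helper
): Promise<void> {
  const inRange = records
    .filter(r => r.timestamp >= start && r.timestamp <= end)
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());

  for (const record of inRange) {
    const target = `${record.container}/${record.blobName}`;
    if (record.operation === "delete") {
      await deleteBlob(target);                   // the file was deleted in the period: remove it
    } else {
      await copyBlob(record.backupPath, target);  // create/replace: copy the backed-up version over
    }
  }
}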

Supported Artifacts


Find the source code and instructions to set up the back-up solution.

Considerations/limitations


◈ Blob Storage events are available in Blob Storage accounts and General Purpose v2 storage accounts only. Hence, the storage account configured for the back-up should be either a Blob storage account or a General Purpose v2 account.

◈ Blob storage events are fired for create, replace, and delete operations. Hence, in-place modifications to blobs are not captured at this point in time, but they will be supported eventually.

◈ In case a user creates a file at T1 and deletes the same file at T10 before the backup listener has copied it, you won’t be able to restore that file from the backup. For these kinds of scenarios, you can enable soft delete on your storage account and either modify the solution to support restoring from soft delete or recover these missed files manually.

◈ Since the restore process executes the restore operation by reading the logs sequentially, it can take a considerable amount of time to complete. The actual time can span hours or days, and the correct duration can be determined only by performing a test.

◈ AzCopy is used to perform the weekly full back-up. The duration of execution will depend on the data size and can span hours or days.

Saturday, 18 August 2018

Azure HDInsight Interactive Query: simplifying big data analytics architecture

Traditional approach to fast interactive BI


Deep analytical queries processed on Hadoop systems have traditionally been slow. MapReduce jobs and Hive queries are used for heavy processing of large datasets, but they are not suitable for the fast response times required by interactive BI usage.

Faced with user dissatisfaction due to the lack of query interactivity, data architects used techniques such as building OLAP cubes on top of Hadoop. An OLAP cube is a mechanism to store all the different dimensions, measures, and hierarchies up front. Processing the cube usually takes place at a pre-specified interval. After processing, results are available in advance, so once the BI tool queries the cube it just needs to locate the result, which limits the query response time and makes the experience fast and interactive. Since all measures get pre-aggregated by all levels and categories in the dimensions, this approach is highly suitable for interactivity and fast response times, and it is especially suitable if you need to light up summary views.

The above approach works for certain scenarios, but not all. It tends to break down with large big data implementations, especially for use cases where many power users and data scientists are writing ad-hoc queries.

Here are the key challenges:

◈ OLAP cubes require precomputation to create aggregates, which introduces latency. Businesses across all industries are demanding more from their reporting and analytics infrastructure within shorter business timeframes, and OLAP cubes can’t deliver real-time analysis.

◈ In big data analytics, precomputation puts a heavy burden on the underlying Hadoop system, creating unsustainable pressure on the entire big data pipeline and severely hampering its performance, reliability, and stability.

◈ This type of architecture forces large dataset movement between different systems, which works well at small scale but falls apart at large data scale. Keeping data hot and fresh across multiple tiers is challenging.

◈ Power users and data scientists require a lot more agility and freedom in terms of their ability to experiment using sophisticated ad-hoc queries, which puts an additional burden on the overall system.

Azure HDInsight Interactive query overview


One of the most exciting new features of Hive 2 is Low Latency Analytical Processing (LLAP), which produces significantly faster queries on raw data stored in commodity storage systems such as Azure Blob storage or Azure Data Lake Store.

This reduces the need to introduce additional layers to enable fast interactive queries.

Key benefits of introducing Interactive Query in your big data BI architecture:


Extremely fast interactive queries: Intelligent caching and optimizations in Interactive Query produce blazing-fast query results on remote cloud storage, such as Azure Blob storage and Azure Data Lake Store. Interactive Query enables data analysts to query data interactively in the same storage where the data is prepared, eliminating the need to move data from storage to another analytical engine.

HDInsight Interactive Query (LLAP) leverages a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is very similar to Hive without LLAP, except that worker tasks run inside LLAP daemons rather than in containers.

Lifecycle of a query: After a client submits a JDBC query, the query arrives at HiveServer2 Interactive, which is responsible for query planning, optimization, and security trimming. Since every query is submitted via HiveServer2, it becomes the single place to enforce security policies.

File format versatility and intelligent caching: Fast analytics on Hadoop have always come with one big catch: they require up-front conversion to a columnar format like ORCFile, Parquet, or Avro, which is time-consuming, complex, and limits your agility.

With the Interactive Query Dynamic Text Cache, which converts CSV or JSON data into an optimized in-memory format on the fly, caching is dynamic, so the queries determine what data is cached. After text data is cached, analytics run just as fast as if you had converted the data to specific file formats.

The Interactive Query SSD cache combines RAM and SSD into a giant pool of memory, with all the other benefits the LLAP cache brings. With the SSD cache, a typical server profile can cache 4x more data, letting you process larger datasets or support more users. The Interactive Query cache is aware of underlying data changes in the remote store (Azure Storage); if the underlying data changes and a user issues a query, the updated data is loaded into memory without requiring any additional user steps.

Concurrency: With the introduction of much-improved fine-grained resource management, preemption, and the sharing of cached data across queries and users, Interactive Query (Hive on LLAP) works better for concurrent users.

In addition, HDInsight supports creating multiple clusters on shared Azure storage and a shared Hive metastore, which helps achieve a high degree of concurrency: you can scale concurrency by simply adding more cluster nodes or adding more clusters pointing to the same underlying data and metadata.

Simplified and scalable architecture with HDInsight Interactive Query


By introducing Interactive Query to your architecture, you can now route power users, data scientists, and data engineers to hit Interactive Query directly. This architectural improvement reduces the overall burden on the BI system, increases user satisfaction thanks to fast interactive query responses, and adds the flexibility to run ad-hoc queries at will.

In the architecture described above, users who want to see summary views can still be served by OLAP cubes, while all other users leverage Interactive Query to submit their queries.

Security model


Like Hadoop and Spark clusters, HDInsight Interactive Query leverages Azure Active Directory and Apache Ranger to provide fine-grain access control and auditing. 

In HDInsight Interactive Query, access restriction logic is pushed down into the Hive layer and Hive applies the access restrictions every time data access is attempted. This helps simplify authoring of the Hive queries and provides seamless behind-the-scenes enforcement without having to add this logic to the predicate of the query. 

User adoption with familiar tools


In big data analytics, organizations are increasingly concerned that their end users aren’t getting enough value out of the analytics systems because it is often too challenging and requires using unfamiliar and difficult-to-learn tools to run the analytics. HDInsight Interactive Query addresses this issue by requiring minimal to no new user training to get insight from the data. Users can write SQL queries (HQL) in the tools they already use and love the most. Out of the box, HDInsight Interactive Query supports tools such as Visual Studio Code, Power BI, Apache Zeppelin, Visual Studio, Ambari Hive View, Beeline, and Hive ODBC.

Built to complement Spark, Hive, Presto, and other big data engines


HDInsight Interactive Query is designed to work well with popular big data engines such as Apache Spark, Hive, Presto, and more. This is especially useful because your users may choose any one of these tools to run their analytics. With HDInsight’s shared data and metadata architecture, users can create multiple clusters with the same or a different engine pointing to the same underlying data and metadata. This is a very powerful concept, as you are no longer bound to one technology for analytics.

Try HDInsight now


We hope you will take full advantage of the fast query capabilities of HDInsight Interactive Query. We are excited to see what you will build with Azure HDInsight. Read the developer guide and follow the quick start guide to learn more about implementing these pipelines and architectures on Azure HDInsight.

About HDInsight


Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Azure HDInsight powers mission-critical applications across a wide variety of sectors, including manufacturing, retail, education, nonprofit, government, healthcare, media, banking, telecommunications, and insurance, with use cases ranging from ETL to data warehousing and from machine learning to IoT.

Thursday, 16 August 2018

New customizations in Azure Migrate to support your cloud migration

You can accelerate your cloud migration using intelligent migration assessment services like Azure Migrate. Azure Migrate is a generally available service, offered at no additional charge, that helps you plan your migration to Azure.

Azure Migrate discovers servers in your on-premises environment and assesses each discovered server’s readiness to run as an IaaS VM in Azure. In addition to Azure readiness, it helps you identify the right VM size in Azure after considering the utilization history of the on-premises VM. Azure Migrate also provides cost analysis for running the on-premises VMs in Azure. Additionally, if you have legacy applications, identifying the servers that constitute your application can become very complex. Azure Migrate helps you visualize dependencies of your on-premises VMs, so you can create high-confidence move groups and ensure that you are not leaving anything behind when you are moving to Azure.

Azure Migrate currently supports discovery and assessment of VMware-virtualized Windows and Linux VMs; support for Hyper-V will be enabled in the future.

When it comes to migration planning, every organization is unique and there is no one-size-fits-all solution. Each organization has its own needs, and in this blog post I am going to talk about some of the new features added in Azure Migrate that can help you plan your migration to the cloud effectively, based on those needs.

Here is a summary of the features recently enabled in Azure Migrate allowing you to customize the assessments to meet your migration needs.

◈ Reserved Instances (RI): Azure Migrate now allows you to plan migration to Azure with your virtual machines running in Azure VM Reserved Instances (RIs). Azure Migrate will model the cost advantages of RIs in the assessments. You can customize your assessment to include use of RIs by editing the assessment settings.

◈ VM series: The Azure Migrate assessment provides a new control to specify the VM series to be used for size recommendations. Each VM series in Azure is designed for a certain kind of workload. For example, if you have a production workload that is IO intensive, you may not want to move the VMs to A-series VMs in Azure, which are more suitable for entry-level workloads. Using Azure Migrate, you can now apply your organization’s preferences and get recommendations based on your choices for the selected VM series.

◈ Storage type: Many organizations want to move their VMs to VM sizes in Azure that provide a single instance VM SLA of > 99.9 percent. You can achieve the same if all disks attached to the VM are premium disks. If you have such SLA requirements, you can use the storage type property in an assessment and specify premium disks as your storage type to ensure that the VM size recommendations are done accordingly.

◈ VM uptime: Not all workloads need to run 24x7 on the cloud. For example, if you have Dev-Test VMs that you only run 10 hours a day and only five days a week, you can now use the VM uptime property in assessment to specify the uptime details of the VMs and the cost estimation of running the VMs in Azure will be done accordingly.

◈ Azure Government: If you are planning to migrate your workloads to the Azure Government cloud, you can now use Azure Migrate to plan your migrations with Azure Government as a target location. In addition to Azure Government, Azure Migrate already supports more than 30 different regions as assessment target locations.

◈ Windows Server 2008 (32-bit & 64-bit): If you have Windows Server 2008 VMs in your environment, you can now leverage the option to get free extended security updates by migrating these VMs to Azure. Azure Migrate will now help you identify such VMs as Azure ready VMs which can be migrated to Azure using Azure Site Recovery.

With the enhanced features, Azure Migrate provides you a lot of power and flexibility to customize your migration plan based on your own requirements.

Once you are ready with your migration plan, you can use services like Azure Site Recovery and Database Migration Service to migrate your workloads to Azure. Use the Azure migration center to learn when to use these services in your selected migration strategy.

We are continuously improving Azure Migrate based on the feedback that we have been receiving from all of you. If you have any ideas or see any gaps in Azure Migrate, we would love to hear it from you.

Tuesday, 14 August 2018

What is Artificial Intelligence?

It has been said that Artificial Intelligence will define the next generation of software solutions. If you are even remotely involved with technology, you will almost certainly have heard the term with increasing regularity over the last few years. It is likely that you will also have heard different definitions for Artificial Intelligence offered, such as:

“The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.” – Encyclopedia Britannica

“Intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans.” – Wikipedia

How useful are these definitions? What exactly are “tasks commonly associated with intelligent beings”? For many people, such definitions can seem too broad or nebulous. After all, there are many tasks that we can associate with human beings! What exactly do we mean by “intelligence” in the context of machines, and how is this different from the tasks that many traditional computer systems are able to perform, some of which may already seem to have some level of intelligence in their sophistication? What exactly makes the Artificial Intelligence systems of today different from sophisticated software systems of the past?

It could be argued that any attempt to try to define “Artificial Intelligence” is somewhat futile, since we would first have to properly define “intelligence”, a word which conjures a wide variety of connotations. Nonetheless, this article attempts to offer a more accessible definition for what passes as Artificial Intelligence in the current vernacular, as well as some commentary on the nature of today’s AI systems, and why they might be more aptly referred to as “intelligent” than previous incarnations.

Firstly, it is interesting and important to note that the technical difference between what used to be referred to as Artificial Intelligence over 20 years ago and traditional computer systems is close to zero. Prior attempts to create intelligent systems, known as expert systems at the time, involved the complex implementation of exhaustive rules that were intended to approximate intelligent behavior. For all intents and purposes, these systems did not differ from traditional computers in any drastic way other than having many thousands more lines of code. The problem with trying to replicate human intelligence in this way was that it requires far too many rules and ignores something very fundamental to the way intelligent beings make decisions, which is very different from the way traditional computers process information.

Let me illustrate with a simple example. Suppose I walk into your office and I say the words “Good Weekend?” Your immediate response is likely to be something like “yes” or “fine thanks”. This may seem like very trivial behavior, but in this simple action you will have immediately demonstrated a behavior that a traditional computer system is completely incapable of. In responding to my question, you have effectively dealt with ambiguity by making a prediction about the correct way to respond. It is not certain that by saying “Good Weekend” I actually intended to ask you whether you had a good weekend. Here are just a few possible intents behind that utterance:

◈ Did you have a good weekend?
◈ Weekends are good (generally).
◈ I had a good weekend.
◈ It was a good football game at the weekend, wasn’t it?
◈ Will the coming weekend be a good weekend for you?

And more.

The most likely intended meaning may seem obvious, but suppose that when you respond with “yes”, I had responded with “No, I mean it was a good football game at the weekend, wasn’t it?”. It would have been a surprise, but without even thinking, you will absorb that information into a mental model, correlate the fact that there was an important game last weekend with the fact that I said “Good Weekend?” and adjust the probability of the expected response for next time accordingly so that you can respond correctly next time you are asked the same question. Granted, those aren’t the thoughts that will pass through your head! You happen to have a neural network (aka “your brain”) that will absorb this information automatically and learn to respond differently next time.

The key point is that even when you do respond next time, you will still be making a prediction about the correct way in which to respond. As before, you won’t be certain, but if your prediction fails again, you will gather new data, which leads to my suggested definition of Artificial Intelligence, as it stands today:

“Artificial Intelligence is the ability of a computer system to deal with ambiguity, by making predictions using previously gathered data, and learning from errors in those predictions in order to generate newer, more accurate predictions about how to behave in the future”.

This is a somewhat appropriate definition of Artificial Intelligence because it is exactly what AI systems today are doing, and more importantly, it reflects an important characteristic of human beings which separates us from traditional computer systems: human beings are prediction machines. We deal with ambiguity all day long, from very trivial scenarios such as the above, to more convoluted scenarios that involve playing the odds on a larger scale. This is in one sense the essence of reasoning. We very rarely know whether the way we respond to different scenarios is absolutely correct, but we make reasonable predictions based on past experience.

Just for fun, let’s illustrate the earlier example with some code in R! If you are not familiar with R but would like to follow along, the comments in the snippets below explain each step. First, let’s start with some data that represents information in your mind about when a particular person has said “good weekend?” to you: a small CSV (greetings.csv) whose columns (FootballGamePlayed, WorldCup, EnglandPlaying, GoodWeekendResponse) record the context of each greeting and the response you gave.

In this example, we are saying that GoodWeekendResponse is our score label (i.e., it denotes the appropriate response that we want to predict). For modelling purposes, there have to be at least two possible values, in this case “yes” and “no”. For brevity, the response in most cases is “yes”.

We can fit the data to a logistic regression model:

library(VGAM)                                                             # provides vglm() for multinomial regression
greetings=read.csv('c:/AI/greetings.csv',header=TRUE)                     # load the historical greeting data
fit <- vglm(GoodWeekendResponse~., family=multinomial, data=greetings)    # fit a model predicting the response from the other columns

Now what happens if we try to make a prediction with that model, where the expected response is different from what we have previously recorded? In this case, I am expecting the response to be “Go England!”. Below is some more code to add the prediction; for illustration we just hardcode the new input data, and the output is shown after the code:

# The new observation, with the response we now expect ("Go England!!")
response <- data.frame(FootballGamePlayed="Yes", WorldCup="Yes", EnglandPlaying="Yes", GoodWeekendResponse="Go England!!")
greetings <- rbind(greetings, response)                                    # fold the new observation into the data
fit <- vglm(GoodWeekendResponse~., family=multinomial, data=greetings)     # re-fit the model on the updated data
prediction <- predict(fit, response, type="response")                      # probability of each possible response
prediction
index <- which.max(prediction)                                             # pick the most likely response...
df <- colnames(prediction)
df[index]                                                                  # ...and print its label

            No Yes Go England!!
1 3.901506e-09 0.5          0.5
> index <- which.max(prediction)
> df <- colnames(prediction)
> df[index]
[1] "Yes"

The initial prediction “yes” was wrong, but note that in addition to predicting against the new data, we also incorporated the actual response back into our existing model. Also note that the new response value “Go England!” has been learnt, with a probability of 50 percent based on current data. If we run the same piece of code again, the probability that “Go England!” is the right response based on prior data increases, because each run appends the corrected observation to the data again: the matching “Go England!!” examples now outweigh the single “yes” example roughly two to one, so the probability rises from 0.5 to about 0.67. This time our model chooses to respond with “Go England!”, because it has finally learnt that this is most likely the correct response!

            No       Yes Go England!!
1 3.478377e-09 0.3333333    0.6666667
> index <- which.max(prediction)
> df <- colnames(prediction)
> df[index]
[1] "Go England!!"

Do we have Artificial Intelligence here? Well, clearly there are different levels of intelligence, just as there are with human beings. There is, of course, a good deal of nuance that may be missing here, but nonetheless this very simple program will be able to react, with limited accuracy, to data coming in related to one very specific topic, as well as learn from its mistakes and make adjustments based on predictions, without the need to develop exhaustive rules to account for different responses that are expected for different combinations of data. This is the same principle that underpins many AI systems today, which, like human beings, are mostly sophisticated prediction machines. The more sophisticated the machine, the more it is able to make accurate predictions based on a complex array of data used to train various models, and the most sophisticated AI systems of all are able to continually learn from faulty assertions in order to improve the accuracy of their predictions, thus exhibiting something approximating human intelligence.

Machine learning


You may be wondering, based on this definition, what the difference is between machine learning and Artificial Intelligence. After all, isn’t this exactly what machine learning algorithms do: make predictions based on data using statistical models? This very much depends on the definition of machine learning, but ultimately most machine learning algorithms are trained on static data sets to produce predictive models, so machine learning algorithms only facilitate part of the dynamic in the definition of AI offered above. Additionally, machine learning algorithms, much like the contrived example above, typically focus on specific scenarios, rather than working together to create the ability to deal with ambiguity as part of an intelligent system. In many ways, machine learning is to AI what neurons are to the brain: a building block of intelligence that can perform a discrete task, but that may need to be part of a composite system of predictive models in order to really exhibit the ability to deal with ambiguity across an array of behaviors that might approximate to intelligent behavior.

Practical applications


There are a number of practical advantages in building AI systems, but as discussed and illustrated above, many of these advantages are pivoted around “time to market”. AI systems enable the embedding of complex decision making without the need to build exhaustive rules, which traditionally can be very time consuming to procure, engineer and maintain. Developing systems that can “learn” and “build their own rules” can significantly accelerate organizational growth.

Microsoft’s Azure cloud platform offers an array of discrete and granular services in the AI and Machine Learning domain that allow AI developers and data engineers to avoid reinventing the wheel and consume reusable APIs. These APIs allow AI developers to build systems which display the type of intelligent behavior discussed above.

Saturday, 11 August 2018

New Azure Cosmos DB JavaScript SDK 2.0 now in public preview

The Azure Cosmos DB team is excited to announce version 2.0 RC of the JavaScript SDK for SQL API, now in public preview!

What is Azure Cosmos DB?


Azure Cosmos DB is a globally distributed, multi-model database service. It offers turnkey global distribution, guarantees single-digit millisecond latencies at the 99th percentile, and elastic scaling of throughput and storage.

For the SQL API, we support a JavaScript SDK to enable development against Azure Cosmos DB from JavaScript and Node.js projects. Version 2.0 of the SDK is written completely in TypeScript, and we’ve redesigned the object model and added support for promises. Let’s dive into these updates.

New object model


Based on user feedback, we’ve redesigned the object model to make it easier to interact with and perform operations against Cosmos DB. 

If you’re familiar with the previous version of the JavaScript SDK, you’ve likely noticed that the entire API surface hangs off DocumentDBClient. While the previous design makes it easy to find the entry point for methods, it also came at the cost of a cluttered IntelliSense experience, with every operation surfaced in one flat list.

We also got feedback that it was difficult to do operations off databases, collections, or documents since each method needed to reference the URL of that resource. 

To address this, we’ve created a new top level CosmosClient class to replace DocumentDBClient, and split up its methods into modular Database, Container, and Items classes.

For example, in the new SDK, you can create a new database, container, and add an item to it, all in 10 lines of code!
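
A minimal sketch of that flow, assuming the @azure/cosmos package and placeholder account details (the exact client constructor options and response shapes can differ between the 2.0 preview and later releases, so check the package README):

import { CosmosClient } from "@azure/cosmos";

// Placeholder endpoint and key for illustration only.
const client = new CosmosClient({ endpoint: "https://<your-account>.documents.azure.com", key: "<your-key>" });

async function main(): Promise<void> {
  const { database } = await client.databases.create({ id: "TasksDB" });     // Databases class
  const { container } = await database.containers.create({ id: "Tasks" });   // Containers class
  await container.items.create({ id: "1", description: "Walk the dog" });    // Items class
  console.log("Created database, container, and item");
}

main().catch(console.error);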

This is called a “builder” pattern, and it allows us to reference resources based on the resource hierarchy of Cosmos DB, which is similar to the way your brain thinks about Cosmos DB. For example, to create an item, we first reference its database and container, and call items.create().

Containers and Items


In addition, because Cosmos DB supports multiple API models, we’ve introduced the concepts of Container and Item into the SDK, which replace the previous Collection and Document concepts. In other words, what was previously known as a “Collection” is now called a “Container.”

An account can have one or more databases, and a database consists of one or more containers. Depending on the API, the container is projected as either a collection (SQL or Mongo API), graph (Gremlin API), or table (Tables API).

Support for promises


Finally, we’ve added full support for promises so you no longer have to write custom code to wrap the SDK yourself. Now, you can use async/await directly against the SDK.

To see the difference: to create a new database and collection and add a document with the previous SDK, you would have to do something like this:
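
Roughly, with the old documentdb client the callback-based flow looked something like the sketch below (simplified, with error handling trimmed; the option objects shown are assumptions):

import { DocumentClient } from "documentdb";   // the pre-2.0 "documentdb" package (typings via @types/documentdb)

const oldClient = new DocumentClient("https://<your-account>.documents.azure.com", { masterKey: "<your-key>" });

// Each call takes a callback plus a link to the parent resource, so the steps end up nested.
oldClient.createDatabase({ id: "TasksDB" }, (dbErr: any, database: any) => {
  if (dbErr) throw dbErr;
  oldClient.createCollection(database._self, { id: "Tasks" }, (collErr: any, collection: any) => {
    if (collErr) throw collErr;
    oldClient.createDocument(collection._self, { id: "1", description: "Walk the dog" }, (docErr: any) => {
      if (docErr) throw docErr;
      console.log("Document created");
    });
  });
});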

In the new SDK, you can simply await the calls to Cosmos DB directly from inside an async function, as seen below.

We’ve also added a convenience method createIfNotExists() for databases and containers, which wraps the logic to read the database, check the status code, and create it if it doesn’t exist.

Here’s the same functionality, using the new SDK:
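
A sketch of that flow with the new SDK, where createIfNotExists() wraps the check-then-create logic for the database and container (as before, the constructor options and response shapes are assumptions to verify against the package documentation):

import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient({ endpoint: "https://<your-account>.documents.azure.com", key: "<your-key>" });

async function setup(): Promise<void> {
  // createIfNotExists() returns the existing resource or creates it in one call.
  const { database } = await client.databases.createIfNotExists({ id: "TasksDB" });
  const { container } = await database.containers.createIfNotExists({ id: "Tasks" });
  await container.items.create({ id: "1", description: "Walk the dog" });
  console.log("Database, container, and item are ready");
}

setup().catch(console.error);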

Open source model


The Azure Cosmos DB JavaScript SDK is open source, and our team is planning to do all development in the open. To that end, we will be logging issues, tracking feedback, and accepting PRs in GitHub.

Thursday, 9 August 2018

Azure HDInsight Interactive Query: Ten tools to analyze big data faster

Customers use HDInsight Interactive Query (also called Hive LLAP, or Low Latency Analytical Processing) to query data stored in Azure Storage and Azure Data Lake Storage in a super-fast manner. Interactive Query makes it easy for developers and data scientists to work with big data using the BI tools they love the most. HDInsight Interactive Query supports several tools to access big data in an easy fashion. In this blog we have listed the most popular tools used by our customers:

Microsoft Power BI


Microsoft Power BI Desktop has a native connector to perform direct queries against an HDInsight Interactive Query cluster. You can explore and visualize the data in an interactive manner.

Apache Zeppelin


The Apache Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. You can access Interactive Query from Apache Zeppelin using a JDBC interpreter.

Visual Studio Code


With HDInsight Tools for VS Code, you can submit interactive queries as well as look at job information for HDInsight Interactive Query clusters.

Visual Studio


Visual Studio integration helps you create and query tables in a visual fashion. You can create Hive tables on top of data stored in Azure Data Lake Storage or Azure Storage.

Ambari Hive View


Hive View is designed to help you author, optimize, and execute queries. With Hive Views you can:

◈ Browse databases.
◈ Write queries or browse query results in full-screen mode, which can be particularly helpful with complex queries or large query results.
◈ Manage query execution jobs and history.
◈ View existing databases, tables, and their statistics.
◈ Create/upload tables and export table DDL to source control.
◈ View visual explain plans to learn more about the query plan.

Beeline


Beeline is a Hive client that is included on the head nodes of an HDInsight cluster. Beeline uses JDBC to connect to HiveServer2, a service hosted on the HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet.

Hive ODBC


The Open Database Connectivity (ODBC) API, a standard for database management systems, enables ODBC-compliant applications to interact seamlessly with Hive through a standard interface.

Tableau


Tableau is a very popular data visualization tool. Customers can build visualizations by connecting Tableau to HDInsight Interactive Query.

Apache DBeaver


Apache DBeaver is a SQL client and database administration tool. It is free and open source (ASL). DBeaver uses the JDBC API to connect to SQL-based databases.

Excel


Microsoft Excel is the most popular data analysis tool, and connecting it with big data is even more interesting for our customers. An Azure HDInsight Interactive Query cluster can be integrated with Excel using ODBC connectivity.

Azure HDInsight, Azure Big Data, Azure Certification, Azure Study Materials