Tuesday, 30 April 2019

Hybrid storage performance comes to Azure

When it comes to adding a performance tier between compute and file storage, Avere Systems has led the way with its high-performance caching appliance known as the Avere FXT Edge Filer. This week at NAB, attendees will get a first look at the new Azure FXT Edge Filer, now with even more performance, memory, SSD capacity, and support for Azure Blob. Since Microsoft’s acquisition of Avere last March, we’ve been working to provide an exciting combination of performance and efficiency to support hybrid storage architectures with the Avere appliance technology.

Linux performance over NFS


Microsoft is committed to meeting our customers where we’re needed. The launch of the new Azure FXT Edge Filer is yet another example of this as we deliver high-throughput and low-latency NFS to applications running on Linux compute farms. The Azure FXT Edge Filer solves latency issues between Blob storage and on-premises computing with built-in translation from NFS to Blob. It sits at the edge of your hybrid storage environment closest to on-premises compute, caching the active data to reduce bottlenecks. Let’s look at common applications:

◉ Active Archives in Azure Blob – When Azure Blob is a target storage location for aging, but not yet cold, data, the Azure FXT Edge Filer accelerates access to files by creating an on-premises cache of active data.


◉ WAN Caching – Latency across wide area networks (WANs) can slow productivity. The Azure FXT Edge Filer caches active data closest to the users and hides that latency as they reach for data stored in data centers or colos. Remote office engineers, artists, and other power users achieve fast access to files they need, and meanwhile backup, mirroring, and other data protection activities run seamlessly in the core data center.


◉ NAS Optimization – Many high-performance computing environments have large NetApp or Dell EMC Isilon network-attached storage (NAS) arrays. When demand is at its peak, these storage systems can become bottlenecks. The Azure FXT Edge Filer optimizes these NAS systems by caching data closest to the compute, separating performance from capacity and better delivering both.

When datasets are large, hybrid file-storage caching provides the performance and flexibility needed to keep core operations productive.

Azure FXT Edge Filer model specifications


We are currently previewing the FXT 6600 model at customer sites, with a second model, the FXT 6400, to follow at general availability. The FXT 6600 is an impressive top-end model with 40 percent more read performance and double the memory of the FXT 5850. The FXT 6400 is a great mid-range model for customers who don’t need as much memory and SSD capacity, or who are looking to upgrade FXT 5600 and FXT 5400 models at an affordable price.



Azure FXT Edge Filer – 6600 Model (highest performance, largest cache)
Specifications per node:
◉ 1536 GB DRAM
◉ 25.6 TB SSD
◉ 6x 25/10 Gb + 2x 1 Gb network ports
◉ Minimum 3-node cluster
◉ AES-256 encryption

Azure FXT Edge Filer – 6400 Model (high performance, large cache)
Specifications per node:
◉ 768 GB DRAM
◉ 12.8 TB SSD
◉ 6x 25/10 Gb + 2x 1 Gb network ports
◉ Minimum 3-node cluster
◉ AES-256 encryption

Key features


◉ Scalable to 24 FXT server nodes as demand grows

◉ High-performance DRAM/memory for faster access to active data and large SSD cache sizes to support big data workloads

◉ Single mountpoint provides simplified management across heterogeneous storage

◉ Hybrid architecture – NFSv3, SMB2 to clients and applications; support for NetApp, Dell EMC Isilon, Azure Blob, and S3 storage

The Azure FXT Edge Filer is a combination of hardware provided by Dell EMC and software provided by Microsoft. For simplicity, the complete solution will be delivered to customers as a software-plus-hardware appliance through a system integrator. If you are interested in learning more about adding the Azure FXT Edge Filer to your on-premises infrastructure or about upgrading existing Avere hardware, you can reach out to the team now. Otherwise, watch for updates on the Azure FXT Edge Filer homepage.

Azure FXT Edge Filer for render farms


High-performance file access for render farms and artists is key to meeting important deadlines and building efficiencies into post-production pipelines. At NAB 2019 in Las Vegas, visit the Microsoft Azure booth #SL6716 to learn more about the new Azure FXT Edge Filer for rendering. You’ll find technology experts, presentations, and support materials to help you render faster with Azure.

Saturday, 27 April 2019

Azure Tips and Tricks - Become more productive with Azure

We’re pleased to re-introduce a web resource called “Azure Tips and Tricks” that helps developers using Azure learn something new in a couple of minutes. Since its inception in 2017, the collection has grown to more than 200 tips, as well as videos, conference talks, and several eBooks. Featuring a new tip and video each week, it is designed to help you boost your productivity with Azure, and all tips are based on practical, real-world scenarios. The series spans the entire universe of the Azure platform, from App Services to containers and more!


Figure 1: The Azure Tips and Tricks homepage.

With the new site, we’ve included the much-needed ability to navigate between Azure services, so that you can quickly browse your favorite categories.


Figure 2: The new Azure Tips and Tricks navigation capabilities.

Search functionality helps you quickly find what you are looking for.


Figure 3: The new Azure Tips and Tricks search function.

The site is also open source on GitHub, so anyone can help contribute to the site, ask questions, and jump in wherever they want! While you are on the page, go ahead and star the repo to stay up to date.

Figure 4: The Azure Tips and Tricks GitHub repo.

Friday, 26 April 2019

5 tips to get more out of Azure Stream Analytics Visual Studio Tools

Azure Stream Analytics is an on-demand, real-time analytics service to power intelligent action. Azure Stream Analytics tools for Visual Studio make it easier for you to develop, manage, and test Stream Analytics jobs. This year we shipped two major updates, in January and March, delivering useful new features. In this blog we’ll introduce some of these capabilities to help you improve your productivity.

Test partial scripts locally


In the latest March update we enhanced the local testing capability. Besides running the whole script, you can now select part of the script and run it locally against a local file or live input stream. Click Run Locally or press F5/Ctrl+F5 to trigger the execution. Note that the selected portion of the larger script file must be a logically complete query to execute successfully.


Share inputs, outputs, and functions across multiple scripts


It is very common for multiple Stream Analytics queries to use the same inputs, outputs, or functions. Since these configurations and code are managed as files in Stream Analytics projects, you can define them once and then use them across multiple projects. Right-click on the project name or folder node (inputs, outputs, functions, etc.) and then choose Add Existing Item to specify the input file you already defined. You can organize the inputs, outputs, and functions in a standalone folder outside your Stream Analytics projects to make them easy to reference from various projects.


Duplicate a job to other regions


All Stream Analytics jobs running in the cloud are listed in Server Explorer under the Stream Analytics node. You can open Server Explorer from the View menu.


If you want to duplicate a job to another region, just right-click on the job name and export it to a local Stream Analytics project. Since the credentials cannot be downloaded to the local environment, you must specify the correct credentials in the job inputs and outputs files. After that, you are ready to submit the job to another region by clicking Submit to Azure in the script editor.


Local input schema auto-completion


If you have specified a local file for an input to your script, the IntelliSense feature will suggest input column names based on the actual schema of your data file.


Testing queries against SQL database as reference data


Azure Stream Analytics supports Azure SQL Database as an input source for reference data. When you add a reference input using SQL Database, two SQL files are generated as code-behind files under your input configuration file.


In Visual Studio 2017 or 2019, if you have already installed SQL Server Data Tools, you can directly write the SQL query and test it by clicking Execute in the query editor. A wizard window will pop up to help you connect to the SQL database and show the query result in the window at the bottom.


Wednesday, 24 April 2019

Customize your Azure best practice recommendations in Azure Advisor

Cloud optimization is critical to ensuring you get the most out of your Azure investment, especially in complex environments with many Azure subscriptions and resource groups. Azure Advisor helps you optimize your Azure resources for high availability, security, performance, and cost by providing free, personalized recommendations based on your Azure usage and configurations.

In addition to consolidating your Azure recommendations into a single place, Azure Advisor has a configuration feature that can help you focus exclusively on your most important resources, such as those in production, and save you remediation time. You can also configure thresholds for certain recommendations based on your business needs.

Save time by configuring Advisor to display recommendations only for resources that matter to you


You can configure Azure Advisor to provide recommendations exclusively for the subscriptions and resource groups you specify. By narrowing your Advisor recommendations down to the resources that matter the most to you, you can save time optimizing your Azure workloads. To get you started we’ve created a step-by-step guide on how to configure Advisor in the Azure portal (UI).


Please note that there’s a difference between Advisor configuration and the filtering options available in the Azure portal. Configuration is persistent and prevents recommendations from showing for the unselected scope (shown in the screenshot above). Filtering in the UI (shown in the screenshot below) temporarily displays a subset of recommendations. Available UI filters include subscription, service, and active versus postponed recommendations.


Configuring thresholds for cost recommendations to find savings


You can also customize the CPU threshold for one of our most popular recommendations, “Right-size or shutdown underutilized virtual machines,” which analyzes your usage patterns and identifies virtual machines (VMs) with low usage. While certain scenarios can result in low utilization by design, you can often save money by managing the size and number of your VMs.


You can modify the average CPU utilization threshold Advisor uses for this recommendation to a higher or lower value so you can find more savings depending on your business needs.


Tuesday, 23 April 2019

Detecting threats targeting containers with Azure Security Center

In this blog post, we will focus on the security concerns of container environments.

Azure Security Center announced new features for container security, including Docker recommendations and compliance based on the CIS benchmark for containers. We’ll go over several security concerns in containerized environments, from the Docker level to the Kubernetes cluster level, and we will show how Azure Security Center can help you detect and mitigate threats in the environment as they’re occurring in real time.

Docker analytics


When it comes to Docker, a common access vector for attackers is a misconfigured daemon. By default, the Docker engine is accessible only via a UNIX socket. This setting guarantees that the Docker engine won’t be accessible remotely. However, in many cases remote management is required, so Docker also supports TCP sockets, including encrypted and authenticated remote communication. Running the daemon with a TCP socket without explicitly specifying the “tlsverify” flag in the daemon execution, however, enables anyone with network access to the Docker host to send unauthenticated API requests to the Docker engine.
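As a quick illustration of that exposure, the sketch below probes a host for an unauthenticated Docker Engine API over TCP using the public /version endpoint; the host address is a placeholder and 2375 is simply the conventional unencrypted port:

import requests

DOCKER_HOST = "203.0.113.10"  # placeholder address of the Docker host

# 2375 is the conventional unencrypted Docker TCP port (2376 is typically used with TLS).
try:
    resp = requests.get(f"http://{DOCKER_HOST}:2375/version", timeout=5)
    if resp.ok:
        print("Docker daemon answered without authentication:")
        print(resp.json())
except requests.ConnectionError:
    print("No unauthenticated Docker API reachable on port 2375.")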


Fig. 1 – Exposed Docker Daemon that is accessible over the network

A host that runs an exposed Docker daemon would be compromised very quickly. In Microsoft Threat Intelligence Center’s honeypots, scanners searching for exposed Docker daemons are seen frequently. Azure Security Center can detect and alert on such behavior.


Fig 2.  – Exposed Docker alert

Another security concern is running containers with higher privileges than they really need. A container with high privileges can access the host’s resources, so a compromised privileged container may lead to a compromised host. Azure Security Center detects and alerts when a privileged container runs.


Fig. 3 – privileged container alert

There are additional suspicious behaviors that Azure Security Center can detect, including running an SSH server inside a container and running malicious images.

Cluster level security


Usually, running a single instance of Docker is not enough, and a container cluster is needed. Most people use Kubernetes for their container orchestration. A major concern in managing clusters is the possibility of privilege escalation and lateral movement inside the cluster. We will demonstrate several scenarios and show how Azure Security Center can help identify those malicious activities.

For the first demonstration, we’ll use a cluster without RBAC enabled.

In such a scenario (Fig. 4), the service account that is mounted by default to the pods has high cluster privileges. If one of the containers is compromised, an attacker can access the service account that is mounted to that container and use it for communicating with the API server.


Fig. 4 – Vulnerable web application container accesses the API Server

In our case, one of the containers in the cluster is running a web application with a remote code execution (RCE) vulnerability that is exposed to the Internet. There are many examples of vulnerabilities in web applications that allow remote code execution, including CVE-2018-7600.

We will use this RCE vulnerability to send a request to the API server from the compromised application that is running in the cluster. Since the service account has high privileges, we can perform any action in the cluster. In the following example, we retrieve the secrets from the cluster and save the output on the filesystem of the web application so we can access it later:


Fig. 5 – The payload sends a request to the API server

In Fig. 5, we send a request to the API server (at IP 10.0.0.1) that lists all the secrets in the default namespace. We do this by using the service account token that is located at /var/run/secrets/kubernetes.io/serviceaccount/token on the compromised container.
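From inside the compromised container, such a request looks roughly like the following sketch; the API server address 10.0.0.1 is the one used in this example, and the token and CA paths are the standard locations Kubernetes mounts into every pod:

import requests

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

with open(TOKEN_PATH) as f:
    token = f.read().strip()

# List the secrets in the default namespace using the pod's service account token.
resp = requests.get(
    "https://10.0.0.1/api/v1/namespaces/default/secrets",
    headers={"Authorization": f"Bearer {token}"},
    verify=CA_PATH,
)
print(resp.status_code)
print(resp.text)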

Now we can access the file secrets.txt that stores the secrets:


Fig. 6 – dump of the cluster’s secrets

We can also list, delete, and create new containers and change other cluster resources.

Azure Security Center can identify and alert on suspicious requests to the API server from Kubernetes nodes (auditd on the cluster’s nodes required):


Fig. 7 – Suspicious API request alert

One mitigation for this attack is to manage permissions in the cluster with RBAC. RBAC enables the user to grant different permissions to different accounts. By default, service accounts have no permissions to perform actions in the cluster.

However, even when RBAC is enabled, attackers can often still use such vulnerable containers for malicious purposes. A very convenient way to monitor and manage the cluster is through the Kubernetes dashboard. The dashboard, itself a container, gets the default RBAC permissions, which do not allow any significant actions. In order to use the dashboard, many users grant permissions to the kubernetes-dashboard service account. In such cases, attackers can perform actions in the cluster by using the dashboard container as a proxy instead of using the API server directly. The following payload retrieves the overview page of the default namespace from the Kubernetes dashboard, which contains information about the main resources in the namespace:


Fig. 8 – request to the dashboard

In Fig. 8, a request is sent from the compromised container to the dashboard’s cluster IP (10.0.182.140 in this case). Fig. 9 describes the attack vector when the dashboard is used.


Fig. 9 – Vulnerable container accesses the Kubernetes Dashboard

Azure Security Center can also identify and alert on suspicious requests to the dashboard container from Kubernetes nodes (auditd on the cluster’s nodes required).


Fig. 10 – Suspicious request to the dashboard alert

Even if specific permissions were not given to any container, attackers with access to a vulnerable container can still gain valuable information about the cluster. Every Kubernetes node runs the Kubernetes agent named kubelet, which manages the containers that run on that specific node. The kubelet exposes a read-only API on port 10255 that does not require any authentication. Anyone with network access to the node can query this API and get useful information about the node. Specifically, querying http://[NODE IP]:10255/pods/ will retrieve all the running pods on the node.

http://[NODE IP]:10255/spec/ will retrieve information about the node itself, such as CPU and memory capacity. Attackers can use this information to better understand the environment of the compromised container.
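A hedged sketch of what that reconnaissance looks like from a container with network access to the node; the node IP is a placeholder, and the field names returned by /spec/ come from cAdvisor's machine info and may vary by version:

import requests

NODE_IP = "10.240.0.4"  # placeholder node address

# The kubelet read-only API on port 10255 requires no authentication.
pods = requests.get(f"http://{NODE_IP}:10255/pods/", timeout=5).json()
for item in pods.get("items", []):
    print(item["metadata"]["namespace"], item["metadata"]["name"])

spec = requests.get(f"http://{NODE_IP}:10255/spec/", timeout=5).json()
print(spec.get("num_cores"), spec.get("memory_capacity"))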

Lateral movement and privilege escalation are among the top security concerns in container clusters. Detecting abnormal behavior in the cluster can help you detect and mitigate those threats.

Saturday, 20 April 2019

Rewrite HTTP headers with Azure Application Gateway

We are pleased to share the capability to rewrite HTTP headers in Azure Application Gateway. With this, you can add, remove, or update HTTP request and response headers while the request and response packets move between the client and the backend application. You can also add conditions to ensure that the headers you specify are rewritten only when the conditions are met. The capability also supports several server variables that store additional information about the requests and responses, enabling you to build powerful rewrite rules.


Figure 1: Application Gateway removing the port information from the X-Forwarded-For header in the request and modifying the Location header in the response.

Rewriting the headers helps you accomplish several important scenarios. Some of the common use cases are mentioned below.

Remove port information from the X-Forwarded-For header


Application Gateway inserts an X-Forwarded-For header into all requests before it forwards them to the backend. The format of this header is a comma-separated list of IP:Port. However, there may be scenarios where the backend applications require the header to contain only the IP addresses. One such scenario is when the backend application is a Content Management System (CMS), because most CMSs are not able to parse the additional port information in the header. To handle such scenarios, you can set the header to the add_x_forwarded_for_proxy server variable, which contains the X-Forwarded-For client request header without the port information.


Figure 2: Application Gateway configuration for removing the port information from the X-Forwarded-For header.

Better integration with App service and other multi-tenant backends


When a backend application sends a redirection response, you may want to redirect the client to a different URL than the one specified by the backend application. One such scenario is when an app service is hosted behind an application gateway.

Since App Service is a multi-tenant service, it uses the host header in the request to route to the correct endpoint. App Services have a default domain name of *.azurewebsites.net (say contoso.azurewebsites.net), which is different from the application gateway's domain name (say contoso.com). Since the original request from the client has the application gateway's domain name contoso.com as the host name, the application gateway changes the hostname to contoso.azurewebsites.net so that the App Service in the backend can route it to the correct endpoint. But when the App Service sends a redirection response, it uses the same hostname in the Location header of its response as the one in the request it received from the application gateway. Therefore, when the App Service performs a redirection to its relative path (a redirect from /path1 to /path2), the client makes the request directly to contoso.azurewebsites.net/path2 instead of going through the application gateway (contoso.com/path2). This bypasses the application gateway, which is not desirable.

This issue can be resolved by setting the hostname in the location header to the application gateway's domain name. To do this, you can create a rewrite rule with a condition that evaluates if the location header in the response contains azurewebsites.net and performs an action to rewrite the location header to have application gateway's hostname.
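Conceptually, the rewrite rule performs a transformation equivalent to the following Python sketch. This is only an illustration of the logic, not how Application Gateway is configured; the actual rule is created in the gateway as shown in Figure 3, and the host names are the example values used above:

from urllib.parse import urlsplit, urlunsplit

GATEWAY_HOST = "contoso.com"           # domain name clients actually use
BACKEND_SUFFIX = ".azurewebsites.net"  # multi-tenant App Service domain

def rewrite_location(location: str) -> str:
    """If the backend redirected to its *.azurewebsites.net host, keep the client on the gateway."""
    parts = urlsplit(location)
    if parts.hostname and parts.hostname.endswith(BACKEND_SUFFIX):
        return urlunsplit((parts.scheme, GATEWAY_HOST, parts.path, parts.query, parts.fragment))
    return location

print(rewrite_location("https://contoso.azurewebsites.net/path2"))
# -> https://contoso.com/path2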


Figure 3: Application Gateway configuration for modifying the location header.

Implement security-related HTTP headers to prevent vulnerabilities


Several security vulnerabilities can be mitigated by implementing the necessary headers in the application response. Some of these security headers are X-XSS-Protection, Strict-Transport-Security, Content-Security-Policy, and X-Frame-Options. You can use Application Gateway to set these headers for all responses.

Thursday, 18 April 2019

Move your data from AWS S3 to Azure Storage using AzCopy

AzCopy v10 (Preview) now supports Amazon Web Services (AWS) S3 as a data source. You can now copy an entire AWS S3 bucket, or even multiple buckets, to Azure Blob Storage using AzCopy.


Customers who wanted to migrate their data from AWS S3 to Azure Blob Storage have faced challenges because they had to bring up a client between the cloud providers to read the data from AWS and then put it in Azure Storage. This meant the scale and speed of the data transfer were limited by the client in the middle, adding to the complexity of the move.

We have now addressed this issue in the latest release of AzCopy using a scale out technique thanks to the new Blob API. AzCopy v10, the next generation data transfer utility for Azure Storage, has been redesigned from scratch to provide data movement at greater scale with built-in resiliency. AzCopy v10 supports copying data efficiently both from a local file system to Azure Storage and between Azure Storage accounts. The latest release (AzCopy v10.0.9) adds support for AWS S3 as a source to help you move your data using a simple and efficient command-line tool.

New Blob API, Put from URL, helps move data efficiently


AzCopy copies data from AWS S3 with high throughput by scaling out copy jobs to multiple Azure Storage servers. AzCopy relies on the new Azure Storage REST API operation Put Block from URL, which copies data directly from a given URL. Using Put Block from URL, AzCopy v10 moves data from an AWS S3 bucket to an Azure Storage account, without first copying the data to the client machine where AzCopy is running. Instead, Azure Storage performs the copy operation directly from the source. Thanks to this method, the client in the middle is no longer the bottleneck.
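The same REST operation can also be invoked from your own code. Below is a minimal sketch using the azure-storage-blob Python package (this is not how AzCopy itself is implemented); the URLs are placeholders, the S3 object must be readable by Azure Storage (public or presigned), and the destination blob URL carries a SAS token with write permission:

import uuid
from azure.storage.blob import BlobClient, BlobBlock

# Placeholder URLs: presigned S3 source object and SAS-authorized destination blob.
source_url = "https://s3.amazonaws.com/mybucket/myobject?X-Amz-Signature=..."
dest_url = "https://mystorageaccount.blob.core.windows.net/mycontainer/myobject?sv=...&sig=..."

blob = BlobClient.from_blob_url(dest_url)

# Put Block from URL: Azure Storage pulls the data server-side, so nothing flows through this client.
block_id = uuid.uuid4().hex
blob.stage_block_from_url(block_id=block_id, source_url=source_url)
blob.commit_block_list([BlobBlock(block_id=block_id)])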

Get started


To copy an S3 bucket to a Blob container, use the following command:

azcopy cp "https://s3.amazonaws.com/mybucket/" "https://mystorageaccount.blob.core.windows.net/mycontainer<SAS>" --recursive

In testing copy operations from an AWS S3 bucket in the same region as an Azure Storage account, we hit rates of 50 Gbps – higher is possible! This level of performance makes AzCopy a fast and simple option when you want to move large amounts of data from AWS. AzCopy also provides resiliency. Each failure is automatically retried a number of times to mitigate network glitches. In addition, a failed or canceled job can be resumed or restarted so that you can easily move TBs of data at once.


Azure Data Factory


Alternatively, if you are looking for a fully managed Platform-as-a-Service (PaaS) option for migrating data from AWS S3 to Azure Storage, consider Azure Data Factory (ADF), which provides these additional benefits:

◈ Azure Data Factory provides a code-free authoring experience and a rich built-in monitoring dashboard.

◈ Easily scale up the amount of horsepower to move data in a serverless manner and only pay for what you use.

◈ Use Azure Integration Runtime (IR) for moving data over the public Internet, or use a self-hosted IR for moving data over AWS DirectConnect peered with Azure ExpressRoute.

◈ The ability to perform one-time historical load, as well as scheduled incremental load.

◈ Integrates with Azure Key Vault for credential management to achieve enterprise-grade security.

◈ Provides 80+ connectors out of the box and native integration with all Azure data services, so that you can leverage ADF for all your data integration and ETL needs across hybrid environments.

Wednesday, 17 April 2019

Machine Learning powered detections with Kusto query language in Azure Sentinel

As cyberattacks become more complex and harder to detect, the traditional correlation rules of a SIEM are not enough: they lack the full context of the attack and can only detect attacks that have been seen before. This can result in false negatives and gaps in the environment. In addition, correlation rules require significant maintenance and customization, since they may produce different results depending on the customer environment.

Advanced machine learning capabilities built into Azure Sentinel can detect behaviors indicative of a threat and help security analysts learn the expected behavior in their enterprise. In addition, Azure Sentinel provides out-of-the-box detection queries that leverage the machine learning capabilities of the Azure Monitor Logs query language to detect suspicious behaviors such as abnormal traffic in firewall data, suspicious authentication patterns, and resource creation anomalies.

Below you can find three examples of detections that leverage built-in machine learning capabilities to protect your environment.

Time series analysis of user accounts authenticating from an unusually large number of locations


A typical organization may have many users and many applications using Azure Active Directory for authentication. Some applications (for example, Office 365 Exchange Online) may have many more authentications than others (say, Visual Studio) and thus dominate the data. Users may also have a different location profile depending on the application. For example, high location variability for email access may be expected, but less so for development activity associated with Visual Studio authentications. The ability to track location variability for every user/application combination and then investigate just the most unusual cases can be achieved by leveraging the built-in query capabilities using the operators make-series and series_fit_line.

SigninLogs
| where TimeGenerated >= ago(30d)
| extend  locationString= strcat(tostring(LocationDetails["countryOrRegion"]), "/", tostring(LocationDetails["state"]), "/", tostring(LocationDetails["city"]), ";")
| project TimeGenerated, AppDisplayName , UserPrincipalName, locationString
| make-series dLocationCount = dcount(locationString) on TimeGenerated in range(startofday(ago(30d)),now(), 1d)
by UserPrincipalName, AppDisplayName
| extend (RSquare,Slope,Variance,RVariance,Interception,LineFit)=series_fit_line(dLocationCount)
| where Slope >0.3


Creation of an anomalous number of resources


Resource creation in Azure is a normal operation in the environment. Operations and IT teams frequently spin up environments and resources based on organizational needs and requirements. However, anomalous creation of resources by users who don’t have permissions or aren’t supposed to create these resources is extremely interesting. Tracking anomalous resource creation or suspicious deployment activities in the Azure activity log can provide a lead to spot an execution technique used by an attacker.

AzureActivity
| where TimeGenerated >= ago(30d)
| where OperationName == "Create or Update Virtual Machine" or OperationName == "Create Deployment"
| where ActivityStatus == "Succeeded"
| make-series num = dcount(ResourceId)  default=0 on EventSubmissionTimestamp in range(ago(30d), now(), 1d) by Caller
| extend  outliers=series_outliers(num, "ctukey", 0, 10, 90)
| project-away num
| mvexpand outliers
| where outliers > 0.9
| summarize by Caller


Firewall traffic anomalies


Firewall traffic can be an additional indicator of a potential attack in the organization. The ability to establish a baseline that represents the usual firewall traffic behavior on a weekly or an hourly basis can help point out an anomalous increase in traffic. The built-in capabilities of the Log Analytics query language can point directly to the traffic anomaly so it can be investigated.

CommonSecurityLog
| summarize count() by bin(TimeGenerated, 1h)


With Azure Sentinel, you can create the above advanced detection rules to detect anomalies and suspicious activities in your environment, create your own detection rules or leverage the rich GitHub library that contains detections written by Microsoft security researchers.

Tuesday, 16 April 2019

Deploying Grafana for production deployments on Azure

Grafana is one of the leading open source tools for visualizing time series metrics, and it has quickly become the visualization tool of choice for developers and operations teams monitoring server and application metrics. Grafana dashboards enable operations teams to quickly monitor and react to the performance, availability, and overall health of a service. You can now also use it to monitor Azure services and applications by leveraging the Azure Monitor data source plugin, built by Grafana Labs. This plugin enables you to include all metrics from Azure Monitor and Application Insights in your Grafana dashboards, making it quick to set up and test Grafana against Azure Monitor and Application Insights data.


The Grafana server image in the Azure Marketplace provides a great QuickStart deployment experience. The image provisions a virtual machine (VM) with a pre-installed Grafana dashboard server, a SQLite database, and the Azure plugin. The default single-VM deployment is great for a proof of concept study and testing. For highly available monitoring dashboards backing your critical applications and services, however, it’s essential to plan for high availability of the Grafana deployment on Azure. The following is the proposed and proven architecture to set up Grafana for high availability and security on Azure.

Setting up Grafana for production deployments



Grafana Labs recommends setting up a separate, highly available, shared MySQL server when configuring Grafana for high availability. Azure Database for MySQL and Azure Database for MariaDB are managed relational database services based on the community editions of the MySQL and MariaDB database engines. The service provides high availability at no additional cost, predictable performance, elastic scalability, automated backups, and enterprise-grade security with secure sockets layer (SSL) support, encryption at rest, advanced threat protection, and VNet service endpoint support. Utilizing a remote configuration database with the Azure Database for MySQL or Azure Database for MariaDB service allows for the horizontal scalability and high availability of Grafana instances required for enterprise production deployments.

Leveraging Bitnami Multi-Tier Grafana templates for production deployments


Bitnami lets you deploy a multi-node, production ready Grafana solution from the Azure Marketplace with just a few clicks. This solution uses several Grafana nodes with a pre-configured load balancer and Azure Database for MariaDB for data storage. The number of nodes can be chosen at deployment time depending on your requirements. Communication between the nodes and the Azure Database for MariaDB service is also encrypted with SSL to ensure security.

A key feature of Bitnami's Grafana solution is that it comes pre-configured to provide a fault-tolerant deployment. Requests are handled by the load balancer, which continuously tests nodes to check if they are alive and automatically reroutes requests if a node fails. Data (including session data) is stored in the Azure Database for MariaDB and not on the individual nodes. This approach improves performance and protects against data loss due to node failure.

Configuring existing installations of Grafana to use Azure Database for MySQL service


If you have an existing installation of Grafana that you would like to configure for high availability, the following steps demonstrate configuring a Grafana instance to use Azure Database for MySQL as the backend configuration database. In this walkthrough, we use an Ubuntu server with Grafana installed and configure Azure Database for MySQL as the remote database for the Grafana setup.

1. Create an Azure Database for MySQL server with the General Purpose tier which is recommended for production deployments. If you are not familiar with the database server creation, you can read the QuickStart tutorial to familiarize yourself with the workflow. If you are using Azure CLI, you can simply set it up using az mysql up.

2. If you have already installed Grafana on the Ubuntu server, you’ll need to edit the grafana.ini file to add the Azure Database for MySQL parameters. We will focus on the database parameters noted in the documentation; a sketch of the resulting settings appears after these steps. Please note: the username must be in the format user@server due to the server identification method of Azure Database for MySQL. Other formats will cause connections to fail.

3. Azure Database for MySQL supports SSL connections. For enterprise production deployments, it is recommended to always enforce SSL. Most modern installations of Ubuntu will have the necessary Baltimore CyberTrust CA certificate already installed in your /etc/ssl/certs location. If needed, you can download the SSL CA certificate used for Azure Database for MySQL from this location. The SSL mode can be provided in two forms, skip-verify and true. With skip-verify, the certificate provided is not validated but the connection is still encrypted. With true, the certificate provided is validated against the Baltimore CA, which is useful for preventing man-in-the-middle attacks. Note that in both cases, Grafana expects the certificate authority (CA) path to be provided.

4. Next, you have the option to store user sessions in Azure Database for MySQL in the session table. This is configured in the same grafana.ini file under the session section, and is beneficial, for instance, in load-balanced environments where sessions must be maintained for users accessing Grafana. In the provider_config parameter, we need to include the user@server name, the password, the full server address, and the TLS/SSL mode, which again can be true or skip-verify.

5. After this is all set, you should be able to start Grafana and verify the status with the commands below:

◈ systemctl start grafana-server
◈ systemctl status grafana-server

If you see any errors or issues, the default path for logging is /var/log/grafana/ where you can confirm what is preventing the startup. The following is a sample error where the username was not provided as user@server but rather just user.

lvl=eror msg="Server shutdown" logger=server reason="Service init failed: Migration failed err: Error 9999: An internal error has occurred. Please retry or report your issues.

Otherwise, you should see the service in an OK status, and the initial startup will create all the necessary tables in the Azure Database for MySQL database.
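To recap steps 2 through 4, the following Python snippet generates the relevant grafana.ini sections. It is only a sketch: the server name, password, and certificate path are placeholders, and the exact key names and values should be checked against the Grafana documentation for your version.

DB_HOST = "mydemoserver.mysql.database.azure.com:3306"    # placeholder server and port
DB_USER = "grafana@mydemoserver"                          # must be user@server for Azure Database for MySQL
DB_PASS = "<password>"                                    # placeholder
CA_PATH = "/etc/ssl/certs/Baltimore_CyberTrust_Root.pem"  # placeholder CA path

# ssl_mode can be "true" (validate against the CA) or "skip-verify" (encrypt without validation).
grafana_ini = f"""
[database]
type = mysql
host = {DB_HOST}
name = grafana
user = {DB_USER}
password = {DB_PASS}
ssl_mode = true
ca_cert_path = {CA_PATH}

[session]
provider = mysql
provider_config = {DB_USER}:{DB_PASS}@tcp({DB_HOST})/grafana?tls=true
"""
print(grafana_ini)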

Key takeaways


◈ The single-VM setup for Grafana is great for a quick start, testing, and a proof of concept study, but it may not be suitable for production deployments.

◈ For enterprise production deployments of Grafana, separating the configuration database to the dedicated server enables high availability and scalability.

◈ The Bitnami Grafana Multi-Tier template provides a production-ready template that leverages a scale-out design and built-in security to provision Grafana in a few clicks at no extra cost.

◈ Using managed database services like Azure Database for MySQL for production deployments provides built-in high availability, scalability, and enterprise security for the database repository.

Saturday, 13 April 2019

How to accelerate DevOps with Machine Learning lifecycle management

DevOps is the union of people, processes, and products to enable the continuous delivery of value to end users. DevOps for machine learning is about bringing the lifecycle management of DevOps to machine learning. With these practices in place, teams can easily manage, monitor, and version models while simplifying workflows and collaboration.

Effectively managing the machine learning lifecycle is critical for DevOps success, and the first piece of machine learning lifecycle management is building your machine learning pipeline(s).

What is a Machine Learning Pipeline? 


DevOps for Machine Learning includes data preparation, experimentation, model training, model management, deployment, and monitoring while also enhancing governance, repeatability, and collaboration throughout the model development process. Pipelines allow for the modularization of phases into discrete steps and provide a mechanism for automating, sharing, and reproducing models and ML assets. They create and manage workflows that stitch together machine learning phases. Essentially, pipelines allow you to optimize your workflow with simplicity, speed, portability, and reusability.

There are four steps involved in deploying machine learning that data scientists, engineers and IT experts collaborate on:

1. Data Ingestion and Preparation
2. Model Training and Retraining
3. Model Evaluation
4. Deployment


“Using distinct steps makes it possible to rerun only the steps you need, as you tweak and test your workflow. A step is a computational unit in the pipeline. As shown in the preceding diagram, the task of preparing data can involve many steps. These include, but aren't limited to, normalization, transformation, validation, and featurization. Data sources and intermediate data are reused across the pipeline, which saves compute time and resources.”
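As a concrete illustration, here is a hedged sketch of a two-step pipeline using the Azure Machine Learning Python SDK; the compute target, script names, and folder names are assumptions:

from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()                 # reads workspace details from a local config.json
compute = ws.compute_targets["cpu-cluster"]  # hypothetical compute target name
prepared = PipelineData("prepared_data", datastore=ws.get_default_datastore())

# Step 1: data preparation writes its output to the intermediate PipelineData location.
prep_step = PythonScriptStep(name="prepare data",
                             script_name="prep.py",
                             source_directory="scripts",
                             arguments=["--output", prepared],
                             outputs=[prepared],
                             compute_target=compute)

# Step 2: training consumes the prepared data; unchanged steps can be reused on rerun.
train_step = PythonScriptStep(name="train model",
                              script_name="train.py",
                              source_directory="scripts",
                              arguments=["--input", prepared],
                              inputs=[prepared],
                              compute_target=compute)

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
run = Experiment(ws, "devops-ml-pipeline").submit(pipeline)
run.wait_for_completion()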

4 benefits of accelerating Machine Learning pipelines for DevOps


1. Collaborate easily across teams


◈ Data scientists, data engineers, and IT professionals using machine learning pipelines need to collaborate on every step involved in the machine learning lifecycle: from data prep to deployment.

◈ Azure Machine Learning service workspace is designed to make the pipelines you create visible to the members of your team. You can use Python to create your machine learning pipelines and interact with them in Jupyter notebooks, or in another preferred integrated development environment.

2. Simplify workflows


◈ Data prep and modeling can last days or weeks, taking time and attention away from other business objectives.

◈ The Azure Machine Learning SDK offers imperative constructs for sequencing and parallelizing the steps in your pipelines when no data dependency is present. You can also templatize pipelines for specific scenarios and deploy them to a REST endpoint, so you can schedule batch-scoring or retraining jobs (see the sketch below). When you rerun a pipeline, only the steps you need are rerun as you tweak and test your workflow.
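For example, a pipeline built as in the earlier sketch can be published to a REST endpoint and scheduled for weekly retraining; the names and the pipeline id below are placeholders:

from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline, Schedule, ScheduleRecurrence

ws = Workspace.from_config()

# Publishing is a one-time call on the Pipeline object, e.g.:
#   published = pipeline.publish(name="retraining-pipeline", version="1.0")
# Afterwards the published pipeline can be retrieved by id and scheduled.
published = PublishedPipeline.get(ws, id="<published-pipeline-id>")
print(published.endpoint)  # REST endpoint, callable with an Azure AD bearer token

recurrence = ScheduleRecurrence(frequency="Week", interval=1)
Schedule.create(ws,
                name="weekly-retrain",
                pipeline_id=published.id,
                experiment_name="devops-ml-pipeline",
                recurrence=recurrence)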

3. Centralized Management


◈ Tracking models and their version histories is a hurdle many DevOps teams face when building and maintaining their machine learning pipelines.

◈ The Azure Machine Learning service model registry tracks models, their version histories, their lineage, and their artifacts (see the registration sketch after this list). Once the model is in production, the Application Insights service collects both application and model telemetry that allows the model to be monitored in production for operational and model correctness. The data captured during inferencing is presented back to the data scientists, and this information can be used to determine model performance, data drift, and model decay. All of this sits alongside the tools to train, manage, and deploy machine learning experiments and web services in one central view.

◈ The Azure Machine Learning SDK also allows you to submit and track individual pipeline runs. You can explicitly name and version your data sources, inputs, and outputs instead of manually tracking data and result paths as you iterate. You can also manage scripts and data separately for increased productivity. For each step in your pipeline, Azure coordinates between the various compute targets you use, so that your intermediate data can be shared with the downstream compute targets easily. You can track the metrics for your pipeline experiments directly in the Azure portal.
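A minimal sketch of registering a trained model with the model registry (the path, model name, and tags are placeholders):

from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()

# Each register call creates a new version of the model, tracked with its artifacts and lineage.
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",   # placeholder path to the trained model file
                       model_name="demand-forecaster",   # placeholder model name
                       tags={"stage": "candidate"})
print(model.name, model.version)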

4. Track your experiments easily


◈ DevOps capabilities for machine learning further improve productivity by enabling experiment tracking and management of models deployed in the cloud and on the edge. All these capabilities can be accessed from any Python environment running anywhere, including data scientists’ workstations. The data scientist can compare runs, and then select the “best” model for the problem statement.

◈ The Azure Machine Learning workspace keeps a list of compute targets that you can use to train your model. It also keeps a history of the training runs, including logs, metrics, output, and a snapshot of your scripts. Create multiple workspaces or common workspaces to be shared by multiple people.

Friday, 12 April 2019

How do teams work together on an automated machine learning project?


When it comes to executing a machine learning project in an organization, data scientists, project managers, and business leads need to work together to deploy the best models to meet specific business objectives. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project.

We’ll see how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock. Azure Machine Learning service is a cloud service that you use to train, deploy, automate, and manage machine learning models, all at the broad scale that the cloud provides. Automated machine learning within Azure Machine Learning service is the process of taking training data with a defined target feature, and iterating through combinations of algorithms and feature selections to automatically select the best model for your data based on the training scores.

Excess stock quickly becomes a liquidity problem, as it is not converted back to cash unless margins are reduced by means of discounts and promotions or, even worse, when it accumulates to be sent to other channels such as outlets, delaying its sale. Identifying in advance which products will not have the level of rotation they expect and controlling replenishment with stock cover that is aligned with sales forecasts are key factors in helping retailers achieve ROI on their investments. Let’s see how the team goes about solving this problem and how automated machine learning enables the democratization of artificial intelligence across the company.

Identify the right business objective for the company


Strong sales and profits are the result of having the right product mix and level of inventory. Achieving this ideal mix requires having current and accurate inventory information. Manual processes not only take time, causing delays in producing current and accurate inventory information, but also increase the likelihood of errors. These delays and errors are likely to cause lost revenue due to inventory overstocks, understocks, and out-of-stocks.

Overstock inventory can also take valuable warehouse space and tie up cash that ought to be used to purchase new inventory. But selling it in liquidation mode can cause its own set of problems, such as tarnishing your reputation and cannibalizing sales of other current products.

The project manager, being the bridge between data scientists and business operations, reaches out to the business lead to discuss the possibilities of using some of their internal and historical sales to solve their overstock inventory problem. The project manager and the business lead define project goals by asking and refining tangible questions that are relevant for the business objective.

There are two main tasks addressed in this stage:

◈ Define objectives: The project manager and the business lead need to identify the business problems and, most importantly, formulate questions that define the business goals that the data science techniques can target.
◈ Identify data sources: The project manager and data scientist need to find relevant data that helps answer the questions that define the objectives of the project.

Look for the right data and pipeline


It all starts with data. The project manager and the data scientist need to identify data sources that contain known examples of answers to the business problem. They look for the following types of data:

◈ Data that is relevant to the question. Do they have measures of the target and features that are related to the target?
◈ Data that is an accurate measure of their model target and the features of interest.

There are three main tasks that the data scientist needs to address in this stage:

1. Ingest the data into the target analytics environment
2. Explore the data to determine if the data quality is adequate to answer the question
3. Set up a data pipeline to score new or regularly refreshed data

After setting up the process to move the data from the source locations to the target locations where it’s possible to run analytics operations, the data scientist starts working on raw data to produce a clean, high-quality data set whose relationship to the target variables is understood. Before training machine learning models, the data scientist needs to develop a sound understanding of the data and create a data summarization and visualization to audit the quality of the data and provide the information needed to process the data before it's ready for modeling.

Finally, the data scientist is also in charge of developing a solution architecture of the data pipeline that refreshes and scores the data regularly.

Forecast orange juice sales with automated machine learning


The data scientist and project manager decide to use automated machine learning for a few reasons: automated machine learning empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for any problem, achieving higher accuracy while spending far less of their time. And it also enables a significantly larger number of experiments to be run, resulting in faster iteration toward production-ready intelligent experiences.

Let’s look at how their process using automated machine learning for orange juice sales forecasting delivers on these benefits.

After agreeing on the business objective and what type of internal and historical data should be used to meet that objective, the data scientist creates a workspace. This workspace is the top-level resource for the service and provides data scientists with a centralized place to work with all the artifacts they need to create. When a workspace is created in an AzureML service, the following Azure resources are added automatically (if they are regionally available):

◈ Azure Container Registry
◈ Azure Storage
◈ Azure Application Insights
◈ Azure Key Vault

To run automated machine learning, the data scientist also needs to create an Experiment. An Experiment is a named object in a workspace that represents a predictive task, the output of which is a trained model and a set of evaluation metrics for the model.
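Assuming the workspace already exists and its config.json has been downloaded from the portal, connecting to it and creating the experiment looks roughly like this (the experiment name is a placeholder):

from azureml.core import Workspace, Experiment

ws = Workspace.from_config()  # reads subscription, resource group, and workspace name from config.json
experiment = Experiment(workspace=ws, name="oj-sales-forecasting")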

The data scientist is now ready to load the historical orange juice sales data, reading the CSV file into a plain pandas DataFrame. The time column in the CSV is called WeekStarting, so it will be specially parsed into the datetime type.
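A small sketch of that load step (the file name is a placeholder):

import pandas as pd

# parse_dates turns the WeekStarting column into datetime64 on load.
oj_sales = pd.read_csv("oj-sales.csv", parse_dates=["WeekStarting"])
print(oj_sales.dtypes)
print(oj_sales.head())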

Each row in the DataFrame holds a quantity of weekly sales for an orange juice brand at a single store. The data also includes the sales price, a flag indicating if the orange juice brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also includes the logarithm of the sales quantity.

The task is now to build a time series model for the Quantity column. It’s important to note that this data set is comprised of many individual time series; one for each unique combination of Store and Brand. To distinguish the individual time series, we thus define the grain—the columns whose values determine the boundaries between time series.

After splitting the data into a training and a testing set for later forecast evaluation, the data scientist starts working on the modeling step for forecasting tasks, and automated machine learning uses pre-processing and estimation steps that are specific to time series. Automated machine learning will undertake the following pre-processing steps:

◈ Detect the time series sample frequency (e.g., hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span.

◈ Impute missing values in the target via forward-fill and feature columns using median column values.

◈ Create grain-based features to enable fixed effects across different series.

◈ Create time-based features to assist in learning seasonal patterns.

◈ Encode categorical variables to numeric quantities.

The AutoMLConfig object defines the settings and data for an automated machine learning training job. Below is a representative summary of the configuration parameters used for training the orange juice sales forecasting model:
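This is a hedged sketch using the SDK of that era; parameter names have changed in later releases, `train` is the training split from above, Quantity is the target column, and the numeric values are assumptions:

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task="forecasting",
                             primary_metric="normalized_root_mean_squared_error",
                             iterations=10,
                             iteration_timeout_minutes=5,
                             X=train.drop(columns=["Quantity"]),
                             y=train["Quantity"].values,
                             n_cross_validations=3,
                             time_column_name="WeekStarting",
                             grain_column_names=["Store", "Brand"])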


Each iteration runs within the experiment and stores a serialized pipeline, so that the pipeline with the best performance on the validation data set can be retrieved afterwards.
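Submitting the configuration and retrieving the best pipeline then looks like this (continuing the sketches above):

# Runs the automated ML iterations under the experiment created earlier.
run = experiment.submit(automl_config, show_output=True)

# Retrieve the iteration with the best validation score, along with its fitted pipeline.
best_run, fitted_model = run.get_output()
print(best_run)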

Once the evaluation has been performed, the data scientist, project manager, and business lead meet again to review the forecasting results. It’s the project manager and business lead’s job to make sense of the outputs and choose practical steps based on those results. The business lead needs to confirm that the best model and pipeline meet the business objective and that the machine learning solution answers the questions with acceptable accuracy to deploy the system to production for use by their internal sales forecasting application.

Microsoft invests in Automated Machine Learning


Automated machine learning is based on a breakthrough from the Microsoft Research division. The approach combines ideas from collaborative filtering and Bayesian optimization to search an enormous space of possible machine learning pipelines intelligently and efficiently. It’s essentially a recommender system for machine learning pipelines. Similar to how streaming services recommend movies for users, automated machine learning recommends machine learning pipelines for data sets.

As you’ve seen here, Automated machine learning empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for any problem and save time while increasing accuracy. It also enables a larger number of experiments to be run and faster iterations. How could automated machine learning benefit your organization? How could your team work more closely on using machine learning to meet your business objectives?