Friday 31 May 2019

Integrating Azure CNI and Calico: A technical deep dive

We are pleased to share the availability of Calico Network Policies in Azure Kubernetes Service (AKS). Calico policies let you define filtering rules to control the flow of traffic to and from Kubernetes pods. In this blog post, we will explore in more technical detail the engineering work that went into enabling Azure Kubernetes Service to work with a combination of Azure CNI for networking and Calico for network policy.

First, some background. Simplifying somewhat, there are three parts to container networking:

1. Allocating an IP address to each container as it’s created; this is IP address management, or IPAM.

2. Routing the packets between container endpoints, which in turn splits into:

◈ Routing from host to host (inter-node routing).
◈ Routing within the host between the external network interface and the container, as well as routing between containers on the same host (intra-node routing).

3. Ensuring that packets that should not be allowed are blocked (network policy).

Typically, a single network plug-in technology addresses all these aspects. However, the open API used by Kubernetes, the Container Network Interface (CNI), actually allows you to combine different implementations.

The choice of configurations brings you opportunities, but also calls for a plan to make sure that the mechanisms you choose are compatible and enable you to achieve your networking goals. Let’s look a bit more closely into those details.

Networking: Azure CNI


Cloud networks, like Azure, were originally built for virtual machines with typically just one or a small number of relatively static IP addresses. Containers change all that, and introduce a host of new challenges for the cloud networking layer, as dozens or even hundreds of workloads are rapidly created and destroyed on a regular basis, each of which is its own IP endpoint on the underlying network.

The first approach at enabling container networking in the cloud leveraged overlays, like VXLAN, to ensure only the host IP was exposed to the underlying network. Overlay network solutions like flannel, or AKS’s kubenet (basic) networking mode, do a great job of hiding the underlying network from the containers. Unfortunately, that is also the downside: the containers are not actually running in the underlying VNET, meaning they cannot be addressed like a regular endpoint and can only communicate outside of the cluster via network address translation (NAT).

With Azure CNI, which is enabled with advanced mode networking in AKS, we added the ability for each container to get its own real IP address within the same VNET as the host. When a container is created, the Azure CNI IPAM component assigns it an IP address from the VNET, and ensures that the address is configured on the underlying network through the magic of the Azure software-defined network layer, taking care of the inter-node routing piece.

So with IPAM and inter-node routing taken care of, we now need to consider intra-node routing. How do we do intra-node routing, i.e. get a packet between two containers, or between the host’s network interface (typically eth0) and the virtual ethernet (veth) interface of the container?

It turns out the Linux kernel is rich in networking capabilities, and there are many different ways to achieve this goal. One of the simplest and easiest is with a virtual bridge device. With this approach, all the containers are connected on a local layer two segment, just like physical machines that are connected via an ethernet switch.

◈ Packets from the ‘real’ network are switched through the bridge to the appropriate container via standard layer two techniques (ARP and address learning).

◈ Packets to the real network are passed through the bridge, to the NIC, where they are routed to the remote node.

◈ Packets from one container to another also flow through the bridge, just like two PCs connected on an ethernet switch.
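To make the bridge approach more concrete, here is a minimal sketch of the kind of Linux plumbing involved, using standard iproute2 commands. The device names (azbr0, veth-host, veth-ctr) are hypothetical, and this illustrates the general technique rather than the exact steps Azure CNI performs:

# Create a virtual bridge device and bring it up (names are illustrative)
ip link add azbr0 type bridge
ip link set azbr0 up
# Create a veth pair; one end stays on the host and joins the bridge,
# the other end would be moved into the container's network namespace
ip link add veth-host type veth peer name veth-ctr
ip link set veth-host master azbr0
ip link set veth-host up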

This approach, which is illustrated in figure one, has the advantage of being high performance and requiring little control plane logic to maintain, helping to ensure robustness.


Figure 1: Azure CNI networking

Network policy with Azure


Kubernetes has a rich policy model for defining which containers are allowed to talk to which other ones, as defined in the Kubernetes Network Policy API. 

We translate the Kubernetes network policy model to a set of allowed IP address pairs, which are then programmed as rules in the Linux kernel iptables module. These rules are applied to all packets going through the bridge. This is shown in figure two.
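As a minimal illustration of the kind of rule involved (the pod IP addresses are hypothetical, and the real implementation programs its own chains rather than a bare FORWARD rule), an allowed pair might translate to something like:

# Allow traffic from pod 10.240.0.10 to pod 10.240.0.20
iptables -A FORWARD -s 10.240.0.10/32 -d 10.240.0.20/32 -j ACCEPT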


Figure 2: Azure CNI with Azure Policy Manager

Network policy with Calico


Kubernetes is also an open ecosystem, and Tigera’s Calico is well known as the first, and most widely deployed, implementation of Network Policy across cloud and on-premises environments. In addition to the base Kubernetes API, it also has a powerful extended policy model which supports a range of features such as global network policies, network sets, more flexible rule specification, the ability to run the policy enforcement agent on non-Kubernetes nodes, and application layer policy via integration with Istio. Furthermore, Tigera offers a commercial offering built on Calico, Tigera Secure, that adds a host of enterprise management, controls, and compliance features.

Given Kubernetes’ aforementioned modular networking model, you might think you could just deploy Calico for network policy along with Azure CNI, and it should all just work. Unfortunately, it is not this simple.

While Calico uses iptables for policy, it does so in a subtly different way. It expects containers to be established with separate kernel routes, and it enforces the policies that apply to each container on that specific container’s virtual ethernet interface. This has the advantage that all container-to-container communications are identical (always a layer 3 routed hop, whether internal to the host or across the underlying network), and security policies are more narrowly applied to the specific container’s context.
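As a rough illustration of what such a per-container route looks like (the container IP and veth name here are hypothetical), the host’s routing table ends up with an entry along these lines:

# Send traffic for the container's VNET IP straight to its veth interface
ip route add 10.240.0.20/32 dev veth-ctr

Because every packet to or from the container then traverses a routed hop on that interface, Calico can attach its iptables policy rules to each container’s veth interface individually.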

To make Azure CNI compatible with the way Calico works, we added a new intra-node routing capability to the CNI, which we call ‘transparent’ mode. When configured to run in this mode, Azure CNI sets up local routes for containers instead of creating a virtual bridge device. This is shown in figure three.


Figure 3: Azure CNI with Calico Network Policy

Onward and upstream


A Kubernetes cluster with the enhanced Azure CNI and Calico policies can be created using AKS-Engine by specifying the following configuration in the cluster definition file.

"properties": {

"orchestratorProfile": {

"orchestratorType": "Kubernetes",

"kubernetesConfig":

{ "networkPolicy": "calico", "networkPlugin": "azure" }

These options have also been integrated into AKS itself, enabling you to provision a cluster with Azure networking and Calico network policy by simply specifying the options --network-plugin azure --network-policy calico at cluster create time.
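For example, a minimal Azure CLI invocation might look like the following; the resource group and cluster names are illustrative:

az aks create --resource-group myResourceGroup --name myAKSCluster --network-plugin azure --network-policy calico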

Thursday 30 May 2019

Key causes of performance differences between SQL managed instance and SQL Server

Migrating to a Microsoft Azure SQL Database managed instance provides a host of operational and financial benefits you can only get from a fully managed and intelligent cloud database service. Some of these benefits come from features that optimize or improve overall database performance. After migration, many of our customers are eager to compare workload performance with what they experienced with on-premises SQL Server, and sometimes they're surprised by the results. In many cases, you might get better results on the on-premises SQL Server database because a SQL Database managed instance introduces some overhead for manageability and high availability. In other cases, you might get better results on a SQL Database managed instance because the latest version of the database engine has improved query processing and optimization features compared to older versions of SQL Server.


This article will help you understand the underlying factors that can cause performance differences and the steps you can take to make fair comparisons between SQL Server and a SQL Database managed instance.

If you're surprised by the comparison results, it's important to understand what factors could influence your workload and how to configure your test environments to ensure you have a fair comparison. Some of the top reasons why you might experience lower performance on a SQL Database managed instance compared to SQL Server are listed below. You can mitigate some of these by increasing and pre-allocating file sizes or adding cores; however, the others are prerequisites for guaranteed high availability and are part of the PaaS service.

Simple or bulk recovery model


The databases placed on a SQL Database managed instance use the full database recovery model to provide high availability and guarantee no data loss. In this scenario, one of the most common reasons why you might get worse performance on a SQL Database managed instance is the fact that your source database uses a simple or bulk-logged recovery model. The drawback of the full recovery model is that it generates more log data than the simple/bulk-logged recovery model, meaning your DML transaction processing in the full recovery model will be slower.

You can use the following query to determine what recovery model is used on your databases:

select name, recovery_model_desc from sys.databases

If you want to compare a workload running on SQL Server with one on a SQL Database managed instance, make sure the databases on both sides use the full recovery model for a fair comparison.
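For example, a minimal way to align the source database (the database name is illustrative):

ALTER DATABASE [YourDatabase] SET RECOVERY FULL;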

Resource governance and HA configuration


SQL Database managed instance has built-in resource governance that ensures 99.99% availability, and guarantees that management operations such as automated backups will be completed even under high workloads. If you don’t use similar constraints on your SQL Server, the built-in resource governance on SQL Database managed instance might limit your workload.

For example, there's an instance log throughput limit (up to 22 MB/s on the general purpose tier and up to 48 MB/s on the business critical tier) that ensures you can't load more data than the instance can back up. In this case, you might see higher INSTANCE_LOG_RATE_GOVERNOR wait statistics that don’t exist on your SQL Server instance. These resource governance constraints might slow down operations such as bulk load or index rebuild because these operations require higher log rates.

In addition, the secondary replicas in business critical tier instances might slow down the primary database if they can't catch up on the changes and apply them, so you might see additional HADR_DATABASE_FLOW_CONTROL or HADR_THROTTLE_LOG_RATE_SEND_RECV wait statistics.
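If you suspect these constraints, a quick way to check the accumulated wait statistics is a query along the following lines. This is a sketch using the sys.dm_os_wait_stats DMV, with deliberately loose LIKE patterns so it matches the log governor and HADR wait types described above:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE '%LOG%GOVERNOR%' -- instance log rate governance waits
   OR wait_type LIKE 'HADR%'          -- availability replica flow-control waits
ORDER BY wait_time_ms DESC;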

If you're comparing your SQL Server workload running on local SSD storage to the business critical tier, note that the business critical instance is an Always On availability group cluster with three secondary replicas. Make sure that your source SQL Server has a similar HA implementation using Always On availability groups with at least one synchronous commit replica. Comparing the business critical tier with a single SQL Server instance writing to a local disk would be unrealistic due to the absence of HA on your source instance. If you are using asynchronous Always On replicas, you would have HA with better performance, but you are trading the possibility of data loss for performance, and you will get better results on the SQL Server instance.

Automated backup schedule


One of the main reasons why you would choose the SQL Database managed instance is the fact that it guarantees you will always have backups of your databases, even under heavy workloads. The databases in a SQL Database managed instance have scheduled full, incremental, and log backups. Full backups are taken every seven days, incremental backups every twelve hours, and log backups every five to ten minutes. If you have multiple databases on the instance, there's a high chance that at least one backup is running at any given time.

Since the backup operations use some instance resources (CPU, disk, network), they can affect workload performance. Make sure the databases on the system that you compare with the managed instance have similar backup schedules. Otherwise, you might need to accept that you're getting better results on your SQL Server instance because you're making a trade-off between database recovery and performance, which is not possible on a SQL Database managed instance.

If you're seeing unexpected performance differences, check whether an ongoing full or differential backup on either the SQL Database managed instance or the SQL Server instance could be affecting the currently running workload, using the following query:

SELECT r.command, query = a.text, start_time, percent_complete,
      eta = dateadd(second,estimated_completion_time/1000, getdate())
FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) a
 WHERE r.command IN ('BACKUP DATABASE','BACKUP LOG')

If you see a full or incremental backup running during a short benchmark, you might pause your workload and resume it once the backup finishes.

Connection and app-to-database proximity


The application accessing the databases and executing the benchmark queries must have similar network proximity to the SQL Database managed instance and to the SQL Server instance. If you place your application and SQL Server database in the local environment (or run an app like HammerDB from the same machine where SQL Server is installed), you will get better results on SQL Server compared to the SQL Database managed instance, which runs in a distributed cloud environment relative to the application. Make sure that in both cases you're running the benchmark application or query on separate virtual machines in the same region as the SQL Database managed instance to get valid results. If you're comparing an on-premises environment with the equivalent cloud environment, try to measure bandwidth and latency between the app and the database and ensure they are similar.

SQL Database managed instance is accessed via proxy gateway nodes that accept client requests and redirect them to the actual database engine nodes. To get results closer to your environment, enable ProxyOverride mode on your instance using the Set-AzSqlInstance PowerShell command to enable direct access from the client to the nodes currently hosting your SQL Database managed instance.
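A sketch of that PowerShell call, assuming the Az.Sql module is installed and with illustrative instance and resource group names:

Set-AzSqlInstance -Name "your-managed-instance" -ResourceGroupName "your-resource-group" -ProxyOverride "Redirect"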

In addition, due to compliance requirements, a SQL Database managed instance enforces SSL/TLS transport encryption, which is always enabled. Encryption can introduce overhead when there is a large number of queries. If your on-premises environment does not enforce SSL encryption, you will see additional network overhead on the SQL Database managed instance.

Transparent data encryption


The databases on a SQL Database managed instance are encrypted by default using Transparent Data Encryption (TDE). TDE encrypts and decrypts every page that is exchanged with disk storage. This consumes more CPU resources and introduces additional latency when fetching data pages from, and saving them to, disk storage. Make sure that the databases on both the SQL Database managed instance and SQL Server have Transparent Data Encryption either turned on or off, and that database encryption/decryption operations have completed before starting performance testing.

You can use the following query to determine whether the databases are encrypted:

select name, is_encrypted from sys.databases

Another important factor that might affect your performance is an encrypted TempDB. TempDB is encrypted if at least one database on your SQL Server or SQL Database managed instance is encrypted. As a result, you might compare two databases that are not encrypted, but because some other database on the SQL Database managed instance is encrypted (even though it's not involved in the workload), TempDB will also be encrypted. The unencrypted databases will still use the encrypted TempDB, and any query that creates temporary objects or spills to TempDB would be slower. Note that TempDB only gets decrypted once all user databases on an instance are decrypted and the instance restarts. Scaling a SQL Database managed instance to a new pricing tier and back is one way to restart it.
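You can check the encryption state of every database, including TempDB, with the sys.dm_database_encryption_keys DMV; an encryption_state of 3 means encrypted, and percent_complete shows the progress of an in-flight encryption scan:

SELECT DB_NAME(database_id) AS database_name, encryption_state, percent_complete
FROM sys.dm_database_encryption_keys;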

Database engine settings


Make sure that database engine settings such as database compatibility levels, trace flags, system configurations (‘cost threshold for parallelism’, ’max degree of parallelism’), database scoped configurations (LEGACY_CARDINALITY_ESTIMATOR, PARAMETER_SNIFFING, QUERY_OPTIMIZER_HOTFIXES, etc.), and database settings (AUTO_UPDATE_STATISTICS, DELAYED_DURABILITY) are the same on both the SQL Server and SQL Database managed instance databases.

The following sample queries can help you identify settings on SQL Server and Azure SQL Database managed instance:

select compatibility_level, snapshot_isolation_state_desc, is_read_committed_snapshot_on,

  is_auto_update_stats_on, is_auto_update_stats_async_on, delayed_durability_desc
from sys.databases;
GO

select * from sys.database_scoped_configurations;
GO

dbcc tracestatus;
GO

select * from sys.configurations;

Compare the results of these queries on the SQL Database managed instance and SQL Server and try to align any differences you identify.

Note: The list of trace flags and configurations might be very long, so we recommend filtering them or looking only at the trace flags you've changed or know affect performance. Some trace flags are pre-configured on SQL Database managed instance as part of the PaaS configuration and do not affect performance.

You might experiment with changing the compatibility level to a higher value, turning on the legacy cardinality estimator, or enabling the automatic tuning feature on the SQL Database managed instance, any of which might give you better results than your SQL Server database.
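These experiments can be run with statements along the following lines; the database name is illustrative, and the scoped configuration statement must be run in the context of the target database:

ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 140;
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATOR = ON;
ALTER DATABASE [YourDatabase] SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);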

Also note that a SQL Database managed instance might provide better performance even after you align all parameters, because it has the latest improvements and fixes that are not bound to compatibility level, as well as features like forcing the last good plan that might improve your workload.

Hardware and environment specification


SQL Database managed instance runs on standardized hardware with pre-defined technical characteristics that are probably different from your environment's. Some of the characteristics you might need to consider when comparing your environment with the environment where the SQL Database managed instance is running are:

1. The number of cores should be the same on both SQL Server and the SQL Database managed instance. Note that a SQL Database managed instance uses 2.3-2.4 GHz processors, which might differ from your processor speed, so it might consume more or less CPU for the same operation. If possible, check whether hyperthreading is used in your SQL Server environment when comparing to the Gen4 and Gen5 hardware generations on a SQL Database managed instance: a managed instance on Gen4 hardware does not use hyperthreading, while on Gen5 it does. If you are comparing SQL Server running on a bare-metal machine with a SQL Database managed instance or SQL Server running on a virtual machine, you'll probably get better results on the bare-metal instance.

2. The amount of memory, including the memory/core ratio (5.1 GB/core on Gen5, 7 GB/core on Gen4). A higher memory/core ratio provides a bigger buffer pool cache and increases the cache hit ratio. If your workload does not perform well on a managed instance with the 5.1 GB/core memory ratio, you probably need to choose a virtual machine with the appropriate memory/core ratio instead of a SQL Database managed instance.

3. IO characteristics – You need to be aware that the performance of the storage system might be very different compared to your on-premises environment. A SQL Database managed instance is a cloud database and relies on Azure cloud infrastructure.

◈ The general purpose tier uses remote Azure Premium disks, where IO performance depends on file sizes. If you reach the log limit that depends on the file size, you might notice WRITE_LOG waits and fewer IOPS in file statistics. This issue might occur on a SQL Database managed instance if the log files are small and not pre-allocated. You might need to increase the size of some files in the general purpose tier to get better performance (see the example after this list).

◈ A SQL Database managed instance does not use instant file initialization, so you might see additional PREEMPTIVE_OS_WRITEFILEGATHER wait statistics since the data files are filled with zero bytes during file growth.

4. Local or remote storage types – Make sure you're considering local SSD versus remote storage while doing the comparison. The general purpose tier uses remote storage (Azure Premium Storage) that can't match your on-premises environment if it uses local SSD or a high-performance SAN. In that case you would need to use the business critical tier as a target. The general purpose tier can be compared with other cloud databases like SQL Server on Azure Virtual Machines that also use remote storage (Azure Premium Storage). In addition, be aware that the remote storage used by a general purpose instance is still different from the remote storage used by a SQL Virtual Machine because:

◈ The general purpose tier uses dedicated IO resources for each database file, depending on the size of the individual files, while SQL Server on an Azure Virtual Machine uses shared IO resources for all files, where IO characteristics depend on the size of the disk. If you have many small files, you will get better performance on a SQL Virtual Machine, while you can get better performance on a SQL Database managed instance if the usage of files can be parallelized, because there are no noisy neighbors sharing the same IO resources.

◈ SQL Virtual Machines use a read-caching mechanism that improves read speed.
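As mentioned above, pre-allocating a larger file (and thus a bigger IO allocation in the general purpose tier) can be done with a statement along these lines; the database and logical file names are illustrative, and the logical name can be found in sys.database_files:

ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 256GB);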

If your hardware specs and resource allocations are different, you might see different performance results that can be resolved only by changing the service tier or increasing file sizes. If you are comparing a SQL Database managed instance with SQL Server on Azure Virtual Machines, make sure you choose a virtual machine series with a memory/CPU ratio similar to the SQL Database managed instance, such as the DS series.

Azure SQL Database managed instance provides a powerful set of tools that can help you troubleshoot and improve performance of your databases, in addition to built-in intelligence that could automatically resolve potential issues. 

Tuesday 28 May 2019

HB-series Azure Virtual Machines achieve cloud supercomputing milestone

New HPC-targeted cloud virtual machines are first to scale to 10,000 cores


Azure Virtual Machine HB-series are the first on the public cloud to scale an MPI-based high performance computing (HPC) job to 10,000 cores. This level of scaling has long been considered the realm of only the world’s most powerful and exclusive supercomputers, but is now available to anyone using Azure.

HB-series virtual machines (VMs) are optimized for HPC applications requiring high memory bandwidth. For this class of workload, HB-series VMs are the most performant, scalable, and price-performant ever launched on Azure or elsewhere on the public cloud.

With AMD EPYC processors, the HB-series delivers more than 260 GB/s of memory bandwidth, 128 MB of L3 cache, and SR-IOV-based 100 Gb/s InfiniBand. At scale, a customer can utilize up to 18,000 physical CPU cores and more than 67 terabytes of memory for a single distributed memory computational workload.

For memory-bandwidth bound workloads, the HB-series delivers something many in HPC thought might never happen: Azure-based VMs are now as capable as, or more capable than, the bare-metal, on-premises status quo that dominates the HPC market, and at a highly competitive price point.

World-class HPC technology


HB-series VMs feature the cloud’s first deployment of AMD EPYC 7000-series CPUs explicitly for HPC customers. AMD EPYC features 33 percent more memory bandwidth than any x86 alternative, and even more than leading POWER and ARM server platforms. For context, the 263 GB/s of memory bandwidth an HB-series VM delivers is 80 percent more than competing cloud offerings in the same memory-per-core class.

HB-series VMs expose 60 non-hyperthreaded CPU cores and 240 GB of RAM, with a base clock of 2.0 GHz and an all-cores boost speed of 2.55 GHz. HB VMs also feature a 700 GB local NVMe SSD and support up to four Managed Disks, including the new Azure P60/P70/P80 Premium Disks.

A flagship feature of HB-series VMs is 100 Gb/s InfiniBand from Mellanox. HB-series VMs expose the Mellanox ConnectX-5 dedicated back-end NIC via SR-IOV, meaning customers can use the same OFED driver stack that they’re accustomed to in a bare-metal context. HB-series VMs deliver MPI latencies as low as 2.1 microseconds, with consistency, bandwidth, and message rates in line with bare-metal InfiniBand deployments.

Cloud HPC scaling achievement


As part of early acceptance testing, the Azure HPC team benchmarked many widely used HPC applications. One common class of applications is those that simulate computational fluid dynamics (CFD). To see how far HB-series VMs could scale, we selected the Le Mans 100 million cell model available to Star-CCM+ customers, with results as follows:

[Charts: Star-CCM+ Le Mans 100 million cell model scaling results on HB-series VMs]

The Le Mans 100 million cell model scaled to 256 VMs across multiple configurations, accounting for as many as 11,520 CPU cores. Our testing revealed that maximum scaling efficiency came with two MPI ranks per NUMA domain, yielding a top-end scaling efficiency of 71.3 percent, while three MPI ranks per NUMA domain yielded the fastest overall performance. Customers can choose which metric they find most valuable based on a wide variety of factors.

Delighting HPC customers on Azure


The unique capabilities and cost-performance of HB-series VMs are a big win for scientists and engineers who depend on high-performance computing to drive their research and productivity to new heights. Organizations spanning aerospace, automotive, defense, financial services, heavy equipment, manufacturing, oil & gas, public sector academic, and government research have shared feedback on how the HB-series has increased product performance and provided new insights through detailed simulation models.

Rescale partners with Azure to provide HPC resources for computationally complex simulations and analytics. Launching today, Azure HB-series VMs can be consumed through Rescale’s ScaleX® platform as the new “Amber” compute resource.

Sunday 26 May 2019

Managing Versions and Revisions using the HTTP API

From a high-level perspective, working with a current Revision is identical to the way working with an API has always been. Working with a different Version of an API is just like working with a different API: each Version has its own apiId identifier. Working with a non-current Revision just requires a little magic. Instead of using the apiId of the current revision alone, you add an extra suffix ;rev=n, where n is the revision number.


If you are accessing the API via ARM you will use this base URL:

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.ApiManagement/service/{serviceName}/

And you will need a bearer token from ARM in your Authorization header. Using ArmClient is the easiest way to get a token for testing.

However, if you are using the classic service Management API, your base URL will be:

https://{serviceName}.management.azure-api.net/

You can get the required Authorization header from the Azure Portal for the API Management instance under Security->Management API.

The advantage of the ARM API is that you can give non-admin access to the API, and there is an audit trail of operations. The only extra complication is that the API response data is embedded in the standard ARM payload envelope. All of the scenarios described here work via both APIs, but some are only illustrated for one API.

List of APIs and versions


When making a request for a list of APIs, you will see the current revisions of all versioned and non-versioned APIs. Non-current revisions do not show in this list.

GET {{baseUrl}}/apis?api-version=2017-01-01 HTTP/1.1

Authorization: {{authValue}}

HTTP/1.1 200 OK

Content-Type: application/json; charset=utf-8

{

  "value": [

        {

      "id": "/subscriptions/6b7f02d9-1f17-43e0-a02c-24e99753d14a/resourceGroups/Api-Default-East-US/providers/Microsoft.ApiManagement/service/conference/apis/5942d3b49ea6ed985e913bc4",

      "type": "Microsoft.ApiManagement/service/apis",

      "name": "5942d3b49ea6ed985e913bc4",

      "properties": {

        "displayName": "Big Conference API-v2",

        "apiRevision": "1",

        "description": "Sample hypermedia API running in an API App on App Service.",

        "path": "bigconf",

        "protocols": [

          "https"

        ],

        "authenticationSettings": null,

        "subscriptionKeyParameterNames": null,

        "isCurrent": true,

        "apiVersion": "v2",

        "apiVersionSet": {

          "id": "/subscriptions/6b7f02d9-1f17-43e0-a02c-24e99753d14a/resourceGroups/Api-Default-East-US/providers/Microsoft.ApiManagement/service/conference/api-version-sets/5e02bd74-4323-4531-b889-85ab4b51563e",

          "name": "Big Conference API",

          "description": null,

          "versioningScheme": "Segment",

          "versionQueryName": null,

          "versionHeaderName": null

        }

      }

    },

  ],

  ...

}

The apiVersion property will be empty for non-versioned APIs and for original versions, i.e., the API that existed before it became versioned.

The apiRevision property identifies the revision number of the API and defaults to 1 for all existing APIs. In this list the isCurrent property will always be true, and in all cases it is a read-only property.

The apiVersionSet object indicates which version set an API belongs to. API version sets define the versioning rules for all APIs in the version set.

Accessing a current revision


Interacting with a current revision works just as it has in the past:

GET {{baseUrl}}/apis/{{apiId}}?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

Accessing a non-current revision


To access a non-current revision, it is necessary to modify the apiId slightly by adding a revision suffix:

GET {{baseUrl}}/apis/5942d3b49ea6ed985e913bc4;rev=2?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

This will enable accessing any non-current revision that is still marked as online. The API definition has a new boolean property isOnline.

Creating a new Revision


A revision can be created just like any other API. The only difference is the revision suffix added to the apiId. The first revision created for any apiId will be considered the current revision.

PUT {{classicBaseUrl}}/apis/myApiId;rev=1?api-version=2017-03-01

Authorization: {{authValue}}

Content-Type: application/json

{

  "name" : "My Api",

  "path" : "api",

  "serviceUrl" : "http://example.org",

  "protocols" : ["https"]

}

The rev parameter is represented as a matrix parameter to highlight that a revision is just another API definition, but that there is a relationship between the set of revisions sharing the same apiId. A Revision is not a concept distinct from that of an API; it is simply an API with a special identifier.

Using PUT without a rev suffix will create a new API as revision 1, if it does not already exist.

Creating a Revision based on another


The easiest way to create an API Revision is to base it on an existing API Revision. To do this, instead of passing the normal API contract payload, we send a special payload, which in the classic API is indicated with a new media type named application/vnd.ms-azure-apim.revisioninfo+json.

PUT {{classicBaseUrl}}/apis/myapiId;rev=2?api-version=2017-03-01

Authorization: {{authValue}}

Content-Type: application/vnd.ms-azure-apim.revisioninfo+json

{

      "sourceApiId":"/apis/myapiId",

      "apiRevisionDescription":"My new revision"

}

Note that the target URL points to the new revision to be created, and the sourceApiId is the path of the source API Revision following the base path. In this case we are creating a revision of the current API.

In the Azure portal, the user interface ensures that the revisions form a linear sequence, i.e., adding a new revision creates revision n+1 based on revision n. However, that is not constrained via the API. Be wary of straying from the linear path or you may find yourself losing updates.

Due to the way this operation works, it likely will not be supported in Azure deployment templates. We are considering how best to support revisions in deployment templates. Stay tuned.

However, the response does conform to an ARM payload.

{

  "id": "/subscriptions/6b7f02d9-1f17-43e0-a02c-24e99753d14a/resourceGroups/Api-Default-East-US/providers/Microsoft.ApiManagement/service/conference/apis/55bae80192ff5c0314040001;rev=2",

  "type": "Microsoft.ApiManagement/service/apis",

  "name": "55bae80192ff5c0314040001;rev=2",

  "properties": {

    "displayName": "Echo API",

    "apiRevision": "2",

    "description": null,

    "serviceUrl": "http://echoapi.cloudapp.net/api",

    "path": "echo",

    "protocols": [

      "https"

    ],

    "authenticationSettings": {

      "oAuth2": null,

      "openid": null

    },

    "subscriptionKeyParameterNames": {

      "header": "Ocp-Apim-Subscription-Key",

      "query": "subscription-key"

    },

    "apiRevisionDescription": "My new revision"

  }

}

Creating a Version from a Revision


When changes need to be made to an API that a customer needs to opt into, a new Version of the API should be created.

New API Versions have a different apiId but are associated with an apiVersionSet that is common to all versions of the API and contains the metadata defining the versioning scheme. Just as Revisions are specially identified API resources, so are Versions.

Creating a new API Version from an existing API Revision is very similar to creating a revision based on another revision. The major difference is the inclusion of the apiVersionSet object. If the apiVersionSet includes a valid id property, the API will be added to an existing apiVersionSet. Otherwise a new apiVersionSet is created.

PUT {{classicBaseUrl}}/apis/{newApiId}?api-version=2017-03-01

Authorization: {{authValue}}

Content-Type: application/vnd.ms-azure-apim.revisioninfo+json

{

    "sourceApiId" : "/apis/{existingApiId[rev]}",

    "apiVersionName" : "v2",

    "apiVersionDescription" : "Description",

    "apiVersionSet" : {

        "versioningScheme" : "Segment"

    }

}

The apiVersionName is a required property and the versioningScheme must be one of Segment, Header or Query. If Header or Query versioning is selected then a headerParameterName or queryParameterName must be provided.

If the Segment versioning scheme is used, then when calling your API exposed by Azure API Management, the apiVersionName must appear in the URL segment following the API path suffix and prior to the operation UrlTemplate.
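For example, using the 'bigconf' API and v2 version from the listing earlier, a request URL would look something like the following (the gateway hostname is illustrative):

https://contoso.azure-api.net/bigconf/v2/{operationUrlTemplate}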

Create a Versioned API from scratch


It is possible to create a new API definition that pre-defines what the versioning strategy is going to be. All that is required is to include a reference to an existing apiVersionSet in the API definition representation.

PUT {{classicBaseUrl}}/apis/{{apiId}}?api-version=2017-03-01

Authorization: {{authValue}}

Content-Type: application/json

{

  "name" : "My Api",

  "path" : "api",

  "serviceUrl" : "http://example.org",

  "protocols" : ["https"]

  "apiVersion" : "v1",

  "apiVersionDescription" : "Initial Version",

  "apiVersionSetId" : "/api-version-sets/myapiversionset"

}

List of Revisions


To get a view of the set of revisions for a particular apiId, you can make a request as follows:

GET {{baseUrl}}/apis/{{apiId}}/revisions?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

HTTP/1.1 200 OK

Content-Type: application/json; charset=utf-8

{

  "value": [

    {

      "apiId": "/apis/echo-api;rev=2",

      "apiRevision": "2",

      "createdDateTime": "2017-05-30T18:32:06.463",

      "updatedDateTime": "2017-05-30T18:32:06.463",

      "description": "its a test",

      "privateUrl": "/api;rev=2/",

      "isOnline": true,

      "isCurrent": false

    },

    {

      "apiId": "/apis/echo-api;rev=1",

      "apiRevision": "1",

      "createdDateTime": "2017-05-30T00:21:33.037",

      "updatedDateTime": "2017-05-30T00:21:33.037",

      "description": null,

      "privateUrl": null,

      "isOnline": true,

      "isCurrent": true

    }

  ],

  "count": 2,

  "nextLink": null

}

The returned representation is simply a read-only summary view of information about the revisions.

Delete all the Revisions


Despite having said that the .../revisions resource is read-only, it can be used to quickly delete all the revisions associated with an apiId.

DELETE {{baseUrl}}/apis/{{apiId}}/revisions?api-version=2017-03-01 HTTP/1.1

If-Match: *

Authorization: {{authValue}}

Create a Release of a Revision


At any one time, there is only one revision of each API that is marked as the current revision. The API representation contains an isCurrent flag to indicate whether the revision is current; this is a read-only attribute. In order to change which Revision is current, you must create a release resource. The request payload should identify the API revision that is to become current.

PUT {{classicBaseUrl}}/apis/{{apiId}}/releases/{{releaseId}}?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

Content-Type: application/json

{

  "apiId" : "/apis/echo-api;rev=2",

  "notes" : "Let's release it"

}

The notes included here will be displayed in the release notes on the developer portal; it is an optional property. To roll back to a previous revision, simply create a new release that targets it, and that revision will become current once again.
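For example, a rollback release that makes revision 1 current again might look like this; the release identifier is illustrative, and the payload has the same shape as the release request above:

PUT {{classicBaseUrl}}/apis/{{apiId}}/releases/rollback-to-rev1?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

Content-Type: application/json

{
  "apiId" : "/apis/echo-api;rev=1",
  "notes" : "Rolling back to revision 1"
}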

Get Release notes for an API


You can retrieve the set of release notes for an API with the following request:

GET {{baseUrl}}/apis/{{apiId}}/releases?api-version=2017-03-01 HTTP/1.1

Authorization: {{authValue}}

Saturday 25 May 2019

Optimize price-performance with compute auto-scaling in Azure SQL Database serverless

Optimizing compute resource allocation to achieve performance goals while controlling costs can be a challenging balance to strike especially for database workloads with complex usage patterns. To help address these challenges, we are pleased to announce the preview of Azure SQL Database serverless. SQL Database serverless (preview) is a new compute tier that optimizes price-performance and simplifies performance management for databases with intermittent and unpredictable usage. Line-of-business applications, dev/test databases, content management and e-commerce systems are just some examples across a range of applications that often fit the usage pattern ideal for SQL Database serverless. SQL Database serverless is also well-suited for new applications with compute sizing uncertainty or workloads requiring frequent rescaling in order to reduce costs. The serverless compute tier enjoys all the fully managed, built-in intelligence benefits of SQL Database and helps accelerate application development, minimize operational complexity, and lower total costs.

Compute auto-scaling


SQL Database serverless automatically scales compute for single databases based on workload demand and bills for compute used per second. Serverless contrasts with the provisioned compute tier in SQL Database which allocates a fixed amount of compute resources for a fixed price and is billed per hour. Over short time scales, provisioned compute databases must either over-provision resources at a cost in order to accommodate peak usage or under-provision and risk poor performance. Over longer time scales, provisioned compute databases can be rescaled, but this solution may require predicting usage patterns or writing custom logic to trigger rescaling operations based on a schedule or performance metrics. This adds to development and operational complexity. In serverless, compute scaling within configurable limits is managed by the service to continuously right-size resources. Serverless also provides an option to automatically pause the database during inactive usage periods and automatically resume when activity returns.

Pay only for compute used


In SQL Database serverless, compute is billed based only on the amount of CPU and memory used per second. While the database is paused, only storage is billed, providing additional price optimization benefit.

For example, consider a line-of-business application or a dev/test database that is idle at night, but needs multi-core bursting headroom throughout the day. Suppose the application is using a serverless database configured to allow auto-pausing and auto-scaling up to 4 vcores and has the following usage pattern over a 24 hour period:

[Chart: example serverless database usage (vcores) over a 24 hour period]

As can be seen, database usage corresponds to the amount of compute billed, which is measured in units of vcore seconds and sums to around 46k vcore seconds over the 24 hour period. Suppose the compute unit price for the serverless database is around $0.000073/vcore/second. Then the compute bill for this one day period is just under $3.40, calculated by multiplying the compute unit price by the total number of vcore seconds accumulated. During this time period, the database was auto-paused while idle and enjoyed the benefit of bursting episodes up to 80 percent of 4 vcores without customer intervention. In this example, the price savings using serverless are significant compared to a provisioned compute database configured with the same 4 vcore limit.
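As a rough check of that arithmetic using the figures above:

46,000 vcore seconds × $0.000073/vcore/second ≈ $3.36

By comparison, and ignoring differences in unit price between the tiers, a database billed for the full 4 vcores around the clock would accumulate 4 × 86,400 = 345,600 vcore seconds per day, several times the serverless total for this usage pattern.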

Price-performance trade-offs


When using SQL Database serverless there are price-performance trade-offs to consider. These trade-offs are related to the compute unit price and the impact on application performance due to compute warm-up after periods of low or idle usage.

Compute unit price

The compute unit price is higher for a serverless database than for a provisioned compute database since serverless is optimized for workloads with intermittent usage patterns. If CPU or memory usage is high enough and sustained for long enough, then the provisioned compute tier may be less expensive.

Compute warm-up after low usage

While a serverless database is online, memory is gradually reclaimed if CPU or memory usage is low enough for long enough. When workload activity returns, disk IO may be required to rehydrate data pages into the SQL buffer pool or query plans may need to be recompiled. This memory management policy to reclaim cache based on low usage is unique to SQL Database serverless and done to control customer costs, but can impact performance. Memory reclamation based on low usage does not occur in the provisioned compute tier for single databases or elastic pools where this kind of impact can be avoided.

Compute warm-up after pausing

The latency to pause and resume a serverless database is usually around one minute or less during which time the database is offline. After the database is resumed, memory caches need to be rehydrated which adds additional latency before optimal performance conditions return. The idle period that must elapse before auto-pausing occurs can be configured to compensate for this performance impact. Alternatively, auto-pausing can be disabled for workloads sensitive to this impact and still benefit from auto-scaling. Compute minimums are billed while the database is online regardless of usage, and so disabling auto-pausing can increase costs.

Thursday 23 May 2019

Visual interface for Azure Machine Learning service


This new drag-and-drop workflow capability in Azure Machine Learning service simplifies the process of building, testing, and deploying machine learning models for customers who prefer a visual experience to a coding experience. This capability brings the familiarity of our popular Azure Machine Learning Studio, with significant improvements to ease the user experience.

Visual interface


The Azure Machine Learning visual interface is designed for simplicity and productivity. The drag-and-drop experience is tailored for:

◈ Data scientists who are more familiar with visual tools than coding.
◈ Users who are new to machine learning and want to learn it in an intuitive way.
◈ Machine learning experts who are interested in rapid prototyping.

It offers a rich set of modules covering data preparation, feature engineering, training algorithms, and model evaluation. Another great aspect of this new capability is that it is completely web-based with no software installation required. All of this to say, users of all experience levels can now view and work on their data in a more consumable and easy-to-use manner.


Scalable Training


One of the biggest challenges data scientists previously faced when training models was the cumbersome limitation on scaling. If you started by training a smaller model and then needed to expand it due to an influx of data or more complex algorithms, you were required to migrate your entire data set to continue your training. With the new visual interface for Azure Machine Learning, we’ve replaced the back end to reduce these limitations.

An experiment authored in the drag-and-drop experience can run on any Azure Machine Learning Compute cluster. As your training scales up to larger data sets or more complex models, Azure Machine Learning compute can autoscale from a single node to multiple nodes each time an experiment is submitted to run. With autoscaling you can now start with small models and not worry about expanding your production work to bigger data. By removing scaling limitations, data scientists can focus on their training work.

Easy deployment


Deploying a trained model to a production environment previously required knowledge of coding, model management, container service, and web service testing. We wanted to provide an easier solution to this challenge so that these skills are no longer necessary. With the new visual interface, customers of all experience levels can deploy a trained model with just a few clicks. We will discuss how to launch this interface later in this blog.

Once a model is deployed, you can test the web service immediately from the new visual interface to make sure your models are correctly deployed. All web service inputs are pre-populated for convenience, and the web service API and sample code are automatically generated. These procedures used to take hours, but with the new visual interface it can all happen within just a few clicks.


Full integration of Azure Machine Learning service


As the newest capability of Azure Machine Learning service, the visual interface brings the best of Azure Machine Learning service and Machine Learning Studio together. The assets created in this new visual interface experience can be used and managed in the Azure Machine Learning service workspace. These include experiments, compute, models, images, and deployments. It also natively inherits the capabilities like run history, versioning, and security of Azure Machine Learning service.

How to use


See for yourself just how easy it is to use this interface with just a few clicks. To access this new capability, open your Azure Machine Learning workspace in the Azure portal. In your workspace, select visual interface (preview) to launch the visual interface.


Tuesday 21 May 2019

Microsoft 365 boosts usage analytics with Azure Cosmos DB – Part 2

Finding the right partition key—a critical design decision


After moving to Azure Cosmos DB, the team revisited how data would be partitioned (referred to as “sharding” in MongoDB). With Azure Cosmos DB, each collection must have a partition key, which acts as a logical partition for the data and provides Azure Cosmos DB with a natural boundary for distributing data across partitions. The data for a single logical partition must reside inside a single physical partition, and physical partitions are managed internally by Azure Cosmos DB.

The Microsoft 365 usage analytics team worked closely with the Azure Cosmos DB team to optimize data distribution in a way that would ensure high performance. The team initially tried the same approach as with MongoDB, using a random GUID as the partition key. However, this required scanning all of the partitions for reads and over-allocating resources for writes, making writes fast but reads slow. The team then tried using Tenant ID as the partition key, but found that the vast difference in the amount of report data for each tenant made some partitions too hot, which would have required throttling, while others remained cold.

The solution lay in creating a synthetic partition key. In the end, the team solved both the slow-read and too-hot/too-cold issues by grouping 100 documents per tenant into a bucket and then using a combination of tenant ID and bucket ID as the partition key. The bucket ID loops from 1 to n, where n is a variable that can be adjusted for each report.
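As a concrete illustration (the tenant ID and key format here are hypothetical): with n = 5, the first 100 documents for tenant "contoso" might receive the partition key "contoso_1", the next 100 "contoso_2", and so on, wrapping back to "contoso_1" after "contoso_5". This spreads a large tenant's data across several logical partitions while keeping each bucket's documents co-located.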

Handling four terabytes of new data every day


In one region alone, more than 6 TB of data is stored in Azure Cosmos DB, with 4 TB of that written and refreshed daily. Both of those numbers are continuing to grow. The database consists of more than 50 different collections, and the largest is more than 300 GB in size. It consumes an average of 150,000 request units per second (RU/s) of throughput, scaling this number up and down as needed.

The different collections map closely to the different reports that the system serves, which in turn have different throughput requirements. This design enables the Microsoft 365 usage analytics team to optimize the number of RU/s that are allocated to each collection (and thus to each report), and to elastically scale that throughput up or down on a per-collection and per-report basis.

Built-in, cost-effective scalability and performance


With Azure Cosmos DB, the Microsoft 365 usage analytics team is delivering real-time customer insights with less maintenance, better performance, and improved availability—all at a lower cost. The new usage analytics system can now easily scale to handle future growth in the number of Office 365 commercial customers. All that was accomplished in less than five months, without any service interruptions. “The benefits of moving from MongoDB to Azure Cosmos DB more than justify the effort that it took,” says Guo Chen, Principal Software Development Manager on the Microsoft 365 usage analytics team.

Improved performance and service availability

The team’s use of built-in, turnkey geo-distribution provided a way to easily distribute reads and writes across two regions. Combined with the other work done by the team, such as rewriting the data access layer using the Azure Cosmos DB Core (SQL) API, this enabled the team to reduce the time for the majority of reads from 12 milliseconds to 3 milliseconds. The image below illustrates this performance improvement.

[Chart: majority of reads reduced from 12 milliseconds to 3 milliseconds]

Although this difference may seem negligible in the context of viewing a report, it resulted in significant service improvements. “There are two ways to access reporting data in the usage analytics system: through the Microsoft 365 admin center, and through Microsoft Graph,” explains Xiaodong Wang, a Software Engineer on the Microsoft 365 usage analytics team. “In the past, people complained that the Graph API was too slow. That’s no longer an issue. In addition, service availability is better now because the chances of any query timing-out are reduced.”

The image below shows just how much service availability is improved. The graph illustrates successful API requests divided by the total API requests and shows that the system is now delivering a service availability level of greater than 99.99 percent.

[Chart: service availability, measured as successful API requests divided by total API requests, above 99.99 percent]

Zero maintenance and administration

Because Azure Cosmos DB is a fully managed service, the Office 365 development team no longer needs to devote one full-time person to database maintenance and administration. Annual certificate maintenance is no longer a burden, and VMs no longer need to be restarted weekly to protect against any compromises in service availability.

“In the past, with MongoDB, we had to allocate core developer resources to administrative management of the data store,” says Shilpi Sinha, Principal Program Manager on the Microsoft 365 usage analytics team. “Now that we are running on a fully managed service, we are able to repurpose developer resources towards adding new customer value instead of managing the infrastructure.”

Elastic scalability

The Microsoft 365 usage analytics team can now scale database throughput up or down on demand, as needed to accommodate a fluctuating workload that on average, is growing at a rate of 8 percent every three months. By simply adjusting the number of RU/s allocated to each collection, which can be done in the Azure portal or programmatically, the team can easily scale up during heavy data-ingestion periods to handle new reports, and most importantly, to accommodate continued overall growth of Office 365 around the world.

“Today, all we need to do is keep an eye on request unit usage versus what we have budgeted,” says Wang. “If we’re reaching capacity, we can allocate more RU/s in just a few minutes. We don’t have to pay for spare capacity until we need it and more importantly, we no longer need to worry whether we can handle future growth in data volumes or report usage.”

Lower costs

On top of all of those benefits, the Microsoft 365 usage analytics team increased data and reporting volumes while reducing its monthly Microsoft Azure bill for the usage analytics system by more than 13 percent. “After we cut over to Azure Cosmos DB, our monthly Azure expenses decreased by almost 20 percent,” says Chen. “We undertook this project to better serve our customers. Being able to save close to a quarter-million dollars per year—and likely more in the future—is like icing on the cake.”

“Usage analytics are offered as part of the base capability to all Microsoft 365 customers, irrespective of the type of subscription they purchase," said Sinha. "Keeping the costs of operating this service as low as possible contributes to our goal of running the overall Microsoft 365 service as efficiently as possible while at the same time giving our customers new and improved insights into how their people are using our services.”

Saturday 18 May 2019

Microsoft 365 boosts usage analytics with Azure Cosmos DB

The challenge: Understanding the behavior of more than 150 million active users


Office 365 is a flagship service within the Microsoft 365 Enterprise solution, with millions of commercial customers and more than 150 million active commercial users each month. Office 365 provides extensive reporting for administrators within each company on how the service is being used, including license assignment, product-level usage, user-level activity, site activity, group activity, storage consumption, and more. The Microsoft 365 usage analytics team incrementally adds new reports to cover more Office 365 services.

Previous architecture

The telemetry data needed to generate such reports was collected in a system called usage analytics, which until recently ran on the community version of MongoDB. The image below shows the data flow, with an importer web service used to write log streams collected in Azure Blob storage to MongoDB. An OData web service exposes APIs to extract the stored data, both for reporting within the Microsoft 365 admin center and for access through Microsoft Graph. Every day, as part of a full daily refresh, several billion rows of data were added to the system.

[Image: data flow of the previous usage analytics architecture]

Each of the primary geographies served by Office 365 has an independent usage analytics repository, all employing a similar architecture. In each geography, data was stored on two MongoDB clusters, with each cluster consisting of up to 50 virtual machines (VMs) hosted in Azure Virtual Machines and running MongoDB. The two clusters in each geography functioned in a primary/backup configuration. Data was written separately to both clusters, and under normal operation all reads were performed on the primary cluster.

Each cluster was designed for a write-heavy workload. To speed writes, data was sharded across individual cluster nodes using a random globally unique identifier (GUID) as the MongoDB shard key. Every day, for a few hours, new data from Azure Blob storage was written using a multithreaded importer. Each thread wrote batches of 2,000 records at a time to all cluster nodes and waited for the entire batch to finish before starting on the next 2,000.
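As a rough illustration of that importer pattern (not the team's actual code), a multithreaded batch writer built on pymongo might look like the following; the connection string, database, and collection names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient

BATCH_SIZE = 2000
WORKERS = 8

client = MongoClient("mongodb://usage-analytics-primary:27017/")  # hypothetical
collection = client["usage"]["daily_activity"]                    # hypothetical

def write_batches(batches):
    for batch in batches:
        # If any record fails, the whole batch of 2,000 has to be
        # rewritten -- one of the pains called out in the next section.
        collection.insert_many(batch, ordered=False)

def run_import(records):
    batches = [records[i:i + BATCH_SIZE]
               for i in range(0, len(records), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        # Distribute the batches round-robin across the worker threads.
        for w in range(WORKERS):
            pool.submit(write_batches, batches[w::WORKERS])
```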

Problems and pains


This architecture presented several problems for the Microsoft 365 usage analytics team, ranging from excessive administrative effort and costs to limited performance, reliability, availability, and scalability. Some specific pains included:

◈ Poor performance. Reads were inefficient, and reports sometimes timed out, because the use of a random GUID as the shard key required querying all nodes. In addition, performance was poor during the few hours each day when new data was imported, because writes and reads hit the primary cluster at the same time. To make matters worse, if anything failed during a batch write, which often happened due to internal database errors, all 2,000 records had to be written again.

◈ Full-time administration. Maintenance of the MongoDB clusters was manual and time-consuming, tying up people the team would rather devote to bringing new reports to market. Bugs in MongoDB 3.2 required all servers to be restarted weekly, and renewing the security certificates on each cluster node within the virtual network was an annual task that required an additional two weeks of effort per cluster. During such routine administrative tasks, if an operation failed on one cluster node, the entire cluster was down until the issue was resolved.

◈ High costs. Significant costs were incurred to run the MongoDB backup clusters, which remained idle most of the time. Those costs continued to increase as Office 365 usage grew.

◈ Limited scalability. Less than three years after MongoDB was initially deployed, the largest repository was almost at maximum capacity. Any spare capacity was forecast to run out within six months as more products and reports were added, with no easy way to scale.

While dealing with the architectural limitations of its existing solution, the team was also looking ahead to a lineup of new, high-scale capabilities that it wanted to enable for customers in the usage analytics space. The team started looking for a new, cost-effective, and low-maintenance solution that would let it move from self-maintained VMs running MongoDB to a fully managed database service.

Geo-distribution on Azure Cosmos DB: The key to an improved architecture


After exploring their options, the team decided to replace MongoDB with Azure Cosmos DB, a fully managed, multi-model database service designed for global distribution and virtually unlimited elastic scalability. The first step was to deploy the needed infrastructure.

In contrast to the primary/backup, two-cluster configuration that it had used with MongoDB, the team took advantage of turnkey global distribution of active data in Azure Cosmos DB. Using multiple Azure regions for data replication provided an easy way to write to any region, read from any region, and better balance the workload across the database instances—all while relying on Azure Cosmos DB to transparently handle active data replication and data consistency.

“True geo-replication had been deemed too hard to do with MongoDB, which is why the previous architecture separately wrote data to both the primary and backup clusters,” says Xiaodong Wang, a Software Engineer on the Microsoft 365 usage analytics team. “With Azure Cosmos DB, implementing transparent geo-distribution literally took minutes—just a few mouse clicks.”

The image below shows the internal architecture of the usage analytics system today. Each of the primary geographies for Office 365 is served by Azure Cosmos databases geo-replicated across two Azure regions within that geography. Under normal operating conditions, writes are sent to one region within each geography while reads are routed to both. If for some reason a region is prevented from serving reads, those reads are automatically routed to the other region serving that same geography.

[Image: the current geo-replicated usage analytics architecture]
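A minimal sketch of how a client in one geography might prefer its local region for reads, assuming the v4 azure-cosmos Python SDK; the region names and account URL are illustrative.

```python
from azure.cosmos import CosmosClient

# The SDK routes reads to the first available region in this list; if that
# region cannot serve reads, it transparently fails over to the next one.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",  # placeholder endpoint
    credential="<account-key>",
    preferred_locations=["West US 2", "East US"],  # illustrative regions
)
```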

Migrating a production workload to Azure Cosmos DB


Developers began writing a new data access layer on the new infrastructure to accommodate reads and writes, using the Azure Cosmos DB SQL (Core) API. After bringing the new system online, the team began to write new production data to both old and new systems, while continuing to serve production reports from the old one.
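Conceptually, the dual-write phase might look like the sketch below; the clients and names are illustrative stand-ins, not the team's actual data access layer.

```python
def write_record(record, mongo_collection, cosmos_container):
    # Write every new record to both stores during the migration window.
    # Azure Cosmos DB items need a string "id" property; assume one is set.
    cosmos_container.upsert_item(record)       # new system, being validated
    mongo_collection.insert_one(dict(record))  # old system, still serving reads

def read_report(query, mongo_collection):
    # Production reports continue to be served from the old system until
    # each report has been verified against Azure Cosmos DB.
    return list(mongo_collection.find(query))
```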

Developers began to address the reports that they would need to duplicate for the new solution, working through them one at a time. Separate Cosmos containers were created within the database for most reports, so that each container could be scaled independently after the system came online. The largest reports were addressed first to ensure that Azure Cosmos DB could handle them, and after each new report was verified, the team began serving it from the new environment.

After all functionality and reports were being served by Azure Cosmos DB, and everything was running as it should, the team stopped writing new data to the old system and decommissioned the MongoDB environment. The development team was able to move to Azure Cosmos DB, rewrite the data access layer, and migrate all reports for all geographies without any service interruptions to end users.

Thursday 16 May 2019

Azure Firewall and network virtual appliances

Network security solutions can be delivered as appliances on premises, as network virtual appliances (NVAs) that run in the cloud, or as a cloud native offering (known as firewall-as-a-service).


Customers often ask us how Azure Firewall is different from Network Virtual Appliances, whether it can coexist with these solutions, where it excels, what’s missing, and the TCO benefits expected. We answer these questions in this blog post.

Network virtual appliances (NVAs)


Third party networking offerings play a critical role in Azure, allowing you to use the brands and solutions you already know, trust, and have the skills to manage. Most third-party networking offerings are delivered as NVAs today and provide a diverse set of capabilities such as firewalls, WAN optimizers, application delivery controllers, routers, load balancers, proxies, and more. These third party capabilities enable many hybrid solutions and are generally available through the Azure Marketplace.

Cloud native network security


A cloud native network security service (known as firewall-as-a-service) is highly available by design. It auto scales with usage, and you pay as you use it. Support is included at some level, and it has a published and committed SLA. It fits into a DevOps model for deployment and uses cloud native monitoring tools.

What is Azure Firewall?


Azure Firewall is a cloud native network security service. It offers fully stateful network and application level traffic filtering for VNet resources, with built-in high availability and cloud scalability delivered as a service. You can protect your VNets by filtering outbound, inbound, spoke-to-spoke, VPN, and ExpressRoute traffic. Connectivity policy enforcement is supported across multiple VNets and Azure subscriptions. You can use Azure Monitor to centrally log all events. You can archive the logs to a storage account, stream events to your Event Hub, or send them to Log Analytics or the security information and event management (SIEM) product of your choice.

Is Azure Firewall a good fit for your organization security architecture?


Organizations have diverse security needs. In certain cases, even the same organization may have different security requirements for different environments. As mentioned above, third party offerings play a critical role in Azure. Today, most next-generation firewalls are offered as network virtual appliances (NVAs), and they provide a richer next-generation firewall feature set that is a must-have for specific environments and organizations. In the future, we intend to enable chaining scenarios that allow you to use Azure Firewall for specific traffic types, with an option to send all or some traffic to a third party offering for further inspection. This third-party offering can be either an NVA or a cloud native solution.

Many Azure customers find the Azure Firewall feature set is a good fit and it provides some key advantages as a cloud native managed service:

◈ DevOps integration – easily deployed using the Azure portal, templates, PowerShell, CLI, or REST.
◈ Built-in HA with cloud scale.
◈ Zero maintenance service model – no updates or upgrades.
◈ Azure specialization – for example, service tags and FQDN tags.
◈ Significant total cost of ownership savings for most customers.

But for some customers, third party solutions are a better fit.

The following table provides a high-level feature comparison for Azure Firewall vs. NVAs:


Figure 1: Azure Firewall versus Network Virtual Appliances – Feature comparison

Why Azure Firewall is cost effective


Azure Firewall pricing includes a fixed hourly cost ($1.25/firewall/hour) and a variable per-GB processed cost to support auto scaling. Based on our observations, most customers save 30 to 50 percent in comparison to an NVA deployment model. We are announcing a price reduction, effective May 1, 2019, that cuts the per-GB cost to $0.016/GB (a 46.6 percent reduction) to ensure that high-throughput customers maintain cost effectiveness. There is no change to the fixed hourly cost.
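As a back-of-the-envelope check on those numbers, the fixed and variable parts of the price combine as follows; the monthly traffic volume below is a made-up example figure, not a customer benchmark.

```python
HOURS_PER_MONTH = 730   # average hours in a month
FIXED_PER_HOUR = 1.25   # $/firewall/hour, from the pricing above
PER_GB = 0.016          # $/GB processed, after the May 1, 2019 reduction

def monthly_firewall_cost(gb_processed: float) -> float:
    return FIXED_PER_HOUR * HOURS_PER_MONTH + PER_GB * gb_processed

# Example: 50 TB (50,000 GB) processed in a month.
# Fixed: 1.25 * 730 = $912.50; variable: 0.016 * 50,000 = $800.00.
print(monthly_firewall_cost(50_000))  # -> 1712.5
```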

The following table provides a conceptual TCO comparison of Azure Firewall and NVAs deployed with full HA (active/active):

Cost | Azure Firewall | NVAs
Compute | $1.25/firewall/hour plus $0.016/GB processed (30%-50% cost saving) | Two or more VMs sized to meet peak requirements
Licensing | N/A | Per NVA vendor billing model
Standard Public Load Balancer | N/A | First five rules: $0.025/hour; additional rules: $0.01/rule/hour; $0.005/GB processed
Standard Internal Load Balancer | N/A | First five rules: $0.025/hour; additional rules: $0.01/rule/hour; $0.005/GB processed
Ongoing/maintenance | Included | Customer responsibility
Support | Included in your Azure Support plan | Per NVA vendor billing model

Figure 2: Azure Firewall versus Network Virtual Appliances – Cost comparison

Tuesday 14 May 2019

Azure SQL Database Edge: Enabling intelligent data at the edge

The world of data is changing at a rapid pace, with more and more data projected to be stored and processed at the edge. Microsoft has enabled enterprises to adopt a common programming surface area in their data centers with Microsoft SQL Server and in the cloud with Azure SQL Database. Latency, data governance, and network connectivity needs continue to pull data compute toward the edge, while new sensors and chip innovation deliver analytical capabilities at lower cost, enabling more edge compute scenarios that drive higher business agility.

At Microsoft Build 2019, we announced Azure SQL Database Edge, available in preview, to help address the requirements of data and analytics at the edge using the performant, highly available and secure SQL engine. Developers will now be able to adopt a consistent programming surface area to develop on a SQL database and run the same code on-premises, in the cloud, or at the edge.

Azure SQL Database Edge offers:

◉ A small footprint that allows the database engine to run on ARM and x64 devices, via containers, on interactive devices, edge gateways, and edge servers.

◉ Develop-once, deploy-anywhere scenarios through a common programming surface area across Azure SQL Database, SQL Server, and Azure SQL Database Edge.

◉ Data streaming and time-series processing combined with in-database machine learning to enable low-latency analytics.

◉ The industry-leading security capabilities of Azure SQL Database to protect data at rest and in motion on edge devices and edge gateways, with management from a central portal in Azure IoT.

◉ Cloud-connected and fully disconnected edge scenarios with local compute and storage.

◉ Support for existing business intelligence (BI) tools, such as Power BI and third-party BI tools, for creating powerful visualizations.

◉ Bi-directional data movement between the edge and on-premises environments or the cloud.

◉ Compatibility with the popular T-SQL language, so developers can implement complex analytics using R, Python, Java, and Spark, delivering real-time insights without data movement.


◉ Support for processing and storing graph, JSON, and time-series data in the database, coupled with the ability to apply analytics and in-database machine learning to non-relational data types.

For example, manufacturers that use robotics or automated work processes can achieve optimal efficiency by using Azure SQL Database Edge for analytics and machine learning at the edge. These real-world environments can leverage in-database machine learning for immediate scoring, initiating corrective actions, and detecting anomalies.
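For instance, a local anomaly check over recent sensor readings could be run directly against the edge database over ODBC; the connection string, table schema, and threshold below are hypothetical, for illustration only.

```python
import pyodbc

# Hypothetical local connection to a SQL engine running at the edge.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost,1433;DATABASE=edge;UID=sa;PWD=<password>"
)
cursor = conn.cursor()

# Flag readings that deviate sharply from the last hour's average.
cursor.execute("""
    SELECT sensor_id, reading, recorded_at
    FROM dbo.SensorReadings
    WHERE reading > 1.5 * (SELECT AVG(reading)
                           FROM dbo.SensorReadings
                           WHERE recorded_at > DATEADD(hour, -1, SYSUTCDATETIME()))
""")
for sensor_id, reading, recorded_at in cursor.fetchall():
    print(sensor_id, reading, recorded_at)  # candidate anomalies
```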

Key benefits:

◈ The same programming surface area as Azure SQL Database and SQL Server, so the SQL engine at the edge lets engineers build once for on-premises, the cloud, or the edge.
◈ Streaming capability that enables instant analysis of incoming data for intelligent insights.
◈ In-database AI capabilities that enable scenarios like anomaly detection and predictive maintenance without having to move data.


Train in the cloud and score at the edge


Because the programming surface area is consistent across on-premises, cloud, and edge deployments, developers can use identical methods for securing data in motion and at rest, while enabling high availability and disaster recovery architectures equal to those used in Azure SQL Database and SQL Server. This seamless portability means an algorithm can be trained in a cloud data warehouse, and the resulting machine learning model can be pushed to Azure SQL Database Edge for local, real-time scoring using a single codebase.
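In outline, that flow might look like the sketch below, which trains a toy model "in the cloud," serializes it, and scores it "at the edge"; the libraries and the transport step are assumptions for illustration, not a prescribed pipeline.

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# 1. Train centrally on (synthetic) historical telemetry.
X_train = np.random.rand(200, 3)
y_train = (X_train.sum(axis=1) > 1.5).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# 2. Serialize the model and ship the bytes to the edge -- for example,
#    stored as a row in a models table in the edge database.
model_bytes = pickle.dumps(model)

# 3. At the edge, load the same model and score new readings locally.
def score_locally(blob: bytes, features: np.ndarray) -> np.ndarray:
    local_model = pickle.loads(blob)
    return local_model.predict(features)

print(score_locally(model_bytes, np.random.rand(5, 3)))
```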

Intelligent store and forward


The engine can replicate streaming datasets directly to the cloud while also enabling an intelligent store-and-forward pattern. At the same time, the edge can apply its analytical capabilities while processing streaming data or applying in-database machine learning. Fundamentally, the engine can process data locally and upload it, using native replication, to a central datacenter or cloud for aggregated analysis across multiple edge hubs.
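The engine handles this natively through replication, but the pattern itself is simple; the sketch below illustrates it with a local SQLite buffer standing in for the edge database, and all names are illustrative.

```python
import sqlite3

local = sqlite3.connect("edge-buffer.db")  # stand-in for the local edge store
local.execute("""CREATE TABLE IF NOT EXISTS readings
                 (id INTEGER PRIMARY KEY, payload TEXT,
                  forwarded INTEGER DEFAULT 0)""")

def store(payload: str):
    # Always persist locally first, whether or not the cloud is reachable.
    local.execute("INSERT INTO readings (payload) VALUES (?)", (payload,))
    local.commit()

def forward(send_to_cloud):
    # Upload anything not yet forwarded; mark rows only after a successful
    # send, so an interrupted connection simply retries on the next pass.
    rows = local.execute(
        "SELECT id, payload FROM readings WHERE forwarded = 0").fetchall()
    for row_id, payload in rows:
        send_to_cloud(payload)
        local.execute("UPDATE readings SET forwarded = 1 WHERE id = ?",
                      (row_id,))
        local.commit()
```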
