
Saturday, 20 August 2022

Azure Data Explorer: Log and telemetry analytics benchmark


Azure Data Explorer (ADX), a component of Azure Synapse Analytics, is a highly scalable analytics service optimized for structured, semi-structured, and unstructured data. It provides users with an interactive query experience that unlocks insights from the ocean of ever-growing log and telemetry data. It is the perfect service to analyze high volumes of fresh and historical data in the cloud by using SQL or the Kusto Query Language (KQL), a powerful and user-friendly query language.

Azure Data Explorer is a key enabler for Microsoft’s own digital transformation. Virtually all Microsoft products and services use ADX in one way or another; this includes troubleshooting, diagnosis, monitoring, machine learning, and as a data platform for Azure services such as Azure Monitor, PlayFab, Sentinel, Microsoft 365 Defender, and many others. Microsoft’s customers and partners are using ADX for a wide variety of scenarios, including fleet management, manufacturing, security analytics solutions, package tracking and logistics, IoT device monitoring, and financial transaction monitoring. Over the last few years, the service has seen phenomenal growth and is now running on millions of Azure virtual machine cores.

Last year, the third generation of the Kusto engine (EngineV3) was released and is currently offered as a transparent, in-place upgrade to all users not already using the latest version. The new engine features a completely new implementation of the storage, cache, and query execution layers. As a result, performance has doubled or more in many mission-critical workloads.

Superior performance and cost-efficiency with Azure Data Explorer

To better help our users assess the performance of the new engine and cost advantages of ADX, we looked for an existing telemetry and logs benchmark that has the workload characteristics common to what we see with our users:

1. Telemetry tables that contain structured, semi-structured, and unstructured data types.

2. Records in the hundreds of billions to test massive scale.

3. Queries that represent common diagnostic and monitoring scenarios.

As we did not find an existing benchmark to meet these needs, we collaborated with and sponsored GigaOm to create and run one. The new logs and telemetry benchmark is publicly available in this GitHub repo. This repository includes a data generator to generate datasets of 1GB, 1TB, and 100TB, as well as a set of 19 queries and a test driver to execute the benchmark.

The results, now available in the GigaOm report, show that Azure Data Explorer provides superior performance at a significantly lower cost in both single and high-concurrency scenarios. For example, the following chart taken from the report displays the results of executing the benchmark while simulating 50 concurrent users: 


Source: microsoft.com

Tuesday, 5 January 2021

4 common analytics scenarios to build business agility

Azure Synapse Analytics is a limitless analytics service that is designed to bring the two worlds of big data and data warehousing into a unified, enterprise-grade, powerful platform. In this blog post, we look at four real-world use cases where global organizations have used Azure Synapse Analytics to innovate and drive business value through data. For more in-depth coverage of how data analytics can help your business, see our e-book Analytics Lessons Learned: How Four Companies Drove Business Agility with Analytics and sign up for Azure to start exploring your data with Azure Synapse.

Why Azure Synapse?

Azure Synapse provides a complete, out-of-the-box solution designed to accelerate time-to-insight and empower business agility. Azure Synapse is the only end-to-end platform that unifies data ingestion, big data analytics, and data warehousing. It offers turnkey setup and configuration options on fully managed infrastructure to help you get results fast. It also offers greater pricing control and flexibility, letting you choose the best pricing option for each workload from both serverless and dedicated options.

Use case one: Just-in-time inventory

Aggreko is a global leader in the supply of temporary power generation, temperature control systems, and energy services, providing backup energy and power supply whenever and wherever their customers need it. Aggreko uses Azure Synapse to increase operational efficiency with the just-in-time supply of their specialist equipment.

Aggreko’s data ingestion pipeline was set up to run every eight hours because it took four hours to run the ingestion (batch) jobs. Moreover, the data warehouse had to be rebuilt every day due to storage limitations. This meant that there was a lag of 8-24 hours between when the data arrived and when it was available for data analytics pipelines:


By adopting Azure Synapse, Aggreko was able to significantly improve its time-to-insight by reducing ingestion complexities and improving speed. Ingestion time was reduced from four hours to less than five minutes. This in turn meant that for Aggreko, data is now available for analytics pipelines in near real-time (less than five minutes’ lag). The team also estimated that they have saved 30-40 percent of the time previously spent solving technology problems in their legacy systems. With data now available for instant exploration, the Aggreko team has more time to focus on solving business problems.

"Azure Synapse gives us a single environment to explore and query the data without moving it. So at a spectrum of the volume of data, we can achieve exponentially faster insights, by querying directly over the lake before outputting insight to Power BI." —Elizabeth Hollinger, Director of Data Insights at Aggreko

As mentioned before, this use case is based on a real-world scenario where Aggreko adopted Azure Synapse as their analytics platform.

Use case two: Fraud detection


ClearSale, a leading fraud detection company based in Brazil, used Azure Synapse to modernize their operational analytics data platform. ClearSale helps customers verify an average of half a million transactions daily using big data analytics to detect fraud across the world. ClearSale’s dataset doubles in size every two years, and the company needs to provide fraud detection services within seconds. This requires a great level of scalability and performance:


With Azure Synapse, ClearSale has significantly reduced the time it takes to train new models to improve their fraud detection capability. On their previous on-premises platform, it used to take close to a week to ingest, prepare, and train the machine learning models; with Azure Synapse, this has been slashed to under six hours. This is a massive improvement that has enhanced their capability, improved efficiency, and reduced operational overhead.

Use case three: Predictive maintenance


GE Aviation's Digital Group is a world leader in manufacturing airplane engines and developing aviation software. On top of manufacturing, GE also provides advanced data analytics to many airlines around the world. For each flight, GE ingests the time series data for the entire flight, which includes as many as 350,000 data points. Understandably, running data analytics on such volumes of data can be challenging. To solve this, the team adopted Azure Synapse:


Using Azure Synapse made it significantly easier and quicker for GE to build complex predictive machine learning models. Building something similar using their previous system would have required many complex steps, across multiple systems and environments. For GE, the native integration between Microsoft Power BI and Azure Synapse proved to be extremely useful. They can now explore data quickly and when an anomaly is found in the Power BI reports, analysts are able to do drill-down analysis to see why the spikes occurred and what corrective maintenance is needed.

Use case four: Marketing analytics (customer 360° view)


Imagine a large multinational retail company that has stores in Australia, New Zealand, and Japan. The company sells consumer goods, electronics, and personal care items through its brick-and-mortar stores as well as through digital online channels. The company wants to leverage data analytics to build an end-to-end view of its customers, with the goal of improving customer experience and increasing profit. The data team found Azure Synapse to be the best platform to achieve this:


Azure Synapse enabled the data team to unite their data, developers, and business users in ways that were not possible before. Azure Synapse has simplified ingestion and data processing and made it easy for the organization to have a central data store that holds all operational and historical data that can be refreshed in near real-time. Azure Synapse has also simplified data exploration and discovery without the need to transform data from one format to another or move the data to other systems. This has enabled the data team to experiment, map, and correlate different datasets to produce curated (gold) data that is ready for consumption.

Source: microsoft.com

Saturday, 28 November 2020

Achieving 100 percent renewable energy with 24/7 monitoring in Microsoft Sweden

Earlier this year, we made a commitment to shift to 100 percent renewable energy supply in our buildings and datacenters by 2025. On this journey, we recognize that how we track our progress is just as important as how we get there.

Today, we are announcing that Microsoft will be the first hyperscale cloud provider to track hourly energy consumption and renewable energy matching in a commercial product using the Vattenfall 24/7 Matching solution for our new datacenter regions in Sweden, which will be available in 2021.

Vattenfall and Microsoft are also announcing that the 24/7 hourly matching solution—the first commercial product of its kind—is now generally available. Vattenfall is a leading European energy company with a strong commitment to make fossil-free living possible within one generation. The solution is built using Microsoft’s Azure services, including Azure IoT Central and Microsoft Power BI.

Today’s announcement builds on last year’s partnership announcement with Vattenfall when the 24/7 Matching solution was first introduced. Since then, the solution has been in pilot in Vattenfall headquarters in Solna and the new Microsoft headquarters in Stockholm, which has seen 94 percent of the total office building energy consumption matched with Swedish wind and 6 percent matched with Swedish hydro power.

We continually invest in new ways to make our buildings and datacenters more energy efficient and sustainable. As part of today’s announcement, Microsoft is signing a power purchase agreement (PPA) to cover 100 percent of Microsoft’s energy consumption in Sweden. Microsoft will ensure that the company’s operations in Sweden use renewable energy.

The Vattenfall 24/7 Matching solution enables us to have a more accurate picture of energy used to match with Guarantees of Origin (GOs). This marks another important step in our commitment to be carbon negative by 2030 and use 100 percent renewable energy by 2025.


Increasing transparency and accuracy of renewable energy matching


Fulfilling our 100 percent renewable energy commitment requires a better way of tracking renewable electricity. Today, the industry is using Energy Attribute Certificates, called Guarantees of Origin (GOs) in Europe and Renewable Energy Certificates (RECs) in the US. These ensure that the amount of electricity sold corresponds to the amount produced. GOs allow end consumers to choose electricity from a specific source; this enables them to choose electricity exclusively from renewable sources such as wind, solar, or hydropower.

While we have seen remarkable progress toward renewable sourcing and commitments, there is a fundamental flaw in monitoring the source and quantity of energy consumed. For any given hour, a business does not know the source of the energy they are consuming. That energy may come from renewable sources, or it may be produced from fossil fuel. The current system has no way of matching the supply of renewable energy with demand for that energy on an hourly basis. And without the transparency of supply and demand, market forces cannot work to ensure that renewable energy demand is supplied from renewable sources.

With this solution, Microsoft Sweden’s new home is powered by renewable energy through the procurement of GOs, which trace electricity from renewable sources and provide electricity customers with information on the source of their energy—not just on a monthly or yearly basis, but on an hourly basis.

The 24/7 matching of GOs and RECs offers the following benefits:

◉ Businesses can see whether their commitment to 100 percent renewable energy covers each hour of consumption and translate sourcing of renewable energy into climate impact.

◉ Energy providers can more easily understand demands for renewable energy hour-by-hour and take action to help production meet demand.

◉ 24/7 matching of consumption to production drives true market demand for renewable energy. As 24/7 hourly renewable products are rolled out across the world, they will incentivize investment in energy storage so that energy companies can store renewable energy when it is being generated and continue to supply their customers with renewable energy when it is not. Over time, this storage will allow electricity grids to supply 100 percent decarbonized power.

◉ The system can inspire regulatory change in how GOs and RECs are created, acquired, and retired.

IoT for more accurate energy monitoring


IoT enables companies to gain near real-time insights into the physical world, connecting objects to reveal the health of a system or process, predict failures before they happen, and gain overall operational efficiencies.

The Vattenfall 24/7 hourly monitoring solution leverages Azure IoT Central to manage the full picture of energy consumption in a given building. Azure IoT Central helps solution builders move beyond proof of concept to building business-critical applications they can brand and sell directly or through Microsoft AppSource. Today, Microsoft offers two IoT Central energy app templates for solar panel and smart meter monitoring to help energy solution builders accelerate development.

Commitment to building world-class, sustainable datacenters


We believe that our datacenters should be positive contributors to the grid, and we continue to innovate in energy technology and monitoring resources to support our corporate commitment to be carbon negative by 2030.

Source: microsoft.com

Tuesday, 27 October 2020

Introducing the Microsoft Azure Modular Datacenter

We designed the Azure Modular Datacenter (MDC) for customers who need cloud computing capabilities in hybrid or challenging environments, including remote areas. This announcement is complemented by our Azure Space offerings and partnerships that can extend satellite connectivity anywhere in the world. Scenarios include mobile command centers, humanitarian assistance, military mission needs, mineral exploration, and other use cases requiring high-intensity, secure computing on Azure.

The MDC can give customers a path to migrate apps to Azure while still running these workloads on-premises with low-latency connections to their own datacenter. This provides a stepping stone for transforming workloads to the Azure API with the option of continuing to run these apps on-premises, or in public or sovereign clouds.

Azure where you need it


Around the world, there are significant cloud computing and storage needs in areas with adverse conditions, where limited communications, disrupted network availability, and limited access to specialized infrastructure would previously have prevented taking advantage of cloud computing. The MDC solves this by bringing Azure to these environments, providing datacenter-scale compute resources close to where they're needed.

With MDC you can deploy a self-contained datacenter unit with a field-transportable solution that provides near-immediate value. The unit can operate in a wide range of climates and harsh conditions in a ruggedized, radio frequency (RF) shielded unit. Once deployed, it can act as critical infrastructure where temperature, humidity, and even level surfaces can pose a challenge.

MDC can provide onsite augmentation of compute and storage capabilities, run high-performance applications in the field, support IoT and real-time analytics workloads that require ultra-low latency, and stand up cloud applications to support critical infrastructure recovery.

Connectivity


A major differentiator for MDC is that customers can run the unit fully connected, occasionally connected, or fully disconnected. This is a unique, powerful capability that allows customers to access the power of the Azure cloud on their terms.

Satellite communications option



Microsoft is partnering with satellite operators to provide an option for secure and reliable connectivity to field deployed MDC units.

This connectivity is achieved through a network high availability module which continuously evaluates network performance. In the event of a network disruption, the network high availability module will move traffic from the impacted network to a backup satellite connection. This resiliency ensures continued delivery of essential hyperscale services through Azure. Alternatively, MDC can use satellite communications as the primary connection where no other network is available.

Ready to go



Sunday, 25 October 2020

Quickly get started with samples in Azure Synapse Analytics

Get started immediately on your first project with the new Knowledge center in the Azure Synapse Studio.

To further accelerate time to insight in Microsoft Azure Synapse Analytics, we are introducing the Knowledge center to simplify access to pre-loaded sample data and to streamline the getting started process for data professionals. You can now create or use existing Spark and SQL pools, connect to and query Azure Open Datasets, load sample scripts and notebooks, access pipeline templates, and tour the Azure Synapse Studio—all from one place.


The Knowledge center can be accessed from the Azure Synapse Studio Homepage under Useful links or by clicking the Question mark icon on the Navigation bar.


Use samples immediately


The Knowledge center offers several one-click tutorials that create everything you need to instantaneously explore and analyze data.


In the Explore sample data with Spark tutorial, you can easily create an Apache Spark pool and use notebooks natively inside Azure Synapse to analyze New York City (NYC) Yellow Taxi data and customize visualizations. This is possible as Azure Synapse unifies both SQL and Spark development within the same analytics service.


In the Query data with SQL tutorial, you can query and analyze the same NYC Yellow Taxi dataset with a serverless SQL pool, which allows you to use T-SQL for fast data lake exploration without provisioning or managing dedicated resources. The tutorial also enables you to quickly visualize results with one click.
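
To give a feel for the kind of query the tutorial generates, here is a minimal sketch of a serverless SQL pool query over Parquet files in a data lake. The storage path and alias are hypothetical stand-ins, not the tutorial's actual dataset location:

-- Illustrative only: the storage URL below is a placeholder.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/data/nyc_taxi/*.parquet',
    FORMAT = 'PARQUET'
) AS trips;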


The Create external table with SQL tutorial allows you to use either a serverless or dedicated SQL pool to create an external table, giving you the flexibility to harness data on your own terms and choose the most cost-effective option for each use case. Dedicated SQL pools provide a broad set of workload management capabilities including workload isolation and workload priority.
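
As a rough illustration of this flow, the following sketch creates an external table over Parquet files using a serverless SQL pool; the data source, file format, table name, and columns are all hypothetical (dedicated pools accept slightly different data source options):

-- Illustrative only: names, locations, and columns are placeholders.
CREATE EXTERNAL DATA SOURCE NycTaxiSource
WITH (LOCATION = 'https://contosolake.dfs.core.windows.net/data');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.Trips
(
    VendorId     INT,
    TripDistance FLOAT,
    TotalAmount  FLOAT
)
WITH (
    LOCATION = 'nyc_taxi/',
    DATA_SOURCE = NycTaxiSource,
    FILE_FORMAT = ParquetFormat
);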


Browse available samples


The new Knowledge center also contains numerous sample datasets, notebooks, scripts, and pipeline templates to allow you to quickly get started.


Add Azure Open Datasets to access sample data on COVID-19, public safety, transportation, economic indicators, and more. The datasets are automatically loaded into the Data hub on the main navigation under the Linked tab and then Azure Blob Storage. With the click of a button, you can run sample scripts to select the top 100 rows and create an external table, or you can create a new notebook.


Regardless of whether you prefer to use PySpark, Scala, or Spark.NET C#, you can try a variety of sample notebooks. These will open in the Develop hub of the Azure Synapse Studio under Notebooks.


In addition to sample notebooks, there are samples for SQL scripts like Analyze Azure Open Datasets using SQL On-demand, Generate your COPY Statement with Dynamic SQL, and Query CSV, JSON, or Parquet files. Once these scripts are published in your workspace, they will open in the Develop hub of the main navigation under SQL scripts.


Furthermore, there are around 30 different templates for pipelines, including templates for copying data from various sources.


Tour Azure Synapse Studio


The Knowledge center offers a comprehensive tour of the Azure Synapse Studio to help familiarize you with key features so you can get started right away on your first project.


Tuesday, 24 December 2019

Microsoft is a leader in The Forrester Wave™: Streaming Analytics, Q3 2019

Processing big data in real time is an operational necessity for many businesses. Azure Stream Analytics is Microsoft’s serverless real-time analytics offering for complex event processing.

We are excited and humbled to announce that Microsoft has been named a leader in The Forrester Wave™: Streaming Analytics, Q3 2019. Microsoft believes this report truly reflects the market momentum of Azure Stream Analytics, satisfied customers, a growing partner ecosystem and the overall strength of our Azure cloud platform.

The Forrester Wave™: Streaming Analytics, Q3 2019

The Forrester Wave™: Streaming Analytics, Q3 2019 report evaluated streaming analytics offerings from 11 different solution providers, and we are honored to share that Forrester has recognized Microsoft as a Leader in this category. Azure Stream Analytics received the highest possible score in 12 different categories including Ability to execute, Administration, Deployment, Solution Roadmap, Customer adoption, and many more.

The report states, “Microsoft Azure Stream Analytics has strengths in scalability, high availability, deployment, and applications. Azure Stream Analytics is an easy on-ramp for developers who already know SQL. Zero-code integration with over 15 other Azure services makes it easy to try and therefore adopt, making the product the real-time backbone for enterprises needing real-time streaming applications on the Azure cloud. Additionally, through integration with IoT Hub and Azure Functions, it offers seamless interoperability with thousands of devices and business applications.”

Key Differentiators for Azure Stream Analytics


Fully integrated with the Azure ecosystem: Build powerful pipelines with a few clicks

Whether you have millions of IoT devices streaming data to Azure IoT Hub or have apps sending critical telemetry events to Azure Event Hubs, it only takes a few clicks to connect multiple sources and sinks to create an end-to-end pipeline.


Developer productivity

One of the biggest advantages of Stream Analytics is the simple SQL-based query language with its powerful temporal constraints to analyze data in motion. Familiarity with the SQL language is enough to author powerful queries. Additionally, Azure Stream Analytics supports language extensibility via C# and JavaScript user-defined functions (UDFs) or user-defined aggregates to perform complex calculations as part of a Stream Analytics query.
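
As a small, hypothetical example of this SQL dialect, the query below averages device temperatures over five-minute tumbling windows; the input, output, and field names are assumptions for the sketch, not part of any real job:

-- Illustrative only: IoTHubInput and PowerBIOutput are placeholder aliases.
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature
INTO PowerBIOutput
FROM IoTHubInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY DeviceId, TumblingWindow(minute, 5)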

Analytics prowess

Stream Analytics contains a wide array of analytic capabilities such as native support for geospatial functions, built-in callouts to custom machine learning (ML) models for real-time scoring, built-in ML models for anomaly detection, pattern matching, and more to help developers easily tackle complex scenarios while staying in a familiar context.

Intelligent edge

Azure Stream Analytics helps bring real-time insights and analytics capabilities closer to where your data originates. Customers can easily enable new scenarios with true hybrid architectures for stream processing and run the same query in the cloud or on the IoT edge.

Best-in-class financially backed SLA by the minute

We understand it is critical for businesses to prevent data loss and have business continuity. Stream Analytics guarantees event processing with a 99.9 percent availability service-level agreement (SLA) at the minute level, which is unparalleled in the industry.

Scale instantly

Stream Analytics is a fully managed serverless (PaaS) offering on Azure. There is no infrastructure to worry about, and no servers, virtual machines, or clusters to manage. We do all the heavy lifting for you in the background. You can instantly scale up or scale out the processing power from one to hundreds of streaming units for any job.

Mission critical

Stream Analytics guarantees “exactly once” event processing and at-least-once delivery of events. It has built-in recovery capabilities in case the delivery of an event fails. So, you never have to worry about your events getting dropped.

Sunday, 1 December 2019

Azure SQL Data Warehouse is now Azure Synapse Analytics

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.

With Azure Synapse, data professionals can query both relational and non-relational data using the familiar SQL language. This can be done using either serverless on-demand queries for data exploration and ad hoc analysis or provisioned resources for your most demanding data warehousing needs. A single service for any workload.

In fact, it’s the first and only analytics system to have run all the TPC-H queries at petabyte-scale. For current SQL Data Warehouse customers, you can continue running your existing data warehouse workloads in production today with Azure Synapse and will automatically benefit from the new preview capabilities when they become generally available.


Taking SQL beyond data warehousing


A cloud native, distributed SQL processing engine is at the foundation of Azure Synapse and is what enables the service to support the most demanding enterprise data warehousing workloads. This week at Ignite we introduced a number of exciting features to make data warehousing with Azure Synapse easier and allow organizations to use SQL for a broader set of analytics use cases.

Unlock powerful insights faster from all data

Azure Synapse deeply integrates with Power BI and Azure Machine Learning to drive insights for all users, from data scientists coding with statistics to business users with Power BI. And to make all types of analytics possible, we’re announcing native and built-in prediction support, as well as runtime-level improvements to how Azure Synapse handles streaming data, Parquet files, and PolyBase. Let’s dive into more detail:

◉ With the native PREDICT statement, you can score machine learning models within your data warehouse—avoiding the need for large and complex data movement. The PREDICT function (available in preview) relies on an open model framework and takes user data as input to generate predictions. Users can convert existing models trained in Azure Machine Learning, Apache Spark™, or other frameworks into an internal format representation without having to start from scratch, accelerating time to insight. A minimal scoring sketch follows this list.


◉ We’ve enabled direct streaming ingestion support and the ability to execute analytical queries over streaming data. Capabilities such as joins across multiple streaming inputs, aggregations within one or more streaming inputs, transformation of semi-structured data, and multiple temporal windows are all supported directly in your data warehousing environment (available in preview). For streaming ingestion, customers can integrate with Event Hubs (including Event Hubs for Kafka) and IoT Hubs.

◉ We’re also removing the barrier that inhibits securely and easily sharing data inside or outside your organization with Azure Data Share integration for sharing both data lake and data warehouse data.

◉ By using new ParquetDirect technology, we are making interactive queries over the data lake a reality (in preview). It’s designed to access Parquet files with native support directly built into the engine. Through improved data scan rates, intelligent data caching, and columnstore batch processing, we’ve improved PolyBase execution by over 13x.
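
As referenced in the first bullet above, here is a minimal scoring sketch with the PREDICT function. The model catalog table, model name, and output column are hypothetical; it assumes a model already converted to the open ONNX format:

-- Illustrative only: dbo.Models and ChurnScore are placeholders.
SELECT d.CustomerId, p.ChurnScore
FROM PREDICT(
    MODEL = (SELECT Model FROM dbo.Models WHERE ModelName = 'customer-churn'),
    DATA = dbo.Customers AS d,
    RUNTIME = ONNX)
WITH (ChurnScore FLOAT) AS p;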


Workload isolation


To support customers as they democratize their data warehouses, we are announcing new features for intelligent workload management. The new Workload Isolation functionality allows you to manage the execution of heterogeneous workloads while providing flexibility and control over data warehouse resources. This leads to improved execution predictability and enhances the ability to satisfy predefined SLAs.


COPY statement


Analyzing petabyte-scale data requires ingesting petabyte-scale data. To streamline the data ingestion process, we are introducing a simple and flexible COPY statement. With only one command, Azure Synapse now enables data to be seamlessly ingested into a data warehouse in a fast and secure manner.

This new COPY statement enables using a single T-SQL statement to load data, parse standard CSV files, and more.

COPY statement sample code:

COPY INTO dbo.[FactOnlineSales] FROM 'https://contoso.blob.core.windows.net/Sales/'
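
A fuller form of the statement might spell out the file format and authentication. The sketch below assumes a comma-delimited CSV source with a header row, secured with a SAS token; the option values and secret are placeholders:

COPY INTO dbo.[FactOnlineSales]
FROM 'https://contoso.blob.core.windows.net/Sales/'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2,  -- skip the header row
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);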

Safe keeping for data with unmatched security


Azure has the most advanced security and privacy features in the market. These features are built into the fabric of Azure Synapse, such as automated threat detection and always-on data encryption. And for fine-grained access control, businesses can ensure data stays safe and private using column-level security, native row-level security, and dynamic data masking (now generally available) to automatically protect sensitive data in real time.

To further enhance security and privacy, we are introducing Azure Private Link. It provides a secure and scalable way to consume deployed resources from your own Azure Virtual Network (VNet). A secure connection is established using a consent-based call flow. Once established, all data that flows between Azure Synapse and service consumers is isolated from the internet and stays on the Microsoft network. There is no longer a need for gateways, network address translation (NAT) devices, or public IP addresses to communicate with the service.


Tuesday, 15 October 2019

Azure Analysis Services web designer adds new DAX query viewer

We released the Azure Analysis Services web designer. This new browser-based experience allows developers to start creating and managing Azure Analysis Services (AAS) semantic models quickly and easily. While SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS) are still the primary tools for development, this new experience is intended to make modeling fast and easy. It is great for getting started on a new model or for tasks such as adding a new measure to an existing model.

Today we are announcing new functionality that allows you to generate, view, and edit your DAX queries. This provides a great way to learn DAX while testing the data in your models. DAX (Data Analysis Expressions) is a formula language used to create custom calculations in Analysis Services. DAX formulas include functions, operators, and values to perform advanced calculations on data in tables and columns.

To get started, open the web designer from the Azure portal.


Once inside the designer, select the model that you wish to query.


This opens up the query designer where you can drag and drop fields from the right to graphically generate and then run a query against your model.


Now switch the view from designer to query.


This will bring up the new query editor with the DAX query that was generated from the visual design in the previous steps.


The query text can be edited and rerun to see new results.

Saturday, 3 August 2019

Power BI and Azure Data Services dismantle data silos and unlock insights

Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. Business analysts who use Power BI dataflows can now share data with data engineers and data scientists, who can leverage the power of Azure Data Services, including Azure Databricks, Azure Machine Learning, Azure SQL Data Warehouse, and Azure Data Factory for advanced analytics and AI.

With the recently announced preview of Power BI dataflows, Power BI has enabled self-service data prep for business analysts. Power BI dataflows can ingest data from a large array of transactional and observational data sources, and cleanse, transform, enrich, schematize, and store the result. Dataflows are reusable and can be refreshed automatically and daisy-chained to create powerful data preparation pipelines. Power BI is now making available support for storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and dataflow definition. By storing dataflows in Azure Data Lake Storage Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.

Data silos inhibit data sharing


The ability for organizations to extract intelligence from business data provides a key competitive advantage; however, attempting this today can be time-consuming and costly. To extract intelligence and create value from data, an application must be able to access the data and understand its structure and meaning. Data often resides in silos that are application or platform specific, creating a major data integration and data preparation challenge.

Consistent data and metadata formats enable collaboration


By adopting a consistent way to store and describe data based on the Common Data Model (CDM), Power BI, Azure Data Services and other applications can share and interoperate over data more effectively. Power BI dataflows are stored in ADLS Gen2 as CDM folders. A CDM folder contains a metadata file that describes the entities in the folder, with their attributes and datatypes, and lists the data files for each entity. CDM also defines a set of standard business entities that define additional rich semantics. Mapping the data in a CDM folder to standard CDM entities further facilitates interoperability and data sharing. Microsoft has joined with SAP and Adobe to form an Open Data Initiative to encourage the definition and adoption of standard entities across a range of domains to make it easier for applications and tools to share data through an enterprise Data Lake.

By adopting these data storage conventions, data ingested by Power BI, with its already powerful and easy to use data prep features, can now be further enriched and leveraged in Azure. Similarly, data in Azure can be exported into CDM folders and shared with Power BI.


Azure Data Services enable advanced analytics on shared data


Azure Data Services enable advanced analytics that let you maximize the business value of data stored in CDM folders in the data lake. Data engineers and data scientists can use Azure Databricks and Azure Data Factory dataflows to cleanse and reshape data, ensuring it is accurate and complete. Data from different sources and in different formats can be normalized, reformatted, and merged to optimize the data for analytics processing. Data scientists can use Azure Machine Learning to define and train machine learning models on the data, enabling predictions and recommendations that can be incorporated into BI dashboards and reports, and used in production applications. Data engineers can use Azure Data Factory to combine data from CDM folders with data from across the enterprise to create an historically accurate, curated enterprise-wide view of data in Azure SQL Data Warehouse. At any point, data processed by any Azure Data Service can be written back to new CDM folders, to make the insights created in Azure accessible to Power BI and other CDM-enabled apps or tools.

New tutorial explores data sharing between Power BI and Azure


A tutorial is now available to help you understand how sharing data between Power BI and Azure using CDM folders can break down data silos and unlock new insights. The tutorial with sample code shows how to integrate data from Power BI into a modern data warehousing scenario in Azure. The tutorial allows you to explore the flows highlighted in green in the diagram above.  

In the tutorial, Power BI dataflows are used to ingest key analytics data from the Wide World Importers operational database and store the extracted data with its schema in a CDM folder in ADLS Gen2. You then connect to the CDM folder and process the data using Azure Databricks, formatting and preparing it for later steps, then writing it back to the lake in a new CDM folder. This prepared CDM folder is used by Azure Machine Learning to train and publish an ML model that can be accessed from Power BI or other applications to make real-time predictions. The prepared data is also loaded into staging tables in an Azure SQL Data Warehouse, where it is transformed into a dimensional model. 


Azure Data Factory is used to orchestrate the flow of data between the services, as well as to manage and monitor the processing at runtime. By working through the tutorial, you’ll see first-hand how the metadata stored in a CDM folder makes it easier for each service to understand and share data.

Sample code accelerates your data integration projects


The tutorial includes sample code and instructions for the whole scenario. The samples include reusable libraries and code in C#, Python, and Scala, as well as reusable Azure Data Factory pipeline templates, that you can use to integrate CDM folders into your own Azure Data Services projects. 

Thursday, 9 May 2019

Azure SQL Data Warehouse releases new capabilities for performance and security

As the amount of data stored and queried continues to rise, it becomes increasingly important to have the most price-performant data warehouse. While we’re excited about being the industry leader in both GigaOm’s TPC-H and TPC-DS benchmark reports, we don’t plan to stop innovating on behalf of our customers.

We’re excited to introduce several new features that will continue to make Azure SQL Data Warehouse the unmatched industry leader in price-performance, flexibility, and security.

To enable customers to continue improving the performance of their applications without adding any additional cost, we’re announcing preview availability of result-set caching, materialized views, and ordered clustered columnstore indexes.

In addition to price-performance enhancements, we’ve added new capabilities that enable customers to be more agile and flexible. The first is workload importance, which is a new feature that enables users to decide how workloads with conflicting needs get prioritized. Second, our new support for automatic statistics maintenance (auto-update statistics) means that manageability and maintenance of Azure SQL Data Warehouse just got easier and more effective. And finally, we’re also adding support for managing and querying JSON data. Users can now load JSON data directly into their data warehouses and mix it with other relational data, leading to faster and easier insights.

Our last announcement focuses on security and privacy. As you know, deploying data warehousing solutions in the cloud demands sophisticated and robust security. While Azure SQL Data Warehouse already enables an advanced security model to be deployed, today we’re announcing support for Dynamic Data Masking (DDM). DDM allows you to protect private data, through user-defined policies, ensuring it’s visible only to those that have permission to see it.


In the sections below, we’ll dive into these new features and the benefits that each provide.

Price-performance


Price-performance is a recurring theme in our releases because it ensures we provide one of the fastest analytics services at incredible value. With the new functionalities announced today, we continue to demonstrate our commitment to offering the leading price-performance platform.

Interactive dashboarding with result-set caching (preview)

Interactive dashboards come with predictable and repetitive query patterns. Result-set caching, now available in preview, helps with this scenario as it enables instant query response times while reducing time-to-insight for business analysts and reporting users.

With result-set caching enabled, Azure SQL Data Warehouse automatically caches results from repetitive queries, so subsequent executions return results directly from the persisted cache and skip full query execution. In addition to saving compute cycles, queries satisfied by the result-set cache do not use any concurrency slots and thus do not count against existing concurrency limits. For security reasons, only users with the appropriate security credentials can access the result sets in cache.
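
A minimal sketch of turning the feature on, assuming a database named ContosoDW:

-- Run against the master database to enable caching for the whole database.
ALTER DATABASE [ContosoDW] SET RESULT_SET_CACHING ON;

-- Caching can also be toggled for the current session.
SET RESULT_SET_CACHING OFF;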

Materialized views to improve performance (preview)

Another new feature that greatly enhances query performance for a wide set of queries is materialized view support, now available in preview. A materialized view improves the performance of complex queries (typically queries with joins and aggregations) while offering simple maintenance operations.

When materialized views are created, the Azure SQL Data Warehouse query optimizer transparently and automatically rewrites user queries to leverage deployed materialized views, leading to improved query performance. Best of all, as data gets loaded into base tables, Azure SQL Data Warehouse automatically maintains and refreshes materialized views, simplifying maintenance and management. As user queries leverage materialized views, they run significantly faster and use fewer system resources. The more complex and expensive the query within the view, the bigger the potential for execution-time savings.
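
A minimal sketch of defining a materialized view, assuming a hypothetical fact table hash-distributed on the grouping column:

-- Illustrative only: dbo.FactSales and its columns are placeholders.
CREATE MATERIALIZED VIEW dbo.SalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT
    Region,
    COUNT_BIG(*) AS OrderCount,
    SUM(ISNULL(Amount, 0)) AS TotalAmount
FROM dbo.FactSales
GROUP BY Region;

Queries that aggregate dbo.FactSales by Region can then be rewritten against the view automatically by the optimizer.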

Fast scans with ordered clustered columnstore indexes (preview)

Columnstore is a key enabler for storing and efficiently querying large amounts of data. For each table, it divides incoming data into row groups and each column of a row group forms a segment on a disk. When querying columnstore indexes, only the column segments that are relevant to user queries are read from the disk. Ordered clustered columnstore indexes further optimize query execution by enabling efficient segment elimination.

Because the data is pre-ordered, you can drastically reduce the number of segments that are read from disk, leading to faster query processing. Ordered clustered columnstore indexes are now available in preview, and queries containing filters and predicates can greatly benefit from this feature.
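
A minimal sketch of creating such a table with CTAS, assuming a hypothetical fact table ordered on its date key:

-- Illustrative only: table and column names are placeholders.
CREATE TABLE dbo.FactSalesOrdered
WITH (
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX ORDER (OrderDateKey)
)
AS
SELECT * FROM dbo.FactSales;

Queries filtering on OrderDateKey can then skip most column segments entirely.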

Flexibility


As business requirements evolve, the ability to change and adapt solution behavior is one of the key benefits of a modern data warehousing product. The ability to handle and manage the heterogeneous data that enterprises hold, while offering ease of use and management, is critical. To support these needs, Azure SQL Data Warehouse is introducing the following new functionalities to help you deal with ever-evolving requirements.

Prioritize workloads with workload importance (general availability)

Running mixed workloads on your analytics solution is often a necessity to effectively and quickly execute business processes. In situations where resources are constrained, the capability to decide which workloads need to be executed first is critical, as it helps with overall solution cost management. For instance, executive dashboard reports may be more important than ad-hoc queries. Workload importance now enables this scenario. Requests with higher importance are guaranteed quicker access to resources, which helps meet predefined SLAs and ensures important requests are prioritized.

Workload classification concept

To define workload priority, various requests must be classified. Azure SQL Data Warehouse supports flexible classification policies that can be set for a SQL query, a database user, database role, Azure Active Directory login, or Azure Active Directory group. Workload classification is achieved using the new CREATE WORKLOAD CLASSIFIER syntax.
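
A minimal sketch of the syntax, assuming a hypothetical reporting login mapped to an existing resource class; the names and values are illustrative:

-- Illustrative only: 'largerc' and DashboardUser are placeholders.
CREATE WORKLOAD CLASSIFIER [wcDashboards]
WITH (
    WORKLOAD_GROUP = 'largerc',
    MEMBERNAME = 'DashboardUser',
    IMPORTANCE = above_normal
);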

The diagram below illustrates the workload classification and importance function:


Workload importance concept

Workload importance is established through classification. Importance influences a requester's access to system resources, including memory, CPU, IO, and locks. A request can be assigned one of five levels of importance: low, below_normal, normal, above_normal, and high. If a request with above_normal importance is scheduled, it gets access to resources before a request with the default normal importance.


Manage and query JSON data (preview)

Organizations increasingly deal with multiple data sources and heterogeneous file formats, with JSON among the most common alongside CSV files. To speed up time to insight and minimize unnecessary data transformation processes, Azure SQL Data Warehouse now supports querying JSON data. This feature is now available in preview.

Business analysts can now use the familiar T-SQL language to query and manipulate documents that are formatted as JSON data. JSON functions, such as JSON_VALUE, JSON_QUERY, JSON_MODIFY, and OPENJSON are now supported in Azure SQL Data Warehouse. Azure SQL Data Warehouse can now effectively support both relational and non-relational data, including joins between the two, while enabling users to use their traditional BI tools, such as Power BI.
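
A minimal sketch of these functions over a hypothetical telemetry document:

-- Illustrative only: the document shape and column names are placeholders.
DECLARE @json NVARCHAR(MAX) =
    N'{"device": "sensor-42", "readings": [{"ts": "2019-05-01T08:00:00", "temp": 21.5}]}';

-- Extract a scalar value.
SELECT JSON_VALUE(@json, '$.device') AS DeviceId;

-- Shred the nested array into rows with typed columns.
SELECT r.ts, r.temp
FROM OPENJSON(@json, '$.readings')
WITH (ts DATETIME2 '$.ts', temp FLOAT '$.temp') AS r;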

Automatic statistics maintenance and update (preview)

Azure SQL Data Warehouse implements a cost-based optimizer to ensure optimal execution plans are being generated and used. For any cost-based optimizer to be effective, column level statistics are needed. When these statistics are stale, there is potential for selecting a non-optimal plan, leading to slower query performance.

Today, we’re extending that support for auto statistics creation by adding the ability to automatically refresh and maintain statistics. As data warehouse tables get loaded and updated, the system can now automatically detect and update out-of-date statistics. With the auto-update statistics capability now available in preview, Azure SQL Data Warehouse delivers full statistics management capabilities while simplifying statistics maintenance processes. You no longer need to manually maintain statistics, which leads to a simplified and more cost-effective data warehouse deployment.

Security


Azure SQL Data Warehouse provides some of the most advanced security and privacy features in the market. This is achieved through proven SQL Server technology. SQL Server, the core technology and component of Azure SQL Data Warehouse, has been the least vulnerable database over the last eight years according to the NIST national vulnerabilities database. To expand Azure SQL Data Warehouse's existing security and privacy features, we’re announcing that Dynamic Data Masking (DDM) support is now available in preview.

Protect sensitive data with dynamic data masking (preview)

Dynamic data masking (DDM) enables administrators and data developers to control access to their company’s data, allowing sensitive data to be safe and restricted. It prevents unauthorized access to private data by obscuring the data on-the-fly. Based on user-defined data masking policies, Azure SQL Data Warehouse can dynamically obfuscate data as the queries execute, and before results are shown to users.


Azure SQL Data Warehouse implements the DDM capability directly inside the engine. When creating tables with DDM, policies are stored in the system's metadata and then enforced by the engine as queries get executed. This centralized policy enforcement simplifies the management of data masking rules, as access control is not implemented and repeated at the application layer. As various users access and query tables, policies are automatically honored and applied to protect sensitive data. DDM comes with flexible policies: you can choose to define a partial mask, which exposes some of the data in the selected columns, or a full mask that obfuscates the data completely. Azure SQL Data Warehouse also provides built-in masking functions that users can choose from.
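
A minimal sketch of defining masked columns with the built-in functions; the table, columns, and role are hypothetical:

-- Illustrative only: table, column, and role names are placeholders.
CREATE TABLE dbo.Customers
(
    CustomerId INT NOT NULL,
    Email      VARCHAR(100) MASKED WITH (FUNCTION = 'email()'),
    CreditCard VARCHAR(19)  MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)'),
    Balance    MONEY        MASKED WITH (FUNCTION = 'default()')
);

-- Users granted UNMASK see real values; everyone else sees obfuscated data.
GRANT UNMASK TO [AnalystRole];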