Tuesday, 30 March 2021

Strengthen and optimize compliance in Azure Security Center

The Regulatory Compliance dashboard in Azure Security Center is an excellent tool for helping organizations understand their compliance posture relative to industry standards. Reporting on compliance with specific standards is critical for regulated customers, and tracking compliance status is also relevant to many other organizations that want to align with industry-defined best practices. Many of our customers use compliance frameworks as the basis of their organizational security model.

Azure Security Center improves your organization's overall compliance readiness. By performing ongoing assessments, Azure Security Center provides rich, actionable insights and reports to simplify your regulatory compliance journey.

Several significant upgrades have recently been released to the compliance management experience in Azure Security Center, including Azure Security Benchmark integration with Secure Score, a new section for downloading audit certification reports, integration of shared responsibility model details into the product, and Workflow Automation functionality.

Azure Security Benchmark

Azure Security Benchmark is now fully integrated into the regulatory compliance dashboard as the default standard, available to all Azure Security Center customers for free. Azure Security Benchmark comprises the canonical set of controls that Microsoft defines and recommends as a security baseline, aligned with industry frameworks and customized to Azure and cloud environments. The Benchmark is thus a superset of security controls related to cloud security in Azure, covering the full set of security requirements related to cloud security from each of the standards it maps to.

Secure Score is built on top of Azure Security Benchmark and provides a key performance indicator (KPI) measurement against Azure Security Benchmark controls. Secure Score provides a prioritized set of recommendations, allowing you to quickly identify the highest risk factors in your environment. All Security Center customers now have access to both the Azure Security Benchmark view, from the compliance controls perspective, and the Secure Score view, to prioritize action by risk.

Figure 1: Azure Security Benchmark framework in the Security Center regulatory compliance dashboard

A large set of additional industry and regulatory standards is supported in the Azure Security Center regulatory compliance experience, including ISO 27001, NIST SP 800-53 R4, PCI DSS 3.2.1, and more. These can be added to the dashboard individually and applied on any scope, depending on your organizational requirements. Within the dashboard, you can download a point-in-time report on your compliance status, including both a summary executive-level report in PDF format and a detailed report of compliance per resource in CSV format. These reports are available for Azure Security Benchmark as well as all other compliance standards in the dashboard.
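As a rough illustration of what you might do with the per-resource CSV report, the sketch below counts passed resources per control. The column names (`control`, `resource`, `state`) and the sample rows are assumptions made for this example; check the headers in the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical column names and rows; the real CSV export may use different headers.
SAMPLE_REPORT = """\
control,resource,state
Network Security,vm-web-01,Passed
Network Security,vm-web-02,Failed
Data Protection,storage-logs,Passed
Data Protection,sql-orders,Failed
Data Protection,sql-users,Passed
"""


def summarize(report_text):
    """Return {control: (passed, total)} for a per-resource compliance report."""
    passed = Counter()
    total = Counter()
    for row in csv.DictReader(io.StringIO(report_text)):
        total[row["control"]] += 1
        if row["state"] == "Passed":
            passed[row["control"]] += 1
    return {control: (passed[control], total[control]) for control in total}


if __name__ == "__main__":
    for control, (ok, count) in summarize(SAMPLE_REPORT).items():
        print(f"{control}: {ok}/{count} resources passed")
```

A summary like this can be a quick sanity check next to the executive-level PDF report.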

For continuous real-time reporting, we've recently added the ability to configure Continuous Export on compliance frameworks, so you can get real-time compliance data continuously streamed to your Log Analytics workspace or Azure Event Hub for streaming to any external system.

Audit reports and shared responsibility in the cloud


Managing compliance in the cloud isn't only about what you need to do; it is based on a shared responsibility model with your cloud provider. That's why we've recently added access to Azure compliance certification artifacts directly in the Azure Security Center compliance experience. We provide access to documents on Azure certifications for many compliance standards, including ISO standards, the Payment Card Industry Data Security Standard (PCI DSS), Service Organization Controls (SOC), and more. You can now filter and search to find exactly the document you need and download it directly from the Audit Reports area in Azure Security Center. Access to these documents was previously available through the Service Trust Portal, requiring separate authentication.

Figure 2: Audit Certification reports in Security Center

In addition to audit reports, we've recently added information on shared responsibility baked in directly to the compliance management experience in the dashboard. Across many standards, we've added an indication of responsibility to each control requirement, whether Microsoft responsibility, customer responsibility, or shared responsibility. This can give a more complete picture of what each control requirement fully entails and helps you understand where the platform responsibility ends, and your responsibility begins.

For NIST SP 800-53 R4, we have additionally added in-depth platform implementation details on compliance controls, consisting of a set of assessments from the Azure Control Framework that describe how Azure as a platform implements its part of each control. This will become available for additional compliance standards over time. Finally, we've also added extended control details for each compliance requirement, giving you access to a detailed description of the control and guidance on how to become compliant with it.

Figure 3: Shared Responsibility Model and control information in the regulatory compliance dashboard

Workflow automation for compliance events


An additional new feature that has recently been released is the ability to configure workflow automations for regulatory compliance data. This capability allows you to trigger a Logic App automatically any time there is a status change on a regulatory compliance assessment and run any action based on that event. The automation can be configured on one or more standards that you are tracking in the compliance dashboard. You can configure any number of automated actions implemented by Logic Apps. There are several built-in, predefined templates, such as sending an email to specific users or opening a new ticket in a ticketing system. You can also create your own custom Logic App with the automation logic of your choice.
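To make the idea concrete, here is a minimal sketch of the decision logic such an automation might apply once it has been triggered by a status change. The `ComplianceEvent` fields, the state names, and the action names are hypothetical and chosen only for illustration; the payload a Logic App actually receives may look different.

```python
from dataclasses import dataclass


@dataclass
class ComplianceEvent:
    # Hypothetical event shape; the real Logic App trigger payload may differ.
    standard: str
    assessment: str
    old_state: str
    new_state: str


def choose_action(event):
    """Pick a follow-up action for a compliance assessment status change."""
    if event.new_state == event.old_state:
        return "ignore"        # nothing changed, so nothing to do
    if event.new_state == "Failed":
        return "open_ticket"   # a regression needs tracking
    return "send_email"        # any other change is informational


if __name__ == "__main__":
    event = ComplianceEvent("ISO 27001", "Example assessment", "Passed", "Failed")
    print(choose_action(event))
```

Keeping the routing rule in one small function makes it easy to review which events open tickets and which only notify.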

Explore regulatory compliance data in Azure Resource Graph


All the regulatory compliance data is available for customers in Azure Resource Graph for easy exploration and querying. You can now also reach this data directly from the regulatory compliance dashboard. Just click the Open Query button in the dashboard to automatically load a query returning detailed resource compliance data for the standard you currently have loaded in the dashboard. You can then adjust this query as needed to generate a view of your choice on the compliance data, as well as cross-reference and filter by other data stored in Azure Resource Graph for advanced exploration.
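If you want to adjust such a query programmatically, a small helper can assemble the query text. This is only a sketch: the table name, resource type, and property names below are written from memory of the Resource Graph schema and should be treated as assumptions. Compare them against the query that the Open Query button generates before relying on them.

```python
import re

# Assumed resource type for compliance assessments; verify against your own query.
RESOURCE_TYPE = (
    "microsoft.security/regulatorycompliancestandards"
    "/regulatorycompliancecontrols/regulatorycomplianceassessments"
)


def _check(value):
    """Reject values that could break out of the quoted query literal."""
    if not re.fullmatch(r"[A-Za-z0-9 ._-]+", value):
        raise ValueError(f"unexpected characters in query value: {value!r}")
    return value


def build_compliance_query(standard, state=None):
    """Build Resource Graph query text for one standard, optionally one state."""
    lines = [
        "securityresources",
        f"| where type == '{RESOURCE_TYPE}'",
        f"| where id contains '/regulatoryComplianceStandards/{_check(standard)}/'",
    ]
    if state is not None:
        lines.append(f"| where properties.state == '{_check(state)}'")
    lines.append("| project id, name, state = properties.state")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_compliance_query("Azure-Security-Benchmark", state="Failed"))
```

The resulting text can be pasted into Resource Graph Explorer or passed to whichever client you use to run queries.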

Source: microsoft.com

Saturday, 27 March 2021

Perform at-scale, agentless SQL Server discovery and assessment with Azure Migrate

Moving on-premises infrastructure, databases, and applications to Azure is key to the success of your cloud migration and modernization journey, and we are committed to simplifying that process. With each major milestone, we set out to make these primary migration scenarios as seamless as possible. Today, we are announcing the preview of at-scale, agentless discovery and migration-readiness assessments for SQL Server. You can now use Azure Migrate to create a unified view of your entire datacenter, across Windows Server, Linux, and SQL Server.

Azure Migrate provides a streamlined, comprehensive portfolio of Microsoft and partner tools to meet migration needs, all in one place. With this release, we are taking three tools that help with server, database, and web app assessments and are unifying them into one guided end-to-end flow. The first step in that journey is assessments for Azure SQL Managed Instance and Azure SQL Database. Soon, we will also expand to support integrated assessments for Azure App Service.

You can now discover your SQL Server instances and databases running in a VMware environment and analyze their configuration, performance, and application dependencies for migration to Azure SQL Database and Azure SQL Managed Instance. Assessments will include Azure SQL readiness, right-sizing, and cost projections. You now have an easy way to discover your entire SQL estate and one unified experience for IaaS and PaaS migration assessments.

Unified onboarding for Windows Server, Linux, and SQL Server


Azure Migrate appliance for VMware helps with discovery, assessment, software inventory, application dependency mapping, and migration.

◉ Deploy a new Azure Migrate on-premises appliance or upgrade your existing appliance to start discovering your SQL Server instances and databases. You can also use the appliance to inventory installed software and perform agentless dependency mapping.

◉ You can now provide multiple credentials (domain, non-domain, and SQL authentication) for discovery. Azure Migrate appliance will automatically map server and database credentials across the entire estate.

◉ To ensure that your credentials are secure, they are encrypted and stored on your appliance in your datacenter. Credentials are not sent to Microsoft.

SQL Server discovery and application dependency mapping


Azure Migrate discovery will help you model servers running SQL Server and the various instances and databases, including multiple SQL instances running on the same server.

◉ You can discover up to 6,000 databases with one Azure Migrate appliance; SQL performance data is collected every 30 seconds to ensure that you get the most accurate right-sizing recommendations.

◉ You can discover SQL instances and databases running on SQL Server 2008 through SQL Server 2019. Developer, Enterprise, Express, and Web editions are supported.

◉ Discovery details will include SQL Server version, edition, availability mode, count and size of user databases, and compatibility level.

◉ You can also use the agentless dependency mapping feature to identify application tiers or interdependent applications; this information is useful when you need to plan migration for interdependent servers.
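The appliance limits above lend themselves to a quick back-of-the-envelope calculation. The sketch below assumes the 6,000-database limit is the only sizing constraint, which may not hold in a real deployment; the function names are made up for this example.

```python
import math

DATABASES_PER_APPLIANCE = 6000   # limit stated in the post
SAMPLE_INTERVAL_SECONDS = 30     # collection interval stated in the post


def appliances_needed(database_count):
    """Minimum appliances if the database limit is the only constraint."""
    return math.ceil(database_count / DATABASES_PER_APPLIANCE)


def samples_per_day():
    """Performance samples collected per day at the stated interval."""
    return 24 * 60 * 60 // SAMPLE_INTERVAL_SECONDS


if __name__ == "__main__":
    print(appliances_needed(14500), "appliances for 14,500 databases")
    print(samples_per_day(), "performance samples per day")
```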

Azure SQL assessment, right-sizing, and cost planning


Azure Migrate assessment now natively includes the assessment logic that our SQL Server team has built over the years, augmented to deliver an at-scale experience for SQL Server. Azure Migrate also supports assessments for Azure infrastructure service and Azure VMware Solution.  

◉ You can customize assessments for your unique scenario with the ability to specify the target Azure region, Azure SQL deployment type, reserved capacity, service tier, and performance history.

◉ You’ll get best-fit recommendations, including right-sizing based on performance history and service tier recommendation. Assessments also include projected compute and storage costs based on the target Azure SQL deployment that is the most cost-effective.

◉ In addition to migration readiness information, blockers and issues are surfaced so that you can mitigate them as needed.

◉ You can also modify assessment inputs at any time or create multiple assessments for the same set of SQL instances and databases to compare and identify the target Azure options that work best for you.
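To give a feel for what right-sizing from performance history can mean, here is a simplified sketch. It is not the Azure Migrate assessment logic, which the post does not describe. It assumes a percentile-plus-headroom rule, and the tier names and vCore counts are invented for illustration.

```python
import math

# Hypothetical tiers as (name, vCores); not real Azure SQL SKUs.
TIERS = [("small", 2), ("medium", 4), ("large", 8), ("xlarge", 16)]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_tier(cpu_cores_used, pct=95, headroom=1.2):
    """Smallest tier whose vCores cover the chosen percentile plus headroom."""
    needed = percentile(cpu_cores_used, pct) * headroom
    for name, vcores in TIERS:
        if vcores >= needed:
            return name
    return None  # no tier in the list is large enough


if __name__ == "__main__":
    history = [1.0] * 95 + [6.0] * 5   # mostly idle with a few short spikes
    print(recommend_tier(history))
```

Sizing to a high percentile rather than the peak is one common way to avoid paying for rare spikes. Creating several assessments with different inputs, as described above, lets you compare such trade-offs.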

Source: microsoft.com

Thursday, 25 March 2021

Advancing Azure business continuity management

How we define a “service” for our BCM program

If you ask three people what a service is, you may get three different answers. At Microsoft, we define a service (business process or technology) as a means of delivering value to customers (first- or third-party) by facilitating outcomes customers want to achieve.

To ensure the highest level of resiliency for each of our “services” we include:

◉ People: The people who are responsible for providing the service.

◉ Process: The methodology used to provide the service.

◉ Technology: The tools used to deliver the service or the technology itself delivered as the value.

Customers see our services as product offerings that are composed of various bundled services. Each individual service is mapped in our inventory and run through the BCM program to ensure that the people, processes, and technologies for those services are resilient to a variety of failures.

Our end-to-end program identifies, prioritizes, maps, and tests every service, going beyond "box checking" compliance. We focus on a broad understanding of how to provide the best service to our customers, who demand reliable service offerings for their business.

How the BCM program is managed in practice

Through a sophisticated set of tooling, every service (both internal and external facing) is uniquely mapped and shared with a string of compliance tooling addressing privacy, security, BCM, and more. This ensures that every service contains sharable metadata for other tools regardless of type or criticality.

In the context of this post, records are automatically ported to our BCM management tool. Once there, they are automatically scoped for disaster recovery (DR) requirements that meet certifiable standards and deliver on our customer promises. These records contain the most familiar elements of a BCM program, including business impact analysis, dependencies, workforce, suppliers, recovery plans, and tests. In addition, we provide insight into potential customer impacts, detection capabilities, and willingness to fail over.

Testing recoverability

No amount of tooling, policies, or documents can provide the same level of confidence in service recovery and sustainability as comprehensive testing. Azure services test at various levels, ranging from individual unit tests to complete "region down" scenarios. Every service must show proof of testing and that its recovery meets its stated goals, both internally and in what we guarantee to our end customers in the Service Level Agreements (SLAs). Tabletop testing, in which simulated emergencies are merely discussed, is not considered acceptable or compliant for our program.

Our most robust integrated testing takes place in our “Canary” environment that consists of two distinct production datacenter regions: one in Eastern United States and the other in Central United States.

On a regular basis, we test service recovery with a complete zone or region shutdown (simulating a major production outage or catastrophic loss), forcing all services to invoke their recovery plans. These tests not only verify service recoverability, but also test our incident response team’s processes for managing major incidents. For Availability Zones, we test and verify the seamless continuation of service availability in the face of an entire zone loss. These are end-to-end tests that include detection, response, coordination, and recovery.

All processes from detection to response and action are performed as if it were a real service-impacting event. Service responders are the normal on-call engineers. We also test synthetic customer-responsible functions, such as virtual machine (VM) failover to paired regions, ensuring customer workloads can operate in large-scale failure scenarios.

Availability Zones—our highest level of seamless availability

With more Azure regions becoming zone-enabled, our customers have additional options for resiliency, with the highest level of availability supported by SLA and in-region disaster recovery without the need to fail over out of region. Advantages include:

◉ Customers can have the highest level of availability and transparent recovery in a zone down situation.

◉ Data is synchronously replicated, so there is no data loss from asynchronous replication to another region.

◉ No potential for latency due to secondary region distance.

Customers can leverage regional high availability, multi-region remote disaster recovery, or both. This "belt and suspenders" path provides the highest level of assurance that services will be resilient regardless of impact: it couples the high availability of Availability Zones with the out-of-region option of a remote location as a failsafe against the most catastrophic regional events.

Just as we do robust testing for cross-region disaster recovery, we apply the same diligence to our zone-enabled services. Using our Canary regions, we are able to perform end-to-end zone-down drills, proving our capabilities in providing the most reliable services to our customers.

Compliance

The Microsoft BCM program follows all industry and government standards, addressing identification of services, calculating impact (recovery time objective or recovery point objective), dependency mapping, concise disaster recovery plans, and testing those plans. These plans are reviewed at every level and verified via comprehensive end-to-end testing.
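As an illustration of checking a recovery test against stated goals, the sketch below compares measured downtime and data loss with a service's recovery time objective (RTO) and recovery point objective (RPO). The `RecoveryTest` record and its field names are hypothetical and are not taken from Microsoft's BCM tooling.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTest:
    # Hypothetical record; field names are assumptions for this example.
    service: str
    rto: timedelta                 # recovery time objective
    rpo: timedelta                 # recovery point objective
    measured_downtime: timedelta   # time until service was restored in the test
    measured_data_loss: timedelta  # window of data lost in the test

    def meets_goals(self):
        """True only if both the RTO and the RPO were met."""
        return (self.measured_downtime <= self.rto
                and self.measured_data_loss <= self.rpo)


if __name__ == "__main__":
    drill = RecoveryTest(
        service="example-service",
        rto=timedelta(hours=1),
        rpo=timedelta(minutes=5),
        measured_downtime=timedelta(minutes=42),
        measured_data_loss=timedelta(0),
    )
    print(drill.service, "meets goals:", drill.meets_goals())
```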

The program itself has achieved dozens of industry and government certifications, including ISO 22301, which is the highest standard a program can achieve. In fact, to date, Azure is the only cloud service provider to achieve this rating.

Azure has been able to achieve these ratings by ensuring we have the following elements in place to maintain a successful, value-adding program:

◉ Leadership support and awareness at every level.

◉ Extensive policy, standards, and training documentation.

◉ Dedicated BCM practitioners with experience in driving a mature program.

◉ Transparent reporting and gap analysis driving informed decision making.

◉ Comprehensive testing of services ensuring that what we measure is accurate.

◉ Modern tooling driving the high-volume scalability ensuring compliance in the program.

Source: microsoft.com

Microsoft MTA 98-361: Tips to Let You Pass This Exam

Microsoft Software Development Fundamentals 98-361 Certification helps you build and enhance your Information Technology career. With this certification, you will improve your skills in Data Types, Visual Studio, Decision Structures, Repetition, Error Handling, Classes, Inheritance and Polymorphism, Encapsulation, Algorithms, Data Structures, and Web Page Development.

Tips & Tricks to Pass the MTA 98-361 Exam

When you are about to take a certification exam like the MTA fundamentals exam 98-361, it is essential to prepare in advance. Following the tips below will help you pass your Microsoft 98-361 exam on the first attempt.

1. Determine Your Learning Style

Everyone has learning styles that are productive for them. Some people find they have a dominant learning style; others favor different styles in different situations. There is no right or wrong answer as to which learning style, or blend of styles, is best for you. By discovering and better understanding your learning styles, you can apply techniques that improve your learning speed and productivity.

2. Dedicate Sufficient Time

The next step is to get ready to pass the Microsoft Software Development Fundamentals 98-361 exam by investing time and hard work in studying. Take this step seriously.

Sitting for the Microsoft Technology Associate (MTA) exam demands putting yourself in the right mental state for exam preparation. That means setting aside regular study time and sticking to it, and making a real effort to complete the course requirements.

3. Prioritize Challenging Microsoft 98-361 Exam Syllabus Topics

In every learning activity, there are topics you will find easy or already know. The same applies to the 98-361 Software Development Fundamentals exam. As much as possible, concentrate on the subjects you find more difficult.

Once you have gathered your study material for the MTA exam (98-361) and you’re ready to begin, you need to point out the topics that you find challenging and give more attention to them.

However, this doesn’t mean you should overlook other topics. The core purpose is to manage your time adequately by determining your priorities. Treat your study sessions as if you’re taking the real exam.

4. Get a Comfortable Study Area

Choosing the right place to study is also important. As much as possible, remove any situation or thing that can divert your mind. Ensure you only have the things you require, such as your study guide, laptop, water bottle, etc.

5. Take Training Course for the MTA 98-361 Exam Preparation

There are plenty of online platforms that offer courses at a fair price. With a simple Google search, you can find and use much more content from different websites to ease your exam preparation. Taking training courses helps you stay organized and focused in your study.

Studying on your own may be excellent for obtaining general knowledge. Taking courses, however, specifically makes you ready to pass your certification exam.

6. Utilize Microsoft 98-361 Practice Test

For exams like Microsoft certifications, which require practical knowledge and skills, one of the best learning approaches is to take practice tests. The Microsoft Software Development Fundamentals 98-361 exam involves hands-on practice.

Is the MTA 98-361 Certification Worth the Effort?

The Microsoft Software Development Fundamentals 98-361 certificate is a well-known entry-level certification. It is not only a way to get an entry-level job; it is also a strong IT credential to have on your CV.

The MTA 98-361 exam syllabus covers fundamental skills that many employers in the IT industry look for, and the certification is relevant at many levels of the industry.

With the MTA 98-361 certification, you can improve your chances of getting a job at any IT-dependent organization. Any site or page that launches may need a specific skill set from a software developer, and the 98-361 Software Development Fundamentals certification helps demonstrate it.

Conclusion

The MTA 98-361 Software Development Fundamentals exam helps you navigate most IT fundamentals. Having a Microsoft certification helps you stand out among the competition and gives you a little more confidence in what you can achieve with your skill set.

Tuesday, 23 March 2021

Harness the power of data with Azure Data and AI

For any organization to succeed in a world of unprecedented uncertainty, a new level of agility is required. Core to that agility is the ability to quickly gain insights from data at any scale, power ultra-low latency applications that deliver personalized experiences, and empower all users in an organization, regardless of size. This requires a computing platform that provides limitless scale, performance, and possibilities for what can be achieved with data.

To help organizations achieve these limitless capabilities, we shared several announcements today at Microsoft Ignite:

Unmatched analytics

Our commitment to customers is to make analytics in Azure the most performant and secure experience it can be. Azure Synapse Analytics is our limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics in a single experience, enabling data professionals to unlock timely insights to increase agility.

To help customers simplify their migration experience to Azure Synapse, we are announcing Azure Synapse Pathway. With a few clicks, customers can scan their source systems and automatically translate their existing scripts into T-SQL. What used to take weeks or months can now be accomplished in minutes. Azure Synapse Pathway will support customers migrating from Teradata, Snowflake, Netezza, AWS Redshift, SQL Server, and Google BigQuery, enabling them to get up and running with Azure Synapse faster than ever.

Comprehensive data governance

To enable customers to discover and govern data better than ever before, we introduced Azure Purview, our unified data governance service, in December. Since its debut, customers have used the service to automatically scan, discover, and classify over 14.5 billion data assets to get a holistic view of their data estates. Customers can now use Azure Purview to scan Azure Synapse workspaces across serverless and dedicated SQL pools. Synapse users can now also break down operational silos more effectively than ever before, with the ability to natively discover data with a Purview-powered search within their Synapse workspaces.

Starting today, customers can automatically scan and classify data residing in Amazon S3, as well as data residing in on-premises Oracle DB and SAP ERP instances. This is in addition to Teradata, SQL Server on-premises, Azure data services, and Power BI, which have been supported data sources since Azure Purview’s debut.

Your apps. Faster

It’s now easier for customers to bring hybrid capabilities and cloud scale to their Apache Cassandra deployments using the new Azure Managed Instance for Apache Cassandra, a new addition to our NoSQL portfolio. Customers can easily take on-premises Apache Cassandra workloads and add limitless cloud scale while maintaining full compatibility with the latest version of Apache Cassandra. Their deployments gain improved performance and availability, while benefiting from Azure’s security and compliance capabilities.

To make it simpler for developers to take advantage of multi-document transactions and retriable writes, Azure Cosmos DB now provides MongoDB v4 transaction support. Azure Cosmos DB also offers additional enterprise-grade enhancements like continuous backup with point-in-time restore and new role-based access controls for enhanced security. We also announced the general availability of Azure Synapse Link for Azure Cosmos DB, which enables near real-time analytics on operational data for the Azure Cosmos DB’s Core API and API for MongoDB.

Customers often depend on caching to meet increasing traffic demands on their applications. To support new use cases and apps with limitless global scale, Azure Cache for Redis Enterprise tiers are now generally available, supporting larger cache sizes of up to 13 TB and Redis 6.0. Active Geo-Replication, in preview, enables global caches with multi-primary writes that are designed to deliver up to 99.999 percent availability.

Azure is the only public cloud that natively offers these comprehensive features to turbocharge your applications.
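To put the 99.999 percent figure in perspective, the following sketch converts an availability percentage into the maximum downtime it allows per year. It assumes a 365-day year and is plain arithmetic, not an SLA calculation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {max_downtime_minutes(target):.2f} minutes per year")
```

At "five nines", the budget works out to a little over five minutes of downtime per year.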

AI-powered search at your fingertips

Azure Cognitive Search is the only cloud search service with built-in AI capabilities that enrich all types of information to easily identify and explore relevant content at scale. We are announcing a new semantic search capability that brings our AI-powered search service to a whole new level, enabling organizations to offer highly relevant search experiences in their apps. These are the same capabilities that power search engines like Bing, and they are now available for any developer to use.

Enhanced migration experience

We hear from many organizations across industries that SQL continues to be the preferred database for critical workloads. That’s why we are making it easier than ever to migrate to Azure SQL using Azure Migrate, a central hub for Azure cloud migrations. Azure Migrate offers tools to discover, assess, and migrate workloads to the cloud with a few simple clicks. Customers can now easily understand all the technical and financial aspects of their cloud journey, by assessing their SQL source and destination landscapes at scale, with integrated SKU recommendations and cost estimates included.

Transform your business

Today’s exciting announcements are just part of what Azure can do to help your organization unify, manage, govern, and gain insights on all your data to improve business performance. Your data holds so much potential. We look forward to seeing what you do with it. Go limitless with Azure.

Source: microsoft.com

Saturday, 20 March 2021

Azure Defender for Storage powered by Microsoft threat intelligence

With the reality of working from home, more people and devices are now accessing corporate data across home networks. This raises the risks of cyber-attacks and elevates the importance of proper data protection. One of the resources most targeted by attackers is data storage, which can hold critical business data and sensitive information.

To help Azure customers better protect their storage environment, Azure Security Center provides Azure Defender for Storage, which alerts customers upon unusual and potentially harmful attempts to access or exploit their storage accounts.

What’s new in Azure Defender for Storage

As with all Microsoft security products, customers of Azure Defender for Storage benefit from Microsoft threat intelligence to detect and hunt for attacks. Microsoft amasses billions of signals for a holistic view of the security ecosystem. These shared signals and threat intelligence enrich Microsoft products and allow them to offer context, relevance, and priority management to help security teams act more efficiently.

Based on these capabilities, Azure Defender for Storage now also alerts customers upon the detection of malicious activities such as:

◉ Upload of potential malware (using hash reputation analysis).

◉ Phishing campaigns hosted on a storage account.

◉ Access from suspicious IP addresses, such as TOR exit nodes.
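As a simplified picture of the last item, the sketch below checks a client address against a list of suspicious networks. The list uses placeholder ranges reserved for documentation, not real threat intelligence, and the function name is invented for this example. How Azure Defender for Storage performs this check internally is not described here.

```python
import ipaddress

# Placeholder entries from documentation-only ranges (RFC 5737), not real threat data.
SUSPICIOUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_suspicious(address):
    """Return True if the address falls inside any listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SUSPICIOUS_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.0.2.44", "203.0.113.5"):
        print(candidate, "suspicious:", is_suspicious(candidate))
```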

In addition, leveraging the advanced capabilities of Microsoft threat intelligence helps us enrich our current Azure Defender for Storage alerts and future detections.

To benefit from Azure Defender for Storage, you can easily configure it on your subscription or storage accounts and start your 30-day trial today.

Cyberattacks on cloud data stores are on the rise

Nowadays, more and more organizations place their critical business data assets in the cloud using PaaS data services. Azure Storage is among the most widely used of these services. The amount of data obtained and analyzed by organizations continues to grow at an increasing rate, and data is becoming increasingly vital in guiding critical business decisions.

With this rise in usage, the risks of cyberattacks and data breaches are also growing, especially for business-critical data and sensitive information. Cyber incidents cause organizations to lose money, data, productivity, and consumer trust. The average total cost of a data breach is $3.86 million. On average, it takes 280 days to identify and contain a breach, and 17 percent of cyberattacks involve malware.

It’s clear that organizations worldwide need protection, detection, and rapid-fire response mechanisms to these threats. Yet, on average, more than 99 days pass between infiltration and detection, which is like leaving the front door wide open for over three months. Therefore, proper threat intelligence and detection are needed.

Azure Defender for Storage improved threat detections

1. Detecting upload of malware and malicious content

Storage accounts are widely used for data distribution, so they may become infected with malware that then spreads to additional users and resources. This may make them vulnerable to attacks and exploits, putting sensitive organizational data at risk.

Malware reaching storage accounts was a top concern raised by our customers, and to help address it, Azure Defender for Storage now utilizes advanced hash reputation analysis to detect malware uploaded to storage accounts in Azure. This can help detect ransomware, viruses, spyware, and other malware uploaded to your accounts.

A security alert is automatically triggered upon detection of potential malware uploaded to an Azure Storage account.
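To make the idea of hash reputation analysis concrete, here is a minimal sketch. It is not the Azure Defender implementation: the reputation set, the choice of SHA-256, and the names `KNOWN_BAD_SHA256` and `check_upload` are all assumptions made for illustration. The point it shows is that only the digest of the uploaded content is compared against a reputation list; the content itself is not scanned.

```python
import hashlib
from typing import Optional

# Hypothetical reputation set: digests a threat intelligence feed has flagged.
# The payload below is a stand-in, not a real malware sample.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"pretend-malware-payload").hexdigest(),
}


def check_upload(blob_name: str, blob: bytes) -> Optional[dict]:
    """Return an alert dict if the blob's hash has a bad reputation, else None."""
    digest = hashlib.sha256(blob).hexdigest()
    if digest in KNOWN_BAD_SHA256:
        return {
            "alert": "Potential malware uploaded",
            "blob": blob_name,
            "sha256": digest,
        }
    return None
```

Because the check is a lookup on a digest, a file that has never been seen before would not match, which is consistent with reputation analysis being distinct from malware scanning.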

In addition, an email notification is sent to the owner of the storage account.

It’s important to note that, currently, Azure Defender for Storage doesn’t offer malware scanning capabilities. Those interested in malware scanning upon file or blob upload might consider using a third-party solution.

“Azure Blob Storage is a very powerful and cost-effective storage solution, allowing for fast and cheap storage and retrieval of large amounts of files. We use it on all our systems and often have millions of documents in Blob Storage for a given system. With PaaS solutions, it can, however, be a challenge to check files for malware before they are stored in Blob Storage. It is incredibly easy to enable the new “Malware Reputation Screening” for storage accounts at scale, it offers us a built-in basic level of protection against malware, which is often sufficient, thus saving us the overhead to set up and manage complex malware scanning solutions.”—Frans Lytzen, CTO at NewOrbit

In addition to malware, Azure Defender for Storage also alerts upon unusual upload of executable (.exe) and service package (.cspkg) files which can be used to breach your environment.

2. Detecting phishing campaigns hosted on Azure Storage

Phishing is a type of social engineering attack often used to steal user data, including login credentials, credit card numbers, and other sensitive information. Email phishing attacks are among the most common types of phishing attacks: cybercriminals send a high volume of fake emails designed to trick recipients into entering their corporate credentials or financial information into a web form that looks genuine, or into downloading attachments containing malware, such as ransomware.

Email phishing attacks are becoming more sophisticated, making it even harder for users to distinguish between legitimate and malicious messages. One way attackers make their phishing webpages look genuine, both to users and to security gateways, is to host those pages on certified cloud storage, such as Azure Storage.

Using dedicated storage accounts to host the phishing content makes it easier to detect and block such accounts. So, attackers constantly try to sneak their phishing content and webpages into others’ storage accounts that allow uploading content.

Microsoft threat intelligence amasses and analyzes several signals to help better identify phishing campaigns, and now Azure Defender for Storage can alert when it detects that one of your Azure Storage accounts hosts content used in a phishing attack affecting users of Microsoft 365.

3. Detecting access from suspicious IP addresses

The reputation of client IP addresses that access Azure Storage is continuously monitored. These reputations are based on a threat intelligence feed that contains data from various sources, including first- and third-party threat intelligence feeds curated from honeypots, malicious IP addresses, botnets, malware detonation feeds, and more, as well as analyst-based observations and collections.

This provides another layer of protection for Azure Storage as customers are alerted when IP addresses with questionable reputations access their storage accounts. Moreover, existing alerts such as access from unusual locations are enriched with information about the reputation of this anomalous client IP address. Consequently, customers now receive alerts with better explanations, as well as elevated fidelity and severity.

Figure 1 illustrates how access to storage is analyzed by examining the reputation of the client IP address according to this feed.

Figure 1: Enriching Azure Storage Service access logs with the reputation of client IP.
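The enrichment step described above can be sketched as a simple join between access records and a reputation feed. This is an illustration under stated assumptions, not the service's actual pipeline: the feed contents, the `AccessRecord` fields, and the function names are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threat intelligence feed: client IP -> reason it is suspicious.
TI_FEED: Dict[str, str] = {
    "203.0.113.7": "TOR exit node",
    "198.51.100.23": "accessed a honeypot network",
}


@dataclass
class AccessRecord:
    client_ip: str
    operation: str
    account: str


def enrich(record: AccessRecord) -> dict:
    """Attach the reputation of the client IP (if any) to an access record."""
    reason = TI_FEED.get(record.client_ip)
    return {
        "client_ip": record.client_ip,
        "operation": record.operation,
        "account": record.account,
        "suspicious": reason is not None,
        "reputation": reason,
    }


def alerts(records: List[AccessRecord]) -> List[dict]:
    """Keep only the enriched records whose client IP has a bad reputation."""
    return [e for e in map(enrich, records) if e["suspicious"]]
```

The same enriched record could also be attached to an existing alert, such as access from an unusual location, to explain why the client IP is considered anomalous.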

This new alert has been vital in revealing and preventing cyber-attacks, which may have otherwise caused severe damage, as observed in two real customer case-study scenarios described below. 

First case study: Detecting malicious access to critical customer data


Figure 2 depicts a hybrid architecture in which on-premises machines are monitored using Security Center Log Analytics agent. These machines access a storage service via a gateway that has an external IP address and is installed on-premises. The storage is protected by a firewall, which permits access only from the dedicated gateway.

Figure 2: Attack on a hybrid environment uncovered by tracking IP reputation.

Two on-premises machines were infected by malware. Although the malware remained undetected, the compromise was exposed by observing the two machines initiate access to a honeypot via the gateway. Our Azure Defender for Storage service used the TI feed about IPs that have accessed a honeypot network. The customer was alerted accordingly, preventing a situation in which compromised machines would access critical customer data. Note that the firewall that was set up was not enough to guarantee that compromised machines would not be able to access critical data. Hence this detection was vital in uncovering the breach.

Second case study: Identifying potential malware infection from virtual machines


Figure 3 depicts a virtual machine (VM) infected with a bot spreading innocent-looking malware to SMB-protocol enabled file systems (such as Azure File Storage). The malware can be anything from an executable file or DLL to an Excel or Word file with macros enabled. The infected virtual machine's communication to its Command and Control Server is intercepted and reported to the TI feed. Azure Defender for Storage flagged access from the VM’s IP address even though it’s not hosted in Azure. As soon as the infected VM copied a file to a protected Azure Storage account, the incident was reported as an alert to the customer, who immediately mitigated the risk, preventing further infection of customer machines.

Figure 3: A Keybase-infected VM stores a malicious file in Azure Storage.

Source: azure.microsoft.com

Thursday, 18 March 2021

Our commitment to expand Azure Availability Zones to more regions

The cloud continues to play a critical role in our everyday lives. Our customers range from classrooms and small businesses to critical life and safety services and Fortune 500 companies. Microsoft’s cloud infrastructure—the largest in the world—is the backbone for many of these experiences which have been essential to connecting people, businesses, and governments and running mission-critical applications. We believe that a trusted cloud is one that is secure, reliable, and supports regulatory compliance. We continue to build Azure to support customer needs for low-latency, high-availability cloud services and with the ability to both store and process data within a country or geography.

In 2020, Microsoft announced development of new Azure datacenter regions that will bring local low-latency, data-resident cloud solutions to 14 new countries, expanding global availability. Already in 2021, we’ve continued our plans to expand our datacenter regions to new markets like Indonesia and grow in existing markets in the United States and China. As we continue to bring local cloud services to more countries, we are doing so with resilience and high availability in mind.

Key takeaways

◉ By the end of 2021, every country in which we operate a datacenter region will deliver Azure Availability Zones (AZs).

◉ Every new datacenter region we launch going forward will include Azure Availability Zones.

◉ Over the last 12 months we have enabled Availability Zones in five datacenter regions, and this week we launched Availability Zones in Brazil South.

◉ We are continuing to expand zonal capabilities, and in 2021, all foundational and mainstream Azure services will be AZ enabled.

◉ We recently launched the Azure Well-Architected Framework—a set of guiding tenets that can be used to improve the quality and resilience of a workload.

Bringing Availability Zones to every country we operate in by the end of 2021

By the end of 2021, every country in which we operate a datacenter region will include at least one region with Azure Availability Zones architecture. Additionally, every new datacenter region we launch going forward will have AZs. A datacenter region is made up of multiple datacenter facilities with redundant power, cooling, and networking. Customers leverage this infrastructure to ensure their applications and services are resilient and performant through Azure Availability Zones. AZs, comprising a minimum of three zones, allow customers to spread their infrastructure and applications across discrete and dispersed datacenters for added resiliency and high availability. We’re doing this to ensure that every Azure customer has access to highly resilient services to support their most important workloads and processes.

Over the last 12 months, we have enabled AZs in five datacenter regions, and this week we launched Availability Zones in Brazil South. Our approach in designing our Microsoft datacenter regions using Availability Zones is to support synchronous replication, while ensuring physical separation to offer protection and isolation from localized failures, which can range from mechanical or electrical issues to structure fires, flooding, or any unforeseen disaster. We ensure the customer impact of using Availability Zones is minimal to none with a latency perimeter of less than two milliseconds between Availability Zones. And we encrypt all data that is traversing within or between regions to ensure the use of Availability Zones conforms to the highest security standards. As part of our design process, we utilize more than 30 viability and risk-based criteria to evaluate the placement of each of the three Availability Zones. This process identifies significant individual risk and also considers collective and shared risk between AZs. This careful approach ensures Azure always delivers both a secure and resilient environment.

Zonal capabilities for all mainstream Azure services

AZs are a combination of physical and logical infrastructure and many of our core Azure services provide the necessary support for deploying, building and operating highly available applications. AZs are an important resilient architecture for our customers and Microsoft’s own internal workloads. Azure services deploy into AZs, providing customers the assurance that applications and processes using Azure services include all the additional resiliency benefits regardless of their choice to use AZs for their own applications. 

We are continuing to expand zonal capabilities, and in 2021, all foundational and mainstream Azure services will be AZ enabled. AZ enabled services are designed to provide the right level of flexibility and can be configured to be either zone-redundant (with automatic replication across zones), zonal (where instances can be pinned to a specific zone), or both.

Expanding customer options for business continuity and reliability

Capabilities like AZs are critical to customers requiring highly available applications to keep critical infrastructure running and available. Customers like National Australia Bank (NAB) will run 1,000 applications on Azure. NAB will co-design, develop, and invest in new cloud-based payments and customer services that take advantage of the resilience capabilities in Azure Availability Zones as part of their five-year strategic partnership with Microsoft.

“Trust and resilience are critical for the financial services industry to meet both the regulatory requirements of APRA and customer expectations. The investment that Microsoft continues to make in its Azure Availability Zones gives us, our customers and regulators peace of mind that systems will be available, and data will be protected.”—Steve Day, Executive General Manager Infrastructure, Cloud and Workplace, NAB

Azure offers a wide variety of options to support customers in architecting for resiliency, providing customers with the flexibility to choose between data-resident, distance-separated disaster recovery across regions, in-zone, and server and rack redundancy. For local and zonal disaster recovery (DR), Azure Site Recovery makes it possible to replicate and orchestrate the failover of applications in Azure within Azure Availability Zones.

We recently launched the Azure Well-Architected Framework—a set of guiding tenets that can be used to improve the quality of a workload. Reliability is one of the five pillars of architectural excellence alongside cost optimization, operational excellence, performance efficiency, and security. If you already have a workload running in Azure and would like to assess your alignment to best practices in one or more of these areas, try the Azure Well-Architected review.

Source: azure.microsoft.com

Wednesday, 17 March 2021

Helpful Tips For Microsoft MD-100 Exam Takers

Microsoft MD-100 is developed to assess the applicants' skills and expertise to carry out specific technical tasks. To pass this exam, the applicants should be able to deploy Windows and manage it, configure connectivity, and manage data and devices. It is worth mentioning that those who passed the Microsoft 70-698 exam before its expiration don't have to take this exam. Instead, they can go ahead and sit for Microsoft MD-101 to obtain the Microsoft 365 Certified: Modern Desktop Administrator Associate certification.

Important Details about Microsoft MD-100 exam

The applicants for Microsoft MD-100 are the administrators with expertise in deploying, configuring, monitoring, securing, and managing devices & clients' applications in the enterprise environment. Apart from this, they maintain identity, policies, apps, updates, and access. As certified professionals, they commonly team up with the Microsoft 365 enterprise administrators to execute and design a device strategy that satisfies modern organizations' business requirements. To be thoroughly prepared for this certification exam, you should be well-versed with Microsoft 365 workloads. You should have the competence and a definite level of skills in maintaining, configuring and deploying Windows 10 and Windows 11, along with non-Windows technologies and devices.

This exam is part of the prerequisites for achieving the Microsoft 365 Certified: Modern Desktop Administrator Associate certification. The applicants should be thoroughly prepared before sitting this exam to avoid retaking it. There are quite a good number of study materials available for this exam. Explore the study materials available on the Microsoft webpage to begin your preparation in earnest. Many other platforms also offer updated resources that you can use to prepare for the Microsoft MD-100 exam, including practice tests with answers from IT specialists and video tutorials designed by experienced professionals.

Microsoft MD-100 is only available in the English language, and the applicants planning to sit for this exam are required to pay $165 as the registration fee. After this, you can go ahead and register for your preferred date and time for the exam. As stated earlier, you must understand the exam content and topics before attempting this certification exam. Understanding the objectives is very important for smooth exam preparation. So let's have a look at the objectives you have to master before you attempt Microsoft MD-100.

Details of Microsoft MD-100 Exam Objectives

  • Install and configure Windows (20-25%)
  • Configure and manage connectivity and storage (15-20%)
  • Maintain Windows (30-35%)
  • Protect devices and data (25-30%)

Major Benefits of Passing Microsoft MD-100 Windows Client Exam

Microsoft certifications and their exams can bring many benefits. The certification that you can achieve by passing the MD-100 exam can get you the following:

  • Your skills get confirmed, and many opportunities get opened up.
  • By achieving the certification, the applicants are well prepared for professional-level jobs. Most of the leading organizations set standards that are connected to the certifications. To work for them, you must have one or two.
  • It is no secret that individuals with the MD-100 Windows Client certification have a greater chance of getting more career opportunities.

Helpful Preparation Resources for the Microsoft MD-100 exam

1. Instructor-Led Training

One of the most effective ways to get ready for the certification exam is to take a Microsoft instructor-led training course. You get access to experienced and proficient instructors who can help you successfully pass the exam. Select from various courses dedicated to installing, configuring, and maintaining Windows 10 and Windows 11 for a captivating learning session. As you interact directly with the tutors, you can ask them questions straightaway. The training equips you with the skills required to fulfill enterprise-level requirements.

2. Online Self-Paced Training

If you have a busy work life and prefer self-studying, you can go for the Microsoft self-paced training course. In this case, you get access to resources that you work through at your own pace. The approach fits those who are tight on time and budget. You do not pay anything for this training.

3. Online Communities

A simple online search can reveal various forums that give valuable learning opportunities. The good thing about them is that you can combine them with either self-paced or instructor-led training. If you have any questions, do not hesitate to post them on these forums, and the participants will suggest possible solutions.

4. Utilize MD-100 Practice Tests

You need to know what to anticipate in the exam. Going through MD-100 practice tests shows you what the questions look like in Microsoft MD-100. Practice tests will help you gauge your current skill level and preparation. They will also give you a sense of your preparedness, since you will learn the type of questions asked and the structure of the entire exam.

Also, keep in mind that learning from various sources and setting aside sufficient study time will help you a lot. Do not memorize study materials by rote; understand them thoroughly. Moreover, be more practical and less theoretical, get enough sleep the night before the exam, and show up early for the exam. This way you will give yourself the best chance of getting through your Microsoft MD-100.

Conclusion

Follow the tips mentioned above to boost your chances of passing the Microsoft MD-100 certification exam. And do not forget to utilize practice tests to get an overview of the exam structure and your preparedness level. If you put all your efforts into the task, you will succeed.

Tuesday, 16 March 2021

Prevent exceeding Azure budget with forecasted cost alerts

Forecasted cost alerts within budgets in Azure Cost Management and Billing are now generally available. Forecasted cost alerts provide advanced notification that your spending trends are likely to exceed your allocated budget. This empowers you to proactively investigate and make changes to your spending well in advance of actually breaching your budget at the end of the period. Budgets and alerting give you the ability to hold your organization accountable for its spending to ensure there are no surprises with your cloud costs.

Rest easy knowing that budgets can now alert you based on both your actual costs and our projections of your future costs. Never get caught off guard by your spending again. In this blog, we would like to explain how to get started with configuring your first forecasted cost alert.

Configure your forecast alert using the Azure portal

Follow the steps below to configure your first forecasted cost alert using the Azure portal.

1. Navigate to the Cost Management and Billing blade in the Azure portal and select Budgets from the menu.

2. Once on the Budgets blade, click Add to create your new budget.

3. Enter in details for your new budget, including the Scope you want evaluated, the Name, Reset Period, and the budget Amount. Click Next to begin configuring alerts.

4. Configure your alerts. You can now choose between configuring a "Forecasted" cost alert and an "Actual" cost alert. This will determine what costs are used when the budget evaluates that particular alert. You can configure both Forecasted and Actual alerts within the same budget if you are interested in having multiple thresholds that monitor your spending.

5. Once you have configured the alert thresholds for your budget, specify one or more alert recipients and then click Create.
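To illustrate how a budget with both alert types might be evaluated, here is a small sketch. It is only a model of the behavior described in the steps above: the `AlertRule` type, the function name, and the "greater than or equal" comparison are assumptions, and the sketch says nothing about how the forecast itself is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlertRule:
    kind: str             # "Actual" or "Forecasted"
    threshold_pct: float  # percentage of the budget amount


def triggered_alerts(
    budget_amount: float,
    actual_cost: float,
    forecasted_cost: float,
    rules: List[AlertRule],
) -> List[AlertRule]:
    """Return the rules whose threshold is met by the cost type they watch."""
    fired = []
    for rule in rules:
        # Each rule is evaluated against either the forecast or the actual cost.
        cost = forecasted_cost if rule.kind == "Forecasted" else actual_cost
        if cost >= budget_amount * rule.threshold_pct / 100:
            fired.append(rule)
    return fired
```

For example, with a budget of 1,000, an actual cost of 600, and a forecasted cost of 1,100, a Forecasted rule at 100 percent fires while an Actual rule at 80 percent does not, which is the early warning the forecasted alert is meant to provide.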

Receive forecasted cost alerts


Once you've configured your forecast alert and created the budget, be on the lookout for alert emails. If you are creating a subscription or resource group budget, you can also trigger alert automation by using Action Groups.

Source: microsoft.com

Sunday, 14 March 2021

Built-in backup management at scale with Backup center

During this period of the pandemic-disrupted workplace, we have seen unprecedented growth in cloud adoption and dependence on the cloud for data protection to address business continuity and ensure resilience. We understand that protecting your data from increased ransomware attacks, adapting to increased scale with demand surges, managing costs more efficiently, and driving optimizations in overall management are top of mind for you. Azure Backup helps you achieve this by providing built-in capabilities to safeguard your data so that you can recover in the event of accidental deletion, corruption, or ransomware. In our conversations with customers, we realized that it is important for you to be able to manage data protection for an increasing cloud estate in a scalable manner.

Traditionally, at-scale data protection activities like governance, monitoring, operating, and optimizing backup, while possible, required leveraging various individual solutions like Azure Policies or Azure Backup’s vault management, monitoring, and reports. This past week at Microsoft Ignite, we announced the general availability of Backup center, a centralized backup management interface built into Azure for all your backup management needs. Backup center simplifies data protection management at-scale by enabling you to discover, govern, monitor, operate, and optimize backup management, all from one unified console, enabling you to drive operational efficiency with Azure.

Imagine being able to go to one single console for your daily actions of monitoring, backing up, and restoring data sources in any subscription, resource group, location, or tenant. Imagine being able to define and track compliance with Azure policies to ensure your organization’s desired backup goals are being met. Imagine being able to generate reports on backup activities and derive insights from them to drive optimizations. Now imagine being able to do all of that from one single place. It’s now possible with Backup center.

Discover backup capabilities, samples, and guidance


Jump-start your data protection experience by selecting your workload of choice and following guided experiences to get started with Azure Backup. Access community resources to find sample templates, scripts, and policies to enhance your automation. In addition, get answers to your questions and raise feature requests. Stay updated by finding what’s new with Azure Backup right within Backup center.

Govern your backup estate in a unified manner


Bring your organization to a desired backup goal state through seamless integration with Azure Policy. Track compliance against Azure policies and create remediations. Configure specific backup policies to virtual machines (VMs) based on tag information using our new tag-based Azure policies.

Monitor and operate


With inventory views built on top of Azure Resource Graph, you can now get an overview of your entire Azure Backup estate across subscriptions, resource groups, locations, and even tenants (when using Azure Lighthouse) in a scalable and performant manner. Create custom views by querying Azure Resource Graph directly. Detect potential threats by finding insightful information like soft-deleted or stopped backup instances at a quick glance. Track jobs across job states and operations from a single view. Trigger any daily operation (one-time backups, restores, and even cross-region restores) from a single action center.

Optimize


Generate backup reports for a chosen duration with Backup reports, now generally available and seamlessly integrated within Backup center. You can also configure these reports to be sent to your email inbox periodically. Create custom reports by editing the reporting workbook and choosing data of your choice. Optimize costs with insights on policy optimizations and inactive resources. View trends of usage information and policy adherence.

Backup center enables centralized backup management for Azure Virtual Machines, SQL databases in Azure VMs, HANA databases in Azure VMs, and Azure Files.

With one single place that you can come to for all backup needs, Backup center enhances the simplicity of backup management, enabling you to get faster summaries, make rapid decisions, and drive efficiencies in backup management with Azure. Backup center is part of our ongoing investments to empower you with solutions that help you grow efficiently as your customer base reaches new peaks.

Source: microsoft.com

Saturday, 13 March 2021

Advancing failure prediction and mitigation—introducing Narya

Project Narya is a holistic, end-to-end prediction and mitigation service—named after the "ring of fire" from Lord of the Rings, known to resist the weariness of time. Narya is designed not only to predict and mitigate Azure host failures but also to measure the impact of its mitigation actions and to use an automatic feedback loop to intelligently adjust its mitigation strategy. It leverages our Resource Central platform, a general machine learning and prediction-serving system that we have deployed to all Azure compute clusters worldwide. Narya has been running in production for over a year and, on average, has reduced virtual machine interruptions by 26 percent—helping to run your Azure workloads more smoothly. 

How did we approach this before Narya?

In the past, we used machine learning to inform our failure predictions, then selected the mitigation action statically based on the failure predicted. For example, if a piece of hardware was determined to be "at-risk" then we would notify customers running workloads on it that we have detected degraded hardware through in-virtual machine notifications. We would also always perform this set of steps:

1. Block new allocations on the node.

2. Migrate off as many of the virtual machines as possible on the fly (using live migration).

3. Wait several days for short-lived virtual machines to be stopped organically or re-deployed by customers.

4. Migrate off the remaining virtual machines by disconnecting the virtual machines and moving them to healthy nodes.

5. Bring the node out of production and run internal diagnostics to determine repair action.

Although this approach worked well, we saw several opportunities to improve in certain scenarios. For instance, some failures may be too severe (such as damaged disks) for us to wait days for virtual machines to be stopped or re-deployed. At other times, an "at-risk" prediction might be more minor or even a false positive. In these cases, forced migration would cause unnecessary customer impact, and instead, it would be better to continue monitoring further signals and re-evaluate the node after a given period. Ultimately, we concluded that to truly design the best system for our customers, we needed not only to be more flexible in how we responded to our predictions, but we also needed to measure the exact customer impact of our actions for every different scenario.

How do we approach this now, with Narya?

This is where Narya comes in. Rather than having a single pre-determined mitigation action for an "at-risk" prediction, Narya considers many possible mitigation actions. For a given set of predictions, Narya uses either an online A/B testing framework or a reinforcement learning framework to determine the best possible response.

Phase 1: Failure prediction

Narya starts by using fleet telemetry to predict potential host failures due to hardware faults. We can produce accurate predictions by using a mix of both domain-expert, knowledge-based predictive rules, and a machine learning-based method.

An example of a domain-expert predictive rule: if a CPU Internal Error (IERR) occurs twice within n days (for example, n = 30), the node will likely fail again soon. Narya currently uses several dozen domain-expert predictive rules derived from data-driven methods.
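The IERR rule above is simple enough to sketch directly. This is a minimal illustration, not Narya's code: the function name, the input format (a list of event timestamps for one node), and treating the window boundary as inclusive are all assumptions.

```python
from datetime import datetime, timedelta
from typing import List


def ierr_rule_fires(ierr_times: List[datetime], window_days: int = 30) -> bool:
    """True if any two IERR events on a node fall within window_days of each other."""
    times = sorted(ierr_times)
    window = timedelta(days=window_days)
    # After sorting, the closest pair is always adjacent, so checking
    # neighbours is enough to find any two events inside the window.
    return any(later - earlier <= window for earlier, later in zip(times, times[1:]))
```

A rule like this is cheap to evaluate over fleet telemetry, which is one reason such expert rules can sit alongside a heavier machine learning model.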

Narya also incorporates a machine learning model, which is helpful because it analyzes more signals and patterns over a larger time frame than the predictive rules—allowing us to predict failures earlier. This builds on our prior failure prediction work but, rather than focusing on failures of individual components, this model now reviews overall host health with respect to real customer impact. Since 2018, we have also expanded the kinds of incoming signals and have improved signal quality. As a result, we have reduced the number of false positives and negatives, ultimately improving the effectiveness of this failure prediction step.

Phase 2: Deciding and applying mitigation actions

Rather than having one fixed mitigation strategy, we created a selection of mitigation actions for Narya to consider. Each mitigation action can be considered as a composite of many smaller steps, including:

◉ Marking the node as unallocatable.

◉ Live migrating the virtual machines to other nodes.

◉ Soft rebooting the kernel while preserving memory, which minimizes interruptions to customer workloads because they experience only a short pause.

◉ Deprioritizing allocations on the node.

◉ And more.

For example, one mitigation action might be to mark the node unallocatable, attempt a memory-preserving kernel soft reboot, and mark the node allocatable again if the reboot succeeds. If the reboot fails, the action performs a live migration and sends the node to diagnostics, where we run tests to determine whether the hardware is degraded. If it is, we send the node to repair and replace the hardware. Overall, this gives us far more flexibility to handle different scenarios with different mitigations, improving overall Azure host resilience.
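The example action chain described above can be sketched as a small function that records each step. Everything here is an illustrative assumption: the `Node` class, the `soft_reboot_ok` flag (a stand-in for whether the soft reboot succeeds), and the step names are hypothetical, not Azure APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    allocatable: bool = True
    soft_reboot_ok: bool = True  # hypothetical stand-in for the reboot outcome
    log: list = field(default_factory=list)

def soft_reboot_then_fallback(node):
    """Composite action: soft reboot first, fall back to live migration and diagnostics."""
    node.allocatable = False
    node.log.append("mark_unallocatable")
    node.log.append("kernel_soft_reboot")
    if node.soft_reboot_ok:
        # Reboot succeeded: the node can take new allocations again.
        node.allocatable = True
        node.log.append("mark_allocatable")
    else:
        # Reboot failed: move workloads away and test the hardware.
        node.log.append("live_migrate")
        node.log.append("send_to_diagnostics")
    return node.log

print(soft_reboot_then_fallback(Node("node-a")))
print(soft_reboot_then_fallback(Node("node-b", soft_reboot_ok=False)))
```

Modelling each action as a chain of small steps makes it straightforward to define many variants, such as a different fallback or a different first step, and let the decision framework choose among them.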

To respond to "at-risk" predictions in a much more flexible manner, Narya uses an online A/B testing framework and a reinforcement learning (RL) framework to continuously optimize the mitigation action for minimal virtual machine interruptions.

A/B testing framework

When Narya conducts A/B testing, it selects different mitigation actions, compares them to a control group with no action taken, and gathers all the data to determine which mitigation actions are best for which scenarios. From then onwards, for this given set of failure predictions, it continuously selects the best actions—helping to reduce virtual machine reboots, ensure more available capacity, and maintain the best performance.
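As a rough illustration of the comparison described above, the following sketch groups observed outcomes by failure scenario and mitigation action, then picks the action with the lowest average number of virtual machine interruptions. The scenario names, action names, and numbers are made up, and a production A/B test would also check statistical significance before declaring a winner; this is only a sketch under those assumptions.

```python
from collections import defaultdict
from statistics import mean

def best_action_per_scenario(observations):
    """observations: iterable of (scenario, action, vm_interruptions) tuples."""
    grouped = defaultdict(lambda: defaultdict(list))
    for scenario, action, interruptions in observations:
        grouped[scenario][action].append(interruptions)
    best = {}
    for scenario, by_action in grouped.items():
        averages = {action: mean(values) for action, values in by_action.items()}
        # Lower average interruptions is better; "no_action" is the control group.
        best[scenario] = min(averages, key=averages.get)
    return best

# Hypothetical experiment data.
data = [
    ("disk_errors", "no_action", 12), ("disk_errors", "no_action", 10),
    ("disk_errors", "live_migrate", 3), ("disk_errors", "live_migrate", 5),
    ("disk_errors", "soft_reboot", 8), ("disk_errors", "soft_reboot", 6),
    ("cpu_ierr", "no_action", 9), ("cpu_ierr", "live_migrate", 7),
    ("cpu_ierr", "soft_reboot", 2),
]
print(best_action_per_scenario(data))
# → {'disk_errors': 'live_migrate', 'cpu_ierr': 'soft_reboot'}
```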

Reinforcement learning (RL) framework

When Narya uses reinforcement learning, it learns how to maximize the overall customer experience by exploring different actions over time, weighting the most recent actions most heavily. Reinforcement learning differs from A/B testing in that it automatically learns to avoid suboptimal actions by continuously balancing between using the best-known actions and exploring new ones.
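The article does not say which reinforcement learning algorithm Narya uses. As one simple way to picture the explore/exploit balance and the recency weighting described above, here is a sketch of an epsilon-greedy bandit with a constant step size, which gives recent rewards exponentially more weight than older ones. The class name, parameters, action names, and reward values are all assumptions made for illustration.

```python
import random

class RecencyWeightedBandit:
    """Epsilon-greedy action selection with exponentially recency-weighted values."""

    def __init__(self, actions, epsilon=0.1, step_size=0.3, seed=0):
        self.values = {action: 0.0 for action in actions}
        self.epsilon = epsilon      # probability of exploring a random action
        self.step_size = step_size  # constant step size: newer rewards count more
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))  # explore
        return max(self.values, key=self.values.get)   # exploit best-known action

    def update(self, action, reward):
        # Move the estimate a fixed fraction of the way toward the new reward.
        self.values[action] += self.step_size * (reward - self.values[action])

# Hypothetical rewards: negative VM interruptions, so higher is better.
hypothetical_reward = {"live_migrate": -3.0, "soft_reboot": -6.0, "no_action": -11.0}
bandit = RecencyWeightedBandit(list(hypothetical_reward), epsilon=0.2)
for _ in range(200):
    chosen = bandit.select()
    bandit.update(chosen, hypothetical_reward[chosen])
print(bandit.values)
```

Because the step size is constant, the estimates keep tracking the environment: if a platform update changes how well an action works, new rewards gradually override the old ones.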

Phase 3: Observe customer impact and retrain models

Finally, after mitigation actions are taken, new data can be gathered. We now have a measure of the most up-to-date customer impact data, which we use to continually improve our models at every step of the Narya framework. Narya makes sure to do this automatically—the data not only helps us to update the domain-expert rules and the machine learning models in the failure prediction step, but also informs better mitigation action policy in the decision step.

Figure 1: Narya starts with a hardware failure prediction, makes a smart decision on how to respond, implements the response, then measures the customer impact and incorporates it via a feedback loop.

Narya in action: an example


The following is a real example in which Narya helped to protect real customer workloads:

◉ T0 20:15:31, Narya predicted the node had a high probability of failure due to disk issues.
◉ T0 20:32:01, Narya selected the mitigation action: "Mark the node as unallocatable for three days, attempt a live migration, and after all the virtual machines have been migrated or if the host fails, send the node to diagnostics."
◉ T0 20:32:11, the node was marked unallocatable, and a live migration was triggered.
◉ T0 20:47:22 – 00:11:55, nine virtual machines eligible for live migration were live migrated off the node successfully.
◉ T1 19:14:01, the node became unhealthy, and 15 virtual machines still on the node were rebooted.
◉ T1 19:55:07, the node was sent to diagnostics after entering a fault state.
◉ T2 00:14:12, the disk stress test failed.
◉ T3 00:19:56, the disk was replaced.

In this real-world example, Narya prevented nine virtual machine reboots and prevented further customer pain by ensuring that no new workloads were allocated to the node that was expected to fail soon. In addition, the broken node was immediately sent for repair, and there were no repeated virtual machine reboots because we had already anticipated the issue. While this example is relatively simple, its main purpose is to illustrate that Narya evaluated the situation and selected a mitigation action suited to it. In other scenarios, the mitigation action might involve marking the node unallocatable for a different number of days, trying a soft kernel reboot instead of a live migration, or deprioritizing allocations rather than fully marking the node as unallocatable. Narya is built to respond much more flexibly to different "at-risk" predictions, to best improve the overall customer experience.

What makes Narya different?


1. Data-driven action selection: Instead of making our best guess for the mitigation action, we are now testing and measuring the effects of each mitigation action, using data to determine the true impact of each mitigation action selected.

2. Dynamic wherever possible: As opposed to having static mitigation assignments, Narya now continuously ensures that the best mitigation action is selected even as the system changes through software updates, hardware updates, or customer workload changes. For example, suppose a static assignment dictates that a predicted failure caused by a drop in CPU frequency leads us to perform a live migration. While a drop in CPU frequency might be a defense mechanism that indicates an imminent failure, a recent update to the Azure platform might have the system intentionally adjust CPU frequency to rebalance power consumption, meaning a drop in CPU frequency no longer necessarily calls for a live migration. With a static assignment, we would accidentally apply actions that end up doing harm, as we mistakenly avoid using healthy nodes. With Narya, we will notice from A/B testing and reinforcement learning that, for this specific scenario, live migration is no longer the optimal mitigation action.

3. Flexible mitigation actions: In the past, only one given mitigation action could be prescribed for a given set of symptoms. However, with multi-tenancy and diverse customer workloads, even with domain-expert knowledge, it was difficult to determine the best mitigation ahead of time. With Narya, we can now configure as many mitigation actions as we would like and allow Narya to automatically test and select the actions best suited for different failure predictions. Finally, because we have smart safety mechanisms in place, we can also be confident that Narya's mitigation action chains will prevent any deadlocks that might lead to indefinite blocking.

Going forward


Moving forward, we hope to improve Narya to make Azure even more resilient and reliable. Specifically, we plan to:

◉ Incorporate more prediction scenarios: We plan to develop more advanced hardware failure prediction techniques covering more hardware failure types. We also plan to incorporate more software scenarios into this prediction step.

◉ Incorporate more mitigation actions: By building additional mitigation actions, we will be able to add more flexibility into how Narya can respond to a broad scope of failure predictions.

◉ Make the decision smarter: Finally, we plan to improve Narya by adding more nuance into the "smart decision" step, where we decide on the best mitigation action. For example, we can look at what workloads are running on a given node, incorporate that information into the "smart decision" step, and time our mitigation action in a manner that minimizes interruptions.