Monday, 26 November 2018

Best practices for alerting on metrics with Azure Database for MySQL monitoring

Whether you are a developer, database administrator, site reliability engineer, or a DevOps professional, monitoring databases is an important part of maintaining the reliability, availability, and performance of your MySQL server. There are various metrics available for you in Microsoft Azure Database for MySQL to get insights on the behavior of the server. You can also set alerts on these metrics using the Azure portal or Azure CLI.


As modern applications move from a traditional on-premises model toward hybrid or cloud-native designs, a successful monitoring strategy for hybrid and public cloud environments calls for its own best practices. The sections below give example best practices for using monitoring data from your MySQL server, along with areas you can consider improving based on each metric.

Active connections


Sample threshold (percentage or value): 80 percent of total connection limit for greater than or equal to 30 minutes, checked every five minutes.

Things to check:

◈ If active connections have been at 80 percent of the total limit for the past half hour, verify whether this is expected based on the workload.
◈ If the load is expected, you can raise the active connection limit by upgrading the pricing tier or increasing vCores. Check the active connection limit for each SKU before choosing.
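
The sample threshold above can be expressed as a small check. This is a minimal sketch, not Azure's alert engine: it assumes one active-connection sample every five minutes, so six consecutive samples span the 30-minute window. The function name and the `connection_limit` parameter are hypothetical; take the real limit from the documentation for your SKU.

```python
def connections_alert(samples, connection_limit, fraction=0.8, window=6):
    """Return True if the last `window` samples are all at or above
    `fraction` of `connection_limit`.

    samples: active-connection counts, oldest first,
             one per five-minute check (assumed cadence).
    """
    if len(samples) < window:
        return False  # not enough data to cover the whole window
    threshold = fraction * connection_limit
    return all(count >= threshold for count in samples[-window:])


# Illustrative values only: a hypothetical limit of 100 connections.
print(connections_alert([85, 90, 82, 88, 95, 81], connection_limit=100))  # True
print(connections_alert([85, 90, 79, 88, 95, 81], connection_limit=100))  # False
```

A single dip below the threshold resets the condition, which keeps short bursts from triggering the alert.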


Failed connections


Sample threshold (percentage or value): 10 failed connections in the last 30 minutes, checked every five minutes.

Things to check:

◈ If you see connection request failures over the last half hour, verify if this is expected by checking the logs for failure reasons.


◈ If this is a user error, take the appropriate action. For example, if there is an authentication failed error, check your username/password.

◈ If the error is SSL related, check that the SSL settings and input parameters are properly configured.
     ◈  Example: mysql -h mydemoserver.mysql.database.azure.com -u mylogin@mydemoserver -p --ssl-mode=VERIFY_CA --ssl-ca=root.crt
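
The failed-connections threshold is a count over a sliding window rather than a sustained level. A minimal sketch, assuming the metric is reported as the number of failed connections in each five-minute interval and that "10 failed connections" means ten or more; the function name is hypothetical.

```python
def failed_connections_alert(failed_per_interval, max_failures=10, window=6):
    """Return True if the failures summed over the last `window`
    intervals reach `max_failures`.

    failed_per_interval: failed-connection counts, oldest first,
                         one per five-minute interval (assumed cadence).
    """
    recent = failed_per_interval[-window:]
    return sum(recent) >= max_failures


print(failed_connections_alert([0, 1, 0, 4, 3, 2]))      # True: 10 failures in 30 minutes
print(failed_connections_alert([20, 0, 0, 0, 0, 0, 0]))  # False: the burst is outside the window
```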

CPU percent or memory percent


Sample threshold (percentage or value): 100 percent for five minutes or 95 percent for more than two hours.

Things to check:

◈ If you have hit 100 percent CPU or memory usage, check your application telemetry or logs to understand the impact of any resulting errors.
◈ Review the number of active connections against the connection limit. If your application has reached or is approaching the maximum number of connections, consider scaling up compute.

IO percent


Sample threshold (percentage or value): 90 percent usage for greater than or equal to 60 minutes.

Things to check:

◈ If you see that IO percent has been at 90 percent for one hour or more, verify whether this is expected based on the application workload.
◈ If you expect a high load, increase the IOPS limit by increasing storage. The storage-to-IOPS mapping is shown below for reference.

Storage


The storage you provision is the amount of storage capacity available to your Azure Database for MySQL server. The storage is used for the database files, temporary files, transaction logs, and the MySQL server logs. The total amount of storage you provision also defines the I/O capacity available to your server.

                        Basic                   General purpose                 Memory optimized
Storage type            Azure Standard Storage  Azure Premium Storage           Azure Premium Storage
Storage size            5 GB to 1 TB            5 GB to 4 TB                    5 GB to 4 TB
Storage increment size  1 GB                    1 GB                            1 GB
IOPS                    Variable                3 IOPS/GB (min 100, max 6000)   3 IOPS/GB (min 100, max 6000)

You can add storage capacity during and after the creation of the server. The Basic tier does not provide an IOPS guarantee. In the General purpose and Memory optimized pricing tiers, IOPS scale with the provisioned storage size at 3 IOPS per GB, with a minimum of 100 IOPS and a maximum of 6000 IOPS.
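
The storage-to-IOPS rule for the General purpose and Memory optimized tiers can be written out directly. The sketch below uses only the figures given in this post (3 IOPS per GB, minimum 100, maximum 6000, 5 GB minimum storage); the function names are hypothetical, and the Basic tier is left out because its IOPS are variable.

```python
import math

IOPS_PER_GB = 3
MIN_IOPS = 100
MAX_IOPS = 6000
MIN_STORAGE_GB = 5


def provisioned_iops(storage_gb):
    """IOPS available for a given provisioned storage size in GB."""
    return min(max(IOPS_PER_GB * storage_gb, MIN_IOPS), MAX_IOPS)


def storage_gb_for_iops(target_iops):
    """Smallest whole-GB storage size that provides `target_iops`."""
    if target_iops > MAX_IOPS:
        raise ValueError("target exceeds the 6000 IOPS maximum")
    if target_iops <= MIN_IOPS:
        return MIN_STORAGE_GB  # the 100 IOPS floor already covers it
    return max(MIN_STORAGE_GB, math.ceil(target_iops / IOPS_PER_GB))


print(provisioned_iops(5))        # 100  (floor applies)
print(provisioned_iops(500))      # 1500
print(provisioned_iops(4096))     # 6000 (cap applies)
print(storage_gb_for_iops(1500))  # 500
```

Note that the cap is reached at 2000 GB, so adding storage beyond that point increases capacity but not IOPS.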

Storage percent


Sample threshold (percentage or value): 80 percent

Things to check:

◈ If your server is approaching its provisioned storage limit, it will soon run out of space and be set to read-only.
◈ Monitor your usage, and provision more storage when needed so that you can keep using the server without deleting files, logs, or other data.
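
To decide how much storage to provision, it can help to work backwards from the 80 percent threshold. A minimal sketch, assuming usage is known in GB and storage is added in 1 GB increments as listed in the storage table; the function names are hypothetical.

```python
import math


def storage_percent(used_gb, provisioned_gb):
    """Storage used as a percentage of provisioned storage."""
    return 100.0 * used_gb / provisioned_gb


def storage_gb_to_stay_at_or_below(used_gb, threshold_percent=80.0):
    """Smallest whole-GB provisioned size that keeps `used_gb`
    at or below `threshold_percent`."""
    return math.ceil(used_gb * 100.0 / threshold_percent)


print(storage_percent(85, 100))            # 85.0, above the threshold
print(storage_gb_to_stay_at_or_below(85))  # 107
```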
