Alibaba Cloud Infrastructure Monitoring with Datadog

As public cloud adoption accelerates across all industries, traditional monitoring systems are evolving into observability systems. From a data perspective, observability covers greater breadth and depth than monitoring alone. It involves not only the system metrics used for monitoring alarms but also records of the system's internal operations. Standard monitoring data tells you whether an expected outcome occurred, but with observability data you can quickly locate and identify the root cause and recover from the issue.

Metrics, Tracing, and Logging

As Alibaba Cloud users expand their footprint and take on the security and operational overhead of managing a cloud ecosystem, many organizations have begun to consider third-party tools for infrastructure monitoring. Alibaba Cloud Monitor orchestrates events, metrics, and customizable tags, but many organizations already use other solutions for on-premises infrastructure or other cloud providers. One of the leaders in this space, and one of the most popular platforms, is Datadog. Its new integration with Alibaba Cloud is now generally available. While the Datadog Agent has always been able to provide visibility into Alibaba Cloud instances, the new integration also lets you monitor the health and performance of many Alibaba Cloud services, collect resource-specific metadata, and even collect custom tags applied to your resources. Datadog uses Alibaba Cloud Monitor APIs to collect metrics and metadata from the services included in this integration.

Here are the current services that you can extract metrics from at Alibaba Cloud:

  • Alibaba Cloud Server Load Balancer (SLB)
  • Alibaba Elastic Compute Service instances
  • Alibaba Cloud ApsaraDB for RDS instances
  • Alibaba Cloud ApsaraDB for Redis instances
  • Alibaba Cloud Content Delivery Network (CDN) instances
  • Alibaba Cloud Container Service clusters
  • Alibaba Cloud Express Connect instances

In addition to Cloud Monitor, Datadog also collects detailed metadata and custom tags directly from each service’s API. For more details, see the service-specific documentation for ECS, ApsaraDB for Redis, ApsaraDB for RDS, and Server Load Balancer. Note that ApsaraDB for Redis does not currently support custom tags.

Datadog automatically populates all of this metadata in the form of tags, so you can derive more useful insights from your metrics by aggregating them across any scope that matters to you. This means, for instance, that you can filter and analyze any ECS metric by instance type, region, or any other dimension that is accessible through the ECS API—including any custom tags you’ve added to your resources.
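To make the tag-based filtering concrete, here is a minimal sketch that assembles a Datadog-style metric query string of the form aggregator:metric{tag:value} by {tag}. The helper name build_query, the metric name, and the tag keys below are hypothetical and used only for illustration; check the integration's Metrics tab for the real metric names.

```python
def build_query(metric, aggregator="avg", filters=None, group_by=None):
    """Build a metric query string such as 'avg:metric{tag:value} by {tag}'."""
    # Scope is a comma-separated list of tag:value pairs, or '*' for everything.
    pairs = sorted((filters or {}).items())
    scope = ",".join(f"{key}:{value}" for key, value in pairs) or "*"
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        # Group the result by one or more tag keys.
        query += " by {" + ",".join(group_by) + "}"
    return query


# Hypothetical metric and tag names, for illustration only.
print(build_query(
    "alibabacloud.ecs.cpu_utilization",
    filters={"region": "cn-hangzhou", "team": "payments"},
    group_by=["instance_type"],
))
```

The same string can be pasted into a dashboard or monitor query editor once the placeholder names are replaced with ones that exist in your account.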

By installing the Datadog Agent on your Alibaba Cloud VMs, you can get even richer context around the metrics and metadata collected by the new Alibaba Cloud integration. Datadog automatically ties together metrics and tags from each cloud instance so that you can get a unified view of your dynamic infrastructure and applications. This allows you to slice and dice all metrics coming from each VM, using tags that come from the Datadog Agent, Alibaba Cloud Monitor, and Alibaba Cloud service APIs.

Datadog agent installed in Alibaba Cloud VMs

Once you install the Datadog Agent on your Alibaba Cloud VMs, you will also be able to monitor all the applications and services running on those cloud instances, with more than 450 built-in integrations. With Datadog APM, you can trace requests across distributed services and instances, giving you yet another layer of visibility into your cloud applications. The Agent can also collect logs from each VM to provide additional context for troubleshooting issues. This means, for example, that if you see a spike in CPU usage on an Alibaba Cloud VM, you can quickly investigate by inspecting underlying logs from that particular Alibaba Cloud instance.

Here is the Alibaba Cloud dashboard in Datadog:

Alibaba Cloud Dashboard in Datadog

Here are the trackable Alibaba Cloud metrics in Datadog:

Alibaba Cloud Metrics in Datadog

Datadog offers many integrations, including one for Alibaba Cloud, and the list of metrics it collects is available here.

Setting up the Datadog Alibaba Cloud integration

This can be done through the GUI as shown here:

Datadog Alibaba Cloud Integration

It can also be done via the Alibaba Cloud API as shown below:

Fill out the following parameters to integrate Datadog with the Alibaba Cloud API:

  • Account Id

Find this by hovering over the avatar on the top right of the Alibaba Cloud console and selecting Security Settings. The account ID is displayed on the top of that page.

Alibaba Cloud Account Id
  • Access Key Id & Access Key Secret

In your Alibaba Cloud Account:

  1. Create a new user in the RAM tab with the following parameters:
    • Logon Name: Datadog
    • Display name: Datadog
    • Description: Datadog User for the Datadog-Alibaba Cloud integration
  2. Select Programmatic Access:
Alibaba Cloud Programmatic Access

3. After clicking OK, copy the AccessKeyID and AccessKeySecret, paste them into the Datadog-Alibaba Cloud integration tile, and click Install Integration.

Alibaba Cloud Access Keys

4. In your Alibaba Cloud Account, select Add Permissions for the user you just created, then add all of the following permissions:

  • AliyunCloudMonitorReadOnlyAccess
  • AliyunECSReadOnlyAccess
  • AliyunKvstoreReadOnlyAccess
  • AliyunRDSReadOnlyAccess
  • AliyunSLBReadOnlyAccess
  • AliyunCDNReadOnlyAccess
  • AliyunCSReadOnlyAccess
  • AliyunExpressConnectReadOnlyAccess

5. Click Update. After about 15 minutes, the metrics listed in the Metrics tab of the Datadog-Alibaba Cloud integration tile start appearing in your metric explorer page, tagged with any custom tags you add to your resources and with the tags found here:

They will appear here:

Datadog Metrics

6. Optional: set Optionally Limit Metrics Collection in your Datadog-Alibaba Cloud integration tile. This comma-separated list of Alibaba Cloud tags (in the form <KEY:VALUE>) defines a filter to use when collecting metrics from Alibaba Cloud. Wildcards such as ? (for single characters) and * (for multiple characters) can be used. Only hosts that match one of the defined labels are imported into Datadog, and the rest are ignored. Hosts matching a given label can also be excluded by adding ! before the label.
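The filter semantics in step 6 can be sketched locally to check which hosts a given filter would keep. This is only an approximation written for illustration, not Datadog's implementation. It assumes that an exclusion wins over an inclusion, that an empty filter collects everything, and that Python's fnmatch wildcards behave closely enough to the ? and * described above. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_filter(filter_text):
    """Split a comma-separated filter into include and exclude patterns."""
    include, exclude = [], []
    for raw in filter_text.split(","):
        label = raw.strip()
        if not label:
            continue
        if label.startswith("!"):
            # A leading '!' marks a label whose matching hosts are excluded.
            exclude.append(label[1:])
        else:
            include.append(label)
    return include, exclude


def host_is_collected(host_tags, filter_text):
    """Return True if a host with these KEY:VALUE tags passes the filter."""
    include, exclude = parse_filter(filter_text)

    def matches(patterns):
        # fnmatchcase supports '?' and '*' (and also '[...]', which the
        # integration may not support).
        return any(fnmatchcase(tag, p) for tag in host_tags for p in patterns)

    if matches(exclude):
        return False  # Assumption: exclusions take priority.
    # Assumption: with no include labels, nothing is filtered out.
    return not include or matches(include)


print(host_is_collected(["env:prod", "role:db"], "env:*,!role:db"))
```

Running a few of your own hosts' tags through a sketch like this before saving the filter can help catch a pattern that is broader or narrower than intended.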

Find a list of all the metrics and events collected by Datadog after the integration is complete here.
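If you prefer to script step 4 rather than click through the console, the sketch below only prints the commands it would run, so they can be reviewed first. It assumes the aliyun command-line tool is installed and configured, and that the RAM AttachPolicyToUser call takes PolicyType, PolicyName, and UserName parameters. Verify this against the current Alibaba Cloud CLI documentation before running anything. The user name matches the logon name created in step 1.

```python
import shlex

RAM_USER = "Datadog"  # Logon name created in step 1.

# Read-only system policies listed in step 4.
POLICIES = [
    "AliyunCloudMonitorReadOnlyAccess",
    "AliyunECSReadOnlyAccess",
    "AliyunKvstoreReadOnlyAccess",
    "AliyunRDSReadOnlyAccess",
    "AliyunSLBReadOnlyAccess",
    "AliyunCDNReadOnlyAccess",
    "AliyunCSReadOnlyAccess",
    "AliyunExpressConnectReadOnlyAccess",
]


def attach_policy_commands(user=RAM_USER, policies=POLICIES):
    """Return one argument list per policy (assumed aliyun CLI syntax)."""
    return [
        ["aliyun", "ram", "AttachPolicyToUser",
         "--PolicyType", "System",
         "--PolicyName", policy,
         "--UserName", user]
        for policy in policies
    ]


# Print the commands for review instead of executing them.
for command in attach_policy_commands():
    print(shlex.join(command))
```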

Datadog DingTalk integration

You can also integrate Datadog with DingTalk groups. DingTalk, the chat application from Alibaba, can now receive automatic alerts that notify your team of critical infrastructure events. With Datadog, you can set up alerts to automatically detect significant changes in your Alibaba Cloud environment, like an unexpected deficit in available memory or a spike in 5xx errors from your Server Load Balancer instances. With this integration, the correct DingTalk group will know about the alerts the instant they happen.

Within the Notify your team field of the alert’s configuration page, enter @dingtalk-<GROUP_NAME> for each DingTalk group you want to notify.

Datadog Dingtalk Integration

For example, you can set an alert on 5xx errors from Alibaba Cloud Content Delivery Network. If this alert triggers, Datadog will push the notification to the “DevOps100” DingTalk group.

The notification will include a message as well as a link to the alert within your Datadog account so that team members know exactly where to look to get more context around the issue. In this case, Datadog detected an unexpected spike in 5xx errors on our Alibaba Cloud CDN instances. If you click on the link, you can view a history of times the alert was previously triggered and then navigate to relevant metrics, logs, and distributed request traces collected from the related Alibaba services you’re monitoring with Datadog.

Steps for setting up the Datadog DingTalk integration

To integrate Datadog with a DingTalk group:

  1. In the DingTalk app, navigate to Messages, and then click on the group where you want to add a Datadog integration.
  2. In the top right corner, click the Group Settings icon (it looks like an ellipsis) and choose Group Robot.
  3. On the Group Robot menu, select Datadog and click Add.
  4. Enter a name for the robot and click Finished. This returns a webhook address.
  5. Copy the webhook address and then click Finished.
  6. On the DingTalk integration tile, enter the DingTalk group where you added the Datadog integration into the Group Name field and paste the webhook address into the Group Robot Webhook field. Group names can contain letters, digits, and underscores.
  7. Click Install Configuration (or Update Configuration).

After installing or updating the integration, you can use the @-notification feature with your DingTalk group name.
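As a small illustration of the naming rule and the @-notification format, the sketch below checks a group name and builds the handle to place in an alert message. The function names are hypothetical, and treating "letters" as ASCII letters only is an assumption.

```python
import re

# Group names can contain letters, digits, and underscores.
# Assumption: "letters" means ASCII letters.
_GROUP_NAME = re.compile(r"[A-Za-z0-9_]+")


def dingtalk_handle(group_name):
    """Return the @-notification handle for a DingTalk group name."""
    if not _GROUP_NAME.fullmatch(group_name):
        raise ValueError(f"invalid DingTalk group name: {group_name!r}")
    return f"@dingtalk-{group_name}"


def alert_message(text, groups):
    """Append one handle per DingTalk group to an alert message."""
    handles = " ".join(dingtalk_handle(group) for group in groups)
    return f"{text} {handles}"


print(alert_message("Spike in 5xx errors on Alibaba Cloud CDN", ["DevOps100"]))
```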

About Roopu Cloud

If you have any questions or concerns about Alibaba Cloud, you can contact us. We are experts in building and implementing cloud solutions in the Alibaba Cloud platform as well as in other Chinese cloud platforms. Let us help you!
