Learning About Datadog MCP
With a bit of time on my hands recently between projects, I thought I'd take the opportunity to dig into some of the newer innovations in the technology space and get more across genAI tools: how they work, and how they might be used outside of a simple web chat interface. One area we'd thrown around as potentially interesting was exploring the capabilities of the (still in beta) MCP server from Datadog. I had a vague idea of putting together a system that could be hosted for a customer in their own environment, with access to their Datadog account, as a way of exploring how these systems might be delivered as a service offering to clients, so I started looking into options. For the end-to-end offering I decided to work with the Datadog MCP server, Langchain to control the model, and AWS Bedrock for serving the LLM itself.
The Datadog MCP Server
The Datadog MCP Server is still in preview mode and subject to changes and updates over time, but it's already plenty powerful to use. For this project, the standard connection options like Claude or Codex weren't what I was after, given this would ultimately be hosted remotely and I wanted to experiment with Langchain as part of this test. Because of that, the solution was to follow the instructions for the Other option and add the MCP connector into the Langchain code I was creating. To help with deployability, I also opted to authenticate with a Datadog API key and App key instead of the more common OAuth 2.0 flow.
The Datadog MCP Connection Code Block
{
  "mcpServers": {
    "datadog": {
      "type": "http",
      "url": "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp",
      "headers": {
        "DD_API_KEY": "<YOUR_API_KEY>",
        "DD_APPLICATION_KEY": "<YOUR_APPLICATION_KEY>"
      }
    }
  }
}
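In my case that same configuration ended up living inside the Python code rather than a client config file. As a rough sketch of what that can look like with the langchain-mcp-adapters package (the package choice, transport value and header handling here are my assumptions and may shift while the server is in preview):

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Assumed equivalent of the JSON config above, expressed as a
# MultiServerMCPClient server definition over streamable HTTP.
client = MultiServerMCPClient(
    {
        "datadog": {
            "transport": "streamable_http",
            "url": "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp",
            "headers": {
                "DD_API_KEY": "<YOUR_API_KEY>",
                "DD_APPLICATION_KEY": "<YOUR_APPLICATION_KEY>",
            },
        }
    }
)

async def load_tools():
    # Discover the tools the Datadog MCP server exposes so they can be
    # handed to a Langchain agent later.
    return await client.get_tools()

tools = asyncio.run(load_tools())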
Langchain
I know a little bit of Python. Enough to be dangerous, not enough to convince anyone that I could be a software engineer. But as it happens, most of the genAI systems and frameworks coming out these days are written in either Python or TypeScript (or in the case of Langchain, both), which makes what would have been a very uphill struggle a bit more palatable. The Langchain docs are pretty good for getting started, and the overall documentation available is fantastic. I did run into some issues that turned out to be a limitation of the specific model I was using (more on that below), which I incorrectly thought was an issue with my code.

It was while trying to lean on some guidance from Gemini that I came across a small gotcha of LLMs: they don't do well with newer libraries that update regularly. I tried querying a few different LLMs to troubleshoot my code, and every single one gave back references to functions from earlier library versions that are now deprecated. Specifically, in my case they kept telling me to use create_react_agent instead of create_agent, which conflicted with the library in VS Code throwing warnings that create_react_agent was deprecated. Once I managed to get this resolved, it was smooth sailing for the most part, and being able to quickly reference most of the documentation and functions on the website was genuinely a fantastic experience.
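For anyone hitting the same wall, the difference looks roughly like this (a minimal sketch assuming a recent Langchain release where create_agent supersedes the older create_react_agent; model and tools are whatever you've already set up):

# What the various LLMs kept suggesting (the older, now-deprecated path):
# from langgraph.prebuilt import create_react_agent
# agent = create_react_agent(model, tools)

# What the current library and the VS Code warnings point to instead:
from langchain.agents import create_agent

agent = create_agent(model, tools)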
AWS Bedrock
While simply connecting to ChatGPT or Claude via the normal API was possible, something we wanted to look into as part of this work was using models provided by hyperscalers, primarily for compliance reasons. I tossed up both AWS and Azure for this, ultimately settling on AWS. There wasn't any special thought process here: a colleague had used the Azure AI platform previously, and I thought it would be good to balance it out and try another option. So I created a free-tier AWS account, created an access key and found a model to work with.

This is where I hit a snag, although I didn't realise it at the time. AWS provide a multitude of AI models via Bedrock, and in my day-to-day activities I tend to use Gemini for any LLM-related activity. Because of this I thought I'd stick with what I knew and selected the Google Gemma model. Bedrock lets you search its serverless models by vendor and embeddings, but I didn't realise at the time that not all models have the same capability sets. Or more accurately, that tool use was a specific capability I had to check for. It should have clicked with me that the model was Gemma rather than Gemini, and that its capabilities are very different to those of Anthropic's and OpenAI's models. So for a day and a half I was tearing my hair out over why this model wouldn't call the MCP server when I asked it to, instead responding that it should use the MCP to investigate. Given my awareness of my Python capabilities, I assumed I wasn't checking the correct return message or that I had incorrectly coded the MCP configuration for the server. It wasn't until I rebuilt all my code from scratch and saw it wasn't calling any tools I provided it that I tried a different model. Not because I realised my mistake, but to see if a different model would give a different error message. Most of the documentation on Langchain references old model versions, but they all use OpenAI or Claude, so I figured if nothing else those should give more informative error messages than other options.
The combination of elation and annoyance when you realise such a small mistake is causing all your problems is profound. It was only on re-checking the capabilities of the models that I noticed tool use listed as one. It's not a capability you can search or filter on, and there's no consistent structure to the capabilities section for you to quickly reference - to check properly you need to go into each model's page and review everything. Great thing to know before selecting a model for something like this.
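Once I'd landed on a model that does support tool calling, pointing the agent at Bedrock was straightforward. A rough sketch of how that looks with the langchain-aws package (the model ID and region below are placeholders I've assumed, not necessarily the exact ones I used):

from langchain_aws import ChatBedrockConverse

# Whatever Bedrock model goes here needs to support tool calling, otherwise
# the agent will describe what it would do with the MCP tools rather than
# actually call them.
model = ChatBedrockConverse(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
    region_name="ap-southeast-2",                       # assumed region
)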
Regardless, with the change of model the code all started to work and I could interact with the MCP server. Playing around with a few queries - how many dashboards were in the account, what the names of the services in the account were, and so on - came back with mostly correct answers. The dashboard count would vary by up to 5, but given there are ~280 dashboards in the account, that's a reasonable margin of error. It's a number I would personally handwave and answer "around 300" to, so the variance isn't a concern for me.
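Putting the earlier pieces together, those queries were just messages passed through the agent - something along these lines (again a sketch under the assumptions above, not the exact code):

# Build the agent from the Bedrock model and the Datadog MCP tools.
agent = create_agent(model, tools)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "How many dashboards are in the Datadog account?"}]}
)

# The final message in the returned state holds the model's answer.
print(result["messages"][-1].content)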
With that, we have a v0.1 of the code created and ready to expand. While for now the code is entirely local, the plan is to containerise it, add a RAG store of Datadog documentation using ChromaDB for the model to refer to when troubleshooting, and build out the ability to add to that store via chat messages or email to grow the knowledge base. Once those parts come together we might have something usable in our day-to-day work to assist workflows and engage with the Datadog platform without users needing to log in to see what's happening.
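None of that is built yet, but to give a feel for what the ChromaDB side might look like (collection names, paths and document content here are made up purely for illustration):

import chromadb

# A local, persistent collection of Datadog documentation snippets that the
# agent could query before answering troubleshooting questions.
chroma = chromadb.PersistentClient(path="./datadog_docs_db")
docs = chroma.get_or_create_collection("datadog_docs")

# Adding a snippet (e.g. received via chat or email) to the knowledge base.
docs.add(
    ids=["monitors-overview-1"],
    documents=["Datadog monitors evaluate metrics, logs or traces against thresholds..."],
)

# Retrieving the most relevant snippets for a user question.
hits = docs.query(query_texts=["Why is my monitor stuck in NO DATA?"], n_results=3)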