Troubleshooting Lab: Activate and monitor distributed denial-of-service (DDoS) protection
Diagnostic Scenarios
Scenario 1: Root Cause
A company's security team reports that, during an authorized load test run by a third-party vendor, Azure began dropping legitimate traffic from real users while the test was still in progress. The environment uses DDoS Network Protection linked to the main virtual network.
The affected public IP is pip-appgw-prod, associated with an Application Gateway. The team verified that the DDoS plan is active and that the subscription shows no quota alerts. Application Gateway logs show connections being rejected starting from minute 14 of the test.
Azure Monitor output for IP pip-appgw-prod during the incident:
MetricName : PacketsDroppedDDoS
TimeGrain : PT1M
Average : 18400
Unit : Count
MetricName : IfUnderDDoSAttack
TimeGrain : PT1M
Average : 1
Unit : Count
MetricName : PacketsForwardedDDoS
TimeGrain : PT1M
Average : 210
Unit : Count
The team also confirms that the Application Gateway WAF is in detection mode, not prevention mode, and that no custom rules were added to the WAF this week.
What is the root cause of the legitimate traffic being dropped?
A) The Application Gateway WAF in detection mode interfered with DDoS packet inspection, generating incorrect drops.
B) DDoS Network Protection activated mitigation because the test traffic profile exceeded the learned limits for that IP, and no custom DDoS policy was configured to adjust these limits.
C) The DDoS plan is linked to the virtual network, but the Application Gateway operates in a delegated subnet that doesn't inherit protection from the plan, causing unpredictable behavior.
D) The load test used repeated source IPs, which activated an internal Azure rate limiting rule that operates independently of the DDoS plan.
Scenario 2: Action Decision
The root cause has been identified: the DDoS Network Protection plan associated with virtual network vnet-prod-eastus was accidentally deleted by an operator who mistook it for a test plan. The deletion occurred 40 minutes ago. The application exposed through public IPs in this VNet is in production and receives real traffic.
The team verifies that:
- No other DDoS plan is available in the subscription
- The on-call team has Contributor permission on the network resource group
- A security alert was opened in the team's channel with high criticality
- The on-call team has access to Azure portal and Azure CLI
- The scheduled maintenance window starts in 6 hours
What is the correct action to take at this moment?
A) Wait for the maintenance window to recreate the plan with correct configurations and avoid production changes outside the approved process.
B) Immediately create a new DDoS Network Protection plan and associate it with vnet-prod-eastus, restoring coverage as quickly as possible.
C) Enable DDoS IP Protection on each public IP individually as a temporary measure while the formal plan recreation process is initiated.
D) Escalate to the architecture team before any action, as plan recreation without review could apply incorrect policy configurations.
Scenario 3: Root Cause
A security analyst configured an alert in Azure Monitor to notify the team when a DDoS attack is detected on any public IP in the subscription. After two weeks, no alerts were triggered, even during a period when the resilience testing vendor reported sending volumetric traffic to the environment for 20 minutes.
The analyst presents the alert configuration:
Resource type : Subscription (subscription-level alert)
Signal : Under DDoS attack
Condition : Greater than 0
Aggregation : Average
Period : 5 minutes
Alert scope : /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
The subscription has two public IPs protected by an active DDoS Network Protection plan. The team confirms that the public IPs are correctly associated with the VNet linked to the plan. The subscription's Log Analytics Workspace recorded mitigation events during the test period.
What is the root cause of the absence of alerts?
A) The Average aggregation diluted the binary metric value below the threshold of 0, preventing alert triggering.
B) Alerts based on the Under DDoS attack metric require the scope to be defined at the individual public IP resource level, not at the subscription level.
C) The 5-minute evaluation period is insufficient for DDoS metrics, which require at least 15 minutes to accumulate mitigation data.
D) The Log Analytics Workspace recorded the events, indicating that Diagnostic Settings intercepted the data before Azure Monitor could process it for alerts.
Scenario 4: Diagnostic Sequence
An engineer receives the following report: "Public IP pip-api-backend stopped responding to external requests. There is no scheduled maintenance. The resource appears healthy in the portal."
The engineer has access to Azure portal, Azure Monitor, and Azure CLI. Below are the available investigation steps, presented out of order:
- Step P: Check PacketsDroppedDDoS and IfUnderDDoSAttack metrics in Azure Monitor for IP pip-api-backend
- Step Q: Confirm if the public IP is associated with a VNet linked to an active DDoS Network Protection plan
- Step R: Check if the NSG associated with the subnet or NIC of the resource contains recently added deny rules
- Step S: Check DDoS plan diagnostic logs to identify recorded attack vectors
- Step T: Confirm if the destination resource (API backend) is operational by checking its health internally
What diagnostic sequence is most appropriate for this scenario?
A) T, R, Q, P, S
B) Q, P, S, R, T
C) R, T, Q, P, S
D) P, Q, T, S, R
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The IfUnderDDoSAttack metric with value 1 confirms that Azure detected and activated DDoS mitigation for IP pip-appgw-prod. The high volume of PacketsDroppedDDoS compared to the very low PacketsForwardedDDoS indicates that mitigation was aggressively filtering incoming traffic.
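To put the two packet metrics in proportion, the short sketch below computes the share of inbound packets that mitigation dropped, using the per-minute averages from the scenario. The helper name `drop_ratio` is hypothetical and exists only for illustration; it is not part of any Azure SDK.

```python
def drop_ratio(dropped: float, forwarded: float) -> float:
    """Return the fraction of inbound packets that mitigation dropped."""
    total = dropped + forwarded
    if total == 0:
        # No traffic observed in the interval, so nothing was dropped.
        return 0.0
    return dropped / total


# Per-minute averages reported for pip-appgw-prod in the scenario.
ratio = drop_ratio(dropped=18400, forwarded=210)
print(f"{ratio:.1%} of inbound packets were dropped")
```

With roughly 99 out of every 100 packets dropped, legitimate users are caught in the same filter as the test traffic, which matches the rejected connections seen in the Application Gateway logs.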
The root cause is that DDoS Network Protection learns the traffic profile of each IP over time and establishes adaptive mitigation limits. A load test that generates volume above the learned baseline is indistinguishable from a real volumetric attack to the system, especially when no custom DDoS policy was created for that IP with limits adjusted to its intensive usage profile.
The information about WAF in detection mode is intentionally irrelevant: WAF in detection mode doesn't drop traffic, only logs. This detail exists to divert diagnosis toward alternative A.
Alternative C represents a misconception about protection scope: DDoS Network Protection covers all public IPs of resources deployed in subnets of the linked VNet, regardless of delegation. Alternative D describes a rate-limiting mechanism that does not exist in Azure DDoS Protection in the form described.
The most dangerous distractor is C: acting on this belief would lead the engineer to try to "fix" the protection association, wasting time while the real cause remains unaddressed.
Answer Key: Scenario 2
Answer: B
The cause is identified and stated in the scenario: the plan was deleted and the VNet has been without active DDoS protection for 40 minutes. Each additional minute without the plan represents real exposure of a production environment.
The correct action is to immediately create a new DDoS Network Protection plan and link it to vnet-prod-eastus. Contributor permission on the network resource group is sufficient to create and associate the plan. There is no technical or permission restriction preventing this action.
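As a rough illustration of the two operations involved, the sketch below assembles the Azure CLI invocations for creating a plan and linking it to the VNet. It only builds and prints the argument lists; it does not run them. The resource group name `rg-network` and plan name `ddos-plan-prod` are hypothetical placeholders, the plan is assumed to live in the same resource group as the VNet, and the flags should be confirmed with `az network ddos-protection create --help` and `az network vnet update --help` before use.

```python
from typing import List


def restore_plan_commands(resource_group: str, plan_name: str, vnet_name: str) -> List[List[str]]:
    """Build (but do not execute) the CLI calls that recreate and link a DDoS plan."""
    create_plan = [
        "az", "network", "ddos-protection", "create",
        "--resource-group", resource_group,
        "--name", plan_name,
    ]
    link_vnet = [
        "az", "network", "vnet", "update",
        "--resource-group", resource_group,
        "--name", vnet_name,
        "--ddos-protection", "true",
        # Assumption: plan and VNet share a resource group, so the plan name suffices.
        "--ddos-protection-plan", plan_name,
    ]
    return [create_plan, link_vnet]


# Hypothetical names for the resource group and the new plan.
for command in restore_plan_commands("rg-network", "ddos-plan-prod", "vnet-prod-eastus"):
    print(" ".join(command))
```

Keeping the commands as data makes it easy to paste them into the incident channel for a second pair of eyes before anyone runs them.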
Alternative A is the most dangerous distractor: waiting 6 hours for a maintenance window when the production environment is unprotected is a decision that inverts the priority between process and real operational risk. Maintenance windows are for planned changes, not for restoring security controls accidentally removed.
Alternative C would be acceptable as a complement, but DDoS IP Protection must be enabled on each public IP individually, offers narrower coverage than a Network Protection plan, and is not the most efficient approach when the goal is to restore the previous state.
Alternative D introduces unnecessary dependency: recreating a DDoS plan is a well-defined operation that doesn't require emergency architectural approval, especially when the desired state is already known.
Answer Key: Scenario 3
Answer: B
The Under DDoS attack metric is emitted by each individual public IP resource, not by the subscription. When the alert scope is the subscription itself, Azure Monitor does not automatically aggregate this per-resource metric across the public IPs it contains. The result is that the alert never finds data to evaluate and remains silent.
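A quick way to catch this kind of misconfiguration during review is to check whether an alert scope has the shape of a public IP resource ID. The sketch below is a simple string check, not a call to Azure; the function name, the resource group `rg-network`, and the IP name `pip-example` are hypothetical.

```python
import re

# Shape of a public IP resource ID; the final segment is the IP's name.
PUBLIC_IP_ID = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Network/publicIPAddresses/[^/]+$",
    re.IGNORECASE,
)


def is_public_ip_scope(scope: str) -> bool:
    """Return True when the scope string points at a single public IP resource."""
    return PUBLIC_IP_ID.match(scope) is not None


# Scope from the analyst's alert configuration (subscription level).
subscription_scope = "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Hypothetical public IP scope in the same subscription.
public_ip_scope = (
    subscription_scope
    + "/resourceGroups/rg-network"
    + "/providers/Microsoft.Network/publicIPAddresses/pip-example"
)

print(is_public_ip_scope(subscription_scope))
print(is_public_ip_scope(public_ip_scope))
```

The first check fails and the second passes, which mirrors the difference between the analyst's configuration and a scope the metric can actually report against.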
The clue in the scenario is the combination of two facts: Log Analytics recorded mitigation events (confirming the attack was detected and protection worked) and no alerts triggered. This eliminates any hypothesis of protection failure and directs diagnosis to the alert configuration itself.
Alternative A is a sophisticated technical distractor: Average aggregation on a binary metric can indeed dilute its value over long windows, but that is not the root cause here. With a 20-minute attack and a 5-minute window, the average would still be 1 during the event. The problem occurs earlier: the scope prevents data from reaching the alert.
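The arithmetic behind rejecting alternative A can be checked with a small simulation. This is a simplified model that assumes one sample per minute and non-overlapping windows, which is not exactly how Azure Monitor schedules evaluations; the timeline is hypothetical apart from the 20-minute attack duration.

```python
from statistics import mean


def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of per-minute samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical timeline: 10 quiet minutes, 20 minutes under attack, 10 quiet minutes.
samples = [0] * 10 + [1] * 20 + [0] * 10

five_minute = window_averages(samples, 5)
print(five_minute)                         # windows inside the attack average 1
print([avg > 0 for avg in five_minute])    # the "greater than 0" condition per window

# Even one 40-minute window only dilutes the value; it stays above 0.
print(mean(samples) > 0)
```

Because the metric is 0 or 1, any window that contains at least one attack minute averages above 0, so Average aggregation alone cannot keep a "greater than 0" condition from being met.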
Alternative C is false: there is no minimum 15-minute period restriction for DDoS metrics. Alternative D describes behavior that doesn't exist: Diagnostic Settings and Azure Monitor operate independently and don't compete for data.
Answer Key: Scenario 4
Answer: A
The correct sequence is T, R, Q, P, S, which follows progressive diagnostic logic from simplest to most specific:
T confirms if the problem is in the resource itself or at the network layer, avoiding investigating DDoS protection when the cause might be an internal application failure.
R checks NSG rules, which are the most common and immediate cause of blocked traffic in Azure environments. They're verifiable in seconds and don't require metric analysis.
Q confirms if the IP is covered by the DDoS plan before interpreting any related metrics. Without this confirmation, DDoS metrics may be absent or meaningless.
P analyzes DDoS metrics with context already established by previous steps, allowing correct interpretation of whether there's active mitigation.
S is the most time-consuming step and is useful only if the previous steps confirmed that DDoS Protection is mitigating traffic; at that point, diagnostic logs reveal the attack vectors and help decide the response.
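The T, R, Q, P, S ordering can be sketched as a short decision function in which each step either ends the investigation or justifies the next one. The boolean inputs stand in for what the engineer observes at each step; the function name and the conclusion strings are hypothetical and only illustrate the flow.

```python
from typing import List, Tuple


def diagnose(
    backend_healthy: bool,
    nsg_has_recent_deny: bool,
    covered_by_plan: bool,
    under_mitigation: bool,
) -> Tuple[List[str], str]:
    """Walk steps T, R, Q, P, S in order and return (steps performed, conclusion)."""
    steps = ["T"]  # T: is the backend itself operational?
    if not backend_healthy:
        return steps, "backend failure"

    steps.append("R")  # R: do NSG rules block the traffic?
    if nsg_has_recent_deny:
        return steps, "NSG deny rule"

    steps.append("Q")  # Q: is the IP covered by an active DDoS plan?
    if not covered_by_plan:
        return steps, "IP not covered by a DDoS plan"

    steps.append("P")  # P: do the DDoS metrics show active mitigation?
    if not under_mitigation:
        return steps, "no active DDoS mitigation"

    steps.append("S")  # S: read diagnostic logs for attack vectors.
    return steps, "DDoS mitigation active; review attack vectors"


print(diagnose(True, False, True, True))
print(diagnose(True, True, True, True))
```

The second call stops at R, which shows why the cheap checks come first: a recent deny rule explains the outage without ever opening the DDoS metrics or logs.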
Alternative B starts by checking the plan before knowing if the resource is operational, which may lead the engineer to investigate DDoS protection when the real cause is an internal failure. Alternative D starts directly with metrics without context, which is inefficient and may generate hasty conclusions.
Troubleshooting Tree: Activate and monitor distributed denial-of-service (DDoS) protection
Color Legend:
| Color | Node Type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate verification or validation |
To use this tree when facing a real problem, start from the root node describing the observed symptom and answer each question based on what you can verify directly in the environment. Follow the path that corresponds to the observed state, yes or no, until reaching an identified cause node. From the cause, the associated recommended action indicates the next operational step. Intermediate validation nodes indicate moments where data collection is necessary before continuing diagnosis.