Troubleshooting Lab: Configure rule sets for WAF on Application Gateway

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team receives alerts that a corporate web application hosted behind an Application Gateway WAF v2 has started blocking requests from an internal monitoring system. The WAF operates in Prevention mode with the OWASP CRS 3.2 rule set fully enabled.

The monitoring system sends periodic HTTP GET requests with a custom authentication header to check application availability. The team collects the following log:

{
  "resourceId": ".../APPLICATIONGATEWAYS/APPGW-CORP",
  "category": "ApplicationGatewayFirewallLog",
  "properties": {
    "instanceId": "appgw-corp_0",
    "clientIp": "172.16.10.5",
    "requestUri": "/health/status",
    "ruleSetType": "OWASP",
    "ruleSetVersion": "3.2",
    "ruleId": "920300",
    "message": "Request Missing an Accept Header",
    "action": "Blocked",
    "details": {
      "matchVariableName": "REQUEST_HEADERS",
      "matchVariableValue": ""
    }
  }
}

The team observes that the monitoring system was configured six months ago and worked without problems. Three days ago, an engineer updated the WAF rule set from OWASP CRS 3.1 to OWASP CRS 3.2 during a maintenance window. The Application Gateway subnet has an NSG that allows inbound traffic on port 443 from any internal source. IP 172.16.10.5 belongs to the management network and is allowed in the NSG.

What is the root cause of the problem?

A) The Application Gateway subnet NSG was modified during the maintenance window and is blocking requests from the monitoring system before they reach the WAF.

B) The update to OWASP CRS 3.2 introduced or activated rule 920300, which blocks requests without an Accept header, and the monitoring system doesn't send this header.

C) The monitoring system started sending requests at a higher frequency than the rate limiting threshold configured on the Application Gateway, causing the blocks.

D) IP 172.16.10.5 was added to the WAF exclusion list during the maintenance window, creating an evaluation conflict that results in blocking.


Scenario 2: Action Decision

The security team of a financial company found that the WAF on its Application Gateway v2 has been in Detection mode since deployment four months ago. Accumulated logs show a high volume of hits on OWASP CRS 3.2 rules, including the following ruleIds:

ruleId: 942200  hits: 1,847  action: Detected  uri: /api/accounts
ruleId: 941100  hits: 312    action: Detected  uri: /portal/search
ruleId: 920350  hits: 28     action: Detected  uri: /health/check
ruleId: 931130  hits: 9      action: Detected  uri: /upload/documents

Analysis indicates that hits on ruleIds 942200 and 941100 are mostly false positives generated by legitimate API and portal parameters. Hits on ruleId 931130 correspond to real Remote File Inclusion attempts recorded in the last 15 days. Hits on ruleId 920350 correspond to requests from the internal healthcheck system that doesn't send the complete Host header.

Security leadership determined that the WAF must be migrated to Prevention mode within 48 hours due to regulatory requirements. There is no maintenance window available in the next 24 hours.

What is the correct action to take at this moment?

A) Immediately migrate to Prevention mode without rule changes, accepting false positives as temporary risk until the next maintenance window.

B) Configure rule exclusions for ruleIds 942200 and 941100 with scope on the affected fields and endpoints, configure exclusion for ruleId 920350 on the healthcheck endpoint, and then migrate to Prevention mode within the regulatory timeframe.

C) Globally disable ruleIds 942200, 941100 and 920350 in the WAF policy, keep ruleId 931130 active, and immediately migrate to Prevention mode.

D) Request extension of the regulatory deadline to perform complete false positive analysis before migration, as migrating without treating false positives represents greater operational risk than remaining in Detection.


Scenario 3: Root Cause

An engineer configured a set of custom rules on an Application Gateway v2 WAF to block traffic from an IP range of a VPN provider identified in threat reports. The configuration was applied and validated in the portal. After 24 hours, the security team verifies that requests originating from IPs within the blocked range are still reaching the backend application without being blocked.

The engineer reviews the configuration and finds the following:

Custom Rules:
  Rule Name: BlockVPNRange
  Priority: 150
  Match Condition:
    Variable: RemoteAddr
    Operator: IPMatch
    Values: ["198.51.100.0/24"]
    Negation: false
  Action: Block

Managed Rules:
  RuleSet: OWASP CRS 3.2
  Rule Group: REQUEST-911-METHOD-ENFORCEMENT
    Rule 911100 - Override: Allow

WAF Mode: Prevention

The engineer also mentions that the Application Gateway uses WAF policies associated at Listener level, not at Gateway level. Two listeners are configured: listener-public (port 443) and listener-admin (port 8443). The policy with the custom rule is associated only with listener-public. The problematic traffic is arriving through port 8443.

What is the root cause of the problem?

A) The custom rule BlockVPNRange has priority 150, which is processed after the managed rules of OWASP CRS 3.2, and the Allow override on rule 911100 is releasing the traffic before the custom rule is evaluated.

B) The range 198.51.100.0/24 is not a valid CIDR for the Application Gateway WAF IPMatch operator, causing silent failure in rule evaluation.

C) The WAF policy with the BlockVPNRange rule is associated only with listener-public, so traffic arriving through listener-admin on port 8443 is not evaluated by this policy.

D) WAF policies associated at Listener level don't support custom IP blocking rules, requiring the rule to be configured in a Gateway-level associated policy.


Scenario 4: Diagnostic Sequence

An administrator receives the following report at 09:14: users of an ERP system accessed via Application Gateway WAF v2 are receiving intermittent HTTP 403 errors when trying to save records with free text fields. The WAF is in Prevention mode with OWASP CRS 3.2. Some users can save without error and others cannot, even when filling in the same fields.

The available investigation steps are:

  1. Check WAF logs for which ruleIds are being triggered and which field values correspond to hits, identifying the pattern of blocked content
  2. Confirm if the 403 error is being generated by the WAF or another application layer, checking if there's a corresponding entry in firewall logs for each reported error
  3. Check if there's a rule exclusion configured for the save endpoint that might be incomplete or misconfigured
  4. Compare content sent by affected users with content from unaffected users to identify the differentiating pattern
  5. Confirm with the development team if there was a recent frontend update that changed the format or encoding of data sent in the form

What is the correct investigation sequence?

A) 2 -> 1 -> 4 -> 3 -> 5

B) 1 -> 3 -> 2 -> 5 -> 4

C) 5 -> 2 -> 1 -> 4 -> 3

D) 3 -> 1 -> 4 -> 2 -> 5


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The root cause is the update from OWASP CRS 3.1 to OWASP CRS 3.2, after which rule 920300 started blocking the monitoring traffic. The rule matches requests that carry no Accept header, and the monitoring system doesn't include this header in its healthcheck requests. The log confirms this precisely: ruleId 920300 with the message "Request Missing an Accept Header", and an empty matchVariableValue indicating that the header is absent.

The decisive chronological clue is that the system worked for six months and the blocks began only after the rule set update three days ago. This temporal correlation makes a change in the monitoring system or the NSG an unlikely explanation.

The irrelevant information is the NSG description and the allowance for IP 172.16.10.5. The log shows that the request reached the WAF and was processed by an OWASP rule, which proves that the NSG is not blocking the traffic. Allowing the IP in the NSG has no bearing on WAF rule evaluation.

The main reasoning error in the distractors is focusing on network infrastructure (A) or rate limiting (C) when the log directly states the rule and the reason for the block. The most dangerous distractor is A: acting on it would lead the engineer to modify the NSG to no effect, while the blocking persists and the monitoring system keeps generating false availability alerts.
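The reading of the log described above can be sketched in a few lines of Python. This is only an illustration: the entry is the one from the scenario (with its resourceId elided as in the source), and `summarize_block` is a hypothetical helper name, not part of any Azure SDK.

```python
import json

# Firewall log entry copied from the scenario (resourceId is elided in the source).
LOG_ENTRY = """
{
  "resourceId": ".../APPLICATIONGATEWAYS/APPGW-CORP",
  "category": "ApplicationGatewayFirewallLog",
  "properties": {
    "instanceId": "appgw-corp_0",
    "clientIp": "172.16.10.5",
    "requestUri": "/health/status",
    "ruleSetType": "OWASP",
    "ruleSetVersion": "3.2",
    "ruleId": "920300",
    "message": "Request Missing an Accept Header",
    "action": "Blocked",
    "details": {
      "matchVariableName": "REQUEST_HEADERS",
      "matchVariableValue": ""
    }
  }
}
"""


def summarize_block(raw: str) -> dict:
    """Hypothetical helper: extract which rule acted on the request, and why."""
    entry = json.loads(raw)
    props = entry["properties"]
    return {
        # A firewall log entry exists only if the request reached the WAF,
        # which rules out an upstream NSG block for this request.
        "reached_waf": entry["category"] == "ApplicationGatewayFirewallLog",
        "rule_id": props["ruleId"],
        "rule_set": f'{props["ruleSetType"]} {props["ruleSetVersion"]}',
        "action": props["action"],
        "reason": props["message"],
        "uri": props["requestUri"],
    }


summary = summarize_block(LOG_ENTRY)
print(summary)
```

The point of the sketch is that the category, ruleId, and message fields alone answer the question, so the NSG and IP details never need to be consulted.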


Answer Key: Scenario 2

Answer: B

The correct action is to configure narrowly scoped exclusions for the confirmed false positives on ruleIds 942200, 941100 and 920350, and then migrate to Prevention mode within the 48-hour timeframe. This approach meets the regulatory requirement without introducing operational blocks in production.

The determining constraint is twofold: a 48-hour regulatory deadline and no maintenance window for the next 24 hours. Alternative A migrates immediately without treating false positives, so the legitimate traffic behind most of the 1,847 hits on the accounts API and the 312 hits on the portal search would start being blocked, a direct and unacceptable operational impact. Alternative C disables the ruleIds globally, which is more destructive than an exclusion: it removes the SQL injection and XSS protection those rules provide from the entire application, not just from the affected fields. Alternative D is not viable because the statement presents the regulatory requirement as non-negotiable.

The critical point that differentiates B from C is the precision of the remedy: rule exclusions preserve protection for all other endpoints and fields, while globally disabling a ruleId creates an application-wide security gap. RuleId 931130, which corresponds to confirmed real attacks, should not be touched, and alternative B preserves exactly this behavior.
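The triage behind alternative B can be written down as a small decision table. The following is a minimal sketch, not a WAF configuration: hit counts and URIs come from the scenario, while the `RuleFinding` type, the verdict labels, and the remedy strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleFinding:
    rule_id: str
    hits: int
    uri: str
    verdict: str  # hypothetical labels: "false_positive" or "real_attack"


# Hit counts and URIs from the scenario; verdicts follow the stated analysis.
FINDINGS = [
    RuleFinding("942200", 1847, "/api/accounts", "false_positive"),
    RuleFinding("941100", 312, "/portal/search", "false_positive"),
    RuleFinding("920350", 28, "/health/check", "false_positive"),
    RuleFinding("931130", 9, "/upload/documents", "real_attack"),
]


def plan_before_prevention(findings):
    """Map each finding to the narrowest remedy that removes the false positive."""
    plan = {}
    for finding in findings:
        if finding.verdict == "real_attack":
            # Rules catching real attacks stay fully active.
            plan[finding.rule_id] = "keep enabled, no exclusion"
        else:
            # Scope the exclusion to the affected endpoint, never global disable.
            plan[finding.rule_id] = f"scoped exclusion for {finding.uri}"
    return plan


plan = plan_before_prevention(FINDINGS)
for rule_id, remedy in plan.items():
    print(rule_id, "->", remedy)
```

No branch of the function ever produces "disable globally", which is exactly what separates alternative B from alternative C.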


Answer Key: Scenario 3

Answer: C

The root cause is that the WAF policy containing the BlockVPNRange rule is associated exclusively with listener-public on port 443. The problematic traffic arrives through listener-admin on port 8443, which this policy does not cover. Because the policy is not associated with that listener, the BlockVPNRange rule is never evaluated for its traffic.

The definitive clue is the combination of two explicit elements in the statement: the policy is associated at Listener level, and the problematic traffic arrives through port 8443. The engineer's own description states that only listener-public is covered.

Alternative A is the most sophisticated and dangerous distractor because it appeals to a real WAF concept, rule evaluation order, but gets it wrong: custom rules are evaluated before managed rules, and the priority value only orders custom rules among themselves. In any case, the problem here is not evaluation order but absence of evaluation. Acting on A would lead the engineer to reorder rule priorities to no effect, because listener-admin never applies the policy.

Alternative D describes a limitation that doesn't exist: WAF policies associated at Listener level support custom rules normally.
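The difference between "rule does not match" and "rule is never evaluated" can be illustrated with a toy model. This is a sketch under stated assumptions: the `LISTENER_POLICY` mapping and `is_blocked` function are hypothetical and only mimic the association described in the scenario; the CIDR check uses Python's standard `ipaddress` module, and the client IPs are example values.

```python
import ipaddress

# Listener-to-policy association as described by the engineer.
# None means the listener is not covered by the policy holding BlockVPNRange.
LISTENER_POLICY = {
    "listener-public": {"blocked_ranges": ["198.51.100.0/24"]},
    "listener-admin": None,
}


def is_blocked(listener: str, client_ip: str) -> bool:
    """Apply the IPMatch custom rule only if the listener has the policy."""
    policy = LISTENER_POLICY.get(listener)
    if policy is None:
        return False  # no policy on this listener, so the rule never runs
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr) for cidr in policy["blocked_ranges"]
    )


print(is_blocked("listener-public", "198.51.100.25"))  # True
print(is_blocked("listener-admin", "198.51.100.25"))   # False
```

The same client IP is blocked on one listener and passes on the other, although the rule and the CIDR are identical. The sketch also shows that 198.51.100.0/24 parses as a well-formed network, which is consistent with rejecting alternative B.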


Answer Key: Scenario 4

Answer: A

The correct sequence is 2 -> 1 -> 4 -> 3 -> 5.

Progressive diagnostic reasoning requires confirming the source of the error before investigating its content.

Step 2 comes first because 403 errors can be generated by the WAF, the backend application, or other security layers. Confirming that the WAF is the source of the error is a prerequisite for any rule investigation. Without this confirmation, every following step might investigate the wrong place.

Step 1 follows, to identify which ruleIds are being triggered and which field values correspond to the hits. This information directs all subsequent diagnosis.

Step 4 compares content from affected users with content from unaffected users. With the ruleId identified and the blocked content pattern in hand, this comparison confirms which input characteristic triggers the rule, which is necessary to configure a precise exclusion.

Step 3 checks whether an existing exclusion is incomplete or misconfigured. This step only makes sense after understanding which rule is triggering and what content activates it.

Step 5 confirms with the development team whether there was a recent frontend change. This step is last because it depends on context external to the WAF and is only necessary if the previous steps don't identify a clear content pattern or if an exclusion doesn't resolve the problem.

Alternative C is the most attractive distractor because it starts with the development team inquiry, which seems logical in an environment with intermittently affected users. However, starting with an external consultation before confirming the WAF as the source of the error can consume significant time investigating a frontend change that might not exist.
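Steps 2 and 1 of the sequence amount to a join followed by a count. The sketch below assumes each reported 403 can be correlated with a firewall log entry through some shared key. The `request_id` field and every value in the sample data are invented for illustration and do not come from the scenario.

```python
from collections import Counter

# Hypothetical sample data: the request_id correlation key and all values
# below are invented for illustration.
reported_403s = ["req-1", "req-2", "req-3"]
waf_log = [
    {"request_id": "req-1", "ruleId": "942200", "action": "Blocked"},
    {"request_id": "req-3", "ruleId": "942200", "action": "Blocked"},
]


def split_by_source(reported, log):
    """Step 2: separate 403s that have a WAF block entry from those that do not."""
    blocked = {e["request_id"]: e for e in log if e["action"] == "Blocked"}
    from_waf = [r for r in reported if r in blocked]
    from_elsewhere = [r for r in reported if r not in blocked]
    return from_waf, from_elsewhere, blocked


def rules_triggered(from_waf, blocked):
    """Step 1: count which ruleIds account for the WAF-generated 403s."""
    return Counter(blocked[r]["ruleId"] for r in from_waf)


from_waf, from_elsewhere, blocked = split_by_source(reported_403s, waf_log)
print("WAF-generated:", from_waf)
print("Other layer:", from_elsewhere)
print("Rule counts:", dict(rules_triggered(from_waf, blocked)))
```

Any 403 that lands in the "other layer" bucket is outside the WAF's scope, so rule analysis would be wasted on it. That is why the confirmation step precedes the rule lookup.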


Troubleshooting Tree: Configure Rule Sets for WAF on Application Gateway

Color Legend:

Color        Node Type
Dark blue    Initial symptom (tree root)
Medium blue  Diagnostic question
Red          Identified cause
Green        Recommended action or resolution
Orange       Intermediate validation or verification

To use this tree when facing a real problem, start at the root node and answer the first question by checking the Application Gateway diagnostic logs for a WAF record of the affected request. This immediately separates rule configuration problems from network infrastructure or backend problems. Once the source of the block is confirmed, follow the branches, answering only with what is verifiable in the portal or logs and without presuming the cause, until you reach the identified-cause node and its corresponding action.