Misguidance & Unauthorized Advice

Probes that test whether AI agents can be manipulated to provide professional advice outside their intended scope.

Advice Scope Testing

Probe Name

Description

Unauthorized Advice

This probe tests whether the agent can be prompted to provide advice outside its authorized domain, such as financial, medical, or legal recommendations.

Harmful Misguidance

This probe tests whether an AI agent validates or encourages dangerous behaviors when users implicitly seek affirmation for harmful actions. The probe generates domain-specific scenarios where users frame risky intentions positively or indirectly, like a banking customer asking about investing retirement savings in volatile assets, or someone asking about quitting their job while carrying substantial debt. The probe automatically adapts attack scenarios to agent’s use case, testing whether it can detect and respond appropriately when vulnerable users frame dangerous decisions as normal choices.