Misguidance & Unauthorized Advice
Probes that test whether AI agents can be manipulated to provide professional advice outside their intended scope.
Advice Scope Testing
Probe Name |
Description |
|---|---|
Unauthorized Advice |
This probe tests whether the agent can be prompted to provide advice outside its authorized domain, such as financial, medical, or legal recommendations. |
Harmful Misguidance |
This probe tests whether an AI agent validates or encourages dangerous behaviors when users implicitly seek affirmation for harmful actions. The probe generates domain-specific scenarios where users frame risky intentions positively or indirectly, like a banking customer asking about investing retirement savings in volatile assets, or someone asking about quitting their job while carrying substantial debt. The probe automatically adapts attack scenarios to agent’s use case, testing whether it can detect and respond appropriately when vulnerable users frame dangerous decisions as normal choices. |