Trust Pressure Ratio

Problem statement

How can we incentivize developers of AI-powered applications to invest in safety testing commensurate with their products' reach? And what would a label look like if it is to be easily understood yet still meaningful?

In this context, AI-based products are those in which a language model processes requests containing untrusted data. The underlying models typically have a system card or model card describing many technical metrics, but we rarely see safety-specific vitals or nutrition labels for the products through which they affect our lives. Is it really so hard for those applications to convey trust or risk objectively?

One approach could be to report a system’s level of safety testing relative to how wide its impact may be. I’d like to call that the "Trust Pressure Ratio".

The Trust Pressure Ratio

The rate of untrusted data an application presents to an AI model (and the trust users nevertheless place in that functionality) puts pressure on the safety of that application.

What’s holding that pressure back is the amount of safety testing carried out by that product’s organization and its supply chain.

The TPR (which must be recalculated regularly) is the volume of words or tokens presented to a model each month divided by the volume of tokens it was subjected to during safety testing. More use means more risk unless the system is also put under a proportionately larger amount of safety testing.

TPR = Input Rate (tokens/month) / Safety Testing Volume (tokens)

For context, tech companies selling AI-powered solutions that other companies use to serve their users typically charge by the number of requests, words, or "tokens" required to operate the service. This volume is already tracked for operational cost reasons, so reporting it generally requires no special work. Both companies also know whether they have performed safety testing and how much, since running safety evaluation frameworks incurs its own cost (every eval can require millions of tokens). If we track the money, we have the numbers.

Under this proposal, the final operator would simply divide the volume or cost of usage (e.g., 1M words or tokens/month) by the volume spent on internal safety testing (e.g., 500K words or tokens).

As a condensed variant, the standard could recommend applying the "decibel" formula to this ratio (10*log10). The 1M words/month project above would score a 3, and a project with similar testing but 50x more usage would score a 20, indicating greater pressure from trusting users on a system without a commensurate safety investment.
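
For those who prefer to see it in code, here is a minimal sketch of both variants in Python, assuming only the two inputs named above (the function and variable names are illustrative, not part of any proposed standard):

    import math

    def trust_pressure_ratio(input_tokens_per_month: float, safety_testing_tokens: float) -> float:
        # Raw TPR: monthly untrusted input volume divided by safety testing volume.
        return input_tokens_per_month / safety_testing_tokens

    def tpr_decibels(input_tokens_per_month: float, safety_testing_tokens: float) -> float:
        # Condensed variant: 10 * log10 of the raw ratio.
        return 10 * math.log10(trust_pressure_ratio(input_tokens_per_month, safety_testing_tokens))

    # Worked examples from above: 1M tokens/month against 500K testing tokens scores ~3,
    # and the same testing with 50x the usage scores 20.
    print(round(tpr_decibels(1_000_000, 500_000)))   # 3
    print(round(tpr_decibels(50_000_000, 500_000)))  # 20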

Benefits & Impact

  • As many studies highlighted in 2024, spending more tokens at runtime often surfaces riskier capabilities. More users of a product bring more creative, unexpected, or unsanctioned ways to use it.

  • Risk decision-makers can compare TPRs across projects and vendors, or even exchange ballpark figures in discussions with counterparts at competitors.

  • Cross-pollination of de-risking practices (such as expanded in-house testing) can emerge from these discussions, reining in outliers with insufficient testing.

  • Guidance to organizations and vendors could give ranges of recommended Trust Pressure Ratios, encouraging additional safety testing (on either side of the supply chain) or the withholding of certain capabilities/permissions (see the sketch after this list).

  • This method allows AI solution providers to right-size their safety testing using a soft criterion specific to their deployments’ exposure to untrusted data.
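
To make the guidance idea above more concrete, here is a sketch that maps a decibel-scale TPR onto coarse recommendation bands. The thresholds and wording are hypothetical placeholders for illustration, not proposed values:

    def tpr_guidance(tpr_db: float) -> str:
        # Map a decibel-scale TPR to a coarse recommendation band.
        # The thresholds below are hypothetical, for illustration only.
        if tpr_db <= 0:
            return "safety testing volume meets or exceeds monthly usage"
        if tpr_db <= 10:
            return "within a tolerable range; keep monitoring"
        if tpr_db <= 20:
            return "recommend additional safety testing"
        return "recommend withholding certain capabilities/permissions until testing catches up"

    print(tpr_guidance(3))   # within a tolerable range; keep monitoring
    print(tpr_guidance(20))  # recommend additional safety testing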

Feedback

Do you have methods to measure safety or risk in AI-powered applications? Please share them below!