LLMProxy

Guarding generative AI traffic against undesirable uses

 
 

In march 2023, I needed enterprise controls

to apply semantic policies on traffic to / from LLMs. With such guardrails, my goal was to de-risk and therefore accelerate adoption of generative AI. I asked someone from OpenAI if this was in the works, and received no reply, and the ChatGPT enterprise offering (launched august 2023) still didn’t appear to include these guardrails

so I needed to create it, and bought llm-proxy.com (october 2023)

to rapidly prototype a detection of prompt injection. Cloudflare Workers offered an immediate chance to leverage embeddings as a proof of concept (input similarity scores). In less than a day, I had some prevention!

How it works

api.llm-proxy.com transparently supports traffic to OpenAI, and the API key applies a policy to request and backend response text. If those match a rule, the request or response gets replaced with a static error message. A default prompt injection rule is applied to all traffic at this time. You can find my Cloudflare Workers code on github.

To test LLMProxy, just change the hostname in your application from api.openai.com to api.llm-proxy.com: cloudflare will apply the guardrail and forward permitted requests/responses transparently.

To customize the rules that apply for each API key in a UI, I intended to build an administration page, however…

… now everyone does it!

OpenAI provided a cookbook in December 2023 on how to build input/output guardrails, using decoder transformers rather than encoder-only embeddings.

I also found similar functionality available off the shelf from prompt.security that I encourage you to check out.