Proposal For Cryptographic Method to Rigorously Verify LLM Prompt Experiments
Overview

I propose, and present proof-of-concept code for, formally signing each stage of a turn[1] of interaction (or multiple turns) externally using asymmetric key signing (EdDSA[2]). This method directly addresses a recurring concern in discussions of prompt engineering, "context engineering"[3], and general LLM exploration: the chain of evidence is weak and difficult to verify with certainty.

Background

My plan for this project started with a simple YouTube short from Michael Reeves. In it, Michael "gaslights" an LLM by directly editing the text of past turns between prompts. This causes the LLM to suffer a general logic breakdown, because past turns now contain information inconsistent with the expected behavior of an RLHF[4] system. I realized this is a critical vulnerability for anyone trying to implement a zero-trust environment for using LLMs. If a malicious actor or application can convince the LLM that it said things in previous turns that it did not, the system may be fooled into revealing information from its training set that violates the use policy.

Plan of Attack

My thinking on this vulnerability identified three major constraints for such a zero-trust LLM wrapper:

- The user should be cryptographically prevented from altering both text and signature blocks.
- The entirety of the exchange (prompt, result, and, if available, chain of thought) should be protected.
- The verification process should be relatively lightweight and should not require per-user storage.

The solution that I believe satisfies all three criteria is to model the signing process on JWT[5] signing, with one extra detail: cross-object back-referencing.
In other words, each "block" of the exchange is signed independently, forming a directed acyclic graph of blocks originating at the prompt, where each block incorporates the previous block's signature text into its own signature.

The Cryptographic Braid is Born

While this design borrows concepts from the blockchain world, it is not strictly a chain but more of a braid. Only one node has no "previous" signature (by convention, the prompt), but there is not a single straight path from that prompt to every node in the resulting braid. Verification is now semi-stateless, and the braid as a whole provides its security guarantee iff:

- The private key used to generate the braid is ephemeral.
- Every signature referenced as "previous" by a block is already present in the braid (i.e., you cannot introduce an independent cycle inside a signed braid).
- There is exactly one "genesis" node, i.e., only one node may have an empty previous signature.

With these rules established, verification of such an artifact proceeds as follows:

1. Walk the entire braid and accumulate a dictionary of blocks keyed by their signatures. At each step, verify the payload plus the previous signature against the signature and public key, and enforce signature uniqueness.
2. During the walk, accumulate a list of nodes with no previous signature. Once every node has been visited, fail verification unless there is exactly one "genesis" node.
3. For every non-genesis node, verify that each previous signature appears among the keys of the flattened dictionary; if any does not exist, verification fails.
4. Walk the flattened graph once more and verify that each node is visited only once. This enforces the acyclic property of the graph.

There is an example implementation in Go within the project here[6].
Hypothesis

Provided this design survives peer review and is determined to be mathematically rigorous, it could serve as a protocol for future LLM interaction agents (even within companies like OpenAI or Anthropic) specifically designed to add rigor to any prompt or context study. Furthermore, it could serve as a check against "context hacking" in LLM APIs, where past turns might be edited server-side for reasons unknown. If widely adopted, discussion of pathologies in model behavior could achieve a degree of scientific and mathematical rigor not currently observed.

^ A turn is a single prompt -> response action when interacting with an LLM such as Gemini, Claude, ChatGPT, etc. It frames the LLM interaction as a "logic game" between the human operator and the LLM text generation.
^ Edwards-curve Digital Signature Algorithm: https://en.wikipedia.org/wiki/EdDSA
^ A term of my own creation, extending the concepts of prompt engineering to multi-turn LLM interactions.
^ Reinforcement Learning from Human Feedback, a widely accepted method for LLM behavior tuning: https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback
^ JSON Web Token: https://en.wikipedia.org/wiki/JSON_Web_Token
^ https://github.com/weberr13/ProjectIolite/blob/main/brain/decision.go