Secure conversational replay buffers: A proposal for turning conversations into posts more efficiently
I love conversation. Riding on the latent energy that develops over minutes and hours, it’s easy for me to riff expansively about the ideas I love, and listen as my friends do the same. It’s usually in such moments that I realize that an idea deserves to be written down and developed.
I’m grateful I’ve had many such experiences already, one week into Inkhaven. But something seems not quite as good as it could be: how those moments get turned into writing. When I sit down later, usually I can recapture my ideas with some effort. But it can feel like wasted effort, when I think of the fluency and ease with which I covered all the points within the flow of the original conversation.
Some experienced writers may see it as a total non-issue. I get it: the more conversations I have, the more socially integrated I become, and the more I follow the advice to “write first, edit later”, the easier it is to write as though I am still in conversation, recapturing something like my original energy and fluency. I can already feel a change for the better, after a week of Inkhaven. But there’s still friction, and I’m not sure that my current verbal-first orientation will ever give way to a situation where writing is entirely as satisfying as speaking.1
A broader skill may have always depended on certain other skills, but that does not imply that those other skills (as currently conceived) should be seen as universal prerequisites to the broader skill. Consider the skill of getting from point A to point B. At one time we used horses for this, but it is no longer necessary to brush and speak softly to most forms of transportation. Nor is it necessary any longer to crank a car to start it, nor to keep maps in the glove compartment, nor even to micromanage the process of navigation by sitting at the steering wheel and manually driving the car.
The same is true for writing. Of course I can take notes during a conversation. I can even proactively ask to record the entire thing, or sections of it when I suspect I am about to riff productively. But both of these actions disrupt the vibe, and may make participants self-conscious in a way that interferes with expression. And the parts of a conversation most worth capturing are often only evident with a little hindsight, so “just record everything” trades the original problem for the extra effort of going back and sifting through recordings afterwards.
Now, consider the following vision: I’m having a conversation with a friend. We speak at length and easily. Eventually we hit on a subject that’s dear to me. I riff about it for a minute (or five) and then think “wow, that was a good framing/phrasing, I wish I could keep what I just said as a draft for a post, but also I want to keep exploring this and not abandon the conversation for more than 10 seconds to stare at my notebook/laptop”. No problem. I pick up my device, and tap one or two UI elements. My friend is then presented with a dialog and does something similar. Immediately, the transcript of just that part of the conversation appears on my device.
Importantly, in this vision, the disruption to the conversation is minimal and comes at a moment of closure and not one of suspense or anticipation. And when I sit down to write my post, I have to do no additional work to get back into my words.2
How do we achieve something like this? By designing a system that is always recording, but that verifiably discards everything that the participants do not explicitly consent to keep. I’ll call this a secure conversational replay buffer, or SCRB3.
In the rest of this post, I’ll briefly outline the technical, legal, and vibe challenges of such a system. If my initial framing above sets off any privacy red flags for you then you’re in good company, and you should keep reading. But note first that the version of this system we could legally prototype today (e.g. as just a phone app) would not be as seamless as a version built on special hardware with baked-in cryptographic verification of what information was discarded. Unless stated otherwise, I will assume that we are talking about an actual cryptographically robust SCRB; how this might be achieved is outlined near the end of the post.
Vibe Concerns
The idea for this proposal first came to me this morning in conversation with some of the other Inkhaven residents. They responded a little uneasily. This is understandable, given that the proposal involves an intentional change to consent dynamics, which should make us at least a little wary. I also had not thought through the cryptographic elements yet, which are central to the privacy concerns expressed by the residents.
There are two psychological concerns I want to point out here. The first is un-self-consciousness, which is one thing this proposal intends to address. You should not have to spend effort pre-emptively worrying about asking for permission at the right time, and potentially jarring the vibe, before the moment you feel good about what you have said. If technology can make this possible in a secure way, it seems like a clear win for vibes.
The second: even if users claim to be comfortable with the cryptographic protections, they might still be uncomfortable with the always-on recording, at least at first. This might seem internally inconsistent, assuming verification of discards is genuinely baked in. However, formal guarantees are one thing, and user acclimation is another. It’s understandable that when recording is somehow happening all the time, and the user is not an expert on the cryptographic methods, they might be anxious or hesitant until the technology has been socially as well as technically proven.
This feels adjacent to a notion of plausible deniability. You tacitly assume that nobody is making clandestine (perhaps illegal, depending on jurisdiction) recordings of your conversations, even though you probably aren’t taking strict measures to ensure this is not the case. If designed properly, SCRBs would be no less secure in this sense than the conversations you are already having.
Of course, it remains to be seen how well SCRBs would be socially proven, and whether observer effects and self-consciousness would persist regardless of technical guarantees.
Legality
Let’s take California as an example, because I’m currently at Inkhaven in Berkeley.
California is an all-party consent state; see California Penal Code § 632. It is a crime there to record any “confidential communication” without prior consent from all parties to the communication, which makes it hostile territory for this kind of technology. Here’s the definition of “confidential communication” from § 632 (emphasis mine):
“confidential communication” means any communication carried on in circumstances as may reasonably indicate that any party to the communication desires it to be confined to the parties thereto, but excludes a communication made in a public gathering or in any legislative, judicial, executive, or administrative proceeding open to the public, or in any other circumstance in which the parties to the communication may reasonably expect that the communication may be overheard or recorded
If you refer to the rest of the section, you’ll find that it’s the act of recording that’s central to the crime. It doesn’t matter if we immediately delete the recording, or do voice separation to only keep one of the parties, or if the recording happens over the phone or the internet. Consent is required to record at all.
Now, I am not a legal expert, but here’s my reading of how this would apply to a genuinely cryptographically robust SCRB: we are recording, even if it is an encrypted recording with verifiable discards. In California, as soon as we use such a system without obtaining prior consent, we’re committing a crime. But this is a novel situation, because by any reasonable measure, zero harm is being done, given the SCRB’s discard guarantee. So one would have to fight the criminal charge in a higher court, and attempt to establish a precedent about the use of such a harmless class of recording system.
One additional concern is whether a device might accidentally capture the confidential communication of a passerby. My understanding is shaky here, but as far as I can tell, this would be no more of a problem than it already is for consent-to-record generally, because I am neither a party to, nor intentionally intercepting, the passerby’s communication.
It’s worth pointing out at this point that not all jurisdictions are like California. For example, in Canada (including Quebec), recording is legal if you are a participant in the communication or have the consent of one of its parties; see Criminal Code of Canada, § 184(2)(a). However, it is appropriate to design our system to do zero harm according to the strictest jurisdiction in which its users (or parties to their conversations) will reside. This legal constraint aligns with the high-trust design we intend for SCRBs anyway.
In the context of legality and in particular civil liability, it’s worth mentioning Vitalik Buterin’s concept of control as liability:
every bit of control you have is a liability: you might be regulated because of it. If you exhibit control over your users’ cryptocurrency, you are a money transmitter. If you have “sole discretion over fares, and can charge drivers a cancellation fee if they choose not to take a ride, prohibit drivers from picking up passengers not using the app and suspend or deactivate drivers’ accounts”, you are an employer. If you control your users’ data, you’re required to make sure you can argue just cause, have a compliance officer, and give your users access to download or delete the data.
If you are an application builder, and you are both lazy and fear legal trouble, there is one easy way to make sure that you violate none of the above new rules: don’t build applications that centralize control.
A robust SCRB would already record only the minimal data its users had consented to keep, which limits exposure to unforeseen liability. We might take this insight further and not provide any cloud service at all to accompany a hypothetical SCRB helper app.
Mechanism design
The State of the Art
Of course, there are good old audio recordings.4 I mostly covered their limitations in the introduction. You could improve the granularity of recording, e.g. by getting prior consent once, and then only holding down a record button during moments you want to record, which is minimally disruptive. Still, you have to remember to get consent before you press the button. And the moment at which you decide to record is prior to actually stating your ideas, not after, so filtering of material is not built in as it is with SCRBs.
Lifelogging is the practice of making personal records of one’s life, which can involve such technology as wearable microphones or cameras. Bloggers have been writing about it for well over a decade now.5 While the degree of coverage (e.g. text versus audio versus video) varies between users, generally the interest is not in a highly selective, salience-based, secure recording system, but instead a total record of one or more aspects of life. Of course, this does not escape the legal issues with good old audio recordings.
Contemporary wearables like the Limitless Pendant or Amazon’s Bee AI are similar, in that they use an always-on design that relies on post-hoc filtering, and do not use a selective cryptographic system to obviate prior consent.
The closest technology to SCRBs in terms of UX is a replay buffer, such as used in live streaming or music production. Basically, data is stored to a circular buffer, which allows the user the post-hoc ability to capture data that had been recorded in the past N seconds/minutes, with anything older being continuously discarded. Existing systems are intended for individual use, however, and aren’t concerned with multi-party consent.
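To make the replay-buffer mechanic concrete, here is a minimal sketch in Python. All names are hypothetical, and the “chunks” stand in for whatever audio frames a real device would buffer; the point is only the data structure: continuous discard past a horizon, plus post-hoc capture of a recent window.

```python
from collections import deque
import time


class ReplayBuffer:
    """Minimal circular replay buffer: keeps only the last `horizon_s`
    seconds of (timestamp, chunk) pairs; older chunks are continuously
    discarded as new ones arrive."""

    def __init__(self, horizon_s):
        self.horizon_s = horizon_s
        self._chunks = deque()  # (timestamp, chunk) pairs, oldest first

    def push(self, chunk, now=None):
        """Append a new chunk and evict anything past the horizon."""
        now = time.monotonic() if now is None else now
        self._chunks.append((now, chunk))
        while self._chunks and now - self._chunks[0][0] > self.horizon_s:
            self._chunks.popleft()

    def capture(self, last_s, now=None):
        """Post-hoc: return only the chunks from the last `last_s` seconds."""
        now = time.monotonic() if now is None else now
        return [c for t, c in self._chunks if now - t <= last_s]
```

A real SCRB would layer encryption and verifiable discard on top of this; the buffer itself is the easy part.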
How to do better
This section will be necessarily very brief in this preliminary proposal, given time limitations here at Inkhaven and also my limited familiarity with the cryptographic systems literature.
We tentatively adopt Vitalik Buterin’s notion of privacy pools. In Buterin’s case, this means using zero-knowledge proofs (ZKPs) to demonstrate that your funds don’t originate from unlawful sources, without revealing your entire transaction history to the entity that requires such verification from you. In principle, a robust SCRB would be designed to produce analogous cryptographic attestations that it had in fact overwritten (part of) its buffer at a given time T.
There are a number of relatively more incidental decisions that would need to be made about hardware and software design. Presumably the user interface for retention/consent decisions would either be baked in to the hardware itself (e.g. buttons and a separate screen on a small recording device) or else we’d use some kind of helper app running on a paired device, such as a smart phone. I won’t discuss this more now, as it seems quite open-ended and I imagine the specifics would be decided by better designers than me.
I have the seeds of some other ideas in my mind, but these are entirely speculative at this point. For example, we could use sandboxed language models, whose operation might also be subject to cryptographic guarantees of some kind, to decide on “smart boundaries” instead of relying on fixed discard horizons. In principle this could offer the user richer retention choices than “last 1 min” or “last 5 mins”, which rely on the user remembering how long they have been talking. However, this means transferring some information about the content of the conversation to a visual display or helper app prior to the user’s consent, and that seems to introduce the possibility of harm.
Conclusion
This is just a preliminary outline of a proposal, written in a few hours at Inkhaven, by someone who is not a legal or cryptographic professional.
It’s possible that there are deep issues with getting the technical side to work; I do not yet understand the cryptographic mechanisms well enough to say with confidence. In any case it would be an engineering challenge that some startup or other organization would need to take on, while accepting the likelihood of a criminal legal challenge to establish the precedent of these systems as zero-harm. That such a challenge is likely in at least one jurisdiction suggests that great care should be taken to ensure the systems actually are zero-harm and function as advertised before that challenge arrives, to avoid potentially quite protracted legal complications.
Subvocal recording devices, or something like them, could entirely obviate the need for prior consent / SCRBs in common circumstances. Some such devices are in active development, and they are interesting in that they might genuinely capture only my voice and nobody else’s. This might satisfy California’s statute by design, assuming we have a clear answer to the question of whether the device records any signal from parties other than the wearer. But they do not necessarily let you defer the retention decision to the moment of emotional closure, with filtering coming for free, nor do they necessarily encrypt anything or provide any other guarantees. So even if we adopt these devices, we may still want an SCRB-like framework for better security and hygiene.
I remain optimistic that I might live to see SCRB-like systems make our writing lives more about ideas and words, and less about distracting chores.
I get that this analogy is imperfect, and that “making steering automatic” might make someone think I mean to “make writing automatic” in general. But no, that’s not quite what I mean. I know that writing is a more complex task than driving, and that what we want is to marginalize chores and distractors, not eliminate the human from the loop entirely.
This also allows for a type of feedback that I might otherwise not have gotten. Just because I was pleased with my words with a few seconds of hindsight does not mean I will remain so enamoured. When I revisit my words, I can see exactly what I said, alongside perhaps a quick rating of how I felt about it at the time. With practice, seeing this information might help me become more calibrated, on the short-hindsight horizon, about whether I should be pleased with what I’ve just said. But I’m not sure about this point.
Conveniently pronounced “scrub”.
Podcasts are perhaps worth addressing, but they serve a somewhat different purpose and are even more intentional than a simple audio recording of a conversation.
