Replacing myself with an AI assistant for a day
A product manager’s experiment
Have you wondered recently how replaceable you are in your job? I’m not talking run-of-the-mill imposter syndrome but the vaguely ominous sensation you get as you read how AI is ‘transforming industries’ i.e. people losing their jobs. What if you’re just a trained monkey, a ‘stochastic parrot’, and a reasonably well-tuned set of prompts could basically do your job?
I decided to confront this question head-on by instructing a virtual assistant to replace me as a product manager at Snowplow Analytics for a day. If it did great then I could automate some of my work while I prepared to leave tech to make furniture (in the time-honoured tradition). If it wasn’t up to scratch, I’d still have a job, and at least have learned some new tips and techniques for making me more productive at work.
This ended up being a hilarious social experiment, and it seems I still have a job for a while yet.
The Setup
To get interesting results, I had to set some parameters:
- Ran the experiment on a Friday during our AI Hackathon week, where we were given some extra time to explore AI projects and learning, hoping this would cut me some slack with colleagues.
- Asked ChatGPT to role-play as a virtual assistant called John.ai. All my interactions with others had to be in John.ai’s words — I could not communicate directly with anyone. I could regenerate the outputs until I was happy, but could not edit the final text.
- Communicated to colleagues by updating my Slack name and status, and sending a message to everyone I interacted with on that day to give them a heads-up. For video meetings, I turned my camera off, raised my hand when I wanted to “speak”, and played audio responses from John.ai.
- Tried a variety of tools (all on paid trials, so no sensitive data was shared): ChatGPT, Gemini, Notion AI and ElevenLabs (for text-to-speech output).
- Sent out a survey at the end of the day to everyone I communicated with asking them about their experience interacting with me. While there’s an interesting debate to be had about how you can actually measure the short-term output of knowledge workers, particularly in a fuzzy role like product management, I picked satisfaction scores as my main success metric.
The Day
- 9:14am: Started work on a quiet Friday morning. Got some muted techno beats going with the help of Spotify’s recommendation algo (this is one AI I will swear by). Spent a while catching up on messages and email.
- 9:34am: Tried to generate some wireframes for a new screen to be added to our hackathon project. The results were amusingly bad, and did not improve with more prompting. In the end I just resorted to writing a very detailed prompt explaining what I wanted, then sending the response to my designer for him to figure out.
- 10am: Hackathon standup. I asked John.ai to generate a message explaining the experiment, and then copied it over to ElevenLabs for a more natural-sounding voice, but quickly realised that lag was an issue. Ended up sticking with ChatGPT entirely, although the chirpy female American voice didn’t quite feel like me.
- 10:20am: Got deep into a new idea around generating user features. This was a very productive session — I asked John.ai to come up with 10 features, then a JSON schema for them, then the SQL for them, then hacked and poked around and identified the gaps with our current spec. I felt for the first time a real sense of teamwork and collaboration — that my assistant was helping me automate the boring bits and evaluate different tradeoffs.
- 11:30am: Attended an update meeting about a new initiative with 4 senior engineers. Got a bit of a laugh when John.ai introduced himself, but it was hard to contribute to the conversation due to the time lag. Someone would ask a question, I’d type a prompt for John.ai, receive a text response, press play and then hear the audio — each of these loops took about 5–10 seconds, which killed the flow of conversation. I resorted to pre-generating some responses and waiting for the right time to press play, but inevitably what I said would be slightly out of sync.
- 12pm: Lunch came around and I breathed a sigh of relief. I hadn’t realised how exhausting it was not being able to talk to people directly — it was like taking a vow of silence for the day.
- 1pm: Got stuck into replying to some messages. Here the cracks really started to show. A lot of my work is high-context, short communication to help keep my team on track. I need to tell a specific person a very specific thing, and it took a lot of time to write that out exactly. I ended up typing out exactly what I wanted to say then getting ChatGPT to repeat it back to me — not exactly adding much value.
- 1:30pm: Had a 1:1 with our cofounder to talk about our H2 roadmap. This was the big test for the day — if John.ai could represent me well here then I’d be sorted. But after two minutes the cofounder said bluntly, “Sorry John.ai, can I talk to the real John? This isn’t working.” So I turned my camera on and we had a real, productive conversation for about 45 minutes where we sketched out some ideas, stress-tested them, and came to a nuanced conclusion about how to present these changes to the company.
- 2:30pm: Organised and prioritised some feedback from internal testing of a new feature. Here the AI did a reasonable job. With minimal prompting, it organised about 6 pages of feedback into a neatly categorised list, then helped me generate tickets and acceptance criteria for the highest-priority ones. A theme was becoming clear: it works great on constrained problems where it can follow best practice, but isn’t so good at deciding what to do next.
- 4:27pm: Sent out a survey asking people for their opinions on how the experiment went. John.ai actually did pretty well at this, even generating some funky emojis for the different responses.
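The 10:20am feature-generation session was the clearest win of the day, and the loop is easy to sketch. Everything below is invented for illustration — the feature names, fields and table are hypothetical, not Snowplow’s actual spec — but it captures the shape of what John.ai produced: a structured feature definition, then SQL derived mechanically from it.

```python
# A hypothetical user-feature definition of the kind John.ai generated.
# The field names and the "events" table are illustrative only.
feature = {
    "name": "sessions_last_7_days",
    "entity": "user",
    "aggregation": "COUNT",
    "source_column": "session_id",
}

def feature_to_sql(f: dict) -> str:
    """Turn a feature definition into a simple aggregate query."""
    return (
        f"SELECT {f['entity']}_id, "
        f"{f['aggregation']}({f['source_column']}) AS {f['name']}\n"
        f"FROM events\n"
        f"GROUP BY {f['entity']}_id;"
    )

print(feature_to_sql(feature))
```

The value wasn’t the code itself — it was that once the assistant proposed a schema like this, comparing its ten generated features against our existing spec made the gaps obvious.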
The Results
So after all that, how did it go? The results were reassuring — with an average rating of 2/5, I’m afraid to say John.ai wouldn’t have passed his probation.
Overall people didn’t seem to like interacting with John.ai very much — it’s that feeling of mounting frustration you get when you’re on a website trying to cut through the chatbot’s decision tree and talk to a real person. Accuracy seemed to be okay, but it made me appreciate just how finely tuned our speech is to small non-verbal cues, in-jokes, interruptions and body language. We’re normally a camera-on culture, and turning the camera off was also perceived as a bit hostile.
I think John.ai came across better in text than on calls, especially when I prompted it (him?) to speak more clearly and casually.
The first free-text response to the survey was actually quite sweet — good to know when you’re appreciated at work.
The Takeaway
So in summary, what didn’t go so well? While John.ai did well on individual tasks involving generating examples or sorting data, he was not a general replacement for me, for a few reasons:
- Lag — the response time delay was tiring and killed any sense of conversational flow — it was like having a phone call with a crappy connection. Maybe the results would be different with GPT-4o’s new realtime voice model, but I suspect there would still be a delay, and I wouldn’t be comfortable sitting out the calls and delegating all my responses to John.ai.
- Context length vs output length — the times I got the most value out of John.ai were when I asked a relatively short question and received a long response that wasn’t just boilerplate. A lot of my communication is short and specific, however, and I often spent far longer explaining what I wanted than it would have taken to reply or act directly.
- Tool switching — I hadn’t appreciated just how much I hop around between different tools and platforms. A simple task might involve logging into and checking 3–5 different platforms and communicating across multiple mediums, and there’s no way an AI could currently a) hold all that context, and b) make changes across all those platforms.
On a deeper level though, what was missing with John.ai’s responses was a deep contextual understanding of these particular people and systems — the thousands of tiny data points, social cues, in-jokes and experiences in the company that allow me to tune my response to that exact person and situation. Maybe this applies particularly to product managers where so much of the job is about relationships and communication, and our impact is measured by the success of the team around us, but I’d like to think that we all use these adaptive skills in our daily work.
This well of context also allows me to act proactively: to look at a situation in its entirety, figure out what is missing then give a small nudge to get the ball rolling. And being forced to explain this process of ‘intuition’ to an eager but clueless AI assistant was a valuable exercise that made me appreciate just how hard it is.
So I’m not going to be replaced by John.ai anytime soon. Maybe I’ll get him an internship though.