Why AI Isn't Like Poker
I recently read Thinking in Bets by Annie Duke, who writes about lessons she learned as a professional poker player and how they can be applied to business strategy. She discusses how players use strategy and experience to make moves with the highest probability of success — but luck and unknown variables can still lead to losses.
Just because the outcome was bad doesn’t necessarily mean the player was wrong. It’s crucial to distinguish between a bad decision and bad luck.
Investing is a lot like poker in this way — and so is building a startup. We make strategic calls based on available information, knowing there’s inherent risk. That’s the buy-in of being in the game.
Why AI Is a Different Kind of Game
AI doesn’t work the same way.
Three years ago, if you got a bad response from ChatGPT, you could assume it was the model's fault. But now, LLMs have become so powerful that outcomes depend far more on how you use them.
That’s both empowering and terrifying. Your team’s success in leveraging AI depends entirely on the quality of your agents and prompts.

Building Better Agents
Have you ever used the same prompt in ChatGPT twice and gotten wildly different results? So have we.
That’s why our R&D strategy around AI at Hedgineer is two-fold:
1. Build good AI.
2. Systematically evaluate AI to ensure it remains consistently good.
The second part is usually harder.
Because LLMs are intrinsically non-deterministic, engineering reliable agents is a nuanced problem. To solve this, we built the Hedgineer Evaluator — a framework for testing and improving AI tools before they reach users.
We evaluate metrics such as:
Test pass rate
Number of turns (iterations) to reach the correct answer
Token cost efficiency
With this feedback loop, we can pinpoint where an agent misunderstands a query and iteratively close that gap.
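To make those metrics concrete, here's a minimal sketch of what an evaluation harness for agents might look like. This is illustrative only, not the actual Hedgineer Evaluator: the `EvalResult` schema and the sample numbers are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """Outcome of running an agent against one test case (hypothetical schema)."""
    passed: bool       # did the agent's final answer match the expected answer?
    turns: int         # iterations needed to reach that answer
    tokens_used: int   # total tokens consumed, a proxy for cost

def summarize(results: list[EvalResult]) -> dict:
    """Aggregate per-case results into the three headline metrics."""
    n = len(results)
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "avg_turns": sum(r.turns for r in results) / n,
        "avg_tokens": sum(r.tokens_used for r in results) / n,
    }

# Made-up results from running an agent over a fixed suite of test queries.
results = [
    EvalResult(passed=True, turns=1, tokens_used=1200),
    EvalResult(passed=True, turns=3, tokens_used=4100),
    EvalResult(passed=False, turns=5, tokens_used=9800),
    EvalResult(passed=True, turns=2, tokens_used=2300),
]
summary = summarize(results)
print(summary)  # pass_rate 0.75, avg_turns 2.75, avg_tokens 4350.0
```

Running the same suite after each prompt or tool change turns "the agent feels better" into numbers you can compare release over release.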

Building Better Prompts
Once our agents achieve consistent results, minimal hallucinations, and efficient costs, we deploy them to the client for beta testing. At this stage, there's often a learning curve: users have to figure out how to ask the agent for what they want. Too little information gives the agent too much room for interpretation, and too much information overwhelms it. Crafting the right prompt is both a science and an art.
Our engineers help accelerate that process through:
Continuous feedback loops during beta testing
Reviewing conversation histories to improve prompts
Hosting teach-ins for common pitfalls
Building a shared prompt library of tried and tested examples
As teams learn to ask better questions, the AI’s effectiveness compounds.

Looking Forward: Using AI to Use AI
In our journey of building AI solutions for our clients, we’ve learned that getting those tools to work well is a two-way effort. It’s our job to ship the best agents, connectors, and workflows. But it’s their job to learn how to use them. There’s no silver bullet (yet) to ease that learning curve.
But one thing we accrue over time is a library of logs on how users interact with our products. For example, let's say we build a custom Claude Desktop connector for your firm and turn it on for everyone at the company. We'll have logs containing the questions users asked Claude, what answers they got back, and how much back-and-forth was necessary to arrive at a satisfactory conclusion. Now what if we fed that data back into an LLM to generate individually curated feedback on how the user could prompt better? We're still in the early stages of R&D on this, and there are a lot of technical and non-technical nuances to think through. But the preliminary prototyping is promising.
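To make the idea a bit more concrete, here's a rough sketch of the first half of such a pipeline: turning a user's conversation logs into a coaching prompt for an LLM. The log schema and function names are invented for illustration, and the actual LLM call is deliberately left out, since the right client depends on your stack.

```python
def build_feedback_prompt(user_logs: list[dict]) -> str:
    """Assemble a per-user coaching prompt from conversation logs.

    user_logs: list of dicts with 'question' and 'turns' keys
    (a simplified, hypothetical log schema).
    """
    transcript = "\n".join(
        f"Q: {log['question']} (resolved in {log['turns']} turns)"
        for log in user_logs
    )
    return (
        "You are a prompting coach. Based on the questions below and how "
        "many back-and-forth turns each took to resolve, suggest how this "
        "user could write clearer first prompts.\n\n" + transcript
    )

# Two made-up log entries: a vague prompt that took many turns,
# and a specific one that resolved immediately.
logs = [
    {"question": "show positions", "turns": 6},
    {"question": "List all open FX positions as of today, grouped by desk",
     "turns": 1},
]
prompt = build_feedback_prompt(logs)
# `prompt` would then be sent to an LLM of choice to generate the
# individually curated feedback described above.
```

The interesting design question is what signal to surface: turn counts are cheap to compute, but you could just as well feed in full answer transcripts or user ratings if your logs capture them.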
If anyone has thoughts on the idea or experience trying out something similar, reply to this email to let me know! I can share some responses in the next edition.

