Two months before it changed its name to "Meta," Facebook CEO Mark Zuckerberg personally introduced us to his metaverse for work: Horizon Workrooms, envisioned as a virtual space for workers to collaborate. Today, the company announced it's shutting that space down: "Meta has made the decision to discontinue Workrooms as a standalone app, effective February […]
After the self-induced tumult Sonos went through last year, I can understand why some people are reluctant to spend money on the company’s products. But newly appointed CEO Tom Conrad has shown that he’s determined to get back on track and revitalize Sonos as the leading whole-home audio brand. The contentious mobile app is in […]
In September, Apple launched its latest fleet of smartwatches, including the Apple Watch Series 11, the SE 3, and the Ultra 3. Each wearable offers something a little different (their prices indicate their breadth of features), and we’re already starting to see big price drops. Additionally, we’re still recommending some recent predecessors in Apple’s portfolio, […]
Hey HN!
Wanted to show our open source agent harness called Gambit.
If you’re not familiar, agent harnesses are sort of like an operating system for an agent... they handle tool calling, planning, context window management, and don’t require as much developer orchestration.
Normally you might see an agent orchestration framework pipeline like:
compute -> compute -> compute -> LLM -> compute -> compute -> LLM
We invert this, so with an agent harness it’s more like:
LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM
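To make the inversion concrete, here is a rough sketch of a harness-style loop where the model picks each next step and plain compute only runs when the model asks for a tool. Everything in it (`runHarness`, `ModelStep`, `stubModel`, the `upper` tool) is made up for illustration and is not Gambit's actual API; the model is a deterministic stub so the example runs without a provider.

```typescript
// Hypothetical sketch: none of these names come from Gambit.
type ModelStep =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; text: string };

type Model = (transcript: string[]) => ModelStep;
type Tool = (input: string) => string;

// The model drives the loop; compute (tools) runs only on request.
function runHarness(
  model: Model,
  tools: Record<string, Tool>,
  userMessage: string,
  maxSteps = 8,
): string {
  const transcript: string[] = [`user: ${userMessage}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(transcript);
    if (next.kind === "final") return next.text;
    const tool = tools[next.name];
    const result = tool ? tool(next.input) : `unknown tool: ${next.name}`;
    transcript.push(`tool ${next.name}: ${result}`);
  }
  throw new Error("step budget exhausted");
}

// Stub standing in for a real LLM call: asks for one tool, then finishes.
const stubModel: Model = (transcript) => {
  const last = transcript[transcript.length - 1] ?? "";
  return last.startsWith("tool ")
    ? { kind: "final", text: `Done (${last})` }
    : { kind: "tool", name: "upper", input: "hello" };
};

const harnessTools: Record<string, Tool> = { upper: (s) => s.toUpperCase() };
const harnessAnswer = runHarness(stubModel, harnessTools, "shout hello");
```

The step budget is just one simple way to keep a model-driven loop from running forever; a real harness would also deal with context window limits, which this sketch ignores.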
Essentially you describe each agent as either a self-contained markdown file or a TypeScript program. Your root agent can bring in other agents as needed, and we create a typesafe way for you to define the interfaces between those agents. We call these decks.
Agents can call agents, and each agent can be designed with whatever model params make sense for your task.
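As a rough illustration of what a typed interface between agents could look like, here is a sketch in TypeScript. The `Deck` and `ModelParams` types, the model names, and both example agents are assumptions made up for this example, not Gambit's real types; the `run` bodies are stubs standing in for LLM calls.

```typescript
// Hypothetical sketch: these types are illustrative, not Gambit's API.
interface ModelParams {
  model: string;
  temperature: number;
}

// An agent with typed input and output, plus its own model params.
interface Deck<In, Out> {
  name: string;
  params: ModelParams;
  run(input: In): Out;
}

const summarizer: Deck<{ text: string }, { summary: string }> = {
  name: "summarizer",
  params: { model: "small-model", temperature: 0 },
  // Stub for an LLM call: keeps only the first sentence.
  run: ({ text }) => ({ summary: text.split(". ")[0] ?? text }),
};

// The root agent calls the child agent; the compiler checks both sides
// of the interface, so passing the wrong shape fails at build time.
const root: Deck<{ question: string; context: string }, { answer: string }> = {
  name: "root",
  params: { model: "large-model", temperature: 0.7 },
  run: ({ question, context }) => {
    const { summary } = summarizer.run({ text: context });
    return { answer: `${question} -> ${summary}` };
  },
};

const deckResult = root.run({
  question: "What is it?",
  context: "Gambit is a harness. It is open source.",
});
```

Here the child agent runs with a cheaper, deterministic setting while the root uses different params, which is the kind of per-agent tuning described above.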
Additionally, each step of the chain gets automatic evals, which we call graders. A grader is another deck type… but it’s designed to evaluate and score conversations (or individual conversation turns).
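For a feel of the shape a grader might take, here is a minimal sketch that scores either a whole conversation or each turn on its own. The `Turn`, `Grade`, and `Grader` types are hypothetical, and the regex check is a simple stand-in for whatever a real grader would do (presumably an LLM applying a rubric); it is not how Gambit implements graders.

```typescript
// Hypothetical sketch: a regex check standing in for a rubric-based grader.
type Turn = { role: "user" | "assistant"; text: string };
type Grade = { score: number; reason: string };
type Grader = (conversation: Turn[]) => Grade;

// Deliberately loose email pattern, good enough for a demo.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/;

// Scores 0 if any assistant turn contains something that looks like an email.
const noEmailLeak: Grader = (conversation) => {
  const leak = conversation.find(
    (turn) => turn.role === "assistant" && EMAIL.test(turn.text),
  );
  return leak
    ? { score: 0, reason: "assistant turn contains an email address" }
    : { score: 1, reason: "no email address found in assistant turns" };
};

// Grade each turn individually by treating it as a one-turn conversation.
function gradeTurns(conversation: Turn[], grader: Grader): Grade[] {
  return conversation.map((turn) => grader([turn]));
}

const sampleConversation: Turn[] = [
  { role: "user", text: "Who should I contact?" },
  { role: "assistant", text: "Reach out to jane@example.com." },
];

const conversationGrade = noEmailLeak(sampleConversation);
const turnGrades = gradeTurns(sampleConversation, noEmailLeak);
```

The user turn passes and the assistant turn fails, so per-turn grading points at where the leak happened rather than only flagging the conversation as a whole.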
We also have test agents you can define on a deck-by-deck basis, that are designed to mimic scenarios your agent would face and generate synthetic data for either humans or graders to grade.
Prior to Gambit, we had built an LLM-based video editor, and we weren’t happy with the results, which is what brought us down this path of improving inference-time LLM quality.
We know it’s missing some obvious parts, but we wanted to get this out there to see how it could help people or start conversations. We’re really happy with how it’s working with some of our early design partners, and we think it’s a way to implement a lot of interesting applications:
- Truly open source agents and assistants, where logic, code, and prompts can be easily shared with the community.
- Rubric-based grading to guarantee you (for instance) don’t leak PII accidentally.
- Spin up a usable bot in minutes and have Codex or Claude Code use our command line runner / graders to build a first version that is pretty good with very little human intervention.
We’ll be around if y’all have any questions or thoughts. Thanks for checking us out!
Walkthrough video: https://youtu.be/J_hQ2L_yy60
Comments URL: https://news.ycombinator.com/item?id=46641362
Points: 33
# Comments: 7
Ashley St. Clair, the mother of one of X owner Elon Musk's children, is suing his company for enabling its AI to virtually strip her down into a bikini without her consent. St. Clair is one of the many people over the past couple weeks who have found themselves undressed without permission by X's AI […]
There's a new name in charge of stewarding Star Wars at Lucasfilm. The studio just announced that Dave Filoni - best-known for his work on The Mandalorian and The Clone Wars - will be taking over as president. Former president Kathleen Kennedy, whose departure had been rumored for some time, will be stepping down and […]