Muscle Memory: Cache AI Agent Actions to Cut LLM Costs and Latency

May 16 · Published in AI Tools

Muscle Memory is a behavior cache for AI agents. The idea is simple: the system records the sequence of tool calls an agent makes while solving a task. When that task reappears, it replays the recorded steps instead of querying the LLM. This cuts latency and token costs and makes repeated runs follow the same steps every time. If the environment has changed or an edge case does not match anything in the cache, the system falls back to the original agent logic.

This is not a standalone agent framework. Rather, it is a plugin designed to integrate with your existing agents. Its primary focus is cache validation, allowing you to define specific "Checks" that determine whether a stored tool sequence is safe to reuse in a new context.

The primary advantage is the elimination of redundant LLM processing. Many recurring tasks can effectively function as scripts; Muscle Memory identifies these patterns and moves them out of the expensive inference loop.

How It Works

  1. Environment Check: The system uses "Checks" to determine whether the current state is known (a cache hit) or unknown (a cache miss).

  2. Task Execution:

    • Cache hit: The stored behavior trajectory is replayed without calling the LLM.
    • Cache miss: The task is passed to your agent and processed normally.

  3. Data Collection: On a cache miss, the agent's tool calls are recorded and saved as a new trajectory in the cache for future use.
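
The three steps above can be sketched in plain Python. This is a toy illustration, not Muscle Memory's implementation: `ToyBehaviorCache`, `is_known`, and the `(tool name, arguments)` step format are all hypothetical names chosen for the example, and the "agent" is an ordinary function standing in for an LLM-driven one.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: (tool name, positional arguments).
Step = Tuple[str, tuple]


class ToyBehaviorCache:
    """Toy hit/miss/record loop; not the muscle_mem implementation."""

    def __init__(self, tools: Dict[str, Callable], agent: Callable,
                 is_known: Callable[[str], bool]):
        self.tools = tools          # tool name -> callable
        self.agent = agent          # fallback logic; returns the steps it took
        self.is_known = is_known    # stand-in for the "Checks"
        self.trajectories: Dict[str, List[Step]] = {}

    def run(self, task: str) -> bool:
        steps = self.trajectories.get(task)
        # Steps 1 and 2a: a stored trajectory plus a passing check is a hit.
        if steps is not None and self.is_known(task):
            for name, args in steps:
                self.tools[name](*args)
            return True
        # Steps 2b and 3: on a miss, run the agent and record what it did.
        self.trajectories[task] = self.agent(task, self.tools)
        return False


log: List[str] = []
tools = {"greet": lambda name: log.append(f"hello {name}")}


def agent(task: str, tools: Dict[str, Callable]) -> List[Step]:
    # Pretend an LLM chose this single step.
    steps: List[Step] = [("greet", (task,))]
    for name, args in steps:
        tools[name](*args)
    return steps


cache = ToyBehaviorCache(tools, agent, is_known=lambda task: True)
first = cache.run("erik")    # miss: the agent runs and is recorded
second = cache.run("erik")   # hit: the recorded steps are replayed
```

The real library validates each tool call through its Checks rather than a single `is_known` function; the sketch only shows the control flow.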

The Key Piece: Cache Validation with Checks

Safe reuse depends entirely on rigorous cache validation. For any given tool, you must identify which environmental features confirm that a cached version is still valid. You define these via Checks, which consist of two parts:

  • capture: A callback that extracts specific features from the current environment.
  • compare: A callback that evaluates whether the current environment matches the cached state. It returns either a boolean or a similarity score.

You can attach Checks to tools as pre-processing (to verify conditions before execution) or post-processing (to verify results after).

The following example demonstrates using a timestamp to validate the cache for a hello tool:

from dataclasses import dataclass
from muscle_mem import Check, Engine
import time

engine = Engine()

@dataclass
class T:
    name: str
    time: float

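# capture: snapshot the features of the environment that matter for reuse
def capture(name: str) -> T:
    now = time.time()
    return T(name=name, time=now)

# compare: decide whether a cached snapshot is still valid right now
def compare(current: T, candidate: T) -> bool:
    diff = current.time - candidate.time
    return diff <= 1  # Cache remains valid for 1 second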

@engine.tool(pre_check=Check(capture, compare))
def hello(name: str):
    time.sleep(0.1)
    print(f"hello {name}")

def agent(name: str):
    for i in range(9):
        hello(name + " + " + str(i))

engine.set_agent(agent)

# First run: results in a cache miss
cache_hit = engine("erik")
assert not cache_hit

# Second run: results in a cache hit
cache_hit = engine("erik")
assert cache_hit

# Wait 3 seconds: the cache expires, resulting in a miss
time.sleep(3)
cache_hit = engine("erik")
assert not cache_hit
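
A timestamp is only one possible feature. As a further sketch, the pair of callbacks below validates against the contents of a file instead: `capture` hashes the file and `compare` requires an exact match. The `FileState` type, the `settings.txt` file, and the idea of a tool that takes a `path` argument are assumptions made for illustration. The callbacks are exercised on their own here, without the library.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileState:
    path: str
    digest: str


def capture(path: str) -> FileState:
    # Snapshot the file's contents as a SHA-256 digest.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return FileState(path=path, digest=digest)


def compare(current: FileState, candidate: FileState) -> bool:
    # Reuse is allowed only if the path and contents are unchanged.
    return current == candidate


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.txt")
    with open(path, "w") as f:
        f.write("mode=fast")
    cached = capture(path)                    # state stored with the trajectory

    unchanged = compare(capture(path), cached)

    with open(path, "w") as f:
        f.write("mode=safe")                  # the environment changes
    changed = compare(capture(path), cached)
```

Assuming a tool whose argument is the file path, these callbacks could presumably be wrapped as Check(capture, compare) and passed as pre_check in the same way as the timestamp example.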

Using Muscle Memory

Install

pip install muscle-mem

Core Components

1. Engine

The Engine acts as a wrapper for your agent. It serves as the central controller that executes tasks, manages the cache, and decides whether the LLM needs to be invoked.

from muscle_mem import Engine
engine = Engine()
engine.set_agent(your_agent)  # Bind your custom agent logic
engine("do some task")        # Execute a task with caching enabled

2. Tools

Mark each tool function with the @engine.tool decorator. The decorator lets the engine record every call and its arguments in the behavior cache.

@engine.tool()
def hello(name: str):
    print(f"hello {name}!")

hello("world")  # This specific call is logged to the cache
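
To make the recording idea concrete, here is a minimal sketch of how a decorator can log call arguments before running the wrapped function. `ToyRecorder` and its `calls` list are hypothetical; the source does not describe how Muscle Memory's own decorator is implemented, so treat this as one possible approach and not the library's internals.

```python
import functools
from typing import Any, Callable, List, Tuple


class ToyRecorder:
    """Toy stand-in for an engine that records tool calls."""

    def __init__(self) -> None:
        # Each entry: (function name, positional args, keyword args).
        self.calls: List[Tuple[str, tuple, dict]] = []

    def tool(self) -> Callable:
        def decorator(fn: Callable) -> Callable:
            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                # Log the call first, then run the real tool.
                self.calls.append((fn.__name__, args, kwargs))
                return fn(*args, **kwargs)
            return wrapper
        return decorator


recorder = ToyRecorder()


@recorder.tool()
def hello(name: str) -> str:
    return f"hello {name}!"


greeting = hello("world")
```

A recorded list like `recorder.calls` is the kind of data a trajectory could be replayed from on a later cache hit.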