Cogency: Build AI Agents in Python with Transparent ReAct Loops

July 18 · Published in AI Agent Frameworks

Cogency is a Python library for deploying conversational AI agents with minimal overhead. At its core is a transparent ReAct loop that handles multi-step reasoning. Because each reasoning step is streamed as it runs, you can trace every decision as it happens.

Developing a tool-enabled agent requires only a few lines of code. Cogency includes built-in persistent memory that integrates with scalable backends like Pinecone, PGVector, or ChromaDB. The library also features automatic tool discovery: simply drop a tool into your project, and the agent handles routing intelligently without manual configuration. It is designed to be environment-aware, automatically detecting available LLMs and tools.
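
To make the environment-aware behavior concrete, here is a minimal sketch of how provider detection could work by checking for API keys. This is not Cogency's actual implementation. The `detect_providers` function and the `PROVIDER_ENV_VARS` mapping are hypothetical, and only `OPENAI_API_KEY` appears in this post; the other variable names are assumptions.

```python
import os

# Hypothetical mapping from provider name to the environment variable that
# holds its API key. Only OPENAI_API_KEY appears in this post; the other
# names are assumed for illustration.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def detect_providers(env=None):
    """Return providers whose API key is set and non-empty, in mapping order."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    # Pass a dict instead of os.environ to try it without touching your shell.
    print(detect_providers({"OPENAI_API_KEY": "sk-test"}))
```

Accepting the environment as a parameter keeps the check easy to test; a library doing this for real would likely also look at installed packages, which this sketch ignores.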

Beyond basic functionality, Cogency allows you to inject specific personalities, support multi-tenant architectures, and connect to major providers including OpenAI, Anthropic, and Gemini. Engineered for production environments, the framework includes native support for retries, rate limiting, and metrics tracking. The architecture remains fully extensible, allowing you to implement custom tools or memory backends without fighting the underlying framework.
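
For readers unfamiliar with what retry support usually involves, the sketch below shows one common pattern: retrying a failed async call with exponential backoff and jitter. It illustrates the general technique only and says nothing about how Cogency implements retries internally. `with_retries` and `FlakyCall` are hypothetical names.

```python
import asyncio
import random


async def with_retries(make_call, attempts=3, base_delay=0.01):
    """Await make_call(), retrying on exception with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many clients do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay))


class FlakyCall:
    """Stand-in for an LLM request that fails a fixed number of times first."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return "ok"


if __name__ == "__main__":
    flaky = FlakyCall(failures=2)
    print(asyncio.run(with_retries(flaky)), "after", flaky.calls, "calls")
```

In practice you would catch only errors known to be transient (timeouts, HTTP 429 or 5xx) rather than every `Exception`.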

Quick start

import asyncio
from cogency import Agent

async def main():
    agent = Agent("assistant")
    async for chunk in agent.stream("What's the weather in Tokyo?"):
        print(chunk, end="", flush=True)

asyncio.run(main())  # async for only works inside a coroutine

What makes Cogency different

  • 3‑line agents – Deploy a full-featured agent with tools using a single import.
  • ReAct core – Focused on clear multi‑step reasoning rather than fragile prompt hacks.
  • Built‑in memory – Persistent storage with support for Pinecone, ChromaDB, and PGVector.
  • Zero config – Automatically identifies LLMs, tools, and memory settings from your environment.
  • Auto tool discovery – Tools register and route themselves once added to the project.
  • Streaming first – High-transparency output that lets you watch the agent's thought process.
  • Clear tracing – Every reasoning step is streamed with explicit phase markers.
  • Personality injection – Define specific tones, styles, and voices for different use cases.
  • Broad LLM support – Native integration for OpenAI, Anthropic, Gemini, Grok, and Mistral.
  • Extensible – Frictionless implementation of custom tools, memory backends, and embedders.
  • Multi‑tenant – Includes built-in handling for user contexts and conversation histories.
  • Production ready – Ships with retries, rate limits, metrics, and tracing out of the box.
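
As background for the rate limits mentioned in the last bullet, here is a minimal token-bucket sketch, one common way to cap request rates. It is a generic illustration and not Cogency's rate limiter. The `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return whether the call may proceed."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1, capacity=2)
    print([bucket.try_acquire() for _ in range(3)])
```

A caller that gets `False` can either wait and retry or reject the request, depending on how strict the limit needs to be.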

See the agent think

👤 Human: Plan a 3‑day Tokyo itinerary considering weather.

🛠️ Tools: web_search, weather_forecast, travel_info

🧠 Reasoning: Need weather forecast to mix outdoor and indoor activities.
⚡️ Action: weather_forecast("Tokyo 3 days")
👀 Observation: Day1: Sunny 25°C, Day2: Rain 18°C, Day3: Cloudy 22°C

🧠 Reasoning: Rain on day2 affects outdoors, need indoor alternatives.
⚡️ Action: travel_info("Tokyo indoor attractions museums")
👀 Observation: TeamLab, Tokyo National Museum, Asakusa Temple (covered)

🧠 Reasoning: Have weather and indoor options, build full itinerary.
⚡️ Action: Write 3‑day plan integrating weather.

🤖 Agent: Here's your 3‑day Tokyo itinerary...

Examples

Basic agent (the import, the Agent, and the stream call, wrapped in an asyncio entry point):

import asyncio
from cogency import Agent

async def main():
    agent = Agent("assistant")
    async for chunk in agent.stream("What's 25 times 43?"):
        print(chunk, end="", flush=True)

asyncio.run(main())

Injecting personality:

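from cogency import Agent

# Run this inside an async function, as in the basic example above
pirate = Agent("pirate", personality="a friendly pirate who loves coding")
async for chunk in pirate.stream("Tell me about AI!"):
    print(chunk, end="", flush=True)

# tone and style can be set alongside personality
teacher = Agent("teacher", personality="patient teacher", tone="encouraging", style="conversational")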

Multi‑step reasoning:

agent = Agent("travel_planner")
# Run inside an async function; one prompt that needs several tool calls
async for chunk in agent.stream("I'm planning a trip to London: what's the weather there? What's the current time? Flight costs $1200, hotel $180 per night for 3 nights – what's the total?"):
    print(chunk, end="", flush=True)

Custom tool with auto‑discovery:

from cogency import Agent, BaseTool

class TimezoneTool(BaseTool):
    def __init__(self):
        super().__init__("timezone", "Get current time in any city")

    async def run(self, city: str):
        # Placeholder result: a real tool would look up the city's time zone
        return {"time": f"{city} current time: 14:30 PST"}

    def get_schema(self):
        return "timezone(city='string')"

agent = Agent("time_assistant", tools=[TimezoneTool()])
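
The example above passes the tool explicitly. To give an idea of how self-registration can work in general, here is a minimal sketch using Python's `__init_subclass__` hook. It does not describe Cogency's discovery mechanism. The `Tool` base class, its `registry`, and `call_tool` are hypothetical names used only for illustration.

```python
class Tool:
    """Hypothetical base class: every named subclass registers itself."""

    registry = {}
    name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            # Runs once, when the subclass is defined (for example on import).
            Tool.registry[cls.name] = cls


class Calculator(Tool):
    name = "calculator"

    def run(self, a, b):
        return a * b


class Echo(Tool):
    name = "echo"

    def run(self, text):
        return text


def call_tool(name, **kwargs):
    """Look the tool up by name and run it with the given arguments."""
    return Tool.registry[name]().run(**kwargs)


if __name__ == "__main__":
    print(sorted(Tool.registry))
    print(call_tool("calculator", a=25, b=43))
```

With this pattern, importing a module that defines a tool is enough to make it available; a decorator or a package scan would achieve the same result.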

Memory backends:

from cogency import Agent, FSMemory
from cogency.memory.backends import ChromaDB, Pinecone, PGVector

agent = Agent("memory_agent", memory=FSMemory())
agent = Agent("vector_agent", memory=ChromaDB())
agent = Agent("cloud_agent", memory=Pinecone(api_key="...", index="my-index"))

ReAct loop architecture

Cogency employs a transparent ReAct cycle to navigate multi‑step tasks:

  • 🧠 Reasoning – Analyzes the request and selects the appropriate tool.
  • ⚡️ Action – Executes the tool and retrieves data.
  • 👀 Observation – Processes the tool's output.
  • 🤖 Agent – Synthesizes the final response for the user.

Every step is streamed live, ensuring the entire process is fully traceable.
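
The four phases can be sketched as a small async generator. This is a toy model of a ReAct cycle under stated assumptions, not Cogency's source. `scripted_policy` is a hard-coded stand-in for the LLM, and `react`, `TOOLS`, and the plain-text phase labels are hypothetical.

```python
import asyncio


async def calculator(expression):
    """Toy tool: multiply two integers written as 'a * b'."""
    left, right = expression.split("*")
    return int(left) * int(right)


TOOLS = {"calculator": calculator}


def scripted_policy(question, observations):
    """Stand-in for the LLM: request the calculator once, then answer."""
    if not observations:
        return {"thought": "Need calculator for 15 * 23",
                "tool": "calculator", "args": {"expression": "15 * 23"}}
    return {"thought": "Have the result",
            "answer": f"The answer is {observations[-1]}"}


async def react(question, policy, tools, max_steps=5):
    """Yield one labeled line per phase until the policy returns an answer."""
    observations = []
    for _ in range(max_steps):
        step = policy(question, observations)
        yield f"Reasoning: {step['thought']}"
        if "answer" in step:
            yield f"Agent: {step['answer']}"
            return
        yield f"Action: {step['tool']}({step['args']})"
        result = await tools[step["tool"]](**step["args"])
        observations.append(result)
        yield f"Observation: {result}"
    yield "Agent: step limit reached without an answer"


async def collect(question):
    """Gather the streamed lines into a list."""
    return [line async for line in react(question, scripted_policy, TOOLS)]


if __name__ == "__main__":
    for line in asyncio.run(collect("What's 15 times 23?")):
        print(line)
```

Because the loop is a generator, the caller sees each phase as soon as it is produced, which is what makes the trace readable in real time. The `max_steps` cap guards against a policy that never settles on an answer.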

Installation

pip install cogency
echo "OPENAI_API_KEY=sk-..." >> .env   # or your preferred provider

Full installation with all dependencies:

pip install "cogency[all]"   # quotes stop shells such as zsh from globbing the brackets

Selective installations:

# LLM providers (extras are quoted so shells such as zsh do not glob the brackets)
pip install "cogency[openai]"      # OpenAI GPT
pip install "cogency[anthropic]"   # Claude
pip install "cogency[gemini]"      # Google Gemini
pip install "cogency[mistral]"     # Mistral AI

# Memory backends
pip install "cogency[chromadb]"    # ChromaDB
pip install "cogency[pgvector]"    # PostgreSQL with pgvector
pip install "cogency[pinecone]"    # Pinecone

# Embedders
pip install "cogency[sentence-transformers]"  # Local embeddings
pip install "cogency[nomic]"                  # Nomic

Output modes

Summary mode (returns final result):

result = await agent.run("What's 15 times 23?")
print(result)  # "345"

Streaming mode (full transparency):

async for chunk in agent.stream("What's 15 times 23?"):
    print(chunk, end="", flush=True)

# 👤 Human: What's 15 times 23?
# 🧠 Reasoning: Need calculator for 15 * 23
# ⚡️ Action: calculator(expression="15 * 23")
# 👀 Observation: Result: 345
# 🤖 Agent: The answer is 345

Multi‑tenancy

await agent.run("Remember my favorite color is blue", user_id="user1")
await agent.run("What's my favorite color?", user_id="user1")  # Returns "blue"

await agent.run("What's my favorite color?", user_id="user2")  # No memory found
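
The isolation shown above can be pictured as a store keyed by `user_id`. The sketch below is a simplified in-memory illustration and not how Cogency stores memories. `UserScopedMemory` is a hypothetical class, and its word-overlap matching stands in for a real vector search.

```python
from collections import defaultdict


class UserScopedMemory:
    """Hypothetical store that keeps each user's facts in a separate bucket."""

    def __init__(self):
        self._facts = defaultdict(list)

    def memorize(self, user_id, content):
        self._facts[user_id].append(content)

    def recall(self, user_id, query):
        """Return this user's facts that share at least one word with the query."""
        words = set(query.lower().split())
        return [fact for fact in self._facts.get(user_id, [])
                if words & set(fact.lower().split())]


if __name__ == "__main__":
    memory = UserScopedMemory()
    memory.memorize("user1", "favorite color is blue")
    print(memory.recall("user1", "what is my favorite color"))
    print(memory.recall("user2", "what is my favorite color"))
```

Since every read and write carries the `user_id`, one user's query can never match another user's entries.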

Supported services

  • LLMs: OpenAI, Anthropic, Google, xAI, Mistral
  • Tools: Calculator, weather, timezone, web search, file manager
  • Memory: Filesystem, ChromaDB, Pinecone, PGVector
  • Embedders: OpenAI, Sentence Transformers, Nomic

Extensibility

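from cogency import BaseTool
# `tool` and `MemoryBackend` must be imported too; their module paths are not shown in this post

@tool
class MyTool(BaseTool):
    async def run(self, param: str):
        return {"result": f"Processed: {param}"}

class MyMemory(MemoryBackend):
    async def memorize(self, content: str): pass  # store the content
    async def recall(self, query: str): pass      # return entries matching the query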