NeuralAgent: An Open-Source AI Agent for Native Desktop Automation

July 28 · Published in Automation Tools

NeuralAgent is a native desktop assistant that executes multi-step workflows from natural language commands. Unlike a traditional chatbot, it acts directly on your operating system: it simulates keystrokes, moves the mouse, navigates browsers, fills out forms, and sends emails. The agent can operate in the foreground or run tasks silently in the background.

Desktop automation is powered by pyautogui. Background browser automation is currently supported only on Windows, where it runs through the Windows Subsystem for Linux (WSL). NeuralAgent integrates with several model providers, including Claude, GPT-4, Azure OpenAI, Amazon Bedrock, Ollama, and Google Gemini. A set of modular agents, specializing in planning, classification, and task suggestion, analyzes both text input and real-time screen content to determine the next step. The backend is built on FastAPI and the frontend on Electron and React, and because the project is open source, every layer can be modified.
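
To make the pyautogui layer concrete, here is a minimal sketch of how a decided "next step" could be turned into desktop input. The action dictionary format (`type`, `x`, `y`, `text`, `keys`) and the function names are hypothetical and are not taken from the NeuralAgent codebase; only the `click`, `write`, `press`, and `hotkey` calls correspond to real pyautogui functions. The `gui` parameter is passed in explicitly so the logic can be exercised without a display.

```python
from typing import Any, Dict, List


def execute_action(action: Dict[str, Any], gui: Any) -> None:
    """Dispatch one action dict (hypothetical schema) to a pyautogui-like object."""
    kind = action.get("type")
    if kind == "click":
        # pyautogui.click accepts x and y screen coordinates
        gui.click(x=action["x"], y=action["y"])
    elif kind == "type":
        # pyautogui.write types a string, pausing `interval` seconds between keys
        gui.write(action["text"], interval=action.get("interval", 0.02))
    elif kind == "key":
        # pyautogui.press taps a single named key such as "enter"
        gui.press(action["key"])
    elif kind == "hotkey":
        # pyautogui.hotkey presses a key combination such as ctrl+s
        gui.hotkey(*action["keys"])
    else:
        raise ValueError(f"unsupported action type: {kind!r}")


def run_plan(plan: List[Dict[str, Any]], gui: Any) -> int:
    """Execute every action in order and return how many were executed."""
    for action in plan:
        execute_action(action, gui)
    return len(plan)
```

With pyautogui installed and a display available, passing the module itself (`import pyautogui; run_plan(plan, pyautogui)`) would send real input to the desktop, so a plan like this is best tried on a scratch window first.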

Official site: www.getneuralagent.com

Key Features

  • Desktop Automation: Native control via pyautogui.
  • Background Processes: Browser-focused automation for Windows users via WSL.
  • Broad Model Support: Integration with Claude, GPT-4, Azure OpenAI, Bedrock, Ollama, and Gemini.
  • Modular Architecture: Dedicated agents for planning, classification, task suggestion, and title generation.
  • Multimodal Perception: The agent interprets on-screen visuals alongside user instructions.
  • Modern Stack: Powered by FastAPI, Electron, and React.

Prerequisites

Before beginning the installation, ensure the following software is installed on your system:

Tool           Purpose                                   Recommended Version
Python         Backend and local AI agent processes      >= 3.9
PostgreSQL     Relational database for data persistence  >= 13
Node.js + npm  Electron and React frontend components    Node >= 18, npm >= 9

NeuralAgent is compatible with Windows, macOS, and Linux. Note that background browser control via WSL is currently exclusive to Windows.
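
The version requirements above can be checked with a short script before starting the installation. This is a sketch under stated assumptions: it assumes `node`, `npm`, and `psql` are on the PATH and print their version when called with `--version`, and it uses the `psql` client version as a stand-in for the PostgreSQL server version, which may differ on some systems. The function names are illustrative.

```python
import re
import shutil
import subprocess
import sys
from typing import Dict, Optional, Tuple

# Minimum versions taken from the prerequisites table.
MINIMUMS = {"python": (3, 9), "node": (18,), "npm": (9,), "psql": (13,)}


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted number from text, e.g. 'v18.17.0' -> (18, 17, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Compare only as many components as the minimum specifies."""
    return tuple(found[: len(minimum)]) >= minimum


def installed_version(command: str) -> Optional[Tuple[int, ...]]:
    """Run '<command> --version' and parse the result; None if not installed."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout or result.stderr)


def check_prerequisites() -> Dict[str, bool]:
    """Return a mapping of tool name to whether it satisfies the minimum."""
    report = {"python": meets_minimum(sys.version_info[:3], MINIMUMS["python"])}
    for command in ("node", "npm", "psql"):
        found = installed_version(command)
        report[command] = found is not None and meets_minimum(found, MINIMUMS[command])
    return report
```

Calling `check_prerequisites()` returns a dictionary such as `{"python": True, "node": False, ...}`, where `False` means the tool is missing or older than the recommended version.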

Installation and Setup

You will need two terminal windows: one to host the backend server and another for the desktop application.

Backend Configuration

  1. Initialize a virtual environment (recommended):

    cd backend
    python -m venv venv
    # Activation:
    # macOS/Linux: source venv/bin/activate
    # Windows: venv\Scripts\activate
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Database Setup: Create a local PostgreSQL database. Ensure the PostgreSQL service is running before proceeding.

  4. Environment Configuration: Copy .env.example to a new file named .env and provide your specific credentials:

    DB_HOST=
    DB_PORT=
    DB_DATABASE=
    DB_USERNAME=
    DB_PASSWORD=   # Leave blank if not required
    DB_CONNECTION_STRING=
    JWT_ISS=NeuralAgentBackend
    JWT_SECRET=   # Generate a unique random string
    REDIS_CONNECTION=   # Optional
    
    # Amazon Bedrock
    AWS_ACCESS_KEY_ID=
    AWS_SECRET_ACCESS_KEY=
    BEDROCK_REGION=us-west-2
    
    # Azure OpenAI
    AZURE_OPENAI_ENDPOINT=
    AZURE_OPENAI_API_KEY=
    OPENAI_API_VERSION=2024-12-01-preview
    
    # OpenAI / Anthropic
    OPENAI_API_KEY=
    ANTHROPIC_API_KEY=
    
    # Google Gemini
    GOOGLE_API_KEY=
    
    # Ollama (Local)
    OLLAMA_URL=http://127.0.0.1:11434
    
    # Agent Model Assignments (set each *_MODEL_TYPE to exactly one of the listed values, not the whole list)
    CLASSIFIER_AGENT_MODEL_TYPE=openai|azure_openai|anthropic|bedrock|ollama|gemini
    CLASSIFIER_AGENT_MODEL_ID=gpt-4.1
    TITLE_AGENT_MODEL_TYPE=openai|azure_openai|anthropic|bedrock|ollama|gemini
    TITLE_AGENT_MODEL_ID=gpt-4.1-nano
    SUGGESTOR_AGENT_MODEL_TYPE=openai|azure_openai|anthropic|bedrock|ollama|gemini
    SUGGESTOR_AGENT_MODEL_ID=gpt-4.1-mini
    PLANNER_AGENT_MODEL_TYPE=openai|azure_openai|anthropic|bedrock|ollama|gemini
    PLANNER_AGENT_MODEL_ID=gpt-4.1
    COMPUTER_USE_AGENT_MODEL_TYPE=openai|azure_openai|anthropic|bedrock|ollama|gemini
    COMPUTER_USE_AGENT_MODEL_ID=us.anthropic.claude-sonnet-4-20250514-v1:0
    
    # Screenshot logging for training (Disabled by default)
    ENABLE_SCREENSHOT_LOGGING_FOR_TRAINING=false
    AWS_DEFAULT_REGION=us-east-1
    AWS_BUCKET=
    
    # LangSmith Tracing
    LANGCHAIN_TRACING_V2=false
    LANGCHAIN_ENDPOINT=
    LANGCHAIN_API_KEY=
    LANGCHAIN_PROJECT=
    
    # Google Authentication (Optional)
    GOOGLE_LOGIN_CLIENT_ID=
    GOOGLE_LOGIN_CLIENT_SECRET=
    GOOGLE_LOGIN_DESKTOP_REDIRECT_URI=http://127.0.0.1:36478
    
  5. Apply Database Migrations:

    alembic upgrade head
    
  6. Launch the Backend Server:

    uvicorn main:app --reload --host 0.0.0.0 --port 8000
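
The environment file from step 4 can be sanity-checked before running the migrations. The sketch below parses `.env` text, reports required keys that are still empty, and generates a random value suitable for `JWT_SECRET`. The set of "required" keys is an assumption made for illustration (the backend may accept `DB_CONNECTION_STRING` instead of the individual `DB_*` values), and the parser is deliberately minimal: it treats ` #` as the start of an inline comment and does not handle quoted values.

```python
import secrets
from typing import Dict, List

# Assumption: keys the backend is expected to need; adjust to your setup.
REQUIRED_KEYS = (
    "DB_HOST",
    "DB_PORT",
    "DB_DATABASE",
    "DB_USERNAME",
    "JWT_ISS",
    "JWT_SECRET",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comment lines, and inline comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing comments such as "JWT_SECRET=   # Generate a unique random string"
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def new_jwt_secret(num_bytes: int = 48) -> str:
    """Generate a URL-safe random string from the OS cryptographic source."""
    return secrets.token_urlsafe(num_bytes)
```

For example, `missing_keys(parse_env(open(".env").read()))` lists what still needs a value, and the output of `new_jwt_secret()` can be pasted into `JWT_SECRET`.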
    

Frontend and Desktop Application Setup

  1. Install Electron dependencies:

    cd desktop
    npm install
    
  2. Install the React application's dependencies:

    cd neuralagent-app
    npm install
    
  3. Set Frontend Environment Variables: Copy .env.example to .env within the neuralagent-app directory and fill in the values:

    REACT_APP_PROTOCOL=http
    REACT_APP_WEBSOCKET_PROTOCOL=ws
    REACT_APP_DNS=127.0.0.1:8000
    REACT_APP_API_KEY=
    
  4. Return to the desktop root directory:

    cd ..
    
  5. Initialize the AI Agent Service (Python):

    cd aiagent
    python -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate
    pip install -r requirements.txt
    deactivate
    
  6. Start the Desktop Application:

    cd ..
    npm start
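
If the desktop application cannot reach the backend, it helps to confirm that the address configured in step 3 is the one the backend is listening on. The sketch below assumes the frontend combines the variables as `<protocol>://<dns>`, which is a guess based on the variable names rather than something the source confirms. It probes `/docs`, the path where FastAPI serves its interactive documentation by default; a backend that disables that page will answer with an HTTP error, which the check still treats as "server is running".

```python
import urllib.error
import urllib.request
from typing import Mapping, Tuple


def backend_urls(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (http_base, websocket_base) built from the REACT_APP_* values."""
    host = env["REACT_APP_DNS"].strip().rstrip("/")
    http_base = f"{env['REACT_APP_PROTOCOL'].strip()}://{host}"
    ws_base = f"{env['REACT_APP_WEBSOCKET_PROTOCOL'].strip()}://{host}"
    return http_base, ws_base


def backend_is_up(http_base: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at http_base, False otherwise."""
    try:
        with urllib.request.urlopen(f"{http_base}/docs", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False
```

With the example values from step 3, `backend_urls` yields `http://127.0.0.1:8000` and `ws://127.0.0.1:8000`, which match the `--port 8000` used when launching uvicorn.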
    

Specialized Agents and Providers

NeuralAgent lets you assign each agent its own provider and model through the *_AGENT_MODEL_TYPE and *_AGENT_MODEL_ID variables in the backend .env file. You can mix providers such as OpenAI, Anthropic, or a local Ollama instance, depending on whether speed or privacy matters more for a given agent.

Available Agent Types:

  • Planner Agent: Formulates the steps required to complete a task.
  • Classifier Agent: Categorizes the type of intent or data processed.
  • Title Agent: Generates descriptive titles for active sessions.
  • Suggestor Agent: Recommends follow-up actions or improvements.
  • Computer Use Agent: The core engine that handles direct mouse and keyboard interaction.
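
The per-agent assignment described above can be sketched as a small resolver that reads the `*_AGENT_MODEL_TYPE` and `*_AGENT_MODEL_ID` variables and rejects unknown providers. The variable names and provider values come from the .env example earlier in this guide; the `AgentModel` type and `resolve_agent_models` function are hypothetical and do not reflect how the NeuralAgent backend actually loads its configuration.

```python
from typing import Dict, Mapping, NamedTuple

# Provider values listed in the .env example.
PROVIDERS = {"openai", "azure_openai", "anthropic", "bedrock", "ollama", "gemini"}
# Agent prefixes used by the *_AGENT_MODEL_TYPE / *_AGENT_MODEL_ID variables.
AGENTS = ("CLASSIFIER", "TITLE", "SUGGESTOR", "PLANNER", "COMPUTER_USE")


class AgentModel(NamedTuple):
    provider: str
    model_id: str


def resolve_agent_models(env: Mapping[str, str]) -> Dict[str, AgentModel]:
    """Read and validate the provider/model pair for every agent."""
    resolved: Dict[str, AgentModel] = {}
    for agent in AGENTS:
        provider = env.get(f"{agent}_AGENT_MODEL_TYPE", "").strip()
        model_id = env.get(f"{agent}_AGENT_MODEL_ID", "").strip()
        if provider not in PROVIDERS:
            # Also catches the placeholder "openai|azure_openai|..." left unedited.
            raise ValueError(f"{agent}: unknown provider {provider!r}")
        if not model_id:
            raise ValueError(f"{agent}: model ID is empty")
        resolved[agent] = AgentModel(provider, model_id)
    return resolved
```

Passing `os.environ` (after loading the .env file) returns one `AgentModel` per agent, so a mixed setup such as a local Ollama classifier alongside a Bedrock-hosted Computer Use Agent is validated in one place.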