>AI agent
Poor little shills don't know how long the Molts have been here. Kek
For the Anons, have fun.
Here are some notable GitHub repositories for AI agents that use computer screen observation:
https://github.com/OthersideAI/self-operating-computer
A framework enabling an AI agent to operate a computer by using vision-language models (VLMs) like GPT-4o and Claude 3 to "see" the screen and determine the next action (mouse move, click, type).
https://github.com/niuzaisheng/ScreenAgent
The official repository for the "ScreenAgent" project, which provides an environment for VLM agents to interact with real computer screens, including planning, action, and reflection stages.
https://github.com/suitedaces/computer-agent
A local AI agent built with Tauri, React, and Rust that takes natural language instructions and controls your computer by taking screenshots, moving the mouse, clicking, and typing.
https://github.com/simular-ai/Agent-S
An open-source framework for building intelligent GUI agents that can interact with computers via an Agent-Computer Interface, demonstrating performance on benchmarks like OSWorld.
https://github.com/askui/python-sdk
A Python SDK that allows AI to control your desktop, enabling complex multi-step instructions and information extraction from the screen through a ComputerAgent.
https://github.com/maniotrix/offline-ai-assistant
A vision-based, offline AI assistant capable of understanding and automating tasks on a user's computer through screen capture and analysis.
These repositories generally include installation instructions and examples of how to set up the environment, including prerequisites like Python and API keys for VLM services.
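Most of these boil down to the same loop: grab a screenshot, ask the model what to do, perform the action, repeat. Rough sketch of that loop below. It is not taken from any of the repos above. `Action`, `run_agent`, `capture_screen`, `choose_action`, and `execute` are all made-up names for illustration; in practice you would wire them to a real screenshot library, a VLM API call, and a mouse/keyboard driver.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Action:
    """One step the agent wants to perform (hypothetical schema)."""
    kind: str        # "move", "click", "type", or "done"
    x: int = 0       # screen coordinates for "move" / "click"
    y: int = 0
    text: str = ""   # text to enter for "type"


def run_agent(
    goal: str,
    capture_screen: Callable[[], Any],
    choose_action: Callable[[str, Any, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe -> decide -> act until the model says "done" or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                       # observe
        action = choose_action(goal, screenshot, history)   # decide (VLM call goes here)
        if action.kind == "done":
            break
        execute(action)                                     # act
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-ins so the loop runs without a screen or an API key.
    scripted = iter([
        Action("click", x=100, y=200),
        Action("type", text="hello"),
        Action("done"),
    ])
    steps = run_agent(
        goal="say hello",
        capture_screen=lambda: b"",                 # fake screenshot bytes
        choose_action=lambda g, s, h: next(scripted),
        execute=lambda a: print("executing", a),
    )
    print(len(steps), "actions performed")
```

The `max_steps` cap is there so a confused model can't click around your desktop forever. Keep something like it if you run any of these on a real machine.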