ClickUi - www.ClickUi.app
🏗️ The starting ground for the most widely used computer-based AI-assistant, something most people will have installed 🌎
ClickUi is a powerful, cross-platform open-source application that integrates various AI models, speech recognition, and web scraping capabilities. It provides a seamless interface for voice and text interactions, file attachments, property lookups, and web searches.
It's 100% Python, and aims to be the best AI-computer assistant. Help us build it to either get it there or keep it that way! See the Future Features & Ideas section.
Looking for Collaborators! Leave Voice mode running, have conversations throughout the day, experience how AI should be on the computer, and build it out to be even better for you/everyone.
Submit new features & ideas in the README as checkboxes so they can be added to the page
Submit pull requests to main and they will be reviewed.
1-min Demo: https://youtu.be/oH-A1hSdVKQ
- Voice Mode: Allows users to interact with the AI using voice commands and receive spoken responses.
- Chat Mode: Provides a text-based interface for typing queries and receiving written responses.
The AI Assistant relies on two critical dependencies that must be installed and loaded into the global scope before the program can run, plus API keys for whichever AI model you use:
- Whisper: An automatic speech recognition (ASR) system used for transcribing voice input.
- Kokoro: A text-to-speech engine used for generating spoken responses in Voice Mode.
- API Keys: Configure the API key and Engine/Model information for each AI model you want to use.
Warning:
The Whisper and Kokoro models are loaded into the global scope. They must be installed and properly configured before running the AI Assistant; otherwise the program will fail with runtime errors. To run without Voice functionality and its dependencies, comment out the Whisper & Kokoro loading code (Voice Mode will not work).
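As an alternative to commenting code out, availability could be checked before anything is loaded. The following is a minimal sketch, not how ClickUi currently works: the helper names `is_installed` and `voice_available` are hypothetical, and the module names `whisper` and `kokoro` are taken from the import and pip commands shown in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def voice_available(required=("whisper", "kokoro")) -> bool:
    """Hypothetical check: Voice Mode needs every listed module to be present."""
    return all(is_installed(name) for name in required)


if __name__ == "__main__":
    if voice_available():
        print("Voice dependencies found; Voice Mode can be enabled.")
    else:
        print("Voice dependencies missing; continuing in chat-only mode.")
```

A check like this only confirms that the packages are importable; it does not confirm that the models themselves load or that a GPU is present.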
- Add Code Formatting to AI replies, with a small copy icon added to each code block output to easily copy the code.
- Add a way to add a Model/Engine from settings via the SettingsWidget (perhaps 'Add New' at the bottom of the dropdown). Right now a model has to be defined in Python to make it available.
- Make UI Navigable with arrow keys (after launching with hotkey, cursor starts in Prompt area. From here, allow left arrow key to open settings, down arrow key to pop open conversation window below, etc.)
- Fix/Revise Voice mode start/stop toggling & related resets. If you click to exit voice mode during transcription or audio playback, it sometimes quits the entire program.
- Add voice name selection (and model size) for kokoro in SettingsWidget
- Add hotkey to take selectable area screenshot (or Prnt Scrn) and have it auto-appended to the input chat to easily show the AI what you are looking at/working with.
- Build tests for all functionality (Prompt input chat, Reply input chat, Conversation History validation, etc)
- Add a model pricing table to calculate total price of input & output per message/websearch/file upload, etc. Could add option to display below message bubbles, etc.
- Add multi-file attachment capability (limited to 1 now)
- Track token usage per message. Could add option to display below message bubbles, etc.
- Determine the best way to provide executables that work for Windows, Mac, and Linux that will not require the user to do anything other than install our app and run it on a fresh install of each OS (without the user installing Python, CUDA, etc). Or an install script to help get things setup, or Docker, etc
- Add fine-tuning settings to allow Temp, Top P, Repeat Penalty, etc. to be defined in Settings Widget (something clean/intuitive, maybe a gray horizontal bar like the one to expand the chat window, but above the UI, that lets you adjust these things quickly?) Also need to allow max_tokens per-model for Claude (or get error on API call), new models have thinking tokens/effort, etc. Should add support for all that in the same clean/intuitive SettingsWidget style we have now.
- Merge to one main Window (right now 2 windows launch in taskbar, one for each area)
- Option to pop open another chat window with a magnet clip link between them, when selected it links the two input prompts so you can chat with two models at once easily. When not selected you can type different prompts into each. Perhaps a transparent + icon in the upper left of the initial chat bubble that lets you spawn in the other chat bubble?
- Add WebUI Browser-Use functionality & option to toggle in SettingsWidget: https://github.com/browser-use/web-ui (might need to create a mini version, WebUI is too slow for real-time usage). Have to scan input prompt/transcription for keywords or do tool call to trigger, options for headless to show/hide browser if desired, etc. Then respond once finished to confirm it was done, etc. (See below might want to create our own)
- Computer interactions that you'd actually want to use. For example, 'Update the system prompt we have in ClickUi. Keep the tool calls and most functionality, but do XYZ...' and it would open ClickUi, navigate to settings, and paste in the new prompt and then say it's done. Or 'Take this prompt and run it through Google AI Studio with Model 1 and Model 2, Anthropic Console with Claude 3 dash 7, and OpenAI o1'. These things would be awesome and revolutionary! Totally possible if we all put our heads together. The solution has to be versatile and not require much setup/tuning for the user
python clickui.py
- Keep the files together in one folder
You need sonos.py, the .svg files, etc. for the default program to run. These are all provided in the GitHub folder; make sure your folder has the same contents as this repo.
- Install Anaconda/Conda
Download and install Anaconda/Conda from:
https://www.anaconda.com/download/success
This allows for easier environment management and Python setup. Install system-wide and add to the PATH.
- Create new Conda environment
  - Run
    conda -h
    in your terminal to check if conda is installed correctly.
  - Open Command Prompt and create a new Conda environment called cuda with Python version 3.11:
    conda create -n cuda python==3.11
    This creates a new Conda environment named cuda where Python and required libraries will reside.
  - To activate the environment, run:
    conda activate cuda
    Your terminal prompt should now display the environment name.
- Install CUDA Toolkit and Related Libraries
⚠️ Enables GPU voice transcription & generation. It is not required; you can use the CPU, but it will be noticeably slower and less enjoyable to use.
⚠️ Only for NVIDIA GPUs.
  - A. Install CUDA Toolkit (for Kokoro & Whisper)
These are not required for chat-based functionality but are essential for Voice-mode responsiveness. Without an NVIDIA GPU, voice transcription and generation will be slower.
Install cudatoolkit v11.8.0 from:
https://anaconda.org/conda-forge/cudatoolkit
conda install -c conda-forge cudatoolkit
  - B. Install cuDNN
Not required for chat-based functionality.
Install cudnn v8.9.7 from:
https://anaconda.org/conda-forge/cudnn
conda install -c conda-forge cudnn
  - C. Install PyTorch
Not required for chat-based functionality.
Install PyTorch from:
https://pytorch.org/
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  - D. Install TensorFlow
Not required for chat-based functionality.
Install TensorFlow 2.14.0 (the last version compatible with CUDA 11.8) as referenced here:
https://www.tensorflow.org/install/source#gpu
conda install -c conda-forge tensorflow=2.14.0=cuda118py311heb1bdc4_0
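After the GPU libraries are installed, it can help to confirm that PyTorch actually sees the GPU before launching Voice Mode. The sketch below is an illustration only: `describe_gpu_support` is a hypothetical helper, not part of ClickUi, and it falls back to reporting `cpu` when PyTorch is not installed.

```python
def describe_gpu_support() -> dict:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    info = {"torch_installed": False, "cuda_available": False, "device": "cpu"}
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return info
    info["torch_installed"] = True
    if torch.cuda.is_available():
        info["cuda_available"] = True
        info["device"] = "cuda"
    return info


if __name__ == "__main__":
    print(describe_gpu_support())
```

If `device` comes back as `cpu` on a machine with an NVIDIA GPU, the CUDA Toolkit, cuDNN, or PyTorch build is the first place to look.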
- Other Libraries
Test your installation by running:
python clickui.py
If you encounter import errors, install the missing libraries via pip. For example:
pip install kokoro
pip install pyperclip
pip install keyboard
- Start the Program
With your command prompt active in the correct conda environment and in the directory containing clickui.py, run:
python clickui.py
Once you see the message Ready!..., press Ctrl+k to bring up the ClickUi interface.
Configure ClickUi by editing the .voiceconfig file in the root directory. Key settings include:
{
  "use_sonos": false,
  "use_conversation_history": true,
  "BROWSER_TYPE": "chrome",
  "CHROME_USER_DATA": "C:\\Users\\PC\\AppData\\Local\\Google\\Chrome\\User Data",
  "CHROME_DRIVER_PATH": "C:\\Users\\PC\\Downloads\\chromedriver.exe",
  "CHROME_PROFILE": "Profile 10",
  "ENGINE": "OpenAI",
  "MODEL_ENGINE": "gpt-4o",
  "OPENAI_API_KEY": "your-api-key-here",
  "GOOGLE_API_KEY": "your-google-api-key-here",
  "days_back_to_load": 15,
  "HOTKEY_LAUNCH": "ctrl+k"
}
Adjust these settings according to your preferences and API keys.
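For illustration, a settings file like this could be read with the standard `json` module and merged over defaults. This is a sketch under assumptions, not ClickUi's actual loader: `load_config` and `DEFAULTS` are hypothetical names, and the default values simply mirror a few of the keys shown above.

```python
import json
from pathlib import Path

# Illustrative defaults only; they mirror some of the keys listed in this README.
DEFAULTS = {
    "use_sonos": False,
    "use_conversation_history": True,
    "ENGINE": "OpenAI",
    "MODEL_ENGINE": "gpt-4o",
    "days_back_to_load": 15,
    "HOTKEY_LAUNCH": "ctrl+k",
}


def load_config(path: str = ".voiceconfig") -> dict:
    """Merge the JSON settings file over the defaults; tolerate a missing file."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config


if __name__ == "__main__":
    settings = load_config()
    print(settings["ENGINE"], settings["MODEL_ENGINE"])
```

Merging over defaults means a partially filled file still yields a complete settings dictionary.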
The AI Assistant uses the Whisper model for speech recognition. Here's an implementation example:
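import tempfile

import soundfile as sf  # assumption: `sf` in this snippet refers to the soundfile package
import whisper as openai_whisper

# Loaded once into the global scope; use device='cpu' if no NVIDIA GPU is available.
whisper_model = openai_whisper.load_model("base", device='cuda')

def record_and_transcribe_once() -> str:
    # ... recording logic ...

    def transcribe_audio(audio_data, samplerate):
        # Reserve a temporary .wav path, then write the captured audio to it.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            temp_wav_name = tmp.name
        sf.write(temp_wav_name, audio_data, samplerate)
        result = whisper_model.transcribe(temp_wav_name, fp16=False)
        return result["text"]

    # ... more recording and transcription logic ...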
The application supports multiple AI models (OpenAI, Google, Ollama, Claude, Groq, and OpenRouter). An example for the OpenAI model integration is:
def call_openai(prompt: str, model_name: str, reasoning_effort: str) -> str:
    import openai
    import json
    global conversation_messages, OPENAI_API_KEY
    ensure_system_prompt()
    conversation_messages.append({"role": "user", "content": prompt})
    openai.api_key = OPENAI_API_KEY
    if not openai.api_key:
        stop_spinner()
        print(f"{RED}No OpenAI API key found.{RESET}")
        return ""
    # ... API call logic ...
    try:
        response = openai.chat.completions.create(**api_params)
    except Exception as e:
        print(f"{RED}Error connecting to OpenAI: {e}{RESET}")
        return ""
    # ... response handling ...
The AI Assistant includes web scraping capabilities for Google searches and property lookups. Below is an example for the Google search function:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        # ... more browser setup ...
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
The graphical user interface is implemented using PySide6 (Qt for Python). Below is an example of the main window class:
class BottomBubbleWindow(QWidget):
    global last_chat_geometry
    response_ready = Signal(str, object, object)

    def __init__(self):
        global last_main_geometry, last_chat_geometry
        super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        self.response_ready.connect(self.update_ai_reply)
        # Initialize chat dialog with empty content
        self.chat_dialog = ChatDialog(host_window=self)
        if last_chat_geometry:
            self.chat_dialog.setGeometry(last_chat_geometry)
        self.chat_dialog.hide()
        # ... more initialization ...

    def on_message_sent(self, text):
        # ... message handling logic ...

    def process_ai_reply(self, text, container, lb, fresh):
        try:
            ai_reply = call_current_engine(text, fresh=fresh)
        except Exception as e:
            print(f"Error in AI thread: {e}")
            ai_reply = f"[Error: {e}]"
        self.response_ready.emit(ai_reply, container, lb)

    # ... more methods ...
The AI Assistant supports voice interactions using the Whisper model for speech recognition and a text-to-speech engine for responses. An example implementation for voice recording is:
def record_and_transcribe_once() -> str:
    global recording_flag, stop_chat_loop, whisper_model
    model = whisper_model
    if recording_flag:
        return ""
    recording_flag = True
    audio_q.queue.clear()
    samplerate = 24000
    blocksize = 1024
    silence_threshold = 70
    max_silence_seconds = 0.9
    MIN_RECORD_DURATION = 1.0
    recorded_frames = []
    speaking_detected = False
    silence_start_time = None
    with sd.InputStream(channels=1, samplerate=samplerate, blocksize=blocksize, callback=audio_callback):
        print(f"{YELLOW}Recording started. Waiting for speech...{RESET}")
        play_wav_file_blocking("recording_started.wav")
        while True:
            if stop_chat_loop:
                break
            # ... recording logic ...
    if stop_chat_loop:
        recording_flag = False
        return ""
    print(f"{GREEN}Recording ended. Transcribing...{RESET}")
    # ... transcription logic ...
    return text_result
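The recording logic itself is elided above, so the following is only a guess at how the `silence_threshold`, `max_silence_seconds`, and `MIN_RECORD_DURATION` values might work together: stop once speech has been heard, a long enough run of quiet blocks follows, and the minimum duration has passed. The function `find_stop_block` and its per-block loudness input are hypothetical and not taken from ClickUi.

```python
def find_stop_block(levels, block_seconds, silence_threshold=70,
                    max_silence_seconds=0.9, min_record_duration=1.0):
    """Return the index of the block at which recording would stop, or None.

    levels: one loudness value per audio block, in the same (assumed) units
    as silence_threshold. block_seconds: duration of a single block.
    """
    speaking_detected = False
    silence_elapsed = 0.0
    for index, level in enumerate(levels):
        elapsed = (index + 1) * block_seconds
        if level >= silence_threshold:
            # Speech (or noise) above the threshold resets the silence timer.
            speaking_detected = True
            silence_elapsed = 0.0
        elif speaking_detected:
            # Quiet block after speech: accumulate silence time.
            silence_elapsed += block_seconds
            if (silence_elapsed >= max_silence_seconds
                    and elapsed >= min_record_duration):
                return index
    return None


if __name__ == "__main__":
    # Half-second blocks: quiet, two loud blocks, then trailing silence.
    print(find_stop_block([10, 100, 100, 10, 10, 10], block_seconds=0.5))
```

With the values shown in the snippet above (a block size of 1024 samples at 24000 Hz), each block would last roughly 0.043 seconds, so about 21 consecutive quiet blocks would be needed to reach 0.9 seconds of silence.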
Users can interact via text input. The chat interface is implemented within the GUI:
class ChatDialog(QWidget):
    global conversation_messages

    def __init__(self, host_window):
        global conversation_messages
        super().__init__()
        self.host_window = host_window
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        # ... UI setup ...
        self.reply_line = QLineEdit()
        self.reply_line.setPlaceholderText("Type your reply...")
        reply_layout.addWidget(self.reply_line, stretch=1)
        self.reply_send_button = QToolButton()
        self.reply_send_button.setText("↑")
        self.reply_send_button.setToolTip("Send Reply")
        reply_layout.addWidget(self.reply_send_button)
        self.reply_send_button.clicked.connect(self.handle_reply_send)
        self.reply_line.returnPressed.connect(self.handle_reply_send)

    def handle_reply_send(self):
        text = self.reply_line.text().strip()
        if text:
            self.add_message(text, role="user")
            self.reply_line.clear()
            container, lb = self.add_loading_bubble()

            # Run the model call off the UI thread; the signal delivers the reply.
            def do_ai_work():
                try:
                    ai_reply = call_current_engine(text, fresh=False)
                except Exception as e:
                    print("Error in AI thread:", e)
                    ai_reply = f"[Error: {e}]"
                self.host_window.response_ready.emit(ai_reply, container, lb)

            th = threading.Thread(target=do_ai_work, daemon=True)
            th.start()

    # ... more methods ...
The AI Assistant supports file attachments for text-based files. File handling is implemented as follows:
class FileDropLineEdit(QLineEdit):
    file_attached = Signal(list)  # Signal to notify when a file is attached

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setAcceptDrops(True)
        self.attachments = []  # Holds dictionaries: {'filename': ..., 'content': ...}

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                if os.path.splitext(file_path)[1].lower() in ['.txt', '.csv', '.xlsx', '.xls']:
                    event.acceptProposedAction()
                    return
            event.ignore()
        else:
            super().dragEnterEvent(event)

    def dropEvent(self, event):
        if event.mimeData().hasUrls():
            attachments = []
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                ext = os.path.splitext(file_path)[1].lower()
                if ext in ['.txt', '.csv', '.xlsx', '.xls']:
                    file_name = os.path.basename(file_path)
                    try:
                        content = read_file_content(file_path)
                        attachments.append({'filename': file_name, 'content': content})
                    except Exception as e:
                        attachments.append({'filename': file_name, 'content': f"Error reading file: {str(e)}"})
            if attachments:
                self.attachments = attachments
                self.file_attached.emit(attachments)
            event.acceptProposedAction()
        else:
            super().dropEvent(event)
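The `read_file_content` helper used in `dropEvent` is not shown here. As a rough idea of what such a reader could look like for the plain-text formats, here is a standard-library sketch. The names `is_supported` and `read_text_attachment`, and the `max_chars` limit, are hypothetical; spreadsheet formats (.xlsx, .xls) are left out because they would need a third-party library.

```python
import csv
import os

# Only the formats readable with the standard library are covered in this sketch.
SUPPORTED_TEXT_EXTENSIONS = {".txt", ".csv"}


def is_supported(file_path: str) -> bool:
    """Case-insensitive extension check, mirroring the check in dropEvent."""
    return os.path.splitext(file_path)[1].lower() in SUPPORTED_TEXT_EXTENSIONS


def read_text_attachment(file_path: str, max_chars: int = 20000) -> str:
    """Hypothetical stand-in for read_file_content, limited to .txt and .csv."""
    if not is_supported(file_path):
        raise ValueError(f"Unsupported file type: {file_path}")
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".csv":
        with open(file_path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        text = "\n".join(", ".join(row) for row in rows)
    else:
        with open(file_path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
    # Truncate so a large attachment does not overwhelm the prompt.
    return text[:max_chars]
```

Raising on unsupported types fits the `try`/`except` in `dropEvent`, which turns any exception into an "Error reading file" attachment entry.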
The assistant can retrieve property value estimates from Zillow and Redfin. An example implementation:
def fetch_property_value(address: str) -> str:
    global driver
    # Kill any lingering Chromium instances before starting a new search.
    kill_chromium_instances()
    try:
        driver
    except NameError:
        # ... driver setup ...
    stop_spinner()
    print(f"{MAGENTA}Address for search: {address}{RESET}")
    stop_spinner()
    search_url = "https://www.google.com/search?q=" + address.replace(' ', '+')
    try:
        driver.get(search_url)
        time.sleep(3.5)
    except Exception as e:
        stop_spinner()
        print(f"{RED}[DEBUG] Exception during driver.get: {e}{RESET}")
        stop_spinner()
        return "Error performing Google search."
    # ... search for Zillow and Redfin links ...

    def open_in_new_tab(url):
        # ... open URL in new tab and return page HTML ...

    def parse_redfin_value(source):
        # ... parse Redfin value from HTML ...

    def parse_zillow_value(source):
        # ... parse Zillow value from HTML ...

    property_values = []
    for domain, link in links_found.items():
        if not link:
            continue
        page_html = open_in_new_tab(link)
        extracted_value = None
        if domain == 'Redfin':
            extracted_value = parse_redfin_value(page_html)
        elif domain == 'Zillow':
            extracted_value = parse_zillow_value(page_html)
        if extracted_value:
            property_values.append((domain, extracted_value))
    if not property_values:
        return "Could not retrieve property values."
    result_phrases = []
    for domain, value in property_values:
        result_phrases.append(f"{domain} estimates the home is worth {value}")
    return ", and ".join(result_phrases)
The AI Assistant can perform Google searches to fetch up-to-date information:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        if BROWSER_TYPE == 'chromium':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
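Two small steps in `google_search` can be tried in isolation: building the search URL with `quote_plus`, and collapsing whitespace before truncating the page text to 5000 characters. The sketch below pulls them into standalone helpers; `build_search_url` and `clean_page_text` are hypothetical names used only for illustration.

```python
from urllib.parse import quote_plus


def build_search_url(query: str) -> str:
    """URL-encode the query so spaces and characters like '&' are safe."""
    return f"https://www.google.com/search?q={quote_plus(query)}"


def clean_page_text(text: str, limit: int = 5000) -> str:
    """Collapse runs of whitespace into single spaces, then truncate."""
    return " ".join(text.split())[:limit]


if __name__ == "__main__":
    print(build_search_url("weather in Paris & Rome"))
    print(clean_page_text("  Top   results \n\n for\tyour   query  "))
```

Using `quote_plus` rather than replacing spaces by hand also encodes characters such as `&` and `#`, which would otherwise change the meaning of the URL.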
To integrate a custom AI model, add a new API call function and update the ENGINE_MODELS dictionary. For example:
import requests

def call_custom_model(prompt: str, model_name: str) -> str:
    # Implement your custom model API call here
    # Example:
    response = requests.post(
        "https://api.custom-model.com/generate",
        json={"prompt": prompt, "model": model_name}
    )
    return response.json()["generated_text"]

# Add to ENGINE_MODELS
ENGINE_MODELS["CustomAI"] = ["custom-model-1", "custom-model-2"]

# Update call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE
    if ENGINE == "CustomAI":
        return call_custom_model(prompt, MODEL_ENGINE)
    elif ENGINE == "Ollama":
        return call_ollama(prompt, MODEL_ENGINE)
    # ... existing code for other engines ...
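As the number of engines grows, the `if`/`elif` chain could be swapped for a lookup table. This is a design alternative rather than how ClickUi is written: `ENGINE_HANDLERS`, `register_engine`, `dispatch`, and the `Echo` engine below are all hypothetical names made up for this sketch.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an engine name to its call function.
ENGINE_HANDLERS: Dict[str, Callable[[str, str], str]] = {}


def register_engine(name: str):
    """Decorator that records a handler under the given engine name."""
    def decorator(func: Callable[[str, str], str]):
        ENGINE_HANDLERS[name] = func
        return func
    return decorator


@register_engine("Echo")
def call_echo(prompt: str, model_name: str) -> str:
    # Stand-in engine for demonstration; a real handler would call an API.
    return f"[{model_name}] {prompt}"


def dispatch(engine: str, model_name: str, prompt: str) -> str:
    """Look up the handler for the selected engine and call it."""
    handler = ENGINE_HANDLERS.get(engine)
    if handler is None:
        raise ValueError(f"Unknown engine: {engine}")
    return handler(prompt, model_name)


if __name__ == "__main__":
    print(dispatch("Echo", "demo-model", "hello"))
```

With a registry like this, adding an engine means writing one decorated function instead of editing the dispatch function.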
To add new features or tools, create new functions and integrate them into the workflow. For example, to add a weather lookup feature:
import requests
def weather_lookup(city: str) -> str:
    api_key = "your_weather_api_key"
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url)
    data = response.json()
    if response.status_code == 200:
        temp
If web searches return errors even though your ChromeDriver matches the version of Chrome/Chromium referenced in your paths, the paths are set up properly, and the web-search code is actually being triggered, double-check the UserAgent in the code and make sure no instance of Chrome or Chromium is running in the Task Manager (end all of them before running to verify). Also, open the site manually in the browser using the profile you configured and see if it lets you access the URL. Try using your main Chrome/Chromium profile info for the best results.