
Commit

Meilisearch (#85)
* first meili version

* rephrase README.md
BennyThink authored Apr 1, 2023
1 parent d558c53 commit 9c3cea0
Showing 23 changed files with 386 additions and 539 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/builder.yaml
@@ -59,8 +59,8 @@ jobs:
platforms: linux/arm,linux/amd64,linux/arm64
push: true
tags: |
${{ steps.dh_string.outputs.lowercase }}
ghcr.io/${{ steps.ghcr_string.outputs.lowercase }}
${{ steps.dh_string.outputs.lowercase }}:ng
ghcr.io/${{ steps.ghcr_string.outputs.lowercase }}:ng
cache-from: type=local,src=/tmp/.buildx-cache
cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max
@@ -80,8 +80,8 @@ jobs:
username: root
key: ${{ secrets.SSH_KEY }}
script: |
docker save bennythink/searchgram > /root/searchgram-old.tar
docker pull bennythink/searchgram
docker save bennythink/searchgram:ng > /root/searchgram-old.tar
docker pull bennythink/searchgram:ng
docker-compose -f /home/SearchGram/docker-compose.yml up -d
curl "https://api.telegram.org/bot$TOKEN/sendMessage?chat_id=260260121&text=SearchGram%20upgrade%20complete!"
echo "SearchGram upgrade complete!"
4 changes: 3 additions & 1 deletion .gitignore
@@ -140,4 +140,6 @@ es_data/
/.ash_history
*.img
mongo_data/*
searchgram/session/*
searchgram/session/*
/sg_data/*
/.idea/*
12 changes: 0 additions & 12 deletions .idea/dataSources.xml

This file was deleted.

103 changes: 65 additions & 38 deletions Docker.md
@@ -1,22 +1,29 @@
> This document will help you to go through the entire process of running this utility.
> This document provides a step-by-step guide to help you run this utility.
# 1. Prepare the Environment and Download the Appropriate Docker Compose File

# 1. Prepare environment and clone this repository
To get started, install Docker and Docker Compose on your server.

Install docker and docker-compose on your server, clone this repository to any directory you want.
You can use either the legacy version, powered by MongoDB, with the `docker-compose.legacy.yml` file, or the latest version, powered by MeiliSearch, with the `docker-compose.yml` file.

# 2. (Optional) Prepare Encryption data volume
# 2. (Optional) Prepare the Encrypted Data Volume

It's highly recommend to use encrypted data volume. You can use LUKS.
For added security, it's highly recommended to use an encrypted data volume.

Here there is an example of using loop+LVM+LUKS, you can also use simple make commands:
You can use LUKS for this purpose.

Here's an example of setting it up with loop+LVM+LUKS. Alternatively, you can use the provided make targets:

```shell
make encrypt
make format
```

## 2.1 Create loop device
## 2.1 Create the loop device

Start by creating a loop file and loop device:

```shell
# create loop file and loop device
@@ -31,64 +38,79 @@ I/O size (minimum/optimal): 512 bytes / 512 bytes

```

## 2.2 create LVM
## 2.2 Create LVM

Create the physical volume and volume group:

```shell
pvcreate /dev/loop0
vgcreate vg_mongo_data /dev/loop0
vgcreate vg_sg_data /dev/loop0
# use vgdisplay to confirm Volume Group
vgdisplay

# create logical volume
lvcreate --extents 100%FREE vg_mongo_data -n lv_mongo_data
lvcreate --extents 100%FREE vg_sg_data -n lv_sg_data

# the logical volume device should now exist
file /dev/vg_mongo_data/lv_mongo_data
file /dev/vg_sg_data/lv_sg_data
```

## 2.3 LUKS

Format LUKS and enter your password:

```shell
# format with LUKS and enter your password
cryptsetup luksFormat /dev/vg_mongo_data/lv_mongo_data
cryptsetup luksFormat /dev/vg_sg_data/lv_sg_data
# open device
cryptsetup luksOpen /dev/vg_mongo_data/lv_mongo_data mongo_data
# you should see /dev/mapper/mongo_data
file /dev/mapper/mongo_data
cryptsetup status mongo_data
cryptsetup luksOpen /dev/vg_sg_data/lv_sg_data sg_data
# you should see /dev/mapper/sg_data
file /dev/mapper/sg_data
cryptsetup status sg_data
```

## 2.4 format and mount
## 2.4 Format and Mount

Format and mount the device:

```shell
mkfs.ext4 /dev/mapper/mongo_data
mkdir -p mongo_data
mount /dev/mapper/mongo_data ./mongo_data
chmod 777 mongo_data
mkfs.ext4 /dev/mapper/sg_data
mkdir -p sg_data
mount /dev/mapper/sg_data ./sg_data
chmod 777 sg_data
```

## 2.5 unmount and remove
## 2.5 Unmount and Remove

Unmount and remove the device:

```shell
umount /dev/mapper/mongo_data
cryptsetup luksClose mongo_data
umount /dev/mapper/sg_data
cryptsetup luksClose sg_data
```

# 3. Prepare APP_ID, APP_HASH and bot token
# 3. Obtain APP_ID, APP_HASH, and Bot Token

1. You can get APP_ID and APP_HASH from https://core.telegram.org/
2. Talk to @BotFather to get your bot token
3. Talk to @blog_update_bot to get your user id and your bot's id

To get started with SearchGram, you'll need to:

1. Obtain your APP_ID and APP_HASH from https://my.telegram.org/.
2. Get your bot token from @BotFather.
3. Get your user ID and bot ID from @blog_update_bot.

# 4. Modify env file
`MEILI_MASTER_KEY` is the master key that protects your MeiliSearch instance; you also need it to log in to the MeiliSearch Web UI.
To keep things simple, you can reuse your bot token as the key.
```shell
# vim env/gram.env
TOKEN=token
APP_ID=id
APP_HASH=hash
OWNER_ID=your user_id
BOT_ID=your bot_id
MEILI_MASTER_KEY=token
```
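Before starting the stack, it can help to sanity-check `env/gram.env`. The sketch below is a hypothetical helper, not part of SearchGram: it parses simple `KEY=VALUE` lines (no quoting or `export` support), reports which of the variables above are missing, and flags a `MEILI_MASTER_KEY` shorter than 16 bytes, which, to our understanding, MeiliSearch rejects when running in production mode.

```python
from typing import Dict, List

# Variables shown in the env/gram.env example above.
REQUIRED_KEYS = ("TOKEN", "APP_ID", "APP_HASH", "OWNER_ID", "BOT_ID", "MEILI_MASTER_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" or ":".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    master_key = values.get("MEILI_MASTER_KEY", "")
    # Assumption: MeiliSearch refuses master keys shorter than 16 bytes in production mode.
    if master_key and len(master_key.encode("utf-8")) < 16:
        problems.append("MEILI_MASTER_KEY is shorter than 16 bytes")
    return problems
```

For example, `check_env(parse_env(open("env/gram.env").read()))` returns an empty list when every variable is set and the key is long enough. A Telegram bot token is comfortably longer than 16 bytes, which is one reason reusing it as the master key works.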
# 5. Log in to the client
@@ -97,7 +119,7 @@ BOT_ID=your bot_id
make init
```
And then you'll be dropped into a container shell.
After running `make init`, you will be dropped into a container shell.
```shell
python client.py
@@ -110,15 +132,18 @@ under `searchgram/session/client.session`.
# 6. (optional)setup sync id
If you would like to sync all the chat history for any user, group or channel, you can configure sync id.
To synchronize the chat history for a user, group, or channel, you can configure the sync ID.
First thing you need is obtaining chat peer, it could be integer or username. Use https://t.me/blog_update_bot to get
what
you want.
This allows you to specify which chats you want to sync the history for.
Secondly, you'll have to manually edit `sync.ini`
**Please use username as much as possible.
If you want to use user_id, please talk to the person immediately after starting `client.py`. You have 30s to do so.**
The first step in configuring the sync ID is to obtain the chat peer, which can be either an integer or a username.
You can obtain the chat peer by using https://t.me/blog_update_bot.
Next, you will need to manually edit the `sync.ini` file.
**It is recommended to use usernames whenever possible when configuring the sync ID.
If you need to use a user ID instead, it is important to talk to the person immediately after starting `client.py`
because you only have 30 seconds to do so.**
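Only part of `sync.ini` is visible in this diff. Assuming it lists one username or numeric ID per line under a `[chat]` section, with optional `#` comments, the hypothetical sketch below shows how such a file could be read with Python's standard `configparser`. SearchGram's own parsing code may differ.

```python
import configparser
from typing import List, Union

# Hypothetical sync.ini content; the real file's full layout is not visible in this diff.
SAMPLE = """
[chat]
BennyThink # will sync this
123456789
"""


def read_sync_targets(text: str) -> List[Union[str, int]]:
    """Return the chats listed under [chat]: usernames as str, numeric IDs as int."""
    parser = configparser.ConfigParser(allow_no_value=True, inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep the original case of usernames (the default lowercases keys)
    parser.read_string(text)
    targets: List[Union[str, int]] = []
    for peer in parser["chat"]:
        # Chat IDs may be negative (e.g. groups and channels), so allow a leading "-".
        targets.append(int(peer) if peer.lstrip("-").isdigit() else peer)
    return targets
```

`allow_no_value=True` lets each line be a bare key, and `inline_comment_prefixes` strips trailing `# ...` comments that are preceded by whitespace.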
```ini
[chat]
@@ -133,6 +158,8 @@ BennyThink # will sync this
docker-compose up -d
```

Now you can talk to your friends and search in your bot.
Once you have completed the previous steps, you can talk to your friends and search in your bot.

You can also use http://localhost:7700 to access the MeiliSearch Web UI.
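The Web UI is a front end to MeiliSearch's HTTP API, which you can also call directly. Below is a minimal sketch that builds (but does not send) a search request authenticated with the master key as a Bearer token. The index name `telegram` is a placeholder assumption; check which index SearchGram actually creates, for example in the Web UI. For anything beyond local testing, MeiliSearch's documentation recommends a dedicated search API key rather than the master key.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"
INDEX_UID = "telegram"  # hypothetical index name; check what SearchGram actually creates


def build_search_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST /indexes/{uid}/search request without sending it."""
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # MeiliSearch expects API keys in a Bearer Authorization header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Sending it requires a running MeiliSearch instance:
# with urllib.request.urlopen(build_search_request(key, {"q": "搜索"})) as resp:
#     hits = json.load(resp)["hits"]
```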

If you configure sync id, you can monitor sync status in Saved Messages.
If you have configured the sync ID, you can monitor the sync status in your Saved Messages.
79 changes: 37 additions & 42 deletions README.md
@@ -1,63 +1,62 @@
# SearchGram

A telegram Bot that can search for CJK and other languages, as well as message backup utility.

**⚠️️⚠️⚠️Warning: this application is migrating to MeiliSearch. Expect data migration if you want to use new version of it.⚠️⚠️⚠️**
SearchGram is a Telegram bot that improves the search experience for Chinese, Japanese, and Korean (CJK) languages and
provides message backup functionality.

# Introduction

Telegram has bad search experience for CJK languages because those languages are not separated by spacing.
Telegram's search function has poor support for CJK languages because there are no spaces to separate words.

Bug issues were submitted years ago but never fixed.
Issues about this were reported years ago but have yet to be resolved.

* https://github.com/tdlib/td/issues/1004
* https://bugs.telegram.org/c/724
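To see why whitespace-based word matching breaks down for CJK text, compare it with overlapping character bigrams, a common fallback for unsegmented text. This is only an illustration of the problem; it is not how Telegram or MeiliSearch actually tokenizes text.

```python
from typing import Callable, Set


def whitespace_tokens(text: str) -> Set[str]:
    # Naive word splitting that works for space-delimited languages.
    return set(text.split())


def char_bigrams(text: str) -> Set[str]:
    # Overlapping two-character windows (queries shorter than two characters need special handling).
    compact = "".join(text.split())
    return {compact[i : i + 2] for i in range(len(compact) - 1)}


def matches(query: str, text: str, tokenize: Callable[[str], Set[str]]) -> bool:
    # A text matches when every query token appears among the text's tokens.
    return tokenize(query) <= tokenize(text)


sentence = "我今天在电报上搜索消息"  # "Today I search for messages on Telegram"
print(matches("搜索", sentence, whitespace_tokens))  # False: the whole sentence is one "word"
print(matches("搜索", sentence, char_bigrams))  # True
print(matches("search", "I search messages", whitespace_tokens))  # True
```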

I'm not planning to be sitting ducks, so I create a bot that can search for CJK languages.

# Features

* support text message
* support caption inside photo and document
* support chat username hints
* support import user supplied chat history
* support seamless sync specified chat history in background
* search for one specific user: `/user <username>|<id>|<firstname> keyword`
* Supports text message search
* Provides typo-tolerant and fuzzy search for CJK languages
* Supports filters for GROUP, CHANNEL, PRIVATE, SUPERGROUP, and BOT
* Supports username/ID filtering
* Supports caption search for photos and documents
* Supports seamless chat history sync in the background
* Provides pagination
* Provides a Web UI for searching
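The chat-type filters and pagination map naturally onto MeiliSearch's search parameters (`q`, `filter`, `offset`, `limit`). The sketch below builds such a request body. The `chat.type` attribute name and the helper itself are hypothetical, and a real index would need that attribute declared as filterable before MeiliSearch accepts the filter.

```python
from typing import List, Optional

# Chat types listed in the feature list above.
CHAT_TYPES = {"GROUP", "CHANNEL", "PRIVATE", "SUPERGROUP", "BOT"}


def build_search_payload(keyword: str, chat_types: Optional[List[str]] = None,
                         page: int = 1, per_page: int = 10) -> dict:
    """Build a MeiliSearch search body with an optional chat-type filter (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # Page numbers start at 1; MeiliSearch's offset is the number of hits to skip.
    payload = {"q": keyword, "offset": (page - 1) * per_page, "limit": per_page}
    if chat_types:
        unknown = set(chat_types) - CHAT_TYPES
        if unknown:
            raise ValueError(f"unknown chat types: {sorted(unknown)}")
        # "chat.type" is a hypothetical attribute; it must be declared filterable in the index.
        payload["filter"] = " OR ".join(f'chat.type = "{t}"' for t in chat_types)
    return payload
```

For example, page 3 with 10 hits per page skips the first 20 hits.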

# Theory

1. Telegram allows multiple sessions, maximum is 10 clients.
2. We create a hidden session
3. We use this session to store all your incoming and outgoing text messages to MongoDB
4. We create another bot to search MongoDB
5. We return the whole sentence, so you could use Telegram's built-in buggy search feature.
SearchGram works by:

What about history chats before running this bot?
1. Taking advantage of Telegram's support for multiple sessions (up to 10 clients).
2. Creating a hidden session that stores all incoming and outgoing text messages in MeiliSearch.
3. Creating a separate bot that queries MeiliSearch.
4. Returning the whole sentence, so you can look it up with Telegram's built-in (and buggy) search to find the original message.
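To make the division of labour concrete, here is a toy, in-memory model of steps 2 to 4. It is not SearchGram's code: a Python list and plain substring matching stand in for MeiliSearch, whose ranking, typo tolerance and CJK segmentation are far more sophisticated.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyIndex:
    """In-memory stand-in for the MeiliSearch index (illustration only)."""

    messages: List[Dict[str, str]] = field(default_factory=list)

    def store(self, chat: str, text: str) -> None:
        # Step 2: the hidden client session records each incoming or outgoing text message.
        self.messages.append({"chat": chat, "text": text})

    def search(self, keyword: str) -> List[str]:
        # Step 3: the bot queries the index. Substring matching needs no word boundaries.
        # Step 4: whole sentences are returned so they can be looked up in Telegram.
        return [m["text"] for m in self.messages if keyword in m["text"]]


index = ToyIndex()
index.store("BennyThink", "我今天在电报上搜索消息")
index.store("BennyThink", "an unrelated message")
print(index.search("搜索"))  # ['我今天在电报上搜索消息']
```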

Don't worry, we can either import your history chats, or use config file to sync your history chats.
If you're concerned about chat history from before you started the bot, you can relax: SearchGram can sync it for you using a configuration file.

![](assets/1.jpeg)
# Screenshots

![](assets/1.png)
![](assets/2.png)

https://user-images.githubusercontent.com/14024832/164222317-ea6b228c-bda3-4983-afd7-7bc8f6af5409.mp4
![](assets/3.png)

# Installation

**Because chat history is very important, and it should be kept privately, so I don't offer any public bots.**
**Note: Because chat history should be kept private, we do not offer any public bots.**
**To learn how to run SearchGram with Docker, please refer to [Docker.md](Docker.md).**

**For how to use it in docker, please refer to [Docker.md](Docker.md)**
To install SearchGram without Docker, follow the steps below:

## 1. Preparation

* Download or clone this repository
* Install Python from here: https://www.python.org/downloads/
* Install MongoDB from here: https://www.mongodb.com/download/
* Install MeiliSearch from here: https://github.com/meilisearch/meilisearch
* Apply for APP_ID and APP_HASH from here: https://my.telegram.org/
* Talk to https://t.me/BotFather to get your bot token
* Talk to https://t.me/blog_update_bot to get your user id
* Obtain your bot token by contacting https://t.me/BotFather.
* Obtain your user ID by contacting https://t.me/blog_update_bot.

## 2. Modify environment file

@@ -67,44 +66,39 @@ Use your favorite editor to modify `config.py`, example:
APP_ID = 176552
APP_HASH = "667276jkajhw"
TOKEN = "123456:8hjhad"
MONGO_HOST = "localhost"
MEILI_HOST = "localhost"
OWNER_ID = "2311231"
```
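The example above hard-codes credentials in `config.py`. If you'd rather keep them out of the file, one option is to read them from environment variables. The hypothetical `load_config` below sketches that, converting `APP_ID` to an integer as in the example and defaulting `MEILI_HOST` to `localhost`; it is not SearchGram's own loader.

```python
import os
from collections.abc import Mapping


def load_config(env: Mapping = os.environ) -> dict:
    """Hypothetical loader: read the config.py settings from environment variables."""
    return {
        "APP_ID": int(env.get("APP_ID", "0")),  # numeric, as in the example above
        "APP_HASH": env.get("APP_HASH", ""),
        "TOKEN": env.get("TOKEN", ""),
        "MEILI_HOST": env.get("MEILI_HOST", "localhost"),
        "OWNER_ID": env.get("OWNER_ID", ""),  # kept as a string, as in the example above
    }
```

In `config.py` you could then write, for example, `APP_ID = load_config()["APP_ID"]`.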

If your network is limited(like in China), you need to setup proxy:
If your network access is restricted (for example, in China), you need to set up a proxy:

```python
PROXY = {"scheme": "socks5", "hostname": "localhost", "port": 1080}
```
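As far as we know, this `PROXY` dict follows the `proxy` format of Pyrogram's `Client` (`scheme`, `hostname`, `port`, plus optional `username` and `password`), assuming that is where SearchGram passes it. The hypothetical helper below builds that dict from a proxy URL such as `socks5://localhost:1080`, which is convenient if you already keep proxy settings as a URL.

```python
from urllib.parse import urlsplit


def proxy_from_url(url: str) -> dict:
    """Turn e.g. 'socks5://localhost:1080' into the PROXY dict shape shown above."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {url!r}")
    proxy = {"scheme": parts.scheme, "hostname": parts.hostname, "port": parts.port}
    # Optional credentials, e.g. socks5://user:pass@host:1080
    if parts.username:
        proxy["username"] = parts.username
    if parts.password:
        proxy["password"] = parts.password
    return proxy
```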

## 3. Log in to the client

Open a terminal(cmd, iTerm, etc), change directory to your code, and then:
Open a terminal (such as cmd or iTerm), navigate to the directory where you have saved the code, and then:

```shell
python client.py
```

Input your phone number and login to the client. Ctrl + C to exit
Enter your phone number and log in to the client. You can exit by pressing `Ctrl + C`.

## 4. (Optional) Set up sync ID

See [here](Docker.md#6-optionalsetup-sync-id)

## 5. Run!

Open two terminals, and respectively:
Open two terminals and run one of the following commands in each:

```shell
python client.py
python bot.py
```
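If you prefer a single terminal, a small launcher can start both processes and wait for them. This is a hypothetical convenience script, not part of the repository; it assumes you run it from the directory containing `client.py` and `bot.py`, and that you have already logged in once (step 3) so `client.py` does not need to prompt for your phone number.

```python
import subprocess
import sys
from typing import List


def run_together(commands: List[List[str]]) -> List[int]:
    """Start every command with the current Python interpreter and return their exit codes.

    Each entry is an argument list, e.g. ["client.py"] or ["bot.py"].
    """
    procs = [subprocess.Popen([sys.executable, *args]) for args in commands]
    try:
        return [p.wait() for p in procs]
    except KeyboardInterrupt:
        # On Ctrl + C, stop any child that is still running before re-raising.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        raise
```

For example, `run_together([["client.py"], ["bot.py"]])` blocks until both scripts exit.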

# Roadmap and TODOs

- [x] chat history
- [ ] jieba

# Sponsor

* [Buy me a coffee](https://www.buymeacoffee.com/bennythink)
@@ -113,8 +107,9 @@ python bot.py

## Stripe

You can choose to donate via Stripe. Please click the button below to donate via Stripe.
Choose the currency and payment method that suits you.
If you would like to donate to the project using Stripe, please click on the button below.

You can choose the currency and payment method that best suits you.

| USD(Card, Apple Pay and Google Pay) | SEK(Card, Apple Pay and Google Pay) | CNY(Card, Apple Pay, Google Pay and Alipay) |
|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------|
@@ -123,4 +118,4 @@ Choose the currency and payment method that suits you.

# License

This project is LICENSED under the GNU GENERAL PUBLIC LICENSE Version 3.
This project is licensed under the GNU GENERAL PUBLIC LICENSE Version 3.
Binary file removed assets/1.jpeg
Binary file added assets/1.png
Binary file modified assets/2.png
Binary file added assets/3.png
