Add support for pgvecto.rs and other extensions. #47

Closed
ShelbyJenkins opened this issue Apr 1, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@ShelbyJenkins

First of all, very nice project. I got it working when the other option I tried did not work.

I wouldn't normally make this feature request, but after looking at the code for the archive module, I think it would be fairly easy to download and install PG extensions.

This could be a solid use case for this project because it would allow you to easily embed a vector DB in a Rust app. Right now the Rust options for vector DBs are all server/client, and there are no options for embedded/in-process vector DBs with Rust. I know PG is not in-process, but what could be accomplished is a vector DB that could be installed with cargo, which would be close enough!

@brianheineman
Contributor

brianheineman commented Apr 2, 2024

@ShelbyJenkins Thank you for suggesting this use case. I have been thinking about what the implications of adding extension support might look like for this project. The release archives for pgvecto.rs look like they would lend themselves to automated downloading and installing. This is an area I would like to look into, but I cannot commit to if/when it may happen.

@bobmcwhirter

+1, we'd also be interested in easy embedding with Trusted Language Extensions (AWS for rustpl etc.) and Apache AGE (for graphy stuff).

@brianheineman added the enhancement label Apr 26, 2024
@dukeeagle

Would love to see this as well! @brianheineman - is this project accepting sponsorship? I would love to more formally support it, especially as you consider extensions like this.

@brianheineman
Contributor

brianheineman commented Jun 18, 2024

Thanks for the suggestion, @dukeeagle; I went ahead and enabled sponsorship on my personal profile. I would like to support downloading/installing (bundling?) PostgreSQL plugins so they can easily be used in another project I've been working on: rsql. The PostgreSQL plugin/FDW ecosystem is pretty diverse, with different build and distribution systems, and I want to avoid taking on the effort of building/releasing every plugin. From what I have seen of other projects, they just build/bundle what they need and keep adding as they go. For this project, I would like to avoid having to build/support every plugin and instead provide a solution where the plugin artifacts can be pulled from another location, similar to how the PostgreSQL binaries are handled.

I'm interested in any feedback/comments/concerns folks have.

@brianheineman
Contributor

brianheineman commented Jun 20, 2024

Some extensions require that PostgreSQL is built with specific build options (e.g. --with-gssapi) enabled. I just released 0.11.0 of the crate to support custom PostgreSQL archives. This will allow users to create their own PostgreSQL binaries and include any required plugins in the archive. The release archive should be hosted on GitHub with the same name pattern and format as those provided by postgresql-binaries. The custom URL can be specified in Settings: https://github.com/theseus-rs/postgresql-embedded/blob/main/postgresql_embedded/src/settings.rs#L21-L22.

I am planning on continuing to look into support for downloading/installing custom extensions, but this new feature can be used in the interim (and may still be required for some extensions in the future).

@spikecodes

I made a toy repo showing an implementation of a postgresql-embedded db with pgvecto.rs installed that passes various vector operation test cases: https://github.com/portalcorp/pgevdb

Hope this helps @ShelbyJenkins @bobmcwhirter @dukeeagle

@brianheineman
Contributor

@spikecodes thanks for providing an example for folks! In order to provide more transparency for this effort, I have created a draft PR #110 that I will update with changes as I go so folks can monitor progress and/or provide feedback.

@brianheineman
Contributor

@ShelbyJenkins et al.: I just released 0.15.0 with the initial support for PostgreSQL extensions; you can find an example of how to use this new capability here. As part of this effort, the MSRV has also been updated to 1.80.

@spikecodes

spikecodes commented Aug 2, 2024

This is great, @brianheineman! My company actually switched from pgvecto.rs to the more popular pgvector, which is cross-platform, whereas pgvecto.rs only supports Linux.

If you're interested in supporting this extension with your new postgresql_embedded install helper, I created a repo (see the Releases tab) with zips of pgvector's compiled extension files for macOS, Linux, and Windows: https://github.com/portalcorp/pgvector_compiled/

Check out the README if you're interested in what files get produced for each operating system. Then all you have to do is copy these into the local installation of postgresql_embedded.
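
The copy step described above can be sketched with the standard library alone. This is only a minimal sketch, not the code used by postgresql_embedded or pgvector_compiled: the function names `copy_files` and `install_extension_files` are hypothetical, and the `lib` and `share/extension` layout is an assumption based on the directory names used in the code shared later in this thread.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every regular file directly inside `src` into `dst`, creating `dst`
/// if it does not exist. Subdirectories are skipped, not recursed into.
/// Returns the number of files copied.
fn copy_files(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let path = entry?.path();
        if path.is_file() {
            if let Some(name) = path.file_name() {
                fs::copy(&path, dst.join(name))?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}

/// Copies the shared library and the extension's SQL/control files from an
/// extracted archive into a PostgreSQL installation directory.
/// Assumed (hypothetical) layout: `<archive>/lib` and `<archive>/share/extension`.
fn install_extension_files(archive: &Path, install_dir: &Path) -> io::Result<usize> {
    let libs = copy_files(&archive.join("lib"), &install_dir.join("lib"))?;
    let ext = copy_files(
        &archive.join("share").join("extension"),
        &install_dir.join("share").join("extension"),
    )?;
    Ok(libs + ext)
}
```

Because `copy_files` skips subdirectories, anything nested below `lib` (for example a bitcode directory on Linux) would need its own call.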

@spikecodes

spikecodes commented Aug 2, 2024

And if it helps, here's the code from a (currently private) repo that I wrote to do this cross-platform copying process:

pub async fn with_config(config: VectorDBConfig) -> Result<Self> {
		let storage_dir: PathBuf = Path::new(&config.path).to_path_buf();

		let mut settings = Settings::default();
		settings.password_file = storage_dir.join(".pgpass");
		if settings.password_file.exists() {
			settings.password = std::fs::read_to_string(settings.password_file.clone())?;
		}

		let installation_dir = storage_dir.join("pg");
		let data_dir = storage_dir.join("pg_data");
		settings.installation_dir = installation_dir.clone();
		settings.data_dir = data_dir;
		settings.temporary = false;
		settings.version = VersionReq::parse(format!("={}", PG_VERSION).as_str())?;

		info!("Starting PostgreSQL v{}", PG_VERSION);
		let mut postgresql = PostgreSQL::new(settings);
		postgresql.setup().await?;
		postgresql.start().await?;

		if !postgresql.database_exists(DATABASE_NAME).await? {
			info!("Creating database '{}'", DATABASE_NAME);
			postgresql.create_database(DATABASE_NAME).await?;
		}
		let database_url = postgresql.settings().url(DATABASE_NAME);

		let pool = PgPool::connect(database_url.as_str()).await?;

		let mut db = Self {
			pool,
			config,
			postgresql,
		};

		db.setup_pg_vectors_extension().await?;
		db.setup_tables().await?;

		Ok(db)
	}

	async fn setup_pg_vectors_extension(&mut self) -> Result<()> {
		info!("Checking if pg_vectors extension is installed");
		if !self.is_pg_vectors_extension_installed().await? {
			info!("Installing pg_vectors extension");

			self.install_pg_vectors_extension().await?;
			// self.configure_pg_vectors_extension().await?;
			info!("Successfully set up pg_vectors extension");

			// Restart PostgreSQL to apply changes; the existing pool is reused afterwards
			self.postgresql.stop().await?;
			self.postgresql.start().await?;

			info!("Enabling pg_vectors extension");
			self.enable_pg_vectors_extension().await?;
		}
		Ok(())
	}

	async fn is_pg_vectors_extension_installed(&self) -> Result<bool> {
		Ok(self
			.postgresql
			.settings()
			.installation_dir
			.join("lib")
			.join(if cfg!(target_os = "windows") {
				"vector.dll"
			} else if cfg!(target_os = "macos") {
				"vector.dylib"
			} else {
				"vector.so"
			})
			.exists())
	}

	async fn install_pg_vectors_extension(&self) -> Result<()> {
		info!("Setting up PostgreSQL vector extension");

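		// Determine the download URL from the operating system. Note that each
		// archive targets a single architecture (x86_64 on Linux and Windows,
		// aarch64 on macOS); the CPU architecture is not checked here.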
		let (url, os_type) = if cfg!(target_os = "linux") {
			("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-x86_64-unknown-linux-gnu-pg16.zip", "Linux")
		} else if cfg!(target_os = "windows") {
			("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-x86_64-pc-windows-msvc-pg16.zip", "Windows")
		} else if cfg!(target_os = "macos") {
			("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-aarch64-apple-darwin-pg16.zip", "macOS")
		} else {
			return Err(anyhow::anyhow!("Unsupported operating system"));
		};

		info!("Downloading extension from {}", url);
		let response = reqwest::get(url).await?;
		let bytes = response.bytes().await?;

		// Extract zip
		let zip = PathBuf::from(&self.config.path).join("pgvector");
		info!("Extracting zip to {:?}", zip);
		zip_extract::extract(Cursor::new(bytes), &zip, true)?;

		// Get PostgreSQL directories
		let pg_dir = self.postgresql.settings().installation_dir.clone();

		match os_type {
			"Linux" => {
				let lib_dir = pg_dir.join("lib");
				let share_dir = pg_dir.join("share");
				let extension_dir = share_dir.join("extension");
				let include_dir = pg_dir
					.join("include")
					.join("server")
					.join("extension")
					.join("vector");
				std::fs::create_dir_all(lib_dir.join("bitcode").join("vector").join("src"))?;
				std::fs::create_dir_all(&include_dir)?;

				// Copy shared object file
				std::fs::copy(zip.join("lib").join("vector.so"), lib_dir.join("vector.so"))?;

				info!("Zip: {:?}", zip);
				info!("Lib: {:?}", lib_dir);

				// Copy bitcode files
				std::fs::copy(
					zip.join("lib").join("bitcode").join("vector.index.bc"),
					lib_dir.join("bitcode").join("vector.index.bc"),
				)?;
				for entry in
					std::fs::read_dir(zip.join("lib").join("bitcode").join("vector").join("src"))?
				{
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(
							&path,
							lib_dir
								.join("bitcode")
								.join("vector")
								.join("src")
								.join(path.file_name().unwrap()),
						)?;
					}
				}

				// Copy extension files
				for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
					}
				}

				// Copy header files
				for entry in std::fs::read_dir(
					zip.join("include")
						.join("server")
						.join("extension")
						.join("vector"),
				)? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
					}
				}
			},
			"Windows" => {
				let lib_dir = pg_dir.join("lib");
				let share_dir = pg_dir.join("share");
				let extension_dir = share_dir.join("extension");
				let include_dir = pg_dir
					.join("include")
					.join("server")
					.join("extension")
					.join("vector");
				std::fs::create_dir_all(&include_dir)?;

				// Copy DLL
				std::fs::copy(
					zip.join("lib").join("vector.dll"),
					lib_dir.join("vector.dll"),
				)?;

				// Copy extension files
				for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
					}
				}

				// Copy header files
				for entry in std::fs::read_dir(
					zip.join("include")
						.join("server")
						.join("extension")
						.join("vector"),
				)? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
					}
				}
			},
			"macOS" => {
				let lib_dir = pg_dir.join("lib");
				let share_dir = pg_dir.join("share");
				let extension_dir = share_dir.join("extension");
				let include_dir = pg_dir
					.join("include")
					.join("server")
					.join("extension")
					.join("vector");
				std::fs::create_dir_all(&include_dir)?;

				// Copy shared library
				std::fs::copy(
					zip.join("lib").join("vector.dylib"),
					lib_dir.join("vector.dylib"),
				)?;

				// // Create a .so symlink for compatibility
				// std::os::unix::fs::symlink(
				// 	lib_dir.join("vector.dylib"),
				// 	lib_dir.join("vector.so"),
				// )?;

				// Copy extension files
				for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
					}
				}

				// Copy header files
				for entry in std::fs::read_dir(
					zip.join("include")
						.join("server")
						.join("extension")
						.join("vector"),
				)? {
					let entry = entry?;
					let path = entry.path();
					if path.is_file() {
						std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
					}
				}
			},
			_ => return Err(anyhow::anyhow!("Unsupported operating system")),
		}

		// Delete the extracted pgvector directory
		info!("Deleting extracted pgvector directory");
		std::fs::remove_dir_all(zip)?;

		info!("PostgreSQL vector extension install complete");

		Ok(())
	}

	async fn enable_pg_vectors_extension(&self) -> Result<()> {
		let query = "CREATE EXTENSION IF NOT EXISTS vector;";
		sqlx::query(query).execute(&self.pool).await?;
		Ok(())
	}
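
The `cfg!(target_os = ...)` checks above pick an archive by operating system only, while each archive name also embeds a CPU architecture. One possible way to make that explicit is sketched below, as an illustration rather than a drop-in replacement. The helper names (`library_file_name`, `archive_name`, `archive_url`, `current_archive_url`) and the `BASE_URL` constant are hypothetical; the archive and library file names are taken from the code above.

```rust
/// Base download location, taken from the URLs in the code above.
const BASE_URL: &str =
    "https://github.com/portalcorp/pgvector_compiled/releases/latest/download";

/// Shared-library file name of the extension for a given OS, using the same
/// names that `is_pg_vectors_extension_installed` checks for.
fn library_file_name(os: &str) -> &'static str {
    match os {
        "windows" => "vector.dll",
        "macos" => "vector.dylib",
        _ => "vector.so",
    }
}

/// Archive file name for an (os, arch) pair, or `None` when the code above
/// lists no archive for that combination.
fn archive_name(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("pgvector-x86_64-unknown-linux-gnu-pg16.zip"),
        ("windows", "x86_64") => Some("pgvector-x86_64-pc-windows-msvc-pg16.zip"),
        ("macos", "aarch64") => Some("pgvector-aarch64-apple-darwin-pg16.zip"),
        _ => None,
    }
}

/// Full download URL for an (os, arch) pair, if an archive is listed for it.
fn archive_url(os: &str, arch: &str) -> Option<String> {
    archive_name(os, arch).map(|name| format!("{}/{}", BASE_URL, name))
}

/// Download URL for the platform this program was compiled for.
fn current_archive_url() -> Option<String> {
    archive_url(std::env::consts::OS, std::env::consts::ARCH)
}
```

With this shape, an unsupported combination such as an Intel Mac yields `None`, which the caller can turn into the same "Unsupported operating system" error instead of downloading a library for the wrong architecture.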

gaocegege pushed a commit to tensorchord/pgvecto.rs-docs that referenced this issue Aug 2, 2024
Support for pgvecto.rs was [requested by several users](theseus-rs/postgresql-embedded#47) of [postgresql_embedded](https://github.com/theseus-rs/postgresql-embedded); adding an adopter link for other users that may be interested.
@brianheineman
Contributor

@spikecodes thank you for providing another extension repository. I added initial support for the portalcorp extensions as part of #112; please take a look and feel free to suggest any changes/improvements in a new issue/PR.

@spikecodes

Awesome, thanks for putting that together. I left a few comments on the review; I'm looking forward to using this feature in production.

@brianheineman
Contributor

I just released 0.16.0 with support for another vector extension (thanks @spikecodes) and a number of improvements to simplify the addition of new extension repositories. I expect there will be more work that needs to be done to improve and expand upon the new extension functionality; however, I am going to close this issue as I believe the core of it has been addressed. Please feel free to open new issues and/or PRs for any new bugs/features/changes. Thanks to everyone on this thread for your input!

5 participants