Add support for pgvecto.rs and other extensions. #47
Comments
@ShelbyJenkins Thank you for suggesting this use case. I have been thinking about what the implications of adding extension support might look like for this project. The release archives for pgvecto.rs look like they would lend themselves to automated downloading and installing. This is an area I would like to look into, but I cannot commit to if/when it may happen.
+1, we'd also be interested in easy embedding with Trusted Language Extensions (AWS, for rustpl etc.) and Apache AGE (for graphy stuff)
Would love to see this as well! @brianheineman - is this project accepting sponsorship? I would love to more formally support it, especially as you consider extensions like this.
Thanks for the suggestion @dukeeagle; I went ahead and enabled sponsorship on my personal profile. I would like to support downloading/installing (bundling?) PostgreSQL plugins so they can easily be used in another project I've been working on: rsql. The PostgreSQL plugin/FDW ecosystem is pretty diverse, with different build and distribution systems, and I want to avoid taking on the effort of building/releasing every plugin. From what I have seen of other projects, they just build/bundle what they need and keep adding as they go. For this project, I would like to avoid having to build/support every plugin and instead provide a solution where the plugin artifacts can be pulled from another location, similar to how the PostgreSQL binaries are handled. I'm interested in any feedback/comments/concerns folks have.
Some extensions require that PostgreSQL is compiled with specific compiler options (e.g. I am planning on continuing to look into support for downloading/installing custom extensions, but this new feature can be used in the interim (and may still be required for some extensions in the future).
I made a toy repo showing an implementation of a Hope this helps @ShelbyJenkins @bobmcwhirter @dukeeagle
@spikecodes thanks for providing an example for folks! In order to provide more transparency for this effort, I have created a draft PR #110 that I will update with changes as I go so folks can monitor progress and/or provide feedback.
@ShelbyJenkins et al; I just released 0.15.0 with the initial support for postgresql extensions; you can find an example of how to use this new capability here. As part of this effort, the MSRV has also been updated to 1.80.
This is great! @brianheineman My company actually switched from pgvecto.rs to the more popular pgvector, which is cross-platform, whereas pgvecto.rs only supports Linux. If you're interested in supporting this extension with your new postgresql_embedded Check out the README if you're interested in what files get produced for each operating system. Then all you have to do is copy these into the local installation of postgresql_embedded.
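To make the per-OS difference concrete, here is a minimal sketch of how the shared library file name and its destination could be picked. The file names (`vector.so`, `vector.dylib`, `vector.dll`) and the `lib` directory come from the code posted in this thread; the function names `vector_library_name` and `library_destination` are hypothetical and not part of postgresql_embedded.

```rust
use std::path::{Path, PathBuf};

/// Returns the pgvector shared library file name for an OS name.
/// `os` uses the same values as `std::env::consts::OS`.
/// (Hypothetical helper; the names are taken from the thread's code.)
fn vector_library_name(os: &str) -> Option<&'static str> {
    match os {
        "linux" => Some("vector.so"),
        "macos" => Some("vector.dylib"),
        "windows" => Some("vector.dll"),
        _ => None,
    }
}

/// Builds the path the library would be copied to inside a PostgreSQL
/// installation directory, i.e. `<installation_dir>/lib/<library>`.
fn library_destination(installation_dir: &Path, os: &str) -> Option<PathBuf> {
    vector_library_name(os).map(|name| installation_dir.join("lib").join(name))
}

fn main() {
    let os = std::env::consts::OS;
    match library_destination(Path::new("pg"), os) {
        Some(path) => println!("{} -> {}", os, path.display()),
        None => println!("no prebuilt pgvector library known for {}", os),
    }
}
```

Returning `Option` rather than panicking leaves it to the caller to decide how to report an unsupported platform.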
And if it helps, here's the code from a (currently private) repo that I wrote to do this cross-platform copying process:

```rust
pub async fn with_config(config: VectorDBConfig) -> Result<Self> {
    let storage_dir: PathBuf = Path::new(&config.clone().path).to_path_buf();
    let mut settings = Settings::default();
    settings.password_file = storage_dir.join(".pgpass");
    if settings.password_file.exists() {
        settings.password = std::fs::read_to_string(settings.password_file.clone())?;
    }
    let installation_dir = storage_dir.join("pg");
    let data_dir = storage_dir.join("pg_data");
    settings.installation_dir = installation_dir.clone();
    settings.data_dir = data_dir;
    settings.temporary = false;
    settings.version = VersionReq::parse(format!("={}", PG_VERSION).as_str())?;
    info!("Starting PostgreSQL v{}", PG_VERSION);
    let mut postgresql = PostgreSQL::new(settings);
    postgresql.setup().await?;
    postgresql.start().await?;
    if !postgresql.database_exists(DATABASE_NAME).await? {
        info!("Creating database '{}'", DATABASE_NAME);
        postgresql.create_database(DATABASE_NAME).await?;
    }
    let database_url = postgresql.settings().url(DATABASE_NAME);
    let pool = PgPool::connect(database_url.as_str()).await?;
    let mut db = Self {
        pool,
        config,
        postgresql,
    };
    db.setup_pg_vectors_extension().await?;
    db.setup_tables().await?;
    Ok(db)
}

async fn setup_pg_vectors_extension(&mut self) -> Result<()> {
    info!("Checking if pg_vectors extension is installed");
    if !self.is_pg_vectors_extension_installed().await? {
        info!("Installing pg_vectors extension");
        self.install_pg_vectors_extension().await?;
        // self.configure_pg_vectors_extension().await?;
        info!("Successfully set up pg_vectors extension");
        // Restart PostgreSQL to apply changes and reconnect pool
        self.postgresql.stop().await?;
        self.postgresql.start().await?;
        info!("Enabling pg_vectors extension");
        self.enable_pg_vectors_extension().await?;
    }
    Ok(())
}

async fn is_pg_vectors_extension_installed(&self) -> Result<bool> {
    Ok(self
        .postgresql
        .settings()
        .installation_dir
        .join("lib")
        .join(if cfg!(target_os = "windows") {
            "vector.dll"
        } else if cfg!(target_os = "macos") {
            "vector.dylib"
        } else {
            "vector.so"
        })
        .exists())
}

async fn install_pg_vectors_extension(&self) -> Result<()> {
    info!("Setting up PostgreSQL vector extension");
    // Determine the correct URL based on the operating system
    let (url, os_type) = if cfg!(target_os = "linux") {
        ("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-x86_64-unknown-linux-gnu-pg16.zip", "Linux")
    } else if cfg!(target_os = "windows") {
        ("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-x86_64-pc-windows-msvc-pg16.zip", "Windows")
    } else if cfg!(target_os = "macos") {
        ("https://github.com/portalcorp/pgvector_compiled/releases/latest/download/pgvector-aarch64-apple-darwin-pg16.zip", "macOS")
    } else {
        return Err(anyhow::anyhow!("Unsupported operating system"));
    };
    info!("Downloading extension from {}", url);
    let response = reqwest::get(url).await?;
    let bytes = response.bytes().await?;
    // Extract zip
    let zip = PathBuf::from(&self.config.path).join("pgvector");
    info!("Extracting zip to {:?}", zip);
    zip_extract::extract(Cursor::new(bytes), &zip, true)?;
    // Get PostgreSQL directories
    let pg_dir = self.postgresql.settings().installation_dir.clone();
    match os_type {
        "Linux" => {
            let lib_dir = pg_dir.join("lib");
            let share_dir = pg_dir.join("share");
            let extension_dir = share_dir.join("extension");
            let include_dir = pg_dir
                .join("include")
                .join("server")
                .join("extension")
                .join("vector");
            std::fs::create_dir_all(lib_dir.join("bitcode").join("vector").join("src"))?;
            std::fs::create_dir_all(&include_dir)?;
            // Copy shared object file
            std::fs::copy(zip.join("lib").join("vector.so"), lib_dir.join("vector.so"))?;
            println!("Zip: {:?}", zip);
            println!("Lib: {:?}", lib_dir);
            // Copy bitcode files
            std::fs::copy(
                zip.join("lib").join("bitcode").join("vector.index.bc"),
                lib_dir.join("bitcode").join("vector.index.bc"),
            )?;
            for entry in
                std::fs::read_dir(zip.join("lib").join("bitcode").join("vector").join("src"))?
            {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(
                        &path,
                        lib_dir
                            .join("bitcode")
                            .join("vector")
                            .join("src")
                            .join(path.file_name().unwrap()),
                    )?;
                }
            }
            // Copy extension files
            for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
                }
            }
            // Copy header files
            for entry in std::fs::read_dir(
                zip.join("include")
                    .join("server")
                    .join("extension")
                    .join("vector"),
            )? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
                }
            }
        },
        "Windows" => {
            let lib_dir = pg_dir.join("lib");
            let share_dir = pg_dir.join("share");
            let extension_dir = share_dir.join("extension");
            let include_dir = pg_dir
                .join("include")
                .join("server")
                .join("extension")
                .join("vector");
            std::fs::create_dir_all(&include_dir)?;
            // Copy DLL
            std::fs::copy(
                zip.join("lib").join("vector.dll"),
                lib_dir.join("vector.dll"),
            )?;
            // Copy extension files
            for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
                }
            }
            // Copy header files
            for entry in std::fs::read_dir(
                zip.join("include")
                    .join("server")
                    .join("extension")
                    .join("vector"),
            )? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
                }
            }
        },
        "macOS" => {
            let lib_dir = pg_dir.join("lib");
            let share_dir = pg_dir.join("share");
            let extension_dir = share_dir.join("extension");
            let include_dir = pg_dir
                .join("include")
                .join("server")
                .join("extension")
                .join("vector");
            std::fs::create_dir_all(&include_dir)?;
            // Copy shared library
            std::fs::copy(
                zip.join("lib").join("vector.dylib"),
                lib_dir.join("vector.dylib"),
            )?;
            // // Create a .so symlink for compatibility
            // std::os::unix::fs::symlink(
            //     lib_dir.join("vector.dylib"),
            //     lib_dir.join("vector.so"),
            // )?;
            // Copy extension files
            for entry in std::fs::read_dir(zip.join("share").join("extension"))? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, extension_dir.join(path.file_name().unwrap()))?;
                }
            }
            // Copy header files
            for entry in std::fs::read_dir(
                zip.join("include")
                    .join("server")
                    .join("extension")
                    .join("vector"),
            )? {
                let entry = entry?;
                let path = entry.path();
                if path.is_file() {
                    std::fs::copy(&path, include_dir.join(path.file_name().unwrap()))?;
                }
            }
        },
        _ => return Err(anyhow::anyhow!("Unsupported operating system")),
    }
    // Delete the extracted pgvector directory
    info!("Deleting extracted pgvector directory");
    std::fs::remove_dir_all(zip)?;
    info!("PostgreSQL vector extension install complete");
    Ok(())
}

async fn enable_pg_vectors_extension(&self) -> Result<()> {
    let query = "CREATE EXTENSION IF NOT EXISTS vector;";
    sqlx::query(query).execute(&self.pool).await?;
    Ok(())
}
```
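The three OS branches in the code above repeat the same "copy every file from one directory into another" loop. One way it could be factored out is sketched below using only the standard library. The helper name `copy_files` and the demo paths are hypothetical. Unlike the original loops, this sketch also creates the destination directory if it is missing and skips entries without a file name instead of calling `unwrap()`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every regular file directly inside `src` into `dst`, creating `dst`
/// if needed. Subdirectories are skipped. Returns the number of files copied.
/// (Hypothetical helper, not part of the original code.)
fn copy_files(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let path = entry?.path();
        if path.is_file() {
            if let Some(name) = path.file_name() {
                fs::copy(&path, dst.join(name))?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Demo directories under the system temp dir, for illustration only.
    let base = std::env::temp_dir().join("copy_files_demo");
    let src = base.join("share").join("extension");
    let dst = base.join("pg").join("share").join("extension");
    fs::create_dir_all(&src)?;
    fs::write(src.join("vector.control"), "comment = 'demo'")?;

    let copied = copy_files(&src, &dst)?;
    println!("copied {copied} file(s)");

    // Clean up the demo directories.
    fs::remove_dir_all(&base)?;
    Ok(())
}
```

With a helper like this, each OS branch would reduce to copying the platform-specific library file plus a few `copy_files` calls for the `share/extension` and header directories.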
Support for pgvecto.rs was [requested by several users](theseus-rs/postgresql-embedded#47) of [postgresql_embedded](https://github.com/theseus-rs/postgresql-embedded); adding an adopter link for other users that may be interested.
@spikecodes thank you for providing another extension repository. I added initial support for the portal corp extensions as part of #112; please take a look and feel free to suggest any changes/improvements in a new issue/PR.
Awesome, thanks for putting that together. I left a few comments on the review, looking forward to using this feature in production.
I just released 0.16.0 with support for another vector extension (thanks @spikecodes) and a number of improvements to simplify the addition of new extension repositories. I expect there will be more work that needs to be done to improve and expand upon the new extension functionality; however, I am going to close this issue as I believe the core of it has been addressed. Please feel free to open new issues and/or PRs for any new bugs/features/changes. Thanks to everyone on this thread for your input!
First of all, very nice project. I got it working when the other option I tried did not work.
I wouldn't normally make a feature request like this, but after looking at the code for the archive module, I think it would be fairly easy to download and install PG extensions.
This could be a solid use case for this project because it would allow you to easily embed a vectorDB in a Rust app. Right now the Rust options for vectorDBs are all server/client, and there are no options for embedded/in-process vectorDBs with Rust. I know PG is not in-process, but what could be accomplished is a vectorDB that can be installed with cargo, which would be close enough!