
In-Database Configuration #1562

Closed · wolfgangwalther opened this issue Jul 20, 2020 · 16 comments
Labels: config (related to the configuration options)
@wolfgangwalther
Member

This is a suggestion for in-database configuration, based on ideas in #1429 and #1559 (comment).

Currently, all configuration is file-based in postgrest.conf. Since the recent addition of #1544, reloading the config on the fly is possible. I'm not sure whether this already works via NOTIFY as mentioned in #1542. As mentioned in #1544, this reloads only part of the config, while other parts (like the database connection info) need a restart of the server.

The downside of the current implementation is that we don't have a single source of truth for everything in the config file that references database objects. For example:

  • db-schema needs to be adjusted whenever the corresponding schemas are created or dropped (multi-tenancy, Configure to Exclude Schemas #1521)

  • pre-request needs to be set to the corresponding function

  • jwt-secret is often copied as app.settings.jwt-secret or something like that

It would be much nicer if we could (optionally) keep the whole reloadable configuration in the database as well.

Using local, session and database variables

We are already using local variables (SET LOCAL) to set some values per transaction (request.*, response.*, app.settings.*). We can extend this to use session and database variables as well and add all the reloadable config options as follows:

  • Use ALTER DATABASE xyz SET pgrst.db_schema='public'; to set config options, just like in the current postgrest.conf.

  • For backwards compatibility: keep a copy of the config file postgrest.conf in memory and add the reloadable parts to each session with SET SESSION. Those will override the database variables automatically (see the small sketch after this list). This will also support use-cases where ALTER DATABASE permissions are not available.

  • Query all those config variables when updating the schema cache. Since the SET SESSION will have been executed for this session as well, this will result in the currently active config, with both in-database and file-based merged together. Use this config from the schema cache for operation.

  • For each request, add the request.* and response.* variables as SET LOCAL like we do already.
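
A minimal sketch of that precedence, assuming a pgrst.db_schema placeholder GUC: the ALTER DATABASE value is the default for new sessions, while the session-level value derived from postgrest.conf overrides it.

ALTER DATABASE mydb SET pgrst.db_schema = 'public';   -- in-database default for new sessions

-- in a fresh session, before the config file is applied:
SELECT current_setting('pgrst.db_schema');             -- 'public'

-- reloadable value taken from postgrest.conf, applied per session:
SET SESSION pgrst.db_schema = 'api';
SELECT current_setting('pgrst.db_schema');             -- 'api'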

This would allow us to:

  • use only one variable for the jwt secret, instead of two

  • do stuff like this:

CREATE SCHEMA new_customer;
ALTER DATABASE mydb SET pgrst.db_schema='...,new_customer';
NOTIFY pgrst;

Note: When changing database variables, those are only in effect for new sessions. That means we would need to recreate all existing sessions when reloading the schema cache.

Extensibility

If we were to implement the pg_postgrest SQL extension to move parts of the code to the database as mentioned in #1511 (comment), we could add a function to set all those variables like this:

CREATE FUNCTION pgrst.set (vars JSON, is_local BOOL) RETURNS VOID
LANGUAGE plpgsql AS $$
BEGIN
  -- set every key/value pair in vars, either per transaction (LOCAL)
  -- or per session, depending on is_local
  PERFORM set_config(key, value, is_local) FROM json_each_text(vars);
END;
$$;

Instead of calling SET manually, we would then make the call to this function to set the reloadable config options on a SESSION level and to set the request variables on a LOCAL level.
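
Hypothetical usage, with illustrative variable names:

-- reloadable config options, once per session:
SELECT pgrst.set('{"pgrst.db_schema": "public"}', false);

-- request variables, per transaction:
SELECT pgrst.set('{"request.header.accept": "*/*"}', true);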

This would allow overriding the function: with a custom implementation, users could make use of version-specific features of newer PostgreSQL versions, while we still support all of them. Once schema variables are finally available, this will allow great stuff like:

CREATE VARIABLE pgrst.jwt_secret AS TEXT DEFAULT 'xxx';
GRANT READ ON pgrst.jwt_secret TO auth_user;
ALTER FUNCTION create_token OWNER TO auth_user;
ALTER FUNCTION create_token SECURITY DEFINER;

Now the jwt_secret would be available in the database for token creation, but it would only be readable by the create_token function - and not by any sql-injected* request or something like this.

* not that this should even be possible... but still ;)

@dwagin
Contributor

dwagin commented Jul 20, 2020

@wolfgangwalther pgrst.db_schema is a bad idea; it prevents you from using the same database for multiple APIs.

@wolfgangwalther
Member Author

@dwagin I see your point. You could still use the current postgrest.conf setup to run multiple instances / db-schema configs independently in the first step.

However, the problem will come up again with the pg_postgrest SQL extension. Since those postgrest instances can run different versions of postgrest, the pg extension would have to be installed multiple times in different versions. To be able to do this, we would need a non-reloadable config option pgrst_schema that defaults to pgrst to set the schema in which to install the extension. This could then be used as a prefix for the config variables as well (which makes it more like an "instance name" conceptually?), so pgrst.db_schema would then become instancename.db_schema.

@steve-chavez
Member

The schema variables proposal is really interesting. Nice to see it keeps progressing.

add the reloadable parts to each session with SET SESSION

I'm unsure of how to do that with Hasql. The app.settings.* vars are actually LOCAL because of that. They used to be at the SESSION level, but the vars disappeared when postgrest recovered a lost connection, so I changed them to LOCAL.

I guess it should be possible to attach the session vars to each connection in the Hasql pool, but not sure if that's supported (maybe we need to patch upstream).


To keep it simple in the meantime, we could just get the db settings done with ALTER DATABASE. It would only work for users that control their db.

This could then be used as a prefix for the config variables as well

Yeah, prefix in the config could be like:

db-settings = "pgrst.instance1"

And then we'd get the settings from:

ALTER DATABASE mydb SET pgrst.instance1.db_schema = '..';
ALTER DATABASE mydb SET pgrst.instance1."pre-request" = '...';
ALTER DATABASE mydb SET pgrst.instance1.db_extra_search_path = '...';

wolfgangwalther added the config label (related to the configuration options) on Nov 22, 2020
@steve-chavez
Member

Just learned about pg_db_role_setting; with that we can query the database settings.

When doing ALTER DATABASE .. SET "a.setting", things like SHOW/current_setting('a.setting') only work after login. However, pg_db_role_setting shows the setting without a relogin.

So we could query pg_db_role_setting at startup (or reload) when the proposed db-settings option is used.
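
For illustration, a rough query over pg_db_role_setting (assuming a pgrst. prefix); setconfig stores the entries as 'name=value' strings:

SELECT split_part(s, '=', 1)         AS name
     , substr(s, strpos(s, '=') + 1) AS value
FROM pg_db_role_setting, unnest(setconfig) AS s
WHERE setdatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
  AND setrole = 0                    -- database-wide settings from ALTER DATABASE .. SET
  AND s LIKE 'pgrst.%';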

@wolfgangwalther What do you think about implementing this just for the ALTER DATABASE case first?

@wolfgangwalther
Member Author

Yeah, prefix in the config could be like:

db-settings = "pgrst.instance1"

Let's make that db-settings = "instance1" -> SET pgrst.instance1.db_schema='..'. So we'd add the pgrst. prefix in any case, to make sure we always stay in our namespace.

Just learned about pg_db_role_setting; with that we can query the database settings.

This is in fact great!

I'm unsure of how to do that with Hasql. The app.settings.* vars are actually LOCAL because of that. They used to be at the SESSION level, but the vars disappeared when postgrest recovered a lost connection, so I changed them to LOCAL.

I guess it should be possible to attach the session vars to each connection in the Hasql pool, but not sure if that's supported (maybe we need to patch upstream).

SET SESSION was a bad idea - I now have a better one, where we don't need it at all. The suggestion upthread was to have the following "order" of loading the config and overriding values:

config file > database settings

So config file values would take precedence over database settings. This doesn't make sense.

A better order, including environment variables (#1624):

role settings > database settings > environment variables > config file

So, the config file is loaded first - this is the base. Environment variables can override those settings. With just those two, the "restart-needed" settings can be set. Once those are set, the database connection can be made and the settings can be queried from pg_db_role_setting. Those then override any already-loaded settings that are not "restart-needed".

No SET SESSION or whatever needed so far.

Settings should probably be loaded whenever the schema cache is reloaded, but before it: new settings might change the queries we need to make (think db-schema). Maybe we can also make the schema cache "database settings aware", so we can provide the config/env defaults via the query and read the database values live. Then we wouldn't need two steps.

When doing ALTER DATABASE .. SET "a.setting", things like SHOW/current_setting('a.setting') only work after login. However, pg_db_role_setting shows the setting without a relogin.

Ah, I think this can solve two problems:

  • No relogin / reconnect needed. Just read the table and then set all the relevant parameters (e.g. app.settings) in the transaction as usual. Should always be as live as the schema cache.

  • ALTER ROLE ... SET does not work at all with the roles that the postgrest authenticator switches to with SET ROLE. So maybe we can have a small query like SELECT set_config(...) FROM pg_db_role_setting WHERE role_from_jwt to make it as much like a login as possible (rough sketch below).
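
A hypothetical version of that query; role_from_jwt is a placeholder for whatever role name the JWT resolves to:

SELECT set_config(split_part(s, '=', 1), substr(s, strpos(s, '=') + 1), true)  -- true = SET LOCAL
FROM pg_db_role_setting, unnest(setconfig) AS s
WHERE setrole = 'role_from_jwt'::regrole;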

So we could query pg_db_role_setting at startup (or reload) when the proposed db-settings option is used.
@wolfgangwalther What do you think about implementing this just for the ALTER DATABASE case first?

Yes, that sounds like a plan!

@steve-chavez
Member

@wolfgangwalther How about if for dynamic configuration we (ab?)use our NOTIFY channel.

Similarly to your idea, we start from the config file as the base but then allow modifying it with a NOTIFY payload:

NOTIFY pgrst, '{ "db-schemas" : "v1, v2, v3", "jwt-secret": "reallyreally.." }';

That one would trigger a SIGUSR2 automatically. The good thing is that it would work without ALTER DATABASE - no need for high privileges.

The only issue I see is: How to check what's the current config after many NOTIFYs?

Printing them on the logs doesn't seem appropriate. The other thing that occurs to me is to have another command that NOTIFYes on another channel:

-- user sends
NOTIFY pgrst, '{ "command": "--dump-config" }';

-- postgrest listens and then also notifies on another channel with
NOTIFY pgrst_reply, 'full --dump-config here';

-- the client gets the reply on
LISTEN pgrst_reply;
Asynchronous notification "pgrst_reply" with payload "{"jwt-secret": "reallyreally..", "db-schemas": "v1,v2,v3"}" received from server process with PID 7352.

-- I think we would not surpass the payload limit of 8k since our config is small

Perhaps it's too complicated, but I'm not seeing better options for now. WDYT?

@wolfgangwalther
Member Author

How about if for dynamic configuration we (ab?)use our NOTIFY channel.

For me, the "In-Database configuration" is not dynamic. Maybe I just misunderstand "dynamic" here, though. ALTER DATABASE is persistent across restarts of both postgres and postgrest - while configuration via NOTIFY seems not to be?

Is your suggestion an extension on top of mine or an alternative?

Similarly to your idea, we start from the config file as the base but then allow modifying it with a NOTIFY payload:
[...]
That one would trigger a SIGUSR2 automatically. The good thing is that it would work without ALTER DATABASE - no need for high privileges.

The need for high privileges for ALTER DATABASE is a good thing. Changing postgrest configuration settings should be protected by high privileges. NOTIFY does not need any privileges at all. This seems like a major break of our security model.

Perhaps it's too complicated, but I'm not seeing better options for now. WDYT?

I'm not sure what problem is solved with that.

Back to my question above: If this was an alternative to my revised suggestion here, then I would see better options... namely my revised suggestion :D. What problems do you see with it?

If this was an extension to my revised suggestion.. I don't see the use-case yet. Why do I need to change configuration settings dynamically, i.e. non-persistent?

@steve-chavez
Member

@wolfgangwalther Right, that was a bad idea. I wanted to avoid the ALTER DATABASE because it's not possible on cloud providers. But as you mention it's important because it requires high privileges and it's also persistent.

So I think it's all set for experimenting with an alter db implementation. I'll do that and come back with feedback.

@wolfgangwalther
Member Author

@wolfgangwalther Right, that was a bad idea. I wanted to avoid the ALTER DATABASE because it's not possible on cloud providers.

Is ALTER ROLE xx SET.. possible on cloud providers? Maybe we can extend it further that way once we have the ALTER DATABASE in place. We can set those on the authenticator role and then it should be just a simple extension of the query to pg_db_role_setting to something like COALESCE(authenticator_role, database).
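
Roughly, and assuming the authenticator role is literally named authenticator, such a merged lookup could give role-level settings precedence over database-wide ones:

SELECT DISTINCT ON (split_part(s, '=', 1))
       split_part(s, '=', 1)         AS name
     , substr(s, strpos(s, '=') + 1) AS value
FROM pg_db_role_setting, unnest(setconfig) AS s
WHERE s LIKE 'pgrst.%'
  AND (setrole = 0 OR setrole = 'authenticator'::regrole)
  AND (setdatabase = 0 OR setdatabase = (SELECT oid FROM pg_database WHERE datname = current_database()))
ORDER BY split_part(s, '=', 1), setrole DESC;   -- role-level rows win over database-wide rows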

@steve-chavez
Member

@wolfgangwalther Yes, AFAICT google cloud sql and aws rds allow ALTER ROLE.

pg_db_role_setting to something like COALESCE(authenticator_role, database).

Perfect! That pretty much solves it.

@steve-chavez
Member

One note here. According to pg docs:

A role with CREATEROLE privilege can also alter and drop other roles.

That's not true for changing placeholder GUCs though.

postgres> create role x createrole noinherit login;
postgres> create database x;
postgres>\q
psql -U x
x> alter role x set x.value to 1;
2020-12-11 19:48:27.452 -05 [21008] ERROR:  permission denied to set parameter "x.value"
2020-12-11 19:48:27.452 -05 [21008] STATEMENT:  alter role x set "x"."value" to 1;
ERROR:  42501: permission denied to set parameter "x.value"
LOCATION:  validate_option_array_item, guc.c:10648

-- changing regular gucs does work though 
x> alter role x set client_min_messages to warning;
ALTER ROLE

If the role has SUPERUSER, then that works as usual.

Freenode chat log(trimmed)
RhodiumToad | only a superuser can set a _placeholder_ GUC on a role
RhodiumToad | for real GUCs created by actual modules, the permissions of the real GUC apply
RhodiumToad | but one of the restrictions of abusing placeholder GUCs for application variables is that only the superuser
            | can set per-role or per-db values
RhodiumToad | a placeholder is a name of the form x.y that hasn't been declared using DefineCustomVariable
RhodiumToad | the significance of it being a placeholder is that the db doesn't know what permissions checks would apply to
            | setting an actual value for it.

@wolfgangwalther
Member Author

the significance of it being a placeholder is that the db doesn't know what permissions checks would apply to setting an actual value for it.

That's really inconsistent :/. If placeholder GUCs can only be set by superusers because of security concerns - why can we set them easily with SET or SET LOCAL with any unprivileged user?

for real GUCs created by actual modules, the permissions of the real GUC apply
a placeholder is a name of the form x.y that hasn't been declared using DefineCustomVariable

It doesn't seem too hard to set up an extension that registers those GUCs. Here's an example from pg_cron:

https://github.com/citusdata/pg_cron/blob/26b127072e935f09c98aed243fad02842f315a85/src/pg_cron.c#L180-L253

But that would hardly help, because custom extensions are not possible with cloud providers either, right?

@steve-chavez
Member

That's really inconsistent :/

Yeah :/. The worst part is that AWS RDS doesn't allow creating a role with the SUPERUSER option. It has an rds_superuser role that you can assign with grant rds_superuser to <role>.

So this makes our ALTER ROLE solution incompatible with cloud providers. It's still good for self-hosted though.

because custom extensions are not possible with cloud providers either, right?

That's true, but with enough usage cloud providers could (potentially) include our extension.

Registering the GUCs looks interesting, perhaps that can be part of postgrest-contrib.

@steve-chavez
Member

steve-chavez commented Jan 12, 2021

pgrst.db_schema is a bad idea; it prevents you from using the same database for multiple APIs.

@dwagin FYI, now we're planning to use ALTER ROLE on the authenticator role instead, like this:

ALTER ROLE postgrest_test_authenticator SET pgrst."jwt-secret" = 'REALLYREALLYREALLYREALLYVERYSAFE';
ALTER ROLE postgrest_test_authenticator SET pgrst."db-schemas" = 'test, tenant1, tenant2';

So with that you could use the same db for multiple APIs, by creating different authenticator ROLEs (with their own config options) for other postgrest instances.
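
For example (the role name and password here are just illustrative):

CREATE ROLE authenticator2 NOINHERIT LOGIN PASSWORD 'mysecretpassword';
ALTER ROLE authenticator2 SET pgrst."db-schemas" = 'tenant3, tenant4';
ALTER ROLE authenticator2 SET pgrst."jwt-secret" = 'ANOTHERREALLYREALLYREALLYSAFESECRET';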

Would that work for you?

@steve-chavez
Member

It doesn't seem too hard to set up an extension that registers those GUCs. Here's an example from pg_cron:
https://github.com/citusdata/pg_cron/blob/26b127072e935f09c98aed243fad02842f315a85/src/pg_cron.c#L180-L253

@wolfgangwalther Without the prefix config, that might be possible now (since the GUCs are defined statically).

But that would hardly help, because custom extensions are not possible with cloud providers either, right?

DigitalOcean did tutorials for us before; once we do our postgrest-contrib extension, we could ask them to include it on their managed db. That would be a start.

@steve-chavez
Member

Implemented in #1729
