
[discussion] how should we provide access to internal cluster DBs? #11

Open
OriHoch opened this issue Jan 24, 2018 · 7 comments
@OriHoch
Contributor

OriHoch commented Jan 24, 2018

Developers and testers need a way to debug with production data; to do that, they need access to the production/staging DB data.

possible options:

  • don't keep DBs in the cluster; use external services
  • expose the DB publicly in a secure, authenticated way
    • each app might use a different DB (volunteers use Mongo, Drupal uses MySQL, Spark uses MariaDB), so we need to check whether there is a cross-DB way to do this, or whether we need to do some work on each separate DB
  • provide daily SQL dumps
    • this is already provided by each DB's Kubernetes Helm chart, as part of the backup process
  • provide a pre-populated Docker DB image (this is used for Drupal)
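If daily dumps are handed to developers, the sensitive-data concern raised elsewhere in this thread suggests masking PII first. A minimal Python sketch, assuming hypothetical column contents and placeholder regex patterns (none of this is taken from the actual charts or backup process):

```python
import re

# Assumed PII patterns: e-mail addresses and Israeli-style phone numbers.
# These are illustrative placeholders, not the project's real masking rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b0\d{1,2}-?\d{7}\b")

def mask_line(line: str) -> str:
    """Replace PII-looking values in one dump line with fixed placeholders."""
    line = EMAIL_RE.sub("user@example.com", line)
    line = PHONE_RE.sub("000-0000000", line)
    return line

def mask_dump(lines):
    """Mask every line of a SQL dump before distributing it."""
    return [mask_line(l) for l in lines]

if __name__ == "__main__":
    dump = ["INSERT INTO volunteers VALUES (1, 'Dana', 'dana@gmail.com', '052-1234567');"]
    print(mask_dump(dump)[0])
```

A step like this could run between the Helm chart's dump job and wherever the dump is published, so raw PII never leaves the cluster.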
@ghost

ghost commented Jan 24, 2018 via email

@ghost

ghost commented Jan 24, 2018

In any case, exposing the DB publicly is not a good option.
That’s just asking for trouble :)

@mitraed

mitraed commented Feb 2, 2018

Why should the DB be exposed to the public at all?
Create a VPN server (Openswan/strongSwan, etc.) and let people connect to it over VPN.
That would create an encrypted VPN network for managing all systems securely.
This is not only about DBs.

We should also classify what these DBs hold, and whether they hold sensitive data.

@OriHoch
Contributor Author

OriHoch commented Feb 3, 2018

These DBs all contain personal/private data, so I would consider them sensitive.

Personally I'm opposed to giving direct DB access and prefer to keep external access to a minimum.
I think most user needs can be accommodated by:

  1. adminer - a DB web UI, deployed inside the cluster, secured by nginx HTTP auth
  2. daily DB dumps
  3. Metabase or another BI tool that lets you "ask questions" about the data across multiple DBs, deployed inside the cluster, secured by Google OAuth (or other methods)
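For point 1, nginx HTTP auth reads credentials from an htpasswd file. A minimal sketch of generating one entry in the `{SHA}` format that nginx's `auth_basic` accepts; the username and password here are placeholders, and in practice the `htpasswd` tool from apache2-utils does the same job:

```python
import base64
import hashlib

def htpasswd_sha_entry(user: str, password: str) -> str:
    """Build one htpasswd line in the {SHA} format (SHA-1 digest, base64)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"

if __name__ == "__main__":
    # "adminer" / "s3cret" are placeholder credentials for illustration only.
    print(htpasswd_sha_entry("adminer", "s3cret"))
```

The resulting line goes into the file referenced by nginx's `auth_basic_user_file` directive.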

Points 1 and 2 are already implemented and working.

I started working on 3 - hopefully it will be ready next week.

We will discuss it in our weekly meeting - Monday 21:30-23:30 at IronSource Israel (we can do a remote video chat if anyone wants).

@mitraed

mitraed commented Feb 4, 2018

We need to find the fine line between having a system that works and protecting this data.
I think this should be part of our security management, and our method of working with remote servers is part of that, i.e. we should create a policy.

What about all the other servers? How do we manage them?

Access to the DB from exactly which source IP addresses?
Who would maintain the security-group IPs you open? How would you know which ones are open?
Holding PII is also subject to legislation, including rules on encryption and masking.
I don't know where and how the topology is built, but a DB should be protected differently than a web server.

A simple decision on VPN would solve all of this.
A policy where DBs are accessible only via VPN would retire the approach of managing security groups.
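A VPN-only policy effectively shrinks the allowlist to a single subnet. A hedged sketch of what such a source-IP check looks like, assuming a placeholder VPN CIDR (the real subnet would come from the VPN server's configuration):

```python
import ipaddress

# Assumed VPN client subnet - a placeholder, not the project's actual range.
ALLOWED_NETS = [ipaddress.ip_network("10.8.0.0/24")]

def is_allowed(client_ip: str) -> bool:
    """Return True if the client IP falls inside an allowed (VPN) subnet."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETS)

if __name__ == "__main__":
    print(is_allowed("10.8.0.12"))    # a VPN client
    print(is_allowed("203.0.113.7"))  # an arbitrary public address
```

With VPN-only access, this single CIDR is the whole policy, instead of per-DB security-group rules that someone has to audit.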

@mitraed

mitraed commented Feb 4, 2018

Will join Slack and discuss there :)

@OriHoch
Contributor Author

OriHoch commented Feb 4, 2018

good to have you @mitraed - indeed a lot of work ahead of us :)
