Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage Access to credentialed-access projects Using Access Point Policies #2293

Open
wants to merge 11 commits into
base: dev
Choose a base branch
from

Conversation

Chrystinne
Copy link
Contributor

This Pull Request implements the use of AWS S3 Access Point policies to manage access to restricted-access projects stored in S3 buckets.

The key change in this code is to ensure scalability, enabling access management as the number of projects and users grows.

…ring projects with a 'RESTRICTED/CREDENTIALED' access policy.
…s being created for the open data bucket or the controlled data bucket. This update also changes how we grant access to users for the controlled-access dataset by using Access Points (APs). It includes creating and listing APs, creating and updating AP policies, and associating AWS users with APs.
… view to use update_data_access_point_policy instead of the old bucket policy update method. Ensure the S3 credentials exist before updating the data access point policy.
@Chrystinne Chrystinne assigned Chrystinne and bemoody and unassigned Chrystinne and bemoody Sep 17, 2024
@Chrystinne Chrystinne changed the title Manage Access to Restricted S3 Buckets Using Access Point Policies Manage Access to credentialed-access projects Using Access Point Policies Sep 17, 2024
@bemoody
Copy link
Collaborator

bemoody commented Sep 18, 2024

class AccessPoint(models.Model):
    aws = models.ForeignKey(AWS, related_name='access_points', on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    users = models.ManyToManyField('AccessPointUser', related_name='access_points')

    def __str__(self):
        return self.name

class AccessPointUser(models.Model):
    aws_id = models.CharField(max_length=20)  # Assuming AWS ID is a string like '053677451470'

    def __str__(self):
        return self.aws_id

First observation: the name of the class should probably be AWSAccessPoint or S3AccessPoint. Make it clear what this is for.

Second: we want to associate access points with particular Users, not particular AWS principals. Because:

  • If somebody removes the AWS account from their PhysioNet profile, we need to remove them from any access points that they had access to.

  • If they subsequently add a different AWS account to their PhysioNet profile, I think we want to add them back to the same access points they were using previously (so their existing scripts will work with their new AWS credentials.)

We could define users as a ManyToManyField pointing to user.User. But I think it'd be better to do something like:

class AWSAccessPointUser(models.Model):
    access_point = models.ForeignKey(AWSAccessPoint, related_name='users',
                                     on_delete=models.CASCADE)
    user = models.ForeignKey('user.User', related_name='aws_access_points',
                             on_delete=models.CASCADE)

    class Meta:
        unique_together = [('access_point', 'user')]

Which is essentially the same thing as ManyToManyField at the database level, but at the Python level it gives more flexibility for the future.

@bemoody
Copy link
Collaborator

bemoody commented Sep 18, 2024

we want to associate access points with particular Users, not particular AWS principals

Now, I say this, but to be fair, there is still an open question of whether we want to allow one person to use multiple AWS principals. Doing that, however, could be quite messy UX-wise (we'd either have to tell people "use access point X if you're using principal A, use access point Y if you're using principal B"... or else we'd have to tell people that every time they add a new principal they might be bumped to a different AP.)

Anyway, I think if we want to add that feature down the road, we can define new models at that point to support it.

@bemoody
Copy link
Collaborator

bemoody commented Sep 18, 2024

I don't like having "aws_id" as an argument to AWS.s3_uri(). What I think we want is to define an s3_uri method in the AccessPoint class.

To obtain the S3 URI for a particular authorized user, we might end up doing something like

    s3_uri = None
    if project.aws:
        if project.aws.is_private:
            ap = project.aws.access_points.filter(users__user=user).first()
            if ap:
                s3_uri = ap.s3_uri()
        else:
            s3_uri = project.aws.s3_uri()

If the region and account ID are required as part of the S3 URI for the access point, those things should be stored in the AccessPoint model (they shouldn't be hard-coded or inferred.)

@bemoody
Copy link
Collaborator

bemoody commented Sep 20, 2024

The basic concept here, I think, is to automatically grant access to everyone who has a linked AWS account and has permission to access the project.

Still some details need working out, but is that broadly how we want this to work? Or should we instead grant access only to people who request it?

I don't think it makes a huge difference, but doing the latter has some advantages: fewer APs to manage, and we get some feedback about how many people are using the feature.

@bemoody
Copy link
Collaborator

bemoody commented Sep 27, 2024

  • In upload_project_to_S3, the controlled-access bucket is created if it doesn't exist, but its bucket policy is not set. We need to set the bucket policy.

  • create_controlled_bucket_policy produces a policy that doesn't work; you want "s3:DataAccessPointAccount": settings.AWS_ACCOUNT_ID (not f"s3:DataAccessPointAccount": "{settings.AWS_ACCOUNT_ID}").

  • create_data_access_point_policy needs to restrict the s3:prefix for ListBucket actions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants