Backup hotfix script #15260

Merged
merged 36 commits into from
Jun 10, 2024
2ddef34
Initial check
jbradberry Apr 30, 2024
365f798
Print out details of all of the crosslinked roles
jbradberry Apr 30, 2024
052b1da
Specifically examine the InstanceGroup roles
jbradberry Apr 30, 2024
1c8be2c
First full check script
jbradberry Apr 30, 2024
689b9f3
Attempt to be more efficient about grouping the content types
jbradberry Apr 30, 2024
08da0a6
When checking reverse links, treat duplicate Roles different from bad…
jbradberry May 1, 2024
ce9368d
Set up Seth's bad role scenario
jbradberry May 1, 2024
1e0d5b1
Set up an enhanced version of Seth's bad role scenario
jbradberry May 1, 2024
1401be7
Treat resources with null role fks differently
jbradberry May 1, 2024
73025f3
Start a new script that can be used to examine a Role's ancestry
jbradberry May 3, 2024
3e4bdb1
Make the role_chain.py script emit a Graphviz file
jbradberry May 3, 2024
8068a32
Check for a broken ContentType -> model and log and skip
jbradberry May 6, 2024
e76a0ad
Graph out only the parent/child chains from a given Role
jbradberry May 6, 2024
08b4676
Handle the case where a resource points to a Role which isn't in the db
jbradberry May 6, 2024
d43946e
Set up a scenario where IG.use_role_id points to something no longer …
jbradberry May 6, 2024
c45bcb7
First cut at checking the role hierarchy
jbradberry May 7, 2024
e4d1053
Attempt to more thoroughly check the parents of each Role
jbradberry May 7, 2024
923311d
Exclude the team grant false positives
jbradberry May 7, 2024
c2dcbf8
Modify the role parent check logic to stay in the roles as much as po…
jbradberry May 7, 2024
804faf2
Exclude more files in the .gitignore
jbradberry May 8, 2024
1c4ddf5
Attempt to correct any crosslinked parents
jbradberry May 8, 2024
4428484
Remove the role_chain.py module
jbradberry May 8, 2024
a790c9c
Move the "test" files into their own directory
jbradberry May 8, 2024
6154aa1
First cut at detecting which foreign keys enter and exit the topology…
jbradberry May 8, 2024
3553443
Filter out the relations within the known topology tables
jbradberry May 8, 2024
dbdfa74
Split the foreign key sql script into an 'into' and 'from' portion
jbradberry May 8, 2024
70dfe3a
Adjusted foreignkeys.sql for correctness
jbradberry May 8, 2024
d3df2d4
Fix another instance where a bad resource->Role fk could throw a trac…
jbradberry May 13, 2024
94f1718
Add a readme file with instructions
jbradberry May 15, 2024
ec6a0aa
Do not throw away the container of cross-linked parents
jbradberry May 21, 2024
01b1c05
Add output of the update and deletion counts to fix.py
jbradberry May 23, 2024
1239feb
Wait until the end of the fix script to clean up orphaned roles
jbradberry May 29, 2024
5d0c7e8
Mark and rebuild the implicit_parents field for all affected roles
jbradberry May 29, 2024
c218f05
Add a new test scenario
jbradberry May 29, 2024
ec6f3bd
Guard against the role field not being populated
jbradberry Jun 4, 2024
42e00f2
This should deal correctly with the ancestor list mismatches
jbradberry Jun 7, 2024
7 changes: 7 additions & 0 deletions tools/scripts/ig-hotfix/.gitignore
@@ -0,0 +1,7 @@
*~
customer-backup.tar.*
*.db
*.log
*.dot
*.png
*.tar.*
36 changes: 36 additions & 0 deletions tools/scripts/ig-hotfix/README.md
@@ -0,0 +1,36 @@
# Hotfix for Instance Groups and Roles after backup/restore corruption #

## role_check.py ##

`awx-manage shell < role_check.py 2> role_check.log > fix.py`

This checks the roles and resources on the system, and constructs a
fix.py file that will change the linkages of the roles that it finds
are incorrect. The command line above also redirects logging output to
a file. The fix.py file (and the log file) can then be examined (and
potentially modified) before performing the actual fix.
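
The generate-then-review pattern described above can be illustrated without Django. The following is a minimal sketch, not the actual role_check.py: `render_fix_script` and the sample findings are hypothetical, and the emitted lines only mimic the shape of the statements the real script prints.

```python
from collections import defaultdict


def render_fix_script(crosslinked):
    """Render update statements as Python source text instead of running them.

    `crosslinked` maps content type id -> object id -> {field_name: new_value}.
    """
    lines = []
    for ct_id, objs in sorted(crosslinked.items()):
        lines.append(f"cls = ContentType.objects.get(id={ct_id}).model_class()")
        for obj_id, kv in sorted(objs.items()):
            # repr() keeps None and integers valid as Python literals.
            lines.append(f"cls.objects.filter(id={obj_id}).update(**{kv!r})")
    return "\n".join(lines) + "\n"


# Hypothetical finding: object 7 of content type 42 must drop its role link.
findings = defaultdict(lambda: defaultdict(dict))
findings[42][7]["use_role_id"] = None

script = render_fix_script(findings)
compile(script, "fix.py", "exec")  # syntax check only; nothing is executed
print(script)
```

Because the fix is emitted as text rather than applied directly, it can be read, edited, or trimmed before anything touches the database.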

`awx-manage shell < fix.py > fix.log 2>&1`

This performs the fix, while redirecting all output to another log
file. Ideally, this file should wind up being empty after execution
completes.

`awx-manage shell < role_check.py 2> role_check2.log > fix2.py`

Re-run the check script in order to see that there are no remaining
problems. Ideally the log file will only consist of the equal-sign
lines.
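
One way to automate that final check is sketched below. The helper name `only_separator_lines` is hypothetical and the sample text is made up; in practice the text would be read from role_check2.log.

```python
def only_separator_lines(log_text):
    """Return True if every non-blank line consists solely of '=' characters."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return all(set(line) == {"="} for line in lines)


# Made-up log contents for illustration.
clean = "===================================\n===================================\n"
dirty = clean + "Role id=12 has no implicit_parents\n"

print(only_separator_lines(clean))
print(only_separator_lines(dirty))
```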


## foreignkeys.sql ##

This script uses Postgres internals to determine all of the foreign
keys that cross the boundaries established by our (old) backup/restore
logic. Users have no need to run this.
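
The boundary test that foreignkeys.sql applies can be sketched in plain Python. This is an illustration under assumptions, not a replacement for the SQL: `crossing_foreign_keys` is a hypothetical helper, and the sample foreign-key pairs are invented rather than dumped from a real schema.

```python
def crossing_foreign_keys(foreign_keys, topology, mapping):
    """Split (table, referenced_table) pairs into boundary-crossing sets."""
    inside = set(topology)
    known = inside | set(mapping)
    # Keys held by a topology table that point outside the known tables.
    fk_from = {(t, ref) for t, ref in foreign_keys if t in inside and ref not in known}
    # Keys held by an outside table that point at a topology table.
    fk_into = {(t, ref) for t, ref in foreign_keys if ref in inside and t not in known}
    return fk_from, fk_into


topology = ["main_instance", "main_instancegroup", "main_instancegroup_instances"]
mapping = ["main_organizationinstancegroupmembership"]

# Invented sample pairs: (table holding the key, table it references).
sample = [
    ("main_instancegroup", "main_rbac_roles"),
    ("main_instancegroup_instances", "main_instance"),
    ("main_organizationinstancegroupmembership", "main_instancegroup"),
    ("main_job", "main_instancegroup"),
]

fk_from, fk_into = crossing_foreign_keys(sample, topology, mapping)
print(fk_from)
print(fk_into)
```

Pairs that stay entirely within the topology tables, or that involve a table handled by the special-case mapping, fall into neither set.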


## scenarios/test*.py ##

These files were used to set up corruption similar to that caused by
faulty backup/restore, for testing purposes. Do not use.
38 changes: 38 additions & 0 deletions tools/scripts/ig-hotfix/foreignkeys.sql
@@ -0,0 +1,38 @@
DO $$
DECLARE
    -- add table names here when they get excluded from main / included in topology dump
    topology text[] := ARRAY['main_instance', 'main_instancegroup', 'main_instancegroup_instances'];

    -- add table names here when they are handled by the special-case mapping
    mapping text[] := ARRAY['main_organizationinstancegroupmembership', 'main_unifiedjobtemplateinstancegroupmembership', 'main_inventoryinstancegroupmembership'];
BEGIN
    CREATE TABLE tmp_fk_from AS (
        SELECT DISTINCT
            tc.table_name,
            ccu.table_name AS foreign_table_name
        FROM information_schema.table_constraints AS tc
            JOIN information_schema.constraint_column_usage AS ccu
            ON ccu.constraint_name = tc.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
          AND tc.table_name = ANY (topology)
          AND NOT ccu.table_name = ANY (topology || mapping)
    );

    CREATE TABLE tmp_fk_into AS (
        SELECT DISTINCT
            tc.table_name,
            ccu.table_name AS foreign_table_name
        FROM information_schema.table_constraints AS tc
            JOIN information_schema.constraint_column_usage AS ccu
            ON ccu.constraint_name = tc.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
          AND ccu.table_name = ANY (topology)
          AND NOT tc.table_name = ANY (topology || mapping)
    );
END $$;

SELECT * FROM tmp_fk_from;
SELECT * FROM tmp_fk_into;

DROP TABLE tmp_fk_from;
DROP TABLE tmp_fk_into;
187 changes: 187 additions & 0 deletions tools/scripts/ig-hotfix/role_check.py
@@ -0,0 +1,187 @@
from collections import defaultdict
import json
import sys

from django.contrib.contenttypes.models import ContentType
from django.db.models.fields.related_descriptors import ManyToManyDescriptor

from awx.main.fields import ImplicitRoleField
from awx.main.models.rbac import Role


team_ct = ContentType.objects.get(app_label='main', model='team')

crosslinked = defaultdict(lambda: defaultdict(dict))
crosslinked_parents = defaultdict(list)
orphaned_roles = set()


def resolve(obj, path):
    fname, _, path = path.partition('.')
    new_obj = getattr(obj, fname, None)
    if new_obj is None:
        return set()
    if not path:
        return {new_obj,}

    if isinstance(new_obj, ManyToManyDescriptor):
        return {x for o in new_obj.all() for x in resolve(o, path)}

    return resolve(new_obj, path)


for ct in ContentType.objects.order_by('id'):
    cls = ct.model_class()
    if cls is None:
        sys.stderr.write(f"{ct!r} does not have a corresponding model class in the codebase. Skipping.\n")
        continue
    if not any(isinstance(f, ImplicitRoleField) for f in cls._meta.fields):
        continue
    for obj in cls.objects.all():
        for f in cls._meta.fields:
            if not isinstance(f, ImplicitRoleField):
                continue
            r_id = getattr(obj, f'{f.name}_id', None)
            try:
                r = getattr(obj, f.name, None)
            except Role.DoesNotExist:
                sys.stderr.write(f"{cls} id={obj.id} {f.name} points to Role id={r_id}, which is not in the database.\n")
                crosslinked[ct.id][obj.id][f'{f.name}_id'] = None
                continue
            if not r:
                sys.stderr.write(f"{cls} id={obj.id} {f.name} does not have a Role object\n")
                crosslinked[ct.id][obj.id][f'{f.name}_id'] = None
                continue
            if r.content_object != obj:
                sys.stderr.write(f"{cls.__name__} id={obj.id} {f.name} is pointing to a Role that is assigned to a different object: role.id={r.id} {r.content_type!r} {r.object_id} {r.role_field}\n")
                crosslinked[ct.id][obj.id][f'{f.name}_id'] = None
                continue


sys.stderr.write('===================================\n')
for r in Role.objects.exclude(role_field__startswith='system_').order_by('id'):

    # The ancestor list should be a superset of both parents and implicit_parents.
    # Also, parents should be a superset of implicit_parents.
    parents = set(r.parents.values_list('id', flat=True))
    ancestors = set(r.ancestors.values_list('id', flat=True))
    implicit = set(json.loads(r.implicit_parents))

    if not implicit:
        sys.stderr.write(f"Role id={r.id} has no implicit_parents\n")
    if not parents <= ancestors:
        sys.stderr.write(f"Role id={r.id} has parents that are not in the ancestor list: {parents - ancestors}\n")
        crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id
    if not implicit <= parents:
        sys.stderr.write(f"Role id={r.id} has implicit_parents that are not in the parents list: {implicit - parents}\n")
        crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id
    if not implicit <= ancestors:
        sys.stderr.write(f"Role id={r.id} has implicit_parents that are not in the ancestor list: {implicit - ancestors}\n")
        crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id

    # Check that the Role's generic foreign key points to a legitimate object
    if not r.content_object:
        sys.stderr.write(f"Role id={r.id} is missing a valid content_object: {r.content_type!r} {r.object_id} {r.role_field}\n")
        orphaned_roles.add(r.id)
        continue

    # Check the resource's role field parents for consistency with Role.parents.all().
    f = r.content_object._meta.get_field(r.role_field)
    f_parent = set(f.parent_role) if isinstance(f.parent_role, list) else {f.parent_role,}
    dotted = {x for p in f_parent if '.' in p for x in resolve(r.content_object, p)}
    plus = set()
    for p in r.parents.all():
        if p.singleton_name:
            if f'singleton:{p.singleton_name}' not in f_parent:
                plus.add(p)
        elif (p.content_type, p.role_field) == (team_ct, 'member_role'):
            # Team has been granted this role; probably legitimate.
            continue
        elif (p.content_type, p.object_id) == (r.content_type, r.object_id):
            if p.role_field not in f_parent:
                plus.add(p)
        elif p in dotted:
            continue
        else:
            plus.add(p)

    if plus:
        plus_repr = [f"{x.content_type!r} {x.object_id} {x.role_field}" for x in plus]
        sys.stderr.write(f"Role id={r.id} has cross-linked parents: {plus_repr}\n")
        crosslinked_parents[r.id].extend(x.id for x in plus)

    try:
        rev = getattr(r.content_object, r.role_field, None)
    except Role.DoesNotExist:
        sys.stderr.write(f"Role id={r.id} {r.content_type!r} {r.object_id} {r.role_field} points at an object with a broken role.\n")
        crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id
        continue
    if rev is None or r.id != rev.id:
        if rev and (r.content_type_id, r.object_id, r.role_field) == (rev.content_type_id, rev.object_id, rev.role_field):
            sys.stderr.write(f"Role id={r.id} {r.content_type!r} {r.object_id} {r.role_field} is an orphaned duplicate of Role id={rev.id}, which is actually being used by the assigned resource\n")
            orphaned_roles.add(r.id)
        elif not rev:
            sys.stderr.write(f"Role id={r.id} {r.content_type!r} {r.object_id} {r.role_field} is pointing to an object currently using no role\n")
            crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id
        else:
            sys.stderr.write(f"Role id={r.id} {r.content_type!r} {r.object_id} {r.role_field} is pointing to an object using a different role: id={rev.id} {rev.content_type!r} {rev.object_id} {rev.role_field}\n")
            crosslinked[r.content_type_id][r.object_id][f'{r.role_field}_id'] = r.id
        continue


sys.stderr.write('===================================\n')


print(f"""\
from collections import Counter

from django.contrib.contenttypes.models import ContentType

from awx.main.fields import ImplicitRoleField
from awx.main.models.rbac import Role


delete_counts = Counter()
update_counts = Counter()

""")


print("# Resource objects that are pointing to the wrong Role. Some of these")
print("# do not have corresponding Roles anywhere, so delete the foreign key.")
print("# For those, new Roles will be constructed upon save.\n")
print("queue = set()\n")
for ct, objs in crosslinked.items():
    print(f"cls = ContentType.objects.get(id={ct}).model_class()\n")
    for obj, kv in objs.items():
        print(f"c = cls.objects.filter(id={obj}).update(**{kv!r})")
        print("update_counts.update({cls._meta.label: c})")
        print(f"queue.add((cls, {obj}))")

print("\n# Role objects that are assigned to objects that do not exist")
for r in orphaned_roles:
    print(f"c = Role.objects.filter(id={r}).update(object_id=None)")
    print("update_counts.update({'main.Role': c})")
    print(f"_, c = Role.objects.filter(id={r}).delete()")
    print("delete_counts.update(c)")

print('\n\n')
for child, parents in crosslinked_parents.items():
    print(f"r = Role.objects.get(id={child})")
    print(f"r.parents.remove(*Role.objects.filter(id__in={parents!r}))")
    print(f"queue.add((r.content_object.__class__, r.object_id))")

print('\n\n')
print('print("Objects deleted:", dict(delete_counts.most_common()))')
print('print("Objects updated:", dict(update_counts.most_common()))')

print("\n\nfor cls, obj_id in queue:")
print("    role_fields = [f for f in cls._meta.fields if isinstance(f, ImplicitRoleField)]")
print("    obj = cls.objects.get(id=obj_id)")
print("    for f in role_fields:")
print("        r = getattr(obj, f.name, None)")
print("        if r is not None:")
print("            print(f'updating implicit parents on Role {r.id}')")
print("            r.implicit_parents = '[]'")
print("            r.save()")
print("    obj.save()")
19 changes: 19 additions & 0 deletions tools/scripts/ig-hotfix/scenarios/test.py
@@ -0,0 +1,19 @@
from django.db import connection
from awx.main.models import InstanceGroup

InstanceGroup.objects.filter(name__in=('green', 'yellow', 'red')).delete()

green = InstanceGroup.objects.create(name='green')
red = InstanceGroup.objects.create(name='red')
yellow = InstanceGroup.objects.create(name='yellow')

for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))

with connection.cursor() as cursor:
    cursor.execute("UPDATE main_instancegroup SET use_role_id = NULL WHERE name = 'red'")
    cursor.execute(f"UPDATE main_instancegroup SET use_role_id = {green.use_role_id} WHERE name = 'yellow'")

print("=====================================")
for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))
20 changes: 20 additions & 0 deletions tools/scripts/ig-hotfix/scenarios/test2.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
from django.db import connection
from awx.main.models import InstanceGroup

InstanceGroup.objects.filter(name__in=('green', 'yellow', 'red')).delete()

green = InstanceGroup.objects.create(name='green')
red = InstanceGroup.objects.create(name='red')
yellow = InstanceGroup.objects.create(name='yellow')

for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))

with connection.cursor() as cursor:
    cursor.execute(f"UPDATE main_rbac_roles SET object_id = NULL WHERE id = {red.use_role_id}")
    cursor.execute("UPDATE main_instancegroup SET use_role_id = NULL WHERE name = 'red'")
    cursor.execute(f"UPDATE main_instancegroup SET use_role_id = {green.use_role_id} WHERE name = 'yellow'")

print("=====================================")
for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))
28 changes: 28 additions & 0 deletions tools/scripts/ig-hotfix/scenarios/test3.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
from django.db import connection
from awx.main.models import InstanceGroup

InstanceGroup.objects.filter(name__in=('green', 'yellow', 'red', 'blue')).delete()

green = InstanceGroup.objects.create(name='green')
red = InstanceGroup.objects.create(name='red')
yellow = InstanceGroup.objects.create(name='yellow')
blue = InstanceGroup.objects.create(name='blue')

for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))

with connection.cursor() as cursor:
    cursor.execute("ALTER TABLE main_instancegroup DROP CONSTRAINT main_instancegroup_use_role_id_48ea7ecc_fk_main_rbac_roles_id")

    cursor.execute(f"UPDATE main_rbac_roles SET object_id = NULL WHERE id = {red.use_role_id}")
    cursor.execute(f"DELETE FROM main_rbac_roles_parents WHERE from_role_id = {blue.use_role_id} OR to_role_id = {blue.use_role_id}")
    cursor.execute(f"DELETE FROM main_rbac_role_ancestors WHERE ancestor_id = {blue.use_role_id} OR descendent_id = {blue.use_role_id}")
    cursor.execute(f"DELETE FROM main_rbac_roles WHERE id = {blue.use_role_id}")
    cursor.execute("UPDATE main_instancegroup SET use_role_id = NULL WHERE name = 'red'")
    cursor.execute(f"UPDATE main_instancegroup SET use_role_id = {green.use_role_id} WHERE name = 'yellow'")

    cursor.execute("ALTER TABLE main_instancegroup ADD CONSTRAINT main_instancegroup_use_role_id_48ea7ecc_fk_main_rbac_roles_id FOREIGN KEY (use_role_id) REFERENCES public.main_rbac_roles(id) DEFERRABLE INITIALLY DEFERRED NOT VALID")

print("=====================================")
for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))
26 changes: 26 additions & 0 deletions tools/scripts/ig-hotfix/scenarios/test4.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
from django.db import connection
from awx.main.models import InstanceGroup

InstanceGroup.objects.filter(name__in=('green', 'yellow', 'red')).delete()

green = InstanceGroup.objects.create(name='green')
red = InstanceGroup.objects.create(name='red')
yellow = InstanceGroup.objects.create(name='yellow')

for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))

with connection.cursor() as cursor:
    cursor.execute("UPDATE main_instancegroup SET use_role_id = NULL WHERE name = 'red'")
    cursor.execute(f"UPDATE main_instancegroup SET use_role_id = {green.use_role_id} WHERE name = 'yellow'")

green.refresh_from_db()
red.refresh_from_db()
yellow.refresh_from_db()
green.save()
red.save()
yellow.save()

print("=====================================")
for ig in InstanceGroup.objects.all():
    print((ig.id, ig.name, ig.use_role_id))