Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an option to retrospective set CPU affinities for each process running on GPUs #1

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

gavins13
Copy link

Added an option to retrospective set CPU affinities for each process running on GPUs. The CPU affinities associated with each GPU is defined in the file called cpu_affinities.json. The CPU affinity is only set if a flag -t or --taskset is used and is specified in cpu_affinities.json.

Copy link
Owner

@mseitzer mseitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gavin,

thanks for the PR! Seems like very useful functionality. I added a couple of comments with changes I would like to make before merging this in. It would be great if you can address them.

Greetings to London!

Comment on lines +152 to +155
pid=pid,
cpus=','.join(cpus),
ssh_timeout=ssh_timeout,
cmd_timeout=cmd_timeout)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align indentation to bracket

def print_gpu_infos(server, gpu_infos, run_ps, run_get_real_names,
filter_by_user=None,
translate_to_real_names=False):
def print_gpu_infos(server, gpu_infos, run_ps, run_taskset,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of including the taskset functionality in print_gpu_infos, I would factor this out into its own method, e.g. set_cpu_affinities.

@@ -197,6 +231,12 @@ def print_gpu_infos(server, gpu_infos, run_ps, run_get_real_names,
else:
users_by_pid = {}

if server in cpu_affinities.keys():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be just if server in cpu_affinities:

Comment on lines +236 to +237
for pid in gpu_info["pids"]:
gpu_cpus=cpu_affinities[server]["affinities"][str(gpu_info["idx"])]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use single quotes to match with the rest of the script :)

@@ -238,6 +278,14 @@ def main(argv):
error('Could not open server file {}'.format(args.server_file))
return

try:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be under if args.taskset so the file is only read when requested by the user

run_get_real_names = partial(get_real_names_remote,
server=server,
ssh_timeout=args.ssh_timeout,
cmd_timeout=args.cmd_timeout)

run_taskset = NULL_FUNCTION if args.taskset is False else run_taskset
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore with the changes below

SERVER_TASKSET_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])),
DEFAULT_TASKSET_FILE)

NULL_FUNCTION = lambda *args, **kwargs : None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore with my changes requested below

Comment on lines 1 to 66
{
"monal04.doc.ic.ac.uk": {
"affinities": {
"0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
"1": [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38],
"2": [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
"3": [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]
}
},
"monal03.doc.ic.ac.uk": {
"affinities": {
"0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
"1": [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38],
"2": [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
"3": [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]
}
},
"monal05.doc.ic.ac.uk": {
"affinities": {
"0": [0,1,2],
"1": [3,4,5],
"2": [6,7,8],
"3": [9,10,11],
"4": [12,13,28],
"5": [29,30,31],
"6": [32,33,34],
"7": [35,36,37],
"8": [38,39,46,47,48,49],
"9": [40,41,42,43,44,45]
}
},
"monal06.doc.ic.ac.uk": {
"affinities": {
"0": [0,1,2],
"1": [3,4,5],
"2": [6,7,8],
"3": [9,10,11],
"4": [12,13,28],
"5": [29,30,31],
"6": [32,33,34],
"7": [35,36,37],
"8": [38,39,46,47,48,49],
"9": [40,41,42,43,44,45]
}
},
"lory.doc.ic.ac.uk": {
"affinities": {
"0": [0,1,2,3,4],
"1": [5,6,7,8,9],
"2": [10,11,12,13,14],
"3": [15,16,17,18,19],
"4": [40,41,42,43,44],
"5": [45,46,47,48,49],
"6": [50,51,52,53,54],
"7": [55,56,57,58,59],
"8": [20,21,22,23,24],
"9": [25,26,27,28,29],
"10": [30,31,32,33,34],
"11": [35,36,37,38,39],
"12": [60,61,62,63,64],
"13": [65,66,67,68, 69],
"14": [70,71,72,73,74],
"15": [75,76,77,78,79]
}
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will ask you to make a more neutral example file, e.g.

{
    "example.com" {
        "affinities": {
            "0": [0, 1, 2, 3, 4],
            "1": [5, 6, 7, 8, 9],
            "2": [10, 11, 12, 13, 14],
            "3": [15, 16, 17, 18, 19]
        }
    },
    "myserver.com": {
         "affinities": {
            "0": [0, 1, 2, 3],
            "1": [4, 5, 6, 7]
         }
    }
}

Comment on lines +1 to +5
monal03.doc.ic.ac.uk
monal04.doc.ic.ac.uk
monal05.doc.ic.ac.uk
monal06.doc.ic.ac.uk
lory.doc.ic.ac.uk
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should stay empty.

Comment on lines 1 to 6
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
script=python $DIR/gpu_monitor.py -v -t

cmd="watch -n 300 "
$cmd "$script"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not include this script into the repository. If you want, you can add a section at the end of the README, e.g. Periodically Setting CPU Affinities and put those 4 lines there, maybe explain a little bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants