-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added an option to retrospective set CPU affinities for each process running on GPUs #1
base: master
Are you sure you want to change the base?
Conversation
add hyper-threading cores to monal05 & 06
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Gavin,
thanks for the PR! Seems like very useful functionality. I added a couple of comments with changes I would like to make before merging this in. It would be great if you can address them.
Greetings to London!
pid=pid, | ||
cpus=','.join(cpus), | ||
ssh_timeout=ssh_timeout, | ||
cmd_timeout=cmd_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please align indentation to bracket
def print_gpu_infos(server, gpu_infos, run_ps, run_get_real_names, | ||
filter_by_user=None, | ||
translate_to_real_names=False): | ||
def print_gpu_infos(server, gpu_infos, run_ps, run_taskset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of including the taskset functionality in print_gpu_infos
, I would factor this out into its own method, e.g. set_cpu_affinities
.
@@ -197,6 +231,12 @@ def print_gpu_infos(server, gpu_infos, run_ps, run_get_real_names, | |||
else: | |||
users_by_pid = {} | |||
|
|||
if server in cpu_affinities.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be just if server in cpu_affinities:
for pid in gpu_info["pids"]: | ||
gpu_cpus=cpu_affinities[server]["affinities"][str(gpu_info["idx"])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use single quotes to match with the rest of the script :)
@@ -238,6 +278,14 @@ def main(argv): | |||
error('Could not open server file {}'.format(args.server_file)) | |||
return | |||
|
|||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be under if args.taskset
so the file is only read when requested by the user
run_get_real_names = partial(get_real_names_remote, | ||
server=server, | ||
ssh_timeout=args.ssh_timeout, | ||
cmd_timeout=args.cmd_timeout) | ||
|
||
run_taskset = NULL_FUNCTION if args.taskset is False else run_taskset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed anymore with the changes below
SERVER_TASKSET_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), | ||
DEFAULT_TASKSET_FILE) | ||
|
||
NULL_FUNCTION = lambda *args, **kwargs : None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed anymore with my changes requested below
cpu_affinities.json
Outdated
{ | ||
"monal04.doc.ic.ac.uk": { | ||
"affinities": { | ||
"0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], | ||
"1": [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38], | ||
"2": [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], | ||
"3": [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51] | ||
} | ||
}, | ||
"monal03.doc.ic.ac.uk": { | ||
"affinities": { | ||
"0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], | ||
"1": [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38], | ||
"2": [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], | ||
"3": [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51] | ||
} | ||
}, | ||
"monal05.doc.ic.ac.uk": { | ||
"affinities": { | ||
"0": [0,1,2], | ||
"1": [3,4,5], | ||
"2": [6,7,8], | ||
"3": [9,10,11], | ||
"4": [12,13,28], | ||
"5": [29,30,31], | ||
"6": [32,33,34], | ||
"7": [35,36,37], | ||
"8": [38,39,46,47,48,49], | ||
"9": [40,41,42,43,44,45] | ||
} | ||
}, | ||
"monal06.doc.ic.ac.uk": { | ||
"affinities": { | ||
"0": [0,1,2], | ||
"1": [3,4,5], | ||
"2": [6,7,8], | ||
"3": [9,10,11], | ||
"4": [12,13,28], | ||
"5": [29,30,31], | ||
"6": [32,33,34], | ||
"7": [35,36,37], | ||
"8": [38,39,46,47,48,49], | ||
"9": [40,41,42,43,44,45] | ||
} | ||
}, | ||
"lory.doc.ic.ac.uk": { | ||
"affinities": { | ||
"0": [0,1,2,3,4], | ||
"1": [5,6,7,8,9], | ||
"2": [10,11,12,13,14], | ||
"3": [15,16,17,18,19], | ||
"4": [40,41,42,43,44], | ||
"5": [45,46,47,48,49], | ||
"6": [50,51,52,53,54], | ||
"7": [55,56,57,58,59], | ||
"8": [20,21,22,23,24], | ||
"9": [25,26,27,28,29], | ||
"10": [30,31,32,33,34], | ||
"11": [35,36,37,38,39], | ||
"12": [60,61,62,63,64], | ||
"13": [65,66,67,68, 69], | ||
"14": [70,71,72,73,74], | ||
"15": [75,76,77,78,79] | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will ask you to make a more neutral example file, e.g.
{
"example.com" {
"affinities": {
"0": [0, 1, 2, 3, 4],
"1": [5, 6, 7, 8, 9],
"2": [10, 11, 12, 13, 14],
"3": [15, 16, 17, 18, 19]
}
},
"myserver.com": {
"affinities": {
"0": [0, 1, 2, 3],
"1": [4, 5, 6, 7]
}
}
}
monal03.doc.ic.ac.uk | ||
monal04.doc.ic.ac.uk | ||
monal05.doc.ic.ac.uk | ||
monal06.doc.ic.ac.uk | ||
lory.doc.ic.ac.uk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should stay empty.
taskset_daemon.sh
Outdated
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" | ||
script=python $DIR/gpu_monitor.py -v -t | ||
|
||
cmd="watch -n 300 " | ||
$cmd "$script" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not include this script into the repository. If you want, you can add a section at the end of the README, e.g. Periodically Setting CPU Affinities
and put those 4 lines there, maybe explain a little bit.
modified hyper-threading cores to monal05 & 06 affinity.
Added an option to retrospective set CPU affinities for each process running on GPUs. The CPU affinities associated with each GPU is defined in the file called
cpu_affinities.json
. The CPU affinity is only set if a flag-t
or--taskset
is used and is specified incpu_affinities.json
.