<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title>13.1 Resource Manager Overview</title>
</head>
<body>
<div class="sright">
<div class="sub-content-head">
Maui Scheduler
</div>
<div id="sub-content-rpt" class="sub-content-rpt">
<div class="tab-container docs" id="tab-container">
<div class="topNav">
<div class="docsSearch"></div>
<div class="navIcons topIcons">
<a href="index.html"><img src="home.png" title="Home" alt="Home" border="0"></a> <a href="13.0rmandinterfaces.html"><img src="upArrow.png" title="Up" alt="Up" border="0"></a> <a href="13.0rmandinterfaces.html"><img src="prevArrow.png" title="Previous" alt="Previous" border="0"></a> <a href="13.2rmconfiguration.html"><img src="nextArrow.png" title="Next" alt="Next" border="0"></a>
</div>
<h1>13.1 Resource Manager Overview</h1>
<p>Maui requires the services of a resource manager to function properly. The resource manager provides information about the state of compute resources (nodes) and workload (jobs). Maui also depends on the resource manager to manage jobs, instructing it when to start and/or cancel them.</p>
<p>Maui can be configured to manage one or more resource managers simultaneously, even resource managers of different types. However, migration of jobs from one resource manager to another is not currently allowed. In other words, jobs submitted to one resource manager cannot run on the resources of another.</p>
<ul>
<li>
<a href="13.1rmoverview.html#scheduler">13.1.1 Scheduler/Resource Manager Interactions</a>
<ul>
<li><a href="13.1rmoverview.html#commands">13.1.1.1 Resource Manager Commands</a></li>
<li><a href="13.1rmoverview.html#flow">13.1.1.2 Resource Manager Flow</a></li>
</ul>
</li>
<li><a href="13.1rmoverview.html#details">13.1.2 Resource Manager Specific Details (Limitations/Special Features)</a></li>
</ul>
<h2><a name="scheduler" id="scheduler"></a>13.1.1 Scheduler/Resource Manager Interactions</h2>
<p>Maui interacts with all resource managers in the same basic way. Interfaces are created to translate Maui concepts regarding workload and resources into native resource manager objects, attributes, and commands.</p>
<p>Information on creating a new scheduler resource manager interface can be found in the <a href="13.4addingrminterfaces.html">Adding New Resource Manager Interfaces</a> section.</p>
<h3><a name="commands" id="commands"></a>13.1.1.1 Resource Manager Commands</h3>
<p>In the simplest configuration, Maui interacts with the resource manager using the four primary functions listed below:</p>
<p><b>GETJOBINFO</b></p>
<p>Collect detailed state and requirement information about idle, running, and recently completed jobs.</p>
<p><b>GETNODEINFO</b></p>
<p>Collect detailed state information about idle, busy, and defined nodes.</p>
<p><b>STARTJOB</b></p>
<p>Immediately start a specific job on a particular set of nodes.</p>
<p><b>CANCELJOB</b></p>
<p>Immediately cancel a specific job regardless of job state.</p>
<p>Using these four simple commands, Maui enables nearly its entire suite of scheduling functions. More detailed information about resource-manager-specific requirements and semantics for each of these commands can be found in the specific resource manager overviews (LL, PBS, or <a href="wiki/">WIKI</a>).</p>
<p>In addition to these base commands, other commands are required to support advanced features such as dynamic job support, suspend/resume, gang scheduling, and scheduler-initiated checkpoint/restart.</p>
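<p>As a rough illustration of the four primary functions, the sketch below models them as an abstract Python interface with a toy in-memory implementation. The class names, method names, and data shapes are assumptions made for this example only; they are not Maui's actual internal API.</p>

```python
from abc import ABC, abstractmethod


class ResourceManagerInterface(ABC):
    """Hypothetical adapter exposing the four primary functions."""

    @abstractmethod
    def get_job_info(self):
        """GETJOBINFO: return state and requirements of known jobs."""

    @abstractmethod
    def get_node_info(self):
        """GETNODEINFO: return the state of defined nodes."""

    @abstractmethod
    def start_job(self, job_id, node_ids):
        """STARTJOB: start a job on the given set of nodes."""

    @abstractmethod
    def cancel_job(self, job_id):
        """CANCELJOB: cancel a job regardless of its state."""


class InMemoryResourceManager(ResourceManagerInterface):
    """Toy stand-in for a real resource manager, for illustration only."""

    def __init__(self, nodes, jobs):
        self.nodes = {name: "idle" for name in nodes}
        self.jobs = {job_id: {"state": "idle", "nodes": []} for job_id in jobs}

    def get_job_info(self):
        # Return copies so callers cannot modify the stored records directly
        return {job_id: dict(info) for job_id, info in self.jobs.items()}

    def get_node_info(self):
        return dict(self.nodes)

    def start_job(self, job_id, node_ids):
        for node in node_ids:
            self.nodes[node] = "busy"
        self.jobs[job_id] = {"state": "running", "nodes": list(node_ids)}

    def cancel_job(self, job_id):
        # Free whatever nodes the job held, whatever state it was in
        for node in self.jobs[job_id]["nodes"]:
            self.nodes[node] = "idle"
        self.jobs[job_id] = {"state": "cancelled", "nodes": []}
```

<p>A scheduler written against such an interface needs no knowledge of the native resource manager beyond these four calls, which is the translation role the interfaces described above play.</p>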
<h3><a name="flow" id="flow"></a>13.1.1.2 Resource Manager Flow</h3>
<p>Early versions of Maui (i.e., Maui 3.0.x) interacted with resource managers in a very basic manner, stepping through a serial sequence of steps in each scheduling iteration. These steps are outlined below:</p>
<ol>
<li>load global resource information</li>
<li>load node specific information (optional)</li>
<li>load job information</li>
<li>load queue information (optional)</li>
<li>cancel jobs which violate policies</li>
<li>start jobs in accordance with available resources and policy constraints</li>
<li>handle user commands</li>
<li>repeat</li>
</ol>
<p>Each step would complete before the next step started. As systems continued to grow in size and complexity, however, it became apparent that the serial model described above would not work. Three primary motivations drove the effort to replace the serial model with a concurrent threaded approach: reliability, concurrency, and responsiveness.</p>
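<p>For reference, the serial iteration listed above can be sketched as a single function in which every step blocks the next. All names here (the recording stand-in, its load methods, and the policy callables) are hypothetical and exist only to make the ordering visible.</p>

```python
class RecordingResourceManager:
    """Hypothetical stand-in that records the order of calls it receives."""

    def __init__(self):
        self.calls = []

    def load_global_info(self):
        self.calls.append("load_global_info")
        return {"total_nodes": 2}

    def load_node_info(self):
        self.calls.append("load_node_info")
        return {"n1": "idle", "n2": "idle"}

    def load_job_info(self):
        self.calls.append("load_job_info")
        return {"j1": "idle", "j2": "running"}

    def load_queue_info(self):
        self.calls.append("load_queue_info")
        return {"batch": ["j1"]}

    def cancel_job(self, job_id):
        self.calls.append("cancel_job:" + job_id)

    def start_job(self, job_id, node_ids):
        self.calls.append("start_job:" + job_id)


def run_serial_iteration(rm, jobs_to_cancel, jobs_to_start, handle_user_commands):
    """One pass through the serial sequence; each step finishes before the next."""
    state = {}
    state["global"] = rm.load_global_info()
    state["nodes"] = rm.load_node_info()    # optional step
    state["jobs"] = rm.load_job_info()
    state["queues"] = rm.load_queue_info()  # optional step
    for job_id in jobs_to_cancel(state):    # jobs which violate policies
        rm.cancel_job(job_id)
    for job_id, node_ids in jobs_to_start(state):
        rm.start_job(job_id, node_ids)
    handle_user_commands(state)             # user commands wait until here
    return state
```

<p>Because user commands are handled only at the end of the pass, a slow or hung load call delays everything behind it.</p>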
<p><b>Reliability</b></p>
<p>A number of the resource managers Maui interfaces with were unreliable to some extent. As a result, calls to resource management APIs sometimes exited or crashed, taking the entire scheduler with them. With a threaded approach, only the calling thread fails, allowing the master scheduling thread to recover. Additionally, a number of resource manager calls would hang indefinitely, locking up the scheduler. In a threaded environment, these hangs can likewise be detected by the master scheduling thread and handled appropriately.</p>
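<p>The idea of isolating an unreliable resource manager call in its own thread can be sketched as follows. This is a minimal Python illustration, not Maui's implementation: the function name and the status tuples are assumptions, and a Python thread can only contain exceptions and hangs, not a hard crash of the whole process.</p>

```python
import queue
import threading


def call_with_deadline(rm_call, timeout_seconds):
    """Run rm_call in a worker thread and report the outcome to the caller.

    Returns ("ok", value), ("failed", exception), or ("timeout", None).
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", rm_call()))
        except Exception as exc:
            # A failing call only ends this worker, not the calling thread
            results.put(("failed", exc))

    # daemon=True so a call that never returns cannot keep the process alive
    threading.Thread(target=worker, daemon=True).start()
    try:
        return results.get(timeout=timeout_seconds)
    except queue.Empty:
        # The call is still running; the caller treats it as hung and moves on
        return ("timeout", None)
```

<p>The calling thread always gets control back within the deadline, so it can log the failure, retry, or continue scheduling with the last known state.</p>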
<p><b>Concurrency</b></p>
<p>As the systems managed by resource managers grew in size, the duration of each global API query call grew proportionally. In particular, queries which required contact with each node individually became excessively slow as systems grew into the thousands of nodes. A threaded interface allowed the scheduler to issue multiple node queries concurrently, resulting in much shorter aggregate RM query times.</p>
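<p>Issuing per-node queries from a pool of worker threads might look like the sketch below. The query_node callable and the max_workers default are hypothetical; a real interface would call the native resource manager API for each node.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def query_nodes_concurrently(query_node, node_names, max_workers=8):
    """Issue one query per node from a pool of worker threads.

    query_node is a hypothetical callable that takes a node name and
    returns that node's state. node_names must be a list (or another
    sequence that can be iterated twice).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so names and states line up
        states = list(pool.map(query_node, node_names))
    return dict(zip(node_names, states))
```

<p>With a pool like this, the aggregate query time depends on the slowest batch of queries rather than on the sum of all individual query times.</p>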
<p><b>Responsiveness</b></p>
<p>Finally, in the non-threaded serial approach, the user interface was blocked while the scheduler updated various aspects of its workload, resource, and queue state. In a threaded model, the scheduler could continue to respond to queries and other commands even while fresh resource manager state information was being loaded, resulting in much shorter average response times for user commands.</p>
<p>Under the threaded interface, all resource manager information is loaded and processed while the user interface is still active. Average aggregate resource manager API query times are tracked, and new RM updates are launched so that the RM query will complete before the next scheduling iteration should start. Where needed, the loading process uses a <i>pool</i> of worker threads to issue large numbers of node-specific information queries concurrently to accelerate this process. The master thread continues to respond to user commands until all needed resource manager information is loaded <b>and</b> either a scheduling-relevant <i>event</i> has occurred <b>or</b> the scheduling iteration time has arrived. At this point, the updated information is integrated into Maui's state information and scheduling is performed.</p>
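<p>The timing logic in the paragraph above can be sketched in a few lines. The names below are hypothetical, and the running mean is only one assumed way to track average query times; the source does not say how Maui computes the average.</p>

```python
class QueryTimeTracker:
    """Tracks a running mean of aggregate RM query durations (assumed method)."""

    def __init__(self):
        self.count = 0
        self.average = 0.0

    def record(self, seconds):
        # Incremental mean: no need to store every past measurement
        self.count += 1
        self.average += (seconds - self.average) / self.count


def update_launch_time(next_iteration_time, average_query_seconds):
    """Latest time to launch an RM update expected to finish before the iteration."""
    return next_iteration_time - average_query_seconds


def should_schedule(rm_info_loaded, event_occurred, now, next_iteration_time):
    """True once RM data is loaded AND (an event occurred OR the iteration time arrived)."""
    return rm_info_loaded and (event_occurred or now >= next_iteration_time)
```

<p>Until should_schedule returns true, the master thread in this sketch would simply keep servicing user commands.</p>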
<h2><a name="details" id="details"></a>13.1.2 Resource Manager Specific Details (Limitations/Special Features)</h2>
<p>(Under Construction)</p>
<p>LL/LL2<br>
PBS<br>
<a href="wikiinterface.html">Wiki</a></p>
<p>Synchronizing Conflicting Information<br>
Maui does not trust the resource manager. All node and job information is reloaded<br>
on each iteration. Discrepancies are logged and handled where possible.<br>
NodeSyncDeadline/JobSyncDeadline overview.</p>
<p>Purging Stale Information</p>
<p>Thread</p>
<p><b>See Also:</b></p>
<p><a href="13.2rmconfiguration.html">Resource Manager Configuration</a>, <a href="13.3rmextensions.html">Resource Manager Extensions</a></p>
<div class="navIcons bottomIcons">
<a href="index.html"><img src="home.png" title="Home" alt="Home" border="0"></a> <a href="13.0rmandinterfaces.html"><img src="upArrow.png" title="Up" alt="Up" border="0"></a> <a href="13.0rmandinterfaces.html"><img src="prevArrow.png" title="Previous" alt="Previous" border="0"></a> <a href="13.2rmconfiguration.html"><img src="nextArrow.png" title="Next" alt="Next" border="0"></a>
</div>
</div>
</div>
</div>
<div class="sub-content-btm"></div>
</div>
</body>
</html>