-
Notifications
You must be signed in to change notification settings - Fork 0
/
DesignResAna.txt
executable file
·603 lines (508 loc) · 17.7 KB
/
DesignResAna.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
Suyi Liu
Project3 design
Program overview:
The program aims to make the acquiring
and manipulating data in a repository
to be more efficient than the previous
sorted or unsorted single line lists.
I extract some of elements according to
a certain percentage to construct upper
levels for several times, so everytime I
am looking for an element, I can skip
some of the elements that are definitely
not belonging to the parts that I am
looking for. Every level of the super
list must be sorted.
The entire program consists of three
parts: driver program, superlist program
and random generator. Superlist and
random generator both contain .h file
that list all the methods to use in
corresponding .c files.
sl_repository: It builds up a super linked
list with the number of elements of a
higher level always in a certain percentage
of the number of elements of the level
below. The overall data structure are
sorted linked lists. You can always find
elements that exist in higher levels in
levels lower than that level. It can
perform update, delete, get and print in
the super list.
In repository, each element consists of
a key, a pointer to data, a pointer to
its next element and a pointer to its lower
level element with same key. Each element
also have an int pointer, pointing to
its repetiton in the list, which means
in how many layers it exists. Everytime a
new element is constructed, r is 1.
I also initialize an int representing size
of my array which might be created in insert
function, called arraysize. This variable
also represents how many levels I have in
my super list, because whenever I have to
insert a new element, I store exactly one
element pointer from one level into the
array of pointers.
I also create a dummy head before the head
of the highest level of super list. When
a new level is constructed, a new dummy
head is also constructed at the beginning
of the list, so every level has a dummy
head, which is good for traversing,
especially when a new element is inserted
before the first real element of a level.
I created an int prob representing the
percentage of elements in the lower level
that are also in the higher level. It is
used when judging if a new level should
be constructed.
I have an int named size counting how many
elements are in the bottom level so far.
This is used to represent number of unique
records in the super list.
I have a maximun variable representing
max level that can be created. I calculate
in this project, if p is 75 and max range
is 10000, the number of levels is about
30, which is the highest possible so far.
I set this up to prevent from infinite
constructing if probability = 100.
There's a counter that increases itself
by 1 whenever a pointer is pointing to
a different element. It is used to show
how much effort taken so far.
driver: driver program takes command line
inputs, but it will also work if any
command line input is missing, using
default values. It also prints the list
according to different parameters.
The program exits after completeing the
operations.
I set range, number of operations, seed
and probability to be global variables
and inserted a usage method which reads
user's input and store those values into
parameter addresses.
Function overview:
void_repository_init:
Initializes a repository. set size to
zero, and set step counter to zero.
I set number of levels to 1 by default.
I also malloc memory for static pointer
to dummy head.
int repository_update:
This method is separated into two parts:
first search to see if the element exists,
second insert if the element doesn't
exist.
Searching:
In the first layer, I traverse the list
till its bottom level. In the second layer,
I traverse it horizontally.
I initialize a temp pointer
pointing to dummy head. It goes to its
next first, if its next is smaller than
parameter key, it keeps going to its next.
I increase the counter.
If its next element equals parameter key,
its next element updates the data stored.
I do this by changing the value in the
data pointer. I only have to change once,
since all pointers are pointing to the
data in the same address. Then I return 0.
If its next element is larger than
parameter key or doesn't exist(which means
temp has reached the end of the list), it
goes down, and the address of turining
point is stored in the array of pointers.
I increase the counter by one as well.
To prevent temp from going down twice,
I write go down instruction outside
second layer and write break inside
every condition.
In the end, if temp element does not have
its next element, or its next element is
larger than the key, the key doesn't exist.
If in the bottom level, temp's next is
null or temp's next is larger than the
key, I store temp into the array.
Inserting:
If the element doesn't exist, I insert
it into the super list. I insert a new
element after some of the elements that
are stored in the array(where they are
supposed to be).
I must insert it into the bottom level.
So I used do while loops to make sure it
is executed at least once.
As for whether I should insert it in the
upper levels, I call rand100, if it is
less than the p, I insert it, if it starts
to be larger than the p, I stop inserting
it to higher levels. This makes roughly
probability/100 elements are inserted to
a new level every 100 insertions.
When something went wrong in malloc, I
return -1 reporting problem.
When inserting, I make the new
element point to array element's next.
Then I make my array element point to
the new element. I also make new element
point to the last element I inserted.
(as stored last time as telement)
The data stored in these new elements
should be the same, so I just make them
point to same address storing the same
data(except for the bottom element,
where I malloc memory for the data
integer and initialize data stored).
I increase the number stored in r pointer
by 1, showing it is repeating.
Also when the element is inserted to every
current levels, I still keep judging to
see if a new level will be
constructed above the currently highest
level. If there is a need to construct a
new level, I make a new dummy head which
is the same as dummy head, named nelement.
I also malloc memory for it. I make
it point to element being inserted, then
I set its down to the old dummyhead, and
then I move dummy head pointer to point
to this new dummyhead. Also pelement's
data is syncronized as telement's, and
its repetition is increased as well.
I make pelement point to its lower level
self as well, by using the telement
pointer. Everytime a new level is built,
I increase arraysize variable.
Finally I increase size variable by 1,
indicating a new unique record is added,
and return 1 after the process is
completed.
int repository_delete(int key):
I traverse the list to its bottom
level. I initialize a temp pointer
pointing to dummy head. It goes to its
next first, if its next is smaller than
parameter key, it keeps going to next.
And I increase the counter.
If its next element equals parameter key,
I delete its next element and then go
down. Specifically, I store element
pointer pelement as the item to be
deleted. Then I set temp pointer's next
element to be pelement's next. Then I
free pelement's memory and set pelement
to NULL. Also under this case, I check
if there is no real element in this line
now. If now, I free this line by making
temph pointer pointing to useless dummy
head. Then I make temp pointer pointing
to its down. After that I check if temph
is freed, if not, I free it. I have to
check it here since temp is also going
down in the case that temp doesn't have
next element or have larger next
element. When a level is freed, I
decrease number of levels by 1.
And I increase the counter after each
step that a pointer is pointing to
another element in the list.
If its next element is larger than
parameter key or doesn't exist, I keep
going down without deleting anything.
I increase the counter as well.
If in the end, temp element does
not have its next element, or its next
element is larger than the key, the key
doesn't exist.
To prevent from the method stopping to
perform, I initialize an int result,
It is set to 0 by default.
setting its value without returning it.
I return this int in the end of method.
If an element is deleted, I set result
to 1, showing deletion is successful.
If such element doesn't exist, result
remains 0 and is returned.
int repository_get(int key, int *data):
I also traverse through the list. In
the first layer, I check if temp equals
to NULL. If it is null, it means temp
has hit below the bottom of super list.
If the temp's next element is larger
than the key, I go down, increase the
counter. If temp's next element is
null, which means it is the end of the
list horizontally, I go down as well,
increase the counter.
To prevent temp from going down twice,
I write go down instruction outside
second layer and write break inside
every condition.
If temp's next
element equals key, I store the value
of the data address into the parameter
and return 1. If the temp's next is
smaller than the key, I continue to
traverse horizontally, increasing the
counter. Finally, if no 1 was returned,
I return 0, indicating the expected
element was not found.
int repository_print(int print_elements):
I print the size first, representing
number of unique records in the list.
I set up an array that can store the
number of elements of each level.
I set them to be 0 at first.
I make temp pointer pointing to
dummyhead of top level first.
I make temp1 pointer pointing to
temp every time temp jumps down.
I use temp1 to count how many elements
there are in the level.
In the end, I check if temp has down,
if it doesn't have down, I won't move
it, since I have to use the dummyhead
of bottom level later.
Then I print number of elements in
each level. Note that level 1
represents top level, and as level
number increases, it means lower level.
Now I traverse through the bottom level.
I have one pointer, temp which points to
the real listhead of bottom level. If
print_elements equals 2, I print the
key and data in each level, If print_
elements equals 1, I only print the
bottom level.
In order to make the printing message
look nice, I print the entire list
vertically. The leftmost vertical line
represents the bottom level, the right
lines show the upper levels. How many
repetitions of the element are in the
super list, how many times I will print
it in the line. In this case, I only
have to traverse through the bottom
level.
I print counter in the end, showing
how much effort I have spent so far.
Result:
Task 2.
1.d:
-r 1000 -o 10000000:
seed1:
steps = 206605469 size = 493
seed2:
steps = 206472978 size = 502
seed3:
steps = 206465836 size = 486
seed4:
steps = 206428106 size = 493
seed5:
steps = 206643019 size = 487
average:
steps = 206523081 size = 492
1.e:
-r 1000 -o 100000000:
seed1:
steps = 2073137774 size = 489
seed2:
steps = 2074103769 size = 498
seed3:
steps = 2076094686 size = 506
seed4:
steps = 2073372465 size = 506
seed5:
steps = 2074139463 size = 482
average:
steps = 2074169631 size = 496
1.f:
-r 10000 -o 20000000:
seed1:
steps = 562645037 size = 5004
seed2:
steps = 562845070 size = 4878
seed3:
steps = 563599848 size = 4985
seed4:
steps = 561628711 size = 5052
seed5:
steps = 557961915 size = 5057
average:
steps = 561736116 size = 4995
Analysis for task 2:
probability of update: 25%
probability of delete: 25%
probability of get: 50%
suppose the range of operation is N,
then the average size of the list is
N/2 = n.
During update, chance of the element that is
already in the list is 50%.
chance of element that is not in the list is 50%.
Note that everytime we want to find an element,
we traverse the list level by level, one element
per level. So we have to calculate how many levels
there are in the superlist. And also while going
down, temp is also going from left to right.
I make counter increase by one when a pointer
in the array points to the list element as well.
So the sum of going down and going right and array
pointers shows steps performed.
If the element is in the list, we have to traverse
down log(100/probability)n levels on average.
Since the element of lower level is always
(probability/100) times the number of elements
in the higher level. And the highest level
approximately has 1 element.
Because a huge portion of elements are in the
bottom level, if not, still a huge portion of
elements are in the first higher level or second
higher level, so no huge difference if we traverse
one or two less levels. And on average, we go
right log(100/probability)n * (100/probability - 1)
steps, if the elements
are distributed evenly, since I select n/(100/prob)
elements among 100/prob blocks first, then I select
[n/(100/prob)]/(100/prob) elements, etc.
But everytime I select my range from these blocks,
I have to traverse 100/probability -1
elements in that block of (100/prob) sub blocks.
How many levels of lists I have, how many array
of pointers I use, so I add log(100/probability)n
as well.
For every insert, I increase counter by two
at least to insert my element in the bottom level.
So in total, the counter increases
(100/probability+1)*log(100/probability)n+2 times
per update operation if the element is in the list.
(base = 100/probability)
If it is not in the list, we still have to traverse
till the right end of bottom level to make sure the
element is not in the bottom level of super list.
That's also (100/probability)*log(100/probability)n
steps per update operation.
During deletion, chance of the element that is
already in the list is 50%.
chance of element not in the list is 50%.
Very similar to update:
If it is in the list, we have to traverse
log(100/probability)n levels to go down and
log(100/probability)n * (100/probability-1)
elements on average to go right as well.
Note I increase my counter twice when I'm
deleting an element, sice pelement points
to temp and temp's next points to somewhere
else.
So in total, the counter increases
(100/probability+1)*log(100/probability)n times per
update operation if the element is in the list.
(base = 100/probability)
If it is not in the list, we have to traverse
log(100/probability)n levels to go down,
and (100/probability-1)*log(100/probability)n
elements to go right at the same time to make
sure the element is not in the bottom level of
the super list.
So (100/probability)*log(100/probability)n steps
per delete operation if element is not in the
list.
It is the same for get method:
Chance of the element that is
already in the list is 50%.
chance of element that is not in the list is 50%.
If it is in the list, we have to traverse
log(100/probability)n levels to go down and
(100/probability-1)*log(100/probability)n
elements on average to go right as well.
If it is not in the list, we have to traverse
log(100/probability)n levels to go down,
and (100/probability-1)*log(100/probability)n
elements to go right at the same time to make
sure the element is not
in the bottom level of the super list.
So (100/probability)*log(100/probability)n steps
per get operation.
So as a result, we have to traverse
[1/2*((100/prob+1)*log(100/prob)n+2) +
1/2*(100/prob)*log(100/prob)n]*1/4
+[1/2*(100/prob+1)*log(100/prob)n +
1/2*(100/prob)*log(100/prob)n]*1/4
+[1/2*(100/prob)*log(100/prob)n +
1/2*(100/prob)*log(100/prob)n]*1/2
= (100/prob) * [log(100/prob) of n]+1/4
= (100/prob+1/4) * log(100/prob)(N/2)
(prob is probability for short)
So every operation, there are
(100/prob+0.25)*log(100/prob)(N/2)
steps, where N = Max_range
In task 2, all prob is 50, which means base is
(100/prob) = 100/50 = 2
In 1.d, Max_range is 1000, so it takes
(2+1/4)*log2(1000/2)+1/4=2.25*9+0.25
=20.5 steps per operation.
And num_op is 10000000 here,
So total steps=20.5*10000000 = 205000000,
the average which is 206523081 matches my
expectation here.
The average size is 492, approximately 500
which is N/2.
In 1.e, Max_range is 1000, so it takes
(2+1/4)*log2(1000/2)+1/4=2.25*9+0.25
=20.25 steps per operation.
And num_op is 100000000 here,
So total steps=20.5*10000000=2050000000,
the average which is 2074169631 matches my
expectation here.
The average size is 496, approximately 500
which is N/2.
In 1.d, Max_range is 10000, so it takes
(2+1/4)*log2(10000/2)=2.25*12+0.25
=28.5steps per operation.
And num_op is 20000000 here,
So total steps = 28.5*20000000 = 570000000,
the average which is 561736116 matches my
expectation here.
The average size is 4995, approximately
5000 which is N/2.
The real results are slightly larger than my
expectation because I didn't take counter in
inserting elements to newer levels and building
new levels above into account.
Task 3.
1.d:
-p 25:
seed1:
steps = 176297940 size = 493
seed2:
steps = 176247053 size = 502
seed3:
steps = 176217410 size = 486
seed4:
steps = 176435095 size = 493
seed5:
steps = 176520881 size = 487
average:
steps = 176343675 size = 492
1.d:
-p 75:
seed1:
steps = 375650873 size = 493
seed2:
steps = 366920569 size = 502
seed3:
steps = 368134530 size = 486
seed4:
steps = 369445383 size = 493
seed5:
steps = 366926252 size = 487
average:
steps = 369415521 size = 492
Analysis is same as analysis for
task 2.