<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Redis Network Model_1_User Space and Kernel Space TODO</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E7%BD%91%E7%BB%9C%E6%A8%A1%E5%9E%8B_1_%E7%94%A8%E6%88%B7%E7%A9%BA%E9%97%B4%E5%92%8C%E5%86%85%E6%A0%B8%E7%A9%BA%E9%97%B4/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E7%BD%91%E7%BB%9C%E6%A8%A1%E5%9E%8B_1_%E7%94%A8%E6%88%B7%E7%A9%BA%E9%97%B4%E5%92%8C%E5%86%85%E6%A0%B8%E7%A9%BA%E9%97%B4/</url>
<content type="html"><![CDATA[<h4 id="用户空间和内核空间详解"><a href="#用户空间和内核空间详解" class="headerlink" title="用户空间和内核空间详解"></a>User Space and Kernel Space in Detail</h4>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Types_5_Hash</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_5_Hash/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_5_Hash/</url>
<content type="html"><![CDATA[<h4 id="Hash详解"><a href="#Hash详解" class="headerlink" title="Hash详解"></a>Hash in Detail</h4><p>Hash has much in common with ZSet:</p><ul><li>It stores key-value pairs</li><li>Values are looked up by key</li><li>Keys are unique</li></ul><p>Differences:</p><ul><li>A ZSet value (the score) must be a number, because elements are sorted by score</li><li>A ZSet must keep its elements sorted, whereas a Hash does not</li></ul><p>The underlying structure of Hash is therefore the same as ZSet's, minus the SkipList that handles ordering.</p><p>The following is based on Redis 6:</p><p>By default a Hash uses the ZipList encoding to save memory. As in ZSet, two adjacent ZipList entries hold the field and the value respectively.</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_5_Hash/image-20240922215227979.png" alt="image-20240922215227979"></p><p>Likewise, when the data grows large, the Hash is converted to the HT encoding, that is, a Dict. This happens when either condition holds:</p><ul><li>The number of field-value pairs exceeds hash-max-ziplist-entries (default: 512)</li><li>Any field or value is larger than hash-max-ziplist-value (default: 64 bytes)</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_5_Hash/image-20240922220350412.png" alt="image-20240922220350412"></p><h5 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h5><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_hash.c</span></span><br><span class="line"><span class="type">void</span> <span class="title function_">hsetCommand</span><span class="params">(client *c)</span> {<span class="comment">// Handler for the HSET command</span></span><br><span class="line"> <span class="type">int</span> i, created = <span class="number">0</span>;</span><br><span class="line"> robj *o;</span><br><span 
class="line"></span><br><span class="line"> <span class="keyword">if</span> ((c->argc % <span class="number">2</span>) == <span class="number">1</span>) {<span class="comment">// Check the argument count: fields and values must come in pairs</span></span><br><span class="line"> addReplyErrorFormat(c,<span class="string">"wrong number of arguments for '%s' command"</span>,c->cmd->name);</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"><span class="comment">// Look up the hash key; if it does not exist, create a new hash (ZipList-encoded by default)</span></span><br><span class="line"> <span class="keyword">if</span> ((o = hashTypeLookupWriteOrCreate(c,c->argv[<span class="number">1</span>])) == <span class="literal">NULL</span>) <span class="keyword">return</span>;<span class="comment">// See the next code block</span></span><br><span class="line"> hashTypeTryConversion(o,c->argv,<span class="number">2</span>,c->argc<span class="number">-1</span>);<span class="comment">// Check whether conversion to HT encoding is needed; see the last code block</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">2</span>; i < c->argc; i += <span class="number">2</span>)<span class="comment">// Iterate over each field-value pair and set it</span></span><br><span class="line"> <span class="comment">// hashTypeSet checks whether the number of entries has exceeded the limit</span></span><br><span class="line"> created += !hashTypeSet(o,c->argv[i]->ptr,c->argv[i+<span class="number">1</span>]->ptr,HASH_SET_COPY);</span><br><span class="line"><span class="comment">// Remainder omitted ...</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_hash.c</span></span><br><span class="line">robj *<span class="title function_">hashTypeLookupWriteOrCreate</span><span class="params">(client *c, robj *key)</span> {</span><br><span class="line"> robj *o = lookupKeyWrite(c->db,key);<span class="comment">// Look up the hash key</span></span><br><span class="line"> <span class="keyword">if</span> (o == <span class="literal">NULL</span>) {<span class="comment">// Key does not exist: create a ZipList-encoded hash</span></span><br><span class="line"> o = createHashObject();<span class="comment">// See the next code block</span></span><br><span class="line"> dbAdd(c->db,key,o);</span><br><span class="line"> } <span class="keyword">else</span> {<span class="comment">// Key exists</span></span><br><span class="line"> <span class="keyword">if</span> (o->type != OBJ_HASH) {</span><br><span class="line"> addReply(c,shared.wrongtypeerr);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> o;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// object.c</span></span><br><span class="line">robj *<span class="title function_">createHashObject</span><span class="params">(<span class="type">void</span>)</span> {</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> *zl = ziplistNew();<span class="comment">// 
Allocate a new ZipList</span></span><br><span class="line"> robj *o = createObject(OBJ_HASH, zl);</span><br><span class="line"> o->encoding = OBJ_ENCODING_ZIPLIST;<span class="comment">// Set the encoding to ZipList</span></span><br><span class="line"> <span class="keyword">return</span> o;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_hash.c </span></span><br><span class="line"><span class="type">void</span> <span class="title function_">hashTypeTryConversion</span><span class="params">(robj *o, robj **argv, <span class="type">int</span> start, <span class="type">int</span> end)</span> {</span><br><span class="line"> <span class="type">int</span> i;</span><br><span class="line"> <span class="type">size_t</span> sum = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (o->encoding != OBJ_ENCODING_ZIPLIST) <span class="keyword">return</span>;<span class="comment">// Not ZipList-encoded: nothing to do</span></span><br><span class="line"><span class="comment">// Iterate over the fields and values</span></span><br><span class="line"> <span class="keyword">for</span> (i = start; i <= end; i++) {</span><br><span class="line"> <span class="keyword">if</span> 
(!sdsEncodedObject(argv[i]))</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> <span class="type">size_t</span> len = sdslen(argv[i]->ptr);</span><br><span class="line"> <span class="keyword">if</span> (len > server.hash_max_ziplist_value) {<span class="comment">// A single field or value exceeds the size limit: convert to HT encoding</span></span><br><span class="line"> hashTypeConvert(o, OBJ_ENCODING_HT);<span class="comment">// The entry-count limit is checked in hashTypeSet as each pair is inserted</span></span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"> sum += len;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (!ziplistSafeToAdd(o->ptr, sum))<span class="comment">// If the ZipList would grow too large overall (1 GB safety limit), also convert to HT encoding</span></span><br><span class="line"> hashTypeConvert(o, OBJ_ENCODING_HT);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Types_4_ZSet</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_4_ZSet/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_4_ZSet/</url>
<content type="html"><![CDATA[<h4 id="ZSet详解"><a href="#ZSet详解" class="headerlink" title="ZSet详解"></a>ZSet in Detail</h4><p>ZSet is the Sorted Set. Every element must specify a score and a member; the member is the element's actual value, and elements are ordered by score.</p><ul><li>The member must be unique (it acts as the key)</li><li>Elements are sorted by score</li><li>The score can be looked up by member</li></ul><p>In other words, ZSet stores key-value pairs (member-score pairs), requires unique keys, and must be sortable. It therefore combines a Dict (HT) with a SkipList:</p><ul><li>A SkipList supports key-value storage and ordering, but enforcing key uniqueness and looking up a score by member are hard to implement with it alone</li><li>A Dict (HT) stores key-value pairs and guarantees unique keys, but cannot be sorted</li></ul><blockquote><p>Because the encoding field can hold only one value, Redis records the ZSet encoding as SkipList.</p></blockquote><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// server.h</span></span><br><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">zset</span> {</span></span><br><span class="line"> dict *dict;</span><br><span class="line"> zskiplist *zsl;</span><br><span class="line">} zset;</span><br></pre></td></tr></table></figure><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_4_ZSet/image-20240922145225898.png" alt="image-20240922145225898"></p><p>Holding both a Dict and a SkipList makes ZSet a memory-hungry type, so when the data set is small, ZSet uses a ZipList (Redis 6) or Listpack (Redis 7) instead to save memory.</p><ul><li><p>This requires that the number of elements does not exceed zset_max_ziplist_entries and that no element is larger than zset_max_ziplist_value (Redis 6; Redis 7 is similar)</p></li><li><p>When these conditions no longer hold, the encoding is converted automatically</p></li></ul><p>ZipList and Listpack have no built-in ordering and no notion of key-value pairs, so extra code is needed to provide both.</p><ul><li>A ZipList is contiguous memory, so the element and its score are stored as two adjacent entries, element first and score second.</li><li>Entries are kept in ascending order of score: the smaller the score, the closer to the head.</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_4_ZSet/image-20240922150726316.png" alt="image-20240922150726316"></p>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Types_3_Set</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_3_Set/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_3_Set/</url>
<content type="html"><![CDATA[<h4 id="Set详解"><a href="#Set详解" class="headerlink" title="Set详解"></a>Set in Detail</h4><p>Set is a single-column collection in Redis with the following properties:</p><ul><li>Unordered</li><li>Elements are unique</li><li>Supports intersection, difference, and union</li></ul><p>Set is backed by a hash table, that is, the Dict data structure.</p><ul><li>The Dict keys store the elements, and every value is set to null (Redis 6)<ul><li>In Redis 7, a listpack is used when the number of elements is small</li></ul></li><li>When every stored element is an integer and the number of elements does not exceed set-max-intset-entries, the Set uses the intset encoding to save memory.<ul><li>Redis checks these conditions on every insertion; if they no longer hold, the encoding is converted to HT (listpack / intset -> HT)</li></ul></li></ul><h5 id="示例:"><a href="#示例:" class="headerlink" title="示例:"></a>Example:</h5><p>The diagrams below are based on Redis 6.</p><p>Three integers, 5, 10, and 20, are inserted into an empty Set. Because all of them are integers and there are only a few, the intset encoding is used.</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_3_Set/image-20240922143114798.png" alt="image-20240922143114798"></p><p>A string is then inserted, which violates the conditions, so the encoding is converted from intset to HT.</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_3_Set/image-20240922143318545.png" alt="image-20240922143318545"></p><h5 id="源码"><a href="#源码" class="headerlink" title="源码"></a>Source Code</h5><ul><li>Redis6:</li></ul><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_set.c</span></span><br><span class="line">robj *<span class="title function_">setTypeCreate</span><span class="params">(sds value)</span> {</span><br><span class="line"> <span class="keyword">if</span> (isSdsRepresentableAsLongLong(value,<span class="literal">NULL</span>) == C_OK)<span class="comment">// The first value is representable as an integer: use intset</span></span><br><span class="line"> <span class="keyword">return</span> createIntsetObject();</span><br><span class="line"> <span class="keyword">return</span> createSetObject();<span class="comment">// Otherwise use HT</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li>Redis7:</li></ul><figure 
class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_set.c</span></span><br><span class="line">robj *<span class="title function_">setTypeCreate</span><span class="params">(sds value, <span class="type">size_t</span> size_hint)</span> {</span><br><span class="line"> <span class="keyword">if</span> (isSdsRepresentableAsLongLong(value,<span class="literal">NULL</span>) == C_OK && size_hint <= server.set_max_intset_entries)</span><br><span class="line"> <span class="keyword">return</span> createIntsetObject();<span class="comment">// Integer value and expected size within the limit: use intset</span></span><br><span class="line"> <span class="keyword">if</span> (size_hint <= server.set_max_listpack_entries)<span class="comment">// Otherwise use listpack if small enough, else dict</span></span><br><span class="line"> <span class="keyword">return</span> createSetListpackObject();</span><br><span class="line"></span><br><span class="line"> <span class="comment">/* We may oversize the set by using the hint if the hint is not accurate,</span></span><br><span class="line"><span class="comment"> * but we will assume this is acceptable to maximize performance. 
*/</span></span><br><span class="line"> robj *o = createSetObject();</span><br><span class="line"> dictExpand(o->ptr, size_hint);</span><br><span class="line"> <span class="keyword">return</span> o;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_set.c</span></span><br><span class="line"><span class="type">void</span> <span class="title function_">setTypeMaybeConvert</span><span class="params">(robj *<span class="built_in">set</span>, <span class="type">size_t</span> size_hint)</span> {</span><br><span class="line"> <span class="keyword">if</span> ((<span class="built_in">set</span>->encoding == OBJ_ENCODING_LISTPACK && size_hint > server.set_max_listpack_entries)</span><br><span class="line"> || (<span class="built_in">set</span>->encoding == OBJ_ENCODING_INTSET && size_hint > server.set_max_intset_entries))</span><br><span class="line"> {</span><br><span class="line"> setTypeConvertAndExpand(<span class="built_in">set</span>, OBJ_ENCODING_HT, size_hint, <span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Types_2_List</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_2_List/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_2_List/</url>
<content type="html"><![CDATA[<h4 id="List详解"><a href="#List详解" class="headerlink" title="List详解"></a>List in Detail</h4><p>Since Redis 3.2, List has been implemented with QuickList.</p><p>Before Redis 7, the QuickList nodes are ZipLists; from Redis 7 onward they are listpacks.</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_2_List/image-20240922140827310.png" alt="image-20240922140827310"></p><p>All PUSH operations are carried out by the pushGenericCommand() function; only the arguments differ.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// t_list.c</span></span><br><span class="line"><span class="comment">/* Implements LPUSH/RPUSH/LPUSHX/RPUSHX. 
*/</span></span><br><span class="line"><span class="type">void</span> <span class="title function_">pushGenericCommand</span><span class="params">(client *c, <span class="type">int</span> where, <span class="type">int</span> xx)</span> {<span class="comment">// If xx is true, push only when the key already exists</span></span><br><span class="line"> <span class="type">int</span> j;</span><br><span class="line"></span><br><span class="line"> robj *lobj = lookupKeyWrite(c->db, c->argv[<span class="number">1</span>]);<span class="comment">// Look up the list object</span></span><br><span class="line"> <span class="keyword">if</span> (checkType(c,lobj,OBJ_LIST)) <span class="keyword">return</span>;<span class="comment">// Check that the type is List</span></span><br><span class="line"> <span class="keyword">if</span> (!lobj) {<span class="comment">// The list does not exist</span></span><br><span class="line"> <span class="keyword">if</span> (xx) {<span class="comment">// List missing and xx is true: reply 0 and return</span></span><br><span class="line"> addReply(c, shared.czero);</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> lobj = createListListpackObject();<span class="comment">// List missing and xx is false: create a list that starts out as a single listpack</span></span><br><span class="line"> dbAdd(c->db,c->argv[<span class="number">1</span>],lobj);<span class="comment">// Add the new list to the database under this key</span></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> listTypeTryConversionAppend(lobj,c->argv,<span class="number">2</span>,c->argc<span class="number">-1</span>,<span class="literal">NULL</span>,<span class="literal">NULL</span>);<span class="comment">// Convert the encoding first if the appended elements require it</span></span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">2</span>; j < c->argc; j++) {</span><br><span class="line"> listTypePush(lobj,c->argv[j],where);</span><br><span class="line"> server.dirty++;<span class="comment">// Dirty counter, tracks data modifications</span></span><br><span 
class="line"> }</span><br><span class="line"></span><br><span class="line"> addReplyLongLong(c, listTypeLength(lobj));<span class="comment">// Reply with the new length of the list</span></span><br><span class="line"></span><br><span class="line"> <span class="type">char</span> *event = (where == LIST_HEAD) ? <span class="string">"lpush"</span> : <span class="string">"rpush"</span>;<span class="comment">// Send notifications</span></span><br><span class="line"> signalModifiedKey(c,c->db,c->argv[<span class="number">1</span>]);</span><br><span class="line"> notifyKeyspaceEvent(NOTIFY_LIST,event,c->argv[<span class="number">1</span>],c->db->id);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>README</title>
<link href="/README/"/>
<url>/README/</url>
<content type="html"><![CDATA[]]></content>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Types_1_String</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_1_String/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_1_String/</url>
<content type="html"><![CDATA[<h4 id="String详解"><a href="#String详解" class="headerlink" title="String详解"></a>String in Detail</h4><p>String is one of the most commonly used data types in Redis. It has the following encodings:</p><ul><li>The basic encoding is <strong>RAW</strong>, implemented on top of SDS, with a maximum size of 512 MB</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_1_String/image-20240921144907223.png" alt="image-20240921144907223"></p><ul><li><p>If the SDS is at most 44 bytes long, the <strong>EMBSTR</strong> encoding is used. The object header and the SDS then occupy one contiguous region, so only a single memory allocation is needed, which improves efficiency.</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_1_String/image-20240921145029881.png" alt="image-20240921145029881"></p></li></ul><p>Why 44 bytes?</p><p>Answer: when the SDS holds x bytes (x <= 44), the total length of the contiguous region is <code>16 (RedisObject) + 1(len) + 1(alloc) + 1(flags) + x + 1('\0') = x + 20 <= 64</code>. Redis allocates memory in chunks of size $2^n$, and 64 B is exactly one chunk size, so no memory fragmentation is produced.</p><ul><li>If the stored string is an integer value within LONG_MAX, the <strong>INT</strong> encoding is used and the value is stored directly in the ptr field of the RedisObject<ul><li>ptr is exactly 8 bytes, so the SDS part is no longer needed</li></ul></li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B_1_String/image-20240921145811029.png" alt="image-20240921145811029"></p>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Structures_7_RedisObject</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_7_RedisObject/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_7_RedisObject/</url>
<content type="html"><![CDATA[<h4 id="RedisObject详解"><a href="#RedisObject详解" class="headerlink" title="RedisObject详解"></a>RedisObject in Detail</h4><p>In Redis, the keys and values of every data type are wrapped in a RedisObject, also called a Redis object. Its structure is as follows:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// server.h</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">redisObject</span> {</span></span><br><span class="line"> <span class="type">unsigned</span> type:<span class="number">4</span>;<span class="comment">// Object type</span></span><br><span class="line"> <span class="type">unsigned</span> encoding:<span class="number">4</span>;<span class="comment">// Encoding type</span></span><br><span class="line"> <span class="type">unsigned</span> lru:LRU_BITS; <span class="comment">// LRU time, used for memory eviction</span></span><br><span class="line"> <span class="type">int</span> refcount;<span class="comment">// Reference counter, used to decide whether the object can be freed</span></span><br><span class="line"> <span class="type">void</span> *ptr;<span class="comment">// Pointer to the memory that holds the actual data</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure><ul><li>As the struct shows, a Redis object header is 16 bytes on a 64-bit build: 4 bytes of bit fields (4 + 4 + 24 bits, since LRU_BITS is 24), a 4-byte refcount, and an 8-byte pointer</li></ul><p>type:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// server.h</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_STRING 0 <span class="comment">/* String object. 
*/</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_LIST 1 <span class="comment">/* List object. */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_SET 2 <span class="comment">/* Set object. */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ZSET 3 <span class="comment">/* Sorted set object. */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_HASH 4 <span class="comment">/* Hash object. */</span></span></span><br></pre></td></tr></table></figure><p>Encoding:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// server.h</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_RAW 0 <span class="comment">/* Raw representation */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_INT 1 <span class="comment">/* Encoded as integer */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_HT 2 <span class="comment">/* Encoded as hash table */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_ZIPMAP 3 <span class="comment">/* No longer used: old hash encoding. 
*/</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_LINKEDLIST 4 <span class="comment">/* No longer used: old list encoding. */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_ZIPLIST 5 <span class="comment">/* No longer used: old list/hash/zset encoding. */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_INTSET 6 <span class="comment">/* Encoded as intset */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_SKIPLIST 7 <span class="comment">/* Encoded as skiplist */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_EMBSTR 8 <span class="comment">/* Embedded sds string encoding */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_QUICKLIST 9 <span class="comment">/* Encoded as linked list of listpacks */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_STREAM 10 <span class="comment">/* Encoded as a radix tree of listpacks */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_LISTPACK 11 <span class="comment">/* Encoded as a listpack */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> OBJ_ENCODING_LISTPACK_EX 12 <span class="comment">/* Encoded as listpack, extended with metadata */</span></span></span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Structures_6_SkipList</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_6_SkipList/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_6_SkipList/</url>
<content type="html"><![CDATA[<h4 id="SkipList详解"><a href="#SkipList详解" class="headerlink" title="SkipList详解"></a>SkipList in Detail</h4><p>A SkipList is a linked list with the following properties:</p><ul><li>Elements are kept in ascending order</li><li>A node may hold several pointers at different levels, each spanning a different distance<ul><li>At most 32 levels of pointers are allowed</li></ul></li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_6_SkipList/image-20240921123854120.png" alt="image-20240921123854120"></p><h5 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h5><p>Below are the source code and a diagram of the skiplist used by zset:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_6_SkipList/image-20240921124750028.png" alt="image-20240921124750028"></p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplist</span> {</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplistNode</span> *<span class="title">header</span>, *<span class="title">tail</span>;</span><span class="comment">// Pointers to the head and tail nodes</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> length;<span class="comment">// Number of nodes</span></span><br><span class="line"> <span class="type">int</span> level;<span class="comment">// Highest index level currently in use; starts at 1</span></span><br><span class="line">} zskiplist;</span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplistNode</span> {</span></span><br><span class="line"> sds ele;<span class="comment">// value stored in the node</span></span><br><span class="line"> <span class="type">double</span> score;<span class="comment">// node score, used for ordering and lookup</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplistNode</span> *<span class="title">backward</span>;</span><span class="comment">// pointer to the previous node</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplistLevel</span> {</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">zskiplistNode</span> *<span class="title">forward</span>;</span><span class="comment">// pointer to the next node at this level</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> span;<span class="comment">// distance covered by this index pointer</span></span><br><span class="line"> } level[];<span class="comment">// multi-level index array</span></span><br><span class="line">} zskiplistNode;</span><br></pre></td></tr></table></figure><h5 id="SkipList特点总结"><a href="#SkipList特点总结" class="headerlink" title="SkipList特点总结"></a>SkipList Summary</h5><ul><li>It is a doubly linked list; every node has an ele value that stores the data and a score used for ordering and lookup</li><li>Nodes are ordered by score; nodes with equal scores are ordered lexicographically by ele</li><li>Each node may hold several pointers, with between 1 and 32 levels</li><li>Pointers at different levels span different distances: the higher the level, the larger the span</li><li>Insert, delete, update, and lookup are efficient (O(log N) on average), comparable to a red-black tree, while the implementation is simpler</li></ul>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis数据结构_5_QuickList</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_5_QuickList/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_5_QuickList/</url>
<content type="html"><![CDATA[<h4 id="QuickList详解"><a href="#QuickList详解" class="headerlink" title="QuickList详解"></a>QuickList in Detail</h4><p>Although a ZipList saves space, the memory it requests must be contiguous; if it grows too large, allocating that memory becomes inefficient.</p><ul><li>The length of a ZipList and the size of its entries therefore have to be limited</li></ul><p>What if we need to store a large amount of data?</p><ul><li>Create several ZipLists and store the data in shards</li></ul><p>After sharding, the data is scattered and awkward to manage and search. What then?</p><ul><li>Introduce the <strong>QuickList</strong>: a doubly linked list in which every node is a ZipList (before Redis 7.0), as shown below<ul><li>Since Redis 7.0 the ZipList has been replaced by listpack; the description below applies to versions before 7.0, where the underlying structure is still a ZipList</li></ul></li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_5_QuickList/image-20240921115903842.png" alt="image-20240921115903842"></p><p>To limit the size of each ZipList node, Redis provides the <code>list-max-ziplist-size</code> parameter</p><ul><li>If size &gt; 0, it is the maximum number of entries a node may hold</li><li>If size &lt; 0, the maximum memory size of each ZipList is $2^{1-size}$ KB ($-5 \le size \le -1$); for example, size = -1 gives 2^2 = 4 KB<ul><li>The default is -2 (8 KB)</li></ul></li></ul><p>In addition, QuickList provides <code>list-compress-depth</code> to control node compression: depth = N means that N nodes at each end of the QuickList stay uncompressed and the nodes in between are compressed</p><ul><li>The default is 0 (no compression)</li></ul><p>Below is a QuickList with <code>list-compress-depth = 1</code></p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_5_QuickList/image-20240921122602982.png" alt="image-20240921122602982"></p><h5 id="QuickList特点总结"><a href="#QuickList特点总结" class="headerlink" title="QuickList特点总结"></a>QuickList Summary</h5><ul><li>It is a doubly linked list whose nodes are ZipLists (listpacks since Redis 7.0)</li><li>Using ZipLists as nodes saves space</li><li>Limiting the shard size solves the problem of requesting large contiguous blocks of memory</li><li>Middle nodes can be compressed to save even more space</li></ul><h5 id="QuickList源码分析-基于ZipList"><a href="#QuickList源码分析-基于ZipList" class="headerlink" title="QuickList源码分析(基于ZipList)"></a>QuickList Source Code Analysis (ZipList-based)</h5><p>Redis-6.0.19</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">quicklist</span> {</span></span><br><span class="line"> quicklistNode *head;<span class="comment">// head pointer</span></span><br><span class="line"> quicklistNode *tail;<span class="comment">// tail pointer</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> count; <span class="comment">// total number of entries across all ZipLists</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> len; <span class="comment">// total number of nodes (ZipLists)</span></span><br><span class="line"> <span class="type">int</span> fill : QL_FILL_BITS; <span class="comment">// size limit for each ZipList</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> compress : QL_COMP_BITS; <span class="comment">// compression depth of the QuickList</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> bookmark_count: QL_BM_BITS;<span class="comment">// number of bookmarks; the bookmarks array is used during memory reallocation</span></span><br><span class="line"> quicklistBookmark bookmarks[];</span><br><span class="line">} quicklist;</span><br><span class="line"></span><br></pre></td></tr></table></figure><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">quicklistNode</span> {</span></span><br><span class="line"> <span 
class="class"><span class="keyword">struct</span> <span class="title">quicklistNode</span> *<span class="title">prev</span>;</span><span class="comment">// pointer to the previous node</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">quicklistNode</span> *<span class="title">next</span>;</span><span class="comment">// pointer to the next node</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> *zl;<span class="comment">// pointer to this node's ZipList</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> sz; <span class="comment">// size of this ZipList in bytes</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> count : <span class="number">16</span>; <span class="comment">// number of entries in this ZipList</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> encoding : <span class="number">2</span>; <span class="comment">// encoding: 1 = raw ZipList, 2 = LZF-compressed</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> container : <span class="number">2</span>; <span class="comment">// container type: 1 = reserved, 2 = ZipList</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> recompress : <span class="number">1</span>; <span class="comment">// 1 = temporarily decompressed, may need to be recompressed</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> attempted_compress : <span class="number">1</span>;</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> extra : <span class="number">10</span>; <span class="comment">// reserved bits</span></span><br><span class="line">} quicklistNode;</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>ArrayList详解</title>
<link href="/Java%E6%BA%90%E7%A0%81/ArrayList/"/>
<url>/Java%E6%BA%90%E7%A0%81/ArrayList/</url>
<content type="html"><![CDATA[<h4 id="ArrayList详解"><a href="#ArrayList详解" class="headerlink" title="ArrayList详解"></a>ArrayList in Detail</h4><p>ArrayList has the following characteristics:</p><ul><li>Underlying structure: a dynamic array; with the no-arg constructor the initial capacity is 0, and the array is only allocated with capacity 10 when the first element is added</li><li>Time complexity: lookup by index is O(1), insertion or deletion at an arbitrary position is O(n), and appending at the end is amortized O(1)</li><li>Thread safety: not thread-safe. How can it be used safely?<ul><li>Use it inside a method as a local variable, so it is never shared between threads</li><li>Wrap it with <code>Collections.synchronizedList()</code></li></ul></li><li>Constructors<ul><li>No-arg: starts from a shared empty array; capacity 10 is allocated on the first add</li><li>Initial capacity: allocates an array of the given capacity</li><li>Collection argument: if the argument is an ArrayList, its array is assigned directly; otherwise it is copied with copyOf</li></ul></li><li>Growth: after the first allocation at capacity 10, each expansion grows the array to 1.5 times its previous capacity, allocating a new array and copying the elements from the old one with copyOf</li></ul><h5 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h5><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="type">int</span> <span class="variable">DEFAULT_CAPACITY</span> <span class="operator">=</span> <span class="number">10</span>;<span class="comment">// default capacity of 10, applied on the first add</span></span><br></pre></td></tr></table></figure><p>The three constructors:</p><ul><li><p>No-arg</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="title function_">ArrayList</span><span class="params">()</span> { <span class="built_in">this</span>.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA; }</span><br><span class="line"><span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};</span><br></pre></td></tr></table></figure></li><li><p>With an initial capacity: allocates an Object[] array of that capacity</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="title function_">ArrayList</span><span class="params">(<span class="type">int</span> initialCapacity)</span> {</span><br><span class="line"> <span class="keyword">if</span> (initialCapacity > <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">this</span>.elementData = <span class="keyword">new</span> <span class="title class_">Object</span>[initialCapacity];</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (initialCapacity == <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">this</span>.elementData = EMPTY_ELEMENTDATA;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> <span class="title class_">IllegalArgumentException</span>(<span class="string">"Illegal Capacity: "</span> + initialCapacity);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li><p>From a Collection:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="title function_">ArrayList</span><span class="params">(Collection<? 
extends E> c)</span> {</span><br><span class="line"> Object[] a = c.toArray();</span><br><span class="line"> <span class="keyword">if</span> ((size = a.length) != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (c.getClass() == ArrayList.class) {<span class="comment">// the argument is an ArrayList: assign the array directly</span></span><br><span class="line"> elementData = a;</span><br><span class="line"> } <span class="keyword">else</span> {<span class="comment">// not an ArrayList: copy with Arrays.copyOf</span></span><br><span class="line"> elementData = Arrays.copyOf(a, size, Object[].class);</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {<span class="comment">// empty collection: use the shared EMPTY_ELEMENTDATA</span></span><br><span class="line"> elementData = EMPTY_ELEMENTDATA;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li></ul>]]></content>
<categories>
<category> Java </category>
</categories>
<tags>
<tag> ArrayList </tag>
</tags>
</entry>
<entry>
<title>AQS原理</title>
<link href="/Java%E6%BA%90%E7%A0%81/AQS/"/>
<url>/Java%E6%BA%90%E7%A0%81/AQS/</url>
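<content type="html"><![CDATA[<h4 id="AQS详解"><a href="#AQS详解" class="headerlink" title="AQS详解"></a>AQS in Detail</h4><p>AQS (AbstractQueuedSynchronizer) is the base framework for building locks and other synchronizers. It supports both shared and exclusive modes.</p><p>Common synchronizers built on AQS:</p><ul><li>ReentrantLock: a blocking lock</li><li>Semaphore: a counting semaphore</li><li>CountDownLatch: a countdown latch</li></ul><blockquote><p>Of these, ReentrantLock uses exclusive mode, while Semaphore and CountDownLatch use shared mode</p></blockquote><h5 id="AQS工作原理"><a href="#AQS工作原理" class="headerlink" title="AQS工作原理"></a>How AQS Works</h5><p>AQS has a volatile variable, state (volatile guarantees visibility). In the simplest exclusive-lock case, 0 means unlocked and 1 means locked; ReentrantLock also uses the value to count reentrant acquisitions.</p><p>When a thread tries to acquire the lock, it checks whether state is 0 and, if so, changes it from 0 to 1 (with a CAS operation to guarantee atomicity), which means it now holds the lock. Threads that fail to acquire the lock enter the wait queue (a doubly linked list with head and tail pointers). When the thread holding the lock is done, it sets state back to 0 and wakes the first waiting thread in the queue (the successor of head).</p><ul><li>AQS updates state with CAS to guarantee atomicity</li><li>It can implement both fair and non-fair locks<ul><li>If a newly arriving thread competes for the lock together with the threads already queued, the lock is non-fair</li><li>If a newly arriving thread must join the queue first and only the thread at the front of the queue may acquire the lock, the lock is fair</li></ul></li></ul><h5 id="AQS源码分析"><a href="#AQS源码分析" class="headerlink" title="AQS源码分析"></a>AQS Source Code Analysis</h5><p>The member variables of AQS are:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">private</span> <span class="keyword">transient</span> <span class="keyword">volatile</span> Node head;<span class="comment">// head of the wait queue, lazily initialized</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span> <span class="keyword">transient</span> <span class="keyword">volatile</span> Node tail;<span class="comment">// tail of the wait queue; after initialization it is modified only via casTail</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span> <span class="keyword">volatile</span> <span class="type">int</span> state;<span class="comment">// synchronization state, indicating whether the lock is held</span></span><br></pre></td></tr></table></figure><p>The Node class</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 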
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">abstract</span> <span class="keyword">static</span> <span class="keyword">class</span> <span class="title class_">Node</span> {</span><br><span class="line"> <span class="keyword">volatile</span> Node prev; <span class="comment">// initially attached via casTail</span></span><br><span class="line"> <span class="keyword">volatile</span> Node next; <span class="comment">// visibly nonnull when signallable</span></span><br><span class="line"> Thread waiter; <span class="comment">// visibly nonnull when enqueued</span></span><br><span class="line"> <span class="keyword">volatile</span> <span class="type">int</span> status; <span class="comment">// written by owner, atomic bit ops by others</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// methods for atomic operations</span></span><br><span class="line"> <span class="keyword">final</span> <span 
class="type">boolean</span> <span class="title function_">casPrev</span><span class="params">(Node c, Node v)</span> { <span class="comment">// for cleanQueue</span></span><br><span class="line"> <span class="keyword">return</span> U.weakCompareAndSetReference(<span class="built_in">this</span>, PREV, c, v);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">final</span> <span class="type">boolean</span> <span class="title function_">casNext</span><span class="params">(Node c, Node v)</span> { <span class="comment">// for cleanQueue</span></span><br><span class="line"> <span class="keyword">return</span> U.weakCompareAndSetReference(<span class="built_in">this</span>, NEXT, c, v);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">final</span> <span class="type">int</span> <span class="title function_">getAndUnsetStatus</span><span class="params">(<span class="type">int</span> v)</span> { <span class="comment">// for signalling</span></span><br><span class="line"> <span class="keyword">return</span> U.getAndBitwiseAndInt(<span class="built_in">this</span>, STATUS, ~v);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">final</span> <span class="keyword">void</span> <span class="title function_">setPrevRelaxed</span><span class="params">(Node p)</span> { <span class="comment">// for off-queue assignment</span></span><br><span class="line"> U.putReference(<span class="built_in">this</span>, PREV, p);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">final</span> <span class="keyword">void</span> <span class="title function_">setStatusRelaxed</span><span class="params">(<span class="type">int</span> s)</span> { <span class="comment">// for off-queue assignment</span></span><br><span class="line"> U.putInt(<span class="built_in">this</span>, STATUS, s);</span><br><span class="line"> }</span><br><span class="line"> <span 
class="keyword">final</span> <span class="keyword">void</span> <span class="title function_">clearStatus</span><span class="params">()</span> { <span class="comment">// for reducing unneeded signals</span></span><br><span class="line"> U.putIntOpaque(<span class="built_in">this</span>, STATUS, <span class="number">0</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="type">long</span> <span class="variable">STATUS</span></span><br><span class="line"> <span class="operator">=</span> U.objectFieldOffset(Node.class, <span class="string">"status"</span>);</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="type">long</span> <span class="variable">NEXT</span></span><br><span class="line"> <span class="operator">=</span> U.objectFieldOffset(Node.class, <span class="string">"next"</span>);</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="type">long</span> <span class="variable">PREV</span></span><br><span class="line"> <span class="operator">=</span> U.objectFieldOffset(Node.class, <span class="string">"prev"</span>);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Java </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>JVM垃圾回收详解</title>
<link href="/Java%E6%BA%90%E7%A0%81/JVM%E5%9E%83%E5%9C%BE%E5%9B%9E%E6%94%B6/"/>
<url>/Java%E6%BA%90%E7%A0%81/JVM%E5%9E%83%E5%9C%BE%E5%9B%9E%E6%94%B6/</url>
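<content type="html"><![CDATA[<h4 id="JVM垃圾回收详解"><a href="#JVM垃圾回收详解" class="headerlink" title="JVM垃圾回收详解"></a>JVM Garbage Collection in Detail</h4><p>First of all, the JVM garbage collector works mainly on the heap. Most current collectors use generational collection, so the Java heap is divided into several different regions.</p><p>This article covers JDK 1.8 and later: the heap is divided into a young generation and an old generation, and the permanent generation has been replaced by Metaspace, which no longer lives on the heap.</p><h5 id="堆的结构"><a href="#堆的结构" class="headerlink" title="堆的结构"></a>Heap Structure</h5><p>In practice, different garbage collectors partition the heap in different ways. Broadly, though, the heap consists of a young generation and an old generation, and the young generation is split into the Eden space and the S (Survivor) spaces.</p><p>For the traditional layout:</p><ul><li>Before G1, large objects were placed in the Old region; G1 introduced dedicated Humongous regions to hold large objects.</li></ul><h5 id="如何判断哪些对象需要回收"><a href="#如何判断哪些对象需要回收" class="headerlink" title="如何判断哪些对象需要回收"></a>How to Decide Which Objects to Collect</h5><h6 id="1-引用计数法"><a href="#1-引用计数法" class="headerlink" title="1. 引用计数法"></a>1. Reference Counting</h6><p>Each object is given a reference counter:</p><ul><li>Whenever something references the object, the counter is incremented by 1</li><li>Whenever a reference becomes invalid, the counter is decremented by 1</li><li>When the counter reaches 0, the object is no longer referenced and can be collected</li></ul><blockquote><p>Drawback: if two objects reference each other, their counters never reach 0, so they are never collected, whether or not they are still in use.</p></blockquote><h6 id="2-可达性分析"><a href="#2-可达性分析" class="headerlink" title="2. 可达性分析"></a>2. Reachability Analysis</h6><h5 id="引用类型"><a href="#引用类型" class="headerlink" title="引用类型"></a>Reference Types</h5><ul><li>Strong reference:</li><li>Weak reference:</li><li>Phantom reference:</li><li>Soft reference:</li></ul><h5 id="垃圾收集算法"><a href="#垃圾收集算法" class="headerlink" title="垃圾收集算法"></a>Garbage Collection Algorithms</h5><h5 id="CMS收集器"><a href="#CMS收集器" class="headerlink" title="CMS收集器"></a>CMS Collector</h5><h5 id="G1收集器"><a href="#G1收集器" class="headerlink" title="G1收集器"></a>G1 Collector</h5><h5 id="ZGC收集器"><a href="#ZGC收集器" class="headerlink" title="ZGC收集器"></a>ZGC Collector</h5>]]></content>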
<categories>
<category> Java </category>
</categories>
<tags>
<tag> JVM </tag>
</tags>
</entry>
<entry>
<title>Redis数据结构_4_Ziplist</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/</url>
<content type="html"><![CDATA[<h4 id="ZipList详解"><a href="#ZipList详解" class="headerlink" title="ZipList详解"></a>ZipList in Detail</h4><p>A ZipList is a special kind of double-ended list: it consists of a series of specially encoded entries laid out in one contiguous block of memory, and each entry stores either a string or an integer.</p><ul><li>Entries can be pushed or popped at either end in O(1) time, although each operation reallocates the ZipList's memory, so the real cost depends on how much memory the list uses</li><li>Values are stored in little-endian byte order</li></ul><blockquote><p>Since Redis 7.0, ZipList has been replaced by listpack</p></blockquote><h5 id="ZipList结构"><a href="#ZipList结构" class="headerlink" title="ZipList结构"></a>ZipList Structure</h5><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920152214396.png" alt="image-20240920152214396 "></p><div class="table-container"><table><thead><tr><th style="text-align:center">Field</th><th style="text-align:center">Type</th><th style="text-align:center">Length</th><th>Purpose</th></tr></thead><tbody><tr><td style="text-align:center">zlbytes</td><td style="text-align:center">uint32_t</td><td style="text-align:center">4B</td><td>Total number of bytes occupied by the whole list</td></tr><tr><td style="text-align:center">zltail</td><td style="text-align:center">uint32_t</td><td style="text-align:center">4B</td><td>Offset in bytes from the start of the list to the tail entry, so the tail can be located quickly</td></tr><tr><td style="text-align:center">zllen</td><td style="text-align:center">uint16_t</td><td style="text-align:center">2B</td><td>Number of entries in the list; if the count exceeds what the field can represent, it holds 65535 and the real count must be obtained by traversal</td></tr><tr><td style="text-align:center">entry</td><td style="text-align:center">list entry</td><td style="text-align:center">variable</td><td>The entries of the list</td></tr><tr><td style="text-align:center">zlend</td><td style="text-align:center">uint8_t</td><td style="text-align:center">1B</td><td>The special value 0xFF, marking the end of the list</td></tr></tbody></table></div><ul><li>Because 0xFF is the end marker, no entry starts with 0xFF.</li></ul><h5 id="ZipList-Entry结构"><a href="#ZipList-Entry结构" class="headerlink" title="ZipList Entry结构"></a>ZipList Entry Structure</h5><p><em>&lt;prevlen&gt; &lt;encoding&gt; &lt;entry-data&gt;</em></p><ul><li><p>prevlen: the total length of the previous entry, taking 1 or 5 bytes</p><ul><li>If the previous entry is shorter than 254 bytes, 1 byte is used</li><li>If it is 254 bytes or longer, 5 bytes are used</li></ul></li><li><p>encoding: records the data type of the content (string or integer) and the length of entry-data</p><ul><li><p>If encoding starts with 00, 01, or 10, the entry stores a string</p><p><img 
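src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920155202514.png" alt="image-20240920155202514"></p></li><li><p>If it starts with 11, the entry stores an integer, and encoding always takes exactly 1 byte</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920160539159.png" alt="image-20240920160539159"></p></li></ul></li><li><p>entry-data: holds the actual data</p></li></ul><p>From the above we can see how a ZipList is traversed:</p><ul><li>Front to back: <code>next_index = base_index + len(prevlen) + len(encoding) + len(entry-data)</code></li><li>Back to front: <code>last_index = base_index - prevlen</code></li></ul><h5 id="ZipList示例"><a href="#ZipList示例" class="headerlink" title="ZipList示例"></a>ZipList Examples</h5><h6 id="字符串示例"><a href="#字符串示例" class="headerlink" title="字符串示例"></a>String Example</h6><p>Suppose we store the strings "ab" and "bc".</p><p>For "ab" there is no previous entry, so prevlen takes one byte and is 0x00; because <code>len("ab") = 2B &lt;= 63B</code>, encoding is 0x02;</p><p>entry-data = 0x6162 (0x61 = 97 and 0x62 = 98 are the ASCII codes of a and b)</p><p>The first entry is therefore 0x00 02 61 62 (4 bytes), and in the same way the second entry is 0x04 02 62 63, as shown:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920155631509.png" alt="image-20240920155631509"></p><p>At this point zllen = 2 = 0x02 00;</p><p>zltail = 14 = 0x0e 00 00 00 (the 10-byte header plus the 4-byte first entry);</p><p>zlbytes = 19 = 0x13 00 00 00 (10-byte header + two 4-byte entries + 1-byte zlend); <em>Note: ZipList stores values in little-endian order</em></p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920160428614.png" alt="image-20240920160428614"></p><h6 id="整数示例"><a href="#整数示例" class="headerlink" title="整数示例"></a>Integer Example</h6><p>Suppose we store the integers 2 and 5.</p><p>Both are small and fall in the range 0 to 12, so they use the last integer encoding, which stores the value inside the encoding byte itself and drops the entry-data part.</p><p>The first number, 2, has no previous entry, so prevlen = 0 = 0x00 and encoding = 1111 0011 = 0xF3 (the low four bits hold the value plus 1). In the same way the second entry, 5, is encoded as 0x02 F6, where prevlen is 0x02 because the first entry is 2 bytes long.</p><p>zlbytes, zltail, and zllen are worked out the same way as in the string example, and the resulting layout is:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920161346893.png" alt="image-20240920161346893"></p><p>As we can see, all of ZipList's encodings exist to save as much memory as possible.</p><p>The price of this compactness is that entries can only be reached by traversing forwards or backwards, so a ZipList is not suited to holding a large number of entries and the entry count needs to be limited.</p><h5 id="ZipList的连锁更新问题"><a href="#ZipList的连锁更新问题" class="headerlink" 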
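title="ZipList的连锁更新问题"></a>The ZipList Cascade Update Problem</h5><p>As described earlier, every ZipList entry has a prevlen field holding the length of the previous entry, and that field takes either 1 or 5 bytes.</p><ul><li>If the previous entry is shorter than 254 bytes, prevlen takes 1 byte</li><li>If the previous entry is 254 bytes or longer, prevlen takes 5 bytes</li></ul><p>Suppose there are N consecutive entries, each 250 to 253 bytes long, so every prevlen is stored in 1 byte, as shown:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_4_ZipList/image-20240920163545430.png" alt="image-20240920163545430"></p><p>Now a 254-byte entry is inserted at the head. The prevlen of the next entry grows from 1 byte to 5 bytes, which makes that entry 4 bytes longer and pushes it to at least 254 bytes; the prevlen of the entry after it must then also grow from 1 byte to 5 bytes, and so on. The result is a chain reaction of repeated space expansions. Each expansion involves requesting and allocating memory and moving data, and may also require switching between user mode and kernel mode, so performance can suffer badly.</p><p>This is the <strong>cascade update problem</strong>; both insertions and deletions can trigger it.</p><ul><li>As the example shows, however, it is unlikely to occur in practice</li></ul>]]></content>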
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis数据结构_3_Dict</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/</url>
<content type="html"><![CDATA[<h4 id="Dict详解"><a href="#Dict详解" class="headerlink" title="Dict详解"></a>Dict in Detail</h4><p>Dict is a key-value data structure. Redis is a key-value database, and it is built on top of Dict.</p><h5 id="Dict和Entry结构体"><a href="#Dict和Entry结构体" class="headerlink" title="Dict和Entry结构体"></a>The Dict and Entry Structs</h5><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">dict</span> {</span></span><br><span class="line"> dictType *type;<span class="comment">// dict type: carries type-specific functions such as the hash function, so different use cases can plug in different ones</span></span><br><span class="line"></span><br><span class="line"> dictEntry **ht_table[<span class="number">2</span>];<span class="comment">// two dictEntry arrays: one holds the current data, the other is empty and used during rehash</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> ht_used[<span class="number">2</span>];<span class="comment">// number of entries currently stored in each table</span></span><br><span class="line"></span><br><span class="line"> <span class="type">long</span> rehashidx; <span class="comment">// rehash progress; -1 means no rehash is in progress</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">/* Keep small vars at end for optimal (minimal) struct padding */</span></span><br><span class="line"> <span class="type">unsigned</span> pauserehash : <span class="number">15</span>; <span class="comment">// non-zero means rehash is paused, which prevents inconsistent hashing</span></span><br><span 
class="line"></span><br><span class="line"> <span class="type">unsigned</span> useStoredKeyApi : <span class="number">1</span>; <span class="comment">/* See comment of storedHashFunction above */</span></span><br><span class="line"> <span class="type">signed</span> <span class="type">char</span> ht_size_exp[<span class="number">2</span>]; <span class="comment">// exponent of the table size: e.g. 4 means size = 2 ^ 4 = 16</span></span><br><span class="line"> <span class="type">int16_t</span> pauseAutoResize; <span class="comment">// controls automatic resizing</span></span><br><span class="line"> <span class="type">void</span> *metadata[];<span class="comment">// extra metadata</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Entry: a key-value pair, where v is any one member of the union</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">dictEntry</span> {</span></span><br><span class="line"> <span class="type">void</span> *key;</span><br><span class="line"> <span class="class"><span class="keyword">union</span> {</span></span><br><span class="line"> <span class="type">void</span> *val;</span><br><span class="line"> <span class="type">uint64_t</span> u64;</span><br><span class="line"> <span class="type">int64_t</span> s64;</span><br><span class="line"> <span class="type">double</span> d;</span><br><span class="line"> } v;</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">dictEntry</span> *<span class="title">next</span>;</span> <span class="comment">// next entry in the chain, used for separate chaining on hash collisions</span></span><br><span 
class="line">};</span><br></pre></td></tr></table></figure><h5 id="Dict添加规则"><a href="#Dict添加规则" class="headerlink" title="Dict添加规则"></a>How Dict Adds Entries</h5><p>When a key-value pair is added to a Dict, Redis first computes the hash value h of the key and then uses <code>h &amp; sizemask</code> to work out the index to store it at. If a collision occurs, it is resolved by separate chaining.</p><ul><li><p>Why this works: the table size is always a power of two, so in binary the modulo operation simply keeps the lowest few bits.</p><p>For example, 110101 % 8 keeps the last 3 bits (2^3 = 8), which is the same as 110101 &amp; 7, because 7 = 000111b. This is why <code>sizemask = size - 1</code></p></li></ul><h5 id="流程示例"><a href="#流程示例" class="headerlink" title="流程示例"></a>Worked Example</h5><p>The initial size of a Dict is 4, as shown:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240919224326635.png" alt="image-20240919224326635"></p><p>Insert the first element &lt;k1,v1&gt;. Suppose <code>hashcode(k1) = 1 =&gt; 1 &amp; 3 = 1</code>, so it is stored at index 1, as shown</p><ul><li>used is updated to 1, and the inserted Entry &lt;k1,v1&gt; is linked in</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240919224451053.png" alt="image-20240919224451053"></p><p>Insert the second element &lt;k2,v2&gt;. Suppose <code>hashcode(k2) = 1 =&gt; 1 &amp; 3 = 1</code>, so a hash collision occurs and the entry is chained at index 1, as shown</p><ul><li>The new entry is placed at the head of the chain, which is an O(1) insertion</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240919224639242.png" alt="image-20240919224639242"></p><h5 id="Dict扩容"><a href="#Dict扩容" class="headerlink" title="Dict扩容"></a>Dict Expansion</h5><p>Every time a key-value pair is added, Dict checks the load factor <code>loadFactor = used / size</code> and expands in either of the following two cases:</p><ul><li><p><code>loadFactor &gt;= 1</code> and the server is not running a background process such as BGSAVE or BGREWRITEAOF</p></li><li><p><code>loadFactor</code> reaches <code>dict_force_resize_ratio</code> (5 in older Redis releases; the 1/32 shrink threshold below corresponds to a ratio of 4) and resize is not forbidden</p><ul><li>The three resize states</li></ul><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">enum</span> {</span></span><br><span class="line"> DICT_RESIZE_ENABLE,</span><br><span class="line"> DICT_RESIZE_AVOID,</span><br><span class="line"> 
DICT_RESIZE_FORBID,</span><br><span class="line">} dictResizeEnable;</span><br></pre></td></tr></table></figure></li></ul><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> <span class="title function_">dictExpandIfNeeded</span><span class="params">(dict *d)</span> {</span><br><span class="line"> <span class="keyword">if</span> (dictIsRehashing(d)) <span class="keyword">return</span> DICT_OK;<span class="comment">// Already rehashing, return immediately</span></span><br><span class="line"> <span class="keyword">if</span> (DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>]) == <span class="number">0</span>) {<span class="comment">// If the hash table is empty, initialize it (default size is 4)</span></span><br><span class="line"> dictExpand(d, DICT_HT_INITIAL_SIZE);</span><br><span class="line"> <span class="keyword">return</span> DICT_OK;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ((dict_can_resize == DICT_RESIZE_ENABLE &&</span><br><span class="line"> d->ht_used[<span class="number">0</span>] >= DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>])) ||<span class="comment">// used:size has reached 1:1 and dict_can_resize == DICT_RESIZE_ENABLE</span></span><br><span class="line"> (dict_can_resize != DICT_RESIZE_FORBID &&<span class="comment">// or resize is not forbidden</span></span><br><span class="line">
d->ht_used[<span class="number">0</span>] >= dict_force_resize_ratio * DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>])))<span class="comment">// and used:size has reached dict_force_resize_ratio</span></span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">if</span> (dictTypeResizeAllowed(d, d->ht_used[<span class="number">0</span>] + <span class="number">1</span>))</span><br><span class="line"> dictExpand(d, d->ht_used[<span class="number">0</span>] + <span class="number">1</span>);</span><br><span class="line"> <span class="keyword">return</span> DICT_OK;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> DICT_ERR;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li>Note that the hash table size is always $2^n$, so an expansion grows the table to the smallest $2^n$ that can hold the requested size (used + 1).</li></ul><h5 id="Dict收缩"><a href="#Dict收缩" class="headerlink" title="Dict收缩"></a>Dict Shrinking</h5><p>When an element is deleted, the Dict also checks the load factor. The table shrinks when <code>loadFactor<=1/8</code> (0.125) and resize is enabled, or when <code>loadFactor<=1/32</code> and resize is not forbidden.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> <span class="title function_">dictShrinkIfNeeded</span><span class="params">(dict *d)</span> {</span><br><span class="line"> <span class="keyword">if</span> (dictIsRehashing(d)) <span class="keyword">return</span> DICT_OK;<span class="comment">// Already rehashing, return immediately</span></span><br><span
class="line"> </span><br><span class="line"> <span class="keyword">if</span> (DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>]) <= DICT_HT_INITIAL_SIZE) <span class="keyword">return</span> DICT_OK;<span class="comment">// Already at (or below) the initial size of 4, do not shrink further</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Shrink if used:size is at most 1:8 and dict_can_resize == DICT_RESIZE_ENABLE, or at most 1:32 and dict_can_resize != DICT_RESIZE_FORBID</span></span><br><span class="line"> <span class="keyword">if</span> ((dict_can_resize == DICT_RESIZE_ENABLE &&</span><br><span class="line"> d->ht_used[<span class="number">0</span>] * HASHTABLE_MIN_FILL <= DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>])) ||</span><br><span class="line"> (dict_can_resize != DICT_RESIZE_FORBID &&</span><br><span class="line"> d->ht_used[<span class="number">0</span>] * HASHTABLE_MIN_FILL * dict_force_resize_ratio <= DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>])))</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">if</span> (dictTypeResizeAllowed(d, d->ht_used[<span class="number">0</span>]))</span><br><span class="line"> dictShrink(d, d->ht_used[<span class="number">0</span>]);</span><br><span class="line"> <span class="keyword">return</span> DICT_OK;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> DICT_ERR;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Both expansion and shrinking are ultimately carried out by the <code>_dictResize</code> function.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/* Resize or create the hash table,</span></span><br><span class="line"><span class="comment"> * when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).</span></span><br><span class="line"><span class="comment"> * Returns DICT_OK if resize was performed, and DICT_ERR if skipped. 
*/</span></span><br><span class="line"><span class="type">int</span> _dictResize(dict *d, <span class="type">unsigned</span> <span class="type">long</span> size, <span class="type">int</span>* malloc_failed)</span><br><span class="line">{</span><br><span class="line"> <span class="keyword">if</span> (malloc_failed) *malloc_failed = <span class="number">0</span>;</span><br><span class="line"> assert(!dictIsRehashing(d));<span class="comment">// Assert that no rehash is in progress</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// The new hash table</span></span><br><span class="line"> dictEntry **new_ht_table;</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> new_ht_used;</span><br><span class="line"> <span class="type">signed</span> <span class="type">char</span> new_ht_size_exp = _dictNextExp(size);<span class="comment">// Exponent of the smallest power of two that is >= size</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Overflow check</span></span><br><span class="line"> <span class="type">size_t</span> newsize = DICTHT_SIZE(new_ht_size_exp);</span><br><span class="line"> <span class="keyword">if</span> (newsize < size || newsize * <span class="keyword">sizeof</span>(dictEntry*) < newsize)</span><br><span class="line"> <span class="keyword">return</span> DICT_ERR;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Rehashing to a table of the same size is pointless</span></span><br><span class="line"> <span class="keyword">if</span> (new_ht_size_exp == d->ht_size_exp[<span class="number">0</span>]) <span class="keyword">return</span> DICT_ERR;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Allocate the new hash table and initialize all pointers to NULL</span></span><br><span class="line"> <span class="keyword">if</span> (malloc_failed) {</span><br><span class="line"> new_ht_table = ztrycalloc(newsize*<span class="keyword">sizeof</span>(dictEntry*));</span><br><span class="line"> *malloc_failed = new_ht_table == <span
class="literal">NULL</span>;</span><br><span class="line"> <span class="keyword">if</span> (*malloc_failed)</span><br><span class="line"> <span class="keyword">return</span> DICT_ERR;</span><br><span class="line"> } <span class="keyword">else</span></span><br><span class="line"> new_ht_table = zcalloc(newsize*<span class="keyword">sizeof</span>(dictEntry*));</span><br><span class="line"></span><br><span class="line"> new_ht_used = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Set up the second hash table for rehashing, so that data can be migrated from the old table to the new one; first-time initialization also goes through here</span></span><br><span class="line"> d->ht_size_exp[<span class="number">1</span>] = new_ht_size_exp;</span><br><span class="line"> d->ht_used[<span class="number">1</span>] = new_ht_used;</span><br><span class="line"> d->ht_table[<span class="number">1</span>] = new_ht_table;</span><br><span class="line"> d->rehashidx = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">if</span> (d->type->rehashingStarted) d->type->rehashingStarted(d);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// If this is a first-time initialization (or the old table is empty) rather than a real rehash, install the new table as table 0 right away so it can accept keys</span></span><br><span class="line"> <span class="keyword">if</span> (d->ht_table[<span class="number">0</span>] == <span class="literal">NULL</span> || d->ht_used[<span class="number">0</span>] == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (d->type->rehashingCompleted) d->type->rehashingCompleted(d);</span><br><span class="line"> <span class="keyword">if</span> (d->ht_table[<span class="number">0</span>]) zfree(d->ht_table[<span class="number">0</span>]);</span><br><span class="line"> d->ht_size_exp[<span class="number">0</span>] = new_ht_size_exp;</span><br><span class="line"> d->ht_used[<span class="number">0</span>] = new_ht_used;</span><br><span class="line"> d->ht_table[<span class="number">0</span>] = new_ht_table;</span><br><span class="line">
_dictReset(d, <span class="number">1</span>);</span><br><span class="line"> d->rehashidx = <span class="number">-1</span>;</span><br><span class="line"> <span class="keyword">return</span> DICT_OK;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> DICT_OK;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h5 id="Dict的Rehash"><a href="#Dict的Rehash" class="headerlink" title="Dict的Rehash"></a>Dict Rehash</h5><p>Example: a hash table with capacity 4 already holds 4 key-value pairs, and inserting a fifth triggers an expansion.</p><ul><li>Before the fifth key-value pair is inserted:</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240920144309297.png" alt="image-20240920144309297"></p><ul><li>After the fifth key-value pair is inserted, the entries are rehashed into the new table:</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240920144634170.png" alt="image-20240920144634170"></p><ul><li>The new table replaces the old one, the parameters are updated, and the other table is reset to empty:</li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_3_Dict/image-20240920144926708.png" alt="image-20240920144926708"></p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/* Returns 1 if there are still keys to move, otherwise 0.</span></span><br><span class="line"><span class="comment"> * A rehash step moves one bucket from the old hash table to the new one. Because part of the hash table may be empty, this function is not guaranteed to rehash even a single bucket.</span></span><br><span class="line"><span class="comment"> * To avoid blocking for a long time, it visits at most n*10 empty buckets.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="type">int</span> <span class="title function_">dictRehash</span><span class="params">(dict *d, <span class="type">int</span> n)</span> {</span><br><span class="line"> <span class="type">int</span> empty_visits = n*<span class="number">10</span>; <span class="comment">// Maximum number of empty buckets to visit, to avoid blocking for too long</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> s0 = DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>]);</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">long</span> s1 = DICTHT_SIZE(d->ht_size_exp[<span class="number">1</span>]);<span class="comment">// s0 and s1 are the sizes of the current and the new hash table</span></span><br><span class="line"> <span class="keyword">if</span> (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) <span class="keyword">return</span> <span class="number">0</span>;<span class="comment">// Check whether rehashing is allowed and in progress</span></span><br><span class="line"> <span class="comment">/* If dict_can_resize == DICT_RESIZE_AVOID, we want to avoid rehashing.
</span></span><br><span class="line"><span class="comment"> * - When expanding, the threshold is dict_force_resize_ratio = 4.</span></span><br><span class="line"><span class="comment"> * - When shrinking, the threshold is 1 / (HASHTABLE_MIN_FILL * dict_force_resize_ratio) = 1/32. */</span></span><br><span class="line"> <span class="keyword">if</span> (dict_can_resize == DICT_RESIZE_AVOID && </span><br><span class="line"> ((s1 > s0 && s1 < dict_force_resize_ratio * s0) ||</span><br><span class="line"> (s1 < s0 && s0 < HASHTABLE_MIN_FILL * dict_force_resize_ratio * s1)))</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span>(n-- && d->ht_used[<span class="number">0</span>] != <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// rehashidx cannot overflow, because ht[0].used != 0 guarantees that there are more elements ahead</span></span><br><span class="line"> assert(DICTHT_SIZE(d->ht_size_exp[<span class="number">0</span>]) > (<span class="type">unsigned</span> <span class="type">long</span>)d->rehashidx);</span><br><span class="line"> <span class="keyword">while</span>(d->ht_table[<span class="number">0</span>][d->rehashidx] == <span class="literal">NULL</span>) {</span><br><span class="line"> d->rehashidx++;</span><br><span class="line"> <span class="keyword">if</span> (--empty_visits == <span class="number">0</span>) <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Move all keys in this bucket to the new hash table</span></span><br><span class="line"> rehashEntriesInBucketAtIndex(d, d->rehashidx);</span><br><span class="line"> d->rehashidx++;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> !dictCheckRehashingCompleted(d);</span><br><span
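class="line">}</span><br></pre></td></tr></table></figure><p>Note that a Dict rehash is not completed in one go. If the Dict holds a large number of elements, migrating them all at once would block for a long time. The rehash is therefore done in multiple small batches, which is called incremental (progressive) rehashing.</p><ul><li>Every insert, delete, update, or lookup checks whether a rehash is still in progress and, if so, migrates part of the data.</li><li>New entries are written directly to the new table, while lookups, updates, and deletes consult both tables. This ensures the old table only shrinks and never grows, so the rehash eventually finishes.</li></ul>]]></content>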
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Structures_2_intset</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/</url>
<content type="html"><![CDATA[<h4 id="Inset详解"><a href="#Inset详解" class="headerlink" title="Inset详解"></a>Intset in Detail</h4><p>Intset: as the name suggests, it is a set of integers. It is one of the implementations of set in Redis, built on an integer array, and it is variable-length and ordered.</p><p>Characteristics:</p><ul><li>Guarantees that elements are unique</li><li>Has an upgrade mechanism that saves memory</li><li>The underlying array is sorted, so lookups use binary search<ul><li>As a result, it is not suited to storing large amounts of data<ul><li>Binary search is O(log n), slower than a hash-based Set</li><li>It requires contiguous memory</li></ul></li></ul></li></ul><h5 id="intset结构体"><a href="#intset结构体" class="headerlink" title="intset结构体"></a>The intset Struct</h5><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">intset</span> {</span></span><br><span class="line"> <span class="type">uint32_t</span> encoding;<span class="comment">// Encoding; supports 16-, 32-, and 64-bit signed integers (2 B, 4 B, 8 B)</span></span><br><span class="line"> <span class="type">uint32_t</span> length;<span class="comment">// Number of elements</span></span><br><span class="line"> <span class="type">int8_t</span> contents[];<span class="comment">// Integer array holding the elements of the set</span></span><br><span class="line">} intset;</span><br></pre></td></tr></table></figure><p>To make lookups easy, Redis stores all integers of an intset in the contents array in ascending order, as shown below:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/image-20240919195303556.png" alt="image-20240919195303556"></p><p>The space used here is 4 (encoding) + 4 (length) + 2*3 (contents) = 14 bytes.</p><p>The address of an element is <code>addr = startAddr + sizeof(element) * index</code>, where index is the element's index.</p><h5 id="Intset升级"><a href="#Intset升级" class="headerlink"
title="Intset升级"></a>Intset Upgrade</h5><p>When encoding is INTSET_ENC_INT16, the largest supported integer is only 2^15-1 (32767). Inserting a new value such as 50000 exceeds that range and triggers the intset upgrade mechanism, which automatically upgrades the encoding to a suitable size. Suppose an intset contains the three elements 5, 10, and 20, and the element 50000 is now added.</p><p>The upgrade proceeds as follows:</p><ul><li><p>Upgrade the encoding to <code>INTSET_ENC_INT32</code>, so that each integer takes four bytes and has the same range as a 32-bit int, then resize the array according to the new encoding and element count.</p></li><li><p>Copy the existing elements to their correct positions in reverse order. (Copying in forward order would overwrite data that has not been moved yet.)</p><ul><li>The correct position is computed as <code>addr = startAddr + sizeof(element) * index</code>, where the new <code>sizeof(element)</code> is 4.</li></ul></li><li><p>Place the new element at the tail of the array (or at the head if it is negative) and increase length by 1.</p></li></ul><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/image-20240919200907228.png" alt="image-20240919200907228"></p><p>Before:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/image-20240919201231239.png" alt="image-20240919201231239"></p><p>After:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_2_Set/image-20240919201246295.png" alt="image-20240919201246295"></p><h5 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h5><h6 id="intset插入"><a href="#intset插入" class="headerlink" title="intset插入"></a>intset Insertion</h6><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">intset *<span class="title
function_">intsetAdd</span><span class="params">(intset *is, <span class="type">int64_t</span> value, <span class="type">uint8_t</span> *success)</span> {</span><br><span class="line"> <span class="comment">// is: the intset to insert into; value: the element to insert; success: pointer that receives the result of the insertion</span></span><br><span class="line"> <span class="type">uint8_t</span> valenc = _intsetValueEncoding(value);<span class="comment">// Encoding required by value</span></span><br><span class="line"> <span class="type">uint32_t</span> pos;<span class="comment">// Insert position</span></span><br><span class="line"> <span class="keyword">if</span> (success) *success = <span class="number">1</span>;</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Check whether the value fits in the current encoding</span></span><br><span class="line"> <span class="keyword">if</span> (valenc > intrev32ifbe(is->encoding)) {</span><br><span class="line"> <span class="comment">// Out of range for the current encoding: upgrade the intset</span></span><br><span class="line"> <span class="keyword">return</span> intsetUpgradeAndAdd(is,value);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// If value already exists, discard it and set *success to 0</span></span><br><span class="line"> <span class="keyword">if</span> (intsetSearch(is,value,&pos)) {</span><br><span class="line"> <span class="keyword">if</span> (success) *success = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">return</span> is;</span><br><span class="line"> }</span><br><span class="line"><span class="comment">// value does not exist: grow the array and make room at pos</span></span><br><span class="line"> is = intsetResize(is,intrev32ifbe(is->length)+<span class="number">1</span>);</span><br><span class="line"> <span class="keyword">if</span> (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+<span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> _intsetSet(is,pos,value);</span><br><span class="line"> is->length =
intrev32ifbe(intrev32ifbe(is->length)+<span class="number">1</span>);</span><br><span class="line"> <span class="keyword">return</span> is;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h6 id="intset扩容"><a href="#intset扩容" class="headerlink" title="intset扩容"></a>intset Expansion</h6><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> intset *<span class="title function_">intsetUpgradeAndAdd</span><span class="params">(intset *is, <span class="type">int64_t</span> value)</span> {<span class="comment">// Upgrade the encoding and insert the element</span></span><br><span class="line"> <span class="type">uint8_t</span> curenc = intrev32ifbe(is->encoding);<span class="comment">// Encoding before the upgrade</span></span><br><span class="line"> <span class="type">uint8_t</span> newenc = _intsetValueEncoding(value);<span class="comment">// Encoding required by value, i.e. the encoding to upgrade to</span></span><br><span class="line"> <span class="type">int</span> length = intrev32ifbe(is->length);<span class="comment">// Number of elements</span></span><br><span class="line"> <span class="type">int</span> prepend = value < <span class="number">0</span> ?
<span class="number">1</span> : <span class="number">0</span>;<span class="comment">// Whether the new element goes at the head (negative value) or at the tail</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Change the encoding of the intset</span></span><br><span class="line"> is->encoding = intrev32ifbe(newenc);</span><br><span class="line"> <span class="comment">// Resize the array</span></span><br><span class="line"> is = intsetResize(is,intrev32ifbe(is->length)+<span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Iterate in reverse order, moving each element to its new position</span></span><br><span class="line"> <span class="keyword">while</span>(length--)</span><br><span class="line"> _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Insert the new element. Because it is outside the old encoding's range, it goes either at the head or at the tail, as decided by prepend</span></span><br><span class="line"> <span class="keyword">if</span> (prepend)</span><br><span class="line"> _intsetSet(is,<span class="number">0</span>,value);</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> _intsetSet(is,intrev32ifbe(is->length),value);</span><br><span class="line"> <span class="comment">// Update the array length</span></span><br><span class="line"> is->length = intrev32ifbe(intrev32ifbe(is->length)+<span class="number">1</span>);</span><br><span class="line"> <span class="keyword">return</span> is;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h6 id="intset查找"><a href="#intset查找" class="headerlink" title="intset查找"></a>intset Search</h6><p>Because an intset is stored in sorted order, it uses a classic binary search.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="type">uint8_t</span> <span class="title function_">intsetSearch</span><span class="params">(intset *is, <span class="type">int64_t</span> value, <span class="type">uint32_t</span> *pos)</span> {</span><br><span class="line"> <span class="type">int</span> min = <span class="number">0</span>, max = intrev32ifbe(is->length)<span class="number">-1</span>, mid = <span class="number">-1</span>;</span><br><span class="line"> <span class="type">int64_t</span> cur = <span class="number">-1</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">/* The value can never be found when the set is empty */</span></span><br><span class="line"> <span class="keyword">if</span> (intrev32ifbe(is->length) == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (pos) *pos = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">return</span> <span 
class="number">0</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">/* Check for the case where we know we cannot find the value,</span></span><br><span class="line"><span class="comment"> * but do know the insert position. */</span></span><br><span class="line"> <span class="keyword">if</span> (value > _intsetGet(is,max)) {</span><br><span class="line"> <span class="keyword">if</span> (pos) *pos = intrev32ifbe(is->length);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (value < _intsetGet(is,<span class="number">0</span>)) {</span><br><span class="line"> <span class="keyword">if</span> (pos) *pos = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span>(max >= min) {</span><br><span class="line"> mid = ((<span class="type">unsigned</span> <span class="type">int</span>)min + (<span class="type">unsigned</span> <span class="type">int</span>)max) >> <span class="number">1</span>;</span><br><span class="line"> cur = _intsetGet(is,mid);</span><br><span class="line"> <span class="keyword">if</span> (value > cur) {</span><br><span class="line"> min = mid+<span class="number">1</span>;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (value < cur) {</span><br><span class="line"> max = mid<span class="number">-1</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> 
<span class="keyword">if</span> (value == cur) {</span><br><span class="line"> <span class="keyword">if</span> (pos) *pos = mid;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (pos) *pos = min;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>Redis Data Structures_1_SDS</title>
<link href="/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_1_SDS/"/>
<url>/Redis%E6%BA%90%E7%A0%81/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_1_SDS/</url>
<content type="html"><![CDATA[<h4 id="SDS-详解"><a href="#SDS-详解" class="headerlink" title="SDS 详解"></a>SDS in Detail</h4><p>Why doesn't Redis use plain C strings?</p><ul><li>Getting the length is not free: strlen has to scan the whole string, which is O(n).</li><li>Not binary safe: a C string ends at the first '\0', so a '\0' in the middle of the data cuts the read short.</li><li>Not modifiable: string literals live in read-only storage and must not be modified.</li></ul><p>Underneath, a C string is just a character array (char[]), for example:</p><figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="type">char</span>* s = <span class="string">"hello"</span>;</span><br><span class="line"><span class="comment">// Stored as {'h','e','l','l','o','\0'}, where '\0' marks the end</span></span><br></pre></td></tr></table></figure><p>For these reasons Redis implements its own string type, SDS (Simple Dynamic String).</p><h5 id="sds优点:"><a href="#sds优点:" class="headerlink" title="sds优点:"></a>Advantages of sds:</h5><ul><li>Getting the string length is O(1)</li><li>Supports dynamic growth</li><li>Fewer memory allocations</li><li>Binary safe</li></ul><h5 id="sds结构体:"><a href="#sds结构体:" class="headerlink" title="sds结构体:"></a>The sds struct:</h5><figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> __<span class="title">attribute__</span> ((__<span class="title">packed__</span>)) <span class="title">sdshdr8</span> {</span><span class="comment">// sdshdrX: the SDS header whose len and alloc fields are X bits wide; each variant supports a different maximum length, sdshdr8 is the example here</span></span><br><span class="line">    <span class="type">uint8_t</span> len; <span class="comment">// bytes of string data currently stored in buf[], excluding the terminator (uint8_t: unsigned 8-bit integer),</span></span><br><span class="line">    <span class="comment">// so sdshdr8 holds strings of at most 2^8-1 = 255 bytes (the trailing \0 is not counted in len)</span></span><br><span class="line">    <span class="type">uint8_t</span> alloc; <span class="comment">// total bytes allocated for buf[], excluding the terminator</span></span><br><span class="line">    <span class="type">unsigned</span> <span class="type">char</span> flags; <span class="comment">// header type of this SDS, which determines the header size</span></span><br><span class="line">    <span class="type">char</span> buf[]; <span class="comment">// character array holding the actual SDS data</span></span><br><span class="line">};</span><br><span class="line"><span class="comment">// possible values of flags</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> SDS_TYPE_5 0</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> SDS_TYPE_8 1</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> SDS_TYPE_16 2</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> SDS_TYPE_32 3</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> SDS_TYPE_64 4</span></span><br></pre></td></tr></table></figure><p>Take the string "name" as a concrete example. It fits in an sdshdr8, laid out as follows:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_1_SDS/image-20240913181310379.png" alt="image-20240913181310379"></p><ul><li>The trailing '\0' keeps the buffer compatible with C strings, so C string functions can be used on it.</li></ul><h5 id="sds扩容"><a href="#sds扩容" class="headerlink" title="sds扩容"></a>Growing an sds</h5><p>Suppose we start with an sds whose content is "hi", as shown below:</p><p><img src="../../images/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84_1_SDS/image-20240913181535065.png" alt="image-20240913181535065"></p><p>Now we append ",Amy" to it, which clearly needs more space. The amount requested depends on the size of the new string:</p><ul><li>If the new string is smaller than 1 MB, the new space is twice the length of the extended string, plus 1, as shown:<ul><li>Concretely, "hi,Amy" has length 6, so len is 6 and the new space is 6*2+1=13. That 13 includes the '\0', which alloc does not count, so alloc is 12. The length is still within what flags=1 can represent, so flags stays 1.</li></ul></li></ul><p><img src="../../images/Redis/image-20240919173241500.png"
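alt="image-20240919173241500"></p><ul><li>If the new string is 1 MB or larger, the new space is the length of the extended string + 1 MB + 1. This is memory preallocation.<ul><li>Allocating memory can require a switch from user mode to kernel mode, so preallocating reduces the mode switches that later appends would otherwise cause.</li></ul></li></ul><h5 id="sds-API"><a href="#sds-API" class="headerlink" title="sds API"></a>sds API</h5><p>//TODO</p>]]></content>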
<categories>
<category> Redis </category>
</categories>
<tags>
<tag> Redis </tag>
</tags>
</entry>
<entry>
<title>A Python Implementation of an Inverted Index</title>
<link href="/%E5%80%92%E6%8E%92%E7%B4%A2%E5%BC%95/"/>
<url>/%E5%80%92%E6%8E%92%E7%B4%A2%E5%BC%95/</url>
<content type="html"><![CDATA[<blockquote><p>The index stores, for each word, the ids of the documents it appears in and how many times it appears in each.</p><p>More detailed information, such as word positions, is not stored.</p><p>Documents can be retrieved through the inverted index, although the results are slightly worse than BM25 retrieval.</p><p>In the figures below the counts are per rank while the ratios are cumulative, e.g. (4180 + 383) / 5352 ≈ 0.853.</p><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">Performance</span><br><span class="line">top1: 4180/5352 ---0.781</span><br><span class="line">top2: 383/5352 ---0.853</span><br><span class="line">top3: 131/5352 ---0.877</span><br></pre></td></tr></table></figure></blockquote><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">coding:UTF-8</span></span><br><span class="line"><span class="string">author:LemontreeN</span></span><br><span class="line"><span class="string">date:2022-05-08</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> math</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> tqdm</span><br><span class="line"><span class="keyword">from</span> ltp <span class="keyword">import</span> LTP</span><br><span class="line"></span><br><span class="line">ltp = LTP(path=<span class="string">"base"</span>)  <span class="comment"># base model</span></span><br><span class="line"><span class="comment"># ltp = LTP()  # small model</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">PreProcessed</span>:</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line">        <span class="variable language_">self</span>.stop_words = <span class="built_in">set</span>()</span><br><span class="line">        <span class="variable language_">self</span>.cnt = <span class="number">0</span></span><br><span class="line">        <span class="variable language_">self</span>.index_path = <span class="string">'data/inverted_index.txt'</span></span><br><span class="line">        <span class="variable language_">self</span>.word_dict = {}</span><br><span class="line">        <span class="variable language_">self</span>.cnts = []</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">read_stop_words</span>(<span class="params">self, file_path: <span class="built_in">str</span></span>):</span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(file_path, <span class="string">'r'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> fp:</span><br><span class="line">            <span class="variable language_">self</span>.stop_words = <span class="built_in">set</span>(fp.read().split(<span class="string">'\n'</span>))</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">generate_index</span>(<span class="params">self, input_path: <span class="built_in">str</span></span>):</span><br><span class="line">        <span class="comment"># 14768 is the hard-coded total for the progress bar</span></span><br><span class="line">        progress_read_index = tqdm.tqdm(<span class="built_in">range</span>(<span class="number">14768</span>), <span class="string">'Building index'</span>)</span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(input_path, <span class="string">'r'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> js:</span><br><span class="line">            <span class="keyword">for</span> line <span class="keyword">in</span> js.readlines():</span><br><span class="line">                data = json.loads(line)</span><br><span class="line">                pid = data.get(<span class="string">'pid'</span>)</span><br><span class="line">                seg_list = data.get(<span class="string">'document'</span>)</span><br><span class="line">                seg, hidden = ltp.seg(seg_list)</span><br><span class="line">                word_cnt = <span class="number">0</span></span><br><span class="line">                <span class="keyword">for</span> item <span class="keyword">in</span> seg:</span><br><span class="line">                    word_cnt += <span class="built_in">len</span>(item)</span><br><span class="line">                    <span class="keyword">for</span> word <span class="keyword">in</span> item:</span><br><span class="line">                        flag = <span class="number">0</span></span><br><span class="line">                        <span class="keyword">if</span> word <span class="keyword">in</span> <span class="variable language_">self</span>.stop_words:</span><br><span class="line">                            <span class="keyword">pass</span></span><br><span class="line">                        <span class="keyword">elif</span> word <span class="keyword">not</span> <span class="keyword">in</span> <span class="variable language_">self</span>.word_dict:</span><br><span class="line">                            <span class="variable language_">self</span>.word_dict[word] = []</span><br><span class="line">                            <span class="variable language_">self</span>.word_dict[word].append([pid, <span class="number">1</span>])</span><br><span class="line">                        <span class="keyword">else</span>:</span><br><span class="line">                            <span class="comment"># bump the count if this document is already listed</span></span><br><span class="line">                            index_list = <span class="variable language_">self</span>.word_dict[word]</span><br><span class="line">                            <span class="keyword">for</span> index <span class="keyword">in</span> index_list:</span><br><span class="line">                                <span class="keyword">if</span> index[<span class="number">0</span>] == pid:</span><br><span class="line">                                    index[<span class="number">1</span>] += <span class="number">1</span></span><br><span class="line">                                    flag = <span class="number">1</span></span><br><span class="line">                                    <span class="keyword">break</span></span><br><span class="line">                            <span class="keyword">if</span> flag == <span class="number">0</span>:</span><br><span class="line">                                <span class="variable language_">self</span>.word_dict[word].append([pid, <span class="number">1</span>])</span><br><span class="line">                progress_read_index.update(<span class="number">1</span>)</span><br><span class="line">                <span class="variable language_">self</span>.cnts.append(word_cnt)</span><br><span class="line">        <span class="comment"># one line per word: word;;;[pid, count].[pid, count].</span></span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(<span class="variable language_">self</span>.index_path, <span class="string">'w'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> index_output:</span><br><span class="line">            <span class="keyword">for</span> key, value <span class="keyword">in</span> <span class="variable language_">self</span>.word_dict.items():</span><br><span class="line">                index_output.write(<span class="built_in">str</span>(key) + <span class="string">';;;'</span>)</span><br><span class="line">                <span class="keyword">for</span> i <span class="keyword">in</span> value:</span><br><span class="line">                    index_output.write(<span class="built_in">str</span>(i) + <span class="string">'.'</span>)</span><br><span class="line">                index_output.write(<span class="string">'\n'</span>)</span><br><span class="line">        <span class="comment"># word count of each document, one per line</span></span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">'data/words.txt'</span>, <span class="string">'w'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> fp:</span><br><span class="line">            <span class="keyword">for</span> item <span class="keyword">in</span> <span class="variable language_">self</span>.cnts:</span><br><span class="line">                fp.write(<span class="string">'%d\n'</span> % item)</span><br><span class="line">        exit(<span class="number">0</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">read_index</span>(<span class="params">self</span>):</span><br><span class="line">        <span class="comment"># 355109 is the hard-coded total for the progress bar</span></span><br><span class="line">        progress_read_index = tqdm.tqdm(<span class="built_in">range</span>(<span class="number">355109</span>), <span class="string">'Reading index'</span>)</span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">'data/words.txt'</span>, <span class="string">'r'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> fp:</span><br><span class="line">            <span class="variable language_">self</span>.cnts = fp.read().split(<span class="string">'\n'</span>)</span><br><span class="line">        <span class="keyword">with</span> <span class="built_in">open</span>(<span class="variable language_">self</span>.index_path, <span class="string">'r'</span>, encoding=<span class="string">'utf-8'</span>) <span class="keyword">as</span> fp:</span><br><span class="line">            <span class="keyword">for</span> line <span class="keyword">in</span> fp.readlines():</span><br><span class="line">                line = line.split(<span class="string">';;;'</span>)  <span class="comment"># pick a separator that cannot occur inside a word</span></span><br><span class="line">                <span class="keyword">if</span> <span class="built_in">len</span>(line) != <span class="number">2</span>:</span><br><span class="line">                    <span class="built_in">print</span>(<span class="string">'error'</span>)</span><br><span class="line">                    <span class="keyword">continue</span>  <span class="comment"># skip malformed lines</span></span><br><span class="line">                word, index = line[<span class="number">0</span>], line[<span class="number">1</span>]</span><br><span class="line">                pid_list = index[:-<span class="number">2</span>].split(<span class="string">'.'</span>)  <span class="comment"># drop the trailing '.\n'</span></span><br><span class="line">                <span class="variable language_">self</span>.word_dict[word] = pid_list</span><br><span class="line">                progress_read_index.update(<span class="number">1</span>)</span><br><span class="line">        <span class="built_in">print</span>(<span class="string">'-----***Index loaded***-----'</span>)</span><br><span class="line">        <span class="comment"># print('Enter a query, or !quit to exit')</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">search</span>(<span class="params">self, conds: <span class="built_in">str</span></span>) -> <span class="built_in">list</span>:</span><br><span class="line">        <span class="string">"""</span></span><br><span class="line"><span class="string">        Retrieve the top 3 relevant documents</span></span><br><span class="line"><span class="string">        :param conds: query</span></span><br><span class="line"><span class="string">        :return: list of candidate documents: list</span></span><br><span class="line"><span class="string">        """</span></span><br><span class="line">        seg, hidden = ltp.seg([conds])</span><br><span class="line">        conds = <span class="string">'||'</span>.join(seg[<span class="number">0</span>])</span><br><span class="line">        <span class="keyword">if</span> conds != <span class="string">'!quit'</span>:</span><br><span class="line">            pid_list = []</span><br><span class="line">            weight = []</span><br><span class="line">            <span class="keyword">if</span> <span class="string">'&&'</span> <span class="keyword">in</span> conds:</span><br><span class="line">                <span class="comment"># AND query: keep the documents that contain every term</span></span><br><span class="line">                pids = <span class="literal">None</span></span><br><span class="line">                <span class="keyword">for</span> cond <span class="keyword">in</span> conds.split(<span class="string">'&&'</span>):</span><br><span class="line">                    postings = <span class="variable language_">self</span>.word_dict.get(cond) <span class="keyword">or</span> []</span><br><span class="line">                    docs = {<span class="built_in">int</span>(item.split(<span class="string">','</span>)[<span class="number">0</span>][<span class="number">1</span>:]) <span class="keyword">for</span> item <span class="keyword">in</span> postings}</span><br><span class="line">                    pids = docs <span class="keyword">if</span> pids <span class="keyword">is</span> <span class="literal">None</span> <span class="keyword">else</span> pids & docs</span><br><span class="line">                pid_list = <span class="built_in">sorted</span>(pids)</span><br><span class="line">                weight = [<span class="number">1.0</span>] * <span class="built_in">len</span>(pid_list)  <span class="comment"># AND matches are not scored</span></span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                <span class="comment"># OR query; a single term is handled as a one-term OR</span></span><br><span class="line">                <span class="keyword">for</span> cond <span class="keyword">in</span> conds.split(<span class="string">'||'</span>):</span><br><span class="line">                    merge_list = <span class="variable language_">self</span>.word_dict.get(cond)</span><br><span class="line">                    <span class="keyword">if</span> merge_list <span class="keyword">is</span> <span class="keyword">not</span> <span class="literal">None</span>:</span><br><span class="line">                        df = <span class="built_in">len</span>(merge_list)</span><br><span class="line">                        idf = <span class="number">1</span> / df</span><br><span class="line">                        <span class="keyword">for</span> item <span class="keyword">in</span> merge_list:</span><br><span class="line">                            item = item.split(<span class="string">','</span>)  <span class="comment"># each posting is the string '[pid, count]'</span></span><br><span class="line">                            pid = <span class="built_in">int</span>(item[<span class="number">0</span>][<span class="number">1</span>:])</span><br><span class="line">                            fre = <span class="built_in">int</span>(item[<span class="number">1</span>][:-<span class="number">1</span>])</span><br><span class="line">                            tf = math.log(fre + <span class="number">3</span>)</span><br><span class="line">                            tf_idf = tf * idf  <span class="comment"># tune the weighting as needed</span></span><br><span class="line">                            <span class="keyword">if</span> pid <span class="keyword">not</span> <span class="keyword">in</span> pid_list:</span><br><span class="line">                                pid_list.append(pid)</span><br><span class="line">                                weight.append(tf_idf)</span><br><span class="line">                            <span class="keyword">else</span>:</span><br><span class="line">                                weight[pid_list.index(pid)] += tf_idf</span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> pid_list:</span><br><span class="line">                <span class="built_in">print</span>(<span class="string">'None!'</span>)</span><br><span class="line">                <span class="keyword">return</span> [-<span class="number">1</span>]</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                pid_weight = <span class="built_in">sorted</span>(<span class="built_in">zip</span>(weight, pid_list), reverse=<span class="literal">True</span>)</span><br><span class="line">                <span class="built_in">print</span>(pid_weight[<span class="number">0</span>])</span><br><span class="line">                <span class="keyword">return</span> [p <span class="keyword">for</span> w, p <span class="keyword">in</span> pid_weight[:<span class="number">3</span>]]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line">    pre = PreProcessed()</span><br><span class="line">    choice = <span class="built_in">input</span>(<span class="string">'\n****Inverted Index IR System****\nChoice:\n1. Build the index from the data file\n2. Query with an existing index'</span>)</span><br><span class="line">    <span class="keyword">if</span> choice == <span class="string">'1'</span>:</span><br><span class="line">        pre.read_stop_words(<span class="string">'data/stopwords.txt'</span>)</span><br><span class="line">        pre.generate_index(<span class="string">'data/passages_multi_sentences.json'</span>)</span><br><span class="line">    <span class="keyword">elif</span> choice == <span class="string">'2'</span>:</span><br><span class="line">        pre.read_index()</span><br><span class="line">        pre.search(<span class="string">'家||中国'</span>)</span><br><span class="line">    exit(<span class="number">0</span>)</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> NLP </category>
</categories>
<tags>
<tag> NLP </tag>
</tags>
</entry>
</search>