<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Python: module Searcher.search</title>
<meta charset="utf-8">
</head><body bgcolor="#f0f0f8">
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="heading">
<tr bgcolor="#7799ee">
<td valign=bottom> <br>
<font color="#ffffff" face="helvetica, arial"> <br><big><big><strong><a href="Searcher.html"><font color="#ffffff">Searcher</font></a>.search</strong></big></big></font></td
><td align=right valign=bottom
><font color="#ffffff" face="helvetica, arial"><a href=".">index</a><br><a href="file:/home/vamshi/PycharmProjects/InformationRetrieval/Searcher/search.py">/home/vamshi/PycharmProjects/InformationRetrieval/Searcher/search.py</a></font></td></tr></table>
<p><tt>This is the script that is exposed to the GUI/user. It takes the query, calls<br>
the ranking script, and searches for the query.</tt></p>
<p>
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="#aa55cc">
<td colspan=3 valign=bottom> <br>
<font color="#ffffff" face="helvetica, arial"><big><strong>Modules</strong></big></font></td></tr>
<tr><td bgcolor="#aa55cc"><tt> </tt></td><td> </td>
<td width="100%"><table width="100%" summary="list"><tr><td width="25%" valign=top><a href="heapq.html">heapq</a><br>
<a href="math.html">math</a><br>
</td><td width="25%" valign=top><a href="nltk.html">nltk</a><br>
<a href="operator.html">operator</a><br>
</td><td width="25%" valign=top><a href="shelve.html">shelve</a><br>
</td><td width="25%" valign=top></td></tr></table></td></tr></table><p>
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="#ee77aa">
<td colspan=3 valign=bottom> <br>
<font color="#ffffff" face="helvetica, arial"><big><strong>Classes</strong></big></font></td></tr>
<tr><td bgcolor="#ee77aa"><tt> </tt></td><td> </td>
<td width="100%"><dl>
<dt><font face="helvetica, arial"><a href="__builtin__.html#object">__builtin__.object</a>
</font></dt><dd>
<dl>
<dt><font face="helvetica, arial"><a href="Searcher.search.html#Searcher">Searcher</a>
</font></dt></dl>
</dd>
</dl>
<p>
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="#ffc8d8">
<td colspan=3 valign=bottom> <br>
<font color="#000000" face="helvetica, arial"><a name="Searcher">class <strong>Searcher</strong></a>(<a href="__builtin__.html#object">__builtin__.object</a>)</font></td></tr>
<tr bgcolor="#ffc8d8"><td rowspan=2><tt> </tt></td>
<td colspan=2><tt>This class defines all the search methods. It is the one that is exposed<br>
to Flask (for GUI).<br>
;query: String, the query entered by user.<br>
;query_score: a dictionary containing the score for each query word. The<br>
score is the tf-idf score.<br>
;stop_word : a list that contains all the query words whose df is greater<br>
than 500. They are considered stop words and are given a score of zero unless<br>
specifically told otherwise.<br>
;weighted : a boolean that indicates whether the scores are supplied by the<br>
user rather than calculated as tf-idf scores.<br>
;top_corrections : a dict containing the top corrections for every word in the<br>
query that has zero df.<br>
;boolean_results : set of documents which satisfy boolean search model.<br> </tt></td></tr>
<tr><td> </td>
<td width="100%">Methods defined here:<br>
<dl><dt><a name="Searcher-__init__"><strong>__init__</strong></a>(self, input_query, **kwargs)</dt></dl>
<dl><dt><a name="Searcher-cosine_score"><strong>cosine_score</strong></a>(self)</dt><dd><tt>Calculates the cosine score for the query words. It also adds each query word<br>
to query_corpus. If the word is already present in the corpus,<br>
its value is increased by 1.<br>
It also populates the stop_word list of this class to let the<br>
user know which words were treated as stop words.<br>
Uses heapq to select the top 20 items.<br>
:return: Top 20 documents with the highest scores</tt></dd></dl>
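<p>A minimal sketch of the ranking step described for cosine_score, not the module's actual code. It assumes Python 3 and a query_score dictionary that is already normalized. The names top_documents, postings and lengths are hypothetical stand-ins, since the real layout of dictionary.db and length.db is not shown on this page. heapq.nlargest is used here as one way to pick the top 20 with heapq.</p>
<pre>
```python
import heapq
from collections import defaultdict


def top_documents(query_score, postings, lengths, k=20):
    """Rank documents by cosine score against a normalized query vector.

    postings: hypothetical mapping term -> {doc_id: document term weight}
    lengths:  hypothetical mapping doc_id -> document vector length (non-zero)
    Returns a list of (score, doc_id) pairs, best first.
    """
    totals = defaultdict(float)
    # Accumulate the dot product of query and document weights per document.
    for term, q_weight in query_score.items():
        for doc_id, d_weight in postings.get(term, {}).items():
            totals[doc_id] += q_weight * d_weight
    # Divide by the document length so long documents are not favoured.
    ranked = ((score / lengths[doc_id], doc_id) for doc_id, score in totals.items())
    # nlargest keeps only the k best pairs instead of sorting everything.
    return heapq.nlargest(k, ranked)
```
</pre>
<p>With k left at its default of 20 this returns at most 20 pairs; documents that contain none of the query words never enter the ranking.</p>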
<dl><dt><a name="Searcher-fill_title_results"><strong>fill_title_results</strong></a>(self)</dt><dd><tt>Find the documents which have all the query_terms in their titles and<br>
fill self.<strong>title_results</strong><br>
:return:</tt></dd></dl>
<dl><dt><a name="Searcher-query_score_calculator"><strong>query_score_calculator</strong></a>(self, words)</dt><dd><tt>This method updates the query_score dictionary with the score for each<br>
word. The score is calculated by tf-idf with cosine normalization for the<br>
query words. If the user supplies the score for each word (determined<br>
by checking the weighted boolean), there is nothing left to do<br>
in this method, so it simply returns. It also fills the boolean results<br>
set. Any document in the boolean results set should be at the top of the<br>
results list.<br>
1) First, the term frequency of each term in the query is calculated.<br>
2) Df and idf are calculated with respect to the inverted index. If the<br>
df > 500, the word is considered a stop word, is added to the stop_word<br>
list, and its score is zero.<br>
3) Each query is treated as a vector with words as dimensions, and the score<br>
corresponding to each word is used to calculate the vector length.<br>
4) The score of each word is divided by this vector length to normalize it,<br>
and query_score is updated to contain the normalized scores.<br>
:param words: list of query words<br>
:return:</tt></dd></dl>
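<p>The four steps listed for query_score_calculator can be sketched as follows. This is an illustration under stated assumptions, not the module's code: it assumes Python 3, raw term counts for tf, and idf = log10(N / df), none of which is confirmed by this page. The function score_query and its document_frequency argument are hypothetical; the real method reads df from the inverted index and also fills the boolean results set, which is omitted here.</p>
<pre>
```python
import math
from collections import Counter

DOCUMENT_NUMBER = 690     # collection size, as listed in the class attributes
DF_STOP_THRESHOLD = 500   # a df above this marks a stop word, per the docstring


def score_query(words, document_frequency):
    """Return (normalized tf-idf scores, stop words) for a list of query words.

    document_frequency is a hypothetical mapping of term -> df.
    """
    term_frequency = Counter(words)                 # step 1: tf per query term
    scores, stop_words = {}, []
    for term, tf in term_frequency.items():
        df = document_frequency.get(term, 0)        # step 2: df, then idf
        if df > DF_STOP_THRESHOLD:
            stop_words.append(term)                 # stop word: score is zero
            scores[term] = 0.0
        elif df == 0:
            scores[term] = 0.0                      # unknown word: no idf defined
        else:
            # Assumed weighting: raw tf times log10(N / df).
            scores[term] = tf * math.log10(DOCUMENT_NUMBER / df)
    # Step 3: Euclidean length of the query vector.
    length = math.sqrt(sum(s * s for s in scores.values()))
    # Step 4: divide each score by the length (skipped for an all-zero vector).
    if length > 0:
        scores = {term: s / length for term, s in scores.items()}
    return scores, stop_words
```
</pre>
<p>For example, score_query(["apple", "the"], {"apple": 69, "the": 600}) puts "the" in the stop-word list with a score of zero, so "apple" carries the whole normalized vector.</p>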
<hr>
Data descriptors defined here:<br>
<dl><dt><strong>__dict__</strong></dt>
<dd><tt>dictionary for instance variables (if defined)</tt></dd>
</dl>
<dl><dt><strong>__weakref__</strong></dt>
<dd><tt>list of weak references to the object (if defined)</tt></dd>
</dl>
<hr>
Data and other attributes defined here:<br>
<dl><dt><strong>DICTIONARY</strong> = 'dictionary.db'</dl>
<dl><dt><strong>DOCUMENT_NUMBER</strong> = 690</dl>
<dl><dt><strong>LENGTH</strong> = 'length.db'</dl>
<dl><dt><strong>QUERY_CORPUS</strong> = 'query_corpus.db'</dl>
<dl><dt><strong>TITLES</strong> = 'titles.db'</dl>
<dl><dt><strong>boolean_results</strong> = set([])</dl>
<dl><dt><strong>query</strong> = None</dl>
<dl><dt><strong>query_score</strong> = {}</dl>
<dl><dt><strong>stop_word</strong> = []</dl>
<dl><dt><strong>title_results</strong> = set([])</dl>
<dl><dt><strong>top_corrections</strong> = {}</dl>
<dl><dt><strong>weighted</strong> = False</dl>
</td></tr></table></td></tr></table><p>
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="#55aa55">
<td colspan=3 valign=bottom> <br>
<font color="#ffffff" face="helvetica, arial"><big><strong>Data</strong></big></font></td></tr>
<tr><td bgcolor="#55aa55"><tt> </tt></td><td> </td>
<td width="100%"><strong>division</strong> = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)</td></tr></table>
</body></html>