<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="theme-color" content="#2D2D2D" />
<title>NLTK :: Installing NLTK Data</title>
<link rel="stylesheet" href="_static/css/nltk_theme.css"/>
<link rel="stylesheet" href="_static/css/custom.css"/>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/sphinx_highlight.js"></script>
<script src="https://email.tl.fortawesome.com/c/eJxNjUEOgyAQAF8jR7Kw6wIHDh7sP1Cw2mgxgmn6-3JsMqc5zEQfE8dkxOY1KKMUOI3ACFKRJpSW2AAp7ontYIaxI6i7XPJVwyeVfCQ550Os3jLrGSNOLgbdAy6s0PBk2TFNjEbsfq31LB0OnX407pJa5v2faRadwSW63mn5KuLyR9j2tgx3zecanl-55R_-jjPs"></script>
</head>
<body>
<div id="nltk-theme-container">
<header>
<div id="logo-container">
<h1>
<a href="index.html">NLTK</a>
</h1>
</div>
<div id="project-container">
<h1>Documentation</h1>
</div>
<a id="menu-toggle" class="fa fa-bars" aria-hidden="true"></a>
<script type="text/javascript">
$("#menu-toggle").click(function() {
$("#menu-toggle").toggleClass("toggled");
$("#side-menu-container").slideToggle(300);
});
</script>
</header>
<div id="content-container">
<div id="side-menu-container">
<div id="search" role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
<div id="side-menu" role="navigation">
<p class="caption" role="heading"><span class="caption-text">NLTK Documentation</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="api/nltk.html">API Reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="howto.html">Example Usage</a></li>
<li class="toctree-l1"><a class="reference internal" href="py-modindex.html">Module Index</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki">Wiki</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki/FAQ">FAQ</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/issues">Open Issues</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk">NLTK on GitHub</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Installation</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="install.html">Installing NLTK</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Installing NLTK Data</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">More</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="news.html">Release Notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing to NLTK</a></li>
<li class="toctree-l1"><a class="reference internal" href="team.html">NLTK Team</a></li>
</ul>
</div>
</div>
<div id="main-content-container">
<div id="main-content" role="main">
<section id="installing-nltk-data">
<h1>Installing NLTK Data<a class="headerlink" href="#installing-nltk-data" title="Link to this heading">¶</a></h1>
<p>NLTK comes with many corpora, toy grammars, trained models, etc. A complete list is posted at: <a class="reference external" href="https://www.nltk.org/nltk_data/">https://www.nltk.org/nltk_data/</a></p>
<p>To install the data, first install NLTK (see <a class="reference external" href="https://www.nltk.org/install.html">https://www.nltk.org/install.html</a>), then use NLTK’s data downloader as described below.</p>
<p>Apart from individual data packages, you can download the entire collection (using “all”), or just the data required for the examples and exercises in the book (using “book”), or just the corpora and no grammars or trained models (using “all-corpora”).</p>
<section id="interactive-installer">
<h2>Interactive installer<a class="headerlink" href="#interactive-installer" title="Link to this heading">¶</a></h2>
<p><em>For central installation on a multi-user machine, do the following from an administrator account.</em></p>
<p>Run the Python interpreter and type the commands:</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nltk</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
<p>A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code> (Windows), <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code> (Mac), or <code class="docutils literal notranslate"><span class="pre">/usr/share/nltk_data</span></code> (Unix). Next, select the packages or collections you want to download.</p>
<p>If you did not install the data to one of the above central locations, you will need to set the <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable to specify the location of the data. (On a Windows machine, right-click on “My Computer”, then select <code class="docutils literal notranslate"><span class="pre">Properties</span> <span class="pre">></span> <span class="pre">Advanced</span> <span class="pre">></span> <span class="pre">Environment</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">User</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">New...</span></code>.)</p>
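<p>As a rough illustration of how the <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> variable relates to the central locations, the following sketch builds a list of candidate data directories. The helper <code class="docutils literal notranslate"><span class="pre">candidate_data_dirs</span></code> and the example path are hypothetical and not part of NLTK; the sketch also assumes that the variable may hold several directories separated by the platform's path separator.</p>

```python
import os


def candidate_data_dirs(environ=None):
    """Return directories to look in: NLTK_DATA entries first, then central ones.

    Hypothetical helper for illustration only; it is not part of NLTK.
    """
    environ = os.environ if environ is None else environ
    # Assumption: NLTK_DATA may list several directories, separated like PATH.
    from_env = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    # Central locations named in the text (Windows, Mac, Unix).
    central = [r"C:\nltk_data", "/usr/local/share/nltk_data", "/usr/share/nltk_data"]
    return from_env + central


# Set the variable for the current process only (placeholder path).
os.environ["NLTK_DATA"] = os.path.join(os.path.expanduser("~"), "my_nltk_data")
print(candidate_data_dirs()[0])
```

<p>Setting the variable in the shell or in the system settings, as described above, makes it available to every Python session; the in-process assignment here only affects the running interpreter.</p>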
<p>Test that the data has been installed as follows (this assumes you downloaded the Brown Corpus):</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">brown</span>
<span class="gp">>>> </span><span class="n">brown</span><span class="o">.</span><span class="n">words</span><span class="p">()</span>
<span class="go">['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]</span>
</pre></div>
</div>
<section id="installing-via-a-proxy-web-server">
<h3>Installing via a proxy web server<a class="headerlink" href="#installing-via-a-proxy-web-server" title="Link to this heading">¶</a></h3>
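<p>If your web connection uses a proxy server, specify the proxy address as follows. In the case of an authenticating proxy, also specify a username and password. If the proxy is set to <code class="docutils literal notranslate"><span class="pre">None</span></code>, <code class="docutils literal notranslate"><span class="pre">nltk.set_proxy</span></code> will attempt to detect the system proxy.</p>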
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">set_proxy</span><span class="p">(</span><span class="s1">'http://proxy.example.com:3128'</span><span class="p">,</span> <span class="s1">'USERNAME'</span><span class="p">,</span> <span class="s1">'PASSWORD'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
</section>
</section>
<section id="command-line-installation">
<h2>Command line installation<a class="headerlink" href="#command-line-installation" title="Link to this heading">¶</a></h2>
<p>The downloader will search for an existing <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code> directory in which to install NLTK data. If one does not exist, it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code> (Windows); <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code> (Mac); and <code class="docutils literal notranslate"><span class="pre">/usr/share/nltk_data</span></code> (Unix). You can use the <code class="docutils literal notranslate"><span class="pre">-d</span></code> flag to specify a different location (but if you do this, be sure to set the <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable accordingly).</p>
<p>Run the command <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">all</span></code>. To ensure central installation, run the command <code class="docutils literal notranslate"><span class="pre">sudo</span> <span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">-d</span> <span class="pre">/usr/local/share/nltk_data</span> <span class="pre">all</span></code>.</p>
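<p>When the download has to be scripted, for example as part of a provisioning step, the same command can be launched from Python. The sketch below is one possible way to do this; the helper names <code class="docutils literal notranslate"><span class="pre">downloader_command</span></code> and <code class="docutils literal notranslate"><span class="pre">run_downloader</span></code> are hypothetical, and only the <code class="docutils literal notranslate"><span class="pre">-d</span></code> flag and the package names mentioned above are used.</p>

```python
import subprocess
import sys


def downloader_command(packages, download_dir=None):
    """Build the argument list for ``python -m nltk.downloader``.

    Hypothetical helper; it only assembles the command described in the text.
    """
    # Use the interpreter that is running this script.
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        cmd += ["-d", download_dir]
    return cmd + list(packages)


def run_downloader(packages, download_dir=None):
    """Run the downloader as a child process; raise if it exits with an error."""
    subprocess.run(downloader_command(packages, download_dir), check=True)


# Show the command for a central installation without running it.
print(downloader_command(["all"], "/usr/local/share/nltk_data"))
```

<p>Calling <code class="docutils literal notranslate"><span class="pre">run_downloader(["book"])</span></code> would then start the actual download; writing to a central location still requires administrator privileges.</p>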
<p>Windows: Use the “Run…” option on the Start menu. Windows Vista users need to first turn on this option, using <code class="docutils literal notranslate"><span class="pre">Start</span> <span class="pre">-></span> <span class="pre">Properties</span> <span class="pre">-></span> <span class="pre">Customize</span></code> to check the box to activate the “Run…” option.</p>
<p>Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account,
starting the Python interpreter, and accessing the Brown Corpus (see the previous section).</p>
</section>
<section id="manual-installation">
<h2>Manual installation<a class="headerlink" href="#manual-installation" title="Link to this heading">¶</a></h2>
<p>Create a folder <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code>, e.g. <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code>, or <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code>,
and subfolders <code class="docutils literal notranslate"><span class="pre">chunkers</span></code>, <code class="docutils literal notranslate"><span class="pre">grammars</span></code>, <code class="docutils literal notranslate"><span class="pre">misc</span></code>, <code class="docutils literal notranslate"><span class="pre">sentiment</span></code>, <code class="docutils literal notranslate"><span class="pre">taggers</span></code>, <code class="docutils literal notranslate"><span class="pre">corpora</span></code>,
<code class="docutils literal notranslate"><span class="pre">help</span></code>, <code class="docutils literal notranslate"><span class="pre">models</span></code>, <code class="docutils literal notranslate"><span class="pre">stemmers</span></code>, <code class="docutils literal notranslate"><span class="pre">tokenizers</span></code>.</p>
<p>Download individual packages from <code class="docutils literal notranslate"><span class="pre">https://www.nltk.org/nltk_data/</span></code> (see the “download” links).
Unzip them to the appropriate subfolder. For example, the Brown Corpus, found at:
<code class="docutils literal notranslate"><span class="pre">https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip</span></code>
is to be unzipped to <code class="docutils literal notranslate"><span class="pre">nltk_data/corpora/brown</span></code>.</p>
<p>Set your <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable to point to your top level <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code> folder.</p>
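<p>The manual steps above can also be sketched in Python using only the standard library. The function names <code class="docutils literal notranslate"><span class="pre">make_layout</span></code> and <code class="docutils literal notranslate"><span class="pre">unpack</span></code> are hypothetical, and the sketch assumes that each package archive contains a single top-level folder named after the package.</p>

```python
import zipfile
from pathlib import Path

# Subfolders listed in the manual installation instructions.
SUBFOLDERS = ["chunkers", "grammars", "misc", "sentiment", "taggers",
              "corpora", "help", "models", "stemmers", "tokenizers"]


def make_layout(root):
    """Create the nltk_data folder and its subfolders; return the root path."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def unpack(zip_path, root, subfolder):
    """Unzip a downloaded package into root/subfolder.

    Assumption: the archive holds one top-level folder named after the
    package (e.g. brown/), so brown.zip ends up in corpora/brown.
    """
    target = Path(root) / subfolder
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

<p>For example, after downloading <code class="docutils literal notranslate"><span class="pre">brown.zip</span></code> by hand, <code class="docutils literal notranslate"><span class="pre">unpack("brown.zip",</span> <span class="pre">make_layout("nltk_data"),</span> <span class="pre">"corpora")</span></code> would place the corpus under <code class="docutils literal notranslate"><span class="pre">nltk_data/corpora</span></code>.</p>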
</section>
</section>
</div>
</div>
</div>
<footer>
<div id="footer-info">
<ul id="build-details">
<li class="footer-element">
<a href="_sources/data.rst.txt" rel="nofollow"> source</a>
</li>
<li class="footer-element">
<a href="https://github.com/nltk/nltk/tree/3.9.1">3.9.1</a>
</li>
<li class="footer-element">
Aug 19, 2024
</li>
</ul>
<div id="copyright">
© 2024, NLTK Project
</div>
<div id="credit">
created with <a href="http://sphinx-doc.org/">Sphinx</a> and <a href="https://github.com/tomaarsen/nltk_theme">NLTK Theme</a>
</div>
</div>
</footer>
</div>
</body>
</html>