Commit

5.0.0
spencermountain committed Dec 4, 2018
1 parent afce319 commit e99757b
Showing 4 changed files with 14 additions and 8 deletions.
5 changes: 5 additions & 0 deletions changelog.md
@@ -39,3 +39,8 @@
* get skip_redirects actually working
* reduce default batch_size even lower
* add `verbose_skip` option, to log disambig/redirect skipping

+## v5
+* more consistent template json, via [wtf_wikipedia@7](https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#700)
+* removal of empty `[]` results in `Section`.
+* fs fixes for node > 9
10 changes: 5 additions & 5 deletions package-lock.json


4 changes: 2 additions & 2 deletions package.json
@@ -2,7 +2,7 @@
"author": "Spencer Kelly <[email protected]> (http://spencermounta.in)",
"name": "dumpster-dive",
"description": "get a wikipedia dump parsed into mongodb",
-"version": "4.0.2",
+"version": "5.0.0",
"repository": {
"type": "git",
"url": "git://github.com/spencermountain/wikipedia-to-mongodb.git"
@@ -26,7 +26,7 @@
"prettysize": "1.1.0",
"sunday-driver": "1.0.2",
"worker-nodes": "1.6.1",
-"wtf_wikipedia": "6.2.1",
+"wtf_wikipedia": "7.0.0",
"yargs": "12.0.5"
},
"devDependencies": {
3 changes: 2 additions & 1 deletion scratch.js
@@ -3,6 +3,7 @@ const drop = require('./src/lib/drop-db');

//144mb → 2.5 minutes = 57mb per worker per minute
const path = '/Users/spencer/data/wikipedia/enwiki-latest-pages-articles.xml'
// const path = '/Users/spencer/data/wikipedia/simplewiki-latest-pages-articles.xml'
// const path = './tests/smallwiki-latest-pages-articles.xml'; //3s
// const path = './tests/tinywiki-latest-pages-articles.xml'; //2s
const dbName = path.match(/\/([a-z-]+)-latest-pages/)[1];
@@ -22,7 +23,7 @@ let options = {
// skip_redirects: true,
// skip_disambig: true,
// missing_templates: true
-// workers: 1
+// workers: 2
// custom: function(doc) {
// console.log(doc.title())
// return {
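
The `dbName` line in `scratch.js` derives the database name from the dump filename with a regular expression, and indexing `[1]` directly will throw if the path does not follow the `<name>-latest-pages` pattern. A minimal sketch of the same extraction with a null check follows; `dbNameFromPath` is a hypothetical helper for illustration, not part of dumpster-dive.

```javascript
// Hypothetical helper (not part of dumpster-dive): derive a database name
// from a dump path using the same pattern as scratch.js.
function dbNameFromPath(path) {
  const match = path.match(/\/([a-z-]+)-latest-pages/);
  // String.prototype.match returns null when the pattern is not found
  return match ? match[1] : null;
}

console.log(dbNameFromPath('/Users/spencer/data/wikipedia/enwiki-latest-pages-articles.xml')); // enwiki
console.log(dbNameFromPath('./tests/tinywiki-latest-pages-articles.xml')); // tinywiki
console.log(dbNameFromPath('./dump.xml')); // null
```

Note that the pattern requires a `/` before the name, so a bare filename with no directory prefix would also return `null` here.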
