PiRSquared17 edited this page Sep 23, 2014 · 9 revisions

MediaWiki

List of wikis

Already exists.

Dump algorithm

Already exists.

Dump format

MediaWiki XML dump + extras. Notes:
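For reference, the per-page XML can be requested from Special:Export. A minimal sketch, assuming the wiki exposes the standard Special:Export page (the base URL is a placeholder; real dumpers POST the form instead of using GET, to avoid URL-length limits):

```python
from urllib.parse import urlencode

def export_url(index_php, titles, full_history=True):
    """Build a Special:Export URL for the given page titles.

    index_php: URL of the wiki's index.php (placeholder/assumption).
    """
    params = {
        "title": "Special:Export",
        "pages": "\n".join(titles),  # one title per line, as the export form expects
        "action": "submit",
    }
    if full_history:
        params["history"] = "1"  # full history (servers may cap the revision count)
    return index_php + "?" + urlencode(params)

# Example (not fetched here):
url = export_url("https://wiki.example.org/index.php", ["Main Page"])
```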

DokuWiki

List of wikis

Dump algorithm

Sketch:

  • Check XML-RPC API availability.
  • Generate list of titles (XML-RPC):
      • Use 'wiki.getAllPages' or 'dokuwiki.getPagelist' (is this restricted to a single namespace at a time?). (TODO: determine if this is usable)
  • Generate list of titles (do=index or ajax.php):
      • Try to load /lib/exe/ajax.php call=index (exists in newer versions) or do=index (present since the first release).
      • Recursively, for each namespace, load the appropriate sub-index, adding each title to a list.
      • Add root pages to the list.
  • Generate list of media/files/uploads:
      • Check for /lib/exe/mediamanager.php or do=media.
      • If it exists, use it recursively in each namespace, collecting a list of all file names (or just downloading each file), e.g. ?ns=ns1:ns2.
      • Extract file details from /lib/exe/detail.php or do=media.
      • Or /lib/exe/ajax.php call=medialist on each namespace?
      • Or XML-RPC?
  • Export current page content:
      • Use do=export_raw (not in the first release) or XML-RPC wiki.getPage. wiki.getPageInfo gives metadata about the current revision.
      • Or use do=edit and scrape the textarea content.
  • Export full history:
      • Use do=revisions or XML-RPC wiki.getPageVersions.
      • For each revision, request ?rev=insert_rev_id_here&do=edit or ?rev=1234&do=export_raw. Or XML-RPC wiki.getPageVersion; wiki.getPageInfoVersion gives revision metadata.
  • Get site version and metadata:
      • Note: in recent DokuWiki releases, it is not possible to get the version.
      • Download do=check.
      • Try to preview a page with the following content (example output):
====== ~~INFO:syntaxmodes~~ ======
~~INFO:syntaxmodes~~
====== ~~INFO:syntaxtypes~~ ======
~~INFO:syntaxtypes~~
====== ~~INFO:syntaxplugins~~ ======
~~INFO:syntaxplugins~~
====== ~~INFO:adminplugins~~ ======
~~INFO:adminplugins~~
====== ~~INFO:actionplugins~~ ======
~~INFO:actionplugins~~
====== ~~INFO:rendererplugins~~ ======
~~INFO:rendererplugins~~
====== ~~INFO:helperplugins~~ ======
~~INFO:helperplugins~~
====== ~~INFO:helpermethods~~ ======
~~INFO:helpermethods~~
====== ~~INFO:authplugins~~ ======
~~INFO:authplugins~~
====== ~~INFO:remoteplugins~~ ======
~~INFO:remoteplugins~~
====== ~~INFO:version~~ ======
~~INFO:version~~
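The title-listing and export steps above can be sketched as follows. This is a sketch only: the base URL is a placeholder, the XML-RPC endpoint path is the usual /lib/exe/xmlrpc.php but the API may be disabled or restricted on a given wiki, and do=export_raw is absent in the very first releases:

```python
from urllib.parse import urlencode
import xmlrpc.client

def export_raw_url(doku_php, page_id, rev=None):
    """URL for a page's raw source via do=export_raw; rev (if given)
    selects an old revision, as in the history step above."""
    params = {"id": page_id, "do": "export_raw"}
    if rev is not None:
        params["rev"] = str(rev)
    return doku_php + "?" + urlencode(params)

def list_pages_xmlrpc(base_url):
    """Try the XML-RPC API (endpoint path and availability are assumptions).
    wiki.getAllPages returns one struct per page; 'id' is the page name."""
    proxy = xmlrpc.client.ServerProxy(base_url.rstrip("/") + "/lib/exe/xmlrpc.php")
    return [page["id"] for page in proxy.wiki.getAllPages()]

# Example URL (not fetched here):
url = export_raw_url("https://wiki.example.org/doku.php", "ns1:start", rev=1234)
```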

Dump format

Compressed data directory. cache, index, locks, tmp probably not needed.

MoinMoin

...

UseModWiki, OddMuseWiki, etc.

List of wikis

Dump algorithm

  • Check if raw=1 is available.
  • Get list of pages:
      • Use action=index (add &raw=1 if available).
  • Download the current version only:
      • For each page title, either get action=browse&id=FooBar&raw=1 (preferable) or action=edit&id=FooBar. If raw is not available, scrape the textarea content of the edit box.
      • Loop over all titles.
  • Get the history of each page (note: UseModWiki history is not permanent!):
      • Use action=history&id=FooBar and parse it.
      • For each revision, download the raw content:
          • If action=browse&id=Foo&revision=123&raw=1 is available, use that. Otherwise, use action=edit&id=Foo&revision=123.
  • Get images:
      • Go through each saved page text and search for image URLs, using the same regex as UseModWiki uses.
      • Save each image.
  • Save site version/metadata:
      • Save action=version. In UseModWiki this is not very useful, but it's cool to have for Oddmuse. Example: http://communitywiki.org/?action=version
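The URL building and index parsing above can be sketched like this. The script path is a placeholder, and the assumption that action=index&raw=1 returns one page title per line should be verified against the target wiki:

```python
from urllib.parse import urlencode

def script_url(script, **params):
    """Build a UseModWiki/Oddmuse URL, e.g. against cgi-bin/wiki.pl."""
    return script + "?" + urlencode(params)

def parse_raw_index(text):
    """Parse action=index&raw=1 output, assumed to be one page title
    per line (check this assumption on the wiki being dumped)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Example URLs (not fetched here):
index_url = script_url("http://www.example.org/cgi-bin/wiki.pl",
                       action="index", raw=1)
page_url = script_url("http://www.example.org/cgi-bin/wiki.pl",
                      action="browse", id="FooBar", raw=1)
```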

Dump format

http://www.usemod.com/cgi-bin/wiki.pl?DataBase
