The script takes a folder of articles exported from Medium and converts them to the WordPress WXR format: https://codex.wordpress.org/Importing_Content#WordPress
- Docker
- All images in an article are uploaded to the WordPress "Media Library" and cross-linked with the article.
- Uses the first image in a post as the "Featured image" in WordPress.
- Works well with iframe embeds such as videos.
- Replaces background images in an article with image tags in WordPress.
- Removes useless elements from the exported HTML, such as empty spacers and blocks.
- Removes the first header from the article content, because WordPress adds its own header above the article.
- Saves article tags.
- Saves the article publish date to the "medium_publish_date" meta field.
- Export your articles from Medium and unzip the archive.
- Clone the repo and run:

      # Unpack the zip file to the "exported" folder:
      mkdir exported
      unzip -d exported medium.zip

      # Fetch Docker Node image, I have used latest.
      docker pull node

      # Install NPM dependencies:
      ./run.sh npm install

      # Run the script.
      # You should provide author name and starting ID (it will be used as post ID in WordPress DB).
      ./run.sh node john.smith 6000

- A file named "export-john.smith-6000.xml" should appear in the folder.
- In your WordPress installation, install and activate the WordPress Importer plugin.
- Replace wordpress-importer.php in the plugin folder with my version. It correctly handles article images that have no file extension.
- Now you can import the file into your WordPress installation. I personally recommend using the wp-cli "import" command, as it is much more reliable than running the import through the administration console.
- After the import, all articles are in the "Draft" state so you can tune them before publishing. See after_import.sql for handy queries to prettify article content and set the publish date from Medium.
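
To make the output more concrete, here is a minimal sketch of what one exported article might look like as a WXR `<item>`, including tags, the "medium_publish_date" meta field and post IDs counted up from the starting ID. The helper names (`escapeXml`, `slugify`, `outputFileName`, `renderItem`) and the shape of the `article` object are hypothetical. The real script builds its XML through the node-wxr package, and whether it assigns IDs sequentially or writes the draft status into the file itself is an assumption here.

```javascript
// Sketch only: hypothetical helpers, not the project's actual code.

// Escape text for use inside XML element content and attributes.
const escapeXml = (s) =>
  String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');

// Turn a tag name into a URL-friendly "nicename".
const slugify = (s) =>
  String(s)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');

// Output file name, following the pattern shown in the steps above.
function outputFileName(author, startId) {
  return `export-${author}-${startId}.xml`;
}

// Render one article as a WXR <item>.
// Assumption: post IDs simply count up from the starting ID.
// Note: CDATA content must not contain "]]>"; this sketch does not handle that.
function renderItem(article, index, startId) {
  const lines = [
    '  <item>',
    `    <title>${escapeXml(article.title)}</title>`,
    `    <content:encoded><![CDATA[${article.html}]]></content:encoded>`,
    `    <wp:post_id>${startId + index}</wp:post_id>`,
    '    <wp:post_type>post</wp:post_type>',
    '    <wp:status>draft</wp:status>',
  ];
  for (const tag of article.tags) {
    lines.push(
      `    <category domain="post_tag" nicename="${slugify(tag)}"><![CDATA[${tag}]]></category>`
    );
  }
  lines.push(
    '    <wp:postmeta>',
    '      <wp:meta_key>medium_publish_date</wp:meta_key>',
    `      <wp:meta_value><![CDATA[${article.publishDate}]]></wp:meta_value>`,
    '    </wp:postmeta>',
    '  </item>'
  );
  return lines.join('\n');
}

const sample = {
  title: 'Hello & welcome',
  html: '<p>First post</p>',
  tags: ['Node JS'],
  publishDate: '2017-01-02',
};
console.log(outputFileName('john.smith', 6000)); // export-john.smith-6000.xml
console.log(renderItem(sample, 0, 6000));
```

A full WXR file would wrap such items in an `<rss>`/`<channel>` envelope with the `wp:` and `content:` namespaces declared; that part is omitted here.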
The project uses an altered (and improved) version of the node-wxr NPM package: https://www.npmjs.com/package/wxr
Also, I have changed some behavior of the cheerio package to work around this issue: cheeriojs/cheerio#866
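
Two of the cleanup steps listed among the features, dropping the first header and picking the first image as the featured image, can be illustrated with a rough sketch. The project itself parses HTML with cheerio. The version below deliberately uses plain regular expressions so it runs without dependencies, and the function names `removeFirstHeader` and `firstImageSrc` are hypothetical. Regex-based HTML handling is fragile and is shown only to convey the idea.

```javascript
// Sketch only: simplistic regex versions of two cleanup steps.

// Remove the first heading element (h1 to h6) found in the article HTML.
function removeFirstHeader(html) {
  return html.replace(/<(h[1-6])\b[^>]*>[\s\S]*?<\/\1>/i, '');
}

// Return the src of the first <img>, or null when the article has no images.
function firstImageSrc(html) {
  const match = /<img\b[^>]*?\bsrc=["']([^"']+)["']/i.exec(html);
  return match ? match[1] : null;
}

const html =
  '<h3>Title</h3><p>Intro</p><img src="https://example.com/a.png"><img src="b.png">';
console.log(removeFirstHeader(html));
// <p>Intro</p><img src="https://example.com/a.png"><img src="b.png">
console.log(firstImageSrc(html));
// https://example.com/a.png
```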