Moving from MediaWiki to SharePoint O365 - part 4

Having our scripts in place to emulate some of the MediaWiki functionality and knowing how to use them, we can now finally turn our attention to the actual migration of content. Content comes in two forms: text, as entered in MediaWiki, and attachments. The attachments can be documents (docx, xlsx, pdf, ...) which are available for download, or they can be images (png, jpg, gif, ...) which are displayed in the articles themselves.

In this post, let's focus our attention on the attachments.

Migrating files from MediaWiki to SharePoint O365

In your MediaWiki installation, you need to go to your images folder and copy it. In the www directory of your UniServerZ folder, create a new folder called wikimigration. This will be our project work folder (for an Eclipse PHP project, if you're using Eclipse). Inside that folder, we create a folder fileProcessing. Paste the images folder from the MediaWiki installation in here so we can work with this copy.

In the images folder, there are a number of directories: they go from 0-9 and a-z and some others besides. If you had the Math extension working on your wiki, there's probably a math folder. Since we have the MathJax scripts in place to re-render all math, you can safely delete this folder. That's a whole bunch of images we don't have to worry about any more.
There's also a thumb folder in there: this contains pre-rendered thumbnails. If you set a certain image width for an image in a MediaWiki article, the system pre-renders the image at that size so as to limit the amount of data that needs to be sent to the browser. That's clever! But a bit too clever for SharePoint. So you can go ahead and delete this folder as well - the originals are all in the other folders, and the originals are what we are going to be working with.
Some more folders you can delete if they appear in your installation: tmp, temp, archive, deleted, lockdir, timeline.
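
If you'd rather not click through Explorer, the clean-up above can be sketched in a few lines. This is a Python sketch rather than part of the PHP tooling used in this series, and it assumes you have Python 3 available. The function name prune_images_copy is made up for illustration; the folder names are the ones listed above.

```python
import shutil
from pathlib import Path

# Folder names taken from the text above; anything else is left alone.
DISPOSABLE = ["math", "thumb", "tmp", "temp", "archive",
              "deleted", "lockdir", "timeline"]


def prune_images_copy(images_dir, names=DISPOSABLE):
    """Delete the listed top-level subfolders of images_dir.

    Returns the names of the folders that were actually removed.
    """
    removed = []
    for name in names:
        target = Path(images_dir) / name
        if target.is_dir():
            shutil.rmtree(target)  # removes the folder and everything in it
            removed.append(name)
    return removed
```

Point it at the copy in fileProcessing, never at the live wiki, for example prune_images_copy(r"C:\something\UniServerZ\www\wikimigration\fileProcessing\images").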

The remaining stuff is the stuff we need. We're now going to get all the files out of the hierarchical folder structure (it goes many folders deep) into a flat structure (i.e. we're going to move all the files into a single directory without subdirectories). Well, two directories really: one for non-image files, and one for image files.

The script

In the fileProcessing directory you made earlier, create two additional folders: imagesFlat and filesFlat. Now, using a text editor (Eclipse or whatever IDE you like; in the worst case, use Notepad), create a file called filePathFlattener.php in the fileProcessing directory. Copy this code and paste it in this file, update the rootpath on line 4, then save it.
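
To make the idea behind the flattening concrete, here is a minimal sketch of the same technique in Python. It is not filePathFlattener.php and should not be pasted into that file. The function name flatten and the list of image extensions are assumptions for illustration, as is the choice to stop on a file name clash rather than overwrite.

```python
import shutil
from pathlib import Path

# Extensions treated as images: an assumption, extend it to match your wiki.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp"}


def flatten(images_dir, images_flat, files_flat):
    """Move every file under images_dir into one of two flat folders.

    Returns (images_moved, files_moved). Raises FileExistsError on a
    file name clash instead of silently overwriting.
    """
    images_flat, files_flat = Path(images_flat), Path(files_flat)
    images_flat.mkdir(parents=True, exist_ok=True)
    files_flat.mkdir(parents=True, exist_ok=True)

    # Collect the paths first so that moving files does not disturb the walk.
    sources = [p for p in Path(images_dir).rglob("*") if p.is_file()]

    images_moved = files_moved = 0
    for src in sources:
        is_image = src.suffix.lower() in IMAGE_EXTENSIONS
        dest = (images_flat if is_image else files_flat) / src.name
        if dest.exists():
            raise FileExistsError(f"{src} would overwrite {dest}")
        shutil.move(str(src), str(dest))
        if is_image:
            images_moved += 1
        else:
            files_moved += 1
    return images_moved, files_moved
```

The sketch expects imagesFlat and filesFlat to sit next to the images folder, as in the layout described above, not inside it.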

Before we run the script, right-click the images folder and ask for its properties. Note down the number of files - as a check, for later.
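
If you don't trust the Explorer properties dialog, or just want the number on the command line, a recursive file count can be sketched like this. Again this is Python rather than PHP, and count_files is a hypothetical helper name.

```python
import os


def count_files(folder):
    """Count the files in folder and in all of its subfolders."""
    return sum(len(filenames)
               for _dirpath, _dirnames, filenames in os.walk(folder))
```

Run it on the images folder before the move, and on imagesFlat and filesFlat afterwards: the two flat counts added together should give the number you started with.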
Now we're going to run the script to move all the files into either the imagesFlat or the filesFlat directory. To do this, open a command line console and type the full path to php.exe in your UniServerZ folder, followed by a space and the path to the PHP script, preferably between quotes:

C:\something\UniServerZ\core\php54\php.exe "C:\something\UniServerZ\www\wikimigration\fileProcessing\filePathFlattener.php"

Have some patience while the script is moving stuff. When it's done, it should tell you how many files it copied - the number should of course be identical to the number of files you noted down earlier.

On to SharePoint

In an earlier article, I explained how to open a SharePoint library with Explorer. Now open the Documents library (through Site Contents) in Explorer (where it's called Shared Documents after you open it - go figure). Now simply copy all files from filesFlat to this directory. Depending on the number of files you have, this may take some time.

We're going to do the same thing for the images: open the Site Assets library (through Site Contents) in Explorer. Create a folder mw (for MediaWiki) and copy all images from imagesFlat into this directory.

And now we wait?

While you're waiting for everything to copy - for me at least, this is going hideously slowly: 10 KB/s does not belong in this day and age... what am I working with here, a 56K modem? - let's turn our attention to the textual content of the articles.