TL;DR: As organizations upgrade their websites to modern, flexible Digital Experience Platforms (DXPs), content migration becomes a critical task. This involves transferring media assets, data types, pages, content items, code, and analytics data to new systems. Key challenges include mapping between incompatible structures, handling multi-language content, and ensuring smooth migration of media and large volumes of data. Automated tools, such as scripts, APIs, and AI, can speed up this process. Real-world examples, like migrating assets from Sitecore to Content Hub or transitioning from Sitecore XM to Optimizely CMS, show how these tools cut migration time from weeks to days or even hours.
Introduction
Nowadays, organizations rebuild their websites every 4-5 years. In MarTech, such a period typically means that the stack used to build the website may be too old, and more modern, cheaper, or more flexible options are already available on the market. To keep the cost of such a project reasonable, most organizations want to keep the existing content, pages, and media assets produced over the years and migrate them to the new system. Such a system can be a brand new product, or a new version of the product already used by the organization.
In a nutshell, we can divide the website migration work into the following tasks:
- Migrate Media Assets
- Migrate Data Types and Components
- Migrate Pages
- Migrate Content Items
- Migrate Code
- Migrate Analytics Data
Let’s dive into each point and discuss possible challenges.
Media Assets Migration
Media assets (images, videos, vectors, documents) are widely used on almost every website. With numbers that can reach tens of thousands, it’s nearly impossible to migrate them manually.
We can divide the typical migration of the assets into:
- Extracting asset items from the source system, including their names, identifiers, metadata such as alt texts, tags, or folder structure, and the public URLs of the asset files.
- Importing asset items into the destination system, including metadata, and generating new public URLs.
- Creating a mapping between the assets in the old and new systems and updating the usages of the assets inside the content. For example, assets can be used in dedicated “Image” or “File” fields in the content items, or anywhere inside “Rich Text” fields. An enterprise CMS typically stores media assets inside the content using identifiers rather than direct URLs to the files, which makes them easier for authors to manage later (a single change to the asset in one place updates it everywhere it is used).
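The last step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference format inside rich text (`-/media/<32 hex digits>.ashx`), the `ASSET_MAP` contents, and the destination URL are hypothetical placeholders, and a real project would adapt the pattern to how its source system stores asset references.

```python
import re

# Hypothetical mapping from source asset IDs to destination references,
# as it might be produced after importing the assets.
ASSET_MAP = {
    "0a1b2c3d4e5f60718293a4b5c6d7e8f9": "https://dam.example.com/public/hero-1",
}

# Assumed reference format inside rich text HTML: "-/media/<32 hex digits>.ashx".
MEDIA_REF = re.compile(r"-/media/([0-9a-fA-F]{32})\.ashx")


def rewrite_media_refs(html: str, asset_map: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known asset references and collect IDs that have no mapping."""
    missing: list[str] = []

    def swap(match: re.Match) -> str:
        asset_id = match.group(1).lower()
        new_ref = asset_map.get(asset_id)
        if new_ref is None:
            missing.append(asset_id)
            return match.group(0)  # leave unknown references untouched
        return new_ref

    return MEDIA_REF.sub(swap, html), missing
```

Returning the list of unmapped IDs alongside the rewritten content makes it easy to produce a report of assets that still need attention before the final run.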
Migrate Data Types and Components
Most CMS or CMP platforms use data types to model the content. These can be Templates used to create Items in Sitecore, Content Types used to create Blocks and Experiences in Optimizely, or Entity Types which structure Entities in Content Hub. During the migration we typically need to take into account that the source and destination data types may not be fully compatible. This can be because we move the content between different products, or because the design of the website changes and the data types need to be adjusted to work with the new design.
Therefore, for a successful migration we need a process which can create a mapping between content types (a Hero component may be represented by a “Hero Data” template in Sitecore and a “Hero Banner” entity type in Content Hub CMP) and between field types of the source and destination systems (a “Tree List” type in Sitecore and a “Content Area” in Optimizely).
In Sitecore a Component is built from a Rendering Definition Item, optional Rendering Parameters, and a Datasource. In Optimizely a Component is represented by a Block (or an Experience in Visual Builder). At first it may be confusing to map one onto the other, but if you think about the component in a headless way, both a Sitecore component and an Optimizely block can be represented by JSON in a quite similar format, returned either by the Sitecore Headless Service for JSS, the Optimizely Content API, or a GraphQL endpoint in both cases. This can be helpful in a migration process.
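A type and field mapping of this kind can be kept as plain data and applied to the JSON representation of each component. The sketch below is a minimal illustration; the `TypeMapping` class, the field names, and the shape of the input item are assumptions made for the example, with only the “Hero Data” and “Hero Banner” names taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class TypeMapping:
    target_type: str
    # Source field name -> destination field name (hypothetical names).
    fields: dict[str, str] = field(default_factory=dict)


TYPE_MAP = {
    "Hero Data": TypeMapping(
        target_type="Hero Banner",
        fields={"Title": "headline", "Text": "description", "Image": "backgroundImage"},
    ),
}


def convert_item(item: dict) -> dict:
    """Convert one source item (as a dict) into the destination shape."""
    mapping = TYPE_MAP.get(item["type"])
    if mapping is None:
        raise KeyError(f"No mapping defined for source type {item['type']!r}")
    converted = {"type": mapping.target_type, "fields": {}, "unmapped": []}
    for name, value in item["fields"].items():
        target = mapping.fields.get(name)
        if target is None:
            converted["unmapped"].append(name)  # keep track of fields left behind
        else:
            converted["fields"][target] = value
    return converted
```

Failing loudly on an unknown type and listing unmapped fields helps surface incompatibilities early, instead of silently dropping content.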
Migrate Pages
A hybrid headless CMS like Sitecore or Optimizely represents the website as a tree structure, with a parent-child relation between pages and subpages. This relation can be different in a traditional headless CMS, which is another issue to consider. A page can also have metadata (typically multi-language text content) and a list of components added in an ordered way (for example inside placeholders in Sitecore, or Content Areas in Optimizely), which can be represented by a simple list.
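One way to make a page tree portable between systems is to flatten it into a list of records that keep the path, the parent, and the ordered components. The following sketch assumes a hypothetical nested dictionary with `name`, `metadata`, `components`, and `children` keys; it is not the export format of any particular CMS.

```python
def flatten_pages(page: dict, parent_path: str = "") -> list[dict]:
    """Flatten a nested page tree into records, parents before children."""
    path = f"{parent_path}/{page['name']}"
    record = {
        "path": path,
        "parent": parent_path or None,  # None marks the root page
        "metadata": page.get("metadata", {}),
        "components": list(page.get("components", [])),  # order is preserved
    }
    records = [record]
    for child in page.get("children", []):
        records.extend(flatten_pages(child, path))
    return records
```

Because every parent is emitted before its children, an importer can create the pages in list order, and a destination without a native tree can still rebuild the hierarchy from the `path` and `parent` values.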
Migrate Content Items
Content Items can be either data displayed directly inside a component, like a Hero Banner’s background image, headline, description, and CTA, or data used by the website indirectly (for example a structure of categories used to filter other data, or a shared list of countries used in drop-downs in forms). It is important to have a strategy for both types of content and to remember that all of it can exist in multiple languages.
Migrate Code
Nowadays the most common approach to building websites on Digital Experience Platforms is headless: for example React or Vue components, structured into a web app with a framework like Next.js, with the content delivered from the CMS through a GraphQL or API endpoint. This standard helps when we don’t want to change the design language but only re-platform the website. In that case we only need to change the part responsible for fetching the page layout and component content from the CMS and for rendering individual fields (the CMS typically provides helpers for this, which may vary between platforms), while the rest (the styles and HTML structure of the components) can be moved almost untouched. To address design differences we can also use AI tools like v0.dev from Vercel or builder.io Visual Copilot to help with code transitions between languages or to generate new code from a design or image.
Migrate Analytics Data
Migrating analytics data may be tricky and require a lot of effort. It can also mean transferring a lot of data (for example, one of our customers gathered over 1TB of data in xDB Shards). Another consideration is that platforms like Sitecore XM Cloud and Optimizely SaaS (or PaaS from version 12) no longer offer a self-hosted analytics solution and force customers to use SaaS-based Customer Data products. Here we need to make sure that such a product supports importing data from external sources, including contacts and their interactions.
Automated Migration Tool
In the enterprise world every customer solution is different and there is no universal solution which will work for everyone. But with a set of tools (scripts, APIs, custom code, and AI) it is possible to shorten the migration from weeks to days or hours.
At Include Agency we made our migration tools generic, so they can be easily adapted to a customer’s solution. Additionally, we don’t connect the source and destination systems directly to move the data. Instead, we export the data from the source system into a universal format, for example JSON or CSV, and then import it from that format into the destination system. This approach gives us more flexibility, so we can migrate content between more source and destination systems without rewriting the entire tool each time we have a new migration.
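The idea of decoupling export and import through a universal format can be sketched as two small functions that only meet at a file. This is an illustrative sketch, not the actual tool: the `Exporter` and `Importer` callables stand in for system-specific connectors, and the one-JSON-document-per-line file layout is an assumption chosen for simplicity.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical connector signatures: an exporter yields items from the source
# system, an importer pushes one item into the destination system.
Exporter = Callable[[], Iterable[dict]]
Importer = Callable[[dict], None]


def export_to_file(exporter: Exporter, path: Path) -> int:
    """Write every exported item as one JSON document per line."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for item in exporter():
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
            count += 1
    return count


def import_from_file(path: Path, importer: Importer) -> int:
    """Read the intermediate file and hand each item to the importer."""
    count = 0
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                importer(json.loads(line))
                count += 1
    return count
```

With this split, supporting a new source or destination means writing one new exporter or importer, while the intermediate file can be inspected, versioned, or re-imported on another environment.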
Let’s take another look at the diagram from the beginning of the article and update it:
Here each connector is atomic and could work separately as a script, API request, or a piece of code and could connect from any supported source to any destination.
Example Migration: Sitecore Media Library to Sitecore Content Hub
Sitecore Content Hub is a powerful CMP and DAM (Digital Asset Management) system and part of the Sitecore DXP suite. Many Sitecore customers traditionally keep their assets within the CMS, inside the Sitecore Media Library, but with strong demand for multi-channel delivery in large organizations, a dedicated DAM solution can help overcome the limitations of the Media Library.
For this migration we used Sitecore PowerShell Extensions (SPE) to create scripts which extract asset data from the Media Library for a given brand and create a CSV file containing the item name, ID, folder structure converted into tags, metadata, and public path to the media file. We then used bulk upload to insert the assets into Content Hub and to automatically create public links.
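The scripts used in the project were written with SPE. Purely as an illustration of the "folder structure converted into tags" step, here is a small Python sketch; it assumes the item path starts at the standard `/sitecore/media library` root and ends with the item name, and the function name is hypothetical.

```python
def folder_path_to_tags(item_path: str, root: str = "/sitecore/media library") -> list[str]:
    """Turn the folders between the root and the item into a list of tags."""
    if item_path.lower().startswith(root.lower()):
        item_path = item_path[len(root):]
    parts = [part.strip() for part in item_path.split("/") if part.strip()]
    return parts[:-1]  # the last segment is the item name, not a folder
```

For example, an item stored at `/sitecore/media library/Brand A/Campaigns/2023/hero` would get the tags `Brand A`, `Campaigns`, and `2023`.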
In the next step we exported the assets from Content Hub into another CSV file containing the mapping between the entities in the old and new systems. Lastly, we used other SPE scripts to go through the entire Sitecore content and update every occurrence of the media assets for all sites.
With that semi-automated approach we could migrate thousands of assets and then update the content of almost a thousand pages in 5 to 10 languages in less than a single working day, which helped the team test the website on multiple stages before doing the final migration on the production environment.
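Building the old-to-new mapping from the two CSV files could look roughly like the sketch below. The column names (`id`, `source_id`, `public_link`) are assumptions for the example, including the assumption that the export from the destination carries the original source ID; real exports will use different headers.

```python
import csv
import io


def build_asset_mapping(source_csv: str, export_csv: str) -> tuple[dict[str, str], list[str]]:
    """Map source asset IDs to destination public links and list unmatched IDs."""
    source_ids = {row["id"] for row in csv.DictReader(io.StringIO(source_csv))}
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(export_csv)):
        # Only keep rows that correspond to an asset we actually exported.
        if row["source_id"] in source_ids:
            mapping[row["source_id"]] = row["public_link"]
    missing = sorted(source_ids - set(mapping))
    return mapping, missing
```

The list of unmatched IDs works as a quick completeness check before the content update scripts run against every site.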