URL formatting with Regex for Drupal and CKEDITOR

Tags: Drupal, CKEditor.

If you list a URL without the protocol and it is processed by the "Convert URLs into links" input filter, "http" will be automatically prepended, regardless of whether the site is set up for https. Since the filter can't know what the protocol should be without testing, this is an acceptable fallback. But because the site I am working on uses https, every page that includes the site address without the protocol is flagged as having a broken link. If it were a link to an external site I would not be as concerned, as these occasionally change anyway, but a link within a site should be correct. If you have multiple content editors, you know that formatting consistency and accuracy can be a challenge.

I use CKEDITOR for content creation and have recently added the autocorrect plugin, which, among other things, auto-converts URLs to links. There are numerous link modules and CKEDITOR plugins that I could play with, but after a quick search I didn't find anything appropriate.

I chose to create a small plugin for CKEDITOR that looks for the site domain on paste or save and applies a regular expression to correct the formatting.

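This regex will find variations of the domain written with http or https (with or without www) or with www alone, and change them to a standard format of https://www.example.com. A bare example.com with neither a protocol nor www is left untouched. It will also ignore a protocol-less www.example.com that is immediately followed by a closing anchor tag, that is, when it is used as the link text. View this example on regex101 for an explanation of how it functions.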

/((https?:\/\/)+?(www\.)??example\.com)|((www\.)+?example\.com(?!<\/a>))/g
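
To see what the pattern does before wiring it into the editor, here is a minimal sketch you can run in Node or a browser console. The function name standardizeLinks is my own for illustration and is not part of CKEDITOR or Drupal; the pattern is the one above with the dot after www escaped, and example.com stands in for your own domain.

```javascript
// Regex literal, so the backslash escapes are kept exactly as written.
var domainPattern = /((https?:\/\/)+?(www\.)??example\.com)|((www\.)+?example\.com(?!<\/a>))/g;

// Hypothetical helper: rewrites every matched variation to the standard form.
function standardizeLinks(html) {
  return html.replace(domainPattern, 'https://www.example.com');
}

var samples = [
  'http://example.com/about',                           // protocol, no www
  'www.example.com/contact',                            // www, no protocol
  '<a href="http://example.com">www.example.com</a>',   // href fixed, link text kept
  'example.com'                                         // bare domain, not matched
];

samples.forEach(function (sample) {
  console.log(sample + '  =>  ' + standardizeLinks(sample));
});
```

Running the replacement a second time on already-corrected text changes nothing, which matters because the plugin below processes the same content on several events.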

The CKEDITOR plugin is pretty simple; at one point it was borrowed from another plugin and modified. To use it, create a "standardlinks" folder within your CKEDITOR plugins folder, save the code below as plugin.js in that folder, and then activate it, either through the CKEDITOR config in Drupal or in your standalone CKEDITOR config file.

/**
 * @file Standardize URL Formatting
 */

(function() {
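  // Keys are regex sources, values are the replacement text. Backslashes are
  // doubled because a string literal would otherwise drop them ("\." becomes ".").
  var regex = {}, replacements = {
    "((https?:\\/\\/)+?(www\\.)??example\\.com)|((www\\.)+?example\\.com(?!<\\/a>))" : "https://www.example.com"
  };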

  for (var key in replacements) {
    regex[key] = new RegExp(key, 'g');
  }

  function doReplaceOnEvent(evt) {
    // Read whichever content property this event carries.
    var content = evt.data.dataValue || evt.data.text || evt.data.html;
    if (content) {
      for (var key in replacements) {
        content = content.replace(regex[key], replacements[key]);
      }
      // Write the result back to the same property it was read from.
      if (evt.data.dataValue) {
        evt.data.dataValue = content;
      }
      else if (evt.data.text) {
        evt.data.text = content;
      }
      else {
        evt.data.html = content;
      }
    }
  }

  CKEDITOR.plugins.add( 'standardlinks', {
    init : function( editor ) {
      editor.on('paste', doReplaceOnEvent);
      editor.on('getData', doReplaceOnEvent);
      editor.on('setData', doReplaceOnEvent);
    }
  });

})();
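
For a standalone setup, activation comes down to adding the plugin name to the extraPlugins setting. The sketch below assumes extraPlugins is a comma-separated string and that you are editing a standalone config.js; the helper name enableStandardLinks is hypothetical, and in Drupal you would normally enable the plugin through the editor profile settings instead.

```javascript
// Hypothetical helper: appends 'standardlinks' to config.extraPlugins
// without clobbering plugins that are already listed there.
function enableStandardLinks(config) {
  var plugins = config.extraPlugins ? String(config.extraPlugins).split(',') : [];
  if (plugins.indexOf('standardlinks') === -1) {
    plugins.push('standardlinks');
  }
  config.extraPlugins = plugins.join(',');
  return config;
}

// In a standalone config.js; the guard only keeps this sketch runnable
// outside a page where CKEDITOR is loaded.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.editorConfig = function (config) {
    enableStandardLinks(config);
  };
}
```

If standardlinks is the only extra plugin you use, setting config.extraPlugins = 'standardlinks' directly does the same thing.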

Another option would be to create an Input Filter to use with your Text Format, from scratch or by using Custom filter. That would take the load off the client side and be less prone to error.