Generate React.js elements from markdown-like text

This recursive function will format your text nicely into React elements. No library required.

When building modern web apps you will often come across the issue of rendering text stored in JSON files, in a database entity, or fetched from an endpoint. This is quite a fundamental necessity for many apps, but it does come with some problems to solve. For example, how do you store a linebreak? Let's solve the linebreak first and then add a bit more functionality afterwards.

If you don't care about how it works you can scroll down to the final code snippet that contains the working function.

Step 1. Line breaks

Let's start by importing React, since we are going to use the createElement function. Next we create the function that generates linebreaks based on its input; I call it "textToHtml". The common notation for a linebreak is \n (backslash n), and we will follow that convention. First, create an if statement like the one below that checks whether the input contains any linebreaks. If at least one linebreak is detected, we return a div that wraps all of our <p> elements (paragraphs). We split the input on each linebreak, so we end up with an array of separate strings, one per line, none of which contains a linebreak. Finally we return each string as a paragraph with the key prop i, so that React can distinguish the elements from each other. If no linebreak is detected, the else clause simply returns a single paragraph containing the entire input. We end up with this function:

const React = require('react');
const html = React.createElement;

const textToHtml = (input) => {
  if (input && (input.indexOf('\n') !== -1 || input.indexOf('\\n') !== -1)) {
    return (
      html('div', {}, input.split(/\n|\\n/).map((paragraph, i) => {
        return (
          html('p', { key: i }, paragraph));
      })));
  } else {
    return (
      html('p', {}, input));
  }
};

module.exports = {
  textToHtml
};

Now this is great: we are able to store text outside our app that can be formatted with linebreaks. Just remember to add \n wherever you want a linebreak.
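
To see why the split pattern has two alternatives, here is a small sketch (the sample string is made up for illustration). Text that comes out of a JSON file or a database field sometimes contains a literal backslash followed by n rather than a real newline character, and /\n|\\n/ covers both:

```javascript
// Sample input: one real newline character and one literal backslash + n.
const sample = 'First line\nSecond line\\nThird line';

// Same pattern as in textToHtml: split on either form of linebreak.
const paragraphs = sample.split(/\n|\\n/);

console.log(paragraphs);
// → [ 'First line', 'Second line', 'Third line' ]
```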

Pretty quickly you will probably need more complex text in your app than just plain text with linebreaks, for example bold text. Let's add a function that can produce bold, italic and underlined text.

Step 2. Match: bold, italic & underline

In this step we will create three different functions. The first is the simplest. It takes three arguments: input (the text), element (the html element we want to use) and regex (a pattern that matches our formatting language). If the regex matches anything, we add the element to the match object before returning it. Your function should look like this:

const matchElement = (input, element, regex) => {
  const match = regex.exec(input);
  if (match) match['element'] = element;
  return match;
};
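
As a quick check of what matchElement hands back, the sketch below repeats the function and calls it twice, using the bold pattern from matchAll (the sample strings are only for illustration). When nothing matches, exec returns null and the function passes that null straight through:

```javascript
const matchElement = (input, element, regex) => {
  const match = regex.exec(input);
  if (match) match['element'] = element;
  return match;
};

const hit = matchElement('a **bold** word', 'b', /\*\*([ a-z0-9!?.*_]+)\*\*/gmi);
console.log(hit[0]);      // → '**bold**' (the full match, including the symbols)
console.log(hit[1]);      // → 'bold'     (the captured text between the symbols)
console.log(hit.index);   // → 2          (where the match starts in the input)
console.log(hit.element); // → 'b'        (the property we added ourselves)

const miss = matchElement('nothing to see', 'b', /\*\*([ a-z0-9!?.*_]+)\*\*/gmi);
console.log(miss);        // → null
```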

We now have a function that can detect a pattern in a string and return a match object containing the matched text plus the element type to use for the corresponding html tag. Next we need a function that defines all the patterns and their corresponding elements. Why not call it "matchAll"?

const matchAll = (input) => {
  const bold = matchElement(input, 'b', /\*\*([ a-z0-9!?.*_]+)\*\*/gmi);
  const italic = matchElement(input, 'i', /__([ a-z0-9!?.*_]+)__/gmi);
  const underLine = matchElement(input, 'u', /_\*([ a-z0-9!?.*_]+)\*_/gmi);
  return [bold, italic, underLine].filter(e => { return e != null; });
};

Our matchAll function should look something like this, but you can customize yours however you prefer. This one matches:
**bold**
__italic__
_*underlined*_
Finally we return all match objects (technically they are arrays). Remember from the previous function (matchElement) that each match object contains the text between the format symbols, plus the element type we added, so we know which html element to use for that string. We filter the list to remove all nulls.
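
One behaviour worth knowing before you customize the patterns: the character class contains * and _, and + is greedy, so two bold spans in the same paragraph are first captured as one long match. The sketch below shows this with a made-up sample string. The lazy variant (+?) is not part of the function above, just one possible tweak, and it changes how nested symbols are split, so it would need its own tests:

```javascript
const sample = '**a** and **b**';

// Pattern as used in matchAll: greedy, and '*' is allowed inside the group.
const greedy = /\*\*([ a-z0-9!?.*_]+)\*\*/i.exec(sample);
console.log(greedy[0]); // → '**a** and **b**'
console.log(greedy[1]); // → 'a** and **b'

// Hypothetical lazy variant: stops at the first closing '**'.
const lazy = /\*\*([ a-z0-9!?.*_]+?)\*\*/i.exec(sample);
console.log(lazy[0]);   // → '**a**'
console.log(lazy[1]);   // → 'a'
```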

To understand what's going on with that regex.exec(input) function, I have created a small example based on one of my unit tests.

const text = 'Today is a **happy** day. __Hurra__';

console.log(matchAll(text));

[
  '**happy**',
  'happy',
  index: 11,
  input: 'Today is a **happy** day. __Hurra__',
  groups: undefined,
  element: 'b'
]
[
  '__Hurra__',
  'Hurra',
  index: 26,
  input: 'Today is a **happy** day. __Hurra__',
  groups: undefined,
  element: 'i'
]

Step 3. Recursive formatting function

Now we have a way of getting the text in small bites. Each bite contains the string we care about at index 1, plus information about which html element that string should be wrapped in under the key 'element'. Now we can start building a recursive function that formats one bite, and then calls itself with the remaining text plus everything that is already formatted. Finally, if there is nothing more to format, we return everything.

const formatParagraph = (unformatted, formatted) => {
  const matches = matchAll(unformatted);
  if (matches.length > 0) {
    const targetMatch = matches.sort((a, b) => { return a['index'] - b['index']; })[0];
    const remaining = unformatted.substring(targetMatch['index'] + targetMatch[0].length, unformatted.length);
    const newElement = html(targetMatch['element'], { key: targetMatch.index }, formatParagraph(targetMatch[1]));
    const processed = [formatted, [unformatted.substring(0, targetMatch.index), newElement]];
    return formatParagraph(remaining, processed);
  } else {
    return formatted ? [formatted, unformatted] : unformatted;
  }
};

This function is obviously a little more complex than the previous ones, but I will try to break it down into something understandable. I have called it formatParagraph because it formats the text in each paragraph, in other words the text between each linebreak. We are going to call this function from our textToHtml function, because that is where we divided the text into paragraphs based on the backslash n linebreak.

The first argument is called unformatted, because it is the text that has not yet been formatted. First we run matchAll on the unformatted text and receive all the matches, as shown in the example above with the "happy day" sentence. If there is at least one match, we convert each match into a nicely formatted React element, which will finally be rendered as an html element. We start from one end, formatting the first match in the text. To do so we sort the matches by index, low to high, grab the first one and store it in a variable called targetMatch. Next we take care of the remaining text: a variable called "remaining" stores everything in the unformatted text that comes after the target match.

Now let's create the React element of type targetMatch['element'], which is where we stored the html tag. We must always remember the unique key prop, and finally we give it the text. But wait... What if the text inside the matched pattern also contains another element to be formatted? It could for instance be __**bold and italic**__. That's right, this is a valid case, and we support it by doing our first recursive call with that text, which is then processed the same way.

Finally we build up what has been processed. First we add "formatted", the second argument of the function. The first time the function is called this argument is undefined, but as the function keeps calling itself it is important to add the already formatted text to the start of "processed". Next we add all the text that came before the matched pattern. Example: hello __world__. Our match only covers the word "world", but we don't want to lose "hello" on the way, so we add this text, and finally we add the React element that we created. At last we call the function again with the remaining text and the processed text, and this repeats until all the text has been formatted and we can return it.
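
To make the nesting of the returned arrays concrete, here is a self-contained sketch that runs the hello __world__ example. It repeats the functions from above and swaps React.createElement for a tiny stand-in called html, so it runs without React installed. The stand-in is an assumption for illustration only; the real function returns a richer object.

```javascript
// Stand-in for React.createElement (illustration only, not the real thing).
const html = (type, props, children) => ({ type, props: { ...props, children } });

const matchElement = (input, element, regex) => {
  const match = regex.exec(input);
  if (match) match['element'] = element;
  return match;
};

const matchAll = (input) => {
  const bold = matchElement(input, 'b', /\*\*([ a-z0-9!?.*_]+)\*\*/gmi);
  const italic = matchElement(input, 'i', /__([ a-z0-9!?.*_]+)__/gmi);
  const underLine = matchElement(input, 'u', /_\*([ a-z0-9!?.*_]+)\*_/gmi);
  return [bold, italic, underLine].filter(e => e != null);
};

const formatParagraph = (unformatted, formatted) => {
  const matches = matchAll(unformatted);
  if (matches.length > 0) {
    const targetMatch = matches.sort((a, b) => a.index - b.index)[0];
    const remaining = unformatted.substring(targetMatch.index + targetMatch[0].length);
    const newElement = html(targetMatch.element, { key: targetMatch.index }, formatParagraph(targetMatch[1]));
    const processed = [formatted, [unformatted.substring(0, targetMatch.index), newElement]];
    return formatParagraph(remaining, processed);
  }
  return formatted ? [formatted, unformatted] : unformatted;
};

const result = formatParagraph('hello __world__');
const [processed, rest] = result;       // [already formatted part, leftover text]
const [before, element] = processed[1]; // [text before the match, the new element]

console.log(before);                 // → 'hello '
console.log(element.type);           // → 'i'
console.log(element.props.children); // → 'world'
console.log(rest);                   // → ''
```

The first slot of processed is undefined here, because nothing had been formatted before the first match. React ignores undefined children when rendering.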

The final result (all the code)

This is the final result. You can copy-paste this code directly into your React project and start using it. Feel free to customize it as much as you want: you could for example add more html tags, or use <br> tags for linebreaks instead of creating a new paragraph each time. That is totally up to you. I have added a small replace in textToHtml, don't mind this too much: sometimes I have received text from a database where the single quote had been replaced with its HTML entity &#x27;, so I replace it with a plain single quote.
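
A note on that replace: String.prototype.replace with a plain string pattern only swaps the first occurrence. If your text can contain several encoded quotes, a regular expression with the g flag is one way to catch them all. A small sketch with a made-up sample string:

```javascript
const raw = 'It&#x27;s Anna&#x27;s book';

// A string pattern replaces only the first occurrence.
const firstOnly = raw.replace('&#x27;', '\'');
console.log(firstOnly); // → "It's Anna&#x27;s book"

// A regular expression with the g flag replaces every occurrence.
const all = raw.replace(/&#x27;/g, '\'');
console.log(all);       // → "It's Anna's book"
```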

const React = require('react');
const html = React.createElement;

const textToHtml = (input) => {
  const text = input.replace(/&#x27;/g, '\''); // decode every encoded single quote
  if (text && (text.indexOf('\n') !== -1 || text.indexOf('\\n') !== -1)) {
    return (
      html('div', {}, text.split(/\n|\\n/).map((paragraph, i) => {
        return (
          html('p', { key: i }, formatParagraph(paragraph)));
      })));
  } else {
    return (
      html('p', {}, formatParagraph(text)));
  }
};

const formatParagraph = (unformatted, formatted) => {
  const matches = matchAll(unformatted);
  if (matches.length > 0) {
    const targetMatch = matches.sort((a, b) => { return a['index'] - b['index']; })[0];
    const remaining = unformatted.substring(targetMatch['index'] + targetMatch[0].length, unformatted.length);
    const newElement = html(targetMatch['element'], { key: targetMatch.index }, formatParagraph(targetMatch[1]));
    const processed = [formatted, [unformatted.substring(0, targetMatch.index), newElement]];
    return formatParagraph(remaining, processed);
  } else {
    return formatted ? [formatted, unformatted] : unformatted;
  }
};

const matchAll = (input) => {
  const bold = matchElement(input, 'b', /\*\*([ a-z0-9!?.*_]+)\*\*/gmi);
  const italic = matchElement(input, 'i', /__([ a-z0-9!?.*_]+)__/gmi);
  const underLine = matchElement(input, 'u', /_\*([ a-z0-9!?.*_]+)\*_/gmi);
  return [bold, italic, underLine].filter(e => { return e != null; });
};

const matchElement = (input, element, regex) => {
  const match = regex.exec(input);
  if (match) match['element'] = element;
  return match;
};

module.exports = {
  textToHtml
};
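
If you prefer the <br> approach mentioned above, a variant could look roughly like the sketch below. The name textToHtmlWithBreaks is hypothetical, and html is again a small stand-in for React.createElement so the example runs on its own. In the real module you would keep React.createElement and pass each line through formatParagraph.

```javascript
// Stand-in for React.createElement (illustration only).
const html = (type, props, children) => ({ type, props: { ...props, children } });

// Hypothetical variant of textToHtml: a single <p> with a <br> between lines.
const textToHtmlWithBreaks = (input) => {
  const lines = (input || '').split(/\n|\\n/);
  const children = [];
  lines.forEach((line, i) => {
    // Put a keyed <br> in front of every line except the first.
    if (i > 0) children.push(html('br', { key: `br-${i}` }));
    children.push(line); // in the real module: formatParagraph(line)
  });
  return html('p', {}, children);
};

const out = textToHtmlWithBreaks('one\ntwo');
console.log(out.type);                   // → 'p'
console.log(out.props.children.length);  // → 3
console.log(out.props.children[1].type); // → 'br'
```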

Why use this function instead of a fancy feature packed markdown parser?

There are pros and cons to every choice, and this might not be a viable solution for your project's needs. However, I have been working on large node.js projects for a long time, and something that keeps becoming more and more frequent in these mega repositories is what I call the "npm dependency hell". At my current day job we have a monthly chore called "npm outdated". This task is about updating dependencies so that our codebase doesn't end up in an un-updatable state because it is simply too far behind. Usually it takes 1-3 full working days, because several of the updates have deprecated APIs that you were using in 12312 places in your code, and you have to find a way to deal with that. Another thing is security: you don't know what security holes might be pulled in when you do an update. Luckily there are tools such as npm audit to help you monitor your packages for vulnerabilities, but again this might result in more work, as you suddenly have to stop using some module that you were depending on because it has not yet been patched.

Small functions like the one above don't have those disadvantages, and furthermore I think there is a certain quality aspect to software built from code that the developers have actually had their eyes on. Anyway, this is a subject for a blog post of its own, and I will leave it up to you whether you want to use a simple function like this in your code or depend on a fancy module that offers a lot more features (including features you don't need), develops over time, and might cause you headaches and cost you time in the future.

If you are gonna use it and care about code coverage...

If you want to use this function in your project and care about code coverage, don't bother writing the tests yourself when you can simply skip that step and copy mine. The first one's always free.

/* eslint-env mocha */
require('chai').should(); // enables the .should assertion style used below
const format = require('./formatHtml');

describe('format html from text (markdown like)', function () {
  const input = 'Hello world!\nToday is a **happy** day. __Hurra__\n_*Cheers!*_';
  const output = format.textToHtml(input);

  it('should return one div', function () {
    (typeof output).should.equal('object');
    output.should.include({ type: 'div' });
  });

  it('should create 3 paragraphs', function () {
    output.props.children.should.have.length(3);
    output.props.children[0].should.include({ type: 'p' });
    output.props.children[0].props.children.should.equal('Hello world!');
    output.props.children[1].should.include({ type: 'p' });
  });

  it('second paragraph should contain child elements', function () {
    output.props.children[1].props.children[0][0][1][1].should.include({ type: 'b' });
    output.props.children[1].props.children[0][0][1][1].props.children.should.equal('happy');
    output.props.children[1].props.children[0][1][1].should.include({ type: 'i' });
    output.props.children[1].props.children[0][1][1].props.children.should.equal('Hurra');
  });

  it('third paragraph should contain an underlined element', function () {
    output.props.children[2].props.children[0][1][1].should.include({ type: 'u' });
    output.props.children[2].props.children[0][1][1].props.children.should.equal('Cheers!');
  });
});

describe('format html from text (markdown like)', function () {
  const input = '___***WORD!***___';
  const output = format.textToHtml(input);

  it('should return nested elements', function () {
    output.props.children[0][1][1].should.contain({ type: 'i' }); // outer
    output.props.children[0][1][1].props.children[0][1][1].should.contain({ type: 'u' }); // 2nd
    output.props.children[0][1][1].props.children[0][1][1].props.children[0][1][1].should.contain({ type: 'b' }); // inner
    output.props.children[0][1][1].props.children[0][1][1].props.children[0][1][1].props.children.should.equal('WORD!');
  });
});

These tests require mocha and chai as dev-dependencies.