Why we chose Chromium to generate our PDF documents

When we rebuilt our products from scratch, we had to rethink the way we generate our PDF documents.

To make selling easier for our sales representatives, our business needed clear, great-looking estimates and quotes.

We had two needs related to PDF generation:

  1. Great-looking documents
  2. An HTML version of the same documents for the web

And since we had strong in-house expertise in web technologies such as HTML and CSS, we decided to render HTML and convert it to PDF.

So we went for mainstream technologies

Our existing backend stack was built almost entirely with Java and Spring. So after some research we made our choices. To generate the HTML, the most reliable technology seemed to be Thymeleaf. That library proved to be powerful, complete, easy to use and well documented. It did the job very well. To convert this HTML to a PDF file, we went with Flying Saucer. According to some articles, it worked well with Thymeleaf and also had great community support.

Figure: Architecture of the solution using Flying Saucer

Unfortunately, the result was not very satisfying.

We quickly found out Flying Saucer had limitations

That solution held us back in many ways:

  • It did not support SVG assets (and still does not)
  • It did not handle useful CSS3 features such as transforms, flexbox, grid, box shadows and more (and still does not)
  • The documentation was very limited
  • Most of the time, the PDF output was far from the HTML rendering

As it became obvious that we needed a different approach, we started looking for another way to convert HTML to PDF.

We decided to POC a new HTML to PDF converter

Our first thought was to try another HTML to PDF converter that could be used directly in our Java backend. But the libraries we found always had drawbacks we could not accept, such as poor handling of CSS3 properties or paid licences.

Then, after some time researching solutions, we realised that browsers have great engines for converting HTML pages to PDF documents, notably through their print function. So we started discussing using Google Chrome as our conversion engine, through the open-source project it is built on: Chromium. We created a new Node.js microservice that uses Chromium to convert HTML to PDF. The route handler was stupidly simple, fewer than 25 lines of code:

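const childProc = require('child_process');
const crypto = require('crypto');
const fs = require('fs');
const express = require('express');

const app = express();

app.route('/html-to-pdf').post(express.text(), (req, res) => {
  const html = req.body;
  // Random base name shared by the temporary HTML and PDF files
  const filename = crypto.randomBytes(8).toString('hex');
  const filepath = `${__dirname}/${filename}`;
  const htmlfilepath = filepath + '.html';
  const pdffilepath = filepath + '.pdf';
  fs.writeFile(htmlfilepath, html, (writeError) => {
    if (writeError) {
      return res.status(500).send('HTML write error: ' + writeError);
    }
    // Headless Chromium prints the HTML file to the PDF path
    childProc.exec(
      `chromium-browser --headless --use-gl=swiftshader --disable-software-rasterizer --disable-dev-shm-usage --disable-gpu --hide-scrollbars --no-sandbox --print-to-pdf="${pdffilepath}" "${htmlfilepath}"`,
      (error, stdout, stderr) => {
        if (error) {
          return res.status(500).send('Chrome command error: ' + error);
        }
        // Send the PDF, then remove both temporary files
        res.sendFile(pdffilepath, () => {
          fs.unlink(pdffilepath, () => {});
          fs.unlink(htmlfilepath, () => {});
        });
      }
    );
  });
});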
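
To show the shape of the HTTP call such a service expects, here is a minimal client sketch. In our setup the caller would be the Java backend, so this JavaScript version is only an illustration. The names buildConversionRequest, requestPdf and serviceUrl are hypothetical, and the sketch assumes Node.js 18 or later for the global fetch. One detail matters: by default express.text() only parses bodies sent as text/plain, so the HTML is posted with that content type.

```javascript
// Hypothetical helper: builds the fetch options for a POST to /html-to-pdf.
function buildConversionRequest(html) {
  return {
    method: 'POST',
    // express.text() parses text/plain bodies by default
    headers: { 'Content-Type': 'text/plain' },
    body: html,
  };
}

// Hypothetical client: posts the HTML and resolves with the PDF bytes.
// serviceUrl is an assumption, for example the base URL of the microservice.
async function requestPdf(serviceUrl, html) {
  const response = await fetch(
    `${serviceUrl}/html-to-pdf`,
    buildConversionRequest(html)
  );
  if (!response.ok) {
    throw new Error(`Conversion failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

The resulting Buffer can then be written to disk or streamed back to the end user.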

However, we faced some difficulties with the architecture.

Figure: Architecture of the solution using Chromium microservice

The main problem was that when developing locally, the HTML file contained links pointing to localhost addresses that served static files such as CSS or images. The remotely hosted Chromium service could not retrieve those static files, because from its point of view localhost is its own machine, where the files did not exist. So the rendering was incomplete: the CSS was not applied and the images were not included.
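
To make the problem concrete, the following sketch lists the asset URLs in an HTML string that a remote converter could not fetch. It is not part of our service: the function name findLocalAssetUrls and the sample URLs are hypothetical, and the regular expression is a simplification that only looks at quoted href and src attributes.

```javascript
// Hypothetical helper: returns the href/src URLs that point at the
// developer's own machine (localhost or 127.0.0.1).
function findLocalAssetUrls(html) {
  const attributePattern = /\b(?:href|src)\s*=\s*["']([^"']+)["']/gi;
  const localHosts = new Set(['localhost', '127.0.0.1']);
  const found = [];
  for (const match of html.matchAll(attributePattern)) {
    let url;
    try {
      url = new URL(match[1]);
    } catch {
      continue; // relative or malformed URL: ignored in this sketch
    }
    if (localHosts.has(url.hostname)) {
      found.push(url.href);
    }
  }
  return found;
}

// Sample document with two local assets and one public asset (made-up URLs)
const sampleHtml = `
  <link rel="stylesheet" href="http://localhost:8080/css/quote.css">
  <img src="http://localhost:8080/img/logo.svg">
  <img src="https://cdn.example.com/banner.png">
`;
console.log(findLocalAssetUrls(sampleHtml));
```

Every URL this returns resolves on the developer's machine but not on the remote service, which is why the stylesheet and images were missing from the PDF.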

After exploring different solutions, we decided not to over-engineer things just to handle that local problem.

So to develop a new PDF document on our own machines and quickly see the rendering, we could either use Chrome's Print function on the rendered HTML page, or pull the latest version of the service and run it locally so it could access the static files.

The ROI was very substantial

We observed faster document development, since Chrome's print feature let us preview the generated result quickly. Using SVG files for icons simplified development, as we no longer needed to convert every SVG asset to a correctly sized PNG image. Using recent CSS3 features made some designs possible and sped up development. We were relieved to see a much closer match between the web version and the PDF version.

However there is still room for improvement

Even though Chrome can convert an HTML page to a printable PDF, it does not have all the features you might need from a PDF creation service (such as page layout, headers and footers). Flying Saucer, by contrast, implements custom CSS properties and selectors to cover those needs.

Next steps

As of now, it is still too soon for us to decide whether to stick with Flying Saucer, migrate fully to the Chrome solution or explore another path.

If we stick with the Chrome solution, we could use Puppeteer to drive Chrome and have it generate the PDF instead of calling a system command.
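
As a rough idea of what that could look like, here is a minimal sketch using Puppeteer's launch, newPage, setContent and pdf calls. We have not built this: the function names buildPdfOptions and htmlToPdf, the footerLabel parameter, the A4 format and the margins are all assumptions made for illustration. Puppeteer's pdf options include displayHeaderFooter, headerTemplate and footerTemplate, which might cover part of the header and footer needs mentioned earlier.

```javascript
// Builds the options object passed to Puppeteer's page.pdf().
// Format, margins and footer markup are illustrative assumptions.
function buildPdfOptions(footerLabel) {
  return {
    format: 'A4',
    printBackground: true, // keep CSS backgrounds in the PDF
    displayHeaderFooter: true,
    headerTemplate: '<span></span>', // empty header
    // Puppeteer fills in elements with the pageNumber and totalPages classes
    footerTemplate:
      '<div style="font-size:8px;width:100%;text-align:center;">' +
      `${footerLabel} <span class="pageNumber"></span>/<span class="totalPages"></span>` +
      '</div>',
    margin: { top: '15mm', bottom: '20mm', left: '10mm', right: '10mm' },
  };
}

// Hypothetical replacement for the system command: renders HTML to PDF bytes.
async function htmlToPdf(html, footerLabel) {
  // Loaded lazily so the options builder can be used without Puppeteer installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so stylesheets and images are loaded
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf(buildPdfOptions(footerLabel));
  } finally {
    await browser.close();
  }
}
```

Because the HTML is passed in memory, this approach would also remove the temporary files the current handler has to write and clean up.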