rfcdiff – Dependency-Free, Bash Shell Script for Creating a Generic, Side-by-Side Diff in HTML

You’ve Been Here Before

Ask yourself these questions …

  • Have you ever had to share a “compare” (of just about anything) with non-developers, especially non-technical people, and especially from UNIX/Linux?
  • Do they not know the first thing about ‘diff’?  Have they never heard of ViM or an IDE that does it?  Do they really just want something colorful?
  • Do they not know how to use much other than MS Office or an HTML attachment, whatever they can click in Outlook, SharePoint or some CMS?

You know who they are, and you run into them more often than you'd like.  Every time you do, you are stuck with a Windows user, maybe a Mac user, usually one without anything remotely like an Integrated Development Environment (IDE) such as Visual Studio, a simple editor like Vi IMproved (ViM – I personally and professionally cannot live without gvimdiff) or Notepad++ (don’t laugh, it often comes up #1 in Windows developer surveys, no joke), or anything else that gives them much insight into editing files and tagging, highlighting and diving into differences.

Those of us who maintain developer environments and their lifecycles have long had plenty of Web front-ends.  Trac Integrated SCM & Project Management is the age-old, open source Python classic that lets non-developers and even non-technical managers look at project timelines, and even into differences in commits, directory trees and code.  In the age of Agile, DevOps and other buzzwords that actually mean little change (at least from the code lifecycle standpoint), there is the Atlassian stack and its products, which overlap many components with other systems or tie in through plug-ins.

In most of these cases, the managers have a Wiki front-end, like Trac or Confluence, to tie into various mark-up, graphs and other details.  People can go into as much geekdom, or not, as desired.

Developers! Developers! Developers!

But what happens when …

  • You just need to mark up some general text, such as the differences between files, directory trees and other things not under version control or any other management?
  • You don’t have a stack you can use or, better yet, the person who needs to see it doesn’t have a login to that lifecycle system?
  • The person you’re talking to doesn’t understand the first thing about these details, or why they would even need to use ‘that developer software’?

A lot of us infrastructure architects and DevOps gurus have been developers at one point, or had to fill in on a project as one, or just had to deal with the build, target or other management, because the developers weren’t familiar with aspects outside of their IDE, couldn’t go below 4G, or for whatever other reason.  So we often get into the guts and start hacking up some code ourselves.

For syntax highlighting, we’ll start looking at things like pushing pre-processed syntax through GNU Enscript or, in the ’10s, Python’s Pygments or similar tools.  Soon we’re diving into all sorts of dependencies, building them, setting up all sorts of trees and, in general, having a lot of fun pushing through flexible, modular code that does all sorts of cool things and is overkill.  But we do it to re-familiarize ourselves, to update ourselves, and to find out what cool utilities are now out there that use these functions.

So I literally spent some time looking at this over the weekend, getting it set up on my Fedora system and working … all to come into work and realize I was missing a lot on 2014-era Red Hat Enterprise Linux (RHEL) release 7.  Even after adding the Fedora Project’s Extra Packages for Enterprise Linux (EPEL) repository, I was still missing a lot, or stuck on older versions of software like Pygments 1.6 (instead of 2.2), etc.  I did a lot of upstream overkill at home, because I wanted to see it all piecemeal, but it bought me no real capability on an Enterprise platform (and if you’re a developer, please don’t scoff at RHEL … some of us have to maintain a platform for your code 7-10 years after you’ve moved on).  That is why we have these ‘canned’ systems like Trac and Atlassian and others that package it all for us.

And in many cases, uploading internal files or information to the Internet to use one of the many on-line tools is simply not going to be allowed.

Simple Shell Scripts Still Rule

I literally smacked my forehead on Monday around lunchtime when, after firing off Google searches for as many things as I could think of, I came across a GNU Bourne Again Shell (bash) script I wish I had known about years ago … the Internet Engineering Task Force’s (IETF’s) rfcdiff.  Yes, there is the on-line version, but they also offer a completely off-line, standalone script (direct link to the latest) with virtually zero (0) dependencies!  (It’ll probably “just work” on most UNIX flavors with Bash, too.)

There’s no need to grab the full tarball, although if you redistribute the script, yes, grab the whole thing (e.g., the 1.45 tarball) for obvious attribution, license, legal and other reasons.

I am extremely impressed by just how much the tool does on its own, including increasing the width as required (within reason).  It’s actually been around since 2003 (changelog), and has always been designed to produce side-by-side diffs in HTML.  I just never knew of it before today.  For those of you who knew about it, or something like it, more power to you.  But I didn’t.  I literally didn’t.  Why?  Probably because I’m always thinking like a developer.  Sometimes the dead-simple script, one with no dependencies beyond the base OS, just rules.

For those who know their Internet history, the RFC in rfcdiff means exactly what you think it means: ye’olde IETF Request for Comments (RFC), the original means of creating Internet standards before the STD and other processes, although the RFC process still continues.  The IETF wanted a tool to do word comparisons between versions of standards, in a side-by-side format that anyone could view, including people without markup or other structured-content knowledge, and that didn’t require any other tools.

As such, rfcdiff is built around GNU wdiff, a front-end to diff that focuses on word differences.  Newer versions can use hwdiff, while falling back to ye’olde standard diff, which differs little from the original ’74 UNIX V (not System V, but plain, original UNIX 5) lineage.  It’s amazing how well this little tool takes two (2) files and turns all that simple, contextual diff goodness into side-by-side output with annotations instead of that ugly geek speak.  Just look at rfcdiff’s own diff of its latest version (1.45) against the prior one to get an idea of the simple but effective capability this shell script offers.
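The “use the word-level tool if it’s there, otherwise fall back to plain diff” idea is easy to picture in Bash.  The snippet below is not rfcdiff’s actual code; it is a minimal sketch, assuming only the POSIX `command -v` builtin, and the function name `pick_diff_tool` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from rfcdiff): print the name of the best
# available comparison tool, preferring word-level wdiff over line-level diff.
pick_diff_tool() {
    local tool
    for tool in wdiff diff; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "no diff tool found" >&2
    return 1
}

tool=$(pick_diff_tool) && echo "Using: ${tool}"
```

On a stock RHEL box without wdiff installed, this would simply report `diff`, which is the whole point of a graceful fallback.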

Real World Example: Directory Tree Comparison

Windows-only developers really don’t like it when they don’t have a CIFS/SMB share (I might settle for HTTP-WebDAV using Kerberos ticket authorization) to run all sorts of load-inducing, security-defying searches and IDE tools across, especially in this day and age, when even Microsoft’s core infrastructure is moving to Linux and they now have to deal with the lack of direct access and “Where’s my P: (project) drive?”  I’ll put aside the latest ransomware realities of what CIFS/SMB brings to the world (and we also really want to avoid NFS, even NFSv4 without sec=krb5p via GSSAPI, for a similar reason), and why “like a local file system” is never an ideal solution.

For non-developers, and the even less technical, or at least less POSIX (UNIX/Linux) familiar, I still try to feel their pain of not having a locally mounted/mapped drive, one I didn’t give them for good reason.

In the same caring spirit, I often run one-time directory listings for them, using find to output sizes and/or dates with the full paths in, say, a project tree.  I provide these so users are not firing off lots of finds themselves (once they learn of the command, followed by rsync, which changes everything for the worse), possibly across all sorts of systems (let alone NFS mounts but, again, we won’t talk about those).  For example, here’s a typical run across two systems.  For simplicity’s sake (so as not to detract from the focus of the post), I’ll just get size and path (no date).

$ D=$(date +%Y%b%d)             # Date (YYYYmmmdd)
$ Pshort="Dir"                  # Path short (e.g., final subdirectory)
$ Plong="/path/Dir"             # Path long (NOTE: yes, I could use sed to convert slashes into underbars for Pshort or a Pstr)
$ S="server1 server2"           # Server list (used as ${S} below)

$ for s in ${S} ; do
> # Quote the remote command as one string; the redirect runs locally, so the listings land in this machine's /tmp
> ssh "${s}" "find '${Plong}' -mount -printf '%14s\t%p\n'" > "/tmp/find-${s}-${Pshort}-${D}.txt"
> done

The resulting files will be something like /tmp/find-server1-Dir-YYYYmmmdd.txt and /tmp/find-server2-Dir-YYYYmmmdd.txt (with the actual date stamp in place of YYYYmmmdd).
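Before pointing the loop at real servers, it can help to see what the listing actually looks like.  The following is a minimal local sketch, assuming GNU find (for `-printf`) and substituting two scratch directories for `server1` and `server2`, so no `ssh` is involved; the directory layout and file contents are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

D=$(date +%Y%b%d)          # same date stamp as the real run
Pshort="Dir"
work=$(mktemp -d)          # scratch area standing in for two servers

# Build two hypothetical trees that differ only in one file's size.
for s in server1 server2; do
    mkdir -p "${work}/${s}/${Pshort}/sub"
    printf 'hello\n' > "${work}/${s}/${Pshort}/a.txt"
done
printf 'short\n'       > "${work}/server1/${Pshort}/sub/b.txt"
printf 'much longer\n' > "${work}/server2/${Pshort}/sub/b.txt"

# Same find invocation as the ssh loop, run locally.  cd first so the printed
# paths are identical for both "servers", just as /path/Dir would be.
for s in server1 server2; do
    ( cd "${work}/${s}" && find "${Pshort}" -mount -printf '%14s\t%p\n' ) \
        > "${work}/find-${s}-${Pshort}-${D}.txt"
done

ls "${work}"/find-*.txt
```

Each line comes out as a 14-character, right-aligned size, a tab, then the path, which is what the sorting step relies on.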

I usually sort by each column into separate files, for example …

  • Number for column 1 (size)
  • Alphabetical for column 2 (path)

$ Fsize=""
$ Fpath=""
$ for s in ${S} ; do
> f=find-${s}-${Pshort}-${D}
> Fsize="${Fsize} /tmp/${f}-bysize.txt"
> sort -n < /tmp/${f}.txt > /tmp/${f}-bysize.txt
> Fpath="${Fpath} /tmp/${f}-bypath.txt"
> sort -t $'\t' -k 2,2 < /tmp/${f}.txt > /tmp/${f}-bypath.txt    # tab-separated, so paths with spaces stay one key
> done

The resulting files will be something like /tmp/find-server1-Dir-YYYYmmmdd-bysize.txt, /tmp/find-server1-Dir-YYYYmmmdd-bypath.txt, /tmp/find-server2-Dir-YYYYmmmdd-bysize.txt and /tmp/find-server2-Dir-YYYYmmmdd-bypath.txt.
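The two sort keys are easy to check on a few sample lines.  This sketch uses made-up data in the same size-tab-path layout; it assumes Bash (for the `$'\t'` quoting), sets the field separator to a tab so a path containing spaces still sorts as one key, and adds `LC_ALL=C` only so the byte-order result is predictable.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp -d)
# Hypothetical listing: 14-character right-aligned size, a tab, then the path.
printf '%14s\t%s\n' 300 'Dir/zeta.txt' 20 'Dir/my notes.txt' 1000 'Dir/alpha.txt' \
    > "${tmp}/listing.txt"

# Column 1: numeric sort on the size (-n ignores the leading blanks).
sort -n < "${tmp}/listing.txt" > "${tmp}/listing-bysize.txt"

# Column 2: sort on the path only, using the tab as the field separator.
LC_ALL=C sort -t $'\t' -k 2,2 < "${tmp}/listing.txt" > "${tmp}/listing-bypath.txt"

cat "${tmp}/listing-bysize.txt"
cat "${tmp}/listing-bypath.txt"
```

With the default blank-based field splitting, `Dir/my notes.txt` would be cut at the space; the tab separator avoids that.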

But since we’re going to give the user an HTML file with a colorful diff, thanks to rfcdiff, the path sort usually works on its own (it color-highlights any size difference for a given path).
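Why the path sort is usually enough: once both listings are in the same path order, each line pairs with its counterpart from the other server, so a size change shows up as a single changed line rather than as a reordering.  A quick way to preview what will be flagged, assuming only standard `diff` and made-up sample data, is:

```shell
#!/usr/bin/env bash
set -uo pipefail

tmp=$(mktemp -d)
# Hypothetical path-sorted listings from two servers; only sub/b.txt differs in size.
printf '%14s\t%s\n' 6 'Dir/a.txt' 6  'Dir/sub/b.txt' > "${tmp}/server1-bypath.txt"
printf '%14s\t%s\n' 6 'Dir/a.txt' 12 'Dir/sub/b.txt' > "${tmp}/server2-bypath.txt"

# diff exits 1 when the files differ, so don't treat that as an error here.
diff "${tmp}/server1-bypath.txt" "${tmp}/server2-bypath.txt" > "${tmp}/changes.txt"
status=$?
echo "diff exit status: ${status}"   # 0 = identical, 1 = differences, 2 = trouble
cat "${tmp}/changes.txt"
```

Only the `Dir/sub/b.txt` line is reported as changed; rfcdiff presents that same change side by side, in color.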

$ Dopts="--html --stdout"

$ rfcdiff ${Dopts} ${Fpath} > diff-find-${Pshort}-${D}-bypath.html

Now I have a simple, single HTML file, diff-find-Dir-YYYYmmmdd-bypath.html, to send my users: a nice, side-by-side comparison they can just open from their collaboration/e-mail client, with no JavaScript or anything but simple HTML.  I’ve dropped rfcdiff into my ~/bin/ or, on systems where I have root, /usr/local/bin.
