SNOBOL Text Processing and Programming Language

SNOBOL (StriNg Oriented and symBOlic Language) is a family of programming languages originally developed in the mid-1960s, primarily for text processing and string analysis.

A Quick Note About Versions and Implementations

The last stable release of SNOBOL by the original developers was SNOBOL4, in 1967. Books and websites use "SNOBOL" and "SNOBOL4" (and sometimes "Snobol") interchangeably. In anything published after 1967, these all refer to the same (final) version of the language.

There are also a handful of extensions and implementations. Snocone is a preprocessor that adds syntactic sugar to the language, making it easier to use. SPITBOL is a compiler for SNOBOL; this is of particular interest because SNOBOL was originally thought to be uncompilable. There is also the Snowball programming language, which was inspired by and named after SNOBOL.

Because of these and other extensions, some people use the phrase "Vanilla SNOBOL" when referring to code that relies only on the original SNOBOL4 specification, and not on any additional features.

About the Language

SNOBOL was created specifically for text and string manipulation. Because of this, it has an unusual feature: patterns are first-class data objects. This allows patterns themselves to be stored, combined, and manipulated, just like any other data structure. Additionally, strings can be treated as code and evaluated. Together, these features allow for recursive use of patterns and highly complex string processing and analysis. A SNOBOL program can even modify its own code while it runs.
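To make the idea of patterns as ordinary data more concrete, here is a minimal sketch in Python rather than in SNOBOL itself. It is an analogy only, not SNOBOL syntax: the helper names (lit, alt, seq, matches) and the sample patterns are hypothetical, chosen for illustration. Each pattern is a plain value that can be stored in a variable, combined with other patterns, or placed in a data structure.

```python
from typing import Callable, Iterator

# A pattern is an ordinary value: a function from (text, start position)
# to every position at which a match could end.
Pattern = Callable[[str, int], Iterator[int]]

def lit(s: str) -> Pattern:
    """Pattern matching the literal string s."""
    def match(text: str, pos: int) -> Iterator[int]:
        if text.startswith(s, pos):
            yield pos + len(s)
    return match

def alt(*options: Pattern) -> Pattern:
    """Pattern matching any one of the options (alternation)."""
    def match(text: str, pos: int) -> Iterator[int]:
        for option in options:
            yield from option(text, pos)
    return match

def seq(*parts: Pattern) -> Pattern:
    """Pattern matching the parts one after another (concatenation)."""
    def match(text: str, pos: int) -> Iterator[int]:
        if not parts:
            yield pos
            return
        rest = seq(*parts[1:])
        for mid in parts[0](text, pos):
            yield from rest(text, mid)
    return match

def matches(pattern: Pattern, text: str) -> bool:
    """True if the pattern can match the whole of text."""
    return any(end == len(text) for end in pattern(text, 0))

# Patterns are stored in variables and combined like any other data.
colour = alt(lit("red"), lit("green"), lit("blue"))
noun = alt(lit("bird"), lit("fish"))
phrase = seq(colour, lit(" "), noun)

# They can also live in data structures and be looked up at run time.
table = {"colour": colour, "phrase": phrase}
```

With these definitions, matches(phrase, "blue fish") succeeds and matches(phrase, "blue cat") fails. The point is not the matching itself but that colour and noun are reusable values, which is roughly the property the paragraph above describes.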

Patterns in SNOBOL can be simple, like short spans of text or regex-like character-type strings. But they can also be exceedingly complex, like a complete formal description of the grammar of a language. Programming language interpreters can be written in SNOBOL, as can natural language grammar analyzers, spell checkers, and (in theory) translation engines.
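As a rough illustration of a pattern that describes a grammar, the following Python sketch recognizes a toy arithmetic language with nested parentheses, something classical regular expressions cannot express. Everything here is an assumption made for the example: the GRAMMAR table, the rule names, and the helper functions are hypothetical, and the code is written in Python, not SNOBOL. The key point is that a rule may refer to itself, directly or indirectly.

```python
from typing import Iterator, List

# Hypothetical toy grammar. Each rule maps to a list of alternatives;
# each alternative is a sequence of rule names or literal strings.
GRAMMAR = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["(", "expr", ")"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

def ends(symbol: str, text: str, pos: int) -> Iterator[int]:
    """Yield every position where symbol could finish matching from pos."""
    if symbol not in GRAMMAR:
        # Anything that is not a rule name is treated as a literal.
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        yield from ends_seq(alternative, text, pos)

def ends_seq(symbols: List[str], text: str, pos: int) -> Iterator[int]:
    """Match a sequence of symbols, trying every way each one can end."""
    if not symbols:
        yield pos
        return
    for mid in ends(symbols[0], text, pos):
        yield from ends_seq(symbols[1:], text, mid)

def recognises(text: str, start: str = "expr") -> bool:
    """True if the whole text is a sentence of the grammar."""
    return any(end == len(text) for end in ends(start, text, 0))
```

For example, recognises("(1+2)+3") holds while recognises("(1+2") does not. The grammar is deliberately written without left recursion so that this naive backtracking search always terminates.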

SNOBOL was very popular in academic computer science in the 1960s and 1970s, and was used extensively in the humanities through the 1980s. It has largely fallen out of use at this point, in favor of less powerful regular-expression programming in languages like Awk and Perl. There are still a handful of loyal SNOBOL developers out there, and the language has the potential to be just as useful as ever.

Online Tutorials

Tools

Community and Ongoing Learning

  • Yahoo Email Group, for SNOBOL developers and people working with similar text-processing technology;
  • SNOBOL4.com, a website about the language from a company founded by Mark Emmer, writer of several books and tutorials on the language;
  • The SNOBOL listserv.

Books about SNOBOL

Should I learn SNOBOL?

SNOBOL is not a terribly popular language, and there aren't a lot of employers looking for SNOBOL developers. So, from a career advancement standpoint, you are better off focusing on more in-demand languages.

However, if you are interested in text-centric computing (search, translation, natural-language processing, literary analysis), you might want to spend some time with SNOBOL, especially if you've already pushed the boundaries of what can be accomplished with regular expressions.

Other Text Tools

If you're interested in SNOBOL, you'll want to check out some of these other tools for processing and analyzing text.

  • Natural Language Toolkit, a Python platform for working with human language data;
  • Stanford CoreNLP, a suite of Java-based tools for natural language analysis;
  • Awk, a scripting language designed specifically for text processing;
  • Perl, another scripting language, widely considered to have the best regular expression implementation available;
  • ANTLR (ANother Tool for Language Recognition), which can be used for parsing both natural and artificial (computer) languages;
  • Apache OpenNLP, a machine learning toolkit for natural language processing;
  • Apache Lucene, a suite of search software tools in Java and Python;
  • GATE, the General Architecture for Text Engineering, a framework for "solving almost any text processing problem";
  • Prolog, a logic programming language invented for natural language processing;
  • Icon, another text-processing language created by Ralph Griswold after his work on SNOBOL.

You might also want to read Taming Text: How to Find, Organize, and Manipulate It, by Ingersoll, Morton, and Farris. The book provides a great overview of text processing, with examples using several of the software tools listed above.

Finally, check out TAPoR3, a website and online community dedicated to tools for analyzing text.


Further Reading and Resources

We have more guides, tutorials, and infographics related to coding and development:

  • Perl Guide and Resources: this is an excellent guide to getting started with this powerful scripting language.
  • Awk Resources: learn this powerful scripting language available on most computers.
  • Prolog Resources: this will get you started with this iconic logic programming language.

Natural Language Processing Comes to Life!

The science of natural language processing has come a long way since the days of SNOBOL. Find out all about it in our infographic, How to Avoid Falling in Love with a Chatbot. It covers the long history of "thinking" computers — and might even save you from a broken heart!