The Journey to the Perfect Search Engine

It’s interesting. There was a time, about two decades ago now, when most of us didn’t know about the Internet.

Sure, some of us had been exposed to Lycos and AltaVista, but the Internet and, specifically, search engines as they stand now? No way. If we wanted to contact someone, we looked them up in the phone book. If we wanted to research the history of the Byzantine Empire, we went to the library. There was no immediate gratification of going to Google and searching for anything and everything.

These days, search engines are so entrenched in our everyday lives that we rarely think about them anymore. We simply open a browser (or pick up a mobile device), type in what we’re looking for, and a second later we have an answer or solution.

The only time people really pay attention to this idea of a “search engine” anymore is when Google releases a new algorithm update. Those updates are typically a sign to businesses and marketers that it’s time to update their website in order to remain in Google’s good graces for ranking purposes. Consequently, by obeying the laws of search engines, businesses are also looking out for their visitors’ best interests.

At the end of the day, that’s what the search engines ultimately care about: internet users. Their goal is to connect users to the information they need and keep them safe while they do it.

But that’s not what the Internet or search engines initially looked like, or aimed to do, back in the 1960s when this all started. If you’re not familiar with the history of search engines, buckle up. We’re going to take you on a long and winding road through all the search engines that paved the way to Google.

Not as SMART as Google: The Journey to the Perfect Search Engine

Nearly everyone with an internet connection has used a search engine before, but have you ever wondered how the search engine came about? Who invented it? What was the first one? If so, follow along, because we have the whole story.

The Predecessor

  • Name:
    • SMART Information Retrieval System
      • SMART is an acronym for Salton’s Magical Automatic Retriever of Text
  • Released:
    • 1960s
  • Use:
    • Interactive information database
  • Developed by:
    • Gerard Salton
    • Chris Buckley
    • Others at Cornell University
  • Innovations:
    • Vector space model:
      • A method of representing documents and queries as weighted term vectors, so that results can be ranked by how closely they match the query
    • Relevance feedback:
      • A way for users to mark which search results are relevant, so the system can refine the query and improve later results
    • Rocchio classification:
      • A type of classification method that increases search precision
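
The three SMART ideas listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Salton's system: the sample documents, the TF-IDF weighting formula, and the Rocchio constants (alpha, beta, gamma) are all choices made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def vectorize(text, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the collection are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, doc_vecs):
    """Document indexes ordered from most to least similar to the query."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant documents and away from non-relevant ones."""
    new = Counter()
    for t, w in query_vec.items():
        new[t] += alpha * w
    for vec in relevant:
        for t, w in vec.items():
            new[t] += beta * w / len(relevant)
    for vec in non_relevant:
        for t, w in vec.items():
            new[t] -= gamma * w / len(non_relevant)
    return {t: w for t, w in new.items() if w > 0}  # negative weights are clipped

# Made-up sample collection (an assumption for this sketch).
docs = [
    "byzantine empire history",
    "roman empire history",
    "search engine history",
    "byzantine art and architecture",
]
idf = build_idf(docs)
doc_vecs = [vectorize(d, idf) for d in docs]
query = vectorize("byzantine history", idf)

first_pass = rank(query, doc_vecs)
# Relevance feedback: the user marks document 3 as relevant and document 1 as not.
refined = rocchio(query, relevant=[doc_vecs[3]], non_relevant=[doc_vecs[1]])
second_pass = rank(refined, doc_vecs)
print(first_pass, second_pass)
```

After the feedback step, the refined query picks up terms from the document the user liked, so that document moves up the ranking.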

In the Beginning

  • Before the advent of the World Wide Web and search engines, the internet was:
    • A collection of File Transfer Protocol (FTP) sites where users could navigate to find specific shared files
    • Newsgroups where people interacted and distributed information like FAQs.
  • The First One
    • Name:
      • Archie
    • Released:
      • 1990
    • Use:
      • FTP archive index
    • Developed by:
      • Alan Emtage
      • Peter Deutsch
      • Bill Heelan
    • Innovations:
      • Allowed remote users to look through lists of anonymous FTP sites containing:
        • Software
        • FAQs
        • News archives
      • Periodically and automatically updated list of available servers
    • Other Info:
      • In September 1990, Archie had compiled a database of 210 sites
      • Archie contained 2.6 million files (roughly 150 GB of data) in 1992
  • Archie’s Friends
    • Name:
      • Veronica
      • Jughead
    • Released:
      • 1993
    • Use:
      • Gopher index
    • Developed by:
      • Fred Barrie and Steven Foster (Veronica)
      • Rhett Jones (Jughead)
    • Innovations:
      • Both engines were developed to make searching through Gopher servers easier
      • Veronica created indexes of Gopher plain-text files
      • Jughead could use Boolean search terms to look through a single Gopher server
    • Other Info:
      • Even though Archie was not meant to refer to the comic book character, the people who wrote the subsequent Veronica and Jughead thought it would be engaging to continue the theme
      • Backronyms were created for both:
        • Veronica: “Very Easy Rodent-Oriented Netwide Index to Computerized Archives”
        • Jughead: “Jonzy’s Universal Gopher Hierarchy Excavation and Display”
      • Veronica searched through 5,500 Gopher servers and indexed over 10 million items/documents
  • Invention of the web
    • In 1989, Tim Berners-Lee and his team at CERN started work on HTTP:
      • A system for sending and receiving hypertext documents that would link to one another in a kind of web.
      • It was quickly augmented to provide:
        • Greater client-server negotiation
        • Metadata
        • Security
    • Early browsers included:
      • ViolaWWW
      • Erwise
      • MidasWWW
      • Mosaic
    • As the number of web servers grew, the web became the interface for accessing the Internet:
      • New servers were announced under “What’s New” on the NCSA site
        • Many websites provided their own list of “interesting sites.”
      • This central list could not keep up with the growth, which created a need for finding and organizing all the information on the web

No Robot Necessary

  • Name:
    • ALIWEB, which stands for Archie-Like Indexing for the WEB
  • Released:
    • October 1993
  • Format:
    • Self-entry website index
  • Developed by:
    • Martijn Koster
  • Innovations:
    • HTTP equivalent of Archie
    • Didn’t use a web-searching robot
      • Webmasters of participating sites had to post their own index information for each page they wanted to list
        • Advantages:
          • Users could describe their own sites
          • A robot didn’t run around eating up Net bandwidth
        • Disadvantages:
          • Indexing files was complicated for most people
          • The difficulty of use meant a relatively small database
  • Other Info:
    • Its developers tried to offset the complexity by adding other databases into ALIWEB searches, but ALIWEB couldn’t compete with the newer bot-based search engines

The Indexer

  • Name:
    • WebCrawler
  • Released:
    • 1994
  • Use:
    • Crawling website index
  • Developed by:
    • Brian Pinkerton
  • Innovations:
    • First crawler to index entire web pages, rather than just file or website names
  • Other Info:
    • When first released, WebCrawler had documents from over 6,000 servers
    • Five months after its release, it received an average of 15,000 queries per day
    • WebCrawler quickly grew so popular that it was almost unusable during the day
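
Indexing entire pages, rather than just file or site names, is commonly done with an inverted index that maps each word to the pages containing it. The sketch below shows that general idea only; it is not WebCrawler's actual implementation, and the URLs and page text are made up for illustration.

```python
from collections import defaultdict

# Hypothetical pages; the URLs and text are invented for this example.
pages = {
    "http://example.com/a": "History of the Byzantine Empire",
    "http://example.com/b": "Search engine history and web crawlers",
    "http://example.com/c": "Byzantine art",
}

def build_index(pages):
    """Map every word to the set of URLs whose full text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(search(index, "byzantine history"))
```

Because the index covers every word on a page, a query can match text that never appears in a file name or title.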

The Library Spider

  • Name:
    • Lycos
  • Released:
    • 1994
  • Use:
    • Website index
  • Developed by:
    • Michael Mauldin
  • Innovations:
    • The largest library of indexed sites at the time
  • Other Info:
    • Lycos is named after the wolf spider family, Lycosidae (genus Lycosa), because wolf spiders hunt their prey rather than catching it in a web
    • When it launched publicly, Lycos had 54,000 documents available
    • It identified nearly 400,000 documents in one month
    • In five months, Lycos had identified 1.1 million documents
    • By November 1996, its catalog contained 60 million documents

The Word Smiths

  • Name:
    • Excite, originally named Architext
  • Released:
    • 1995
  • Use:
    • Website word index
  • Developed by six Stanford students:
    • Joe Kraus
    • Ben Lutch
    • Ryan McIntyre
    • Martin Reinfried
    • Graham Spencer
    • Mark Van Haren
  • Innovations:
    • Made search more relevant by using the idea of looking at word relationships through statistical analysis, a groundbreaking approach at the time
    • Upon launch, Excite.com had indexed 1.5 million pages, a large number for that time
  • Other Info:
    • Excite had signed major deals with Netscape and Microsoft
    • Excite continued to grow with revenues in excess of $150 million as of 1998
    • Two fellow Stanford students, Google founders Larry Page and Sergey Brin, offered to sell their company to Excite for a million dollars in 1999
      • They were willing to settle for just $750,000
      • Excite declined what would become the largest search engine in history, a $180 billion company
    • Yahoo wanted to buy Excite, but was turned down
    • Excite merged with @Home Network in 1999, and they went bankrupt in 2001

The Proto-Google

  • Name:
    • AltaVista
  • Released:
    • 1995
  • Use:
    • Full-text website index
  • Developed by:
    • Louis Monier
    • Michael Burrows
  • Innovations:
    • Described by History of SEO as “the first searchable full-text database on the world wide web with a simple interface”
    • First search engine to look for:
      • Images
      • Audio
      • Video
    • Created Babel Fish, the first multi-lingual search, which could translate:
      • English
      • French
      • German
      • Italian
      • Portuguese
      • Spanish
      • Russian
  • Other Info:
    • AltaVista means “view from above”
    • In 1996, AltaVista was the largest web index
      • 33GB in size
      • 30 million pages from 225,000 servers
      • Accessed an average of 12 million times per day
        • That’s roughly 140 times per second

The Web Butler

  • Name:
    • Ask Jeeves
  • Released:
    • 1997
  • Use:
    • Natural language website index
  • Developed by:
    • Garrett Gruener
    • David Warthen
  • Innovations:
    • Developed to be a natural language search engine
    • Human editors assisted with some common search queries
  • Other Info:
    • The butler is a reference to Jeeves the valet from P.G. Wodehouse’s Jeeves-Wooster novels
    • In 2010, Ask Jeeves rebranded itself as a Community Question & Answer service

The Champion

  • Name:
    • Google
  • Released:
    • 1998
  • Use:
    • Recursive website index
  • Developed by:
    • Larry Page
    • Sergey Brin
  • Innovations:
    • PageRank created a citation weighting system that evaluated which websites were more trustworthy based on the strength of other websites that linked to them
      • Today, this is the basis for almost all search engines
  • Other Info:
    • “Page” in PageRank refers to Larry Page, not web pages.
    • Due to its focus on backlinks, Google was originally named “BackRub”
    • The first website the Google crawler searched was the Stanford University homepage
    • Google’s index is over 100 million GB in size
    • People use Google to perform over one hundred billion searches every month
      • That’s roughly 40,000 searches per second
    • Google Now:
      • Uses a natural language user interface to:
        • Answer questions
        • Make recommendations
        • Perform actions by delegating requests to a set of web services
      • Is an intelligent personal assistant, accessible:
        • Within the Google mobile search app
        • On the Google Chrome web browser
      • Can proactively deliver information it predicts a user will want, based on that user’s search habits
      • Allows people to use Now cards to get the right information at the right time without having to search for it
        • It automatically organizes information into simple cards that appear just when users need them
        • Users can get commute traffic before work, find popular places nearby, and get their favorite team’s current score
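
The citation weighting behind PageRank, described in the Innovations list above, can be illustrated with a simplified power-iteration sketch. This is a textbook-style approximation and not Google's production algorithm; the four-page link graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page shares its score among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: pages a, b and d all link to c; c links back to a.
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c ends up with the highest score because three pages link to it, and page a outranks b and d because its single inbound link comes from the strongest page.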

The Little Engines That Could

Google doesn’t have many competitors nowadays, but here are two that are trying their best, despite the huge odds:

  • DuckDuckGo
    • Claims to remove all the spam that Google delivers in its results
    • Has a clean interface
    • Doesn’t track users
    • Has far fewer ads than Google
  • Bing
    • Microsoft’s search engine
    • Provides similar results to Google
    • Has a much smaller database of webpages
    • Yahoo! uses Bing for its search engine

While most people think “Google” when they hear “search engine,” there were several different engines before Page and Brin’s web crawler took off. While not many people use Veronica or Lycos today, the internet wouldn’t be what it is without them.

Sources: searchenginehistory.com, sigir.org, csse.monash.edu.au, nlp.stanford.edu, seobythesea.com, groups.google.com, savetz.com, dummies.com, searchenginearchive.com, netlingo.com, searchnetworking.techtarget.com, whatis.techtarget.com, salientmarketing.com, learnthenet.com, ryanmacintyre.com, searchenginepeople.com, todayifoundout.com, thehistoryofseo.com, wiley.com, dictionary.reference.com, mashable.com, archive.wired.com, google.com
