Anyone who uses the internet has used a search engine at some point. We sometimes complain about the results we find or about problems with the world wide web, but a look back at where the internet and search engines were forty years ago shows how much we take today's technology for granted. The advancements have been remarkable and continue every day. Hopefully, the history and evolution of search engine methodologies below will give users a little insight into how search engines developed, who the current competitors are, and how search engines actually work.
Before looking into the history of the search engine, it helps to understand exactly what a search engine is. In the most basic sense, a search engine takes a keyword or phrase and searches its database of billions of web pages for relevant results. Search engines differ largely in the spider they use: the software program that automatically crawls the internet for new and updated information.
Search engines are today’s version of the card catalog. Believe it or not, the idea behind search engines dates back to 1945, when Vannevar Bush published As We May Think, which implored scientists to build a body of knowledge for the entire world to use. He strongly believed that for information to be useful, it must be continuously updated.
Project Xanadu was created in 1960 by Ted Nelson, who also coined the term hypertext in 1963. Project Xanadu’s purpose was to create a computer network with a simple user interface. Nelson’s work inspired the creation of the WWW, or world wide web. Even though he objected to its complex markup code, many concepts of today’s internet are based upon his early work.
The first search engine was not developed, however, until 1990. Alan Emtage at McGill University created a searchable list of FTP files and their addresses, which he originally wished to call “Archives” but which became known as “Archie”. Archie collected files from FTP sites not requiring usernames and passwords, which allowed users to browse one database of files instead of visiting multiple sites. In response to Archie’s popularity, the University of Nevada System Computing Services group created Veronica, which dealt only with plain text files. Jughead was another search engine created to imitate Veronica.
Before the world wide web was created, users shared files through FTP servers. This worked well for small groups, but files eventually became fragmented across servers. Tim Berners-Lee came up with the idea of the web itself. In 1980, Berners-Lee proposed a hypertext-based project to share files among researchers at CERN, which he called ENQUIRE. As CERN was the largest internet node in Europe in 1989, Berners-Lee took the opportunity to combine the internet with his hypertext ideas, creating the WWW. He built the first browser and editor, called WorldWideWeb, on a NeXTSTEP workstation, and he created the first web server, known as httpd, which stands for HyperText Transfer Protocol daemon.
As the internet developed, DNS, the Domain Name System, was created. DNS maps the human-readable addresses found in URLs to IP addresses, the numerical addresses that identify each computer. This mapping is what lets links within web pages point to names rather than numbers. When you follow an address, DNS reads the name, sends the request to a name server, finds the correct IP, and directs you to the requested page.
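The lookup flow described above can be sketched in a few lines of Python. The record table here is a made-up stand-in for the real hierarchy of name servers, and the domain-to-IP pairs are illustrative:

```python
# A toy illustration of a DNS lookup: the resolver reads a domain name,
# consults a record table (a hypothetical stand-in for the name server
# hierarchy), and returns the numeric IP address.
RECORDS = {
    "example.com": "93.184.216.34",
    "localhost": "127.0.0.1",
}

def resolve(domain: str) -> str:
    """Return the IP address registered for a domain, mimicking a DNS query."""
    try:
        return RECORDS[domain]
    except KeyError:
        raise LookupError(f"no record for {domain!r}")

print(resolve("example.com"))  # → 93.184.216.34
```

In a real program, this lookup is a single call to the operating system’s resolver, for example Python’s `socket.gethostbyname`.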
The World Wide Wanderer, created by Matthew Gray, was the first autonomous agent on the internet. It gathered and counted not only web servers but URLs as well. The agent was later called Wandex and set to work building the first web database. Being one of the first of its kind, it encountered a few problems: Wandex would often count a website or URL multiple times, creating an increased load on servers and resources. Many webmasters are also familiar with Berners-Lee’s 1994 contribution to the internet, the World Wide Web Consortium, or W3C.
Martijn Koster came to the rescue by creating ALIWEB, technically the first search engine as users know them today. Instead of just collecting URLs, ALIWEB required webmasters to submit a description of their site. Whereas Wandex did not understand natural language, ALIWEB focused on it, allowing users to find sites based upon a description instead of just an address.
Robots, or spiders, began to grow in popularity. These would search the web to keep databases up to date; some, however, caused more problems than they were worth. Three of these robot search engines made their way to the forefront: JumpStation, which looked at title and header information; the World Wide Web Worm, which focused only on titles; and the Repository-Based Software Engineering Project spider, which was the first to look at keywords.
These search engines still had many bugs to work out. Chiefly, users’ searches had to be extremely specific to produce relevant results. EINet Galaxy, later shortened to just Galaxy, was the first to address this problem by allowing users to browse by category. Because it took a more human approach to searching, its results came much more slowly.
In April of 1994, the popular search engine Yahoo! was born from the minds of David Filo and Jerry Yang. They discovered that their list of favorite sites was receiving many hits each day thanks to its simple categorization and basic summaries of each site.
Google has since become the champion among search engines. With its PageRank system, pages are assigned a value based upon their relevance and importance. PageRank and fast searching quickly made Google the most popular, at least until the next big thing comes along.
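The core idea behind PageRank can be sketched with a few lines of Python: a page’s score is fed by the scores of the pages linking to it, so a link from an important page counts for more. The three-page graph and the damping factor below are illustrative, not Google’s actual data or parameters:

```python
# A minimal power-iteration sketch of the PageRank idea. Each page shares
# its current score equally among the pages it links to; a damping factor
# models a surfer occasionally jumping to a random page.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Sum contributions from every page q that links to p.
            inbound = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - damping) / len(pages) + damping * inbound
        rank = new
    return rank

# Hypothetical web of three pages: A and C link to B, B links back to A.
graph = {"A": ["B"], "B": ["A"], "C": ["B"]}
ranks = pagerank(graph)
```

Here page B ends up with the highest score, since two pages link to it, while C, with no inbound links at all, ranks lowest.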
Various search engines have been created and are still being created every day. In 1994, Yahoo! and Lycos were the main names. In 1995, AltaVista, Excite, and Magellan hit the web. Many users have heard of Dogpile, which was born in 1996. In 1998, Google and MSN Search came into the game. Since then, smaller search engines have appeared, but none have hit as big as those mentioned above; naturally, many of these smaller engines have since merged with larger ones.
Search engines that combine both human-edited and crawler-based methods appear to be the way of the future. By combining the benefits of both approaches, better and better search engines are possible.
Gerard Salton is considered to be the father of modern search technology. Salton and his colleagues created SMART, Salton’s Magic Automatic Retriever of Text. SMART introduced concepts such as the vector space model, term frequency (TF), term discrimination values, inverse document frequency (IDF), and relevance feedback mechanisms.
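Two of the SMART concepts named above, term frequency weighted by inverse document frequency (TF-IDF) and the vector space model, can be sketched together: documents and queries become vectors of TF-IDF weights, and relevance is the cosine of the angle between them. The three tiny documents and the query below are made up for illustration:

```python
import math

# A toy TF-IDF / vector space model sketch. Terms that appear in fewer
# documents get a higher IDF weight, so rare terms dominate the match.
docs = [
    "the web crawler indexes pages",
    "the spider crawls the web",
    "card catalogs help find library books",
]

def tfidf_vectors(documents):
    tokenized = [d.split() for d in documents]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(documents)
    # IDF: log of (total docs / docs containing the term).
    idf = {t: math.log(n / sum(t in doc for doc in tokenized)) for t in vocab}
    vectors = [[doc.count(t) * idf[t] for t in vocab] for doc in tokenized]
    return vocab, idf, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab, idf, vectors = tfidf_vectors(docs)
q_tokens = "the web spider crawls pages".split()
query = [q_tokens.count(t) * idf[t] for t in vocab]
scores = [cosine(query, v) for v in vectors]
```

With this query, the second document scores highest (it shares the rare terms "spider" and "crawls"), while the third, sharing no terms at all, scores zero.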
How do search engines actually work? They consist of three parts. The first is the search engine spider. Spiders follow links on the internet to index new pages or to update existing information, adding what they find to the search engine index, which is the second part of a search engine. When you use a search engine, you are not actually searching the live web; you are searching the engine’s index, much like browsing the index of a book to find information. The last part of a search engine is the search interface and relevancy software.
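The index described above is typically an inverted index: a map from each term to the set of pages that contain it, so a query is answered by set operations on the index rather than by scanning the web. A minimal sketch, with made-up page contents, looks like this:

```python
from collections import defaultdict

# Hypothetical pages a spider might have fetched.
pages = {
    "page1.html": "history of the web",
    "page2.html": "search engine history",
    "page3.html": "web search tips",
}

# Build the inverted index: term -> set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

def search(query):
    """Return pages containing every query term (simple AND semantics)."""
    results = [index.get(term, set()) for term in query.split()]
    return set.intersection(*results) if results else set()

print(sorted(search("web search")))  # → ['page3.html']
```

Only page3.html contains both "web" and "search", so it alone survives the intersection; a real engine would then hand this candidate set to the relevancy software for ranking.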
For every search a user performs, a search engine must check for any specially formatted syntax and for misspelled terms; it will then recommend other search terms in case a word was misspelled or mistyped. Search engines also compare search terms against vertical search databases, which cover areas like news and product searches, and place any relevant links alongside your actual search results. After finding your results, a search engine must rank the pages according to relevance using page content, usage data, and link citation data. A list of relevant ads will also appear next to most search results.
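The "did you mean" step above can be sketched with Python’s standard difflib module, which proposes the closest match to a mistyped term from the index vocabulary. The vocabulary here is a small, made-up sample:

```python
import difflib

# Hypothetical sample of terms known to the search engine's index.
vocabulary = ["search", "engine", "history", "internet", "spider"]

def suggest(term):
    """Return a 'did you mean' candidate for a possibly misspelled term."""
    matches = difflib.get_close_matches(term, vocabulary, n=1)
    return matches[0] if matches else None

print(suggest("serach"))  # → search
```

Production engines use far more sophisticated models (edit distance weighted by query logs, for instance), but the principle, comparing the typed term against known vocabulary, is the same.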
Meta search engines combine the results of multiple search engines. As individual search engines have improved, meta search engines have become less popular. Some, such as HotBot, allow users to choose which search engines are queried, even restricting a search to one engine at a time. Dogpile is currently the most widely used meta search engine.
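The merging step is the heart of a meta search engine: result lists from several engines are fused into one ranking, rewarding pages that appear in more engines and at higher positions. A minimal sketch, with the engine names and result lists stubbed out as hypothetical data, might look like this:

```python
from collections import defaultdict

# Hypothetical ranked result lists from three engines for the same query.
engine_results = {
    "engine_a": ["a.com", "b.com", "c.com"],
    "engine_b": ["b.com", "a.com", "d.com"],
    "engine_c": ["b.com", "c.com"],
}

def merge(results_by_engine):
    """Fuse ranked lists: each appearance adds credit inversely to its position."""
    scores = defaultdict(float)
    for ranking in results_by_engine.values():
        for position, url in enumerate(ranking):
            scores[url] += 1.0 / (position + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = merge(engine_results)  # b.com first: all three engines list it
```

This position-based fusion is one simple choice among many; real meta search engines must also normalize scores across engines that rank on entirely different scales.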
Google, Yahoo, and MSN are the main search engine competitors today. All three offer question answering, news, and video services. Specialty search engines such as YouTube demonstrate vertical search, as do the specialized areas of the three main engines.
Search engine optimization, or SEO, is a term most web users are familiar with. Basically, SEO is the practice of publishing information in a format that makes it appear more relevant to search engine queries. Early SEO relied on descriptive file names, page titles, and meta descriptions. Google revolutionized search engine optimization with its PageRank system, and it has made other changes to present users with a much more relevant set of results, but as of yet it has not released much information about its methods.
Web users have undoubtedly noticed the changes in search engines over the last three decades. Without the strides made by people like Tim Berners-Lee, Matthew Gray, and countless others, the internet would still be a simple list of FTP sites that users would have to browse manually. Search engines have come a long way, and they are still evolving every day.