Today, when people talk about search engines, they usually mean Web search engines. But before the Web became an indispensable part of the Internet, search engines already existed to help users find information on the Net. Programs such as Archie and Veronica indexed the files stored on servers connected to the Internet, drastically reducing the time it took to find programs and documents.
Nowadays, most Internet users limit their searches to Google, so in this article we will focus on the parameters and aspects that will make your brand stand out in Google Search.
Meta tags allow the owner of a page to specify keywords and concepts under which the page will be indexed. This can be helpful, especially in cases in which the words on the page might have double or triple meanings; the meta tags can guide the search engine in choosing which of the several possible meanings for these words is correct.
There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics, but have nothing to do with the actual contents of the page. To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don’t match the words on the page.
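The correlation check described above can be sketched as a simple overlap test between the declared keywords and the words actually on the page. This is only a toy illustration; real engines use far richer signals, and the function and sample data below are invented for the example:

```python
def credible_keywords(meta_keywords, page_text):
    """Keep only the meta keywords that actually appear in the page body."""
    body_words = set(page_text.lower().split())
    return [kw for kw in meta_keywords if kw.lower() in body_words]

kept = credible_keywords(
    ["fishing", "celebrity"],  # keywords declared in the meta tag
    "A guide to fly fishing on mountain rivers",
)
print(kept)  # ['fishing'] -- "celebrity" is rejected: it never appears in the body
```

A real spider would of course normalize punctuation, stem words, and weight matches, but the principle is the same: declared keywords that find no support in the page content are discarded.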
All of this assumes that the owner of a page actually wants it to be included in the results of a search engine’s activity.
Many times, the page’s owner doesn’t want it showing up on a major search engine, or doesn’t want the activity of a spider accessing the page. Consider, for example, a game that builds new, active pages each time sections of the page are displayed or new links are followed. If a Web spider accesses one of these pages, and begins following all of the links to new pages, the game could mistake the activity for a high-speed human player and spin out of control.
To avoid situations like this, the robots exclusion protocol was developed. Its main mechanism is a robots.txt file placed at the root of a site, which tells spiders which paths to leave alone; a related robots meta tag in a page's head section can likewise instruct a spider to neither index the words on the page nor try to follow its links.
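Python's standard library includes a parser for robots.txt rules, which makes it easy to sketch how a well-behaved spider honors the protocol. The example.com URLs and the rule set here are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a site that wants its game pages left alone
rules = [
    "User-agent: *",
    "Disallow: /game/",
]

rp = RobotFileParser()
rp.parse(rules)

# A polite spider checks before fetching each URL
print(rp.can_fetch("*", "http://example.com/game/level1"))  # False
print(rp.can_fetch("*", "http://example.com/about"))        # True
```

In practice a crawler would fetch the live robots.txt with `rp.set_url(...)` and `rp.read()` before crawling the site.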
Once the spiders have completed the task of finding information on Web pages (and we should note that this is a task that is never actually completed — the constantly changing nature of the Web means that the spiders are always crawling), the search engine must store the information in a way that makes it useful.
There are two key components involved in making the gathered data accessible to users:
- The information stored with the data
- The method by which the information is indexed
In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether the word was used once or many times or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the list of search results.
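A slightly richer entry per word might record, for each URL, how often the word appears. The toy inverted index below shows the idea; the documents are hypothetical, and real engines also store word position, weighting from titles and links, and much more:

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to {url: occurrence count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

pages = {
    "a.html": "bed of flowers in the garden bed",
    "b.html": "a truck bed full of gravel",
}
index = build_index(pages)
print(index["bed"])  # {'a.html': 2, 'b.html': 1}
```

Even this small step up from word-plus-URL lets a ranking function prefer pages where a term appears repeatedly over pages where it is mentioned once in passing.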
Building a Search
Searching through an index involves a user building a query and submitting it to the search engine. The query can be quite simple: a single word at minimum.
Building a more complex query requires the use of Boolean operators that allow you to refine and extend the terms of the search. The Boolean operators most often seen are:
- AND: all of the terms joined by it must appear in the results
- OR: at least one of the terms joined by it must appear
- NOT: the term following it must not appear in the results
The searches defined by Boolean operators are literal searches — the engine looks for the words or phrases exactly as they are entered. This can be a problem when the entered words have multiple meanings. “Bed,” for example, can be a place to sleep, a place where flowers are planted, the storage space of a truck or a place where fish lay their eggs.
If you’re interested in only one of these meanings, you might not want to see pages featuring all of the others. You can build a literal search that tries to eliminate unwanted meanings, but it’s nice if the search engine itself can help out.
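Boolean operators map naturally onto set operations over an index that records which pages contain each word. A minimal sketch, with index contents invented for illustration:

```python
# Hypothetical index: word -> set of pages containing it
index = {
    "bed":     {"sleep.html", "garden.html", "truck.html"},
    "flowers": {"garden.html", "florist.html"},
    "truck":   {"truck.html"},
}

# "bed AND flowers" -> set intersection
print(sorted(index["bed"] & index["flowers"]))  # ['garden.html']

# "bed NOT truck" -> set difference
print(sorted(index["bed"] - index["truck"]))    # ['garden.html', 'sleep.html']
```

This also makes the literal-search problem concrete: the intersection keeps only pages containing both exact words, regardless of which meaning of "bed" each page intends.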
One of the areas of search engine research is concept-based searching. Some of this research involves using statistical analysis on pages containing the words or phrases you search for, in order to find other pages you might be interested in.
Obviously, the information stored about each page is greater for a concept-based search engine, and far more processing is required for each search. Still, many groups are working to improve both results and performance of this type of search engine. Others have moved on to another area of research, called natural-language queries.
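One simple statistical building block behind such research is measuring how much two pages' vocabularies overlap, for example with cosine similarity over word-count vectors. This is a toy sketch of the general idea, not any particular engine's method:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Pages sharing vocabulary score higher than pages with none in common
sim = cosine_similarity("bed of flowers", "flower bed border")
print(round(sim, 2))
```

Real concept-based systems go much further (weighting rare words more heavily, analyzing co-occurrence across many pages), which is why they need to store more information per page and spend more processing per search.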
To sum up, working intelligently with search engines and their crawlers is the need of the hour.