Friday, February 29, 2008

Challenges for Search Engines

As technology advances and the Web keeps expanding, the challenges facing search engines are growing quickly as well. Web search queries are currently limited to searching for keywords, which may produce many false positives, especially when using the default whole-page search.

Better results might be achieved by using a proximity search option with a search-bracket to limit matches to a paragraph or phrase, rather than matching random words scattered across large pages. Another alternative is to use human operators to do the researching for 'organic' search engine users.
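
The proximity idea above can be illustrated with a small sketch. It assumes the simplest possible definition of "near": two terms match only if they occur within a given number of words of each other. The function name and the sample text are made up for illustration, and real engines may define proximity differently.

```python
import re

def proximity_match(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    # Split the text into lowercase words so matching ignores case and punctuation.
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every occurrence of one term against every occurrence of the other.
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)

# Hypothetical page text: both terms appear, but far apart.
page = ("Search engines index pages. Much later, an unrelated paragraph "
        "mentions a steam engine used for mining.")

near = proximity_match(page, "search", "engines", 1)
far = proximity_match(page, "search", "mining", 3)
```

A whole-page keyword search would accept this page for "search mining", while the proximity check rejects it because the two words are too far apart.
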
The Web is growing much faster than any present-technology search engine can possibly index. A web page must be reindexed each time it changes. Dynamically generated sites may be slow or difficult to index, or may produce excessive results, perhaps generating 500 times more web pages than average.

Many dynamically generated websites are not indexable by search engines; this phenomenon is known as the invisible web. Some search engines specialize in crawling dynamic content on the invisible web that is password protected or requires forms to be filled out.

To rank higher in the list of search results, some websites use tricks such as stuffing their pages with keywords. This can pollute the search results with pages that contain little or no information matching the keywords, and it pushes the relevant pages further down the list.
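
As a rough illustration of how keyword stuffing might be spotted, the sketch below measures how much of a page is taken up by its single most frequent word. The function names and the 0.3 threshold are arbitrary assumptions for this example, not values used by any real search engine, which rely on many more signals.

```python
import re
from collections import Counter

def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]

def looks_stuffed(text, threshold=0.3):
    """Flag text whose most frequent word exceeds the (assumed) density threshold."""
    density = keyword_density(text, top_n=1)
    return bool(density) and density[0][1] > threshold

# Hypothetical sample texts.
stuffed = "cheap flights cheap flights cheap cheap cheap deals"
normal = "The museum opens at nine and closes at five on weekdays"

stuffed_flag = looks_stuffed(stuffed)
normal_flag = looks_stuffed(normal)
```
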

Working of a Search Engine

A search engine operates in the following sequence:
Web crawling
Indexing
Searching

Web search engines work by storing information about a large number of web pages, which they retrieve from the WWW itself. These pages are retrieved by a Web crawler (sometimes also known as a spider) — an automated Web browser which follows every link it sees.
The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries.
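
The crawling and indexing steps described above can be sketched in a few lines. To keep the example runnable without network access, it assumes a made-up in-memory "web" (FAKE_WEB) in place of real HTTP requests, and it indexes every word of the page text rather than only titles, headings, or meta tags. All names here are illustrative.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://a.example/": ("Search engines crawl the web",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("A crawler follows every link", ["http://a.example/"]),
    "http://c.example/": ("An index maps words to pages", []),
}

def crawl(start_url, fetch):
    """Breadth-first crawl that visits each reachable URL once; returns {url: text}."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:  # avoid revisiting pages and looping forever
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

pages = crawl("http://a.example/", FAKE_WEB.__getitem__)
index = build_index(pages)
```

Looking up a word in the index then gives the set of pages that contain it, which is what the searching step relies on.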

Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, since it is the version that was indexed, so it can be very useful when the content of the current page has been updated and the search terms no longer appear in it.

When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the Boolean operators AND, OR and NOT to further specify the search query. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords.
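
One way to picture the Boolean operators is as set operations on an index that maps each word to the pages containing it. The tiny index and the helper names below are invented for illustration; a real engine would parse full query expressions rather than call one function per operator.

```python
# Hypothetical inverted index: word -> set of page identifiers.
INDEX = {
    "search": {"page1", "page2"},
    "engine": {"page1", "page3"},
    "spam": {"page3"},
}
ALL_PAGES = {"page1", "page2", "page3"}

def lookup(term):
    """Pages containing the term (empty set if the term is not indexed)."""
    return INDEX.get(term, set())

def query_and(a, b):
    """Pages containing both terms: set intersection."""
    return lookup(a) & lookup(b)

def query_or(a, b):
    """Pages containing either term: set union."""
    return lookup(a) | lookup(b)

def query_not(a):
    """Pages that do not contain the term: complement within all known pages."""
    return ALL_PAGES - lookup(a)

both = query_and("search", "engine")
either = query_or("search", "engine")
engine_not_spam = lookup("engine") & query_not("spam")
```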

A search for a particular word or phrase may return many web pages. Some are more popular, authoritative and relevant than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.

Search Engine

A tool designed to search for information on the Internet is known as a search engine. Users can search for web pages, images and other types of files. A search engine works through a mixture of algorithmic and human input. The first tool used for searching on the Internet was Archie. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a database of file names; however, Archie did not index the contents of these files.

Gopher led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine "Archie" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their predecessor.

Wandex was the first web search engine. Another very early search engine, Aliweb, also appeared in 1993, and still runs today. JumpStation (released in early 1994) used a crawler to find web pages for searching, but search was limited to the titles of web pages. One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word in any web page, which has been the standard for all major search engines since. It was also the first one to be widely known by the public.

The Google search engine later rose to prominence. The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link to them, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal.
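
The iterative idea behind PageRank can be sketched as follows. This is a simplified textbook-style version, not Google's actual implementation: the damping factor of 0.85, the fixed number of iterations, and the four-page link graph are all assumptions made for the example. Every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all link to C; C links back to A.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(LINKS)
best = max(ranks, key=ranks.get)
```

In this graph C ends up with the highest rank because three pages link to it, and A outranks B and D because its only inbound link comes from the highly ranked C.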

As of late 2007, Google was by far the most popular Web search engine worldwide. A number of country-specific search engine companies have become prominent.

SEO Techniques

SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing. Some industry commentators classify these methods, and the practitioners who employ them, as either white hat SEO, or black hat SEO.
White hats tend to produce results that last a long time, whereas black hats anticipate that their sites may eventually be banned, either temporarily or permanently, once the search engines discover what they are doing. An SEO technique is considered white hat if it conforms to the search engines' guidelines and involves no deception. As the search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note.
White hat SEO is not just about following guidelines, but is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see. White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to trick the algorithm away from its intended purpose. White hat SEO is in many ways similar to web development that promotes accessibility, although the two are not identical.
Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or that involve deception.
One black hat technique uses text that is hidden, either as text colored similarly to the background, in an invisible div, or positioned off screen. Another method gives a different page depending on whether the page is being requested by a human visitor or a search engine, a technique known as cloaking.
Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether. Such penalties can be applied either automatically by the search engines' algorithms, or by a manual site review.
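
To make the cloaking technique described above more concrete, here is a rough sketch of how it could be detected: request the same page once as an ordinary browser and once as a crawler, then compare the words in the two responses. The serve functions, the user-agent strings, and the 0.5 similarity threshold are all hypothetical stand-ins; this is not how any particular search engine actually performs the check.

```python
import re

def word_set(html):
    """Strip tags crudely and return the set of lowercase words in the page."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(html_a, html_b):
    """Jaccard similarity of the two pages' word sets (1.0 means identical sets)."""
    a, b = word_set(html_a), word_set(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_cloaked(serve, threshold=0.5):
    """Fetch the page as a visitor and as a crawler and compare the results."""
    as_visitor = serve("Mozilla/5.0")
    as_crawler = serve("ExampleBot/1.0")  # made-up crawler user agent
    return similarity(as_visitor, as_crawler) < threshold

# Hypothetical site that serves everyone the same page.
def honest_site(user_agent):
    return "<h1>Garden tools</h1><p>Spades and rakes for sale</p>"

# Hypothetical site that serves crawlers different content.
def cloaking_site(user_agent):
    if "Bot" in user_agent:
        return "<p>cheap loans casino pills cheap loans</p>"
    return "<h1>Garden tools</h1><p>Spades and rakes for sale</p>"

honest_flag = looks_cloaked(honest_site)
cloaked_flag = looks_cloaked(cloaking_site)
```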

SEO - A Marketing Strategy

Recent analyses have shown that searchers scan a search results page from top to bottom looking for a relevant result. Placement at or near the top of the rankings therefore increases the number of searchers who will visit a site. However, more search engine referrals do not guarantee more sales. SEO is not necessarily an appropriate strategy for every website, and other Internet marketing strategies can be much more effective, depending on the site operator's goals.
A successful Internet marketing campaign may drive organic search traffic to pages, but it also may involve the use of paid advertising on search engines and other pages, building high quality web pages to engage and persuade, addressing technical issues that may keep search engines from crawling and indexing those sites, setting up analytics programs to enable site owners to measure their successes, and improving a site's conversion rate.
SEO may generate a return on investment. However, search engines are not paid for organic search traffic, their algorithms change, and there are no guarantees of continued referrals. Due to this lack of guarantees and certainty, a business that relies heavily on search engine traffic can suffer major losses if the search engines stop sending visitors. It is considered wise business practice for website operators to liberate themselves from dependence on search engine traffic.

Search Engine Optimization

Understanding Search Engine Optimization (SEO)
Search Engine Optimization is the process of influencing search engine results and bringing targeted traffic to your web site. It is a way to increase the volume of traffic to a web site from search engines via search results for targeted keywords. Usually, the earlier a site is presented in the search results, or the higher it "ranks", the more searchers will visit that site. SEO can also target different kinds of search, including image search, local search, and industry-specific vertical search engines.
As a marketing strategy for increasing a site's relevance, SEO considers how search algorithms work and what people search for. SEO efforts may involve a site's coding, presentation, and structure, as well as fixing problems that could prevent search engine indexing programs from fully spidering a site. Other, more noticeable efforts may include adding content to a site, ensuring that content is easily indexed by search engine robots, and making the site more appealing to users. Another class of techniques, known as black hat SEO or spamdexing, uses methods such as link farms and keyword stuffing that tend to harm the search engine user experience. Search engines look for sites that employ these techniques and may remove them from their indexes.
The term SEO can also refer to search engine optimizers: consultants who carry out optimization projects on behalf of clients, and employees who perform SEO services in-house. Search engine optimizers may offer SEO as a stand-alone service or as part of a broader marketing campaign. Because effective SEO may require changes to the HTML source code of a site, SEO tactics may be incorporated into web site development and design.