The aim of this project is to develop a simple web-based search engine that demonstrates the main components of a search engine (Web Crawling, Indexing, Query Processing, and Ranking) and the interaction between them.
Web Crawler: A software agent that collects documents from the web. The crawler starts with a seed list of URLs, downloads the documents those URLs identify, and extracts the hyperlinks they contain. The hyperlink URLs are added to the list of URLs to be downloaded, so web crawling is a recursive process.
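The crawling loop described above can be sketched as a breadth-first traversal over a URL frontier. This is a minimal illustration, not the project's actual code: the "web" is simulated with an in-memory map of URL to HTML (a stand-in for real HTTP downloads), and links are extracted with a simple regex rather than a full HTML parser.

```java
import java.util.*;
import java.util.regex.*;

public class Crawler {
    static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");

    // Breadth-first crawl starting from the seed URLs.
    public static List<String> crawl(Map<String, String> web, List<String> seeds) {
        List<String> downloaded = new ArrayList<>();
        Set<String> seen = new HashSet<>(seeds);
        Deque<String> frontier = new ArrayDeque<>(seeds);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            String html = web.get(url);       // in a real crawler: an HTTP GET
            if (html == null) continue;       // skip unreachable pages
            downloaded.add(url);
            Matcher m = LINK.matcher(html);   // extract hyperlinks
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) frontier.add(link);  // enqueue unseen URLs
            }
        }
        return downloaded;
    }

    public static void main(String[] args) {
        Map<String, String> web = new HashMap<>();
        web.put("a", "<a href=\"b\">B</a><a href=\"c\">C</a>");
        web.put("b", "<a href=\"a\">A</a>");
        web.put("c", "");
        System.out.println(crawl(web, List.of("a")));  // prints [a, b, c]
    }
}
```

The `seen` set is what keeps the recursion finite: a URL is enqueued at most once, even when many pages link to it.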
Indexer: The output of the web crawling process is a set of downloaded HTML documents. To answer user queries quickly, the contents of these documents must be indexed in a data structure that records, for each document, the words it contains and their importance.
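A common shape for such a data structure is an inverted index: a map from each word to the documents containing it, with a per-document term count as a simple importance signal. The sketch below assumes that shape and plain integer document ids; the class and method names are illustrative, not the project's.

```java
import java.util.*;

public class Indexer {
    // word -> (docId -> number of occurrences of the word in that document)
    public static Map<String, Map<Integer, Integer>> build(List<String> docs) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            // Lowercase and split on non-word characters: a crude tokenizer.
            for (String word : docs.get(id).toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                index.computeIfAbsent(word, w -> new TreeMap<>())
                     .merge(id, 1, Integer::sum);  // increment the term count
            }
        }
        return index;
    }
}
```

With `build(List.of("the cat sat", "the dog"))`, the posting list for `"the"` maps doc 0 and doc 1 each to a count of 1, so a query term can be answered by a single map lookup instead of scanning every document.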
Query Processor: This module receives search queries, performs the necessary pre-processing, and queries the index for relevant documents.
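One way to realise those two steps is to normalise the query (lowercasing, dropping stop-words) and then intersect the posting lists of the remaining terms, so only documents containing every term match. The stop-word list and the index shape below are assumptions for illustration.

```java
import java.util.*;

public class QueryProcessor {
    // A tiny illustrative stop-word list; a real one would be much larger.
    static final Set<String> STOP = Set.of("the", "a", "an", "of", "and");

    // Pre-processing: lowercase, tokenize, drop stop-words.
    public static List<String> preprocess(String query) {
        List<String> terms = new ArrayList<>();
        for (String t : query.toLowerCase().split("\\W+"))
            if (!t.isEmpty() && !STOP.contains(t)) terms.add(t);
        return terms;
    }

    // Conjunctive (AND) retrieval: intersect the terms' posting lists.
    public static Set<Integer> match(Map<String, Set<Integer>> index, String query) {
        Set<Integer> result = null;
        for (String term : preprocess(query)) {
            Set<Integer> postings = index.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(postings);
            else result.retainAll(postings);  // keep docs containing every term
        }
        return result == null ? Set.of() : result;
    }
}
```

For example, the query "the cat and dog" reduces to the terms `cat` and `dog`, and only documents appearing in both posting lists are returned.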
Phrase Searching: When quotation marks are placed around a group of words, the engine searches for them as an exact phrase rather than as independent terms.
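Exact-phrase matching is typically supported by also storing word positions, so a phrase matches when its words occur at consecutive positions in the same document. The following is a minimal sketch of that idea for a single document; it is not the project's implementation.

```java
import java.util.*;

public class PhraseSearch {
    // Positions of each word within one document (a positional index entry).
    public static Map<String, List<Integer>> positions(String doc) {
        Map<String, List<Integer>> pos = new HashMap<>();
        String[] words = doc.toLowerCase().split("\\W+");
        for (int i = 0; i < words.length; i++)
            pos.computeIfAbsent(words[i], w -> new ArrayList<>()).add(i);
        return pos;
    }

    // True if the quoted phrase occurs verbatim in the document.
    public static boolean containsPhrase(String doc, String phrase) {
        Map<String, List<Integer>> pos = positions(doc);
        String[] terms = phrase.toLowerCase().split("\\W+");
        // Try every position of the first term as a candidate start.
        for (int start : pos.getOrDefault(terms[0], List.of())) {
            boolean ok = true;
            for (int j = 1; j < terms.length; j++)
                if (!pos.getOrDefault(terms[j], List.of()).contains(start + j)) {
                    ok = false;
                    break;  // the j-th word is not at the expected position
                }
            if (ok) return true;
        }
        return false;
    }
}
```

Note the difference from plain AND retrieval: "quick brown" matches "the quick brown fox" but not "brown the quick", even though both documents contain both words.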
Ranker: This module sorts the matching documents by their popularity and their relevance to the search query.
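Combining the two signals can be as simple as a weighted sum of a relevance score and a precomputed popularity value (e.g. a link-based score in the spirit of PageRank). The sketch below uses raw term frequency as the relevance score and illustrative 0.7/0.3 weights; both choices are assumptions, not the project's actual formula.

```java
import java.util.*;

public class Ranker {
    // Return document ids sorted by descending combined score.
    public static List<Integer> rank(List<String> docs, double[] popularity, String query) {
        String[] terms = query.toLowerCase().split("\\W+");
        double[] score = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            List<String> words = Arrays.asList(docs.get(i).toLowerCase().split("\\W+"));
            double tf = 0;  // relevance: how often the query terms occur
            for (String t : terms) tf += Collections.frequency(words, t);
            // Illustrative weighting of relevance vs. popularity.
            score[i] = 0.7 * tf + 0.3 * popularity[i];
        }
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) ids.add(i);
        ids.sort((a, b) -> Double.compare(score[b], score[a]));  // best first
        return ids;
    }
}
```

In practice the relevance term would usually be TF-IDF rather than a raw count, so that frequent, uninformative words do not dominate the score.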
Java EE, JSP & MySQL Server
I was part of a four-member team, and my role was: