Search Engine
Score  1st Place | Distinction+ (114%)
Course  Advanced Programming Techniques
Year  2012

Description
The aim of this project is to develop a simple web-based search engine that demonstrates the main features of a search engine (Web Crawling, Indexing, Query Processing and Ranking) and the interaction between them.

Web Crawler: A software agent that collects documents from the web. The crawler starts with a list of seed URLs, downloads the documents they identify, and extracts hyperlinks from them. The hyperlink URLs are added to the list of URLs to be downloaded, which makes web crawling a recursive process.
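
A minimal sketch of that recursive crawling loop, written as a queue-based traversal; the class name, link-extraction regular expression and page limit below are illustrative assumptions, not the project's actual code:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Minimal breadth-first crawler: downloads each page, extracts href links,
    // and enqueues unseen URLs until the page limit is reached.
    public class SimpleCrawler {
        private static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

        public static Set<String> crawl(String seed, int maxPages) {
            Set<String> visited = new HashSet<>();
            Deque<String> frontier = new ArrayDeque<>();
            frontier.add(seed);
            while (!frontier.isEmpty() && visited.size() < maxPages) {
                String url = frontier.poll();
                if (!visited.add(url)) continue;              // skip already-downloaded pages
                Matcher m = LINK.matcher(download(url));
                while (m.find()) {
                    String link = m.group(1);
                    if (!visited.contains(link)) frontier.add(link);   // newly discovered URL
                }
            }
            return visited;
        }

        private static String download(String url) {
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()))) {
                String line;
                while ((line = in.readLine()) != null) sb.append(line).append('\n');
            } catch (Exception e) {
                // unreachable pages are simply skipped
            }
            return sb.toString();
        }
    }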

Indexer: The output of the web crawling process is a set of downloaded HTML documents. To answer user queries quickly, the contents of these documents have to be indexed in a data structure that records the words contained in each document and their importance.
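
The sketch below shows a toy inverted index of the kind described above; the class and method names are assumptions for illustration, with plain term frequency standing in for the importance score:

    import java.util.HashMap;
    import java.util.Map;

    // Toy inverted index: maps each word to the documents containing it,
    // together with a simple importance score (term frequency).
    public class InvertedIndex {
        private final Map<String, Map<String, Integer>> index = new HashMap<>();

        public void addDocument(String url, String text) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                index.computeIfAbsent(word, w -> new HashMap<>())
                     .merge(url, 1, Integer::sum);      // count occurrences per document
            }
        }

        // Documents containing the word, with their term frequencies.
        public Map<String, Integer> lookup(String word) {
            return index.getOrDefault(word.toLowerCase(), Map.of());
        }
    }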

Query Processor: This module receives search queries, performs necessary pre-processing and queries the index for relevant documents.
Phrase Searching: Words enclosed in quotation marks are generally searched for as an exact phrase rather than as separate terms.
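
A small illustration of this pre-processing step, assuming a hypothetical QueryProcessor class: a query wrapped in quotation marks is kept as a single phrase, while anything else is split into individual terms.

    import java.util.Arrays;
    import java.util.List;

    // Simple query pre-processing: quoted queries become one phrase term,
    // unquoted queries are split on whitespace into separate terms.
    public class QueryProcessor {
        public static List<String> parse(String query) {
            String q = query.trim().toLowerCase();
            if (q.startsWith("\"") && q.endsWith("\"") && q.length() > 1) {
                return List.of(q.substring(1, q.length() - 1));   // whole phrase as one term
            }
            return Arrays.asList(q.split("\\s+"));                // separate search terms
        }

        public static void main(String[] args) {
            System.out.println(parse("web search engine"));      // [web, search, engine]
            System.out.println(parse("\"web search engine\""));  // [web search engine]
        }
    }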

Ranker: The ranker module sorts documents based on their popularity and relevance to the search query.
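
A brief sketch of such a ranking step; the 0.7/0.3 weighting of relevance and popularity below is an illustrative assumption, not the project's actual scoring formula:

    import java.util.Comparator;
    import java.util.List;

    // Ranks results by a weighted combination of query relevance (e.g. term
    // frequency) and page popularity (e.g. a PageRank-like score).
    public class Ranker {
        public record Result(String url, double relevance, double popularity) {}

        public static List<Result> rank(List<Result> results) {
            return results.stream()
                    .sorted(Comparator.comparingDouble(
                            (Result r) -> 0.7 * r.relevance + 0.3 * r.popularity).reversed())
                    .toList();
        }
    }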


Using
Java EE, JSP & MySQL Server


My Role
I was part of a team of four members, and my role covered:
  1. Team leader.
  2. Database: Participating in designing the ER diagram of the search engine.
  3. Database: Implementing the entity and session beans.
  4. Crawler: The complete recursive crawling operation.
  5. Indexer: The lookup function that takes a URL and returns the list of keywords through which that URL appears in the search results.
  6. User Interface: Designing and developing the search engine's interface.
  7. Participating in the main flow of all servlets for the Crawler, Indexer, Ranker & Lookup.