WEBTracker: A Web Crawler for Maximizing Bandwidth Utilization

Corresponding Author : Md. Ruhul Amin (shajib-cse@sust.edu)

Authors : Mohiul Alam Prince (maprince@gmail.com), Md. Akter Hussain (akter.1985@yahoo.com)

Keywords : WEBTracker, Web Crawler, Information Retrieval, World Wide Web

Abstract :

The most challenging part of a web crawler is downloading content fast enough to fully utilize the available bandwidth while processing the downloaded data quickly enough that the downloader never starves. Our scalable web crawling system, named WEBTracker, has been designed to meet this challenge and can be used efficiently in a distributed environment to maximize downloading. WEBTracker has a Central Crawler Server that administers all the crawler nodes. At each crawler node, a Crawler Manager runs the downloader and manages the downloaded content. The Central Crawler Server and its Crawler Managers are members of a Distributed File System, which ensures synchronized distributed operation of the system. In this paper, we concentrate only on the architecture of a web crawling node, which is managed by the Crawler Manager. We show that our crawler architecture makes efficient use of the allocated bandwidth, reduces the processor load required for processing downloaded content, and makes efficient use of run-time memory.
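The paper itself presents this design at the architectural level rather than as code. Purely as an illustration of the principle the abstract describes (downloader threads that are never blocked by content processing), the following is a minimal Python sketch of a producer-consumer arrangement; the names frontier, downloaded, downloader and processor are hypothetical and do not correspond to WEBTracker's actual implementation.

    import queue
    import threading
    import urllib.request

    # Hypothetical sketch: downloader threads pull URLs from a frontier queue
    # and hand raw pages to a separate processing thread, so fetching is never
    # blocked by parsing or storage work.
    frontier = queue.Queue()                 # URLs waiting to be fetched
    downloaded = queue.Queue(maxsize=1000)   # raw pages awaiting processing

    def downloader():
        """Fetch URLs continuously; never wait on the processor."""
        while True:
            url = frontier.get()
            if url is None:                  # shutdown signal
                break
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    downloaded.put((url, resp.read()))
            except OSError:
                pass                         # skip unreachable URLs
            finally:
                frontier.task_done()

    def processor():
        """Consume pages off the queue; extract links, store content."""
        while True:
            item = downloaded.get()
            if item is None:
                break
            url, body = item
            # ... parse body, store content, enqueue discovered URLs on frontier ...
            downloaded.task_done()

    threads = [threading.Thread(target=downloader, daemon=True) for _ in range(8)]
    threads.append(threading.Thread(target=processor, daemon=True))
    for t in threads:
        t.start()

    frontier.put("http://example.com/")      # seed URL (placeholder)

In this arrangement the bounded downloaded queue absorbs bursts from the downloaders, which is one simple way to keep bandwidth saturated while processing proceeds independently.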

Published on December 31st, 2012 in Volume 16, Issue 1, Applied Sciences and Technology