Crawling the Web with Limited Memory
Abstract: Search engines rely on Web crawlers to create a Web index, by exploring the Web graph, downloading pages, and finding links to new pages to be explored. At any given moment, there are a number of pages waiting to be downloaded in the crawler queue. We study the growth of the queue of pending pages during a crawl of a large subset of the Web. In a normal breadth-first crawler, the queue of pending pages quickly grows very large. We present a strategy for managing the pending queue that reduces its maximum size by 50% while preserving the coverage and quality of the pages visited. This can be applied in general Web search as well as topic-specific crawling, peer-to-peer search, on-demand Web crawling, and other environments in which memory usage has to be kept to a minimum.
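To illustrate the setting described in the abstract, the following minimal Python sketch shows a breadth-first crawl over a link graph with a bounded pending queue. It is not the paper's strategy: the names (bounded_bfs_crawl, max_queue) are illustrative, the link graph stands in for fetching pages and extracting links, and the policy of simply dropping newly discovered links once the queue is full is a naive placeholder for the paper's more careful management, which aims to preserve coverage and page quality.

from collections import deque

def bounded_bfs_crawl(graph, seeds, max_queue=1000):
    """Breadth-first crawl with a pending queue capped at max_queue entries.

    graph: dict mapping a URL to the list of URLs it links to
           (a stand-in for downloading a page and parsing its links).
    seeds: starting URLs for the crawl.
    """
    visited = set(seeds)          # URLs already discovered
    pending = deque(seeds)        # queue of pages waiting to be downloaded
    crawl_order = []
    while pending:
        url = pending.popleft()
        crawl_order.append(url)                 # "download" the page
        for link in graph.get(url, []):         # links found on the page
            if link not in visited:
                visited.add(link)
                if len(pending) < max_queue:    # enforce the memory limit
                    pending.append(link)
                # else: link is dropped; the paper's strategy is more
                # selective about what stays in the pending queue
    return crawl_order

# Example with a toy link graph:
# toy = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
# print(bounded_bfs_crawl(toy, ["a"], max_queue=2))

In an unbounded breadth-first crawl the pending queue grows roughly with the number of out-links discovered but not yet visited, which on the Web becomes very large very quickly; capping it as above trades queue memory against which pages are ever reached, which is the trade-off the paper's strategy is designed to manage.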
Subject: Computer Science