EagleEye basic search engine

One of my university courseworks was to develop a working search engine consisting of modules such as a text analyzer, stemmer, tokenizer, web crawler and so on. I'm not 100% satisfied with the result: I learned a lot while building it, but was too lazy to go back and correct or improve my design. Anyway, you can find the documentation, executables and full source in this article.


EagleEye was developed for the ENG554 course at the University of Portsmouth by Raivis Strogonovs. The task of the course was to develop a working search engine with 5 basic modules: tokenizer, stemmer, inverted file builder, web crawler and GUI. The search engine more or less follows the book "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze.

It incorporates the following features:

  • Web crawler (multi-threaded)
  • Inverted index builder (multi-threaded)
  • Searching with custom VSM scoring
  • Text Analyzer - tokenizer and stemmer
  • Spell Checker
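To give an idea of how the tokenizer, inverted index and vector space model (VSM) scoring fit together, here is a minimal sketch. It assumes a naive lower-casing tokenizer, no stemming, and the textbook tf-idf weighting without length normalisation. EagleEye's own "custom" scoring is not described in this article, and the class name `TinyIndex` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not EagleEye's actual classes.
final class TinyIndex {
    // term -> (document id -> how often the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Naive tokenizer: lower-case the text and split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Adds a document to the inverted index and returns its id.
    int add(String text) {
        int id = docCount++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    // Scores each matching document by the sum of tf-idf weights of the query terms.
    Map<Integer, Double> search(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term never seen while indexing
            }
            double idf = Math.log10((double) docCount / docs.size());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                double tf = 1 + Math.log10(e.getValue());
                scores.merge(e.getKey(), tf * idf, Double::sum);
            }
        }
        return scores;
    }
}
```

Sorting the returned map by score in descending order gives the ranked result list. A term that occurs in every document gets an idf of zero here, so it does not influence the ranking.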

Operation manual

Most people should be able to use EagleEye without reading the operation manual, but to begin using the search engine you first have to press the "Start Eagling" button. It starts the web crawler and indexer threads.

If you want to configure the indexer and web crawler, go to "Tools/Options" before you press "Start Eagling". There you can set the following:

  • Indexer Threads
  • Crawler Threads
  • Crawler Timeout in ms - basically the delay between crawling the same host
  • Crawler depth - limits how many child links can be accessed
  • Seeds - the initial websites from which the crawlers start crawling
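The timeout and depth options can be pictured with a small sketch. It assumes that the timeout is a minimum delay between two requests to the same host and that depth counts link levels starting from a seed (seed = 0). Both readings are assumptions, and `CrawlPolicy` and its methods are hypothetical names, not EagleEye's code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "Crawler Timeout" and "Crawler depth" options.
final class CrawlPolicy {
    private final long hostDelayMs; // assumed meaning of "Crawler Timeout in ms"
    private final int maxDepth;     // assumed meaning of "Crawler depth"
    // host name -> time of the last fetch from that host, in milliseconds
    private final Map<String, Long> lastFetch = new HashMap<>();

    CrawlPolicy(long hostDelayMs, int maxDepth) {
        this.hostDelayMs = hostDelayMs;
        this.maxDepth = maxDepth;
    }

    // Seeds have depth 0; a link found on a page of depth d has depth d + 1.
    boolean withinDepth(int depth) {
        return depth <= maxDepth;
    }

    // How long a crawler thread should still wait before contacting this URL's host.
    synchronized long waitMillis(String url, long nowMs) {
        Long last = lastFetch.get(URI.create(url).getHost());
        if (last == null) {
            return 0; // host not contacted yet
        }
        return Math.max(0, last + hostDelayMs - nowMs);
    }

    // Remembers when this URL's host was last contacted.
    synchronized void recordFetch(String url, long nowMs) {
        lastFetch.put(URI.create(url).getHost(), nowMs);
    }
}
```

The methods are synchronized because several crawler threads would share one policy object. A thread would call `waitMillis(url, System.currentTimeMillis())`, sleep for the returned time, fetch the page and then call `recordFetch`.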

After you have configured the engine and crawled a small portion of the World Wide Web, press the "Learn English" button so that EagleEye can learn English spelling. Once it has learned English, it notifies you in the search bar.
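How EagleEye learns spelling is not described here. One common approach, shown below purely as an assumption, is to count words in a body of text and to correct an unknown word to the most frequent known word that is one edit away. `TinySpeller` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical frequency-based spell checker; not EagleEye's actual implementation.
final class TinySpeller {
    // known word -> number of times it was seen while learning
    private final Map<String, Integer> counts = new HashMap<>();

    // "Learns" spelling by counting every word in the given text.
    void learn(String text) {
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    // Returns the word itself if known, otherwise the most frequent known word
    // within one edit, otherwise the (lower-cased) word unchanged.
    String correct(String word) {
        String w = word.toLowerCase();
        if (counts.containsKey(w)) {
            return w;
        }
        String best = w;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount && withinOneEdit(w, e.getKey())) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // True if a and b differ by at most one insertion, deletion or substitution.
    static boolean withinOneEdit(String a, String b) {
        if (Math.abs(a.length() - b.length()) > 1) {
            return false;
        }
        int i = 0, j = 0, edits = 0;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) {
                i++;
                j++;
                continue;
            }
            if (++edits > 1) {
                return false;
            }
            if (a.length() > b.length()) {
                i++;            // a has one extra character
            } else if (a.length() < b.length()) {
                j++;            // b has one extra character
            } else {
                i++;            // same length: treat as a substitution
                j++;
            }
        }
        // Any leftover characters at the end count as edits too.
        return edits + (a.length() - i) + (b.length() - j) <= 1;
    }
}
```

Scanning the whole vocabulary for every correction is slow for large word lists, but it keeps the sketch short.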

At this stage you are ready to do some searching: just enter your query in the search bar and press Enter or the "Fly!" button. This opens a new panel that displays the search results, if there are any. If there are more than 10 results, they are split into pages. Once you have done your first search you can't return to the home panel, but you can carry on searching from the results panel. It works the same way as the home panel: enter your query in the search bar and press Enter or "Fly!".
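Splitting results into pages of 10 comes down to a little index arithmetic. The following is a minimal sketch; `Pager` and its methods are hypothetical names and not taken from the EagleEye source.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper that cuts a ranked result list into pages of 10.
final class Pager {
    static final int PAGE_SIZE = 10;

    // Number of pages needed for the given number of results (rounded up).
    static int pageCount(int totalResults) {
        return (totalResults + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Returns the results for a zero-based page index, or an empty list if out of range.
    static <T> List<T> page(List<T> results, int pageIndex) {
        int from = pageIndex * PAGE_SIZE;
        if (pageIndex < 0 || from >= results.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + PAGE_SIZE, results.size());
        return results.subList(from, to);
    }
}
```

With 23 results this gives three pages: two full pages and a last page holding the remaining 3 results.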

If for some reason you want to recrawl the World Wide Web, you can do so as many times as you wish: just press the "Start Eagling" button.

Warning
The web documents are kept in RAM and there is no check on how much memory the search engine uses. Windows has a limit of 1 GB per application; if EagleEye exceeds that, it will crash. On Linux and Mac the limit may vary.
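A check of this kind could look roughly like the sketch below, which uses the standard `java.lang.Runtime` methods to see how full the Java heap is. `HeapGuard` is a hypothetical name, and the idea of stopping the crawler at a threshold is an assumption, not something EagleEye does.

```java
// Hypothetical guard that a crawler could consult before storing another document.
final class HeapGuard {
    // Fraction of the maximum Java heap that is currently in use (0.0 to 1.0).
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    // True while heap usage is below the given fraction, e.g. 0.8 for 80%.
    static boolean hasRoom(double limit) {
        return usedFraction() < limit;
    }
}
```

A crawler thread could stop fetching new pages once `HeapGuard.hasRoom(0.8)` returns false. If the limit being hit is the JVM's maximum heap size rather than an operating system limit (an assumption), it can be raised with the standard `-Xmx` option, for example `java -Xmx2g -jar EagleEye.jar`.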
 
 

Files:

Documentation

Download: EagleEye.jar (3.05M)

Download Source: EagleEye.zip (12.67M)

 

 



