A web crawler that notifies you through a Telegram bot when new links appear on a website (with basic automatic login for ETHZ Moodle websites).

thieme55/TelegramWebCrawler

TelegramWebCrawler

A web crawler that uses a Telegram bot as its interface: whenever new links appear on a monitored website, the bot sends a notification. It also supports ETHZ Moodle websites that require logging in.
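Detecting "new links" amounts to a set difference between the links seen on earlier crawls and the links found now. The sketch below illustrates that idea with only the JDK; the class `LinkTracker` and its method are hypothetical and not taken from this project. The choice to report nothing on the very first crawl is also an assumption, made so that a page's existing links do not trigger a flood of notifications. In a real crawler the current links would come from parsing the page, for example with jsoup's `Jsoup.connect(url).get().select("a[href]")`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: remembers which links have already been seen
// and returns only the ones that are new since the previous crawl.
class LinkTracker {
    private final Set<String> seen = new LinkedHashSet<>();
    private boolean firstCrawl = true;

    // Returns the links in `current` that were not present in any earlier call.
    // Assumption: the first call only records links and reports none.
    List<String> newLinks(Set<String> current) {
        List<String> fresh = new ArrayList<>();
        for (String link : current) {
            // seen.add(...) is always evaluated first and returns true only for unseen links
            if (seen.add(link) && !firstCrawl) {
                fresh.add(link);
            }
        }
        firstCrawl = false;
        return fresh;
    }
}

class LinkTrackerDemo {
    public static void main(String[] args) {
        LinkTracker tracker = new LinkTracker();
        tracker.newLinks(Set.of("https://example.org/a", "https://example.org/b"));
        List<String> fresh = tracker.newLinks(
                Set.of("https://example.org/a", "https://example.org/b", "https://example.org/c"));
        System.out.println(fresh); // prints [https://example.org/c]
    }
}
```

Each entry in the returned list would then be sent to the Telegram channel as one notification.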

Dependencies:

  • jsoup 1.13
  • htmlunit 2.37

Setup

  1. You need a Telegram bot, which can be created with the @BotFather Telegram bot. BotFather gives you an API token for the bot; this token must be entered in the ETHCrawler.java file.
  2. Create a Telegram channel, add your personal bot, make the bot an admin, and allow it to send messages.
  3. Add your Telegram channel name to the ETHCrawler.java file.
  4. Add the URLs you want to monitor to the ETHCrawler.java file.
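The bot token and channel name from the steps above are exactly what the Telegram Bot API's `sendMessage` method needs: a bot that is an admin of a public channel can post to it by passing the channel's `@username` as `chat_id`. The sketch below shows that request using the JDK's `java.net.http` client (Java 11+). The class and method names are hypothetical and may not match how ETHCrawler.java does it; the token and channel values are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical notifier: posts a text message to a Telegram channel
// through the Bot API's sendMessage method.
class TelegramNotifier {
    private final String token;   // API token issued by @BotFather (placeholder)
    private final String channel; // public channel username, e.g. "@mychannel" (placeholder)
    private final HttpClient http = HttpClient.newHttpClient();

    TelegramNotifier(String token, String channel) {
        this.token = token;
        this.channel = channel;
    }

    // Builds https://api.telegram.org/bot<token>/sendMessage?chat_id=...&text=...
    // with both query parameters URL-encoded.
    static URI sendMessageUri(String token, String channel, String text) {
        String query = "chat_id=" + URLEncoder.encode(channel, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create("https://api.telegram.org/bot" + token + "/sendMessage?" + query);
    }

    // Sends the message and returns the HTTP status code (200 on success).
    int send(String text) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(sendMessageUri(token, channel, text))
                .GET()
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Usage would look like `new TelegramNotifier("<your token>", "@yourchannel").send("New link: ...")`. Keeping the URI construction in a separate static method makes it easy to check the request without contacting Telegram.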

Moodle setup (optional)

  1. If you want to monitor Moodle websites that require logging in, provide your login credentials in the ETHCrawler.java file.

Docker

The project can be run with Docker. No pre-built image is available yet, but you can build one yourself.

Use the following command to build and run the project in a container:

docker run --rm -it $(docker build -q .)

Remarks

  • The Telegram API limits how many messages a bot can send in a short period of time. This may prevent using multiple Telegram channels with the same bot (I could not get it to work, even with a delay between messages).
  • The Moodle login could break at any time, because LogInBot.java locates page elements with hard-coded relative and absolute XPaths, which stop matching whenever the page layout changes.
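To illustrate why hard-coded XPaths are fragile, the sketch below uses the JDK's built-in `javax.xml.xpath` API on a small, invented login form. It is not the real ETHZ login page, and LogInBot.java uses HtmlUnit rather than this API; the class name and markup are hypothetical. An absolute path stops matching as soon as a wrapper element is added, while a path anchored on the field's `name` attribute still finds it (it would still break if the attribute itself were renamed).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Demonstrates how a layout change breaks an absolute XPath but not an
// attribute-based one. Both markup snippets are invented examples.
class XPathFragility {
    static final String ABSOLUTE = "/html/body/form/input[1]";
    static final String BY_NAME = "//input[@name='username']";

    static final String OLD_PAGE =
        "<html><body><form><input name='username'/><input name='password'/></form></body></html>";
    // Same form, now wrapped in an extra <div> after a hypothetical redesign.
    static final String NEW_PAGE =
        "<html><body><div><form><input name='username'/><input name='password'/></form></div></body></html>";

    // Returns true if the XPath expression matches at least one node in the page.
    static boolean matches(String page, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));
            // evaluate(..., NODE) returns null when nothing matches
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            return node != null;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(OLD_PAGE, ABSOLUTE)); // true
        System.out.println(matches(NEW_PAGE, ABSOLUTE)); // false
        System.out.println(matches(NEW_PAGE, BY_NAME));  // true
    }
}
```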
