cavazzatommaso/Itis-Circolari-Scraper-Node.Js

Itis Circolari Scraper

Description:

This software is written in Node.js and outputs JSON containing the title and URL of each circular published on the site: http://www.iis-silva-ricci.gov.it/documenti/cat_view/1-circolari.html

How does it work?

It is a simple scraper built with the Cheerio module. After loading the page, it selects every element with the class ".dm_row" and, for each one, pushes an object holding the title and URL into an array.

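// json (the result array) and ct (a counter) are declared before this loop
$('.dm_row').each(function (i, elem) {
    var data = $(this);
    // The first child of the row holds both the title text and the link
    var cell = data.children().first();
    var titolo = cell.text();
    // The second child of that cell is the element carrying the href
    var href = cell.children().get(1).attribs['href'];
    // Strip carriage returns, newlines and tabs from the title
    titolo = titolo.replace(/[\r\n\t]/g, "");
    json.push({title: titolo, url: "http://www.iis-silva-ricci.gov.it" + href});
    ct++;
});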
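
The title clean-up and URL construction done inside the loop can also be sketched as small standalone functions, which makes them easy to try without loading the site. This is only an illustration: `cleanTitle`, `toAbsoluteUrl`, `toEntry` and `BASE_URL` are hypothetical names that do not appear in the original program, and the sample title and href are made up.

```javascript
// Hypothetical helpers; the names are illustrative, not part of the original program.
const BASE_URL = 'http://www.iis-silva-ricci.gov.it';

// Remove carriage returns, newlines and tabs from a raw title.
function cleanTitle(raw) {
  return raw.replace(/[\r\n\t]/g, '');
}

// Resolve a site-relative href against the base URL.
function toAbsoluteUrl(href) {
  return new URL(href, BASE_URL).href;
}

// Build one entry in the same shape the scraper pushes into its array.
function toEntry(rawTitle, href) {
  return { title: cleanTitle(rawTitle), url: toAbsoluteUrl(href) };
}

// Made-up sample values, for illustration only.
console.log(toEntry('\n\t\tCircolare di esempio\r\n', '/documenti/esempio.html'));
```

Unlike plain string concatenation, `new URL` also copes with an href that lacks a leading slash, which may or may not matter for this site.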

You can see it running on Heroku.

Use it

You are free to use this; all you have to do is send an HTTP GET request.

For the first page:

https://itiscircolari.herokuapp.com/
//this returns the content of the default page (1)

For other pages:

https://itiscircolari.herokuapp.com/?page=3
//this returns the content of page 3
//just change the number to get a different page
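
From Node.js, a request like the ones above could look roughly like the following sketch. The helper names `pageUrl` and `getJson` are hypothetical, and the response is assumed to be a JSON array of `{title, url}` objects, as described earlier. The network call is left commented out because the service may not be reachable.

```javascript
const https = require('https');

// Service address taken from the examples in this README.
const SERVICE_URL = 'https://itiscircolari.herokuapp.com/';

// Hypothetical helper: build the request URL for a page (page 1 is the default).
function pageUrl(page = 1) {
  const url = new URL(SERVICE_URL);
  if (page !== 1) url.searchParams.set('page', String(page));
  return url.href;
}

// Hypothetical helper: GET a URL and parse the response body as JSON.
function getJson(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try { resolve(JSON.parse(body)); } catch (err) { reject(err); }
      });
    }).on('error', reject);
  });
}

// Example call (assumes the service is online):
// getJson(pageUrl(3)).then((items) => items.forEach((c) => console.log(c.title, c.url)));

console.log(pageUrl());
console.log(pageUrl(3));
```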

Experimental Features:

Use at your own risk

For multiple pages:

https://itiscircolari.herokuapp.com/?end=3
//this returns the content from page 1 to page 3
//please do not put too much load on the website

The problem is that the result of this call is not reliable: sometimes it works perfectly, other times it does not.

Error:

//If an error occurs, the JSON returned is:
[{"error":"404", "details":"Page not found (MIN = 1) (MAX = 28)"}]
//To fix this, make sure the page number is within the allowed range
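
On the client side, a response can be checked for this error shape before it is used. The sketch below assumes the parsed response is an array and that an error entry carries an `error` field, as in the example above. `findError` is a hypothetical helper name, and the sample success entry is made up.

```javascript
// Hypothetical helper: return the error entry if the response contains one, else null.
function findError(response) {
  if (!Array.isArray(response)) return null;
  const entry = response.find(
    (item) => item !== null && typeof item === 'object' && 'error' in item
  );
  return entry || null;
}

// Sample error response in the shape documented in this README.
const sampleError = [{ error: '404', details: 'Page not found (MIN = 1) (MAX = 28)' }];
// Made-up sample of a successful response.
const sampleOk = [{ title: 'Esempio', url: 'http://www.iis-silva-ricci.gov.it/esempio' }];

const problem = findError(sampleError);
if (problem) {
  console.log('Request failed:', problem.error, problem.details);
}
console.log('Success response has error?', findError(sampleOk) !== null);
```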

📝 To Do:

  • Order the results when requesting multiple pages
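
One possible way to approach this item, sketched under the assumption that each page's result can be tagged with its page number when it arrives, is to sort by page before merging. `mergeInPageOrder` and the `{page, items}` shape are hypothetical and not taken from the current program; the sample data is made up.

```javascript
// Hypothetical helper: pageResults is an array of { page, items } in arrival order.
// Returns a single array of items, ordered by page number.
function mergeInPageOrder(pageResults) {
  return [...pageResults]                 // copy so the input is not mutated
    .sort((a, b) => a.page - b.page)      // ascending page number
    .flatMap((result) => result.items);   // concatenate the items of each page
}

// Made-up sample: pages arrived in the order 3, 1, 2.
const arrived = [
  { page: 3, items: [{ title: 'e' }] },
  { page: 1, items: [{ title: 'a' }, { title: 'b' }] },
  { page: 2, items: [{ title: 'c' }, { title: 'd' }] },
];

console.log(mergeInPageOrder(arrived).map((item) => item.title).join(','));
```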
