Itis Circolari Scraper

Description:

This software is written in Node.js and produces a JSON document containing the title and URL of each circular published on the site: http://www.iis-silva-ricci.gov.it/documenti/cat_view/1-circolari.html

How does it work?

It is a simple scraper built with the Cheerio module. After loading the web page, the program iterates over every element with the class ".dm_row", extracts the title and link from each one, and pushes them into an array.

$('.dm_row').each(function (i, elem) {
    var data = $(this);
    // The first child of the row holds the title text
    var titolo = data.children().first().text();
    // The second child of that element carries the href of the circular
    var href = data.children().first().children().get(1).attribs['href'];
    // Remove newlines and tabs from the title
    titolo = titolo.replace(/(\r\n\t|\n|\r\t|\t)/gm, "");
    json.push({ title: titolo, url: "http://www.iis-silva-ricci.gov.it" + href });
    ct++;
});
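
The cleanup and URL steps in the loop above can be tried in isolation, without Cheerio or a network request. This is only a minimal sketch: the helper names `cleanTitle` and `toAbsoluteUrl` and the sample row are hypothetical and not part of the project, and the regex is a slightly broader variant of the original (it also removes a lone `\r`).

```javascript
// Base URL taken from the scraper snippet above.
const BASE_URL = "http://www.iis-silva-ricci.gov.it";

// Hypothetical helper: strip carriage returns, newlines and tabs from a title.
function cleanTitle(rawTitle) {
  return rawTitle.replace(/[\r\n\t]/g, "");
}

// Hypothetical helper: turn a site-relative href into an absolute URL.
function toAbsoluteUrl(href) {
  return BASE_URL + href;
}

// Made-up sample row, shaped like the values the scraper extracts.
const sample = {
  rawTitle: "\n\t\tCircolare n. 1\r\n",
  href: "/documenti/doc_download/1.html",
};

const entry = {
  title: cleanTitle(sample.rawTitle),
  url: toAbsoluteUrl(sample.href),
};

console.log(JSON.stringify(entry));
```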

You can use and see this from

Use it

You are free to use this; all you have to do is send an HTTP GET request.

For the first page:

https://itiscircolari.herokuapp.com/
//this returns the content of the default page (1)

For other pages:

https://itiscircolari.herokuapp.com/?page=3
//this returns the content of page 3
//just change the number to get a different page
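
As a rough sketch of a client, the request URL can be built from the page number and fetched with the global `fetch` available in Node.js 18 and later. The function names `buildUrl` and `fetchCirculars` are hypothetical, and the sketch assumes the service is still online and answers with a JSON array.

```javascript
// Endpoint taken from the examples above.
const API_BASE = "https://itiscircolari.herokuapp.com/";

// Hypothetical helper: build the request URL for a given page (default 1).
function buildUrl(page = 1) {
  const url = new URL(API_BASE);
  if (page !== 1) {
    url.searchParams.set("page", String(page));
  }
  return url.toString();
}

// Hypothetical helper: request one page and parse the JSON body.
// Assumes Node.js 18+ (global fetch) and a reachable service.
async function fetchCirculars(page = 1) {
  const response = await fetch(buildUrl(page));
  return response.json();
}

console.log(buildUrl());
console.log(buildUrl(3));
```

Calling `fetchCirculars(3).then(console.log)` would then print the entries of page 3, provided the service responds.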

Experimental Features:

Use at your own risk

For multiple pages:

https://itiscircolari.herokuapp.com/?end=3
//this returns the content from page 1 to page 3
//Please do not put too much load on the website

The problem is that the result of this call is not reliable: sometimes it works perfectly, other times it does not.

Error:

//If an error occurs, the returned JSON is:
[{error:"404", details:"Page not found (MIN = 1) (MAX = 28)"}]
//To fix this, make sure the page number is within the allowed range
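
A caller can check for this error shape before using the result. The following is a minimal sketch, assuming the response body has already been parsed and that an error is always reported as an array whose first element has an `error` field (the only shape shown above). The name `isErrorResponse` and the sample success entry are hypothetical.

```javascript
// Hypothetical helper: true when the parsed body looks like the error shown above.
function isErrorResponse(body) {
  return (
    Array.isArray(body) &&
    body.length > 0 &&
    typeof body[0] === "object" &&
    body[0] !== null &&
    "error" in body[0]
  );
}

// Error body as documented above.
const errorBody = [{ error: "404", details: "Page not found (MIN = 1) (MAX = 28)" }];

// Made-up success body with the title/url fields the scraper produces.
const okBody = [{ title: "Circolare n. 1", url: "http://www.iis-silva-ricci.gov.it/example" }];

console.log(isErrorResponse(errorBody));
console.log(isErrorResponse(okBody));
```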

📝 To Do:

Sort the results when requesting multiple pages
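
One possible approach to this item, sketched under the assumption that each page's result can be tagged with its page number when it arrives: sort by page number before flattening. The function `mergePagesInOrder` and the sample data are hypothetical and not the project's code.

```javascript
// Hypothetical helper: combine per-page results into one list, in page order.
// Copies the input array first so the caller's array is not reordered.
function mergePagesInOrder(pageResults) {
  return [...pageResults]
    .sort((a, b) => a.page - b.page)
    .flatMap((result) => result.items);
}

// Made-up results, listed in the order they might arrive from parallel requests.
const arrived = [
  { page: 3, items: [{ title: "C" }] },
  { page: 1, items: [{ title: "A1" }, { title: "A2" }] },
  { page: 2, items: [{ title: "B" }] },
];

const merged = mergePagesInOrder(arrived);
console.log(merged.map((item) => item.title).join(","));
```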

cavazzatommaso/Itis-Circolari-Scraper-Node.Js
