This software was developed in Node.js and prints a JSON file containing the title and URL of the circulars obtained from the site:
It is a simple scraper built with the Cheerio module. After loading the website, it selects every element with the class ".dm_row" and, for each one, pushes an object with its title and URL into an array:
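```javascript
// $ is the Cheerio instance returned by cheerio.load(html)
const json = [];
$('.dm_row').each(function () {
  const data = $(this);
  // The first child holds the title; its second child is the link
  let titolo = data.children().first().text();
  const href = data.children().first().children().eq(1).attr('href');
  // Remove \r, \n and \t characters from the title
  titolo = titolo.replace(/[\r\n\t]/g, '');
  json.push({ title: titolo, url: String(href) });
});
```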
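Turning this array into the JSON output only takes `JSON.stringify`. A minimal sketch, assuming the result is pretty-printed and written to a local file; the helper `saveCirculars` and the filename `circolari.json` are hypothetical and not part of this project:

```javascript
const fs = require('fs');

// Hypothetical helper: serialize the scraped [{ title, url }] array
// as indented JSON, write it to `path`, and return the written text.
function saveCirculars(items, path) {
  const text = JSON.stringify(items, null, 2);
  fs.writeFileSync(path, text, 'utf8');
  return text;
}

// Example usage with sample data (not real circulars):
// saveCirculars([{ title: 'Circolare 1', url: '/circ1.pdf' }], 'circolari.json');
```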
You are free to use it; all you need to do is send an http.get request:
```javascript
// Returns the content of the default page (1)
// Returns the content of page 3
// Change the number to get other pages
// Returns the content of pages 1 to 3
// Please do not put too much load on the website
```
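A minimal client sketch for such a request, using Node's built-in `http.get` and resolving only after the whole response body has arrived. The helper name `getJson` and the placeholder `SERVICE_URL` (the address of the running service) are hypothetical:

```javascript
const http = require('http');

// Hypothetical helper: GET `url` and resolve with the parsed JSON body.
// The status code is not checked, so an error body such as
// [{ "error": "404", ... }] is returned as normal data.
function getJson(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
          body += chunk;
        });
        res.on('error', reject);
        // Parse only once the full body has been received
        res.on('end', () => {
          try {
            resolve(JSON.parse(body));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on('error', reject);
  });
}

// Example usage (SERVICE_URL is a placeholder):
// getJson(SERVICE_URL).then((circulars) => console.log(circulars));
```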
The problem is that the return value of the request is not always defined: sometimes it works perfectly, other times it does not.
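If the failures are intermittent, one possible client-side workaround is to retry the request a few times before giving up. A minimal sketch, assuming the request is wrapped in a function that returns a Promise; `withRetry`, `fetchCirculars`, the attempt count and the delay are all hypothetical choices:

```javascript
// Hypothetical helper: call `fn` (which returns a Promise) up to `attempts`
// times, waiting `delayMs` between tries. A rejected promise or an
// `undefined` result counts as a failed attempt.
async function withRetry(fn, attempts = 3, delayMs = 500) {
  let lastError = new Error('No attempts made');
  for (let i = 1; i <= attempts; i++) {
    try {
      const result = await fn();
      if (result !== undefined) return result;
      lastError = new Error(`Attempt ${i} returned undefined`);
    } catch (err) {
      lastError = err;
    }
    // Pause before the next attempt to avoid hammering the website
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Example usage (fetchCirculars is a placeholder for the http.get call):
// withRetry(() => fetchCirculars(3)).then((data) => console.log(data));
```

Keeping the delay between attempts also respects the request to not overload the website.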
If an error occurs, the returned JSON is:

```json
[{"error": "404", "details": "Page not found (MIN = 1) (MAX = 28)"}]
```

To fix it, make sure the page number is within that range.
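Since valid pages go from 1 to 28 according to the error message above (the maximum may grow as new circulars are published), a client can check the page number before calling and recognise the error form in the response. A minimal sketch; `MIN_PAGE`, `MAX_PAGE`, `isValidPage`, `isErrorResponse` and `maxPageFromError` are hypothetical names:

```javascript
// Page bounds taken from the error message (MIN = 1, MAX = 28);
// the maximum may change over time.
const MIN_PAGE = 1;
const MAX_PAGE = 28;

// True if `page` is an integer within the valid range.
function isValidPage(page) {
  return Number.isInteger(page) && page >= MIN_PAGE && page <= MAX_PAGE;
}

// True if the parsed response has the error form [{ error, details }].
function isErrorResponse(data) {
  return (
    Array.isArray(data) &&
    data.length === 1 &&
    typeof data[0] === 'object' &&
    data[0] !== null &&
    'error' in data[0]
  );
}

// Read the current maximum page from the error details, or null if absent.
function maxPageFromError(data) {
  const match = /MAX = (\d+)/.exec(data[0].details || '');
  return match ? Number(match[1]) : null;
}
```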
- Order the results when requesting multiple pages
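To keep the results ordered when several pages are requested, one option is to fetch the pages one at a time and append each page's items in turn. A minimal sketch; `fetchPagesInOrder` is hypothetical, and `fetchPage` stands for any function returning a Promise of one page's `[{ title, url }]` array:

```javascript
// Hypothetical helper: fetch pages `from`..`to` one after another and
// concatenate their items in page order. `fetchPage(n)` must return a
// Promise resolving to the [{ title, url }] array for page n.
async function fetchPagesInOrder(from, to, fetchPage) {
  const all = [];
  for (let page = from; page <= to; page++) {
    // Wait for this page before requesting the next one
    const items = await fetchPage(page);
    all.push(...items);
  }
  return all;
}
```

Fetching sequentially is slower than starting all requests at once with `Promise.all` (which also returns results in input order), but it keeps the load on the website low.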