Releases: tamarasaurus/contract-scraper
v2.0.0
Improvements
- Upgrade to puppeteer 3.0.0
- Remove iconv
- Remove the request fetcher; always use puppeteer to fetch and encode page contents
- Remove the need for the `scrapeAfterLoading` key
v1.0.12
Update the script tag provider to always parse an array of items
v1.0.11
Fix encoding issues by ignoring "windows-*" encoding types
v1.0.10
Allow scraping the `innerHTML` of elements
v1.0.9
Allow the use of the `:eq` selector in contracts
v1.0.8
This release adds the ability to scrape JSON inside script tags using a contract
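As a rough illustration of the idea (not the library's actual provider), the sketch below pulls JSON out of script tags in an HTML string. The helper name `extractScriptJson` is hypothetical, and it uses a regular expression rather than a DOM parser to stay dependency-free. It returns an array in every case, loosely echoing the v1.0.12 note.

```typescript
// Hypothetical helper for illustration only; not part of contract-scraper's API.
// Finds <script> tags in an HTML string, parses their bodies as JSON,
// and always returns an array of items.
function extractScriptJson(html: string): unknown[] {
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  const items: unknown[] = [];
  let match: RegExpExecArray | null;

  while ((match = scriptPattern.exec(html)) !== null) {
    const body = match[1].trim();
    if (body === '') continue;

    try {
      const parsed: unknown = JSON.parse(body);
      // Normalise: a JSON array is flattened in, a single value is appended.
      if (Array.isArray(parsed)) {
        items.push(...parsed);
      } else {
        items.push(parsed);
      }
    } catch {
      // Body is not JSON (for example ordinary JavaScript); skip this tag.
    }
  }

  return items;
}
```

A real implementation would more likely select the script tags through the same DOM the rest of the contract uses; the regular expression here is only a stand-in.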
v1.0.7
This release fixes a bug in the `background-image` attribute when parsing absolute URLs.
v1.0.6
This release fixes a bug when converting `windows-1252` encoding to `utf-8`.
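For readers unfamiliar with this kind of conversion, here is a minimal sketch of re-encoding windows-1252 bytes as UTF-8. It is not the library's code: the function name `windows1252ToUtf8` is hypothetical, and it assumes a runtime that provides the standard `TextDecoder` and `TextEncoder` globals with `windows-1252` support (a modern browser, or a Node.js build with full ICU).

```typescript
// Hypothetical helper for illustration; contract-scraper's own conversion may differ.
// Decodes windows-1252 bytes to a string, then encodes that string as UTF-8.
function windows1252ToUtf8(bytes: Uint8Array): Uint8Array {
  // Assumption: the runtime's TextDecoder recognises the 'windows-1252' label.
  const text = new TextDecoder('windows-1252').decode(bytes);
  // TextEncoder always produces UTF-8.
  return new TextEncoder().encode(text);
}

// "café" in windows-1252: the final byte 0xE9 is "é".
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const utf8Bytes = windows1252ToUtf8(legacyBytes);
```

In UTF-8 the single byte `0xE9` becomes the two-byte sequence `0xC3 0xA9`, which is why treating the original bytes as UTF-8 without converting them produces garbled characters.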
v1.0.5
This version disables web security in the Puppeteer fetcher
v1.0.4
This version adds support for nested attributes, for example, scraping a nested list of items and returning them as an array:
    <ul class="friends">
      <li>
        <span>Spiderman</span>
        <ul>
          <li><strong>Iron</strong><em>Man</em></li>
          <li><strong>Captain</strong><em>America</em></li>
        </ul>
      </li>
    </ul>
The contract:
    {
      "itemSelector": ".friends li",
      "attributes": {
        "name": { "type": "text", "selector": "span" },
        "friends": {
          "itemSelector": "ul li",
          "attributes": {
            "firstName": { "type": "text", "selector": "strong" },
            "lastName": { "type": "text", "selector": "em" }
          }
        }
      }
    }
This returns the nested `friends` of each item as an array:
    [
      {
        name: 'Spiderman',
        friends: [
          { firstName: 'Iron', lastName: 'Man' },
          { firstName: 'Captain', lastName: 'America' },
        ]
      }
    ]
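The nesting above can be read as a recursive data structure: an attribute is either a leaf (`type` plus `selector`) or another contract with its own `itemSelector` and `attributes`. The sketch below models that shape in TypeScript and walks it to list every attribute path. The type names (`LeafAttribute`, `Contract`) and the helper `listAttributePaths` are assumptions made for illustration; they are not taken from the library's source.

```typescript
// Hypothetical types inferred from the example contract; not the library's own definitions.
interface LeafAttribute {
  type: string;
  selector: string;
}

interface Contract {
  itemSelector: string;
  attributes: { [name: string]: LeafAttribute | Contract };
}

// An attribute is nested when it carries its own itemSelector and attributes.
function isNested(attr: LeafAttribute | Contract): attr is Contract {
  return 'itemSelector' in attr && 'attributes' in attr;
}

// Recursively collects dotted paths such as "friends.firstName".
function listAttributePaths(contract: Contract, prefix: string = ''): string[] {
  const paths: string[] = [];
  for (const name of Object.keys(contract.attributes)) {
    const attr = contract.attributes[name];
    const path = prefix === '' ? name : prefix + '.' + name;
    if (isNested(attr)) {
      paths.push(...listAttributePaths(attr, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// The contract from the release note, expressed as a typed object.
const friendsContract: Contract = {
  itemSelector: '.friends li',
  attributes: {
    name: { type: 'text', selector: 'span' },
    friends: {
      itemSelector: 'ul li',
      attributes: {
        firstName: { type: 'text', selector: 'strong' },
        lastName: { type: 'text', selector: 'em' },
      },
    },
  },
};

const attributePaths = listAttributePaths(friendsContract);
```

Here `attributePaths` contains `name`, `friends.firstName`, and `friends.lastName`, which matches the keys in the returned array shown above.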