Which package is this bug report for? If unsure which one to select, leave blank
None
Issue description
1. Use the TypeScript CheerioCrawler template.
2. Put URLs whose paths contain regex metacharacters (for example "++") in startUrls.
3. Add a preNavigationHook that sets a cookie for that URL.
4. Run npm install.
5. Run npm start.
Output:
...
WARN CheerioCrawler: Reclaiming failed request back to the list or queue. SyntaxError: Invalid regular expression: /^/Antonov++Andrii/: Nothing to repeat
at new RegExp (<anonymous>)
at pathMatch (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\pathMatch.js:35:13)
at matchRFC (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\memstore.js:68:51)
at D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\memstore.js:87:13
at Array.forEach (<anonymous>)
at MemoryCookieStore.findCookies (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\memstore.js:82:17)
at CookieJar.getCookies (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\cookie\cookieJar.js:536:15)
at CookieJar.getCookieString (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\cookie\cookieJar.js:597:14)
at CookieJar.callSync (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\cookie\cookieJar.js:168:16)
at CookieJar.getCookieStringSync (D:\Fire\Proyectos\my-crawler-borrar\node_modules\tough-cookie\dist\cookie\cookieJar.js:610:22) {"id":"ZTnkJJu5aEw0Obe","url":"https://www.google.com/Antonov++Andrii/","retryCount":3}
INFO CheerioCrawler: Error analysis: {"totalErrors":3,"uniqueErrors":1,"mostCommonErrors":["3x: Invalid regular expression: _ Nothing to repeat (<anonymous>)"]}
INFO CheerioCrawler: Finished! Total 4 requests: 1 succeeded, 3 failed. {"terminal":true}
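The failure can be reproduced without Crawlee or tough-cookie. Judging only from the message, the failing pattern /^/Antonov++Andrii/ is built from the URL path, so the sketch below assumes the path is concatenated into a RegExp without escaping. In JavaScript regexes "++" is two quantifiers in a row, which is rejected with "Nothing to repeat". The helpers unsafePrefixRegex, escapeRegExp and safePrefixRegex are hypothetical, for illustration only; they are not tough-cookie functions.

```typescript
// Sketch: reproduce the "Nothing to repeat" error in isolation.
// Assumption (inferred from the error message, not from tough-cookie's source):
// the path is inserted into a RegExp without escaping.

const path = "/Antonov++Andrii";

// Hypothetical unsafe helper: "+" is treated as a quantifier, so "++" throws.
function unsafePrefixRegex(p: string): RegExp {
  return new RegExp("^" + p);
}

// Hypothetical helper: escape every regex metacharacter so the path is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical safe helper: same prefix match, but on the escaped path.
function safePrefixRegex(p: string): RegExp {
  return new RegExp("^" + escapeRegExp(p));
}

let unsafeError: unknown = null;
try {
  unsafePrefixRegex(path);
} catch (e) {
  unsafeError = e;
}

console.log(unsafeError instanceof SyntaxError); // true
console.log(safePrefixRegex(path).test("/Antonov++Andrii/")); // true
```

Paths with a single "+" (such as /dev/Cibus+%7C+Pluxee/) do not throw under this assumption, but the unescaped pattern would still treat "+" as "one or more", so it could match paths it should not.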
Code sample
// For more information, see https://crawlee.dev/
import { CheerioCrawler } from 'crawlee';

// Example of URLs whose paths contain regex metacharacters (even though they return a 404):
const startUrls = [
    "https://www.example.com/dev/Cibus+%7C+Pluxee/",
    "https://www.example.com/dev/Y+C++S+T+U+D+I+O/",
    "https://www.example.com/dev/Antonov++Andrii/",
    'https://www.example.com/dev/Mobile+Dialer+%28+HelloBDTel+-Ten+Card+Company+%29',
];

const crawler = new CheerioCrawler({
    // proxyConfiguration: new ProxyConfiguration({ proxyUrls: ['...'] }),
    requestHandler: async ({ request, $, log }) => {
        const title = $('title').text();
        log.info(`${title}`, { url: request.loadedUrl });
    },
    errorHandler: async ({}, _: Error) => {
        // console.log(request.url);
    },
    // Comment this option to scrape the full website.
    maxRequestsPerCrawl: 4,
    persistCookiesPerSession: false,
    preNavigationHooks: [
        (crawlingContext, _) => {
            // ...
            try {
                const { session, request } = crawlingContext;
                if (session) {
                    const cookieString = 'adlt=1;';
                    const urlWithoutPath = new URL(request.url);
                    urlWithoutPath.pathname = '/'; // Reset the path to just "/"
                    const targetUrl = urlWithoutPath.toString();
                    session.setCookie(cookieString, targetUrl);
                }
            } catch (error) {}
        },
    ],
});

await crawler.run(startUrls);
Package version
3.11.4, 3.11.5
Node.js version
22.6.0
Operating system
Windows 11, and Ubuntu 24.04
Apify platform
Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
firecrauter changed the title from "[BUG] Error in session.setCookie function: URL incorrectly interpreted as Regex" to "[BUG] Error in session.setCookie (tough-cookie): URL incorrectly interpreted as Regex" on Oct 27, 2024.
firecrauter changed the title from "[BUG] Error in session.setCookie (tough-cookie): URL incorrectly interpreted as Regex" to "[BUG] session.setCookie (tough-cookie): URL incorrectly interpreted as Regex" on Oct 27, 2024.
Fixed in /tough-cookie/pull/465. Now I just have to wait for tough-cookie to release an updated version, after which Crawlee can be updated.
I'm sorry for so many references and for the opening/closing.
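For reference, RFC 6265 (section 5.1.4) defines cookie path-matching with plain string comparisons, so no regular expression is needed at all. The sketch below implements those three rules in TypeScript. It illustrates the RFC only and is not necessarily what the linked pull request changes; pathMatch here is a local function, not tough-cookie's export.

```typescript
// Regex-free cookie path-match following RFC 6265, section 5.1.4.
// Sketch only: not taken from tough-cookie's source.
function pathMatch(requestPath: string, cookiePath: string): boolean {
  // Rule 1: the cookie-path and the request-path are identical.
  if (requestPath === cookiePath) return true;

  // Rules 2 and 3 both require cookiePath to be a literal prefix of requestPath.
  // startsWith compares characters literally, so "+" or "(" need no escaping.
  if (!requestPath.startsWith(cookiePath)) return false;

  // Rule 2: the cookie-path ends with "/".
  if (cookiePath.endsWith("/")) return true;

  // Rule 3: the first request-path character after the prefix is "/".
  return requestPath.charAt(cookiePath.length) === "/";
}

console.log(pathMatch("/dev/Antonov++Andrii/", "/"));    // true
console.log(pathMatch("/dev/Antonov++Andrii/", "/dev")); // true
console.log(pathMatch("/devices", "/dev"));              // false
```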