Session id is different in newUrlFunction and requestHandler
#2726
We don't rotate proxies in browsers like this, since it's not possible to change the proxy of an existing browser context. There is an option that enables this behavior, but it creates a new browser context for each request until the session pool size is reached, so you can easily exhaust your memory. If you want to use it, be sure to limit the session pool size; it defaults to 100, which is far too many for this option.
https://crawlee.dev/api/browser-pool/interface/LaunchContextOptions#browserPerProxy (see the PR for more details: #2418)
const crawler = new PuppeteerCrawler({
    // ...,
    launchContext: {
        browserPerProxy: true,
    },
});
Alternatively, you could use incognito contexts, but that also means no caching of resources, which results in a major performance hit (in our measurements, processing took twice as long).
https://crawlee.dev/api/browser-pool/interface/LaunchContextOptions#useIncognitoPages
We plan to overhaul this in the next major version (which will likely be out in the first half of next year).
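As a rough sketch of the advice above, the option can be paired with a much smaller session pool. The object name `crawlerOptions` and the value `10` are illustrative assumptions, not recommendations from the maintainers; the object is meant to be passed to the crawler constructor (for example `new PuppeteerCrawler(crawlerOptions)`).

```typescript
// Hypothetical options object: the pool size of 10 is only an example value.
const crawlerOptions = {
    launchContext: {
        // Enable the behavior described above; each proxy gets its own browser.
        browserPerProxy: true,
    },
    sessionPoolOptions: {
        // Cap the number of sessions so the number of browsers stays bounded.
        maxPoolSize: 10,
    },
};

console.log(crawlerOptions);
```

Keeping the cap explicit in the options makes the memory trade-off visible next to the flag that causes it.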
@B4nan Hi, in later testing I used the following settings code:
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
const startUrls = [];
for (let i = 0; i < 100; i++) {
    startUrls.push(`https://example.net?q=${i}`);
}
const POOL_SIZE = 10;
function createProxyConfig() {
    return new ProxyConfiguration({
        newUrlFunction: (sessionId, options) => {
            const s = sessionId + '';
            const port = s.charCodeAt(s.length - 1) % POOL_SIZE + 50600;
            return `http://localhost:${port}`;
        },
    });
}
const crawler = new PlaywrightCrawler({
    headless: true,
    maxRequestRetries: -1,
    navigationTimeoutSecs: 100,
    proxyConfiguration: createProxyConfig(),
    minConcurrency: 5,
    maxConcurrency: 5,
    maxRequestsPerMinute: 500,
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: POOL_SIZE,
    },
    launchContext: {
        browserPerProxy: true,
    },
    async requestHandler({ page, proxyInfo, session }) {
        console.log('requestHandler', session?.id, proxyInfo?.port, page.url());
    },
});
await crawler.run(startUrls);
Log, and the parsing code used to find different proxies for the same session:
const input = `
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=0
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=1
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=2
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=3
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=4
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=8
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=9
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=5
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=6
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=11
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=12
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=13
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=7
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=14
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=17
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=16
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=10
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=18
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=19
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=20
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=22
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=23
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=21
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=15
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=24
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=25
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=26
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=27
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=28
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=29
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=30
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=32
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=33
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=35
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=34
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=37
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=38
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=39
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=40
requestHandler session_gbneBPsejg 50609 https://example.net/?q=31
requestHandler session_gbneBPsejg 50609 https://example.net/?q=36
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=41
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=42
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=43
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=44
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=45
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=46
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=47
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=48
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=49
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=50
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=51
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=52
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=53
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=54
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=55
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=56
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=57
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=58
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=59
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=60
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=61
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=62
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=63
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=64
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=65
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=66
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=68
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=67
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=69
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=70
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=71
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=72
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=73
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=74
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=75
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=76
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=77
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=78
requestHandler session_Q9IHaozP68 50607 https://example.net/?q=79
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=81
requestHandler session_Q9IHaozP68 50605 https://example.net/?q=80
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=82
requestHandler session_0LJRAXgyAF 50604 https://example.net/?q=83
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=84
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=85
requestHandler session_0LJRAXgyAF 50606 https://example.net/?q=86
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=87
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=88
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=89
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=90
requestHandler session_gbneBPsejg 50609 https://example.net/?q=92
requestHandler session_xRiNOaUl7v 50601 https://example.net/?q=91
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=93
requestHandler session_3s2UIf3g15 50608 https://example.net/?q=94
requestHandler session_ep9wDhwkjb 50600 https://example.net/?q=95
requestHandler session_gbneBPsejg 50609 https://example.net/?q=96
requestHandler session_gbneBPsejg 50609 https://example.net/?q=97
requestHandler session_fJAOdWjM2m 50602 https://example.net/?q=98
requestHandler session_M0mYt9wBNR 50603 https://example.net/?q=99
`;
function parseLog(input: string) {
    const session2port: { [key: string]: Record<string, boolean> } = {};
    const port2session: { [key: string]: Record<string, boolean> } = {};
    const lines = input.trim().split('\n');
    lines.forEach(line => {
        const [_, sessionId, port] = line.trim().split(' ');
        if (!session2port[sessionId]) {
            session2port[sessionId] = {};
        }
        session2port[sessionId][port] = true;
        if (!port2session[port]) {
            port2session[port] = {};
        }
        port2session[port][sessionId] = true;
    });
    return { port2session, session2port };
}
const result = parseLog(input);
console.log(result);
Output:
{
port2session: {
'50600': { session_ep9wDhwkjb: true },
'50601': { session_xRiNOaUl7v: true },
'50602': { session_fJAOdWjM2m: true },
'50603': { session_M0mYt9wBNR: true },
'50604': { session_0LJRAXgyAF: true },
'50605': { session_Q9IHaozP68: true },
'50606': { session_0LJRAXgyAF: true },
'50607': { session_Q9IHaozP68: true },
'50608': { session_3s2UIf3g15: true },
'50609': { session_gbneBPsejg: true }
},
session2port: {
session_ep9wDhwkjb: { '50600': true },
session_xRiNOaUl7v: { '50601': true },
session_fJAOdWjM2m: { '50602': true },
session_M0mYt9wBNR: { '50603': true },
session_0LJRAXgyAF: { '50604': true, '50606': true },
session_Q9IHaozP68: { '50605': true, '50607': true },
session_3s2UIf3g15: { '50608': true },
session_gbneBPsejg: { '50609': true }
}
}
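One way to cross-check the log is to recompute the port that the sample's formula would give for each session id printed by requestHandler and compare it with the ports actually logged. This is only a sketch: it assumes newUrlFunction would have received the same id that requestHandler prints, and `expectedPort`, `BASE_PORT`, `observed`, and `mismatches` are names introduced here for illustration.

```typescript
// Port formula copied from the sample's newUrlFunction.
const POOL_SIZE = 10;
const BASE_PORT = 50600;

function expectedPort(sessionId: string): number {
    return (sessionId.charCodeAt(sessionId.length - 1) % POOL_SIZE) + BASE_PORT;
}

// Session ids and the ports logged for them in requestHandler (taken from the log above).
const observed: Record<string, number[]> = {
    session_ep9wDhwkjb: [50600],
    session_xRiNOaUl7v: [50601],
    session_fJAOdWjM2m: [50602],
    session_M0mYt9wBNR: [50603],
    session_0LJRAXgyAF: [50604, 50606],
    session_Q9IHaozP68: [50605, 50607],
    session_3s2UIf3g15: [50608],
    session_gbneBPsejg: [50609],
};

// Keep only the sessions whose logged ports never equal the port derived from their id.
const mismatches = Object.keys(observed)
    .filter(id => observed[id].every(port => port !== expectedPort(id)))
    .map(id => ({ id, expected: expectedPort(id), logged: observed[id] }));

console.log(mismatches);
```

Under that assumption, none of the eight session ids maps to a port it was logged with, which is consistent with newUrlFunction being called with a different id than the `session.id` seen in requestHandler.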
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core
Issue description
The actual behavior of IP Rotation and session management is confusing and does not appear to work properly. Please see the sample code. In short, the problems are:
The session id in newUrlFunction is different from the session id in requestHandler for the same request.
The session ids for different URLs in requestHandler are the same, but the corresponding proxies are different. (Should the sessions of the first five requests be the same or not?)
Code sample
Package version
3.11.5
Node.js version
20
Operating system
Win10
Apify platform
I have tested this on the next release
No response
Other context
No response