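Request Interception
Once request interception is enabled, every request stalls until it is continued, responded to, or aborted.
An example of a naive request interceptor that aborts all image requests: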
import puppeteer from 'puppeteer';
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort();
else interceptedRequest.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
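The `endsWith` checks above miss image URLs that carry a query string or fragment (for example `photo.png?v=2`) and any other image extension. A minimal sketch of a stricter matcher follows. The helper name `isImageUrl` and the extension list are assumptions made for illustration and are not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): decides whether a URL points
// at an image by looking only at the path, so query strings and fragments
// do not defeat the check.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp']; // illustrative list

function isImageUrl(rawUrl) {
  let pathname;
  try {
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return IMAGE_EXTENSIONS.some(ext => pathname.endsWith(ext));
}
```

Inside the handler, `isImageUrl(interceptedRequest.url())` could stand in for the two `endsWith` calls.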
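Multiple Intercept Handlers and Asynchronous Resolutions
By default, Puppeteer raises a `Request is already handled!` exception if request.abort, request.continue, or request.respond is called after any of them has already been called.
Always assume that an unknown handler may have already called abort/continue/respond. Even if your handler is the only one you registered, third-party packages may register their own handlers. It is therefore important to always check the resolution status with request.isInterceptResolutionHandled before calling abort/continue/respond.
Importantly, the intercept resolution may be handled by another listener while your handler is awaiting an asynchronous operation. The return value of request.isInterceptResolutionHandled is therefore only safe within a synchronous code block. Always run request.isInterceptResolutionHandled and abort/continue/respond synchronously together.
This example demonstrates two synchronous handlers working together: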
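One way to keep the check and the resolution in the same synchronous block is to wrap both in a small guard. This is only a sketch: `resolveIfUnhandled` is a hypothetical helper name, not a Puppeteer API, and it assumes the callback calls abort/continue/respond without awaiting anything first.

```javascript
// Hypothetical helper (not part of Puppeteer): performs the handled-check and
// the resolution back to back, with no await in between.
function resolveIfUnhandled(request, resolve) {
  if (request.isInterceptResolutionHandled()) return false;
  resolve(request); // the callback must call abort/continue/respond right away
  return true;
}
```

After an awaited operation, a handler could then call `resolveIfUnhandled(interceptedRequest, r => r.continue())` instead of repeating the guard by hand.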
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.continue();
});
/*
This second handler will return before calling request.abort because request.continue was already
called by the first handler.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.abort();
});
This example demonstrates asynchronous handlers working together:
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (interceptedRequest.isInterceptResolutionHandled()) return;
// It is not strictly necessary to return a promise, but doing so will allow Puppeteer to await this handler.
return new Promise(resolve => {
// Continue after 500ms
setTimeout(() => {
// Inside, check synchronously to verify that the intercept wasn't handled already.
// It might have been handled during the 500ms while the other handler awaited an async op of its own.
if (interceptedRequest.isInterceptResolutionHandled()) {
resolve();
return;
}
interceptedRequest.continue();
resolve();
}, 500);
});
});
page.on('request', async interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (interceptedRequest.isInterceptResolutionHandled()) return;
await someLongAsyncOperation();
// The interception *MIGHT* have been handled by the first handler, we can't be sure.
// Therefore, we must check again before calling continue() or we risk Puppeteer raising an exception.
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.continue();
});
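For finer-grained introspection (see Cooperative Intercept Mode below), you may also call request.interceptResolutionState synchronously before using abort/continue/respond.
Here is the example above rewritten using request.interceptResolutionState: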
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
const {action} = interceptedRequest.interceptResolutionState();
if (action === InterceptResolutionAction.AlreadyHandled) return;
// It is not strictly necessary to return a promise, but doing so will allow Puppeteer to await this handler.
return new Promise(resolve => {
// Continue after 500ms
setTimeout(() => {
// Inside, check synchronously to verify that the intercept wasn't handled already.
// It might have been handled during the 500ms while the other handler awaited an async op of its own.
const {action} = interceptedRequest.interceptResolutionState();
if (action === InterceptResolutionAction.AlreadyHandled) {
resolve();
return;
}
interceptedRequest.continue();
resolve();
}, 500);
});
});
page.on('request', async interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (
interceptedRequest.interceptResolutionState().action ===
InterceptResolutionAction.AlreadyHandled
)
return;
await someLongAsyncOperation();
// The interception *MIGHT* have been handled by the first handler, we can't be sure.
// Therefore, we must check again before calling continue() or we risk Puppeteer raising an exception.
if (
interceptedRequest.interceptResolutionState().action ===
InterceptResolutionAction.AlreadyHandled
)
return;
interceptedRequest.continue();
});
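Cooperative Intercept Mode
request.abort, request.continue, and request.respond accept an optional priority so that they can work in Cooperative Intercept Mode. When all handlers use Cooperative Intercept Mode, Puppeteer guarantees that all intercept handlers run and are awaited in order of registration. The interception resolves to the highest-priority resolution. These are the rules of Cooperative Intercept Mode: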
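- All resolutions must supply a numeric priority argument to abort/continue/respond.
- If any resolution does not supply a numeric priority, Legacy Mode is active and Cooperative Intercept Mode is inactive.
- Async handlers finish before the intercept resolution is finalized.
- The highest-priority interception resolution "wins", i.e. the interception is ultimately aborted/responded/continued according to whichever resolution was given the highest priority.
- In the event of a tie, abort > respond > continue.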
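The last two rules can be modeled in a few lines. The sketch below is an illustrative model only and is not Puppeteer's implementation. The names `pickWinner` and `ACTION_RANK` are assumptions, and Legacy Mode (a resolution without a priority) is deliberately left out.

```javascript
// Illustrative model only: this is NOT Puppeteer's implementation.
// Tie-break rank when priorities are equal: abort > respond > continue.
const ACTION_RANK = {abort: 2, respond: 1, continue: 0};

// votes: array of {action: 'abort' | 'respond' | 'continue', priority: number}
// Returns the winning vote, or null if no handler voted.
function pickWinner(votes) {
  let winner = null;
  for (const vote of votes) {
    if (typeof vote.priority !== 'number') {
      // A resolution without a numeric priority means Legacy Mode,
      // which this model does not cover.
      throw new TypeError('every vote needs a numeric priority');
    }
    const beats =
      winner === null ||
      vote.priority > winner.priority ||
      (vote.priority === winner.priority &&
        ACTION_RANK[vote.action] > ACTION_RANK[winner.action]);
    if (beats) winner = vote;
  }
  return winner;
}

const outcome = pickWinner([
  {action: 'abort', priority: 10},
  {action: 'continue', priority: 15},
  {action: 'respond', priority: 15},
  {action: 'respond', priority: 12},
]);
console.log(outcome); // { action: 'respond', priority: 15 }
```

Here respond at priority 15 beats continue at priority 15 on the tie-break, and both outrank the lower-priority votes.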
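For standardization, when specifying a Cooperative Intercept Mode priority, use 0 or DEFAULT_INTERCEPT_RESOLUTION_PRIORITY (exported from HTTPRequest) unless you have a clear reason to use a higher priority. This gracefully prefers respond over continue and abort over respond, and lets other handlers work cooperatively. If you do intentionally want a different priority, higher priorities win over lower ones. Negative priorities are allowed. For example, continue({}, 4) would win over continue({}, -2).
To preserve backward compatibility, any handler that resolves the intercept without specifying a priority (Legacy Mode) causes immediate resolution. For Cooperative Intercept Mode to work, all resolutions must use a priority. In practice, this means you must still test request.isInterceptResolutionHandled, because a handler beyond your control may have called abort/continue/respond without a priority (Legacy Mode).
In this example, Legacy Mode prevails and the request is aborted immediately because at least one handler omits priority when resolving the intercept: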
// Final outcome: immediate abort()
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Legacy Mode: interception is aborted immediately.
request.abort('failed');
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Control will never reach this point because the request was already aborted in Legacy Mode
// Cooperative Intercept Mode: votes for continue at priority 0.
request.continue({}, 0);
});
In this example, Legacy Mode prevails and the request is continued because at least one handler does not specify a priority:
// Final outcome: immediate continue()
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 0.
request.abort('failed', 0);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Control reaches this point because the request was cooperatively aborted which postpones resolution.
// { action: InterceptResolutionAction.Abort, priority: 0 }, because abort @ 0 is the current winning resolution
console.log(request.interceptResolutionState());
// Legacy Mode: intercept continues immediately.
request.continue({});
});
page.on('request', request => {
// { action: InterceptResolutionAction.AlreadyHandled }, because continue in Legacy Mode was called
console.log(request.interceptResolutionState());
});
In this example, Cooperative Intercept Mode is active because all handlers specify a priority. continue() wins because it has a higher priority than abort().
// Final outcome: cooperative continue() @ 5
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 0
request.abort('failed', 0);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to continue at priority 5
request.continue(request.continueRequestOverrides(), 5);
});
page.on('request', request => {
// { action: InterceptResolutionAction.Continue, priority: 5 }, because continue @ 5 > abort @ 0
console.log(request.interceptResolutionState());
});
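In this example, Cooperative Intercept Mode is active because all handlers specify a priority. respond() wins because its priority ties with that of continue(), and respond() beats continue() on a tie.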
// Final outcome: cooperative respond() @ 15
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 10
request.abort('failed', 10);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to continue at priority 15
request.continue(request.continueRequestOverrides(), 15);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to respond at priority 15
request.respond(request.responseForRequest(), 15);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to respond at priority 12
request.respond(request.responseForRequest(), 12);
});
page.on('request', request => {
// { action: InterceptResolutionAction.Respond, priority: 15 }, because respond @ 15 > continue @ 15 > respond @ 12 > abort @ 10
console.log(request.interceptResolutionState());
});
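Cooperative Request Continuation
Puppeteer requires request.continue() to be called explicitly or the request will hang. Even if your handler means to take no special action, or "opt out", request.continue() must still be called.
With the introduction of Cooperative Intercept Mode, two use cases arise for cooperative request continuations: Unopinionated and Opinionated.
The first case (common) is that your handler means to opt out of doing anything special with the request. It has no opinion on further action and simply intends to continue by default and/or defer to other handlers that might have an opinion. But in case there are no other handlers, we must call request.continue() to ensure that the request does not hang.
We call this an Unopinionated continuation because the intent is to continue the request if nobody else has a better idea. Use request.continue({...}, DEFAULT_INTERCEPT_RESOLUTION_PRIORITY) (or 0) for this type of continuation.
The second case (uncommon) is that your handler actually does have an opinion and means to force continuation by overriding a lower-priority abort() or respond() issued elsewhere. We call this an Opinionated continuation. In these rare cases, where you mean to specify an overriding continuation priority, use a custom priority.
To summarize, reason through whether your use of request.continue is just default/bypass behavior or falls within the intended use case of your handler. Consider using a custom priority for in-scope use cases and the default priority otherwise. Be aware that your handler may have both Opinionated and Unopinionated cases.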
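A handler with both kinds of continuation might look like the sketch below. The factory name `createContinueHandler` and its options `mustLoad` and `opinionatedPriority` are hypothetical, and `DEFAULT_PRIORITY` simply mirrors the documented default of 0 so that the sketch runs on its own.

```javascript
// Mirrors the documented default priority of 0; in real code use the
// DEFAULT_INTERCEPT_RESOLUTION_PRIORITY constant instead.
const DEFAULT_PRIORITY = 0;

// Hypothetical factory (names are illustrative). `mustLoad` decides which
// requests this handler insists on continuing; `opinionatedPriority` is the
// custom priority used for those requests.
function createContinueHandler({mustLoad, opinionatedPriority}) {
  return request => {
    if (request.isInterceptResolutionHandled()) return;
    const priority = mustLoad(request.url())
      ? opinionatedPriority // Opinionated: override lower-priority abort/respond
      : DEFAULT_PRIORITY; // Unopinionated: continue unless another handler objects
    request.continue(request.continueRequestOverrides(), priority);
  };
}
```

It could be registered with something like `page.on('request', createContinueHandler({mustLoad: url => url.includes('/api/'), opinionatedPriority: 10}))`, where the `/api/` rule and the value 10 are arbitrary choices for the example.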
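Upgrading to Cooperative Intercept Mode for package maintainers
If you are a package maintainer and your package uses intercept handlers, you can update them to use Cooperative Intercept Mode. Suppose you have the following existing handler: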
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort();
else interceptedRequest.continue();
});
To use Cooperative Intercept Mode, upgrade continue() and abort():
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort('failed', 0);
else
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
0
);
});
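With these simple upgrades, your handler now uses Cooperative Intercept Mode.
However, we recommend a slightly more robust solution because the approach above introduces several subtle issues:
- Backward compatibility. If any handler still uses a Legacy Mode resolution (i.e., does not specify a priority), that handler resolves the interception immediately even if your handler runs first. This can be disconcerting for your users: suddenly your handler is no longer resolving the interception and a different handler takes precedence, when all the user did was upgrade your package.
- Hard-coded priority. Your package's users have no way to specify the default resolution priority for your handlers. This can become important when a user wishes to manipulate priorities based on use case. For example, one user might want your package to take a high priority while another might want it to take a low priority.
To resolve both issues, we recommend exporting a setInterceptResolutionConfig() function from your package. Users can then call setInterceptResolutionConfig() to explicitly activate Cooperative Intercept Mode in your package, so they are not surprised by changes in how the interception is resolved. They can also optionally call setInterceptResolutionConfig(priority) to specify a custom priority that suits their use case.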
// Defaults to undefined which preserves Legacy Mode behavior
let _priority = undefined;
// Export a module configuration function
export const setInterceptResolutionConfig = (priority = 0) =>
(_priority = priority);
/**
* Note that this handler uses `DEFAULT_INTERCEPT_RESOLUTION_PRIORITY` to "pass" on this request. It is important to use
* the default priority when your handler has no opinion on the request and the intent is to continue() by default.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort('failed', _priority);
else
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
DEFAULT_INTERCEPT_RESOLUTION_PRIORITY // Unopinionated continuation
);
});
If your package calls for more fine-grained control over resolution priorities, use a config pattern like this:
interface InterceptResolutionConfig {
abortPriority?: number;
continuePriority?: number;
}
// This approach supports multiple priorities based on situational
// differences. You could, for example, create a config that
// allowed separate priorities for PNG vs JPG.
const DEFAULT_CONFIG: InterceptResolutionConfig = {
abortPriority: undefined, // Default to Legacy Mode
continuePriority: undefined, // Default to Legacy Mode
};
// Defaults to undefined which preserves Legacy Mode behavior
let _config: Partial<InterceptResolutionConfig> = {};
export const setInterceptResolutionConfig = (
config: InterceptResolutionConfig
) => (_config = {...DEFAULT_CONFIG, ...config});
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
) {
interceptedRequest.abort('failed', _config.abortPriority);
} else {
// Here we use a custom-configured priority to allow for Opinionated
// continuation.
// We would only want to allow this if we had a very clear reason why
// some use cases required Opinionated continuation.
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
_config.continuePriority // Why would we ever want priority!==0 here?
);
}
});
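The solutions above ensure backward compatibility while also letting the user adjust the importance of your package in the resolution chain when Cooperative Intercept Mode is in use. Your package continues to work as expected until the user has fully upgraded their code and all third-party packages to use Cooperative Intercept Mode. If any handler or package still uses Legacy Mode, your package can still operate in Legacy Mode too.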