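Request Interception
Once request interception is enabled, every request stalls until it is continued, responded to, or aborted.
An example of a naive request interceptor that aborts all image requests: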
import puppeteer from 'puppeteer';
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort();
else interceptedRequest.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
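The endsWith() check in the example above misses image URLs that carry a query string or an uppercase extension. As a minimal sketch, assuming a hypothetical helper named isImageUrl and an illustrative extension list (neither is part of Puppeteer), the test can be made on the URL path only:

```javascript
// Hypothetical helper (not part of Puppeteer): decides from the URL alone
// whether a request looks like an image request.
// The extension list is illustrative, not exhaustive.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp'];

function isImageUrl(rawUrl) {
  let pathname;
  try {
    // Look only at the path so query strings and fragments are ignored.
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    // Not a parseable absolute URL: treat it as "not an image".
    return false;
  }
  return IMAGE_EXTENSIONS.some(ext => pathname.endsWith(ext));
}

console.log(isImageUrl('https://example.com/logo.png?v=2'));
console.log(isImageUrl('https://example.com/index.html'));
```

Inside a handler this would replace the two endsWith() calls, for example isImageUrl(interceptedRequest.url()). Depending on the use case, comparing interceptedRequest.resourceType() against 'image' may be a simpler alternative, since it does not rely on the URL at all.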
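Multiple Intercept Handlers and Asynchronous Resolutions
By default, Puppeteer throws a "Request is already handled!" exception if request.abort, request.continue, or request.respond is called after any of them has already been called.
Always assume that an unknown handler may have already called abort/continue/respond. Even if your handler is the only one you registered, third-party packages may register their own handlers. It is therefore important to always check the resolution status with request.isInterceptResolutionHandled before calling abort/continue/respond.
Importantly, the intercept resolution may be handled by another listener while your handler is awaiting an asynchronous operation. The return value of request.isInterceptResolutionHandled is therefore only safe within a synchronous code block. Always execute request.isInterceptResolutionHandled and abort/continue/respond synchronously together.
This example demonstrates two synchronous handlers working together: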
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.continue();
});
/*
This second handler will return before calling request.abort because request.continue was already
called by the first handler.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.abort();
});
This example demonstrates asynchronous handlers working together:
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (interceptedRequest.isInterceptResolutionHandled()) return;
// It is not strictly necessary to return a promise, but doing so will allow Puppeteer to await this handler.
return new Promise(resolve => {
// Continue after 500ms
setTimeout(() => {
// Inside, check synchronously to verify that the intercept wasn't handled already.
// It might have been handled during the 500ms while the other handler awaited an async op of its own.
if (interceptedRequest.isInterceptResolutionHandled()) {
resolve();
return;
}
interceptedRequest.continue();
resolve();
}, 500);
});
});
page.on('request', async interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (interceptedRequest.isInterceptResolutionHandled()) return;
await someLongAsyncOperation();
// The interception *MIGHT* have been handled by the first handler, we can't be sure.
// Therefore, we must check again before calling continue() or we risk Puppeteer raising an exception.
if (interceptedRequest.isInterceptResolutionHandled()) return;
interceptedRequest.continue();
});
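To see why the second check after the await matters, the race can be reproduced without a browser. The sketch below is a simulation under stated assumptions: FakeRequest, sleep, unsafeHandler, safeHandler, and raceAgainstFastHandler are hypothetical names, and FakeRequest only mimics the resolve-once behavior described above, not Puppeteer's HTTPRequest.

```javascript
// A minimal stand-in for an intercepted request (NOT Puppeteer's HTTPRequest).
// It only mimics the rule that a request can be resolved once.
class FakeRequest {
  constructor() {
    this.handled = false;
  }
  isInterceptResolutionHandled() {
    return this.handled;
  }
  continue() {
    if (this.handled) throw new Error('Request is already handled!');
    this.handled = true;
  }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Unsafe: checks once, awaits, then resolves without checking again.
async function unsafeHandler(request) {
  if (request.isInterceptResolutionHandled()) return 'skipped';
  await sleep(10);
  request.continue(); // throws if another handler resolved in the meantime
  return 'continued';
}

// Safe: re-checks synchronously right before resolving.
async function safeHandler(request) {
  if (request.isInterceptResolutionHandled()) return 'skipped';
  await sleep(10);
  if (request.isInterceptResolutionHandled()) return 'skipped';
  request.continue();
  return 'continued';
}

// Simulates another handler resolving the request while `handler` is awaiting.
async function raceAgainstFastHandler(handler) {
  const request = new FakeRequest();
  const pending = handler(request); // runs up to the first await
  request.continue(); // the "other" handler resolves first
  try {
    return await pending;
  } catch (error) {
    return error.message;
  }
}

raceAgainstFastHandler(unsafeHandler).then(result => console.log('unsafe:', result));
raceAgainstFastHandler(safeHandler).then(result => console.log('safe:', result));
```

In this simulation the unsafe handler ends with the already-handled error, while the safe handler returns quietly because it re-checks in the same synchronous block as the call to continue().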
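For finer-grained introspection (see Cooperative Intercept Mode below), you may also call request.interceptResolutionState synchronously before using abort/continue/respond.
Here is the example above rewritten using request.interceptResolutionState: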
/*
This first handler will succeed in calling request.continue because the request interception has never been resolved.
*/
page.on('request', interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
const {action} = interceptedRequest.interceptResolutionState();
if (action === InterceptResolutionAction.AlreadyHandled) return;
// It is not strictly necessary to return a promise, but doing so will allow Puppeteer to await this handler.
return new Promise(resolve => {
// Continue after 500ms
setTimeout(() => {
// Inside, check synchronously to verify that the intercept wasn't handled already.
// It might have been handled during the 500ms while the other handler awaited an async op of its own.
const {action} = interceptedRequest.interceptResolutionState();
if (action === InterceptResolutionAction.AlreadyHandled) {
resolve();
return;
}
interceptedRequest.continue();
resolve();
}, 500);
});
});
page.on('request', async interceptedRequest => {
// The interception has not been handled yet. Control will pass through this guard.
if (
interceptedRequest.interceptResolutionState().action ===
InterceptResolutionAction.AlreadyHandled
)
return;
await someLongAsyncOperation();
// The interception *MIGHT* have been handled by the first handler, we can't be sure.
// Therefore, we must check again before calling continue() or we risk Puppeteer raising an exception.
if (
interceptedRequest.interceptResolutionState().action ===
InterceptResolutionAction.AlreadyHandled
)
return;
interceptedRequest.continue();
});
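Cooperative Intercept Mode
request.abort, request.continue, and request.respond can accept an optional priority to work in Cooperative Intercept Mode. When all handlers use Cooperative Intercept Mode, Puppeteer guarantees that all intercept handlers will run and be awaited in order of registration. The interception is resolved to the highest-priority resolution. Here are the rules of Cooperative Intercept Mode: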
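- All resolutions must supply a numeric priority argument to abort/continue/respond.
- If any resolution does not supply a numeric priority, Legacy Mode is active and Cooperative Intercept Mode is inactive.
- Async handlers finish before the intercept resolution is finalized.
- The highest-priority interception resolution "wins", i.e. the interception is ultimately aborted/responded/continued according to which resolution was given the highest priority.
- In the event of a tie, abort > respond > continue.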
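The rules above can be modeled as a small pure function. This is an illustrative sketch, not Puppeteer's implementation: resolveInterception, ACTION_RANK, and the shape of the vote objects are assumptions made for the example.

```javascript
// Illustrative model of the rules above; NOT Puppeteer's actual implementation.
// Tie-break rank: abort > respond > continue.
const ACTION_RANK = {abort: 3, respond: 2, continue: 1};

// Each vote is {action: 'abort' | 'respond' | 'continue', priority?: number},
// listed in the order in which the handlers resolved.
function resolveInterception(votes) {
  let winner = null;
  for (const vote of votes) {
    if (typeof vote.priority !== 'number') {
      // Legacy Mode: a resolution without a priority takes effect immediately,
      // so later votes are never considered.
      return {mode: 'legacy', action: vote.action};
    }
    const beatsWinner =
      winner === null ||
      vote.priority > winner.priority ||
      (vote.priority === winner.priority &&
        ACTION_RANK[vote.action] > ACTION_RANK[winner.action]);
    if (beatsWinner) winner = vote;
  }
  if (winner === null) return {mode: 'unresolved'};
  return {mode: 'cooperative', action: winner.action, priority: winner.priority};
}

// All votes carry a priority, and respond ties with continue at 15.
console.log(
  resolveInterception([
    {action: 'abort', priority: 10},
    {action: 'continue', priority: 15},
    {action: 'respond', priority: 15},
    {action: 'respond', priority: 12},
  ]),
);
```

The model treats the first vote without a priority as final, which mirrors the Legacy Mode rule that such a resolution is applied immediately.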
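For standardization, when specifying a Cooperative Intercept Mode priority, use 0 or DEFAULT_INTERCEPT_RESOLUTION_PRIORITY (exported from HTTPRequest) unless you have a clear reason to use a higher priority. This gracefully prefers respond over continue and abort over respond, and allows other handlers to work cooperatively. If you do intentionally want to use a different priority, higher priorities win over lower priorities. Negative priorities are allowed. For example, continue({}, 4) wins over continue({}, -2).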
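To preserve backward compatibility, any handler that resolves the intercept without specifying priority (Legacy Mode) causes immediate resolution. For Cooperative Intercept Mode to work, all resolutions must use a priority. In practice, this means you must still test request.isInterceptResolutionHandled, because a handler beyond your control may have called abort/continue/respond without a priority (Legacy Mode).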
In this example, Legacy Mode prevails and the request is aborted immediately because at least one handler omits priority when resolving the intercept:
// Final outcome: immediate abort()
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Legacy Mode: interception is aborted immediately.
request.abort('failed');
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Control will never reach this point because the request was already aborted in Legacy Mode
// Cooperative Intercept Mode: votes for continue at priority 0.
request.continue({}, 0);
});
In this example, Legacy Mode prevails and the request is continued because at least one handler does not specify a priority:
// Final outcome: immediate continue()
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 0.
request.abort('failed', 0);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Control reaches this point because the request was cooperatively aborted which postpones resolution.
// { action: InterceptResolutionAction.Abort, priority: 0 }, because abort @ 0 is the current winning resolution
console.log(request.interceptResolutionState());
// Legacy Mode: intercept continues immediately.
request.continue({});
});
page.on('request', request => {
// { action: InterceptResolutionAction.AlreadyHandled }, because continue in Legacy Mode was called
console.log(request.interceptResolutionState());
});
In this example, Cooperative Intercept Mode is active because all handlers specify a priority. continue() wins because it has a higher priority than abort().
// Final outcome: cooperative continue() @ 5
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 0
request.abort('failed', 0);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to continue at priority 5
request.continue(request.continueRequestOverrides(), 5);
});
page.on('request', request => {
// { action: InterceptResolutionAction.Continue, priority: 5 }, because continue @ 5 > abort @ 0
console.log(request.interceptResolutionState());
});
In this example, Cooperative Intercept Mode is active because all handlers specify a priority. respond() wins because its priority ties with continue(), but respond() beats continue().
// Final outcome: cooperative respond() @ 15
page.setRequestInterception(true);
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to abort at priority 10
request.abort('failed', 10);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to continue at priority 15
request.continue(request.continueRequestOverrides(), 15);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to respond at priority 15
request.respond(request.responseForRequest(), 15);
});
page.on('request', request => {
if (request.isInterceptResolutionHandled()) return;
// Cooperative Intercept Mode: votes to respond at priority 12
request.respond(request.responseForRequest(), 12);
});
page.on('request', request => {
// { action: InterceptResolutionAction.Respond, priority: 15 }, because respond @ 15 > continue @ 15 > respond @ 12 > abort @ 10
console.log(request.interceptResolutionState());
});
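Cooperative Request Continuation
Puppeteer requires request.continue() to be called explicitly, or the request will hang. Even if your handler means to take no special action, or to "opt out", request.continue() must still be called.
With the introduction of Cooperative Intercept Mode, two use cases arise for cooperative request continuations: Unopinionated and Opinionated.
The first case (common) is that your handler means to opt out of doing anything special with the request. It has no opinion on further action and simply intends to continue by default and/or defer to other handlers that might have an opinion. But in case there are no other handlers, we must call request.continue() to ensure that the request does not hang.
We call this an Unopinionated continuation because the intent is to continue the request if nobody else has a better idea. Use request.continue({...}, DEFAULT_INTERCEPT_RESOLUTION_PRIORITY) (or 0) for this type of continuation.
The second case (uncommon) is that your handler actually does have an opinion and means to force continuation by overriding a lower-priority abort() or respond() issued elsewhere. We call this an Opinionated continuation. In these rare cases, where you mean to specify an overriding continuation priority, use a custom priority.
To summarize, reason through whether your use of request.continue is just meant to be default/bypass behavior or whether it falls within the intended use case of your handler. Consider using a custom priority for in-scope use cases, and the default priority otherwise. Be aware that your handler may have both Opinionated and Unopinionated cases.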
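A handler that mixes both kinds of continuation might look like the sketch below. Everything except the request methods is hypothetical: TRUSTED_HOSTS, FORCE_CONTINUE_PRIORITY, continuationPriority, and handleRequest are names invented for the example, and DEFAULT_PRIORITY is redefined locally as 0 so the sketch runs without Puppeteer.

```javascript
// Redefined locally so the sketch is self-contained; the text above gives 0
// as the value to use for DEFAULT_INTERCEPT_RESOLUTION_PRIORITY.
const DEFAULT_PRIORITY = 0;

// Hypothetical configuration: hosts whose requests this handler insists on
// continuing, and the priority it uses to override lower-priority votes.
const TRUSTED_HOSTS = new Set(['cdn.example.com']);
const FORCE_CONTINUE_PRIORITY = 10;

// Opinionated for trusted hosts, unopinionated for everything else.
function continuationPriority(url) {
  const {hostname} = new URL(url);
  return TRUSTED_HOSTS.has(hostname)
    ? FORCE_CONTINUE_PRIORITY
    : DEFAULT_PRIORITY;
}

// Written against the request methods used throughout this page, so it can be
// registered with page.on('request', handleRequest).
function handleRequest(request) {
  if (request.isInterceptResolutionHandled()) return;
  request.continue(
    request.continueRequestOverrides(),
    continuationPriority(request.url()),
  );
}

console.log(continuationPriority('https://cdn.example.com/app.js'));
console.log(continuationPriority('https://other.example.org/'));
```

Keeping the priority decision in its own function makes it easy to see, and to test, which cases are in scope for the handler and which merely pass the request along.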
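Upgrading to Cooperative Intercept Mode for package maintainers
If you are a package maintainer and your package uses intercept handlers, you can update your intercept handlers to use Cooperative Intercept Mode. Suppose you have the following existing handler: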
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort();
else interceptedRequest.continue();
});
To use Cooperative Intercept Mode, upgrade continue() and abort():
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort('failed', 0);
else
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
0,
);
});
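With those simple upgrades, your handler now uses Cooperative Intercept Mode instead.
However, we recommend a slightly more robust solution because the above introduces several subtle issues:
- Backward compatibility. If any handler still uses a Legacy Mode resolution (i.e. does not specify a priority), that handler will resolve the interception immediately even if your handler runs first. This could cause disconcerting behavior for your users: suddenly your handler is not resolving the interception and a different handler is taking priority, when all they did was upgrade your package.
- Hard-coded priority. Your package's users have no ability to specify the default resolution priority for your handlers. This can become important when the user wishes to manipulate priorities based on use case. For example, one user might want your package to take a high priority while another user might want it to take a low priority.
To resolve both of these issues, our recommended approach is to export a setInterceptResolutionConfig() from your package. The user can then call setInterceptResolutionConfig() to explicitly activate Cooperative Intercept Mode in your package so they are not surprised by changes in how the interception is resolved. They can also optionally specify a custom priority that works for their use case with setInterceptResolutionConfig(priority):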
// Defaults to undefined which preserves Legacy Mode behavior
let _priority = undefined;
// Export a module configuration function
export const setInterceptResolutionConfig = (priority = 0) =>
(_priority = priority);
/**
* Note that this handler uses `DEFAULT_INTERCEPT_RESOLUTION_PRIORITY` to "pass" on this request. It is important to use
* the default priority when your handler has no opinion on the request and the intent is to continue() by default.
*/
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
)
interceptedRequest.abort('failed', _priority);
else
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
DEFAULT_INTERCEPT_RESOLUTION_PRIORITY, // Unopinionated continuation
);
});
If your package calls for more fine-grained control over resolution priorities, use a config pattern like this:
interface InterceptResolutionConfig {
abortPriority?: number;
continuePriority?: number;
}
// This approach supports multiple priorities based on situational
// differences. You could, for example, create a config that
// allowed separate priorities for PNG vs JPG.
const DEFAULT_CONFIG: InterceptResolutionConfig = {
abortPriority: undefined, // Default to Legacy Mode
continuePriority: undefined, // Default to Legacy Mode
};
// Defaults to undefined which preserves Legacy Mode behavior
let _config: Partial<InterceptResolutionConfig> = {};
export const setInterceptResolutionConfig = (
config: InterceptResolutionConfig,
) => (_config = {...DEFAULT_CONFIG, ...config});
page.on('request', interceptedRequest => {
if (interceptedRequest.isInterceptResolutionHandled()) return;
if (
interceptedRequest.url().endsWith('.png') ||
interceptedRequest.url().endsWith('.jpg')
) {
interceptedRequest.abort('failed', _config.abortPriority);
} else {
// Here we use a custom-configured priority to allow for Opinionated
// continuation.
// We would only want to allow this if we had a very clear reason why
// some use cases required Opinionated continuation.
interceptedRequest.continue(
interceptedRequest.continueRequestOverrides(),
_config.continuePriority, // Why would we ever want priority!==0 here?
);
}
});
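The above solution ensures backward compatibility while also allowing the user to adjust the importance of your package in the resolution chain when Cooperative Intercept Mode is being used. Your package continues to work as expected until the user has fully upgraded their code and all third-party packages to use Cooperative Intercept Mode. If any handler or package still uses Legacy Mode, your package can still operate in Legacy Mode too.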