Summary
(Glossary: an asset's hashed pathname is the key under which the asset is stored in KV. ASSET_MANIFEST maps each normal pathname to its hashed pathname, e.g. 404.html => 404.eeab4c0347.html.)
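Here is a minimal TypeScript sketch of the glossary. It assumes ASSET_MANIFEST is a plain object whose keys are pathnames without the leading slash. The entries and the helper toKvKey are illustrative and are not taken from kv-asset-handler.

```typescript
// Assumed shape of ASSET_MANIFEST: normal pathname -> hashed pathname (the KV key).
// Example entries only; wrangler generates the real manifest at publish time.
type AssetManifest = Record<string, string>;

const ASSET_MANIFEST: AssetManifest = {
  "404.html": "404.eeab4c0347.html",
  "img/200-wrangler-ferris.gif": "img/200-wrangler-ferris.8f4194bc08.gif",
};

// Hypothetical helper: strip leading slashes, then look the pathname up as a manifest key.
// Returns the hashed pathname, or undefined when the pathname is not a manifest key.
function toKvKey(manifest: AssetManifest, pathname: string): string | undefined {
  const key = pathname.replace(/^\/+/, "");
  return Object.prototype.hasOwnProperty.call(manifest, key) ? manifest[key] : undefined;
}

console.log(toKvKey(ASSET_MANIFEST, "/404.html")); // → 404.eeab4c0347.html
console.log(toKvKey(ASSET_MANIFEST, "/404.eeab4c0347.html")); // → undefined
```

The second call shows why this issue exists: a hashed pathname is a manifest value, not a manifest key.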
When an asset is requested by its hashed pathname (e.g. 404.eeab4c0347.html) rather than its normal one (e.g. 404.html), the worker still handles the request and returns the correct asset, just as it would for a normal request. In this case, however, the worker ignores the cache options. It never caches the response, and it fetches the asset from KV every time it handles a hashed-pathname request.
Expected Result
When an asset is requested by its hashed pathname, the worker should either return 404, or return the asset and apply the cache options properly.
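Here is a minimal sketch of how either expected behaviour could be implemented. It assumes the handler can also do a reverse lookup over the manifest's values. makeResolver, HashedPolicy and Resolution are hypothetical names and are not part of kv-asset-handler. The real handler also has steps this sketch omits, such as mapRequestToAsset, MIME types and HEAD handling.

```typescript
// Hypothetical sketch; not the kv-asset-handler API.
type AssetManifest = Record<string, string>;
// "reject": answer hashed-pathname requests with 404.
// "cache": serve them and apply cache options like normal requests.
type HashedPolicy = "reject" | "cache";

interface Resolution {
  action: "serve" | "notFound";
  kvKey?: string; // KV key to read when action is "serve"
  shouldEdgeCache: boolean; // whether cache options should be applied
}

function makeResolver(manifest: AssetManifest, policy: HashedPolicy) {
  // Reverse index of hashed pathnames, built once per manifest.
  const hashedKeys = new Set(Object.values(manifest));
  return (pathname: string): Resolution => {
    const key = pathname.replace(/^\/+/, "");
    if (Object.prototype.hasOwnProperty.call(manifest, key)) {
      // Normal pathname: map to the hashed key and cache (current behaviour).
      return { action: "serve", kvKey: manifest[key], shouldEdgeCache: true };
    }
    if (hashedKeys.has(key)) {
      // Hashed pathname: the case this issue is about.
      return policy === "reject"
        ? { action: "notFound", shouldEdgeCache: false }
        : { action: "serve", kvKey: key, shouldEdgeCache: true };
    }
    // Unknown pathname: unchanged, an uncached KV lookup that may still 404.
    return { action: "serve", kvKey: key, shouldEdgeCache: false };
  };
}

const manifest: AssetManifest = { "404.html": "404.eeab4c0347.html" };
const strict = makeResolver(manifest, "reject");
const lenient = makeResolver(manifest, "cache");
console.log(strict("/404.eeab4c0347.html").action); // notFound
console.log(lenient("/404.eeab4c0347.html").shouldEdgeCache); // true
```

The set of hashed keys is built once, so each request costs only two constant-time lookups. Choosing "reject" or "cache" as the default would be a decision for the maintainers.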
Actual Result
The worker responds with the requested asset, but the cache options are not applied, and the worker fetches the asset from KV on every request.
Steps to Reproduce
The issue can be reproduced with just the default template:
wrangler generate --site hello-world-site
cd .\hello-world-site\
wrangler publish
With the following default cache options:
https://github.com/cloudflare/kv-asset-handler/blob/3228cd78de1f4f61d74cecb13a3b19054d67d501/src/index.ts#L56-L60
When the asset is requested by its normal pathname (e.g. img/200-wrangler-ferris.gif), the worker retrieves the response from the cache (if a cached copy exists) and returns it. For example:
$ curl -s -D - -o /dev/null https://hello-world-site.***.workers.dev/img/200-wrangler-ferris.gif | egrep "HTTP/|content-type:|etag:|cf-cache-status:|age:"
HTTP/2 200
content-type: image/gif
age: 20
etag: img/200-wrangler-ferris.8f4194bc08.gif
cf-cache-status: HIT
The presence of the age and cf-cache-status headers shows that this is a cached response.
The etag header in the response also reveals the asset's hashed pathname, so anyone can easily discover an asset's hashed pathname this way.
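The following sketch shows how little work it takes to turn the etag of a normal response into a hashed-pathname URL. It assumes the etag is the bare hashed pathname, as in the output above. hashedUrlFromEtag is a hypothetical helper, and the example.workers.dev host stands in for the redacted one.

```typescript
// Hypothetical helper, assuming the etag is the bare hashed pathname
// (as in the curl output above).
function hashedUrlFromEtag(requestUrl: string, etag: string): string {
  const url = new URL(requestUrl);
  // Defensively strip a weak-validator prefix and surrounding quotes,
  // in case a deployment returns a standard quoted etag.
  const key = etag.replace(/^W\//, "").replace(/^"|"$/g, "");
  url.pathname = "/" + key;
  return url.toString();
}

// The host is a placeholder for the redacted one in this issue.
console.log(
  hashedUrlFromEtag(
    "https://hello-world-site.example.workers.dev/img/200-wrangler-ferris.gif",
    "img/200-wrangler-ferris.8f4194bc08.gif",
  ),
);
// → https://hello-world-site.example.workers.dev/img/200-wrangler-ferris.8f4194bc08.gif
```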
However, if we request the asset directly by its hashed pathname (img/200-wrangler-ferris.8f4194bc08.gif in this example):
$ curl -s -D - -o /dev/null https://hello-world-site.***.workers.dev/img/200-wrangler-ferris.8f4194bc08.gif | egrep "HTTP/|content-type:|etag:|cf-cache-status:|age:"
HTTP/2 200
content-type: image/gif
The response is not retrieved from the cache. Instead, it is built from the asset freshly fetched from KV, as the absence of the age and cf-cache-status headers indicates.
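This sketch compares the two curl outputs in code. It encodes the heuristic used above: a cached response carries age and cf-cache-status: HIT. servedFromCache is a hypothetical name, and the header maps are copied from the outputs in this issue.

```typescript
// Heuristic from this issue: a response served from the edge cache
// carries an `age` header and `cf-cache-status: HIT`.
type HeaderMap = Record<string, string>;

function servedFromCache(headers: HeaderMap): boolean {
  // HTTP header names are case-insensitive, so normalise them first.
  const lower = new Map(
    Object.entries(headers).map(([k, v]): [string, string] => [k.toLowerCase(), v]),
  );
  return lower.has("age") && lower.get("cf-cache-status") === "HIT";
}

// Headers from the normal-pathname request above.
const normalPath: HeaderMap = {
  "content-type": "image/gif",
  age: "20",
  etag: "img/200-wrangler-ferris.8f4194bc08.gif",
  "cf-cache-status": "HIT",
};
// Headers from the hashed-pathname request above.
const hashedPath: HeaderMap = { "content-type": "image/gif" };

console.log(servedFromCache(normalPath)); // true
console.log(servedFromCache(hashedPath)); // false
```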
Causes
https://github.com/cloudflare/kv-asset-handler/blob/3228cd78de1f4f61d74cecb13a3b19054d67d501/src/index.ts#L126-L134
The hashed pathname (e.g. 200-wrangler-ferris.8f4194bc08.gif) is not a key in ASSET_MANIFEST (the key there is 200-wrangler-ferris.gif). As a result, shouldEdgeCache is false when the worker handles a hashed-pathname (pathKey) request, so it skips the cache lookup and tries to fetch the asset from KV instead:
https://github.com/cloudflare/kv-asset-handler/blob/3228cd78de1f4f61d74cecb13a3b19054d67d501/src/index.ts#L166-L169
https://github.com/cloudflare/kv-asset-handler/blob/3228cd78de1f4f61d74cecb13a3b19054d67d501/src/index.ts#L209-L213
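Here is a simplified model of the manifest check linked above. It is written for illustration and is not copied from src/index.ts. Only a pathname that is a manifest key gets rewritten to its hashed pathname and marked cacheable. A request that already uses the hashed pathname therefore reaches the same KV key but keeps shouldEdgeCache = false. The function name lookup and the example entry are assumptions.

```typescript
// Simplified model for illustration; not copied from src/index.ts.
type AssetManifest = Record<string, string>;

interface Lookup {
  pathKey: string;
  shouldEdgeCache: boolean;
}

function lookup(manifest: AssetManifest, pathKey: string): Lookup {
  // Only manifest *keys* (normal pathnames) are treated as hashed, cacheable assets.
  if (Object.prototype.hasOwnProperty.call(manifest, pathKey)) {
    return { pathKey: manifest[pathKey], shouldEdgeCache: true };
  }
  // Anything else, including a hashed pathname, is looked up in KV as-is, uncached.
  return { pathKey, shouldEdgeCache: false };
}

const manifest: AssetManifest = {
  "img/200-wrangler-ferris.gif": "img/200-wrangler-ferris.8f4194bc08.gif",
};

const normal = lookup(manifest, "img/200-wrangler-ferris.gif");
const hashed = lookup(manifest, "img/200-wrangler-ferris.8f4194bc08.gif");
console.log(normal.pathKey === hashed.pathKey); // true: same KV key
console.log(normal.shouldEdgeCache, hashed.shouldEdgeCache); // true false
```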
Because the hashed pathname (pathKey) is a real KV key, the asset is retrieved from KV successfully. However, because shouldEdgeCache is false, the following cache-control step is skipped:
https://github.com/cloudflare/kv-asset-handler/blob/3228cd78de1f4f61d74cecb13a3b19054d67d501/src/index.ts#L215-L226
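To make the cost concrete, here is a toy simulation under a simplified, assumed control flow. In-memory Maps stand in for caches.default and the KV namespace, and the hashed pathname serves as the cache key. ToyWorker is hypothetical. It counts KV reads for repeated requests by normal and by hashed pathname.

```typescript
// Toy model with assumed, simplified control flow; not the library source.
type AssetManifest = Record<string, string>;

class ToyWorker {
  kvReads = 0; // number of KV reads performed so far
  private readonly cache = new Map<string, string>(); // stands in for caches.default
  private readonly manifest: AssetManifest;
  private readonly kv: Map<string, string>; // stands in for the KV namespace

  constructor(manifest: AssetManifest, kv: Map<string, string>) {
    this.manifest = manifest;
    this.kv = kv;
  }

  // Returns the asset body, or null where the real worker would 404.
  handle(pathname: string): string | null {
    let pathKey = pathname.replace(/^\/+/, "");
    let shouldEdgeCache = false;
    if (Object.prototype.hasOwnProperty.call(this.manifest, pathKey)) {
      pathKey = this.manifest[pathKey];
      shouldEdgeCache = true;
    }
    // The cache is consulted only for manifest hits.
    const cached = shouldEdgeCache ? this.cache.get(pathKey) : undefined;
    if (cached !== undefined) return cached;

    this.kvReads++;
    const body = this.kv.get(pathKey);
    if (body === undefined) return null;
    // Cache insertion is also gated on shouldEdgeCache, so it is skipped for hashed pathnames.
    if (shouldEdgeCache) this.cache.set(pathKey, body);
    return body;
  }
}

const worker = new ToyWorker(
  { "img/200-wrangler-ferris.gif": "img/200-wrangler-ferris.8f4194bc08.gif" },
  new Map([["img/200-wrangler-ferris.8f4194bc08.gif", "GIF89a..."]]),
);
for (let i = 0; i < 3; i++) worker.handle("/img/200-wrangler-ferris.gif");
console.log(worker.kvReads); // 1
for (let i = 0; i < 3; i++) worker.handle("/img/200-wrangler-ferris.8f4194bc08.gif");
console.log(worker.kvReads); // 4
```

Three normal-pathname requests cost one KV read in total. Three hashed-pathname requests cost three more, because for them the cache is neither consulted nor populated.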
I am afraid this may be a potential vulnerability, or at least a defect, that could be exploited maliciously. Every hashed-pathname request costs a KV read, and KV reads are billed and subject to usage limits.