
Commit dd3981c

download: Fix to send URL path rather than absolute URL in HTTP GET requests
This behavior conforms to RFC 2616 (HTTP/1.1) §5.1.2:

> The most common form of Request-URI is that used to identify a
> resource on an origin server or gateway. In this case the absolute
> path of the URI MUST be transmitted (see section 3.2.1, abs_path) as
> the Request-URI, and the network location of the URI (authority) MUST
> be transmitted in a Host header field.

Helps download & serve: http://blog.animeworld.com/#/#/

Probably also helps download WordPress sites in general.
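The rule quoted above can be sketched as a small helper that turns a URL into the request-target for the request line and the value for the `Host` header. This is an illustration, not Crystal's code. The name `request_target_and_host` is hypothetical. The helper keeps the query string because the origin-form request-target defined in RFC 7230 §5.3.1 (a successor to RFC 2616) is the absolute path followed by an optional `?query`. The fragment is never sent, and any `user:password@` userinfo is dropped from the `Host` value.

```python
from urllib.parse import urlsplit


def request_target_and_host(url):
    """Split an http(s) URL into (request-target, Host header value).

    Hypothetical helper for illustration. The request-target is in origin
    form: the absolute path plus any "?query". The scheme and authority are
    not put in the request line; the authority goes in the Host header.
    The fragment (after "#") is never sent to the server.
    """
    parts = urlsplit(url)
    path = parts.path or '/'  # an empty path is sent as "/"
    target = path + ('?' + parts.query if parts.query else '')
    host = parts.netloc.rpartition('@')[2]  # drop any "user:password@" userinfo
    return target, host


target, host = request_target_and_host('http://example.com:8080/blog/?p=42#comments')
print(f'GET {target} HTTP/1.1')
print(f'Host: {host}')
# GET /blog/?p=42 HTTP/1.1
# Host: example.com:8080
```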
1 parent 2bfcb95 commit dd3981c

2 files changed: +5, -1 lines

README.md

Lines changed: 3 additions & 0 deletions
@@ -178,6 +178,9 @@ Release Notes
 
 * Downloading improvements
 * Can force redownload of older URLs using the `--stale-before` CLI option.
+* Fix to send URL path rather than absolute URL in HTTP GET requests,
+  improving conformance to RFC 2616 (HTTP/1.1).
+* This helps download WordPress sites successfully.
 
 * Parsing improvements
 * Can identify URL references inside Atom feeds and RSS feeds.

src/crystal/download.py

Lines changed: 2 additions & 1 deletion
@@ -119,6 +119,7 @@ def __call__(self):
         url_parts = urlparse(self.url)
         scheme = url_parts.scheme
         host_and_port = url_parts.netloc
+        path = url_parts.path or '/'
 
         if scheme == 'http':
             conn = HTTPConnection(host_and_port)
@@ -138,7 +139,7 @@ def __call__(self):
         for (k, v) in _EXTRA_HEADERS.items():
             headers[k] = v
 
-        conn.request('GET', self.url, headers=headers)
+        conn.request('GET', path, headers=headers)
         response = conn.getresponse()
 
         metadata = ResourceRevisionMetadata({
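The change can be observed end to end with a throwaway local server from Python's standard `http.server`, which records the request-target and `Host` header of each GET it receives. This is a sketch for illustration, not part of the commit. `RecordingHandler` and `send_get` are hypothetical names. It sends the same URL three ways: the absolute URL (the call before this commit), `url_parts.path` (the call after it), and the path with its query string appended. The third variant is an assumption added for comparison. `urlparse(...).path` excludes the query string, so a URL ending in `/blog/?p=42` reaches the server as just `/blog/` under the second variant.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse

received = []  # (request-target, Host header) pairs seen by the server


class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path holds the request-target exactly as sent in the request line
        received.append((self.path, self.headers['Host']))
        self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def send_get(host_and_port, target):
    """Send one GET whose request line carries `target` verbatim."""
    conn = HTTPConnection(host_and_port)
    try:
        conn.request('GET', target)
        conn.getresponse().read()
    finally:
        conn.close()


server = HTTPServer(('127.0.0.1', 0), RecordingHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f'http://127.0.0.1:{server.server_port}/blog/?p=42'
url_parts = urlparse(url)
host_and_port = url_parts.netloc
path = url_parts.path or '/'

send_get(host_and_port, url)   # before the commit: absolute URL in the request line
send_get(host_and_port, path)  # after the commit: path only, query string dropped
send_get(host_and_port, path + ('?' + url_parts.query if url_parts.query else ''))  # path + query

server.shutdown()
server.server_close()

for target, host in received:
    print(target, '| Host:', host)
```

Run as a script, it prints the absolute URL for the first request, `/blog/` for the second, and `/blog/?p=42` for the third. The `Host` header is `127.0.0.1:<port>` in all three, because `http.client` fills it in from the connection's host and port (or, for an absolute URL, from the URL's authority).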
