If I try to retrieve the content of http://doctype.com/ using wget, curl, or any other non-browser web client, garbled content is returned.

In more detail: I use an internal application in which I store my bookmarks. This application tries to access each bookmarked URL in order to retrieve its favicon. For doctype.com, garbled content is returned. I have been able to reproduce the problem with wget and curl.

Edit: Thanks to both of you for the explanation and the workaround. As of now, though, your cache seems to be behaving correctly (2009/08/15 22h00, Paris time).

4 answers

Alex Holt 341
5
points

Ha, beat me to it. I couldn't get his page to load earlier... not sure why that is, Paul :P

Anyhow, if Olivier wants to access the data from the page in a script, there's no reason why he can't just gunzip the content that curl pulls down. If you're shell scripting it, you can pipe the output of curl through the gunzip command and you'll end up with the plain HTML, like:

$ curl http://doctype.com/ | gunzip > doctype-index.html

That gives you the HTML you want. ;)
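One caveat with piping straight into gunzip is that it fails as soon as the server starts sending plain HTML. A slightly more defensive sketch is below; `maybe_gunzip` is a made-up helper name, not a standard tool, and it assumes `mktemp`, `head -c` and `od` are available. It decompresses only when the input starts with the gzip magic bytes and passes everything else through untouched.

```shell
# Hypothetical helper: decompress stdin only if it is actually gzip data.
maybe_gunzip() {
  tmp=$(mktemp) || return 1
  # Buffer stdin so the first bytes can be inspected before deciding.
  cat > "$tmp"
  # gzip streams start with the two magic bytes 0x1f 0x8b.
  magic=$(head -c 2 "$tmp" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    gunzip -c < "$tmp"
  else
    cat "$tmp"
  fi
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (URL taken from the question):
# curl -s http://doctype.com/ | maybe_gunzip > doctype-index.html
```

That way the same script should keep working whether the response comes back compressed or not.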

Answered over 7 years ago by Alex Holt
3
points

This is a by-product of our caching system. Logged-out users (which curl will appear to be) always get a gzipped version of the page, which is why it looks garbled.

Ideally our cache servers should look at the request's accept headers (Accept-Encoding in particular) and serve curl an uncompressed version of the page. This is certainly possible, and we'll get that working at some point in the next few weeks.

In the meantime it isn't going to be possible to get an uncompressed download via curl. Sorry I couldn't be of more help.
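For anyone curious what "looking at the accept headers" amounts to, here is a rough shell sketch of the decision. It is only an illustration, not our actual cache configuration: `accepts_gzip` is a made-up function name, and it ignores the `*` wildcard that a full implementation would need to handle. It takes the value of the Accept-Encoding request header and succeeds only if gzip is listed with a non-zero quality value.

```shell
# Hypothetical check: does this Accept-Encoding value allow a gzip response?
# $1 is the raw header value, e.g. "gzip, deflate" (empty if the header is absent).
accepts_gzip() {
  # Lowercase, strip whitespace, and put one content-coding per line.
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' \t' | tr ',' '\n' |
    # Accept "gzip" or "x-gzip", optionally with a q-value greater than 0.
    grep -Eq '^(x-)?gzip(;q=(1(\.0*)?|0\.0*[1-9][0-9]*))?$'
}

# Example decision for a request whose header value is in $header:
# if accepts_gzip "$header"; then echo "serve gzipped copy"; else echo "serve plain copy"; fi
```

By default curl sends no Accept-Encoding header at all, so under a check like this it would get the uncompressed page, while browsers that advertise gzip would still get the compressed one.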

Answered over 7 years ago by Paul Farnell
0
points

We've now fixed the caching so that it observes the accept headers. You can now get Doctype's plain HTML using curl without any additional processing.

Answered over 7 years ago by Paul Farnell
0
points

Thank you, Paul! Regards

Answered over 7 years ago by Olivier Jaquemet