404 | Daniel Miessler

What follows is a primer on the key security-oriented characteristics of the
HTTP protocol. It’s a collection of sub-topics, explained in my own way, for
the purpose of having a single reference point when needed.
- Basics
- Query Strings, Parameters, and Ampersands
- URL Encoding
- Authentication
- HTTP Requests
- Request Methods
- HTTP Responses
- Status / Response Codes
- HTTP Headers
- Proxies
- Cookies

Basics

- Message-based: You make a request, you get a response.
- Line-based: Lines are quite significant in HTTP. Each header is on an
  individual line (each line ends with a CRLF), and a blank line separates
  the header section from the optional body section.
- Stateless: HTTP doesn’t have the concept of state built in, which is why
  things like cookies are used to track users within and across sessions.
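
To make the line-based structure concrete, here is a minimal Python sketch that splits a raw message into its start line, headers, and body. The function name `split_http_message` and the sample request (host `example.com`, path `/login`) are made up for illustration, and a real parser would need to handle far more edge cases than this.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into start line, headers, and body."""
    # A blank line (CRLF followed by CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    # Every line in the header section ends with CRLF.
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request, written out byte for byte.
raw_request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"user=dana"
)

start_line, headers, body = split_http_message(raw_request)
print(start_line)
print(headers)
print(body)
```

Nothing in the message itself ties this request to any earlier one, which is the statelessness described above.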
Query Strings, Parameters, and Ampersands

- Query Strings (?): A query string is defined by using the question mark
  (?) character after the URL being requested, and it defines what is
  being sent to the web application for processing. Query strings are
  typically used to pass the contents of HTML forms, and are encoded using
  name=value pairs.
  http://google.com/search?query=mysearch
- Parameters (something=something): In the request above the parameter is
  named “query”, presumably indicating it’s what’s being searched for.
  The name is followed by an equals sign (=) and then the value of the
  parameter.
  http://google.com/search?q=mysearch
- The Ampersand (&): Ampersands are used to separate a list of
  parameters being sent to the same form, e.g. sending a query value, a
  language, and a verbose value to a search form.
  http://google.com/search?q=mysearch&lang=en&verbose=1

[ Ampersands are not mentioned in the HTTP spec itself; they are used as a
matter of convention. ]
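
As a quick illustration of the three pieces above, this Python sketch takes the last example URL apart, first with the standard library and then by hand. The variable names are my own; the URL is the one from the text.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://google.com/search?q=mysearch&lang=en&verbose=1"

# Everything after the question mark is the query string.
query = urlsplit(url).query

# The standard library splits on "&" and "=" for us.
params = parse_qsl(query)

# The same split done by hand: "&" separates parameters, "=" separates
# each name from its value.
manual = [tuple(pair.split("=", 1)) for pair in query.split("&")]

print(query)
print(params)
print(manual == params)
```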
URL Encoding

URL encoding seems trickier than it is. It’s basically a workaround for a
single rule in RFC 1738, which states that:
…Only alphanumerics [0-9a-zA-Z], the special characters “$-_.+!*'(),” [not
including the quotes – ed], and reserved characters used for their reserved
purposes may be used unencoded within a URL.
The issue is that humans are inclined to use far more than just those
characters, so we need some way of getting the larger range of characters
transformed into the smaller, approved set. That’s what URL Encoding does.
As mentioned here in a most excellent piece on the topic, there are a few
basic groups of characters that need to be encoded:
- ASCII Control Characters: because they’re not printable.
- Non-ASCII Characters: because they’re not in the approved set
  (see the requirement above from RFC 1738). This includes the upper
  portion of the ISO-Latin character set (see my encoding primer to learn
  more about character sets).
- Reserved Characters: these are kind of like system variables in
  programming; they mean something within URLs, so they can’t be used
  outside of that meaning.
  - Dollar (“$”)
  - Ampersand (“&”)
  - Plus (“+”)
  - Comma (“,”)
  - Forward slash/Virgule (“/”)
  - Colon (“:”)
  - Semi-colon (“;”)
  - Equals (“=”)
  - Question mark (“?”)
  - ‘At’ symbol (“@”)
  - Space ( )
  - Quotes (“”)
  - Less Than and Greater Than Symbols (<>)
  - Pound (#)
  - Percent (%)
  - Curly Braces ({})
  - The Pipe Symbol (|)
  - Backslash (\)
  - Caret (^)
  - Tilde (~)
  - Square Brackets ([ ])
  - Backtick (`)

For any of these characters that can’t (or shouldn’t) be put in a URL
natively, the following encoding algorithm must be used to make it properly
URL-encoded:

- Find the ISO 8859-1 code point for the character in question
- Convert that code point to two characters of hex
- Prepend a percent sign (%) to the two hex characters

This is why you see so many instances of %20 in your URLs. That’s the
URL-encoding for a space.
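
The three steps above can be sketched in a few lines of Python. The `url_encode` function and the `SAFE` set are my own names, and the set simply mirrors the characters the RFC 1738 rule allows unencoded; the last two lines compare the result with the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

# Characters the RFC 1738 rule quoted earlier allows unencoded.
SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "$-_.+!*'(),"
)


def url_encode(text: str) -> str:
    """Percent-encode every character that is not in SAFE."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
            continue
        # Step 1: find the ISO 8859-1 code point (this raises
        # UnicodeEncodeError for characters outside that character set).
        code_point = ch.encode("iso-8859-1")[0]
        # Steps 2 and 3: two hex characters with a percent sign in front.
        out.append("%" + format(code_point, "02X"))
    return "".join(out)


print(url_encode("hello world"))
print(url_encode("50% off"))
print(url_encode("café"))

# The standard library produces the same result for these inputs.
print(quote("hello world", safe=""))
print(quote("café", encoding="iso-8859-1"))
```

One caveat: this follows the ISO 8859-1 procedure as described here, whereas current browsers and libraries generally percent-encode the UTF-8 bytes of a character, so non-ASCII output will differ in practice.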
Authentication

Here are the primary HTTP authentication types:

Basic

- A user requests a page protected by basic auth
- The server sends back a 401 and a WWW-Authenticate header with the value
  of Basic
- The client takes the username and password, separated by a colon, and
  Base64 encodes the result
- The client then sends that value in an Authorization header, like so:
  Authorization: Basic BTxhZGRpbjpbcGAuINMlc2FtZC==

[ As the authors of The Web Application Hacker’s Handbook point out, Basic
Authentication isn’t as bad as people make it out to be. Or, to be more
precise, it’s no worse than Forms-based Authentication (the most common
type). The reason for this is simple: both send credentials in plain-text by
default (Basic at least applies Base64, whereas Forms-based isn’t even
encoded, although Base64 is trivially reversible and is not encryption).
Either way, the only way for either scheme to even approach security is by
adding SSL/TLS. ]
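
Here is a small Python sketch of the client side of Basic auth, and of why Base64 buys no secrecy. The helper name `basic_auth_header` is my own, and the credentials are the well-known sample pair from the HTTP authentication RFCs, not anything from a real system.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header line for HTTP Basic auth."""
    # Join username and password with a colon, then Base64 encode the result.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Anyone who can see the header can reverse the encoding immediately.
decoded = base64.b64decode(header.split()[-1]).decode("utf-8")
print(decoded)
```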
Digest

- A user requests a page protected by digest auth
- The server sends back a 401 and a WWW-Authenticate header with the value
  of Digest along with a nonce value and a realm value
- The client concatenates the username, realm, and password (separated by
  colons) and uses that as input to MD5 to produce one hash (HA1)
- The client concatenates the method and the URI (separated by a colon) to
  create a second MD5 hash (HA2)
- The client then sends an Authorization header with the username, realm,
  nonce, URI, and the response, which is the MD5 of HA1, the nonce, and HA2
  joined by colons

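
The digest calculation can be sketched in Python as follows. This covers only the simplest form of the scheme (no qop or client nonce), the function names are my own, and every input value in the usage lines is a made-up placeholder.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the response value for the simplest form of Digest auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # credentials and realm
    ha2 = md5_hex(f"{method}:{uri}")                 # method and URI
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # tied to the server nonce


# Hypothetical values; the nonce would normally come from the server's 401.
response = digest_response(
    username="dana",
    realm="example",
    password="s3cret",
    method="GET",
    uri="/private/index.html",
    nonce="abc123",
)
print(response)
```

Because the nonce is part of the final hash, a response captured for one nonce is not valid for another, and the password itself never crosses the wire.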
Forms-based Authentication

This is the most common type of web authentication. It works by presenting
a user with an HTML form for entering a username and password, and then
sending those values to the server for verification. Some things to note:

- The login information should be sent via POST rather than GET
- The POST should be sent over HTTPS, not in the clear
- Ideally, the entire login page itself should be HTTPS, not just the page
  that the credentials are being sent to

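
To illustrate the first two points from the client's side, here is a hedged Python sketch that builds (but does not send) a login request. The URL `https://example.com/login`, the field names `username` and `password`, and the helper `build_login_request` are all assumptions for the example; real forms define their own action URL and field names.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request


def build_login_request(action_url: str, username: str, password: str) -> Request:
    """Build a POST request carrying form credentials, HTTPS only."""
    # Refuse to put credentials on the wire in the clear.
    if urlsplit(action_url).scheme != "https":
        raise ValueError("refusing to send credentials over a non-HTTPS URL")
    # Credentials go in the POST body, not in the query string of a GET.
    # urlencode applies form-style encoding (a space becomes "+").
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(
        action_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_login_request("https://example.com/login", "dana", "s3cret pass")
print(request.get_method(), request.full_url)
print(request.data)
```

Keeping the credentials in the body also keeps them out of the URL, which tends to end up in browser history and server logs.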
Shown below is a typical structure of a login form (this one from
wordpress.com):