The function http_send_response() did too much. It not only took
the request fields and built them together into a response, it
delegated too little and many functions were "hacked" into it, for
instance shady directory-changes for vhosts and hand-construction
of response structs.
The preparations for a rework were already made in previous commits,
including a tighter focus on the response-struct itself. Instead of
doing everything locally in the http_send_response() function, the
new http_prepare_response() only really takes the request-struct and
builds a response-struct. The response-struct is expanded such that
it's possible to do the data-sending simply with the response-struct
itself and not any other magic parameters that just drop out of the
function.
Another matter are the http_send_status()-calls. Because the
aforementioned function is so central, this refactoring has included
many areas. Instead of calling http_send_status() in every error-case,
which makes little sense now given we first delegate everything through
a response struct, errors are just sent as a return value and caught
centrally (in serve() in main.c), which centralizes the error handling
a bit.
It might look a bit strange now and it might not be clear in which
direction this is going, but subsequent commits will hopefully give
clarity in this regard.
Signed-off-by: Laslo Hunhold <dev@frign.de>
While off_t might be better suited for file-offsets and -sizes, the
IEEE Computer Society was unable to mandate limits (min, max) for it
in the POSIX specification in the last 32 years. Because it's impossible
to portably determine these numbers for signed integers, I decided
to switch to size_t for the offsets to be able to pass proper values
to strtonum(), because C99 is sane and has defined limits for size_t
(i.e. SIZE_MIN and SIZE_MAX).
On my system, long long and off_t have the same size, so it didn't
trigger any bugs, but strtonum() could pass a bigger number to
lower and upper than they can handle and make them overflow.
The rationale for switching to size_t is actually given by the fact that
functions like mmap() blur the border between memory and filesystem.
Another point is that glibc has a horrible define _FILE_OFFSET_BITS
you need to set to 64 to actually get decent values for off_t, which
was a huge headache in sbase until we found that out.
Signed-off-by: Laslo Hunhold <dev@frign.de>
I know that the effect of 'const' on compiler optimizations is smaller
than many believe, but it provides a good insight to the caller which
parameters are not modified and simplifies parallelization, in case
that is desired at a later point.
Throughout processing, the big structs mostly remained unmodified, with
the exception of parse_range(), which added a null-byte in the "Range"-
header to simplify its parsing. This commit refactors parse_range()
such that it won't modify this string anymore.
Additionally, the parser was made even stricter: Usually, strtoll()
(which is wrapped by strtonum()) allows whitespace and plus and minus
signs before the number, which is not part of the specification. The
stricter parser also better differentiates now between invalid requests
and range-lists. In that context, the switch in http_send_response()
was replaced for better readability.
Signed-off-by: Laslo Hunhold <dev@frign.de>
I wasn't happy with how responses were generated. HTTP-headers were
handled by hand and it was duplicated in multiple parts of the code.
Due to the duplication, some functions like timestamp() had really
ugly semantics.
The HTTP requests are parsed much better: We have an enum of fields
we care about that are automatically read into our request struct. This
commit adapts this idea to the response: We have an enum of fields
we might put into our response, and a response-struct holds the
content of these fields. A function http_send_header() automatically
sends a header based on the entries in response. In case we don't
use a field, we just leave the field in the response-struct empty.
With this commit, some logical changes came with it:
- timestamp() now has a sane signature, TIMESTAMP_LEN is no more and
it can now return proper errors and is also reentrant by using
gmtime_r() instead of gmtime()
- No more use of a static timestamp-array, making all the methods
also reentrant
- Better internal-error-reporting: Because the fields are filled
before and not during sending the response-headers, we can better
report any internal errors as status 500 instead of sending a
partial non-500-header and then dying.
These improved data structures make it easier to read and hack the code
and implement new features, if desired.
Signed-off-by: Laslo Hunhold <dev@frign.de>
Now that the range-support is actually working, we can send out
the Accept-Ranges-header so that clients know they can send
range-requests to the server.
This can be seen empirically when watching a video and skipping around
into unbuffered space. Previously, it would not be possible and the
time-selector would flip back to the furthest point the previous
buffering had progressed. Now it is working flawlessly.
Signed-off-by: Laslo Hunhold <dev@frign.de>
Stop immediately after responding with status code 304 "Not Modified".
This also solves missing log output for status 304.
If there is an error while sending a file, try to clean up and close the
file.
Based on a patch by guysv. We now make sure that the valid
path-characters ", ', <, >, & can not be used for XSS on a target, for
example with a file called
"><img src="blabla" onerror="alert(1)"
by properly HTML-escaping these characters.
Signed-off-by: Laslo Hunhold <dev@frign.de>
If charset is unspecified, the encoding falls back to ISO 8859-1 or
something else that is defined in HTTP/1.1.
Given there is no reason not to use UTF-8 nowadays[0] and one can convert
legacy encodings to UTF-8 easily, if the case comes up, it is a sane
default to specify it in the config.def.h.
[0]: https://utf8everywhere.org/
Signed-off-by: Laslo Hunhold <dev@frign.de>
Since now config.def.h has been reduced we don't have any more unused
variables and thus the manual fiddling with error-levels is no longer
necessary.
To get a completely clean result though we have to still cast some
variables here and there.
And many other things, too many to list here. For example, it now
properly logs uds instead of erroring out.
Separating concerns in many places definitely improves the readability.