27 commits

Author SHA1 Message Date
Laslo Hunhold
cb7a1f6390
Replace off_t with size_t
While off_t might be better suited for file-offsets and -sizes, the
IEEE Computer Society was unable to mandate limits (min, max) for it
in the POSIX specification in the last 32 years. Because it's impossible
to portably determine these numbers for signed integers, I decided
to switch to size_t for the offsets to be able to pass proper values
to strtonum(), because C99 is sane and has defined limits for size_t
(i.e. 0 and SIZE_MAX).

On my system, long long and off_t have the same size, so it didn't
trigger any bugs, but strtonum() could pass a bigger number to
lower and upper than they can handle and make them overflow.
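
The overflow concern can be sketched as follows, assuming a
strtonum()-style interface that takes its bounds as long long. The
helper name size_limit() is hypothetical and not taken from quark.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the largest size_t value that can also be
 * passed as a long long upper bound without overflowing.
 */
static long long
size_limit(void)
{
	if ((unsigned long long)SIZE_MAX > (unsigned long long)LLONG_MAX)
		return LLONG_MAX;

	return (long long)SIZE_MAX;
}
```

Because both limits are defined by the C standard, the comparison is
portable, which is not possible for off_t.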

The rationale for switching to size_t is actually given by the fact that
functions like mmap() blur the border between memory and filesystem.
Another point is that glibc has a horrible define _FILE_OFFSET_BITS
you need to set to 64 to actually get decent values for off_t, which
was a huge headache in sbase until we found that out.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-05 18:59:55 +02:00
Laslo Hunhold
d105c28aad
Ensure const-correctness where possible and refactor parse_range()
I know that the effect of 'const' on compiler optimizations is smaller
than many believe, but it provides a good insight to the caller which
parameters are not modified and simplifies parallelization, in case
that is desired at a later point.

Throughout processing, the big structs mostly remained unmodified, with
the exception of parse_range(), which added a null-byte in the "Range"-
header to simplify its parsing. This commit refactors parse_range()
such that it won't modify this string anymore.

Additionally, the parser was made even stricter: Usually, strtoll()
(which is wrapped by strtonum()) allows whitespace and plus and minus
signs before the number, which is not part of the specification. The
stricter parser also better differentiates now between invalid requests
and range-lists. In that context, the switch in http_send_response()
was replaced for better readability.
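
A minimal sketch of such a strict number parser, assuming only
decimal digits are acceptable. The function name strict_strtosize()
is hypothetical; this is not quark's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict parser: accepts decimal digits only, so leading
 * whitespace and '+'/'-' signs are rejected. The input is not modified.
 * Returns 0 on success and -1 on invalid input or overflow.
 */
static int
strict_strtosize(const char *s, size_t *out)
{
	size_t n = 0;
	unsigned int digit;

	if (*s == '\0')
		return -1; /* empty string */

	for (; *s != '\0'; s++) {
		if (*s < '0' || *s > '9')
			return -1; /* whitespace, sign or other non-digit */
		digit = (unsigned int)(*s - '0');
		if (n > (SIZE_MAX - digit) / 10)
			return -1; /* n * 10 + digit would overflow */
		n = n * 10 + digit;
	}
	*out = n;

	return 0;
}
```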

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-05 18:28:21 +02:00
Laslo Hunhold
90d5179ea0
Rename REQ_MOD to REQ_IF_MODIFIED_SINCE
The named constants for header fields of the response struct all
pretty much matched the actual header name, which I think improves
readability for everyone familiar with the HTTP-spec.

The request header fields named constants followed the rule, except
the "If-Modified-Since"-header, which is addressed in this commit.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-05 15:46:03 +02:00
Laslo Hunhold
2c50d0c654
Rename request "r" to "req"
Now that we have response-structs called "res", the naming "r" is a
bit ambiguous.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-05 15:43:29 +02:00
Laslo Hunhold
c51b31d7ac
Refactor response-generation
I wasn't happy with how responses were generated. HTTP-headers were
handled by hand and it was duplicated in multiple parts of the code.
Due to the duplication, some functions like timestamp() had really
ugly semantics.

The HTTP requests are parsed much better: We have an enum of fields
we care about that are automatically read into our request struct. This
commit adapts this idea to the response: We have an enum of fields
we might put into our response, and a response-struct holds the
content of these fields. A function http_send_header() automatically
sends a header based on the entries in response. In case we don't
use a field, we just leave the field in the response-struct empty.
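
The idea can be sketched roughly like this. The field names, the
struct layout and format_header() are assumptions for illustration;
the sketch writes into a buffer instead of a socket and is not the
actual http_send_header().

```c
#include <stddef.h>
#include <stdio.h>

/* hypothetical subset of response fields */
enum res_field {
	RES_CONTENT_LENGTH,
	RES_CONTENT_TYPE,
	RES_LAST_MODIFIED,
	NUM_RES_FIELDS,
};

static const char *res_field_str[NUM_RES_FIELDS] = {
	[RES_CONTENT_LENGTH] = "Content-Length",
	[RES_CONTENT_TYPE]   = "Content-Type",
	[RES_LAST_MODIFIED]  = "Last-Modified",
};

struct response {
	int status;                      /* e.g. 200 */
	const char *reason;              /* e.g. "OK" */
	char field[NUM_RES_FIELDS][128]; /* empty string = field unused */
};

/* write the header into buf; returns 0 on success, -1 if buf is too small */
static int
format_header(const struct response *res, char *buf, size_t len)
{
	size_t off = 0, i;
	int ret;

	ret = snprintf(buf, len, "HTTP/1.1 %d %s\r\n", res->status, res->reason);
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	off += (size_t)ret;

	for (i = 0; i < NUM_RES_FIELDS; i++) {
		if (res->field[i][0] == '\0')
			continue; /* skip unused fields */
		ret = snprintf(buf + off, len - off, "%s: %s\r\n",
		               res_field_str[i], res->field[i]);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}

	/* an empty line terminates the header */
	ret = snprintf(buf + off, len - off, "\r\n");
	if (ret < 0 || (size_t)ret >= len - off)
		return -1;

	return 0;
}
```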

Some logical changes came with this commit:

  - timestamp() now has a sane signature, TIMESTAMP_LEN is no more and
    it can now return proper errors and is also reentrant by using
    gmtime_r() instead of gmtime()
  - No more use of a static timestamp-array, making all the methods
    also reentrant
  - Better internal-error-reporting: Because the fields are filled
    before and not during sending the response-headers, we can better
    report any internal errors as status 500 instead of sending a
    partial non-500-header and then dying.
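
A reentrant timestamp function along these lines might look like the
following sketch. The signature and the return convention (0 on
success, 1 on error) are assumptions, not necessarily what quark uses.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() */
#include <stddef.h>
#include <time.h>

/*
 * Format t as an HTTP date into the caller-provided buffer.
 * No static storage is used, so concurrent calls do not interfere.
 */
static int
timestamp(char *buf, size_t len, time_t t)
{
	struct tm tm;

	if (gmtime_r(&t, &tm) == NULL)
		return 1;
	if (strftime(buf, len, "%a, %d %b %Y %T GMT", &tm) == 0)
		return 1; /* buffer too small */

	return 0;
}
```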

These improved data structures make it easier to read and hack the code
and implement new features, if desired.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-05 13:41:44 +02:00
Laslo Hunhold
26c593ade1
Refactor range-parsing into a separate function
The method http_send_response() is already long enough and this
separation of concerns helps shorten it a bit, improves
readability and reduces the chance of programming errors.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-08-04 16:32:54 +02:00
Laslo Hunhold
db4e35d3d5
Refactor range-parsing
Quark previously didn't really handle suffix-range-requests
(those of the form "-num", asking for the last num bytes) properly
and also did not catch the error when the lower in the range
"lower-upper" was actually larger than or equal to the size of the
requested file.
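
The two cases can be sketched as follows, assuming the range has
already been parsed into numbers. The names range_kind and
resolve_range() are hypothetical. For brevity, the sketch reports a
malformed range (lower > upper) and an unsatisfiable one with the
same return value.

```c
#include <stddef.h>

enum range_kind {
	RANGE_FROM,    /* "lower-"      */
	RANGE_FROM_TO, /* "lower-upper" */
	RANGE_SUFFIX,  /* "-num"        */
};

/*
 * Resolve a parsed range against the file size.
 * a is lower (or num for a suffix range), b is upper (RANGE_FROM_TO only).
 * Returns 0 and sets the inclusive bounds first/last on success,
 * -1 if the range cannot be served.
 */
static int
resolve_range(enum range_kind kind, size_t a, size_t b, size_t size,
              size_t *first, size_t *last)
{
	if (size == 0)
		return -1;

	if (kind == RANGE_SUFFIX) {
		if (a == 0)
			return -1; /* "last 0 bytes" is empty */
		*first = (a >= size) ? 0 : size - a;
		*last = size - 1;
		return 0;
	}

	if (kind == RANGE_FROM_TO && a > b)
		return -1;
	if (a >= size)
		return -1; /* lower lies beyond the end of the file */

	*first = a;
	*last = (kind == RANGE_FROM || b >= size) ? size - 1 : b;

	return 0;
}
```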

I always planned to refactor the parsing, but was finally motivated
to do it by Eric Radman <ericshane@eradman.com>, who kindly reported
the latter bug to me.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-07-23 18:16:08 +02:00
Laslo Hunhold
6b508a0e07
Explicitly initialize struct tm with zeroes
This is recommended by the manual as strptime(), in principle,
might only touch the fields it parses from the string. Given
the struct tm implementations differ from operating system to
operating system, we make sure and set everything to zero
before passing it to strptime().
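
A minimal sketch of the pattern, assuming an HTTP date as input. The
helper name parse_http_date() is hypothetical.

```c
#define _XOPEN_SOURCE 700 /* for strptime() */
#include <string.h>
#include <time.h>

/*
 * Parse an HTTP date into tm. All fields are zeroed first, so any
 * field strptime() does not touch has a defined value afterwards.
 * Returns 0 on success, -1 if the string does not match completely.
 */
static int
parse_http_date(const char *s, struct tm *tm)
{
	const char *end;

	memset(tm, 0, sizeof(*tm));
	end = strptime(s, "%a, %d %b %Y %T GMT", tm);

	return (end != NULL && *end == '\0') ? 0 : -1;
}
```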

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-07-23 16:54:21 +02:00
Laslo Hunhold
660b308617
Use timegm() instead of mktime() to generate UNIX-timestamp
The broken down time-representation tm generated earlier is in UTC,
and mktime() assumes that it's in local time instead, leading to
the problem that quark might not send a NOT_MODIFIED in a different
timezone.

timegm() instead correctly interprets the broken down
time-representation tm as UTC and returns the proper timestamp.
It might not be portable like mktime(), but it's complicated to
emulate it otherwise.
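
The difference can be illustrated with a small sketch. The helper
utc_to_time() is hypothetical, and the feature-test macro is what
glibc needs to expose the non-standard timegm().

```c
#define _DEFAULT_SOURCE 1 /* exposes timegm() on glibc */
#include <time.h>

/*
 * Convert a broken-down UTC time to a UNIX timestamp. mktime() would
 * interpret the same fields as local time and be off by the local
 * UTC offset.
 */
static time_t
utc_to_time(int year, int mon, int mday, int hour, int min, int sec)
{
	struct tm tm = { 0 };

	tm.tm_year = year - 1900;
	tm.tm_mon = mon - 1;
	tm.tm_mday = mday;
	tm.tm_hour = hour;
	tm.tm_min = min;
	tm.tm_sec = sec;

	return timegm(&tm);
}
```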

Thanks to Jeremy Bobbin <jer@jer.cx> for reporting the bug and
providing this fix, which is why I've added him to the LICENSE.
Thanks also to Hiltjo for his input.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2020-07-23 16:48:34 +02:00
Rainer Holzner
b7d0d6889d
Fix for sending HTTP response status 304
Stop immediately after responding with status code 304 "Not Modified".
This also solves missing log output for status 304.

If there is an error while sending a file, try to clean up and close the
file.
2020-05-07 13:40:29 +02:00
Laslo Hunhold
065394cb64
Change target prefix mapping argument order
Put the chost-specification at the end and make it optional. This makes
more sense than having to give an arbitrary useless name in case you
weren't using virtual hosts in the first place.

While at it, clear up the wording in the manpage.

Signed-off-by: Laslo Hunhold <dev@frign.de>
2019-02-24 00:53:03 +01:00
Laslo Hunhold
e299e186ed
Don't replace '+' with ' ' when decoding URLs
After the initial report by Platon Ryzhikov, I couldn't validate this
behaviour with the given RFC 3986[0], which only speaks of percent encoding
for reserved characters.
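
A decoder in this spirit could look like the following sketch, which
only handles percent-encoding and passes '+' through unchanged. The
function names are hypothetical and dest is assumed to hold at least
strlen(src) + 1 bytes.

```c
#include <ctype.h>

/* value of a single hexadecimal digit (caller checks isxdigit()) */
static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';

	return tolower((unsigned char)c) - 'a' + 10;
}

/* percent-decode src into dest; '+' is copied verbatim */
static void
decode(const char *src, char *dest)
{
	for (; *src != '\0'; src++) {
		if (src[0] == '%' && isxdigit((unsigned char)src[1]) &&
		    isxdigit((unsigned char)src[2])) {
			*dest++ = (char)(hexval(src[1]) * 16 + hexval(src[2]));
			src += 2;
		} else {
			*dest++ = *src;
		}
	}
	*dest = '\0';
}
```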

[0]:https://tools.ietf.org/html/rfc3986

Signed-off-by: Laslo Hunhold <dev@frign.de>
2019-01-10 22:02:23 +01:00
Laslo Hunhold
bbd47e1427
Specify UTF-8 for non-binary content-types
If charset is unspecified, the encoding falls back to ISO 8859-1 or
something else that is defined in HTTP/1.1.

Given there is no reason not to use UTF-8 nowadays[0] and one can convert
legacy encodings to UTF-8 easily, if the case comes up, it is a sane
default to specify it in the config.def.h.

[0]: https://utf8everywhere.org/

Signed-off-by: Laslo Hunhold <dev@frign.de>
2019-01-02 17:04:23 +01:00
Aaron Burrow
d2013a6337 Fix one-byte NUL stack overflow
Don't append a forward slash if the length of a folder is PATH_MAX-1. This can
happen if HEADER_MAX is larger than PATH_MAX or if the `-m` option is used to
increase the path length.
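
The kind of check involved can be sketched like this. The helper
append_slash() is hypothetical and not quark's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append a trailing '/' to path, which lives in a buffer of size bytes.
 * Returns 0 on success (or if the slash is already there), -1 if the
 * slash plus the terminating NUL byte would not fit.
 */
static int
append_slash(char *path, size_t size)
{
	size_t len = strlen(path);

	if (len > 0 && path[len - 1] == '/')
		return 0;
	if (len + 2 > size)
		return -1; /* no room for '/' and '\0' */

	path[len] = '/';
	path[len + 1] = '\0';

	return 0;
}
```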
2018-07-16 22:48:20 +02:00
Laslo Hunhold
9ff3f780e1 Send a relative redirection header wherever possible
This makes quark much more flexible when it is run behind a network
filter or other kind of tunnel. Only send an absolute redirection when
we are handling vhosts.
2018-07-02 18:43:06 +02:00
Laslo Hunhold
3ff82c514b Clean up request host properly
We all agree that the IPv6 address format is a big clusterfuck and only
an insane person would've come up with it given the double colons
interfere with the way one actually appends a port to a normal IPv4 address.

To counteract this issue, the RFC specifies that one should enclose
IPv6-addresses in square brackets to make the distinction possible,
i.e.

	host: ::1
	port: 80

	--> [::1]:80

The host field can contain a port suffix and, as required by the RFC,
have the address enclosed in square brackets. Given I personally see
this as a "transport enclosure", I'd like to see it gone as soon as
possible and thus implement this cleanup in the http-header-parser, so
the output is nice and clean and we don't have to deal with this garbage
later on.
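
Such a cleanup could be sketched as follows. The function
clean_host() is hypothetical, and the sketch assumes an IPv6 address
always arrives in brackets, as the RFC requires.

```c
#include <stddef.h>
#include <string.h>

/*
 * Strip the port suffix and the square brackets from a host field,
 * in place. Returns 0 on success, -1 on a malformed bracketed host.
 */
static int
clean_host(char *host)
{
	char *p;
	size_t len;

	if (host[0] == '[') {
		/* bracketed IPv6 address, optionally followed by ":port" */
		if ((p = strchr(host, ']')) == NULL)
			return -1;
		if (p[1] != '\0' && p[1] != ':')
			return -1;
		len = (size_t)(p - host) - 1;
		memmove(host, host + 1, len);
		host[len] = '\0';
	} else if ((p = strrchr(host, ':')) != NULL) {
		/* hostname or IPv4 address with a port suffix */
		*p = '\0';
	}

	return 0;
}
```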

Thanks to Josuah Demangeon <mail@josuah.net> for his wonderful input and
his dedication to read the RFCs 3986 and 2732 in such great detail.
2018-04-03 01:03:03 +02:00
Laslo Hunhold
a20136fa18 Update the documentation to reflect the new flag-centric usage 2018-03-05 09:51:29 +01:00
Hiltjo Posthuma
444b8f5b32 http_send_response: fix undefined behaviour for copying the target string
... the format string and buffer were the same (undefined behaviour).
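
snprintf() has undefined behaviour when the destination overlaps one
of its inputs. One way to avoid this is to go through a temporary
buffer, as in this sketch. The helper prepend() and the buffer size
are assumptions, not the actual fix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_LEN 4096 /* assumed upper bound for a target string */

/*
 * Prepend prefix to target, which lives in a buffer of size bytes.
 * Returns 0 on success, -1 if the result would not fit; target is
 * left untouched in that case.
 */
static int
prepend(char *target, size_t size, const char *prefix)
{
	char tmp[BUF_LEN];
	int ret;

	/* write to tmp so that source and destination never overlap */
	ret = snprintf(tmp, sizeof(tmp), "%s%s", prefix, target);
	if (ret < 0 || (size_t)ret >= sizeof(tmp) || (size_t)ret >= size)
		return -1;
	memcpy(target, tmp, (size_t)ret + 1);

	return 0;
}
```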
2018-03-05 01:21:14 +01:00
Laslo Hunhold
c8401c591f Add esnprintf() and refactor some code
The (size_t) cast discards the case where the return value of snprintf()
is < 0. This is rather unlikely, but we'll keep it in mind anyway.
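
A wrapper of this kind could look like the sketch below. The body is
an assumption; unlike a bare (size_t) cast, it checks the negative
return value explicitly.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * snprintf() wrapper that reports both errors and truncation.
 * Returns 0 if the whole string was written, 1 otherwise.
 */
static int
esnprintf(char *str, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(str, size, fmt, ap);
	va_end(ap);

	return (ret < 0 || (size_t)ret >= size);
}
```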
2018-03-05 00:59:37 +01:00
Laslo Hunhold
1879e14e79 Be extra pedantic again and remove all warnings
Since now config.def.h has been reduced we don't have any more unused
variables and thus the manual fiddling with error-levels is no longer
necessary.
To get a completely clean result though we have to still cast some
variables here and there.
2018-03-05 00:30:53 +01:00
Quentin Rameau
3ff3e5ea6e Add some missing headers and interface visibility macro
strings.h for strncasecmp
time.h for strptime
2018-03-05 00:21:54 +01:00
Laslo Hunhold
6b55e36036 Introduce flag-centric usage
The config.h-interface has proven to be very effective for a lot of
suckless tools, but it just does not make too much sense for a web
server like quark.

 $ quark

If you run multiple instances of it, you want to see in the command line
(or top) what it does, and given the amount of options it's logical to
just express them as options given in the command line.
It also is a problem if you can modify quark via the config.h,
contradicting the manual. Just saying "Well, then don't touch config.h"
is also not good, as the vhost and map options were only exposed via
this interface.

What is left in config.h are mime-types and two constants relating to
the incoming HTTP-header-limits.

In order to introduce these changes, some structs and safe utility
functions were added and imported from OpenBSD respectively.
2018-03-05 00:14:25 +01:00
Laslo Hunhold
7b7f166dd5 Add target prefix mapping
This allows e.g. to redirect when a directory has been moved.
2018-02-27 12:43:05 +01:00
Laslo Hunhold
02d6ae5a57 Add support for adding a prefix to a target when matching vhosts
This makes quark's vhost-handling very powerful while still being
simple.

Imagine you have a website with a subdomain you really want
to move back to your main domain.
Say the subdomain is called "old.example.org" and you want to serve it
under "example.org" but in the subdirectory "old/", i.e. you want to
redirect a request "old.example.org/subdir/" to "example.org/old/subdir".

For a vhost-handler that only takes 4 arguments for each vhost this is
actually pretty powerful.
2018-02-27 11:36:24 +01:00
Laslo Hunhold
4948053bee Use scheme-relative (aka protocol-relative) URLs for redirects
This ensures that quark really does not care if the incoming connection
is plain HTTP or relayed TLS-traffic from a proxy or tunnel. Depending
on the previous negotiation, the client will make the right decision on
which scheme to use in a given context.
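
Building such a redirect target can be sketched as follows. The
helper redirect_location() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write a scheme-relative URL ("//host/target") into buf, leaving the
 * choice between http and https to the client.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
redirect_location(char *buf, size_t len, const char *host, const char *target)
{
	int ret;

	ret = snprintf(buf, len, "//%s%s", host, target);

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```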
2018-02-27 03:38:55 +01:00
Josuah Demangeon
55d7f000cd add headers to make it compile under OpenBSD
- 'struct in6_addr' is defined in <netinet/in.h>
- 'AF_INET6' is defined in <sys/socket.h>
2018-02-12 20:35:37 +01:00
Laslo Hunhold
ccdb51b96d Refactor the single source file into multiple modules
And many other things, too many to list here. For example, it now
properly logs UNIX-domain sockets (uds) instead of erroring out.
Separating concerns in many places definitely improves the readability.
2018-02-04 21:27:33 +01:00