After the initial report by Platon Ryzhikov, I couldn't validate this
behaviour with the given RFC 3986[0], which only speaks of percent encoding
for reserved characters.
[0]:https://tools.ietf.org/html/rfc3986
Signed-off-by: Laslo Hunhold <dev@frign.de>
If charset is unspecified, the encoding falls back to ISO 8859-1 or
something else that is defined in HTTP/1.1.
Given there is no reason not to use UTF-8 nowadays[0] and one can convert
legacy encodings to UTF-8 easily, if the case comes up, it is a sane
default to specify it in the config.def.h.
[0]: https://utf8everywhere.org/
Signed-off-by: Laslo Hunhold <dev@frign.de>
Don't append a forward slash if the length of a folder is PATH_MAX-1. This can
happen if HEADER_MAX is larger than PATH_MAX or if the `-m` option is used to
increase the path length.
This makes quark much more flexible when it is run behind a network
filter or other kind of tunnel. Only send an absolute redirection when
we are handling vhosts.
We all agree that the IPv6 address format is a big clusterfuck and only
an insane person would've come up with it given the double colons
interfere with the way one actually appends a port to a normal IPv4 address.
To counteract in this issue, the RFC specifies that one should enclose
IPv6-addresses in square brackets to make the disctinction possible,
i.e.
host: ::1
port: 80
--> [::1]:80
The host field can contain both a port suffix and, of course by the RFC,
have the address enclosed in square brackets. Given I personally see
this as a "transport enclosure" I'd rather like to see it gone as soon
as possible and thus implement this cleanup in the http-header-parser so
the output is nice and clean and we don't have to deal with this garbage
later on.
Thanks to Josuah Demangeon <mail@josuah.net> for his wonderful input and
his dedication to read the RFCs 3986 and 2732 in such great detail.
Since now config.def.h has been reduced we don't have any more unused
variables and thus the manual fiddling with error-levels is no longer
necessary.
To get a completely clean result though we have to still cast some
variables here and there.
The config.h-interface has proven to be very effective for a lot of
suckless tools, but it just does not make too much sense for a web
server like quark.
$ quark
If you run multiple instances of it, you want to see in the command line
(or top) what it does, and given the amount of options it's logical to
just express them as options given in the command line.
It also is a problem if you can modify quark via the config.h,
contradicting the manual. Just saying "Well, then don't touch config.h"
is also not good, as the vhost and map options were only exposed via
this interface.
What is left in config.h are mime-types and two constants relating to
the incoming HTTP-header-limits.
In order to introduce these changes, some structs and safe utility
functions were added and imported from OpenBSD respectively.
This makes quark's vhost-handling very powerful while still being
simple.
Imagine you have a website with a subdomain you really want
to move back to your main domain.
Say the subdomain is called "old.example.org" and you want to serve it
under "example.org" but in the subdirectory "old/", i.e. you want to
redirect a request "old.example.org/subdir/" to "example.org/old/subdir".
For a vhost-handler that only takes 4 arguments for each vhost this is
actually pretty powerful.
This ensures that quark really does not care if the incoming connection
is plain HTTP or relayed TLS-traffic from a proxy or tunnel. Depending
on the previous negotiation, the client will make the right decision on
which scheme to use in a given context.
And many other things, too many to list here. For example, it now
properly logs uds instead of erroring out.
Separating concerns in many places definitely improves the readability.