I wasn't happy with the tokenizer for the m- and v-flags, because it
was handling space-separated input and there was no way to have spaces
within the tokens themselves. This is a fine detail, but I didn't want
to impose this restriction where it could be solved (path prefixes or
folder names can very well contain spaces).
Given it's a bit quirky to handle multiple arguments to a single flag
in the command line, especially when parameters are optional, this
alternative wasn't further considered and I instead implemented a
tokenizer that allows escaping spaces with '\'.
While at it, I clarified the manual regarding this point.
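A minimal sketch of such a tokenizer, not the code from this commit. The
name tokenize_escaped() and its signature are hypothetical, and it assumes
that only the sequence "\ " is treated as an escape:

```c
#include <stddef.h>

/*
 * Split s in place on unescaped spaces and store up to max token
 * pointers in tok. The sequence "\ " yields a literal space inside a
 * token. Returns the number of tokens found; input beyond the max-th
 * token is ignored.
 */
size_t
tokenize_escaped(char *s, char **tok, size_t max)
{
	size_t n = 0;
	char *r = s, *w = s; /* read and write cursors, w never passes r */

	while (*r != '\0' && n < max) {
		/* skip separating spaces */
		while (*r == ' ')
			r++;
		if (*r == '\0')
			break;

		tok[n++] = w;
		while (*r != '\0' && *r != ' ') {
			if (r[0] == '\\' && r[1] == ' ')
				r++; /* drop the backslash, keep the space */
			*w++ = *r++;
		}
		/* step over the separator before terminating the token */
		if (*r != '\0')
			r++;
		*w++ = '\0';
	}

	return n;
}
```

With this, the argument "/my\ folder /srv/www" splits into the two tokens
"/my folder" and "/srv/www".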
Signed-off-by: Laslo Hunhold <dev@frign.de>
Put the chost-specification at the end and make it optional. This makes
more sense than having to give an arbitrary useless name in case you
weren't using virtual hosts in the first place.
While at it, clear up the wording in the manpage.
Signed-off-by: Laslo Hunhold <dev@frign.de>
After the initial report by Platon Ryzhikov, I couldn't validate this
behaviour with the given RFC 3986[0], which only speaks of percent encoding
for reserved characters.
[0]:https://tools.ietf.org/html/rfc3986
Signed-off-by: Laslo Hunhold <dev@frign.de>
If charset is unspecified, the encoding falls back to ISO 8859-1 or
something else that is defined in HTTP/1.1.
Given there is no reason not to use UTF-8 nowadays[0] and one can convert
legacy encodings to UTF-8 easily, if the case comes up, it is a sane
default to specify it in the config.def.h.
[0]: https://utf8everywhere.org/
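To illustrate, a hypothetical excerpt of what such a default could look
like. The struct layout and the entries are assumptions, the actual table
in config.def.h may differ:

```c
/* hypothetical excerpt, the real table in config.def.h may differ */
static const struct {
	const char *ext;  /* file extension */
	const char *type; /* value sent as Content-Type */
} mimes[] = {
	{ "html", "text/html; charset=utf-8"  },
	{ "css",  "text/css; charset=utf-8"   },
	{ "txt",  "text/plain; charset=utf-8" },
	{ "png",  "image/png"                 }, /* binary, no charset */
};
```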
Signed-off-by: Laslo Hunhold <dev@frign.de>
Don't append a forward slash if the length of a folder is PATH_MAX-1. This can
happen if HEADER_MAX is larger than PATH_MAX or if the `-m` option is used to
increase the path length.
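The bounds check can be sketched as follows. The helper
ensure_trailing_slash() is hypothetical and takes the buffer size as a
parameter, where quark would pass PATH_MAX:

```c
#include <stddef.h>
#include <string.h>

/*
 * Append '/' to path unless it already ends in one. size is the total
 * size of the buffer holding path. Returns 0 on success and -1 if there
 * is no room left for both the slash and the terminating NUL byte.
 */
int
ensure_trailing_slash(char *path, size_t size)
{
	size_t len = strlen(path);

	if (len > 0 && path[len - 1] == '/')
		return 0;
	if (len + 2 > size) /* '/' plus '\0' would overflow the buffer */
		return -1;

	path[len] = '/';
	path[len + 1] = '\0';

	return 0;
}
```

For a path of length PATH_MAX-1 in a buffer of size PATH_MAX the function
returns -1 and leaves the buffer untouched.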
This makes quark much more flexible when it is run behind a network
filter or other kind of tunnel. Only send an absolute redirection when
we are handling vhosts.
When cleaning up after a caught signal, quark forwards the signal to all
processes in the process group with `kill(0, ...)`. If we do not open up a new
process group in the parent process, quark's parent will be sent a SIG... too,
causing it to shut down as well (which is especially bad considering that the
parent process might run as root).
As a result, if we set up the service with djb's excellent daemontools,
`svc -d quark` will terminate the svscan-process and tear all other services
down with it.
See also <https://cr.yp.to/daemontools/faq/create.html#pgrphack>.
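One way to open up a new process group is sketched below. It uses
setpgid(2), which is an assumption, the commit itself may rely on a
different call. The helper name own_process_group() is hypothetical:

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Make the calling process the leader of its own process group, so that
 * a later kill(0, sig) only reaches this process and its children.
 * Returns 0 on success and -1 on failure (errno is set by setpgid()).
 */
int
own_process_group(void)
{
	/* already a group leader (e.g. a session leader): nothing to do */
	if (getpgrp() == getpid())
		return 0;

	/* setpgid(0, 0) sets the caller's process group ID to its own PID */
	return setpgid(0, 0);
}
```

Called early in the parent, before any child is forked, this keeps the
supervising process out of the group that receives the forwarded signal.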
We all agree that the IPv6 address format is a big clusterfuck and only
an insane person would've come up with it given the double colons
interfere with the way one actually appends a port to a normal IPv4 address.
To counteract this issue, the RFC specifies that one should enclose
IPv6-addresses in square brackets to make the distinction possible,
i.e.
host: ::1
port: 80
--> [::1]:80
The host field can contain both a port suffix and, of course by the RFC,
have the address enclosed in square brackets. Given I personally see
this as a "transport enclosure" I'd rather like to see it gone as soon
as possible and thus implement this cleanup in the http-header-parser so
the output is nice and clean and we don't have to deal with this garbage
later on.
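A rough sketch of such a cleanup, not the parser code itself. The helper
strip_host_enclosure() is hypothetical, and dropping the port suffix
along with the brackets is an assumption made for this example:

```c
#include <stddef.h>
#include <string.h>

/*
 * Reduce a Host field value to the bare host in place: remove the
 * square brackets around an IPv6 literal and drop a ":port" suffix.
 * Returns 0 on success and -1 on malformed bracket syntax.
 */
int
strip_host_enclosure(char *host)
{
	char *end, *colon;
	size_t len;

	if (host[0] == '[') {
		/* bracketed IPv6 literal, optionally followed by ":port" */
		end = strchr(host, ']');
		if (end == NULL || (end[1] != '\0' && end[1] != ':'))
			return -1;
		len = (size_t)(end - host) - 1;
		memmove(host, host + 1, len);
		host[len] = '\0';
	} else if ((colon = strchr(host, ':')) != NULL &&
	           strchr(colon + 1, ':') == NULL) {
		/* exactly one colon: hostname or IPv4 address with a port */
		*colon = '\0';
	}

	return 0;
}
```

With this, "[::1]:80" is reduced to "::1" and "example.org:8080" to
"example.org", while an unbracketed "::1" is left as it is.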
Thanks to Josuah Demangeon <mail@josuah.net> for his wonderful input and
his dedication to read the RFCs 3986 and 2732 in such great detail.
The previous parsing of the -v vhosts made sure there were 4 tokens.
If there was no prefix specified, usage() was called. Now, it only
checks for the first 3, with .prefix set to NULL if there are only
3 tokens.
Since now config.def.h has been reduced we don't have any more unused
variables and thus the manual fiddling with error-levels is no longer
necessary.
To get a completely clean result though we have to still cast some
variables here and there.
The config.h-interface has proven to be very effective for a lot of
suckless tools, but it just does not make too much sense for a web
server like quark.
$ quark
If you run multiple instances of it, you want to see in the command line
(or top) what it does, and given the amount of options it's logical to
just express them as options given in the command line.
It also is a problem if you can modify quark via the config.h,
contradicting the manual. Just saying "Well, then don't touch config.h"
is also not good, as the vhost and map options were only exposed via
this interface.
What is left in config.h are mime-types and two constants relating to
the incoming HTTP-header-limits.
In order to introduce these changes, some structs and safe utility
functions were added and imported from OpenBSD respectively.
This makes quark's vhost-handling very powerful while still being
simple.
Imagine you have a website with a subdomain you really want
to move back to your main domain.
Say the subdomain is called "old.example.org" and you want to serve it
under "example.org" but in the subdirectory "old/", i.e. you want to
redirect a request "old.example.org/subdir/" to "example.org/old/subdir/".
For a vhost-handler that only takes 4 arguments for each vhost this is
actually pretty powerful.
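The redirect from the example can be sketched like this. The struct
layout, its field names and the helper redirect_target() are assumptions
made for illustration, and the matching of the host against the regex is
left out:

```c
#include <stdio.h>

/* hypothetical vhost entry with the 4 arguments mentioned above */
struct vhost {
	const char *chost;  /* canonical host to redirect to */
	const char *regex;  /* pattern the requested host is matched against */
	const char *dir;    /* directory to serve from */
	const char *prefix; /* path prefix on the canonical host, may be NULL */
};

/*
 * Write the redirect target for a request path on the given vhost into
 * buf. Returns 0 on success and -1 if the result did not fit.
 */
int
redirect_target(const struct vhost *v, const char *path, char *buf,
                size_t size)
{
	int n;

	n = snprintf(buf, size, "%s%s%s", v->chost,
	             v->prefix ? v->prefix : "", path);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For an entry with the canonical host "example.org" and the prefix "/old",
a request for "/subdir/" yields the target "example.org/old/subdir/".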
This ensures that quark really does not care if the incoming connection
is plain HTTP or relayed TLS-traffic from a proxy or tunnel. Depending
on the previous negotiation, the client will make the right decision on
which scheme to use in a given context.
To make the code a bit more flexible, let's get rid of the forking-code
in serve() and do it in main(). This way, we are free to handle it
differently in the future.
And many other things, too many to list here. For example, it now
properly logs uds instead of erroring out.
Separating concerns in many places definitely improves the readability.
- add missing header netinet/in.h for socket declarations (POSIX).
- rename sendfile to responsefile, sendfile(2) is a syscall on FreeBSD.
- remove _XOPEN_SOURCE: this will give a warning about strptime on Linux
glibc, but unbreaks the build on NetBSD and FreeBSD.
thanks also to josuah and quinq for testing!
Thanks Michael Forney for reporting this! We cannot use identifiers
beginning with an underscore, says the C99-standard, section 7.1.3:
"All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces."
We go around this by putting the underscore at the end.
The (c)-symbol has become more of a remnant after the Berne convention
has been signed. Given the ISC license exploits some simplifications
introduced with the Berne convention, it just makes sense to drop this
relic as well and just state our Copyright without much ado about nothing.
https://opensource.org/licenses/ISC
Check for the presence of the socket file and bail out if found.
If the socket file is present, either a server is already bound to it,
or the last one errored out and we'd want to inspect this.
It could also be an unrelated file given by mistake.
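A minimal sketch of such a presence check. The helper
socket_path_is_free() is hypothetical, and the use of stat(2) is an
assumption, not necessarily what the commit does:

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Returns 1 if nothing exists at path, 0 if a file of any kind is
 * already present and -1 if the check itself failed for another reason.
 */
int
socket_path_is_free(const char *path)
{
	struct stat st;

	if (stat(path, &st) == 0)
		return 0; /* something is there, the caller should bail out */

	return (errno == ENOENT) ? 1 : -1;
}
```

Since stat(2) follows symbolic links, a dangling symlink at the path is
reported as free in this sketch.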