In the underscore case, the next character is retrieved to check
whether the underscore is at a word break. However, if this character
is UTF8_INVALID, the call to parser_pushch is a no-op, so the loop
continues further than it should. Add a check for whether next is
UTF8_INVALID and return if it is.
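The guard can be sketched roughly as follows. This is not the actual
parser code: struct mini_parser, mini_getch, mini_pushch and
peek_or_stop are hypothetical stand-ins, the input is assumed to be
plain ASCII, and the value given to UTF8_INVALID is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_INVALID 0x80u /* assumed sentinel value for "no character" */

/* Hypothetical miniature parser: ASCII input plus a one-slot pushback. */
struct mini_parser {
	const char *input;
	size_t pos;
	uint32_t pushed;
	int has_pushed;
};

/* Return the next character, or UTF8_INVALID once the input is exhausted. */
uint32_t mini_getch(struct mini_parser *p) {
	if (p->has_pushed) {
		p->has_pushed = 0;
		return p->pushed;
	}
	unsigned char c = (unsigned char)p->input[p->pos];
	if (c == '\0') {
		return UTF8_INVALID;
	}
	p->pos++;
	return c;
}

/* Push a character back; pushing UTF8_INVALID is silently ignored. */
void mini_pushch(struct mini_parser *p, uint32_t ch) {
	if (ch == UTF8_INVALID) {
		return;
	}
	p->pushed = ch;
	p->has_pushed = 1;
}

/*
 * Peek at the next character. Returns 1 if the caller should stop
 * because nothing valid could be read, 0 if *next was read and pushed
 * back for normal processing.
 */
int peek_or_stop(struct mini_parser *p, uint32_t *next) {
	assert(p != NULL && next != NULL);
	*next = mini_getch(p);
	if (*next == UTF8_INVALID) {
		return 1; /* the explicit check: do not rely on the pushback */
	}
	mini_pushch(p, *next);
	return 0;
}
```

Without the early return, the caller would fall through to a pushback
that does nothing and keep looping on input that has already ended.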
Signed-off-by: Brian Ashworth <bosrsf04@gmail.com>
Currently, the first underscore encountered while underlining ends
underlining. As a result, underscores inside underlined words are not
ignored, e.g. _hello_world_ does not parse correctly.
Check the next character to see whether it is still in a word before
ending underlining.
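A minimal sketch of the word-break test, assuming ASCII input and
assuming that "still in a word" means the next character is
alphanumeric. The helper name underscore_ends_underline is
hypothetical and does not come from the parser itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether the underscore at s[i] ends
 * underlining. Returns 1 if it does, 0 if it should be kept as text.
 */
int underscore_ends_underline(const char *s, size_t i) {
	assert(s != NULL && s[i] == '_');
	/* s[i + 1] is at worst the terminating NUL, which is not alnum. */
	unsigned char next = (unsigned char)s[i + 1];
	/* Still inside a word: treat the underscore as literal text. */
	return !isalnum(next);
}
```

For _hello_world_, the underscore at index 6 is followed by 'w' and is
kept as text, while the final underscore at index 12 is followed by
the end of the string and ends underlining.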
Regardless of standards considerations, if there's any advice
that needs to be hammered into man authors, it's to be concise
and accurate, but not pedantic. As Will Strunk commanded,
"Omit needless words."
The most needless words of all are promotional. No man page
should utter words like "powerful", "extraordinarily versatile",
"user-friendly", or "has a wide range of options".
-- Doug McIlroy[1]
[1] https://lists.gnu.org/archive/html/groff/2018-11/msg00058.html
An empty string will rarely be useful since, with the current state
of the string API, the only thing that can be done to it is append a
character. Storing empty strings with a NULL storage pointer creates
unnecessary edge cases in any code handling strings.
The tables test no longer segfaults.
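One way to avoid the NULL storage pointer is to allocate a small
buffer up front, so an empty string is still a valid NUL-terminated
buffer. The sketch below is illustrative only: struct sketch_str, its
functions, and the initial capacity of 16 are assumptions, not the
project's string API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical string type; data is never NULL after creation. */
struct sketch_str {
	char *data;
	size_t len, size;
};

/* Create an empty string backed by real, zeroed storage. */
struct sketch_str *sketch_str_create(void) {
	struct sketch_str *s = calloc(1, sizeof(*s));
	if (!s) {
		return NULL;
	}
	s->size = 16; /* assumed initial capacity */
	s->data = calloc(s->size, 1);
	if (!s->data) {
		free(s);
		return NULL;
	}
	return s;
}

/* Append one character, growing the buffer as needed. Returns 1 on success. */
int sketch_str_append_ch(struct sketch_str *s, char ch) {
	assert(s != NULL && s->data != NULL);
	if (s->len + 1 >= s->size) {
		size_t new_size = s->size * 2;
		char *grown = realloc(s->data, new_size);
		if (!grown) {
			return 0;
		}
		s->data = grown;
		s->size = new_size;
	}
	s->data[s->len++] = ch;
	s->data[s->len] = '\0';
	return 1;
}

void sketch_str_free(struct sketch_str *s) {
	if (s) {
		free(s->data);
		free(s);
	}
}
```

With this layout, callers can read s->data immediately after creation
without a NULL check.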
The environment variable SOURCE_DATE_EPOCH [0] is standardized and can
be used to produce reproducible output. Distributions like Debian will
set this variable before the build and scdoc should use it (instead of
the current date) for any timestamps within the man pages.
[0]: https://reproducible-builds.org/docs/source-date-epoch/
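Reading the variable could look roughly like the sketch below. The
function names resolve_timestamp, build_timestamp and format_date_utc
are hypothetical, and the YYYY-MM-DD output format is an assumption
made for illustration. A value that is unset, empty, or not a plain
decimal number falls back to the supplied time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Use sde (seconds since the Unix epoch) if it parses, else fallback. */
time_t resolve_timestamp(const char *sde, time_t fallback) {
	if (sde && *sde) {
		char *end;
		errno = 0;
		unsigned long long epoch = strtoull(sde, &end, 10);
		if (errno == 0 && *end == '\0') {
			return (time_t)epoch;
		}
	}
	return fallback;
}

/* Timestamp for generated output: SOURCE_DATE_EPOCH or the current time. */
time_t build_timestamp(void) {
	return resolve_timestamp(getenv("SOURCE_DATE_EPOCH"), time(NULL));
}

/* Format t as YYYY-MM-DD in UTC so the result ignores the local zone. */
size_t format_date_utc(time_t t, char *buf, size_t len) {
	assert(buf != NULL);
	struct tm *tm = gmtime(&t);
	if (!tm) {
		return 0;
	}
	return strftime(buf, len, "%Y-%m-%d", tm);
}
```

Formatting in UTC rather than local time keeps the output identical
across build machines in different time zones.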