The commit message has a very large left and right padding, but doesn't
improve readability and might make the commit message more difficult to
read on smaller screens.
This patch shortens the padding.
With the Python 2 to 3 migration and the type checking, we can be
fairly confident that smstr are always constructed from strings, not
bytes.
This allows the code to be simplified, as we no longer need to carry
the dual raw/unicode representation.
This patch introduces type annotations, which can be checked with mypy.
The coverage is not very comprehensive for now, but it is a starting
point and will be expanded in later patches.
This patch applies auto-formatting of the source code using black
(https://github.com/psf/black).
This makes the code style more uniform and simplifies editing.
Note I also tried yapf, and IMO produced nicer output and handled some
corner cases much better, but unfortunately it doesn't yet support type
annotations, which will be introduced in later commits.
So in the future we might switch to yapf instead.
Python 3 was released more than 10 years ago, and support for Python 2
is going away, with many Linux distributions starting to phase it out.
This patch migrates git-arr to Python 3.
The generated output is almost exactly the same, there are some minor
differences such as HTML characters being quoted more aggresively, and
handling of paths with non-utf8 values.
By default, the markdown generator creates links for local files
transparently. For example, "[text](link.md)" will generate
"<a href=link.md>text</a>".
This works fine for absolute and external links, but breaks for links
relative to the repository itself, as git-arr links are of the form
"dir/f=file.ext.html".
So this patch adds a markdown extension to rewrite the links. It uses a
heuristic to detect them, which should work for the vast majority of
common cases.
The default CSS is not very comfortable for markdown, as for example the
links are hidden.
This patch makes the markdown CSS tunable by wrapping it into a div, and
then adjusting the default styles to increase readability.
As an experiment, make the sections of the summary to be toggable. This
can help readability, although it's unclear if it's worth the additional
complexity and could be removed later.
Including the tree as part of the summary gives a bit more information
and provides an easy path into the tree.
It does clutter things a bit, so this is an experiment and may be
removed later.
An empty string as a pathspec element matches all paths, but git has
recently started complaining about it, as it could be problematic for
some operations like rm. In the future, it will be considered an error.
So this patch uses "." instead of the empty pathspec, as recommended.
d426430e6e
We display the location of the repository, but the entire row is not
convenient for copy-pasting.
This patch changes the wording to "git clone" so the entire row can be
copied and pasted into a terminal.
There's a trick, because if we just changed the wording to:
<td>git clone</td> <td>https://example.com/repo</td>
that would get copied as:
git clone\thttps://example.com/repo
which does not work well when pasted into a terminal (as the \t gets
"eaten" in most cases).
So this patch changes the HTML to have a space after "clone":
<td>git clone </td> <td>https://example.com/repo</td>
and the CSS to preserve the space, so the following gets copied:
git clone \thttps://example.com/repo
which works when pasting on a terminal.
There's a significant amount of overrides to make the font sizes
smaller, but that can hurt readability in some cases. We should try to
use the "natural" sizes as much as possible.
This patch does that, removing a lot of the font sizes and bringing them
to be based on the normal sizes.
It also changes listings to use monospace, for readability.
When using pygments, make the line numbers grey.
This was the intention all along, but the <a> style overrides the <div>
style and the grey colour does not take effect.
This patch fixes the problem by setting the style specifically to <a>
within the line numbers div.
This patch adds a "prefix" configuration option, so repositories created
with recursion are named with a prefix.
This can be useful to disambiguate between repositories that are named
the same but live in different directories.
This patch moves the pages to HTML5, and adds some simple meta tags and CSS media
constraints so things render better on mobile browsers, while leaving the
desktop unaffected.
It's still not ideal, though.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
When having symbolic links to the same repositories (e.g. if you have "repo"
and a "repo.git" linking to it), it can be useful to ignore based on regular
expressions to avoid having duplicates in the output.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
The top level index contains a "last updated" field, but it doesn't get
updated if using the --only option, which is very common in post-update hooks,
and causes the date to be stale.
This patch fixes that by always generating the top level index, even if --only
was given.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
The page title in a root tree displays as "git >> repo >> branch >>",
which looks odd and fails to convey the fact that the page represents a
tree. Appending a '/' (for example "git >> repo >> branch >> /") makes
it more obvious that the page shows a tree, in general, and the root
tree, in particular.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
For blobs in subdirectories, the page title always includes a double
slash between the final directory component and the filename (for
example, "git >> repo >> branch >> doc//readme.txt"). This is unsightly.
git-arr:blob() ensures that the directory passed to views/blob always
has a trailing slash, so we can drop the slash inserted by views/blob
between the directory and the filename.
As a side-effect, this also changes the page title for blobs in the root
directory. Instead of "git >> repo >> branch >> /readme.txt", the title
becomes "git >> repo >> branch >> readme.txt", which is slightly more
aesthetically pleasing.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Binding (or "pegging") a Repo at a particular branch via new_in_branch()
increases the cognitive burden since the reader must maintain a mental
model of which Repo instances are pegged and which are not. This burden
outweighs whatever minor convenience (if any) is gained by pegging the
Repo at a particular branch. It is easier to reason about the code when
the branch name is passed to clients directly rather than indirectly via
a pegged Repo.
Preceding patches retired all callers of new_in_branch(), therefore
remove it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Passing the branch name into the view indirectly via
Repo.new_in_branch() increases cognitive burden, thus outweighing
whatever minor convenience (if any) is gained by doing so. The code is
easier to reason about when the branch name is passed to the view
directly.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Passing the branch name into the view indirectly via
Repo.new_in_branch() increases cognitive burden, thus outweighing
whatever minor convenience (if any) is gained by doing so. The code is
easier to reason about when the branch name is passed to the view
directly.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Passing the branch name into the view indirectly via
Repo.new_in_branch() increases cognitive burden, thus outweighing
whatever minor convenience (if any) is gained by doing so. The code is
easier to reason about when the branch name is passed to the view
directly.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Empty (zero-length) blobs are currently rendered by 'pygments'
misleadingly as a single empty line, or, when 'pygments' is unavailable,
as "nothingness" preceding a horizontal rule. In either case, it is
somewhat difficult to glean concrete information about the blob.
Address this by instead rendering summary information about the blob: in
particular, its classification ("empty") and its size ("0 bytes"). This
is analogous to the summary information rendered for binary blobs
("binary" and size).
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Although hexdump(1)-style rendering of binary blob content may reveal
some meaningful information about the data, it wastes even more storage
space than embedding the raw data itself. However, many binary files
have a "magic number" or other signature near the beginning of the file,
so it is often possible to glean useful information from just the
initial chunk of the file without having the entire content available.
Thus, limiting the rendered data to just an initial chunk saves storage
space while still potentially presenting useful information about the
binary content.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Raw binary blob content tends to look like "line noise" and is rarely,
if ever, meaningful. A hexdump(1)-style rendering (specifically,
"hexdump -C"), on the other hand, showing runs of hexadecimal byte
values along with an ASCII representation of those bytes can sometimes
reveal useful information about the data.
(A subsequent patch will add the ability to cap the amount of data
rendered in order to reduce storage space requirements.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Binary blobs are currently rendered as raw data directly into the HTML
output, looking much like "line noise". This is rarely, if ever,
meaningful, and consumes considerable storage space since the entire raw
blob content is embedded in the generated HTML file.
Address this issue by instead emitting summary information about the
blob, such as its classification ("binary") and its size. Other
information can be added as needed.
As in Git itself, a blob is considered binary if a NUL is present in the
first ~8KB.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Batch output of git-cat-file has the form:
<sha1> SP <type> SP <size> LF <contents> LF
It unconditionally includes a trailing line-feed which Repo.blob()
incorrectly returns as part of blob content. For textual blobs, this
extra character is often benign, however, for binary blobs, it can
easily change the meaning of the data in unexpected or disastrous ways.
Fix this by respecting the blob size reported by git-cat-file.
(The alternate approach of unconditionally dropping the final LF also
works, however, respecting the reported size is perhaps a bit more
robust and "correct".)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Historically, the 'blob' view was unconditionally handed cooked
(utf8-encoded) blob content, so embed_image_blob(), which requires raw
blob content, has been forced to reload the blob in raw form, which is
ugly and expensive. However, now that the Blob returned by Repo.blob()
is able to vend raw or cooked content, it is no longer necessary for
embed_image_blob() to reload the blob to gain access to the raw content.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Some blob representations require raw blob content, however, the 'blob'
view is unconditionally handed cooked (utf8-encoded) content, thus
representations which need raw content are forced to reload the blob in
raw form, which is ugly and expensive.
The ultimate goal is to eliminate the wasteful blob reloading when raw
content is needed. Toward that end, teach Blob how to vend raw or cooked
content.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Some blob representations (such as embedded images) require raw blob
content, however, the 'blob' view is unconditionally handed cooked
(utf8-encoded) content, thus representations which need raw content are
forced to reload the blob in raw form, which is ugly and expensive (due
to shelling out to git-cat-file a second time).
The ultimate goal is to eliminate the wasteful blob reloading when raw
content is needed. As a first step, introduce a Blob abstraction to be
returned by Repo.blob() rather than the cooked content. A subsequent
change will flesh out Blob, allowing it to return raw or cooked content
on demand without the client having to specify one or the other when
invoking Repo.blob().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Sneakily extracting the raw 'fd' from the utf8-encoding wrapper
returned by GitCommand.run() is ugly and fragile. Instead, take
advantage of the new formal API for requesting raw command output.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Currently, clients which want the raw output from a Git command must
sneakily extract the raw 'fd' from the utf8-encoding wrapper returned
by GitCommand.run(). This is ugly and fragile. Instead, provide a
formal mechanism for requesting raw output.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Currently, clients which want the raw output from a Git command must
sneakily extract the raw 'fd' from the utf8-encoding wrapper returned
by run_git(). This is ugly and fragile. Instead, provide a formal
mechanism for requesting raw output.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
The 'max_pages' default value of 5 is quite low. Coupled with
'commits_per_page' default 50, this allows for only 250 commits, which
is likely unsuitable for even relatively small projects. Options are to
remove the cap altogether or to raise the default limit. At this time,
choose the latter, which should be friendlier to larger projects, in
general, while still guarding against run-away storage space
consumption.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Branch names in Git may be hierarchical (for example, "wip/parser/fix"),
however, git-arr's Bottle routing rules do not take this into account.
Fix this shortcoming.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Branch names in Git may be hierarchical (for example, "wip/parser/fix"),
however, git-arr does not take this into account in its Bottle routing
rules.
Unfortunately, when updated to recognize hierarchical branch names, the
rules become ambiguous in their present order since Bottle matches them
in the order registered. The ambiguity results in incorrect matches. For
instance, branch pages (/r/<repo>/b/<bname>/) are matched before tree
pages (/r/<repo>/b/<bname>/t/), however, when branch names can be
hierarchical, a tree path such as "/r/proj/b/branch/t/" also looks like
a branch named "branch/t", and thus undesirably matches the branch rule
rather than the tree rule. This problem can be resolved by adjusting the
order of rules.
Therefore, re-order the rules from most to least specific as a
preparatory step prior to actually fixing them to accept hierarchical
branch names. This is a purely textual relocation. No functional
changes intended.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Git branch names can be hierarchical (for example, "wip/parser/fix"),
however, git-arr does not take this into account when formulating URLs
on branch, tree, and blobs pages. These URLs are dysfunctional because
it is assumed incorrectly that a single "../" is sufficient to climb
over the branch name when computing relative paths to resources higher
in the hierarchy. This problem manifests as failure to load static
resources (stylesheet, etc.), broken links to commits on branch pages,
and malfunctioning breadcrumb trails.
Fix this problem by computing the the proper number of "../" based upon
the branch name, rather than assuming that a single "../" will work
unconditionally. (This is analogous to the treatment already given to
hierarchical pathnames in tree and blob views.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Pagination link "next" does not respect 'max_pages', thus it incorrectly
remains enabled on the final page capped by 'max_pages'. When clicked,
the user is taken to a "404 Page not found" error page, which makes for
a poor user experience.
Fix this problem by teaching the "next" link to respect 'max_pages'.
(As a side-effect, this also causes 'serve' mode to respect 'max_pages',
which was not previously the case. This change of behavior is
appropriate since it brings 'serve' mode, which is intended primarily
for testing, more in line with 'generate' mode.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
When the number of commits on a branch page is less than
'commits_per_page', the pagination "next" link is disabled, indicating
correctly that this is the final page. However, if the number of commits
on the branch page is exactly 'commits_per_page', then the "next" link
is incorrectly enabled, even on the final page. When clicked, the user
is taken to a "404 Page not found" error page, which makes for a poor
user experience.
Fix this problem by reliably detecting when the branch page is the final
one. Do so by asking for (but not displaying) one commit more than
actually needed by the page. If the additional commit is successfully
retrieved, then another page definitely follows this one. If not
retrieved, then this is definitely the final page.
(Unfortunately, the seemingly more expedient approach of checking if the
final commit on the current page is a root commit -- has no parents --
is not a reliable indicator that this the final page since a branch may
have multiple root commits due to merging.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
At its inception, Git did not show a "creation event" diff for a
project's root commit since early projects, such as the Linux kernel,
were already well established, and a large root diff was considered
uninteresting noise.
On the other hand, new projects adopting Git typically have small root
commits, and such a "creation event" is likely to have meaning, rather
than being pure noise. Consequently, git-diff-tree gained a --root flag
in dc26bd89 (diff-tree: add "--root" flag to show a root commit as a big
creation event, 2005-05-19), though it was disabled by default.
Displaying the root "creation event" diff, however, became the default
behavior when configuration option 'log.showroot' was added to git-log
in 0f03ca94 (config option log.showroot to show the diff of root
commits; 2006-11-23). And, gitk (belatedly) followed suit when it
learned to respect 'log.showroot' in b2b76d10 (gitk: Teach gitk to
respect log.showroot; 2011-10-04).
By default, these tools now all show the root diff as a "creation
event", however, git-arr suppresses it unconditionally. Resolve this
shortcoming by adding a new git-arr configuration option "rootdiff" to
control the behavior (enabled by default).
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
By default, git-arr limits the number of pages of commits to 5, however,
it is reasonable to expect that some projects will want all commits to
be shown. Rather than forcing such projects to choose an arbitrarily
large number as the value of 'max_pages', provide a formal mechanism to
specify unlimited commit pages.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
When emitting a blob in the root tree of a commit, write_tree() composes
the blob's HTML filename with an extra slash before the "f=", like this:
output/r/repo/b/master/t//f=README.txt.html
Although the double-slash is not harmful on Unix, it is unsightly, and
may be problematic for other platforms or filesystems which interpret
double-slash specially or disallow it. Therefore, suppress the extra
slash for blobs in the root tree.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>