This patch memoizes some of the functions to help speed up execution.
The speedup is quite variable, but ~30% is typical when generating a
medium-sized repository, and the output is byte-for-byte identical.
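For illustration only, a minimal sketch of memoization using
functools.lru_cache. The function name and its body are hypothetical
stand-ins, and this is not necessarily how the patch implements it.

```python
import functools

calls = 0  # counts real executions, to show the cache at work


@functools.lru_cache(maxsize=None)
def commit_summary(commit_id: str) -> str:
    """Hypothetical stand-in for an expensive lookup (e.g. running git)."""
    global calls
    calls += 1
    return "summary of " + commit_id


commit_summary("abc123")  # computed
commit_summary("abc123")  # served from the cache, body not re-run
```

Since the cached function returns the same value it would have computed,
the generated output is unaffected.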
The Markdown extension for rewriting local links was using an API that
is now deprecated, and as of python-markdown 3.4 it is no longer
available.
This patch adjusts the code to use the new API, which should be
available from 3.0 onwards.
This patch introduces type annotations, which can be checked with mypy.
The coverage is not very comprehensive for now, but it is a starting
point and will be expanded in later patches.
This patch applies auto-formatting of the source code using black
(https://github.com/psf/black).
This makes the code style more uniform and simplifies editing.
Note that I also tried yapf, which IMO produced nicer output and handled
some corner cases much better, but unfortunately it doesn't yet support
type annotations, which will be introduced in later commits.
So in the future we might switch to yapf instead.
Python 3 was released more than 10 years ago, and support for Python 2
is going away, with many Linux distributions starting to phase it out.
This patch migrates git-arr to Python 3.
The generated output is almost exactly the same; there are some minor
differences, such as HTML characters being quoted more aggressively, and
the handling of paths with non-utf8 values.
By default, the markdown generator creates links for local files
transparently. For example, "[text](link.md)" will generate
"<a href=link.md>text</a>".
This works fine for absolute and external links, but breaks for links
relative to the repository itself, as git-arr links are of the form
"dir/f=file.ext.html".
So this patch adds a markdown extension to rewrite the links. It uses a
heuristic to detect them, which should work for the vast majority of
common cases.
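As a rough sketch of the kind of heuristic described above (not the
actual extension): links with a scheme, a host, or a leading "/" are left
alone, and anything else is treated as repository-relative. The function
name rewrite_local_link is an assumption, and the Markdown hook that
would call it is omitted.

```python
import posixpath
from urllib.parse import urlsplit


def rewrite_local_link(href: str) -> str:
    """Rewrite "dir/file.ext" into the "dir/f=file.ext.html" form."""
    parts = urlsplit(href)
    # Heuristic: external, absolute, and fragment-only links are kept.
    if parts.scheme or parts.netloc or href.startswith("/") or not parts.path:
        return href
    dirname, fname = posixpath.split(parts.path)
    if not fname:
        return href  # points at a directory, nothing to rewrite
    new = posixpath.join(dirname, "f=" + fname + ".html")
    # Preserve any query string and fragment from the original link.
    if parts.query:
        new += "?" + parts.query
    if parts.fragment:
        new += "#" + parts.fragment
    return new
```

With this sketch, "link.md" becomes "f=link.md.html", while
"https://example.com/a.md" is returned unchanged.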
Raw binary blob content tends to look like "line noise" and is rarely,
if ever, meaningful. A hexdump(1)-style rendering (specifically,
"hexdump -C"), on the other hand, showing runs of hexadecimal byte
values along with an ASCII representation of those bytes, can sometimes
reveal useful information about the data.
(A subsequent patch will add the ability to cap the amount of data
rendered in order to reduce storage space requirements.)
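A minimal sketch of such a rendering, approximating the "hexdump -C"
layout (offset, two groups of eight hex bytes, ASCII column). The
function name is an assumption, and unlike hexdump(1) it does not
collapse repeated lines into "*".

```python
def hexdump(data: bytes) -> str:
    """Render bytes roughly the way "hexdump -C" does."""
    if not data:
        return ""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Two groups of up to eight bytes, each padded to a fixed width.
        first = " ".join("%02x" % b for b in chunk[:8])
        second = " ".join("%02x" % b for b in chunk[8:])
        # Non-printable bytes are shown as "." in the ASCII column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-23s  %-23s  |%s|" % (offset, first, second, text))
    # Final line carries the total length, as hexdump -C does.
    lines.append("%08x" % len(data))
    return "\n".join(lines)
```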
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Binary blobs are currently rendered as raw data directly into the HTML
output, looking much like "line noise". This is rarely, if ever,
meaningful, and consumes considerable storage space since the entire raw
blob content is embedded in the generated HTML file.
Address this issue by instead emitting summary information about the
blob, such as its classification ("binary") and its size. Other
information can be added as needed.
As in Git itself, a blob is considered binary if a NUL is present in the
first ~8KB.
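The check can be sketched as below. The function name and the exact
probe size of 8000 bytes are assumptions made for illustration; the
message above only says "~8KB".

```python
def is_binary(raw: bytes, probe_size: int = 8000) -> bool:
    """Treat a blob as binary if a NUL byte appears in its first bytes."""
    return b"\0" in raw[:probe_size]
```

Only the leading bytes are inspected, so a NUL that appears later in a
large blob does not change the classification.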
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Historically, the 'blob' view was unconditionally handed cooked
(utf8-encoded) blob content, so embed_image_blob(), which requires raw
blob content, has been forced to reload the blob in raw form, which is
ugly and expensive. However, now that the Blob returned by Repo.blob()
is able to vend raw or cooked content, it is no longer necessary for
embed_image_blob() to reload the blob to gain access to the raw content.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Some blob representations require raw blob content; however, the 'blob'
view is unconditionally handed cooked (utf8-encoded) content, so
representations which need raw content are forced to reload the blob in
raw form, which is ugly and expensive.
The ultimate goal is to eliminate the wasteful blob reloading when raw
content is needed. Toward that end, teach Blob how to vend raw or cooked
content.
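One possible shape for this, as a sketch only: the raw bytes are stored
once and the cooked form is decoded lazily on first use. The attribute
names raw_content and utf8_content, and the choice of errors="replace",
are assumptions rather than the actual implementation.

```python
class Blob:
    """Holds raw blob bytes and vends a decoded view on demand."""

    def __init__(self, raw_content: bytes):
        self.raw_content = raw_content
        self._utf8_content = None  # decoded lazily, then cached

    @property
    def utf8_content(self) -> str:
        if self._utf8_content is None:
            # Invalid sequences are replaced instead of raising.
            self._utf8_content = self.raw_content.decode(
                "utf8", errors="replace"
            )
        return self._utf8_content
```

A caller that needs the bytes reads blob.raw_content, and one that needs
text reads blob.utf8_content, with no second load of the blob.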
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Some blob representations (such as embedded images) require raw blob
content; however, the 'blob' view is unconditionally handed cooked
(utf8-encoded) content, so representations which need raw content are
forced to reload the blob in raw form, which is ugly and expensive (due
to shelling out to git-cat-file a second time).
The ultimate goal is to eliminate the wasteful blob reloading when raw
content is needed. As a first step, introduce a Blob abstraction to be
returned by Repo.blob() rather than the cooked content. A subsequent
change will flesh out Blob, allowing it to return raw or cooked content
on demand without the client having to specify one or the other when
invoking Repo.blob().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
This patch introduces the embed_markdown and embed_images configuration
options, so users can enable and disable those features on a per-repository
basis.
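A sketch of how such per-repository options might be read with
configparser. The section name, the path value, and the fallback
defaults are assumptions for illustration; only the two option names
come from this patch.

```python
import configparser

# Hypothetical configuration: one section per repository.
SAMPLE = """
[myrepo]
path = /srv/git/myrepo
embed_markdown = yes
embed_images = no
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# The fallback values here are assumed, not the documented defaults.
embed_markdown = config.getboolean("myrepo", "embed_markdown", fallback=True)
embed_images = config.getboolean("myrepo", "embed_images", fallback=False)
```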
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
This patch makes minor changes to the code that handles embedded images,
mostly to make it use mimetypes, and to remove SVG support (at least for now)
due to security concerns.
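A sketch of the idea using the standard mimetypes module. The function
name, and the assumption that images are embedded as base64 data URIs,
are illustrative and not taken from the patch.

```python
import base64
import mimetypes
from typing import Optional


def image_data_uri(fname: str, raw: bytes) -> Optional[str]:
    """Return a data: URI for embeddable images, or None otherwise."""
    mimetype, _ = mimetypes.guess_type(fname)
    if mimetype is None or not mimetype.startswith("image/"):
        return None
    if mimetype == "image/svg+xml":
        return None  # SVG is skipped, it can carry active content
    encoded = base64.b64encode(raw).decode("ascii")
    return "data:%s;base64,%s" % (mimetype, encoded)
```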
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
The lexer guesser based on content is often wrong; to minimize the chances of
that happening, we only use it on files that start with "#!", for which it
usually has smarter rules.
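The resulting decision can be sketched as follows. The two callables are
hypothetical stand-ins for the pygments lookups (by file name and by
content), passed in so that the sketch stays self-contained.

```python
from typing import Callable, Optional


def pick_lexer(
    fname: str,
    content: str,
    by_name: Callable[[str], Optional[str]],
    by_content: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer the file name; guess from content only for "#!" files."""
    lexer = by_name(fname)
    if lexer is None and content.startswith("#!"):
        lexer = by_content(content)
    return lexer
```

Files with neither a recognized name nor a "#!" line get no lexer and
are left uncolored.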
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
If we can't guess the lexer by the file name, try to guess based on the
content.
This allows pygments to colorize extension-less files, usually scripts.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
In practice pygments seems to have a very hard time processing large files and
files with long lines, so try to avoid using it in those cases.
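A sketch of such a guard. The function name and both thresholds are
made-up values for illustration, not the limits used by this patch.

```python
def can_colorize(
    content: str,
    max_size: int = 512 * 1024,  # hypothetical size limit, in characters
    max_line_len: int = 800,  # hypothetical per-line limit
) -> bool:
    """Return False for content that is too large or has very long lines."""
    if len(content) > max_size:
        return False
    return all(len(line) <= max_line_len for line in content.splitlines())
```

When the guard returns False, the file would be rendered as plain text
instead of being handed to pygments.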
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>