anchovy's Introduction

0x2b || !0x2b, that is the question...

I've been a Python-focused developer for over a decade, and only ever dabbled lightly in C/C++, but 0x2b or not 0x2b just doesn't have the same ring. I've done backend web dev, true full-stack web dev, and dependency-injection-driven ETL pipelines. Now I'm moving towards education, while staying grounded in real-world software development. I provided technical review for Dead Simple Python; check it out!

anchovy's People

Contributors

pydsigner

anchovy's Issues

Add smart purge functionality

  • Checksum inputs, parameters, and anchovy version
  • Skip input files that match checksums
  • Make chain of custody work with aggregator steps
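As a rough illustration of the checksum idea above, the sketch below hashes the input bytes, the step parameters, and a version string together. The function name step_checksum and the choice of SHA-256 with JSON-serialized parameters are assumptions for illustration, not Anchovy's actual design.

```python
import hashlib
import json


def step_checksum(data: bytes, parameters: dict, version: str) -> str:
    """Combine input bytes, step parameters, and a version into one digest.

    Hypothetical helper: parameters are assumed to be JSON-serializable.
    """
    digest = hashlib.sha256()
    # A NUL separator keeps the three fields from running into each other.
    digest.update(version.encode('utf-8') + b'\0')
    # sort_keys makes the digest independent of dict ordering.
    digest.update(json.dumps(parameters, sort_keys=True).encode('utf-8') + b'\0')
    digest.update(data)
    return digest.hexdigest()


def can_skip(recorded: str, data: bytes, parameters: dict, version: str) -> bool:
    """An input may be skipped only if its recorded checksum still matches."""
    return recorded == step_checksum(data, parameters, version)
```

Because the version is part of the digest, upgrading the tool would invalidate every recorded checksum, which is presumably the intent of including it.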

Add a font minifier Step

  • Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify which glyphs are used for each font
  • Beware of glyphs that may be inserted from JavaScript
  • Use fonttools.subset to reduce font files to the identified glyphs

Add a development webserver

Use http.server.ThreadingHTTPServer to offer a built-in webserver so something extra like nginx isn't needed for testing.

  • Directly runnable server with configurable directory and port
  • Automatic index support
  • Automatic mime type detection
  • Option to run from anchovy cli post-build
  • Automatically detect root directory from anchovy settings
  • Etag support
  • Some sort of way to configure port through anchovy config file?
  • Add some tests
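A minimal sketch of the first few items, using only the standard library. The helper name make_server and the default port are assumptions; SimpleHTTPRequestHandler already serves index.html for directories and guesses MIME types, but Etag support and Anchovy integration are not covered here.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 8080) -> ThreadingHTTPServer:
    """Build a threaded static-file server rooted at `directory`.

    Hypothetical helper: binds to localhost only, which suits local testing.
    """
    # The `directory` keyword tells the handler which folder to serve.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(('127.0.0.1', port), handler)
```

To run it, call make_server('output').serve_forever() and stop it with Ctrl+C; passing port 0 lets the operating system pick a free port, which is handy in tests.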

Development server does not recognize a mimetype for WEBP images

The development server detects mimetypes using the stdlib mimetypes guesser in strict mode. However, .webp is not included as a strict mimetype in any Python version at this time. Switching the guesser to non-strict mode will provide behavior more in line with user expectation and make our gallery example work better with the built-in server.
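The switch described above might look like the following sketch. The wrapper name guess_mime and the fallback type are assumptions; the result for a given extension depends on the Python version and on any system MIME tables that the mimetypes module loads.

```python
import mimetypes


def guess_mime(path: str, default: str = 'application/octet-stream') -> str:
    """Guess a MIME type, also consulting the non-strict (common) types."""
    mime, _encoding = mimetypes.guess_type(path, strict=False)
    # Fall back to a generic binary type when nothing matches.
    return mime or default
```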

Anchovy CSS processor produces empty rules

Description:

Any qualified rule whose only children are other qualified rules or at-rules will be present as an empty rule in the final output. This is not strictly harmful but is undesirable.

Minimal Input:

.cls {
    @media (max-width: 700px) {
        test: 1;
    }
}
a {
    b {
        padding: 0;
    }
}

Expected:

@media (max-width: 700px) {
    .cls {
        test:1;
    }
}
a b {
    padding: 0;
}

Actual:

.cls {
}
@media (max-width: 700px) {
.cls {
test:1;
}
}
a {
}
a b {
padding: 0;
}
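One possible shape for a fix, sketched on a deliberately simplified model: a hypothetical Rule dataclass stands in for the real parsed nodes (it is not tinycss2's node type or Anchovy's processor), and only nested qualified rules are handled, not at-rules such as @media. The point is simply to emit a flattened rule only when it has declarations of its own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rule:
    """Hypothetical stand-in for a parsed qualified rule."""
    selector: str
    declarations: List[str] = field(default_factory=list)
    children: List['Rule'] = field(default_factory=list)


def flatten(rule: Rule, prefix: str = '') -> List[Tuple[str, List[str]]]:
    """Flatten nested rules into (selector, declarations) pairs."""
    selector = f'{prefix} {rule.selector}'.strip()
    flat = []
    # Skip rules that would otherwise serialize as an empty block.
    if rule.declarations:
        flat.append((selector, rule.declarations))
    for child in rule.children:
        flat.extend(flatten(child, selector))
    return flat
```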

Link rewriter step

Use custody information to track how file names change, then rewrite links to those files accordingly in HTML, CSS, and perhaps JS.

Standardize markdown support

  • Drop support for markdown libraries other than markdown-it-py
  • Add support for standard YAML frontmatter in Markdown
  • Add flag to JinjaExtendedMarkdownStep to enable footnotes plugin

Anchovy CSS preprocessor consumes essential whitespaces in media queries

Description:

Any media query with a declaration as an immediate child will have all the whitespace within the declaration removed, leading to problems when more than one value is supplied to the declaration. tinycss will supply a comment to force compliance, but minifiers may strip these out.

Minimal Input:

@media (max-width: 700px) {
    test: 1 2;
}

Expected:

@media (max-width: 700px) {
    test: 1 2;
}

Actual:

@media (max-width: 700px) {
test:1/**/2;}

Add a standard method for general configuration of Anchovy/Step behavior

anchovy.simple.BaseStandardStep has encoding and newline attributes that some users might like to override. There is no simple way to do so without monkey-patching the class or overriding those attributes with a subclass everywhere they're inherited. Likewise, there are CLI arguments for Anchovy that cannot be specified in the Settings class. Supporting a more generalized dictionary — perhaps reworking Settings to be that more generalized option — would allow an obvious way to implement those features in Anchovy itself as well as similar features for those writing their own Steps.

Add featureful Matcher class

Matcher functions are fairly useful, but it would often be helpful to combine them with "and" or "or". We could make a PathMatcher class with __and__ and __or__ implemented so we can get these behaviors. Follow the behavior of Python's "and" and "or" operators for determining which match is sent to the PathCalcs.
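A minimal sketch of such a class, assuming matchers are callables that take a Path and return a truthy match object or None. The constructor signature is an assumption. Combining with & yields the right-hand match when both succeed, and | yields the first match that succeeds, mirroring Python's short-circuit operators.

```python
import re
from pathlib import Path
from typing import Any, Callable, Optional

MatchFunc = Callable[[Path], Optional[Any]]


class PathMatcher:
    """Wrap a matcher function so it can be combined with & and |."""

    def __init__(self, func: MatchFunc):
        self.func = func

    def __call__(self, path: Path) -> Optional[Any]:
        return self.func(path)

    def __and__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a and b`: None if the left side fails, else the right result.
        return PathMatcher(lambda p: self(p) and other(p))

    def __or__(self, other: 'PathMatcher') -> 'PathMatcher':
        # Like `a or b`: the first truthy result wins.
        return PathMatcher(lambda p: self(p) or other(p))


# Illustrative matchers built from regular expressions.
is_markdown = PathMatcher(lambda p: re.fullmatch(r'.*\.md', p.as_posix()))
is_draft = PathMatcher(lambda p: re.fullmatch(r'drafts/(?P<name>.*)', p.as_posix()))
draft_markdown = is_markdown & is_draft
markdown_or_draft = is_markdown | is_draft
```

This relies on match results being truthy, which holds for re.Match objects; a matcher returning an empty string or 0 on success would not combine correctly.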

Add a CSS pruning step

Use cssselect, tinycss2, and lxml/beautifulsoup4 to identify unused CSS rules and exclude them.

Custody tracking incorrect for UnpackArchiveStep

#61 worked around the issue identified in #60, but actually broke the staleness checker for UnpackArchiveStep, causing it to always get unpacked with this message:

Missing upstream record (examples\code_index\code.zip)...

We will need to revert #61 and pursue a proper fix for #66 instead.

Custody tracker can crash on UnpackArchiveStep with customized PathCalc

A Rule like

    Rule(
        REMatcher(r'.*\.zip'),
        [WorkingDirPathCalc(transform=lambda p: p.parent), None],
        UnpackArchiveStep()
    )

will cause Custody checking to crash on reruns because taking the parent of the target dir of an archive that's being unpacked into the working dir root will result in going outside the working dir. It doesn't appear we need the target dir of the archive to actually be included in the UnpackArchiveStep's custom custody information.

Add support for general matching functions

Right now the only way to match paths is to use regular expressions. Let's generalize this behavior so we can unlock the power of the Paths we're matching against.

  • Figure out a way to make this matching useful for output path determination.

Debugger mode

  • Replace various prints with a more powerful logger
  • Add DEBUG level logs to Matchers and PathCalcs
  • Add CLI option to enable DEBUG level logs
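A small sketch of how the three items could fit together with the standard logging and argparse modules. The logger name and the --debug flag are assumptions for illustration; Anchovy's real CLI options may differ.

```python
import argparse
import logging
from typing import List

# Hypothetical logger name for this sketch.
logger = logging.getLogger('anchovy_sketch')


def configure_logging(argv: List[str]) -> int:
    """Parse a hypothetical --debug flag and set the logger level from it."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--debug', action='store_true',
                        help='emit DEBUG level logs')
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.debug else logging.INFO
    logger.setLevel(level)
    return level


def log_match(path: str, matched: bool) -> None:
    """Example of a DEBUG message a Matcher might emit instead of print()."""
    logger.debug('matcher result for %s: %s', path, matched)
```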

Allow Steps to export explicit chain of custody data at runtime

The way the Step API is designed, Steps receive one input path and a variable number of output paths, already identified at the beginning of each processing cycle. This is generally effective, but its limits have been tested from the very beginning by JinjaMarkdownStep, which reads the input markdown file but also a Jinja template referenced by the markdown file. More recently, ResourcePackerStep uses the input file for nothing except gathering other files.

This all works well enough, but as we look to features like #30, which will require knowing all files that could affect the output of the step, we see that the engine's knowledge is insufficient. Further, proposed functionalities like #27, #28, #34, and #35 are all really going to require output paths calculated by the Steps themselves at runtime. #29 offers a possible workaround, but leans towards duplication of effort.

Instead, let's take advantage of the fact that we don't currently return anything from Steps. Step.__call__() can be extended to allow either the current None, in which case custody will be handled as at present, or a tuple of two lists of Paths, the first being inputs and the second being outputs. Steps which wish to declare only one of the two can simply return the paths they were passed for the other; this will make Step.__call__() totally symmetrical apart from the input path parameter being singular and the input path return value being plural.
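The proposed return value could look like the following sketch. TemplatedTextStep and its placeholder syntax are hypothetical stand-ins, not Anchovy classes; the step reads a second file beyond its input path and reports both as inputs.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Either None (custody handled as at present) or (inputs, outputs).
StepResult = Optional[Tuple[List[Path], List[Path]]]


class TemplatedTextStep:
    """Hypothetical Step that wraps the input text in a template file."""

    def __init__(self, template: Path):
        self.template = template

    def __call__(self, path: Path, output_paths: List[Path]) -> StepResult:
        body = path.read_text(encoding='utf-8')
        template = self.template.read_text(encoding='utf-8')
        page = template.replace('{{ body }}', body)
        for target in output_paths:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(page, encoding='utf-8')
        # Declare the template as an extra input; outputs are passed back as given.
        return [path, self.template], list(output_paths)
```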

Add a feature-rich Markdown processor

Take advantage of MarkdownIt features and plugins to provide a more complete markdown experience.

  • toml frontmatter
  • attrs
  • tables
  • custom containers
  • code highlighting with pygments
  • optional wordcount
  • optional templating/variable substitution
  • optional typography substitutions

Indexing steps are not marked stale when new indexable files are added

The variation of processed files for most Steps consists in the files gathered by the containing Rule's Matcher, with any other dependencies remaining consistent from one run to another or defined within the entry-point file itself. For example, JinjaExtendedMarkdownStep will enter for each markdown file matched, then pull in either the default template from the Step configuration or the template defined in the markdown file's frontmatter. This explicit connection ensures that the output HTML will be regenerated any time either the markdown or the Jinja template changes.

In contrast, a Step that generates an index of files, like the CodeIndexStep proposed in #59, enters from the Jinja template itself, and establishes custody connections only to markdown files that exist when a fresh run occurs. If the only change that occurs is an added markdown file, the Step will not have a chance to gather the new file, leading to false up-to-date decisions.
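One way to detect this case is to record the set of files a glob matched and compare it on the next run. The sketch below is an assumption about how such a check could work, with hypothetical function names; it is not the implementation from #59.

```python
from pathlib import Path
from typing import List


def glob_manifest(root: Path, pattern: str) -> List[str]:
    """List the files currently matching `pattern` under `root`, sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


def manifest_stale(root: Path, pattern: str, recorded: List[str]) -> bool:
    """True when files were added or removed since `recorded` was stored."""
    return glob_manifest(root, pattern) != recorded
```

Storing the manifest alongside the other custody data would let an added or removed file mark the indexing Step as stale even when no existing file changed.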

Custody tracking does not check output files for UnpackArchiveStep

Related to #60, #66, and #67. After these three custody fixes, only the target directory for UnpackArchiveStep is being checked for existence, not the files that come from the archive. This means that when the purge flag is supplied but the UnpackArchiveStep is outputting into a non-transient directory like a build-cache working dir or output dir, the Step is not marked as needing to be refreshed and the entry_from_path call when skip_step is run errors out on the missing files.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\anchovy\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 225, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 168, in process
    output_paths = self.custodian.skip_step(path, output_paths)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 234, in skip_step
    for o_entry in map(self.entry_from_path, prior_outputs):
  File "<snip>\anchovy\src\anchovy\custody.py", line 142, in entry_from_path
    stat = path.stat()
           ^^^^^^^^^^^
  File "<snip>\pathlib.py", line 1013, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'working\\code_index\\basic_site.py'
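A sketch of the kind of check that would catch this before skipping the Step: compare the archive's member list against the files present in the target directory. The function name missing_members is hypothetical, and only zip archives are handled here.

```python
import zipfile
from pathlib import Path
from typing import List


def missing_members(archive: Path, target_dir: Path) -> List[str]:
    """Return archive members that are not present under `target_dir`."""
    with zipfile.ZipFile(archive) as zf:
        # Directory entries end with '/', so only files are checked.
        names = [name for name in zf.namelist() if not name.endswith('/')]
    return [name for name in names if not (target_dir / name).exists()]
```

A non-empty result would mean the Step needs to be refreshed rather than skipped.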

Unify DirPathCalc and OutputDirPathCalc/WorkingDirPathCalc inheritance structure

It is surprising that DirPathCalc is not the superclass of OutputDirPathCalc and WorkingDirPathCalc. This structure is the result of DirPathCalc not supporting ContextDir keys, which itself has rendered DirPathCalc essentially useless in common operations, which will not care to produce any files beyond the walls of working_dir and output_dir. Both the structural issue and the usefulness issue can be resolved by changing DirPathCalc's path parameter to support Paths or ContextDirs.

Add comprehensive automated regression tests

Right now the testbed for Anchovy consists of manual example site runs with ocular diffs. We need:

  • More example site configurations.
    • Example using working_dir, archive, advanced markdown, minification, and anchovy-css.
  • A pytest harness to execute configs and diff outputs.
    • Find a way to mark outputs as text or binary.
    • Find a consistently available diff tool for text outputs.
    • Make a tool to store and check hashes for binary outputs.
  • Updates to the GitHub CI script to execute the tests and check coverage.

match_re() can unintuitively capture portions of context.input_dir or context.working_dir

If context.input_dir has a directory starting with . anywhere between its children and the current working directory (for example, .venv/Lib/site-packages/mysite/package_data), it will be filtered out by expressions like match_re(r'(.*/)*\..*') which are intended to exclude dot files only within the input_dir. We should offer an out-of-the-box method for users to limit what the regular expression processes to the contents of input/working directories.
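One possible approach is to apply the expression only to the part of the path below a chosen root. The sketch below assumes a hypothetical helper named match_relative; it is not part of Anchovy's API.

```python
import re
from pathlib import Path
from typing import Callable, Optional


def match_relative(pattern: str, root: Path) -> Callable[[Path], Optional[re.Match]]:
    """Build a matcher that applies `pattern` to paths relative to `root`."""
    regex = re.compile(pattern)

    def matcher(path: Path) -> Optional[re.Match]:
        try:
            relative = path.relative_to(root)
        except ValueError:
            # The path is not inside root, so it cannot match.
            return None
        return regex.match(relative.as_posix())

    return matcher


# The directory layout from the example above.
input_dir = Path('.venv/Lib/site-packages/mysite/package_data')
dot_files = match_relative(r'(.*/)*\..*', input_dir)
```

With this matcher, input_dir / 'index.html' is no longer treated as a dot file, while input_dir / 'sub/.hidden' still is.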

Custodian.degenericize_path cannot process bare ContextDirs

The issue mentioned in #60, and worked around by removing directories from custody tracking, can still appear in other places, such as in testing for #65, where a glob in the root directory is resulting in a key of glob_manifest:working_dir:*.py, which includes a 'working_dir' component that must be degenericized. Presently, that results in explosions:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<snip>\.venv\Scripts\anchovy.exe\__main__.py", line 7, in <module>
  File "<snip>\anchovy\src\anchovy\cli.py", line 232, in main
    run_from_rules(settings, rules, custodian, argv=remaining, prog=f'anchovy {label}')
  File "<snip>\anchovy\src\anchovy\cli.py", line 119, in run_from_rules
    context.run()
  File "<snip>\anchovy\src\anchovy\core.py", line 190, in run
    self.process(input_paths)
  File "<snip>\anchovy\src\anchovy\core.py", line 176, in process
    self.process(further_processing)
  File "<snip>\anchovy\src\anchovy\core.py", line 159, in process
    stale, msg = self.custodian.refresh_needed(path, output_paths)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 310, in refresh_needed
    if not self.check_prior(up_key):
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 277, in check_prior
    return checker(CustodyEntry(ptype, key, pmeta))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "examples\code_index.py", line 77, in glob_manifest_stale
    parent = context.custodian.degenericize_path(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<snip>\anchovy\src\anchovy\custody.py", line 119, in degenericize_path
    dir_key = t.cast('ContextDir', str(path.parents[-2]))
                                       ~~~~~~~~~~~~^^^^
  File "<snip>\pathlib.py", line 445, in __getitem__
    raise IndexError(idx)
IndexError: -2

Catch up on documentation

  • Add missing docstrings for steps
  • Add missing docstrings for modules
  • Improve internal documentation

Rework Markdown support to enable swappable engines

Replace md_parser and md_renderer arguments to JinjaMarkdownStep with parse and render methods. Default behavior could be to use commonmark, or to detect several different markdown libraries and use them. Good choices are commonmark, markdown-it-py, markdown, and mistletoe.

PathCalc for removing file extensions from webpages

It would be useful to have a PathCalc that turned paths like projects/anchovy.html into projects/anchovy/index.html. This would then effectively hide the .html from the URL, while otherwise maintaining the same structure.
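The path transformation itself is small. The sketch below is a plain function rather than a real PathCalc, and the name to_index_path is an assumption; existing index.html files and non-HTML files are left alone.

```python
from pathlib import Path


def to_index_path(path: Path) -> Path:
    """Turn projects/anchovy.html into projects/anchovy/index.html."""
    if path.suffix != '.html' or path.stem == 'index':
        return path
    # Drop the extension and use the stem as a directory name.
    return path.with_suffix('') / 'index.html'
```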

Smooth off edges of Dependency system

anchovy.dependencies.Dependency is currently an instantiable class, but does lookups in anchovy.dependencies.DEPENDENCY_TYPES, has inheritance conflicts with its Or/And children, and is only used through constructor functions anyway. Let's cut down on the functionality of Dependency and make the constructor functions subclasses.

Add system for reporting/discovering Step dependencies

As currently structured, dependencies are managed by isolating import dependencies into separate modules. This only works for Python dependencies, and limits opportunities to co-locate code with similar behaviors but different dependencies or even to expose groups of Steps through __init__.py modules. Let's move hard requirement checks into the processing stage where possible, and into Step __init__ methods where not, and offer a standardized methodology for exposing and reporting dependencies.

Split anchovy_css out into its own package

These steps offer functionality with applicability outside of anchovy. Split the core functionality out into packages and make them dependencies, retaining provision of the Steps themselves inside anchovy.
