stencila / dockta

🐳 A Docker image builder for researchers

Home Page: https://stencila.github.io/dockta/

License: Apache License 2.0

Makefile 0.64% JavaScript 2.12% Shell 5.86% TypeScript 85.60% R 2.04% Dockerfile 3.14% Python 0.59%

docker dockerfile nodejs python r reproducibility

dockta's Introduction

Programmable, reproducible, interactive documents

👋 Intro • 🚴 Roadmap • 📜 Docs • 📥 Install • 🛠️ Develop

🙏 Acknowledgements • 💖 Supporters • 🙌 Contributors

👋 Introduction

Stencila is a platform for creating and publishing dynamic, data-driven content. Our aim is to lower the barriers to creating truly programmable documents, and to make it easier to publish them as beautiful, interactive, and semantically rich articles and applications. Our roots are in scientific communication, but our tools are useful beyond it.

This is v2 of Stencila, a rewrite in Rust focussed on the synergies between three recent and impactful innovations and trends:

Conflict-free replicated data types (CRDTs) for de-centralized collaboration and version control.
Large language models (LLMs) for assisting in writing and editing, prose and code.
The blurring of the lines between documents and applications as seen in tools such as Notion and Coda.

We are embarking on a rewrite because CRDTs will now be the foundational synchronization and storage layer for Stencila documents. This requires fundamental changes to most other parts of the platform. Furthermore, a rewrite allows us to bake in, rather than bolt on, new modes of interaction between authors and LLM assistants and to add mechanisms that mitigate the risks associated with using LLMs (e.g. by recording the actor, human or LLM, that made each change to a document). Much of the code in the v1 branch will be reused (after some tidy-ups and refactoring), so v2 is not a complete rewrite.

🚴 Roadmap

Our general strategy is to iterate horizontally across the feature set, rather than fully developing features sequentially. This will better enable early user testing of workflows and reduce the risk of finding ourselves painted into an architectural corner. So expect initial iterations to have limited functionality and be buggy.

We'll be making alpha and beta releases of v2 early and often across all products (e.g. CLI, desktop, SDKs). We're aiming for a 2.0.0 release by the end of Q3 2024.

🟢 Stable • 🔶 Beta • ⚠️ Alpha • 🚧 Under development • 🧪 Experimental • 🧭 Planned • ❔ Maybe

Schema

The Stencila Schema is the data model for Stencila documents (definition here, generated reference documentation here). Most of the schema is well defined but some document node types are still marked as under development. A summary by category:

Category	Description	Status
Works	Types of creative works (e.g. `Article`, `Figure`, `Review`)	🟢 Stable (mostly based on schema.org)
Prose	Types used in prose (e.g. `Paragraph`, `List`, `Heading`)	🟢 Stable (mostly based on HTML, JATS, Pandoc etc)
Code	Types for executable (e.g. `CodeChunk`) and non-executable code (e.g. `CodeBlock`)	🔶 Beta (may change)
Math	Types for math symbols and equations (e.g. `MathBlock`)	🔶 Beta (may change)
Data	Fundamental data types (e.g. `Number`) and validators (e.g. `NumberValidator`)	🔶 Beta (may change)
Flow	Types for control flow within a document (e.g. `If`, `For`, `Call`)	🔶 Beta (may change)
Style	Types for styling parts of a document (`Span` and `Division`)	🔶 Beta (may change)
Edits	Types related to editing a document (e.g. `InstructionBlock`, `DeleteInline`)	🔶 Beta (may change)

Storage and synchronization

In v2, documents can be stored as binary Automerge CRDT files, branched and merged, and with the ability to import and export the document in various formats. Collaboration, including real-time, is made possible by exchanging fine-grained changes to the CRDT over the network. In addition, we want to enable interoperability with a Git-based workflow.

Functionality	Description	Status
Documents read/write-able	Able to write a Stencila document to an Automerge binary file and read it back in	⚠️ Alpha
Documents import/export-able	Able to import or export document as alternative formats, using tree diffing to generate CRDT changes	⚠️ Alpha
Documents fork/merge-able	Able to create a fork of a document in another file and then later merge with the original	🧭 Planned
Documents diff-able	Able to view a diff, in any of the supported formats, between versions of a document and between a document and another file	🧭 Planned
Git merge driver	CLI can act as a custom Git merge driver	🧭 Planned
Relay server	Documents can be synchronized by exchanging changes via a relay server	🧭 Planned
Rendezvous server	Documents can be synchronized by exchanging changes peer-to-peer using TCP or UDP hole punching	❔ Maybe

Formats

Interoperability with existing formats has always been a key feature of Stencila. We are bringing over codecs (a.k.a. converters) from the v1 branch and porting other functionality from encoda to Rust.

Format	Encoding	Decoding	Notes
JSON	🟢	🟢
JSON5	🟢	🟢
JSON-LD	🟢	🟢
CBOR	🟢	🟢
CBOR+Zstandard	🟢	🟢
YAML	🟢	🟢
Plain text	🔶	-
HTML	🚧	🧭
JATS	🚧	🚧	Planned for completion. Port decoding and tests from `encoda`.
Markdown	⚠️	⚠️
R Markdown	🧭	🧭	Relies on Markdown; `v1`
Myst Markdown	🚧	🚧	In progress; PR
Jupyter Notebook	🧭	🧭	Relies on Markdown; `v1`
Scripts	🧭	🧭	Relies on Markdown; `v1`
Pandoc	🧭	🧭	Planned. `v1`
LaTeX	🧭	🧭	Relies on Pandoc; `v1`; discussion
Org	🧭	🧭	Relies on Pandoc; PR
Microsoft Word	🧭	🧭	Relies on Pandoc; `v1`
ODT	🧭	🧭	Relies on Pandoc
Google Docs	🧭	🧭	Planned `v1`
PDF	🧭	🧭	Planned, relies on HTML; `v1`
Codec Plugin API	🧭	🧭	An API allowing codecs to be developed as plugins in Python, Node.js, and other languages

Kernels

Kernels are what executes the code in Stencila CodeChunks and CodeExpressions, as well as in control flow document nodes such as ForBlock and IfBlock. In addition, there are kernels for rendering math (e.g. MathBlock) and styling (e.g. StyledInline) nodes.

Kernel	Purpose	Status
Bash	Execute Bash code	🔶 Beta
Zsh	Execute Zsh code	❔ Maybe; `v1`
Python	Execute Python code	🔶 Beta
R	Execute R code	⚠️ Alpha
QuickJs	Execute JavaScript in embedded sandbox	🔶 Beta
Node.js	Execute JavaScript in a Node.js env	🔶 Beta
Deno	Execute TypeScript code	❔ Maybe; `v1`
SQLite	Execute SQL code	🧭 Planned; `v1`
Jupyter kernels	Execute code in Jupyter kernels	🚧 In progress; PR
Rhai	Execute a sandboxed, embedded language	🔶 Beta
AsciiMath	Render AsciiMath symbols and equations	🔶 Beta
TeX	Render TeX math symbols and equations	🔶 Beta
Graphviz	Render Graphviz DOT to SVG images	⚠️ Beta
Jinja	Interpolate document variables into styling and other code	⚠️ Beta
Style	Transpile Tailwind and CSS for styling	🔶 Beta
HTTP	Interact with RESTful APIs	❔ Maybe; `v1`

[TIP] Run stencila kernels (or cargo run -p cli kernels in development) for an up to date list of kernels, including those available through plugins.

Tools

Tools are what we call the self-contained Stencila products you can download and use locally on your machine to interact with Stencila documents.

Environments	Purpose	Status
CLI	Manage documents from the command line and read and edit them using a web browser	⚠️ Alpha
Desktop	Manage, read and edit documents from a desktop app	⚠️ Alpha repo
VSCode extension	Manage, read and edit documents from within VSCode	⚠️ Alpha

SDKs

Stencila's software development kits (SDKs) enable developers to create plugins that extend Stencila's core functionality, or to build other tools on top of it. At this stage we are planning to support Python, Node.js and R, but more languages may be added if there is demand.

Language	Description	Status
Python	Types and function bindings for using Stencila from Python	⚠️ Alpha PyPI
TypeScript	JavaScript classes and TypeScript types for the Stencila Schema	⚠️ Alpha NPM
Node.js	Types and function bindings for using Stencila from Node.js	⚠️ Alpha NPM

Testing and auditing

Making sure Stencila v2 is well tested, fast, secure, and accessible is important. Here's what we're doing towards that:

What	Description	Status
Property-based testing	Establish property-based (a.k.a generative) testing for Stencila documents	🟢 Done
Round-trip testing	Establish property-based tests of round-trip conversion to/from supported formats and reading/writing from/to Automerge CRDTs	🟢 Done here and here
Coverage reporting	Report coverage by feature (e.g. by codec) to give developers better insight into the status of each	🟢 Done Codecov
Dependency audits	Add dependency audits to continuous integration workflow.	🟢 Done
Accessibility testing	Add accessibility testing to continuous integration workflow.	🟢 Done here
Performance monitoring	Establish continuous benchmarking	🟢 Done here
Security audit	External security audit sponsored by NLNet.	🧭 Planned Q2 2023 (after most `v2` functionality added and before public beta)
Accessibility audit	External accessibility audit sponsored by NLNet.	🧭 Planned Q3 2023 (before `v2.0.0` release)

📜 Documentation

At this stage, documentation for v2 is mainly reference material, much of it generated:

More reference docs as well as guides and tutorial will be added over the coming months. We will be bootstrapping the publishing of all docs (i.e. to use Stencila itself to publish HTML pages) and expect to have an initial published set in.

📥 Install

Although v2 is in early stages of development, and functionality may be limited or buggy, we are releasing alpha versions of the Stencila CLI and SDKs. Doing so allows us to get early feedback and monitor what impact the addition of features has on build times and distribution sizes.

CLI

Windows

To install the latest release download stencila-<version>-x86_64-pc-windows-msvc.zip from the latest release and place it somewhere on your PATH.

MacOS

To install the latest release in /usr/local/bin,

curl --proto '=https' --tlsv1.2 -sSf https://stencila.dev/install.sh | sh

To install a specific version, append -s vX.X.X. Or, if you'd prefer to do it manually, download stencila-<version>-x86_64-apple-darwin.tar.gz from one of the releases and then,

tar xvf stencila-*.tar.gz
cd stencila-*/
sudo mv -f stencila /usr/local/bin # or wherever you prefer

Linux

To install the latest release in ~/.local/bin/,

curl --proto '=https' --tlsv1.2 -sSf https://stencila.dev/install.sh | sh

To install a specific version, append -s vX.X.X. Or, if you'd prefer to do it manually, download stencila-<version>-x86_64-unknown-linux-gnu.tar.gz from one of the releases and then,

tar xvf stencila-*.tar.gz
mv -f stencila ~/.local/bin/ # or wherever you prefer

Docker

The CLI is also available in a Docker image you can pull from Docker Hub,

docker pull stencila/stencila

and use locally like this for example,

docker run -it --rm -v "$PWD":/work -w /work --network host stencila/stencila --help

The same image is also published to the GitHub Container Registry if you'd prefer to use that,

docker pull ghcr.io/stencila/stencila

SDKs

Python

Use your favorite package manager to install Stencila's SDK for Python:

python -m pip install stencila

[!NOTE] If you encounter problems with the above command, you may need to upgrade Pip using pip install --upgrade pip.

poetry add stencila

Node

Use your favorite package manager to install @stencila/node:

npm install @stencila/node

yarn add @stencila/node

pnpm add @stencila/node

TypeScript

Use your favorite package manager to install @stencila/types:

npm install @stencila/types

yarn add @stencila/types

pnpm add @stencila/types

🛠️ Develop

Code organization

This repository is organized into the following modules. Please see their respective READMEs, where available, for guides to contributing to each.

schema: YAML files which define the Stencila Schema, an implementation of, and extensions to, schema.org, for programmable documents.
json: A JSON Schema and JSON LD @context, generated from Stencila Schema, which can be used to validate Stencila documents and transform them to other vocabularies
rust: Several Rust crates implementing core functionality and a CLI for working with Stencila documents.
python: A Python package, with classes generated from Stencila Schema and bindings to Rust functions, so you can work with Stencila documents from within Python.
ts: A package of TypeScript types generated from Stencila Schema so you can create type-safe Stencila documents in the browser, Node.js, Deno etc.
node: A Node.js package, using the generated TypeScript types and bindings to Rust functions, so you can work with Stencila documents from within Node.js.
prompts: Prompts used to instruct AI assistants in different contexts and for different purposes.
docs: Documentation, including reference documentation generated from schema and CLI tool.
examples: Examples of documents conforming to Stencila Schema, mostly for testing purposes.
scripts: Scripts used for making releases and during continuous integration.

Continuous integration and deployment

Several GitHub Actions workflows are used for testing and releases. All products (i.e. CLI, Docker image, SDKs) are released at the same time with the same version number. To create and release a new version:

bash scripts/bump-version.sh <VERSION>
git push && git push --tags

Workflow	Purpose	Status
`test.yml`	Run linting, tests and other checks. Commit changes to any generated files.
`pages.yml`	Publish docs, JSON-LD, JSON Schema, etc to https://stencila.dev hosted on GitHub Pages
`version.yml`	Trigger the `release.yml` workflow when a version tag is created.
`release.yml`	Create a release, including building and publishing CLI, Docker image and SDKs.
`install.yml`	Test installation and usage of CLI, Docker image and SDKs across various operating systems and language versions.

🙏 Acknowledgements

Stencila is built on the shoulders of many open source projects. Our sincere thanks to all the maintainers and contributors of those projects for their vision, enthusiasm and dedication. But most of all for all their hard work! The following open source projects in particular have an important role in the current version of Stencila. We sponsor these projects where, and to the extent, possible through GitHub Sponsors and Open Collective.

	Link	Summary
	Automerge	A Rust library of data structures for building collaborative applications.
	Clap	A Command Line Argument Parser for Rust.
	NAPI-RS	A framework for building pre-compiled Node.js addons in Rust.
	PyO₃	Rust bindings for Python, including tools for creating native Python extension modules.
	Rust	A multi-paradigm, high-level, general-purpose programming language which emphasizes performance, type safety, and concurrency.
	Serde	A framework for serializing and deserializing Rust data structures efficiently and generically.
	Similar	A Rust library of diffing algorithms including Patience and Hunt–McIlroy / Hunt–Szymanski LCS.
	Tokio	An asynchronous runtime for Rust which provides the building blocks needed for writing network applications without compromising speed.

💖 Supporters

We wouldn’t be doing this without the support of these forward looking organizations.

🙌 Contributors

Thank you to all our contributors (not just the ones that submitted code!). If you made a contribution but are not listed here please create an issue, or PR, like this.

dockta's People

Contributors

Stargazers

Watchers

Forkers

disisisid bmpvieira jwijay kevin2000141 anouarganfoud alfinmubarok tasyaxganz rdgao eltociear alee

dockta's Issues

feat(JATSParser): detect package import statements

Look for .xml files with <!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) ... elements and detect any package import statements (e.g. Python import, R library, or Node.js require) in any elements matching:

code[specific-use=cell] > named-content > alternatives > code
fig[fig-type=repro-fig] > alternatives > code

feat(DockerBuilder): only COPY files if they have changed

To be consistent with Docker we could only COPY and ADD files if they have changed. This would be an optimisation and would be non-trivial to implement so is probably low priority.
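One way this could work is sketched below, assuming the builder keeps a content fingerprint for each file from the previous build. The names `fingerprint` and `changedFiles` are hypothetical (they are not part of Dockter), and the FNV-1a hash is only a stand-in for a stronger digest such as SHA-256.

```typescript
// Hypothetical helpers for illustration; not part of the Dockter code base.

// 32-bit FNV-1a hash, used here only as a stand-in for a stronger digest.
function fingerprint(content: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}

// Return the paths whose content differs from the fingerprint recorded
// at the previous build (new files count as changed).
function changedFiles(
  current: Record<string, string>, // path -> current file content
  previous: Record<string, string> // path -> fingerprint from the last build
): string[] {
  return Object.keys(current).filter(
    path => previous[path] !== fingerprint(current[path])
  )
}
```

Only the paths returned by `changedFiles` would then need to be copied into the container.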

fix(Generator): run Docker as non-root user

It's best practice to run processes in Docker containers as a non-root user

So we should add something like the following to generated Docker files.

RUN useradd --create-home --uid 1001 -s /bin/bash user
USER user
WORKDIR /home/user

feat(DockerBuilder): implement .dockerignore like behaviour

Currently we send all files in the folder to the Docker daemon as a build context, with a few exceptions:

https://github.com/stencila/dockter/blob/v0.2.6/src/DockerBuilder.ts#L61

It would also be sensible to ignore node_modules etc. See https://codefresh.io/docker-tutorial/not-ignore-dockerignore/
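A minimal sketch of such filtering follows, assuming the build context is available as a list of relative paths. The function names are hypothetical, and only `*` wildcards are handled; real `.dockerignore` files also support `**`, `?` and `!` exceptions, which are omitted here.

```typescript
// Convert a simple ignore pattern (supporting only `*` wildcards) into a RegExp
// that matches the path itself and anything beneath it.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/\/$/, '') // drop a trailing slash, e.g. `node_modules/`
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&') // escape regex metacharacters except `*`
    .replace(/\*/g, '[^/]*') // `*` matches within a single path segment
  return new RegExp(`^${escaped}(/.*)?$`)
}

// Keep only the paths that do not match any ignore pattern.
function filterContext(paths: string[], patterns: string[]): string[] {
  const regexes = patterns.map(patternToRegExp)
  return paths.filter(path => !regexes.some(regex => regex.test(path)))
}
```

For example, `filterContext(paths, ['node_modules', '.git', '*.log'])` would drop everything under `node_modules` and `.git` as well as top-level log files.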

Chore: publish npm package and upload binaries

Add publishing commands to npm scripts, Makefile and travis.yml. See:

Chore: activate Codecov for this project

The .travis.yml file is already setup to push code coverage to Codecov, it just needs to be turned on.

docs: improve and increase coverage of in-code documentation

Once this is done, set tslint.json to:

{
  "extends": "tslint-config-standard",
  "rules" : {
    "completed-docs": true
  }
}

Document the crosswalk from Dockerfile labels to Schema.org/CodeMeta

Meta-data can be added to Dockerfiles using the LABEL and the, now deprecated, MAINTAINER directive. Dockter extracts this meta-data when "compiling" a handwritten Dockerfile. This allows project authors to publish a codemeta.json file for their Docker based project.

We should define and document the meta-data conversion that Dockter does using a crosswalk table that correlates common Dockerfile labels (https://docs.docker.com/engine/reference/builder/#label) and http://label-schema.org/rc1/ labels to https://schema.org and https://codemeta.github.io terms.

It might look something like this:

Label	Parent Type	Property	Type	Description
`maintainer`	codemeta:SoftwareSourceCode	maintainer	Person	Individual responsible for maintaining the software (usually includes an email contact address)
`version` or `org.label-schema.version`	schema:SoftwareApplication	softwareVersion	Text	Version of the software instance

See https://github.com/codemeta/codemeta/blob/master/crosswalk.csv (be sure to scroll right!) for similar crosswalk table for other software packaging systems.
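In code, a crosswalk like this could be held as a lookup table. The sketch below contains only the two rows from the example table above; the `CrosswalkEntry` type and `applyCrosswalk` function are hypothetical names, not existing Dockter APIs.

```typescript
interface CrosswalkEntry {
  parentType: string
  property: string
  type: string
}

// Only the rows from the example table; a full crosswalk would have many more.
const crosswalk: Record<string, CrosswalkEntry> = {
  'maintainer': { parentType: 'codemeta:SoftwareSourceCode', property: 'maintainer', type: 'Person' },
  'version': { parentType: 'schema:SoftwareApplication', property: 'softwareVersion', type: 'Text' },
  'org.label-schema.version': { parentType: 'schema:SoftwareApplication', property: 'softwareVersion', type: 'Text' }
}

// Translate Dockerfile labels into property/value pairs, ignoring unknown labels.
function applyCrosswalk(labels: Record<string, string>): Record<string, string> {
  const properties: Record<string, string> = {}
  for (const label of Object.keys(labels)) {
    const entry = crosswalk[label]
    if (entry !== undefined) properties[entry.property] = labels[label]
  }
  return properties
}
```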

docs(Python): add note on PythonSystemDependencies.json

Add some notes in CONTRIBUTING.md on how to know when you need to, and how to, add entries into PythonSystemDependencies.json. Doesn't need a lot of explanation, just some quick pointers.

RGenerator: should download installation script from unpkg.com

Currently RGenerator creates a Dockerfile that downloads src/install.R from Github pages i.e. https://stencila.github.io/dockter/install.R.

Once this is published as an NPM package #17 it should use https://unpkg.com/@stencila/dockter/src/install.R (which will redirect to the latest version)

DockerBuilder: allow for remote Docker daemons

Currently, Dockter requires Docker to be installed on the system. That's because we're using:

const docker = new Docker() // Connects to local Docker daemon via '/var/run/docker.sock'

But, in theory, we could allow for a remote Docker daemon to build images (it would require that the folder contents be zipped up and sent across the network). e.g.

const docker = new Docker({
  protocol: 'https',
  host: 'some.docker-server.org',
  port: process.env.DOCKER_PORT || 2375,
  ca: fs.readFileSync('ca.pem'),
  cert: fs.readFileSync('cert.pem'),
  key: fs.readFileSync('key.pem')
})

Not a high priority, but could be useful for users who don't have Docker installed and don't want to.

feat(Generator): add comments in Dockerfile

We're trying to make the generated Dockerfile conform to best practices and to be useful for learning how to write your own Dockerfiles. So adding comments to each section of the generated file could be useful. e.g.

# This section installs required system packages into the image and then cleans up
# If you need extra system packages add them here
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y \
      libxml2-dev \
      r-base \
...

# It's best practice to run Docker images as a non-root user
# This section creates a new user, sets it as the user for the image and their
# home directory as the working directory
RUN useradd --create-home --uid 1001 -s /bin/bash dockteruser
USER dockteruser
WORKDIR /home/dockteruser

Binary doesn't work on ubuntu 16.04 ?

$ wget https://github.com/stencila/dockter/releases/download/v0.2.2/dockter-linux-x64.tar.gz
$ tar xvfz Downloads/dockter-linux-x64.tar.gz 
dockter
$ ./dockter build
events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: ENOENT: no such file or directory, lstat 'snapshot'
$ ./dockter -v
0.2.1

fix(DockerBuilder): implement WORKDIR instruction

Docker image builds are currently broken because the WORKDIR instruction which we added for #25 is not fulfilled. Need to keep track of last WORKDIR and set path here https://github.com/stencila/dockter/blob/v0.2.6/src/DockerBuilder.ts#L188
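The bookkeeping involved might look like the following sketch, assuming instructions are available as `[name, arguments]` pairs. Both function names are hypothetical and the real `DockerBuilder` may structure this differently.

```typescript
// Resolve a path used in an instruction against the most recent WORKDIR.
function resolveInWorkdir(workdir: string, path: string): string {
  if (path.startsWith('/')) return path
  return workdir.replace(/\/+$/, '') + '/' + path
}

// Walk through instructions, tracking the last WORKDIR, and return the
// absolute destination of each COPY instruction.
function copyDestinations(instructions: Array<[string, string]>): string[] {
  let workdir = '/'
  const destinations: string[] = []
  for (const [name, args] of instructions) {
    if (name === 'WORKDIR') {
      // A relative WORKDIR is resolved against the previous one
      workdir = resolveInWorkdir(workdir, args)
    } else if (name === 'COPY') {
      // The destination is the last whitespace-separated argument
      const parts = args.trim().split(/\s+/)
      destinations.push(resolveInWorkdir(workdir, parts[parts.length - 1]))
    }
  }
  return destinations
}
```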

PythonParser: should parse requirements.txt files

The PythonParser class currently has an empty parse() method. As a start it should parse requirements.txt files in a similar way to how the RParser handles DESCRIPTION files.

See https://warehouse.pypa.io/api-reference/json/ for obtaining package meta-data including system requirements.
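As a starting point, the parsing step could look something like this sketch. It assumes only plain `name==version` style lines matter; the `Requirement` type and `parseRequirements` function are hypothetical names, and other version specifiers (e.g. `>=`) are recorded without a version.

```typescript
interface Requirement {
  name: string
  version?: string
}

// Parse the text of a requirements.txt file into a list of requirements.
function parseRequirements(content: string): Requirement[] {
  const requirements: Requirement[] = []
  for (const rawLine of content.split(/\r?\n/)) {
    // Strip comments and surrounding whitespace
    const line = rawLine.replace(/#.*$/, '').trim()
    // Skip blank lines and pip options such as `-r other.txt` or `-e .`
    if (line === '' || line.startsWith('-')) continue
    // Capture the package name and, if pinned with `==`, the version
    const match = /^([A-Za-z0-9._-]+)\s*(?:==\s*([^\s;]+))?/.exec(line)
    if (match === null) continue
    requirements.push({ name: match[1], version: match[2] })
  }
  return requirements
}
```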

RParser: include package system dependencies

We should use https://github.com/r-hub/sysreqs to install system dependencies for R packages.

fix(DockerBuilder): when doing `RUN` abort after an amount of time?

Currently, we wait forever for the RUN instruction to finish: https://github.com/stencila/dockter/blob/v0.2.6/src/DockerBuilder.ts#L208

Probably should have some sort of maximum time limit.

README: add a GIF of a console session

An animated GIF of a console session helps give a quick illustration of what a CLI tool like this one does. A good example is https://github.com/pypa/pipenv#readme

feat(CLI): add option to allow date to be specified

Currently, the datePublished of the generated SoftwarePackage is the current date. @finlay suggested it could be useful to allow users to override that at the command line (instead of editing any generated files).

Clarify stateful #dockter comment

The README says:

Dockter does this by looking for a special # dockter comment in a Dockerfile. Instead of throwing away layers, it executes all instructions after this comment in the same layer - thus reusing packages that were previously installed.

This explanation was a bit difficult to understand, as I thought it was an alternate implementation of the --squash option. That would create a new single layer in which to execute all images, but a new one for each build. This instead will reuse an existing layer, overwriting the contents.

Some thoughts to polish this feature:

# dockter does not convey the meaning of this, a more specific annotation like # dockter: stateful could make it easier to search for documentation about this option
clarify in README this is going to overwrite an existing, single layer
clarify where this state is kept, in the Docker daemon's state files, I guess
clarify rolling back to a previous version of the dependencies will be slower as the trade-off, having to modify the existing, single layer
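To make the mechanism concrete, here is a rough sketch of how a Dockerfile might be split at the special comment. It is an assumption about the approach rather than Dockter's actual code, and `splitAtDockterComment` is a hypothetical name; the pattern also accepts a more specific annotation such as `# dockter: stateful`.

```typescript
// Split Dockerfile lines at the first `# dockter` comment.
// Lines in `before` would be built by Docker as usual; lines in `after`
// are the ones the README describes as executed in the same layer.
function splitAtDockterComment(dockerfile: string): { before: string[]; after: string[] } {
  const lines = dockerfile.split(/\r?\n/)
  const index = lines.findIndex(line => /^#\s*dockter\b/.test(line.trim()))
  if (index === -1) return { before: lines, after: [] }
  return { before: lines.slice(0, index), after: lines.slice(index + 1) }
}
```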

Chore: activate Travis CI for this project

There's already a .travis.yml file but we need to activate this project and add a GITHUB_TOKEN env var so that docs can get pushed to GitHub Pages.

Chore: add code of conduct

Our standard CoC from https://github.com/stencila/policies/blob/master/CONDUCT.md should be added to this repo and linked to from README in the Develop section.

feat(cli): add option not to install Stencila packages

Currently, we install Stencila packages for R, Python, Node.js so that editor UI clients can execute code within the container. This is done independently of the project's package dependencies.

However, not all users may need this functionality so a --stencila=false option would be useful to speed up builds and reduce image sizes.

fix(RParser): scan for .r and .rmd files

Currently, the RParser scans for .R and .Rmd files only. It should scan for lower case equivalents too.
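A case-insensitive extension test would cover all of these variants. The helper name `isRFile` below is hypothetical.

```typescript
// True for file names ending in .R, .r, .Rmd, .rmd (or any other casing).
function isRFile(fileName: string): boolean {
  return /\.(r|rmd)$/i.test(fileName)
}
```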

fix(Windows): error when running compile

{ Error: File 'C:\**\dockter\dist\PythonBuiltins.txt' was not included into executable at compilation stage. Please recompile adding it as asset or script.
    at error_ENOENT (pkg/prelude/bootstrap.js:422:17)
    at readFileFromSnapshot (pkg/prelude/bootstrap.js:650:29)
    at Object.fs.readFileSync (pkg/prelude/bootstrap.js:693:18)
    at PythonParser.generateRequirementsFromSource (C:\snapshot\dockter\dist\PythonParser.js:115:48)
    at PythonParser.parse (C:\snapshot\dockter\dist\PythonParser.js:67:33)
    at DockerCompiler.compile (C:\snapshot\dockter\dist\DockerCompiler.js:56:46)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
    at Function.Module.runMain (pkg/prelude/bootstrap.js:1308:13)
    at startup (bootstrap_node.js:227:16)
  errno: -4058,
  code: 'ENOENT',
  path: 'C:\\snapshot\\dockter\\dist\\PythonBuiltins.txt',
  pkg: true }

RParser: SyntaxError: Unexpected end of JSON input

When compiling R lesson for Software Carpentry I get an error:

SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at CachedRequest.parseHeaders (/snapshot/dockter/node_modules/cached-request/lib/cached-request.js:112:15)
    at ReadStream.<anonymous> (/snapshot/dockter/node_modules/cached-request/lib/cached-request.js:194:37)
    at emitNone (events.js:111:20)
    at ReadStream.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at CachedRequest.parseHeaders (/snapshot/dockter/node_modules/cached-request/lib/cached-request.js:112:15)
    at ReadStream.<anonymous> (/snapshot/dockter/node_modules/cached-request/lib/cached-request.js:194:37)
    at emitNone (events.js:111:20)
    at ReadStream.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)

fix(JavascriptParser): implement package meta data translation

See TODO at https://github.com/stencila/dockter/blob/98b882fd26f31b44631f054e0bf85d7e4c6e23cc/src/JavascriptParser.ts#L61

feat(RParser): improve handling of crandb package fields

crandb provides plenty of meta data for each package e.g. http://crandb.r-pkg.org/ggplot2

Currently, we don't handle every field or could handle them better e.g.:

Title: assign to CreativeWork.headline ?
Licence: should we parse the raw string into a schema.org URL or CreativeWork?
URL: URLs which point to a repo (e.g. on github.com) should be assigned to SoftwareSourceCode.codeRepository

Context: split off into separate repo

Currently the file src/context.ts has temporary TypeScript representations of JSON-LD schema types. This should be split off into the stencila/specs repo (to be renamed stencila/schema) so that it can be reused in other projects as well.

But while we are still working out what those types should be and what they will look like we'll leave them in here.

JsParser: should parse package.json files

The JsParser class should load any package.json files in a folder and create a SoftwarePackage instance from it using the CodeMeta crosswalk: https://codemeta.github.io/crosswalk/node/
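The sketch below illustrates the idea for a handful of fields. The `SoftwarePackage` interface is a simplified, hypothetical stand-in for the real type, and the field mapping shown (name, version, description, and dependencies to softwareRequirements) is an assumption that should be checked against the CodeMeta crosswalk.

```typescript
// Simplified stand-in for the real SoftwarePackage type.
interface SoftwarePackage {
  name: string
  version?: string
  description?: string
  softwareRequirements: SoftwarePackage[]
}

// Build a SoftwarePackage from the text of a package.json file.
function fromPackageJson(json: string): SoftwarePackage {
  const data = JSON.parse(json)
  const dependencies: Record<string, string> = data.dependencies || {}
  return {
    name: data.name,
    version: data.version,
    description: data.description,
    // Each dependency becomes a nested package with its version range
    softwareRequirements: Object.keys(dependencies).map(name => ({
      name,
      version: dependencies[name],
      softwareRequirements: []
    }))
  }
}
```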

PythonParser: should parse .py files

The PythonParser class should scan for and parse Python files in a folder and extract a list of unique (non-standard-library) package dependencies from import statements.

See browserify/detective#8 as an example of using JS regexes to parse for requires. A similar approach could be used for Python import statements (since we won't have a Python AST to walk in TypeScript); there are probably already regexes out there for parsing Python import statements.
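A regex-based approach for Python might be sketched as follows. The function name and the tiny `STANDARD_LIBRARY` set are illustrative assumptions; a real implementation would need a complete list of standard library modules and would have to handle forms such as `import a, b`.

```typescript
// A tiny, illustrative subset of the standard library; a real list would be far longer.
const STANDARD_LIBRARY = new Set(['os', 'sys', 'json', 're', 'math'])

// Extract the unique, non-standard-library top-level modules imported by Python source.
function pythonImports(source: string): string[] {
  const found = new Set<string>()
  // Matches `import a`, `import a.b as c` and `from a.b import c` at the start of a line.
  // For `import a, b` only the first module is captured; relative imports are skipped.
  const regex = /^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)/gm
  let match: RegExpExecArray | null
  while ((match = regex.exec(source)) !== null) {
    if (!STANDARD_LIBRARY.has(match[1])) found.add(match[1])
  }
  return Array.from(found).sort()
}
```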

PythonParser: collect metadata on package dependencies and system dependencies

See https://warehouse.pypa.io/api-reference/json/ for obtaining Python package meta-data including system requirements.

See https://github.com/stencila/dockter/blob/e8521a30e0e069cdfacd46a758370c36ffb24a69/src/RParser.ts#L103 for how we use this data to populate softwareRequirements property.

An in-range update of @types/dockerode is breaking the build 🚨

The devDependency @types/dockerode was updated from `2.5.5` to `2.5.6`.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/dockerode is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

❌ continuous-integration/travis-ci/push: The Travis CI build failed (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.

Your Greenkeeper Bot 🌴

bug: Project folder name upper case not converted(?) to lower when compiling

When compiling a project folder with name starting with uppercase, dockter gives an error:

server error - invalid reference format: repository name must be lowercase

This is probably related to: https://stackoverflow.com/questions/48522615/docker-error-invalid-reference-format-repository-name-must-be-lowercase

Changing the name of the project folder fixed it but we should reflect that in the documentation or deal with the naming.
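If we choose to deal with the naming, the folder name could be normalized before it is used as an image name. The sketch below is one possible approach; `imageNameFromFolder` and the `unnamed` fallback are hypothetical, and it does not enforce every Docker naming rule (e.g. limits on repeated separators).

```typescript
// Derive a lowercase image name from a project folder name.
function imageNameFromFolder(folderName: string): string {
  const name = folderName
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, '-') // replace disallowed characters with a dash
    .replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, '') // must start and end with a letter or digit
  return name === '' ? 'unnamed' : name
}
```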

Tests: improve coverage

We should improve test coverage prior to 1.0.0. I'd suggest we target 95% (noting of course that there is more to test quality than the %!)

@beneboy: can you work on the parser related tests and I can do the generator and builder tests

PythonParser: should parse Pipfiles

The PythonParser should be able to parse Pipfiles https://github.com/pypa/pipfile

feat(DockerBuilder): make ADD and COPY behaviour consistent with Docker

We support both ADD and COPY in the # dockter section of a Dockerfile

https://github.com/stencila/dockter/blob/v0.2.6/src/DockerBuilder.ts#L167

Docker's ADD and COPY instructions are very similar but COPY is generally recommended. Nonetheless, if we support ADD in the # dockter section of a Dockerfile then we should provide consistent behaviour.

docs(how-to): finish

Current draft in how-to.md at top level.

(Just creating a list of issues to be completed for https://github.com/stencila/dockter/milestone/2)

Chore: Remove TODOs and FIXMEs

There are quite a lot of TODOs and FIXMEs in the code. They should be moved into GitHub issues.

JupyterParser: should scan `.ipynb` files

Implement a JupyterParser class which scans the folder for any .ipynb files, scans the code cells in those files for any Python import or R library statements, extracts a list of package dependencies, and creates a SoftwarePackage instance for it.

Should also extract relevant meta-data from the .ipynb files and add it to the SoftwarePackage. There currently isn't any CodeMeta crosswalk for ipynb meta data, https://codemeta.github.io/crosswalk/, maybe we could contribute one.
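The first step, pulling the source of code cells out of a notebook, could look like this sketch. It assumes the nbformat 4 layout (a top-level `cells` array whose entries have `cell_type` and `source`); the names used are hypothetical.

```typescript
interface NotebookCell {
  cell_type: string
  source: string | string[]
}

// Return the source text of each code cell in a notebook's JSON.
function codeCellSources(notebookJson: string): string[] {
  const notebook = JSON.parse(notebookJson)
  const cells: NotebookCell[] = notebook.cells || []
  return cells
    .filter(cell => cell.cell_type === 'code')
    // `source` may be a single string or a list of lines
    .map(cell => (Array.isArray(cell.source) ? cell.source.join('') : cell.source))
}
```

The returned strings could then be scanned for import or library statements in the same way as ordinary source files.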

fix: Certificate for Windows

Windows gives a warning when trying to run the Dockter binary, which may prevent some users from actually launching the app.

feature: add "dockter who" command

As per result of the discussion in #37 we could add a command to dockter - suggestion by @nokome

But how about we leverage the fact that we generate a JSON-LD document with all the software requirements for a project...including their authors. The who subcommand could be a "standing on the shoulders of giants" list of the authors of the current project (if available) and all of the software packages it relies on. e.g.

$ dockter who
Roger Bivand (rgdal, sp), Tim Keitt (rgdal), Barry Rowlingson (rgdal), Edzer Pebesma (sp)

Which for reference is based on filtering the .environ.jsonld file for one of our test fixtures

jq '.softwareRequirements[0].softwareRequirements[] | .author[] | .name' tests/fixtures/r-spatial/.environ.jsonld
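One way the grouping behind such a command might work is sketched below. The `Package` and `Person` interfaces are assumptions that mirror only the `name`, `author` and `softwareRequirements` fields used by the jq filter above, and the function name `who` is hypothetical. The sketch lists authors of the requirements only; adding the current project's own authors would be a small extension.

```typescript
// Assumed minimal shapes, mirroring the fields used by the jq filter
interface Person {
  name: string
}
interface Package {
  name: string
  author?: Person[]
  softwareRequirements?: Package[]
}

// Hypothetical helper: walks nested softwareRequirements and returns a
// string like "Author A (pkg1, pkg2), Author B (pkg1)".
function who(root: Package): string {
  const order: string[] = [] // authors in first-seen order
  const byAuthor = new Map<string, string[]>()
  const visit = (pkg: Package): void => {
    for (const person of pkg.author ?? []) {
      let pkgs = byAuthor.get(person.name)
      if (pkgs === undefined) {
        pkgs = []
        byAuthor.set(person.name, pkgs)
        order.push(person.name)
      }
      if (pkgs.indexOf(pkg.name) === -1) pkgs.push(pkg.name)
    }
    for (const child of pkg.softwareRequirements ?? []) visit(child)
  }
  for (const requirement of root.softwareRequirements ?? []) visit(requirement)
  return order
    .map(author => `${author} (${(byAuthor.get(author) ?? []).join(', ')})`)
    .join(', ')
}
```

Keeping authors in first-seen order rather than sorting them is a design choice of this sketch; it keeps the output stable with respect to the order of packages in the JSON-LD document.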

bug: /bin/bash: no such file or directory

Running build results in an error:

 Error: (HTTP code 400) unexpected - OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory": unknown

It is probably because the location of bash in the container is in /usr/local/bin/bash
see here

Looks like the solution is to replace

RUN useradd --create-home --uid 1001 -s /bin/bash dockteruser

with

RUN useradd --create-home --uid 1001 -s /usr/local/bin/bash dockteruser

@nokome Can you have a look? I can change that and push.

README: Add a logo

Surely, we gotta have a cute 🐳 whale + doctor (medical or PhD) combination logo at the top of the README: a whale with a graduation cap, a whale with a stethoscope, a whale with a test tube?

docs: provide docs on how to add support for a new language

In CONTRIBUTING.md (but linked to under Contribute in README.md) we should have a brief guide to getting started on adding support for a new language.

how to write a Parser and add to parsers.ts
how to write a Generator and add to generators.ts
note on adding tests based on existing ones

@apawlik: I'll start on this next week and then ping you to review it.

JsParser: should parse .js files

The JsParser class should be able to parse any .js files in the folder and extract a list of package dependencies from import or require statements.

Use https://github.com/browserify/detective for extracting required packages from source code.

See #5 and #6 for corresponding issues for other languages.
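To make the expected result concrete, here is a regex-based sketch of the extraction. It is a stand-in for illustration only and does not use detective, which works on the parsed syntax tree and would be more reliable; a regex will also pick up matches inside comments and strings. The function name `extractJsPackages` is hypothetical.

```typescript
// Hypothetical helper: finds package names in `import ... from 'x'`,
// `import 'x'` and `require('x')`, ignoring relative and absolute paths.
function extractJsPackages(code: string): string[] {
  const patterns = [
    /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g,
    /\brequire\(\s*['"]([^'"]+)['"]\s*\)/g
  ]
  const seen: Record<string, true> = {}
  for (const pattern of patterns) {
    let match: RegExpExecArray | null
    while ((match = pattern.exec(code)) !== null) {
      const specifier = match[1]
      if (!specifier) continue
      const first = specifier.charAt(0)
      // Skip local files such as './util' or '/abs/path'
      if (first === '.' || first === '/') continue
      const parts = specifier.split('/')
      // 'lodash/fp' -> 'lodash'; '@scope/pkg/sub' -> '@scope/pkg'
      const name = first === '@' ? parts.slice(0, 2).join('/') : parts[0]
      if (name) seen[name] = true
    }
  }
  return Object.keys(seen).sort()
}
```

Node.js built-in modules such as `fs` would appear in the result too, so a real parser would need to exclude them before treating the list as installable dependencies.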

RParser: TypeError: Cannot read property 'split' of undefined

When trying to compile folder with only one R file https://raw.githubusercontent.com/swcarpentry/r-novice-inflammation/master/tic.R (not your typical R file but the error should not occur)

TypeError: Cannot read property 'split' of undefined
    at Parser_1.default.parse.environ.softwareRequirements.Promise.all.packages.map (/snapshot/dockter/dist/RParser.js:122:27)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
undefined

RParser: should parse .R and .Rmd files

The RParser class should scan for .R and .Rmd files in the folder and extract a list of package dependencies from library (and require?) statements.

See #5 for the corresponding PythonParser issue.
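A minimal sketch of the extraction step, assuming a simple regular expression is good enough for a first pass, might look like this. The function name `extractRPackages` is hypothetical, comments are stripped naively (a `#` inside a string literal would be misread), and for .Rmd files the sketch scans the whole text rather than only the code chunks.

```typescript
// Hypothetical helper: returns the sorted, de-duplicated package names
// found in library(...) and require(...) calls.
function extractRPackages(code: string): string[] {
  // Naively drop everything from '#' to the end of each line
  const uncommented = code.replace(/#.*$/gm, '')
  // Matches library(pkg), library("pkg"), require('pkg'), library(pkg, quietly = TRUE)
  const pattern = /\b(?:library|require)\s*\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?\s*[,)]/g
  const seen: Record<string, true> = {}
  let match: RegExpExecArray | null
  while ((match = pattern.exec(uncommented)) !== null) {
    const name = match[1]
    if (name) seen[name] = true
  }
  return Object.keys(seen).sort()
}
```

Calls such as `requireNamespace("pkg")` or `pkg::fun()` are not matched by this pattern, so they would need separate handling if they should count as dependencies.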

Name: Dockter, Docktor, Dokta, Tudock, Dave?

The current name of this tool is Dockter. It's a portmanteau of doctor (either the medical or PhD kind) and Docker (the container platform) because its aim is to make it easier for researchers (either the medical or PhD kind 🙂) to create Docker containers.

But there are five issues with the name:

🗣️ Hard to distinguish in verbal communications: "Was that Docker or Dockter?"
✍️ Ambiguity in spelling: "Dockter" or "Docktor"?
🖥️ Confusion at the command line: dockter build . is very similar to docker build . and looking for that extra t can consume an extra brain cycle or two to figure out which tool you are using.
💻 Autocomplete at the command line: many people use autocomplete in the shell and having the first four letters the same makes this less useful.

As the famous joke says (well actually a riff on the original joke):

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
Leon Bambrick

Please add your thoughts and suggestions in the comments below 🙏. If you like someone's suggested name, please 👍 it.

JupyterGenerator: should install packages needed to run Jupyter notebooks

Implement a JupyterGenerator class which, if there is a .ipynb file in the folder, adds instructions to the generated Dockerfile to install the jupyter Python package and to run the notebook server (via a CMD instruction).
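As a sketch of the intended behaviour, the function below returns the extra Dockerfile instructions for a folder's file list. The function name `jupyterInstructions`, the use of `pip3`, and the notebook server options are all assumptions for illustration, not decisions recorded in this issue.

```typescript
// Hypothetical helper: given the file names in a folder, returns the extra
// Dockerfile instructions needed to serve Jupyter notebooks, or an empty
// list if the folder contains no .ipynb file.
function jupyterInstructions(files: string[]): string[] {
  const hasNotebook = files.some(file => /\.ipynb$/i.test(file))
  if (!hasNotebook) return []
  return [
    // Assumes pip3 is already available in the image
    'RUN pip3 install jupyter',
    // Listen on all interfaces so the server is reachable from outside the container
    'CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser"]'
  ]
}
```

For example, `jupyterInstructions(['analysis.ipynb', 'README.md'])` yields the two instructions above, while a folder with only `main.py` yields none.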
