
airbnb / knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

License: Apache License 2.0

Python 42.72% Mako 0.08% CSS 3.28% JavaScript 26.11% HTML 9.93% Jupyter Notebook 16.49% Shell 0.61% Batchfile 0.17% Dockerfile 0.14% TypeScript 0.47%
data data-science knowledge data-analysis

knowledge-repo's Introduction

Knowledge Repo

About the Knowledge Repo

The Knowledge Repo project aims to streamline the sharing of knowledge among data scientists and other technical roles by utilizing data formats and tools that are commonly used in these professions. Our platform offers various options for storing and managing "knowledge posts", with a focus on utilizing notebooks (such as R Markdown and Jupyter/IPython Notebook) to better promote reproducible research.

Content Submission Options

1. Github Integration:

Easily submit your posts in markdown format directly through Github. Our platform will automatically detect and publish your new content in a timely manner.

2. Built-in Editor:

Use our built-in editor to compose and upload your posts in various formats, including Jupyter Notebook, R Markdown, and Google Docs links. We securely store your content on our internal storage for easy access and management. After publishing, users can update, delete, share, and comment on their posts.

Getting Started

For more information about the motivation and inspiration behind this project, we encourage you to read our Medium Post.

knowledge-repo's People

Contributors

arinarmo, biogeek, bulam, clabroy, csharplus, danfrankj, danhper, davidlundgren, dependabot[bot], dorianbrown, earthmancash2, evie404, jihonrado, jjj000, john-bodley, jtv8, kelvins, matthewwardrop, mengting1010, mpekalski, naoyak, niharikaray, npelikan, perryism, rmhogervorst, serenajiang, srowen, sryza, tcbegley, wooddar

knowledge-repo's Issues

Improve the URL schema

Currently all knowledge-repo urls are top-level. We will be refactoring the url schema shortly.

running web app gets stuck in alembic migration

running knowledge_repo --repo datasci_knowledge/ runserver gives the following output.

I've been stuck on the last log line for about 10 minutes. I figure it shouldn't take this long, because it's either an empty migration or it is initializing the db?

INFO:knowledge_repo.repositories.gitrepository:Fetching updates to the knowledge repository...
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running stamp_revision  -> 1b385158fd32

This is the page I am presented with at localhost:7000

image

Adding a long commit message

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

Hey, reviewers! I'm gathering several questions that I had while testing the repo out. Please let me know if you prefer an issue for each of them.

  1. Is there a way of adding a long message to the commits instead of the default commit -m style?

  2. Why are so many pages displayed even if I have only 2 posts?

image

  3. How do I change the logo? It does not appear on the server, but when I run exactly the same thing locally it does show (the /static/images/logo-white.svg one).

  4. What do you suggest for changing the complete blog theme? Should I fork this repo and change it, or can it be done with a local config file that I have missed, so that I always point to the main repository instead of a forked one?

  5. My posts all show the default thumbnail. How do I select which graph to show?

  6. Can a button be added (or the existing one changed) to toggle to the .Rmd file instead of the .md? This could address #118 in a more elegant way: you write your post hiding all the code for non-technical readers, but allow the curious ones to see how it was created by toggling to the raw .Rmd.

Thanks for the hard work, I'm really enjoying testing it out.
Kael

Support for GitLab as the basis for a Knowledge Repo?

I may be mistaken, but does Knowledge Repo have support for GitHub repos specifically or can you use it with any Git repo?

If the former, it'd be awesome to have support for Git repos agnostic of their origin. If the latter, the README should be clarified.

'NoneType' is not iterable

I'm again getting this error, which I've seen in other issues.

My complete log is here:



    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 2000, in __call__
      return self.wsgi_app(environ, start_response)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1991, in wsgi_app
      response = self.make_response(self.handle_exception(e))
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1567, in handle_exception
      reraise(exc_type, exc_value, tb)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
      response = self.full_dispatch_request()
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
      rv = self.handle_user_exception(e)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
      reraise(exc_type, exc_value, tb)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
      rv = self.dispatch_request()
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
      return self.view_functions[rule.endpoint](**req.view_args)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/knowledge_repo/app/models.py", line 102, in __call__
      raise_with_traceback(e)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/knowledge_repo/app/models.py", line 98, in __call__
      return self._route(*args, **kwargs)
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/knowledge_repo/app/routes/render.py", line 71, in render
      if post.contains_excluded_tag:
    File "/Users/kael/toptal/knowledge-repo/env/lib/python2.7/site-packages/knowledge_repo/app/models.py", line 316, in contains_excluded_tag
      if tag.name in excluded_tags:
    TypeError: argument of type 'NoneType' is not iterable

I'm testing it out, so it may be an error on my part. Still, I don't see any database being created.

Thanks for such an awesome tool!
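
The last frame of the traceback fails on `tag.name in excluded_tags`, which raises this TypeError whenever `excluded_tags` is None. A minimal sketch of a defensive check follows, assuming the excluded tags come from an optional setting that may be unset. The function name and arguments are hypothetical, and this is not the project's actual fix:

```python
def contains_excluded_tag(tag_names, excluded_tags=None):
    """Return True if any tag name appears in the excluded collection.

    `excluded_tags` may be None (for example, when the setting is unset),
    so it is normalised to an empty set before the membership test.
    """
    excluded = set(excluded_tags or [])
    return any(name in excluded for name in tag_names)
```

With this guard, an unset exclusion list simply means that no post is excluded.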

Error when adding an Rmd post

I get the following error when I try to add the Rmd template:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Works fine with the ipynb and md templates.
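
Byte 0x89 is the first byte of the PNG file signature, so one possible explanation is that a binary file (such as an image) is being read as UTF-8 text. That is an assumption rather than a confirmed diagnosis. The sketch below reproduces the same error and shows a hypothetical `looks_like_png` helper for spotting such files before decoding:

```python
# Fixed 8-byte header that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def decode_text(data: bytes) -> str:
    """Decode UTF-8 text, refusing PNG data with a clearer message."""
    if looks_like_png(data):
        raise ValueError("binary PNG data cannot be decoded as text")
    return data.decode("utf-8")


# Decoding PNG bytes directly triggers the error reported above.
sample = PNG_SIGNATURE + b"\x00\x00"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
```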

SSH Key

Maybe I have missed something, but is it possible to connect with knowledge repo to a git repository using ssh (on a different port than 22)? I would like to use ssh key and not have to write username and password every time, but it does not seem to work at the moment.
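
For reference, Git itself accepts an explicit port in the `ssh://user@host:port/path` URL form. Whether knowledge_repo passes such a URI through unchanged is not confirmed here. The sketch below only builds and checks a URL of that shape; the host, port, and repository path are made-up placeholders:

```python
from urllib.parse import urlsplit


def ssh_repo_url(user, host, port, path):
    """Build an ssh:// Git URL that carries an explicit port."""
    return f"ssh://{user}@{host}:{port}/{path.lstrip('/')}"


# Hypothetical server listening for SSH on port 2222.
url = ssh_repo_url("git", "git.example.com", 2222, "team/knowledge.git")
parts = urlsplit(url)
```

The shorter `user@host:path` form has no place for a port, which is why the full `ssh://` form is used in this sketch.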

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

chunk option is not supported for Rmd?

I am trying to suppress the printed results of some chunks by using the following chunk options, but the options do not seem to take effect.

{r results='hide', message=FALSE, warning=FALSE}
library(RJSONIO)
library(AnotherPackage)

Documentation: extra files

Hi, I'm still testing out the knowledge repo and I have a question:

I did not see anywhere what the preferred way is to add extra files (e.g. Excel files, SQL files, ...) to the commit when you submit your blog post.

Is there also any way to automatically have a list of all the extra files directly in the blog post?

Thank you !

bug: running web app on v0.6.5

First, on v0.6.5 deploying the web app does not work correctly: pages fail to render and an error message is displayed. After downgrading to v0.6.3 the bug disappears. A preview of a Knowledge Post renders correctly; the issue only occurs when I start a server, either via deploy or runserver.

Second, thanks for a great application. Is raising an issue the best way to raise any bugs or other feedback? I found the documentation very clear and easy to step through to get up and running very quickly.

Documentation: How to set users

Hi, I'm still testing out the knowledge repo and I have a question.

I always get the knowledge_default user.
How do I set up users and emails?

Could you elaborate on this in the documentation?

Thanks a lot!

Also, where should I leave feedback? As an issue on GitHub?

500 errors when accessing knowledge repo through gunicorn deploy

I am able to successfully use the runserver command to start a development server and view the knowledge repo app through my web browser.

However, when I use the deploy command and kick off gunicorn workers successfully, I am unable to load any pages of the knowledge repo in my web browser and instead receive a 500 response.

image

Unfortunately, I am unable to provide a useful stack trace since the application logger doesn't seem to print to stdout/stderr when using gunicorn.

Disable embedded tooling by default

The fact that this is enabled by default seems to be tripping a few people up. Since this setup only makes sense in larger establishments, it should be optionally enabled. This will also have the benefit that at least someone associated with each knowledge repository knows why the local version of knowledge_repo is not being used.

Documentation around Posts

We need to have a better system of documentation around the headers available in posts, and what they do. Specifically important for the addition of permissions to posts.

knowledge_repo script review

The knowledge_repo script has grown organically since it was created, and there are a few confusing properties of the script (such as when to use the -p option with the add action). It will be thoroughly reviewed, and updated accordingly.

[Windows] Allow specification of R executable

Windows users often don't have R on their path, so having some way to set it by an environment variable (for example) would probably help quite a bit in terms of usability on this platform.
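
One way this could look is sketched below. The environment variable name `KNOWLEDGE_REPO_R_EXECUTABLE` is a hypothetical choice made for illustration and is not an existing knowledge_repo setting:

```python
import os
import shutil


def resolve_r_executable(env=None):
    """Return the R executable to run, or None if none can be found.

    An explicit path in the (hypothetical) KNOWLEDGE_REPO_R_EXECUTABLE
    variable wins; otherwise fall back to searching the PATH for "R".
    """
    env = os.environ if env is None else env
    configured = env.get("KNOWLEDGE_REPO_R_EXECUTABLE")
    if configured:
        return configured
    return shutil.which("R")
```

Passing the environment as an argument keeps the lookup easy to test without touching the real process environment.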

knowledge repo content refresh

The only way I have found so far to have the web app serve up-to-date content is to stop it and redeploy it. Is there a workaround to get fresh content? For example, could we force the web app to refresh the content every x hours?
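
A time-based refresh of this kind could be sketched as follows. The `RefreshPolicy` class and its `refresh` callback are hypothetical and do not correspond to anything in knowledge_repo; the callback stands in for whatever re-reads the repository:

```python
import time


class RefreshPolicy:
    """Call `refresh` at most once per `interval_seconds`."""

    def __init__(self, refresh, interval_seconds, clock=time.monotonic):
        self._refresh = refresh
        self._interval = interval_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._last = None    # time of the last successful refresh

    def maybe_refresh(self):
        """Refresh if never refreshed or if the interval has elapsed."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._refresh()
            self._last = now
            return True
        return False
```

A server could call `maybe_refresh()` at the start of each request, so no background thread is needed; the trade-off is that content is only refreshed when traffic arrives.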

Improve documentation around the web editor

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

Hi knowledge repo folks! I'm trying to understand the publish semantics and recommended procedure. Playing around has led me to the conclusion that there's no documented way to publish a post and have it show up on the main UI without a restart of the web server. Jotting down my understanding here:

  • There are two different data stores relevant to the knowledge repo: (1) the git repository or database that contains a permanent record of posts and (2) the database used by the web server.
  • The web server only pulls from the permanent store the first time a request comes in, so updates to the permanent store only get propagated to the UI on restart.
  • There's a web editor that allows publishing posts in real time, but no documentation on how to launch and/or access it.

Is that understanding correct? I'd be happy to contribute code or documentation around this if anyone has recommendations on what the right path is. Thanks again for this awesome piece of software.

Expose source files for download and/or linking in the web ui

I was wondering if you had a system for sharing academic papers within the knowledge repo or do you use some different internal system?

I'm thinking specifically if a notebook has references, and having a smooth way to link to them that gives not only the journal article itself, but perhaps a summary/relevant parts writeup somewhere else.

Git checkout requires public key be configured with Github

Really excited to try this guys - I have been following your project since you posted about it on the blog and I feel like this solves a lot of problems with sharing insights!

I tried it on macOS Sierra, but the instructions in the README threw errors at each step of the way.

Step 1 - Git SSH not setup, easier to use HTTPS right?

$ pip install git+ssh://git@github.com/airbnb/knowledge-repo.git[all]
Collecting git+ssh://git@github.com/airbnb/knowledge-repo.git[all]
  Cloning ssh://git@github.com/airbnb/knowledge-repo.git[all] to /var/folders/b6/vb_3jn_n3019dl8tc78wv2cw0000gn/T/pip-pIGqKj-build
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Command "git clone -q ssh://git@github.com/airbnb/knowledge-repo.git[all] /var/folders/b6/vb_3jn_n3019dl8tc78wv2cw0000gn/T/pip-pIGqKj-build" failed with error code 128 in None

Easier to use HTTPS right?:

$ pip install git+https://github.com/airbnb/knowledge-repo.git

Then again, judging from error messages down the track, I suspect Git SSH is mandatory elsewhere?

Step 2 - Should point to some basic instructions for setting up the Git repo

Running this throws an error.

$ knowledge_repo --repo ./example_repo init
Traceback (most recent call last):
  File "/Users/robertkingston/Documents/knowledge-repo/venv/bin/knowledge_repo", line 207, in <module>
    kr = knowledge_repo.KnowledgeRepository.create_for_uri(args.repo, embed_tooling=embed_tooling)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/knowledge_repo/repository.py", line 65, in create_for_uri
    return cls._get_subclass_for(scheme).create(uri, **kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/knowledge_repo/repositories/gitrepository.py", line 51, in create
    sm = repo.create_submodule(name='embedded_knowledge_repo', path='.resources', url=tooling_repo, branch=tooling_branch)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/repo/base.py", line 306, in create_submodule
    return Submodule.add(self, *args, **kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/objects/submodule/base.py", line 389, in add
    mrepo = cls._clone_repo(repo, url, path, name, **kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/objects/submodule/base.py", line 251, in _clone_repo
    clone = git.Repo.clone_from(url, module_checkout_path, **kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/repo/base.py", line 966, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/repo/base.py", line 912, in _clone
    finalize_process(proc, stderr=stderr)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/util.py", line 155, in finalize_process
    proc.wait(**kwargs)
  File "/Users/robertkingston/Documents/knowledge-repo/venv/lib/python2.7/site-packages/git/cmd.py", line 335, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: 'git clone -b master --separate-git-dir=/Users/robertkingston/Documents/knowledge-repo/example_repo/.git/modules/embedded_knowledge_repo -v git@github.com:airbnb/knowledge-repo.git /Users/robertkingston/Documents/knowledge-repo/example_repo/.resources' returned with exit code 128
stderr: 'Cloning into '/Users/robertkingston/Documents/knowledge-repo/example_repo/.resources'...
Warning: Permanently added the RSA host key for IP address 'x.x.x.x' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
'

Advice on setting up a free private git repo would be helpful for less technical users like myself. e.g.:

https://confluence.atlassian.com/bitbucket/create-a-git-repository-759857290.html

This might be good to add to the start of the guide as a "Prerequisites" list, along with a notice at step 2.

Add metadata template to existing content files

Hello, I've just started looking at knowledge-repo and I think we'll be adopting it at my organization.

Feature request:
If the user already has a content file (e.g. ipynb notebook) they would like to add to the repo, they have to manually add the metadata. It would be nice to have something like this, which could add the sample metadata template to the file, so the user can fill it in:

knowledge_repo add_tags /path/to/file.ipynb
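
What such a command might do is sketched below for plain markdown text. The header field names are assumptions chosen for illustration, and `add_header_template` is a hypothetical helper; a notebook would need the same text placed in its first cell instead:

```python
# Placeholder header for the author to fill in (field names are assumed).
HEADER_TEMPLATE = """---
title: TODO
authors:
- TODO
tags:
- TODO
created_at: TODO
updated_at: TODO
tldr: TODO
---
"""


def add_header_template(markdown_text):
    """Prepend the metadata template unless a header is already present."""
    if markdown_text.lstrip().startswith("---"):
        return markdown_text  # already has a header; leave it untouched
    return HEADER_TEMPLATE + "\n" + markdown_text
```

Running the helper twice leaves the text unchanged after the first call, so it is safe to apply to files that may already carry a header.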

pip install command throws error

Running pip install git+ssh://git@github.com/airbnb/knowledge-repo.git[all] gives the following error

Collecting git+ssh://git@github.com/airbnb/knowledge-repo.git[all]
  Cloning ssh://git@github.com/airbnb/knowledge-repo.git[all] to /var/folders/l4/2hk3q0sj5xx3j19r0ktrf23r0000gn/T/pip-xewbmm-build
fatal: remote error:
  %s is not a valid repository name
  Email [email protected] for help
Command "git clone -q ssh://git@github.com/airbnb/knowledge-repo.git[all] /var/folders/l4/2hk3q0sj5xx3j19r0ktrf23r0000gn/T/pip-xewbmm-build" failed with error code 128 in None

[Windows] error in knitting process Rmd

I modified step 5

5. Add your post to the repo with path project/example

knowledge_repo --repo ./example_repo add example_post.ipynb -p project/example

To work with Rmd

knowledge_repo --repo ./example_repo add example_post.Rmd -p project/example

However, I received the following error

Error: '\R' is an unrecognized escape in character string starting "'D:\R"
Execution halted
Traceback (most recent call last):
  File "C:\Users\roel\AppData\Local\Programs\Python\Python35-32\Scripts\knowledge_repo", line 248, in <module>
    kp = knowledge_repo.KnowledgePost.from_file(args.filename, src_paths=args.src)
  File "c:\users\roel\appdata\local\programs\python\python35-32\lib\site-packages\knowledge_repo\post.py", line 347, in from_file
    kp = KnowledgePostConverter.for_file(cls(), filename, format=format, postprocessors=postprocessors).from_file(filename, **opts)
  File "c:\users\roel\appdata\local\programs\python\python35-32\lib\site-packages\knowledge_repo\converter.py", line 43, in wrapped
    f(*args, **kwargs)
  File "c:\users\roel\appdata\local\programs\python\python35-32\lib\site-packages\knowledge_repo\converters\rmd.py", line 24, in from_file
    subprocess.check_output(runcmd, shell=True)
  File "c:\users\roel\appdata\local\programs\python\python35-32\lib\subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "c:\users\roel\appdata\local\programs\python\python35-32\lib\subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'R --vanilla --slave -e "library(knitr); setwd('D:\RmhDocs\Documents\docs\actief\knowledge-repo');                         x = knit('D:\RmhDocs\Documents\docs\actief\knowledge-repo\example_post.Rmd', 'C:\Users\roel\AppData\Local\Temp\tmpwa3xe977', quiet=T)" ' returned non-zero exit status 1

I think somewhere in the script knowledge_repo\converters\rmd.py the problem resides
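
The R error suggests that the backslashes in the Windows paths are being interpreted as escape sequences inside R string literals. R accepts forward slashes in Windows paths, so one possible approach is to convert the paths before building the R expression. The sketch below is an illustration under that assumption and is not the converter's actual code; `r_string_literal` and `build_knit_command` are hypothetical names, and the output file name is a placeholder:

```python
from pathlib import PureWindowsPath


def r_string_literal(path):
    """Quote a path for R, using forward slashes instead of backslashes."""
    posix = PureWindowsPath(path).as_posix()
    return "'" + posix.replace("'", "\\'") + "'"


def build_knit_command(workdir, source, output):
    """Build the R invocation as an argument list (no shell quoting needed)."""
    expr = "library(knitr); setwd({}); x = knit({}, {}, quiet=TRUE)".format(
        r_string_literal(workdir),
        r_string_literal(source),
        r_string_literal(output),
    )
    return ["R", "--vanilla", "--slave", "-e", expr]


cmd = build_knit_command(
    r"D:\RmhDocs\Documents\docs\actief\knowledge-repo",
    r"D:\RmhDocs\Documents\docs\actief\knowledge-repo\example_post.Rmd",
    r"C:\Temp\out.md",  # placeholder output path
)
```

The resulting list can be passed to `subprocess.check_output(cmd)` without `shell=True`, which also sidesteps quoting of the double-quoted expression on the command line.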

Intra-linking format

Could you elaborate on how to point from a notebook/md file to the resources in orig_src?

Let's say my knowledge repo is foo, my project is bar, my post is post. What url should I use in order to render an image? I've tried to copy the format suggested in #45 but this doesn't work:

![text](foo:bar/post.kp/orig_src/ressource.png)

PyPDF2 not included in pip dependencies

PyPDF2 is not included in the pip dependencies so the following quick start line fails (if you don't have this library installed):

knowledge_repo --repo ./example_repo preview project/example

Params do not get passed correctly to pagination

If you click on an author in a knowledge post, you arrive at a feed for that author. Clicking the next arrow takes you to the second page of the feed, without the author being set.

This is part of a bigger issue with how parameters are handled (currently done hackily). I'll take this fix on.
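
The behaviour being asked for, keeping existing query parameters while advancing the page, can be sketched as follows. The parameter names `authors` and `page` are assumptions made for illustration, and `next_page_url` is a hypothetical helper rather than the project's routing code:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url):
    """Return the URL of the next page, preserving all other parameters."""
    parts = urlsplit(current_url)
    # Note: dict() keeps only the last value of a repeated parameter.
    params = dict(parse_qsl(parts.query))
    params["page"] = str(int(params.get("page", "1")) + 1)
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `next_page_url("/feed?authors=alice&page=1")` yields a URL that still carries the author filter alongside the incremented page number.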

Unable to render knowledge post due to JSON parse syntax error

I'm following the docs line-by-line and when attempting to preview a knowledge post using the following command:

knowledge_repo preview projects/test_knowledge

I get the following error:

image

Which is being raised by the following call to JSON.parse in the base.html template on line 138.

Appears that doing a GET on the /ajax_post_typeahead route is returning the full HTML contents of the base.html template instead of a JSON payload of the typeahead data described here.
