danielquinn commented on June 9, 2024

Wow, it's so pretty! This is some really nice work Jonas. I don't know what your preference is here, whether you'd like paperless-ng to supplant this project (take over the name, merge into this repo, etc.) or if you're just promoting the project as a literal next-generation, but I just wanted to congratulate you on a nice job.

I haven't had time for a technical assessment (assuming you wanted one?) as I've got my hands full with presentations, another side project, and a 2-year-old, but as far as I'm concerned this is a community project now. If there's strong support for full adoption of paperless-ng over the current core for v3.0, I'm cool with it. The one thing I'd mention though is that one of the strengths of the current system is that it runs well on low-powered (read Raspberry Pi) systems. If -ng requires more than that and can't be stripped down for such cases, that'd be a good argument for keeping yours as a separate fork.

from paperless.

shamoon commented on June 9, 2024

Let's not derail the conversation too much. The discussion of "proper" encryption is a big (separate) one, but I think anyone who looks at this closely would agree that the encryption as it stands in paperless is in fact a false sense of security, which is why @jonaswinkler chose to remove it (a decision I agree with). The point, IMHO, is that -ng having removed encryption should not be a barrier to using -ng as the continuation of the project; it's not a feature removal if the feature wasn't truly implemented in the first place.

As for the other apparent issue, does someone who uses an RPi as their primary host want to try it out?? Seems like we're so worried about low-resourced systems but most of the people commenting here aren't actually using one 😄. If it's a major part of the user base then we should be able to find some folks and find out?!

OliveiraHermogenes commented on June 9, 2024

> All NG is missing is the userbase of paperless. I kind of feel sorry for all the users who find paperless today and start with it, not knowing there's NG.

Well, I am doing that right now... :-) I am fully aware of the existence of NG, however.

> As for the other apparent issue, does someone who uses an RPi as their primary host want to try it out?? Seems like we're so worried about low-resourced systems but most of the people commenting here aren't actually using one. If it's a major part of the user base then we should be able to find some folks and find out?!

> Or would there be any reason for anybody to prefer paperless over paperless-ng?

Just after learning about paperless, I found this issue and decided to try out NG directly on my RPi4. I didn't manage to set it up, however. I tried it with and without a virtual environment. Version 0.9.11 would not work at all because of some Python dependency hell, apparently. The dependency problems disappeared in versions 0.9.12 and later, but it throws a missing-module error in PIL when importing documents. After battling with it for some time, I gave up and installed paperless instead, and I had it up and running perfectly within 10 minutes. I should say that this RPi4 is running Debian sid and Python 3.9, so that might be the source of the problems with NG. Paperless works perfectly, however, so I am sticking with it for the time being. I am not well versed in Python programming, but it seems strict PR review does have its benefits after all.

shamoon commented on June 9, 2024

See: jonaswinkler/paperless-ng#456 (comment)

mr-onion-2 commented on June 9, 2024

Just wanted to add I love this :) Big thanks for sharing @jonaswinkler 👍

I've been using Paperless OG for quite a while but have just switched. Running on an RPi4 via the latest multi-arch image through K8s and working perfectly.

jonaswinkler commented on June 9, 2024

Up to you. This fork will see some active development in the foreseeable future and I'm pushing for a first stable release. The last thing I want to get into there before that is the ability to add selectable text to scanned documents, both for new documents as well as documents that are already in the system.

totti4ever commented on June 9, 2024

So everything said here makes me support the idea of paperless-ng replacing this project, which of course would mean making Jonas the owner.
Paperless really was the basis of my motivation to get rid of all the papers, but paperless-ng brought what was still missing, given that paperless approves PRs only slowly and rarely sees changes.

All NG is missing is the userbase of paperless. I kind of feel sorry for all the users who find paperless today and start with it, not knowing there's NG.

Or would there be any reason for anybody to prefer paperless over paperless-ng?

CkuT commented on June 9, 2024

Hey !

I am personally very excited by paperless-ng. Several weeks ago I was wondering whether to migrate from paperless to papermerge (https://github.com/ciur/papermerge), but your project seems to be a good competitor (and will save me from writing a papermerge/paperless mapping)!

Thanks for your amazing work !

shamoon commented on June 9, 2024

My opinion is just that of an end user, not a dev on this project (Edit: I am now contributing, and still feel strongly that it should some day become the next version of paperless), but I have to say Jonas’ work and enthusiasm suggest to me that paperless-ng should be merged into the core. There’s a lot of work on that fork under the hood that I think is important to the longevity of the project too.

A very valid concern regarding low-powered devices, but just my +1 for adopting paperless-ng for v3.0. Bravo Jonas 👏🏼

jonaswinkler commented on June 9, 2024

Thank you :)

The entire process of making this pretty has been incredibly fun. I also learned a couple of things. I've never done any kind of UX work or front-end design; I just took a couple of libraries, mixed them together and tried to make it work. The Bootstrap CSS framework has some pretty nifty stuff.

Oh, I certainly did not expect a technical assessment, that would be quite a task. I should have made that clear.

I'd rather get a feel for what the community thinks is best for the future of the project and respect that. I'm fine either way!

Edit for the statement above: This is especially true since the new project does a couple things quite differently and I've chopped off a few things, such as encryption.

Regarding low-powered devices: I've got some good and some not-so-good news. The good news is that the new front end runs entirely in the browser and just uses the API to fetch data. Therefore, the server has to do much less work when serving the pages. The not-so-good news is that one of the new features does occasionally require a little bit more computing power, but that could be scheduled to run during the night. I've made this with the RPi in mind, but haven't extensively tested it on that platform.

Someone got it running on an RPi 4, but I haven't heard anything about performance yet.

kohfuchs commented on June 9, 2024

Hi @jonaswinkler
Thank you so much for your work and effort.
I will put paperless-ng to the test and report back in your repo.

jonaswinkler commented on June 9, 2024

Thanks. I really need some more feedback on what's workable and what needs improvement. We're currently working on making the central filtering tools nicer; the present implementation is rather bulky.

tido- commented on June 9, 2024

> but as far as I'm concerned this is a community project now. ... one of the strengths of the current system is that it runs well on low-powered (read Raspberry Pi) systems.

This is a little bit cheesy, isn't it?

@danielquinn, you set the rule that two (2) people have to approve a pull request. How many people in your 'community' project have the permission to approve? You included three (3), but two of you never approve.
As for the strength (RPi): it is the one and only strength, and it only counts IF the software runs at all, which it may not, because PRs with fixes aren't getting approved.

Calling it community doesn't make it so. I think this is unfair towards people who spent time writing PRs.

MasterofJOKers commented on June 9, 2024

In total 8 people can approve, as I see it. But I've got the same feeling. I'd like to write a PR at times, but since I feel we can't reach the required 2 approvals if one of the 2 (sometimes) active reviewers writes the PR, I refrain from doing so. So yeah, it's not much fun if you can't fix anything yourself and are limited to only looking at other people's code all the time.

tohn commented on June 9, 2024

> Or would there be any reason for anybody to prefer paperless over paperless-ng?

Maybe only the better (?) support of low-powered devices and the use of encryption via GPG?

totti4ever commented on June 9, 2024

Yes, we should figure out if it's really better in every respect:

  • low-powered device
    @jonaswinkler, which function are you referring to? Some AI bit? Would it be possible to deactivate it in case someone doesn't want it to block the Pi at night?
  • encryption
    That's what I thought, too, when I read that Jonas removed it. But then I looked at the reason and understood that the solution currently implemented by paperless is not really a secure thing, rather a bit pseudo-secure (key under doormat). And from what I understood, too, Jonas would be willing to bring in encryption again once there is a working idea on how to do it.

I hope, Daniel, you don't get me wrong when I say that NG might be better in every respect! I absolutely adore what you have created, but I am super happy that Jonas continued your work instead of starting from scratch like many others. I am sure that is why this is the best solution from my point of view.

jonaswinkler commented on June 9, 2024

> • low-powered device
>   @jonaswinkler, which function are you referring to? Some AI bit? Would it be possible to deactivate it in case someone doesn't want it to block the Pi at night?

If you don't use "Auto" matching, the logic in question won't be invoked at all. I don't run this on a Pi, so I have no idea about performance. My gut feeling is that the web UI should be much more responsive.
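To make the "won't be invoked at all" point concrete, here is a hypothetical Python sketch (not paperless-ng's actual code) in which the expensive classifier training is gated on whether any matching rule uses Auto. The `Matching` values, `MatchingRule` and `nightly_job` are assumptions for illustration only.

```python
# Hypothetical sketch, not paperless-ng's real code: skip the costly classifier
# training unless some tag/correspondent/document type uses "Auto" matching.
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class Matching(Enum):
    # Assumed set of matching modes, for illustration only
    ANY = "any"
    ALL = "all"
    LITERAL = "literal"
    REGEX = "regex"
    FUZZY = "fuzzy"
    AUTO = "auto"


@dataclass
class MatchingRule:
    name: str
    algorithm: Matching


def needs_classifier(rules: list[MatchingRule]) -> bool:
    """Only Auto matching depends on the trained classifier."""
    return any(rule.algorithm is Matching.AUTO for rule in rules)


def nightly_job(rules: list[MatchingRule], train: Callable[[], None]) -> bool:
    """Call `train` only when needed; returns True if training ran."""
    if not needs_classifier(rules):
        return False  # nothing uses Auto, so a Pi does no extra work
    train()
    return True


if __name__ == "__main__":
    rules = [MatchingRule("Bank", Matching.ANY), MatchingRule("Invoices", Matching.REGEX)]
    print(nightly_job(rules, lambda: print("training...")))  # False
```

Run from a nightly scheduler (cron, a task queue, etc.), a gate like this means an instance that never uses Auto matching does no classifier work at all.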

> • encryption
>   That's what I thought, too, when I read that Jonas removed it. But then I looked at the reason and understood that the solution currently implemented by paperless is not really a secure thing, rather a bit pseudo-secure (key under doormat). And from what I understood, too, Jonas would be willing to bring in encryption again once there is a working idea on how to do it.

Apart from that, the database stores unencrypted content for searching, even if encryption was enabled. That contains all your personal information from your documents, credit card numbers, addresses, maybe even passwords if sent via postal mail, all the things you purchased, your bank account history, etc.

The way you'd implement security in a system like this would be as follows:

  • Encrypt all information with a public/private key system: documents are encrypted with a public key, while the private key is provided by the user only temporarily when making requests and is never sent to the server. All decryption is done on the client, in the browser. This is how LastPass works, for example.
  • However, this would mean that even the server itself does not have access to clear text information. This in turn means:
    • No auto matching, since the server cannot access clear text content to update the algorithm.
    • No full-text search index, so searching will be slow (all content has to be decrypted on every request and searched there).
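A toy Python sketch of that design, mainly to show why search suffers: the server only ever holds the public key and stores ciphertext, so no plaintext index can exist, and `search` has to decrypt every document on every query (in the design above, that part would run on the client). It uses textbook RSA with hard-coded Mersenne primes and no padding purely because Python's standard library has no public-key crypto; it is not secure and is not how a real implementation should encrypt anything.

```python
# WARNING: textbook RSA, no padding, fixed primes. Illustration only, NOT secure.
BLOCK = 16  # bytes per block: 128 bits, smaller than the ~150-bit modulus

P = 2**61 - 1  # Mersenne prime M61
Q = 2**89 - 1  # Mersenne prime M89
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse (Python 3.8+)

PUBLIC_KEY = (N, E)   # may live on the server
PRIVATE_KEY = (N, D)  # must only ever exist on the client


def encrypt(text: str, public_key: tuple[int, int]) -> list[tuple[int, int]]:
    """Server side: needs only the public key."""
    n, e = public_key
    data = text.encode("utf-8")
    blocks = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        # Keep the chunk length so leading zero bytes survive the round trip
        blocks.append((len(chunk), pow(int.from_bytes(chunk, "big"), e, n)))
    return blocks


def decrypt(blocks: list[tuple[int, int]], private_key: tuple[int, int]) -> str:
    """Client side: needs the private key."""
    n, d = private_key
    data = b"".join(pow(c, d, n).to_bytes(length, "big") for length, c in blocks)
    return data.decode("utf-8")


def search(store: dict[str, list[tuple[int, int]]], term: str,
           private_key: tuple[int, int]) -> list[str]:
    """No plaintext index exists, so every query decrypts every document."""
    term = term.lower()
    return [doc_id for doc_id, blocks in store.items()
            if term in decrypt(blocks, private_key).lower()]


if __name__ == "__main__":
    store = {
        "doc-1": encrypt("Electricity bill, account 12345", PUBLIC_KEY),
        "doc-2": encrypt("Bank statement for March", PUBLIC_KEY),
    }
    print(search(store, "bank", PRIVATE_KEY))  # ['doc-2']
```

The cost of each `search` call grows linearly with the size of the whole archive, since nothing can be looked up without decrypting it first.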

A system like that has to be designed with this concept in mind from the very beginning. It's very unlikely I'll add something like that to paperless. For example, we can't just encrypt all the database fields as well, since

  • This still allows someone to figure out how many documents there are, and how many documents come from one particular (yet unknown) correspondent. It's possible to derive information even from encrypted data, similar to how it's possible to derive information from improperly encrypted file systems by examining unused areas.
  • How do we handle file names? These need to be encrypted as well.

There's lots of things involved in doing this properly.
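The first point above (deriving information from encrypted data) can be illustrated in a few lines of Python. In this sketch, HMAC-SHA256 stands in for a deterministic per-field encryption scheme (an assumption for illustration: HMAC is a keyed hash, not reversible encryption), and the key and correspondent names are made up. Because equal plaintexts produce equal ciphertexts, anyone reading the column can count documents per correspondent without knowing the key.

```python
# Sketch: deterministic "encryption" of a correspondent column still leaks
# how many documents each (unknown) correspondent has.
# HMAC-SHA256 is a stand-in for deterministic field encryption (assumption).
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"hypothetical-server-key"  # hypothetical; the attacker never sees it


def seal(value: str) -> str:
    """Deterministic: the same input always yields the same output."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# What an attacker with database access sees: opaque hex strings
encrypted_column = [seal(name) for name in
                    ["Health Insurer", "Bank", "Health Insurer", "Landlord", "Health Insurer"]]

# Frequency analysis needs no key at all
counts = Counter(encrypted_column)
print(len(encrypted_column), "documents")      # 5 documents
print(sorted(counts.values(), reverse=True))   # [3, 1, 1]
```

Randomized encryption would hide these equalities, but then the server could no longer group or filter by correspondent, and the number of rows stays visible either way.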

cpfeiffer commented on June 9, 2024

I have only started reading up on paperless and intend to start using it, but I'd like to comment on the encryption topic.

There are multiple attack vectors; here are four from the top of my head:

  1. someone getting access to the hardware (e.g. computer stolen)
  2. someone getting access to the file system (by attaching a keyboard to your RasPi, through a remote shell, ...)
  3. someone getting access to the database (locally or remotely)
  4. someone getting access to your documents by privilege escalation (e.g. a bug in paperless)

There's also the possibility of transport-level attacks (e.g. MITM) or malicious admins, but these are separate topics.

To protect against 1), you could use an encrypted filesystem so that someone stealing your computer could not mount it to read the contents. This can be done by everyone already without needing any change in paperless.

For 2), however, an encrypted filesystem does not help, because while the filesystem is mounted, its contents are transparently decrypted. To protect against this, you would need to encrypt the files themselves separately (including the database storage). You would need to decrypt them in memory only and make sure that the encryption key is not available to the attacker, e.g. by keeping the key only in memory (if at all). It might still be possible to read the key from memory, but that's a different topic. You would need to ask for the key on every start of paperless, of course.
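The "ask for the key on every start, keep it only in memory" part could look roughly like this Python sketch. It only covers key derivation: a passphrase typed at startup is stretched into a 32-byte key with PBKDF2 from the standard library. The salt file name, the `unlock` hook and the iteration count are assumptions, and the actual in-memory decryption of files would need a real cipher from a vetted library, which is out of scope here.

```python
# Sketch: derive the document key from a passphrase entered at startup and keep
# it only in process memory. Salt location and iteration count are assumptions.
import getpass
import hashlib
import os
from pathlib import Path

# PBKDF2 work factor: higher is slower to brute-force but also slower to unlock on a Pi
ITERATIONS = 600_000


def load_or_create_salt(path: Path) -> bytes:
    """The salt is not secret, so it may be stored on disk next to the data."""
    if path.exists():
        return path.read_bytes()
    salt = os.urandom(16)
    path.write_bytes(salt)
    return salt


def derive_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Stretch the passphrase into a 32-byte key; the result is never persisted."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def unlock(salt_path: Path = Path("paperless.salt")) -> bytes:
    """Hypothetical startup hook: prompt once; callers keep the key in RAM only."""
    salt = load_or_create_salt(salt_path)
    return derive_key(getpass.getpass("Paperless passphrase: "), salt)
```

As noted above, this does not protect against someone who can read the process memory.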

To protect against 3), you could encrypt the database, so that the contents are unreadable without access to the key. This also covers the database part of 2). See e.g. https://stackoverflow.com/a/5877130 for sqlite encryption.

Protection against 4) on encryption-level is hard. You would need to use separate keys per user, essentially making it impossible for paperless itself to access the data (as you mentioned yourself).

IMHO, an encrypted filesystem (e.g. https://en.wikipedia.org/wiki/EncFS) for the documents plus an encrypted database would be sensible options, with a "master key" provided on startup. If you don't want to protect against 2), you could even store the password for the database encryption inside the encrypted filesystem. That way the user would not need to provide the password when starting paperless (only when mounting the encrypted filesystem). EncFS also encrypts filenames, btw.
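A minimal Python sketch of the "password inside the encrypted filesystem" variant: at startup paperless would read the database passphrase from a file that only exists while the EncFS volume is mounted, and refuse to start otherwise. The mount path, the file name and `read_db_passphrase` are hypothetical, and how the passphrase is then handed to the database is left open.

```python
# Sketch: the database passphrase lives in a file inside the encrypted mount,
# so it is only readable while the user has mounted that filesystem.
# All paths and names here are hypothetical.
from pathlib import Path

DEFAULT_KEY_FILE = Path("/mnt/paperless-secure/db.key")  # hypothetical location


def read_db_passphrase(key_file: Path = DEFAULT_KEY_FILE) -> str:
    """Return the database passphrase, or refuse to start if the mount is absent."""
    if not key_file.is_file():
        raise RuntimeError(
            f"{key_file} not found: is the encrypted filesystem mounted?"
        )
    return key_file.read_text(encoding="utf-8").strip()
```

Failing loudly here is deliberate: starting with no database key would otherwise surface later as confusing database errors.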

Good encryption also comes with the price of making sure to never lose the master key, of course.
