gmailattachmentsextractor's People

Contributors

tewu

Forkers

drupov nitrogt7

gmailattachmentsextractor's Issues

limit of 100 emails?

It seems that if the Gmail filter returns more than 100 emails, GmailAttachmentsExtractor only finds and processes the first 100 (latest by date?).

java -jar GmailAttachmentsExtractor.jar --min-size 100k -l larger10MB  larger:10M larger10MB

Feb 21, 2021 7:28:26 PM com.google.api.client.util.store.FileDataStoreFactory setPermissionsToOwnerOnly
WARNING: unable to change permissions for everybody: C:\Users\...\GmailAttachmentsExtractor_v1.0.1\tokens
Feb 21, 2021 7:28:26 PM com.google.api.client.util.store.FileDataStoreFactory setPermissionsToOwnerOnly
WARNING: unable to change permissions for owner: C:\Users\...\GmailAttachmentsExtractor_v1.0.1\tokens

Starting Gmail Attachments Extractor v1.0.1
Parameters:
    Query string: larger:10M
    Output directory: C:\Users\...\GmailAttachmentsExtractor_v1.0.1\larger10MB
    Output labels prefix: larger10MB
    Attachment filter:
        File size: min 100,000 bytes
Creating output labels 'larger10MB [pre]' and 'larger10MB [post]'
Query 'larger:10M' matched 100 email messages
1/100 (1%) | Processing email '...'
...

However, in Gmail itself the "larger:10M" filter returns ~170 emails.

Thank you!

creates multiple copies of the email as it gets tripped by its own .yml attachment

Hi Tomasz,
Thank you for this lifesaver. However, the program stops partway through for various reasons (for example, when the attachments it is downloading contain unusual characters, producing errors such as "found valid base64 character after a padding character (=)").
This has happened many times. When I restart it (after removing the labels and renaming the download folder, of course), it seems to process already processed emails again, even though they clearly do not match the parameters (e.g. size larger than 1M), and it seems to create duplicates of those emails with empty attachment files ending in .yml.yml.yml etc.

I had around 10.5k emails; now it shows about three times as many, around 30k. Is there a way to dedupe this, or do I have to manually search for and delete every email with more than one .yml suffix (i.e., .yml.yml or .yml.yml.yml)?

It seems odd that the program does not recognize an email as already processed when it already contains a .yml file of zero (or some small arbitrary) size.

Hope you can help me out. Thank you.
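To find the affected copies, one option is to count how many times the .yml suffix repeats at the end of an attachment name. The sketch below is only an illustration: `PlaceholderSketch` and its methods are hypothetical names, and it assumes (based on the description above, not on the tool's source) that a processed email carries placeholder attachments ending in .yml, so a name ending in .yml.yml indicates an email that was processed more than once.

```java
import java.util.Locale;

final class PlaceholderSketch {
    /** Counts how many times ".yml" repeats at the end of a file name (case-insensitive). */
    static int trailingYmlCount(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        int count = 0;
        while (lower.endsWith(".yml")) {
            lower = lower.substring(0, lower.length() - ".yml".length());
            count++;
        }
        return count;
    }

    /** Assumption: more than one ".yml" suffix marks a copy made from an already processed email. */
    static boolean looksReprocessed(String fileName) {
        return trailingYmlCount(fileName) > 1;
    }
}
```

The same count could also serve as a guard before processing: skipping any attachment whose name already ends in .yml would avoid stacking further suffixes, assuming such names really are placeholders.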

Support for Unicode characters

I wonder if email subjects containing Unicode characters (from non-latin languages) can be preserved in the corresponding name of the folder for the attachments? I think Unicode characters are substituted by "_" in the folder name at the moment.

Just to be clear - the attachment files containing Unicode characters in the name are saved fine, with the same name as in the email.

Otherwise I am really happy with v1.0.2 - works very well, I didn't encounter any problems so far.
Thank you for your efforts!
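One way to keep Unicode subjects in folder names is to replace only the characters that file systems reject, instead of everything outside a fixed ASCII set. The sketch below assumes the Windows reserved characters as the deny list; `UnicodeNameSketch.sanitize` is a hypothetical helper, not the tool's actual `Utils.sanitizeFSName`.

```java
final class UnicodeNameSketch {
    /**
     * Replaces only characters that are invalid in Windows file names
     * (\ / : * ? " < > | and control characters) with '_'.
     * Letters from any script are left untouched.
     */
    static String sanitize(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|\\p{Cntrl}]", "_");
    }
}
```

With this rule a subject such as "Re: 10 March ?" still becomes "Re_ 10 March _", while non-Latin letters pass through unchanged.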

Missing 'Subject' header causes NullPointerException

I used the Gmail API to find this message, and it has no subject.

268/~657 (40%) | Processing email 14df9d22d4610a74
java.lang.NullPointerException: Cannot invoke "String.length()" because "name" is null
        at pl.geek.tewu.gmail_attachments_extractor.Utils.sanitizeFSName(Utils.java:76)
        at pl.geek.tewu.gmail_attachments_extractor.Utils.sanitizeDirName(Utils.java:72)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.createDirForAttachments(GmailAttachmentsExtractor.java:255)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.extractAttachments(GmailAttachmentsExtractor.java:142)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:45)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:14)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
        at picocli.CommandLine.execute(CommandLine.java:1904)
        at pl.geek.tewu.gmail_attachments_extractor.Main.main(Main.java:29)
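
The trace shows a null name reaching `Utils.sanitizeFSName`, so a guard before sanitizing would probably avoid the crash. A minimal sketch, assuming a placeholder is acceptable for such emails; the class name and the "(no subject)" text are hypothetical choices:

```java
final class SubjectFallbackSketch {
    /** Hypothetical placeholder used when the Subject header is missing or blank. */
    static final String NO_SUBJECT = "(no subject)";

    /** Returns the subject unchanged, or the placeholder when it is null or blank. */
    static String subjectOrFallback(String subject) {
        if (subject == null || subject.trim().isEmpty()) {
            return NO_SUBJECT;
        }
        return subject;
    }
}
```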

Read timed out

Can there be a retry on this error?

proceeding to the next email
java.net.SocketTimeoutException: Read timed out
        at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:981)
        at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
        at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472)
        at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
        at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1434)
        at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1038)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
        at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:754)
        at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:689)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1623)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1528)
        at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
        at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308)
        at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
        at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:105)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.extractAttachments(GmailAttachmentsExtractor.java:111)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:45)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:14)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
        at picocli.CommandLine.execute(CommandLine.java:1904)
        at pl.geek.tewu.gmail_attachments_extractor.Main.main(Main.java:29)
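
A retry could look roughly like the sketch below: the failing call is wrapped and repeated with a growing delay when it throws `SocketTimeoutException`. `RetrySketch.withRetry` and its parameters are hypothetical; where exactly it would wrap the Gmail request at `GmailAttachmentsExtractor.java:111` is an assumption.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class RetrySketch {
    /**
     * Runs the call, retrying on read timeouts with exponential backoff.
     * Other exceptions are not retried and propagate immediately.
     */
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after the last allowed attempt
                }
                Thread.sleep(delay);
                delay *= 2; // double the wait before the next attempt
            }
        }
    }
}
```

A call site might then read `withRetry(() -> request.execute(), 3, 1000)`, where `request` stands for whichever API request timed out.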

ParseException: expected ';', got "["

First, thanks a lot for your tool! I got the following error when running java -jar GmailAttachmentsExtractor.jar larger:5M ~/Dokumente/EMailAttachments with the precompiled 1.0.3 JAR from GitHub.

517/~611 (84%) | Processing email '[C.A.R.] Thank you!'
    Extracting 3 attachment(s) to directory '2008.05.12 12_59_25 [C.A.R.] Thank you!'
javax.mail.internet.ParseException: In parameter list <; filename=IMG_0907[1]>, expected ';', got "["
        at javax.mail.internet.ParameterList.<init>(ParameterList.java:314)
        at javax.mail.internet.ContentDisposition.<init>(ContentDisposition.java:109)
        at javax.mail.internet.MimeBodyPart.getFileName(MimeBodyPart.java:1294)
        at javax.mail.internet.MimeBodyPart.getFileName(MimeBodyPart.java:549)
        at pl.geek.tewu.gmail_attachments_extractor.Utils.getPartFileName(Utils.java:208)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.extractAttachments(GmailAttachmentsExtractor.java:154)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:45)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:14)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
        at picocli.CommandLine.execute(CommandLine.java:1904)
        at pl.geek.tewu.gmail_attachments_extractor.Main.main(Main.java:29)

❯ java -jar GmailAttachmentsExtractor.jar --version
1.0.3
❯ java --version
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Debian-1deb11u1, mixed mode, sharing)
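
The header in question carries an unquoted file name containing "[", which the strict parser rejects. One possible workaround is a lenient fallback that reads the file name straight from the raw Content-Disposition value when `getFileName()` throws. The sketch below is an assumption about how that could be done: `LenientFileNameSketch` is a hypothetical helper, and it ignores RFC 2231 encoded names (`filename*=`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientFileNameSketch {
    // Matches filename="quoted value" or filename=unquoted-up-to-next-semicolon.
    private static final Pattern FILENAME = Pattern.compile(
            "(?i)(?:^|;)\\s*filename\\s*=\\s*(?:\"([^\"]*)\"|([^;]*))");

    /** Returns the file name from a raw Content-Disposition value, or null if none is present. */
    static String fileNameFrom(String contentDisposition) {
        if (contentDisposition == null) {
            return null;
        }
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) {
            return null;
        }
        String value = (m.group(1) != null ? m.group(1) : m.group(2)).trim();
        return value.isEmpty() ? null : value;
    }
}
```

JavaMail also documents a `mail.mime.parameters.strict` system property that relaxes parameter parsing when set to false; whether that alone resolves this case with the tool has not been verified here.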

only process 500 emails?

I ran the latest extractor and it processes at most 500 messages. It seems to read only the first page of the search results.

C:\GmailAttachmentsExtractor_v1.0.1> java -jar GmailAttachmentsExtractor.jar --mime-type 'image|video|audio' larger:1M

=== SUMMARY ===
Processed 500 email(s)
Extracted attachments from 0 email(s)
Extracted 0 attachment(s)
Total extracted attachments size: 0 bytes
Extracted attachments types: []
NOT extracted (filtered) attachments types: [application/octet-stream x 300]

Please enhance your code to check the response for the presence of nextPageToken:

{
  "messages": [
    {
      "id": "177f47bb76a1f474",
      "threadId": "177f47bb75a1f484"
    },
    {
      "id": "177acb8f79ef9158",
      "threadId": "177acb8f79df9152"
    }
  ],
  "nextPageToken": "08833574449120401647",
  "resultSizeEstimate": 94
}

If it exists, please submit a new API call that passes the value as the pageToken parameter, and loop until a response has no nextPageToken:

{
  "messages": [
    {
      "id": "177562523a3c9083",
      "threadId": "177562523a2c9086"
    },
    {
      "id": "1773c736f8b81694",
      "threadId": "1773a736f8c81694"
    }
  ],
  "resultSizeEstimate": 10
}
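
The loop described above can be sketched as follows. To keep the example self-contained, the API call is replaced by a function argument: `MessagePage`, `PagingSketch`, and `fetchPage` are hypothetical stand-ins. In the Gmail Java client the equivalent would presumably be a `messages().list(...)` request with `setPageToken(...)` and a response read through `getNextPageToken()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** One page of results: message ids plus the token of the next page (null on the last page). */
final class MessagePage {
    final List<String> ids;
    final String nextPageToken;

    MessagePage(List<String> ids, String nextPageToken) {
        this.ids = ids;
        this.nextPageToken = nextPageToken;
    }
}

final class PagingSketch {
    /**
     * Collects ids from every page. fetchPage receives the page token
     * (null for the first page) and returns the corresponding page.
     */
    static List<String> listAllIds(Function<String, MessagePage> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = null; // null requests the first page
        do {
            MessagePage page = fetchPage.apply(token);
            if (page.ids != null) { // a page may carry no messages at all
                all.addAll(page.ids);
            }
            token = page.nextPageToken;
        } while (token != null && !token.isEmpty());
        return all;
    }
}
```

Because resultSizeEstimate is only an estimate, the loop relies solely on the presence of the token to decide when to stop.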

WARNING: unable to change permissions for everybody

D:\Programs\GmailAttachmentsExtractor>java -jar GmailAttachmentsExtractor.jar --mime-type 'video' has:attachment v
Sep 26, 2020 10:19:03 PM com.google.api.client.util.store.FileDataStoreFactory setPermissionsToOwnerOnly
WARNING: unable to change permissions for everybody: D:\Programs\GmailAttachmentsExtractor\tokens
Sep 26, 2020 10:19:03 PM com.google.api.client.util.store.FileDataStoreFactory setPermissionsToOwnerOnly
WARNING: unable to change permissions for owner: D:\Programs\GmailAttachmentsExtractor\tokens

Starting Gmail Attachments Extractor v1.0.0
Parameters:
Query string: has:attachment
Output directory: D:\Programs\GmailAttachmentsExtractor\v
Output labels prefix: Cleanup
Attachment filter:
MIME type regex: ^('video').*

Labels 'Cleanup [pre]' and/or 'Cleanup [post]' already exist. Running this program when this labels already exist might lead to confusing results. Please provide different output labels prefix and try again. Note that removing those labels is probably not a good solution, as it may prevent you from distinguishing between emails with attachments and its copies without attachments - Terminating.
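
Apart from the label conflict, the logged regex `^('video').*` suggests that the single quotes from the command line reached the program literally, which can happen in shells that do not strip single quotes. The sketch below shows the effect, assuming the tool wraps the user's pattern as `^(pattern).*` as the log line suggests; `MimeRegexSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class MimeRegexSketch {
    /** Builds the regex in the logged form and tests it against a MIME type. */
    static boolean matchesPrefix(String userPattern, String mimeType) {
        return Pattern.compile("^(" + userPattern + ").*").matcher(mimeType).matches();
    }
}
```

Here `matchesPrefix("'video'", "video/mp4")` is false because the quotes are part of the pattern, while `matchesPrefix("video", "video/mp4")` is true. Passing the pattern without quotes, or with double quotes, may therefore be worth trying in such shells.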

Items from in:sent should not be tagged to in:inbox

I am deleting attachments that I sent out, so the emails in question are those from in:sent.

However, the program adds the 'inbox' label to all the cleanup[pre] and cleanup[post] emails.

In the screenshot, the first two emails are the program's output; the third is the original.

Screenshot 2023-08-21 at 14 27 49

Unfortunately, the inbox label cannot be removed in batch from the Gmail web app, because I think inbox is not part of the label list. So the only solution would be to open each email and remove the inbox label.

Edit: SORRY, I just ran it again, and this issue didn't come up. I think it might have been my mistake when I first tried it. Sorry! You can close this issue. Apologies!

.eml .ics files support

Thank you for the great tool.
Is it possible to add support for attached emails (.eml) and calendar invites (.ics)?

Java run crashes when there is no subject, or subject starts with a space

When GmailAttachmentsExtractor encounters an email with no subject or a subject starting with a space, it produces the following output and stops:

8/87 (9%) | Processing email ''
java.nio.file.InvalidPathException: Trailing char < > at index 19: 2020.03.08 17_04_46
        at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPath.parse(Unknown Source)
        at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
        at sun.nio.fs.AbstractPath.resolve(Unknown Source)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.createDirForAttachments(GmailAttachmentsExtractor.java:241)
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.extractAttachments(GmailAttachmentsExtractor.java:140)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:45)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:14)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
        at picocli.CommandLine.execute(CommandLine.java:1904)
        at pl.geek.tewu.gmail_attachments_extractor.Main.main(Main.java:29)

The command that was used is the following:

java -jar GmailAttachmentsExtractor.jar --min-size 100k -l larger20MB_again2  larger:20M larger20MB_again2

Since I have quite a few emails with photos that don't have a subject, this bug unfortunately prevents me from using GmailAttachmentsExtractor for automatic extraction of attachments, since it stops every time on an email with empty subject field and I have to deal with it manually. Otherwise it is a great idea and I hope it is not too difficult to fix this issue.

Thank you!
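
The exception comes from a directory name that ends in a space: the timestamp is followed by a separator and then an empty subject. A minimal sketch of a fix, assuming the name is built as timestamp plus subject; `DirNameSketch.buildDirName` is a hypothetical helper. It trims the subject and strips trailing spaces and dots, since Windows file names should not end with either.

```java
final class DirNameSketch {
    /** Joins timestamp and subject without leaving a trailing space or dot. */
    static String buildDirName(String timestamp, String subject) {
        String s = subject == null ? "" : subject.trim();
        String name = s.isEmpty() ? timestamp : timestamp + " " + s;
        int end = name.length();
        // Windows file names should not end with a space or a period.
        while (end > 0 && (name.charAt(end - 1) == ' ' || name.charAt(end - 1) == '.')) {
            end--;
        }
        return name.substring(0, end);
    }
}
```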

NullPointer exception

First of all, thanks a million for your script. It really helped me clean my inbox.
I've come across this error: https://i.imgur.com/Jxd907C.png
I don't know if the issue is caused by the ampersand and double // in the mail subject, no idea about it. I hope the above screenshot helps you pinpoint the problem. I also understand that it might not be worth investigating. No prob at all.
Cheers,
Marco

java.lang.NullPointerException

GmailAttachmentsExtractor crashes on some emails. It does download attachments, but the copy email without attachments is not created, and the whole run is stopped. If I run the same command again it will crash on exactly the same email. I don't see any pattern in the emails that cause the crash.

72/74 (97%) | Processing email 'Re: 10 March ?'
    Extracting 2 attachment(s) to directory '2008.02.06 18_51_17 Re_ 10 March _'
    Attachment saved: KTE_FINAL.pdf
    Attachment saved: KText.pdf
    Inserting copy of email without extracted attachments to Gmail
java.lang.NullPointerException
        at pl.geek.tewu.gmail_attachments_extractor.GmailAttachmentsExtractor.extractAttachments(GmailAttachmentsExtractor.java:196)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:45)
        at pl.geek.tewu.gmail_attachments_extractor.Main.call(Main.java:14)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
        at picocli.CommandLine.execute(CommandLine.java:1904)
        at pl.geek.tewu.gmail_attachments_extractor.Main.main(Main.java:29)

Also as a general idea, it would be good to introduce some error handling, so that the run doesn't stop on an error, but continues processing the remaining emails.

Thank you!
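
The error handling suggested above could be as simple as catching exceptions per email and collecting the failures for a final report. The sketch below is an illustration only: `ContinueOnErrorSketch` and the `EmailProcessor` interface are hypothetical and stand in for whatever the tool does for a single email.

```java
import java.util.ArrayList;
import java.util.List;

final class ContinueOnErrorSketch {
    /** Hypothetical per-email step (download attachments, insert the copy, and so on). */
    interface EmailProcessor {
        void process(String emailId) throws Exception;
    }

    /** Processes every email; failures are logged and returned instead of stopping the run. */
    static List<String> processAll(List<String> emailIds, EmailProcessor processor) {
        List<String> failed = new ArrayList<>();
        for (String id : emailIds) {
            try {
                processor.process(id);
            } catch (Exception e) {
                System.err.println("Skipping email " + id + ": " + e);
                failed.add(id);
            }
        }
        return failed;
    }
}
```

Returning the failed ids lets the caller print them in the summary, so the affected emails can be retried or handled manually afterwards.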
