mairix's Introduction

mairix is a program for indexing and searching email messages stored in
Maildir, MH, MMDF or mbox folders.

* Indexing is fast.  It runs incrementally on new messages - any particular
  message only gets scanned once in the lifetime of the index file.

* The search mode populates a "virtual" folder with symlinks(*) which
  point to the real messages.  This folder can be opened as usual in your mail
  program.

* The search mode is very fast.

* Indexing and searching work on the basis of words.  The index file tabulates
  which words occur in which parts (particular headers + body) of which
  messages.

The program is a very useful complement to mail programs like mutt
(http://www.mutt.org/, which supports Maildir, MH and mbox folders) and
Sylpheed (which supports MH folders).

The original author of mairix is Richard P. Curnow <[email protected]>.
It has been maintained since 2017 by Kim Vandry <[email protected]>.

[(*) where the input or output folder is an mbox, a copy of the message is made
instead of symlinking.]

*********************************************************************
 Copyright (C) Richard P. Curnow  2002-2004
 Copyright (C) Richard P. Curnow & Kim Vandry & contributors  2017-
 
 This program is free software; you can redistribute it and/or modify
 it under the terms of version 2 of the GNU General Public License as
 published by the Free Software Foundation.
 
 This program is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 General Public License for more details.
 
 You should have received a copy of the GNU General Public License along
 with this program; if not, write to the Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 
*********************************************************************

Suggestions, bug reports, experiences, praise, complaints are welcome on
the mailing list or as issues or pull requests at
https://github.com/vandry/mairix

Since July 2006, there is a mairix-users mailing list.  To subscribe or to view
the archives, visit

   https://lists.sourceforge.net/lists/listinfo/mairix-users 

The main website for mairix is

   https://github.com/vandry/mairix

ACKNOWLEDGEMENTS
================

See the ACKNOWLEDGEMENTS file

mairix's People

Contributors

clausa, dfandrich, dscho, edgewood, foxharp, jikamens, jsagarribay, makoshark, mika-fischer, mlichvar, okapia, peterjeremy, psoberoi, radhermit, rc0, rhertzog, samueltardieu, slumos, snarkophilus, spwhitton, vandry, weisslj, yurivict

mairix's Issues

Odd numbering of search results with mformat=mh

Hi there,

[using Debian's mairix 0.24-2]

mairix looks amazingly useful; I'm embarrassed to have not tried it many years ago.

One question, on something I can't find any information on in the man pages: search results in MH format ("mformat=mh") produce a folder with really wild message numbers, e.g. my last search got results numbered 7091 and 7094. There's nothing to stop me renumbering the folder after running a search, but it seems odd; is there a reason for this, or a way to make mairix number them consecutively from 1?

Best,
Conrad

"Out of memory" error on broken mailbox

I'm getting an out of memory error on a mailbox that mutt messed up. I searched for an hour to identify the message
that causes the bug and then I condensed it to the attached file test-mailbox that triggers the error below (unzip the file of course).

This is on a fresh compilation from github HEAD.

$ rm /tmp/mairix.database.*; mairix -v -F -p -f /tmp/.mairixrc 2>&1 
mairix DEVELOPMENT, Copyright (C) 2002-2010 Richard P. Curnow
mairix comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the GNU General Public License for details.

Finding all currently existing messages...
Reading existing database...
Checking message path integrity
Checking to
Checking cc
Checking from
Checking subject
Checking body
Checking attachment_name
Loaded 1 existing messages
Scanning mbox /tmp/test-mailbox : 100% done
1 newly dead messages, 1 messages now dead in total
Out of memory (at rfc822.c:465, -203 bytes)

test-mailbox.zip

Assertion in db.c fails, leading to abort trap

Hi

 $ mairix -v -p

mairix DEVELOPMENT, Copyright (C) 2002-2010 Richard P. Curnow
mairix comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the GNU General Public License for details.

Finding all currently existing messages...
Reading existing database...
Assertion failed: (nt->match1.highest < n_msgs), function import_toktable2, file db.c, line 412.
Abort trap: 6

The db in question is approx. 24M, if that's relevant; here's what it looks like after a fresh indexing run:

Wrote 32603 messages (652060 bytes of tables, 2899041 bytes of text)
Wrote 0 mbox headers (0 bytes of tables, 0 bytes of paths)
Wrote 0 bytes of mbox message checksums
To: Wrote 11795 tokens (94360 bytes of tables, 170867 bytes of text, 259613 bytes of hit encoding)
Cc: Wrote 3563 tokens (28504 bytes of tables, 48724 bytes of text, 41836 bytes of hit encoding)
From: Wrote 7844 tokens (62752 bytes of tables, 106980 bytes of text, 225559 bytes of hit encoding)
Subject: Wrote 11748 tokens (93984 bytes of tables, 88691 bytes of text, 206157 bytes of hit encoding)
Body: Wrote 579794 tokens (4638352 bytes of tables, 5200170 bytes of text, 7717808 bytes of hit encoding)
Attachment Name: Wrote 5522 tokens (44176 bytes of tables, 129209 bytes of text, 23229 bytes of hit encoding)
(Threading): Wrote 34405 tokens (275240 bytes of tables, 1705835 bytes of text, 301234 bytes of hit encoding)

Changing max number of mailboxes or max number of messages per mailbox

Back in August 2016 in the rc0/mairix git repository, spwhitton asked about increasing the number of mailboxes. I just replied:

The number of mailboxes and the number of messages in each mailbox are stored in the same unsigned integer (32 bits), with 16 bits used for each number. The number of mailboxes is in the upper 16 bits; the number of messages per mailbox is in the lower 16 bits.

It's not hard to re-proportion the number of bits used for each, i.e. by decreasing the number of mailboxes to increase the number of messages or vice-versa. You only need to modify encode_mbox_indices() and decode_mbox_indices() in mbox.c.

I did this in an earlier version of mairix, reducing the number of mailboxes to 8 bits and increasing the number of messages to 24 bits.

Of course you have to rebuild all your mairix index files if you do this.

Here's the approximate diff I used for mairix 0.22 mbox.c:

1027c1059,1063
< unsigned int encode_mbox_indices(unsigned int mb, unsigned int msg)/*{{{*/
---
> #define SHIFTBITS 24  /* how many bits to use to count messages */
> #define SHIFTMASKMBS ((1<<(32-SHIFTBITS))-1)
> #define SHIFTMASKMSGS ((1<<SHIFTBITS)-1)
>
> inline unsigned int encode_mbox_indices(unsigned int mb, unsigned int msg)/*{{{*/
1029,1031c1065
<   unsigned int result;
<   result = ((mb & 0xffff) << 16) | (msg & 0xffff);
<   return result;
---
>   return (mb << SHIFTBITS) | msg;
1034c1068
< void decode_mbox_indices(unsigned int index, unsigned int *mb, unsigned int *msg)/*{{{*/
---
> inline void decode_mbox_indices(unsigned int myindex, unsigned int *mb, unsigned int *msg)/*{{{*/
1036,1037c1070,1071
<   *mb = (index >> 16) & 0xffff;
<   *msg = (index & 0xffff);
---
>   *mb = (myindex >> SHIFTBITS) & SHIFTMASKMBS;
>   *msg = (myindex & SHIFTMASKMSGS);
1044,1045c1078,1080
<   if (db->n_mboxen > 65536) {
<     fprintf(stderr, "Too many mboxes (max 65536, you have %d)\n", db->n_mboxen);
---
>   if (db->n_mboxen >= SHIFTMASKMBS) {
>     fprintf(stderr, "Too many mboxes (max %d, you have %d)\n",
>               SHIFTMASKMBS, db->n_mboxen);
1050,1052c1085,1087
<     if (db->mboxen[i].n_msgs > 65536) {
<       fprintf(stderr, "Too many messages in mbox %s (max 65536, you have %d)\n",
<               db->mboxen[i].path, db->mboxen[i].n_msgs);
---
>     if (db->mboxen[i].n_msgs >= SHIFTMASKMSGS) {
>       fprintf(stderr, "Too many messages in mbox %s (max %d, you have %d)\n",
>               db->mboxen[i].path, SHIFTMASKMSGS, db->mboxen[i].n_msgs);
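
The packing scheme described above can also be shown in isolation. The following is a minimal standalone sketch, not the code from mbox.c: the names MSG_BITS, MBOX_BITS, pack_indices and unpack_indices are hypothetical, and the 8/24 split is just the one mentioned in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; mairix's mbox.c uses different ones. */
#define MSG_BITS  24u                 /* bits given to the message number */
#define MBOX_BITS (32u - MSG_BITS)    /* remaining bits for the mailbox number */
#define MSG_MASK  ((UINT32_C(1) << MSG_BITS) - 1u)
#define MBOX_MASK ((UINT32_C(1) << MBOX_BITS) - 1u)

/* Pack a mailbox number (upper bits) and a message number (lower bits)
 * into one 32-bit word. */
static uint32_t pack_indices(uint32_t mb, uint32_t msg)
{
  assert(mb <= MBOX_MASK);  /* a larger value would not fit the upper field */
  assert(msg <= MSG_MASK);  /* a larger value would spill into the mailbox field */
  return (mb << MSG_BITS) | msg;
}

/* Recover both numbers from a packed word. */
static void unpack_indices(uint32_t packed, uint32_t *mb, uint32_t *msg)
{
  *mb = (packed >> MSG_BITS) & MBOX_MASK;
  *msg = packed & MSG_MASK;
}
```

With MSG_BITS set to 24, mailbox numbers 0 to 255 and message numbers 0 to 16777215 fit in one word; setting it to 16 gives the original even split. Any index written with one split cannot be read with another, which is why the index files have to be rebuilt.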

Segfault in make_nvp

The following minimal example will trigger a segfault during indexing:

Content-Type: application/pdf; name*=UTF-8''filename.pdf;

foo

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000055555556b085 in make_nvp (src=src@entry=0x555555778ac0 <result>, 
    s=0x55555577f43d " application/pdf; name*=UTF-8''filename.pdf;", 
    s@entry=0x55555577f430 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;", pfx=pfx@entry=0x555555570a1c "content-type:") at nvp.c:279

#0  0x000055555556b085 in make_nvp (src=src@entry=0x555555778ac0 <result>, 
    s=0x55555577f43d " application/pdf; name*=UTF-8''filename.pdf;", 
    s@entry=0x55555577f430 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;", pfx=pfx@entry=0x555555570a1c "content-type:") at nvp.c:279
        current_state = 2
        tok = <optimized out>
        q = 0x55555577f469 ""
        tempsrc = 0x0
        tempdst = 0x0
        qq = <optimized out>
        name = 0x0
        minor = 0x55555577f520 "UTF-8"
        value = 0x0
        copy_start = 0x55555577f468 ";"
        last_action = GOT_NAMEVALUE_CSET
        current_action = <optimized out>
        last_copier = COPY_NOWHERE
        result = <optimized out>
        pfxlen = <optimized out>
#1  0x000055555555d114 in data_to_rfc822 (
    src=src@entry=0x555555778ac0 <result>, 
    data=0x7fcadd6b5000 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;\n\nfoo\n\n", length=64, error=error@entry=0x0) at rfc822.c:1031
        body_start = 0x7fcadd6b503b "foo\n\n"
        header = {next = 0x55555577f480, prev = 0x55555577f480, 
          text = 0x3499309 <error: Cannot access memory at address 0x3499309>}
        x = 0x55555577f480
        nx = <optimized out>
        ct_nvp = 0x0
        cte_nvp = 0x0
        cd_nvp = 0x0
        nvp = <optimized out>
        body_len = <optimized out>
#2  0x000055555555ead4 in make_rfc822 (
    filename=filename@entry=0x55555577f410 "/tmp/bug.mh/1") at rfc822.c:1435
        len = 64
        data = 0x7fcadd6b5000 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;\n\nfoo\n\n"
        result = 0x0
#3  0x000055555555b9e3 in scan_new_messages (imapc=<optimized out>, 
    start_at=<optimized out>, db=<optimized out>) at db.c:755
        msg = 0x0
        len = <optimized out>
        i = 0
#4  update_database (db=0x55555577b6a0, sorted_paths=<optimized out>, 
    n_msgs=<optimized out>, do_fast_index=<optimized out>, 
    imapc=<optimized out>) at db.c:1085
        matched_index = <optimized out>
        i = <optimized out>
        any_new = 1
        n_newly_pruned = <optimized out>
        n_already_dead = <optimized out>
        __PRETTY_FUNCTION__ = "update_database"
#5  0x00005555555574e7 in main (argc=<optimized out>, argv=<optimized out>)

Note that value is 0.
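
For context, the name*=UTF-8''filename.pdf form in the example header is the RFC 2231 extended-parameter syntax: charset, a single quote, an optional language, another single quote, then the (percent-encoded) text. A minimal sketch of splitting such a value follows. It is not the code in nvp.c; the function name split_ext_value is hypothetical, and percent-decoding is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split an RFC 2231 extended value "charset'language'text" in place.
 * On success returns 0 and points the three outputs into buf; the two
 * single quotes are overwritten with NUL bytes. Returns -1 if either
 * quote is missing, leaving buf and the outputs untouched.
 * Hypothetical helper for illustration only. */
static int split_ext_value(char *buf, char **charset, char **language,
                           char **text)
{
  char *q1, *q2;

  assert(buf && charset && language && text);
  q1 = strchr(buf, '\'');
  if (!q1)
    return -1;
  q2 = strchr(q1 + 1, '\'');
  if (!q2)
    return -1;
  *q1 = '\0';
  *q2 = '\0';
  *charset = buf;       /* e.g. "UTF-8" */
  *language = q1 + 1;   /* may be the empty string */
  *text = q2 + 1;       /* still percent-encoded */
  return 0;
}
```

In this sketch an empty language or text comes back as an empty string rather than a null pointer, so callers never have to dereference NULL for a well-formed value.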

BTW, thank you very much for picking up maintainership! mairix is such a useful tool, and still makes the smallest indexes I know of.

Tagging a release

hi kim. could you tag a release so that mairix can be packaged easily?
Something like

git tag -a v0.24 -m "kim's version"

thanks

Path searching is not documented in mairix.1

I discovered mairix supports searching for paths using "p:..." while looking at the source code. Although it appears in "mairix --help", it isn't documented in the manual, and I've been using mairix for years occasionally bemoaning the inability to refine my search using filenames when the feature has apparently existed for a very long time.

Lots of headers that can't be parsed

I just downloaded the .zip file and compiled mairix. When I run it (and this is the same as V0.24), I get many complaints about headers that can't be parsed. For example:

Header 'content-type: image/*; name="20221017_130844_resized.jpg"' in [89420989,90670144) could not be parsed

I'm not a mail wizard, but that looks OK to me.

A more lengthy example:

Header 'content-disposition: inline; filename="image004.png"; size=79197; creation-date=Fri, 06 May 2022 16:51:48 GMT; modification-date=Fri, 06 May 2022 20:09:01 GMT' in [28093802,28267769) could not be parsed

Q1: Is it just me, or is this happening to other people?

Q2: Are these complaints valid, or are they spurious?

Thanks.

Release 0.25?

Is there a roadmap or list of work to be done prior to a 0.25 release? I would love to have some of the new features added since 0.24, particularly the XZ archive support added last year. (Especially since I started compressing mbox files several years ago and just now noticed that mairix doesn't support them when I went looking for an email I knew was there and mairix couldn't find it.)

If there are tasks that an unprivileged volunteer could do to hasten a new release, I'd be willing to help out.

Explicitly handle SIGPIPE

When running mairix in a pipeline, mairix does not delete the lock file after receiving SIGPIPE, so running something like "mairix --excerpt-output a:ericpruitt | less" results in "Database .../mairixdb appears to be locked by (pid,node,user)=(6387,sinister,ericpruitt)" the next time around if less(1) was closed before mairix finished writing data to the standard output.
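
One possible approach, sketched here under assumptions and not taken from the mairix source, is to ignore SIGPIPE so that a write to a closed pipe fails with EPIPE instead of killing the process. The program can then reach the same cleanup path that removes the lock file on any other error. The helper names ignore_sigpipe and write_all are hypothetical, and the sketch assumes the caller checks the result of every write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Ignore SIGPIPE so that writing to a closed pipe returns -1 with
 * errno == EPIPE rather than terminating the process.
 * Returns 0 on success, -1 on failure (as sigaction does). */
static int ignore_sigpipe(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = SIG_IGN;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGPIPE, &sa, NULL);
}

/* Write the whole buffer, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on failure with errno set by write(). */
static int write_all(int fd, const char *buf, size_t len)
{
  assert(buf != NULL);
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;   /* interrupted before any data was written: retry */
      return -1;    /* e.g. EPIPE when the reader has gone away */
    }
    buf += n;
    len -= (size_t)n;
  }
  return 0;
}
```

A caller would invoke ignore_sigpipe() once at startup and, when write_all() returns -1, stop producing output and fall through to its usual unlock-and-exit code. Installing a handler that removes the lock file and then re-raises the signal would be an alternative design.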

Index "BCC" headers

I think "BCC" email headers should be explicitly indexed. I can't think of any major email service providers or user agents that don't allow users to BCC recipients. I propose using "B:" as the search pattern prefix and modifying "a:" so that it also implies "B:".

Update NEWS file

Can the NEWS file be updated with a new release? The last release was in 2017, version 0.24. Since then, there have been many new commits. The 2017 version has many bugs that have since been fixed. Many platforms such as Arch Linux go by the official release number. So if the NEWS file is not updated, they will keep pointing to the last official release, which in this case is OLD.

Thank you!
