w3m's People

Contributors

acli, akinori-ito, arakiken, artoria2e5, barbeque, bptato, crrodriguez, dafyddcrosby, hattya, htrb, kcwu, mackyle, markwright, micahcowan, mlh85386, n-r-k, olafhering, pnemade, richq, rkta, rokuyama, satodainu, sertacyildiz-zz, shinra-jp, sorairolake, tats, ttdoda, yashlala, yshl, zhouyangjia

w3m's Issues

segfault for malformed <input> tag

How to reproduce

$ echo '<input type=">">' |  ./w3m -T text/html -dump
segmentation fault (core dumped)
#0  0x000000000044f153 in formUpdateBuffer (a=0xe30000, buf=0xe22e00, form=0xe2ff80) at form.c:458
458                 p = form->value->ptr;
(gdb) p form
$1 = (FormItemList *) 0xe2ff80
(gdb) p form->value
$2 = (Str) 0x0
(gdb) bt
#0  0x000000000044f153 in formUpdateBuffer (a=0xe30000, buf=0xe22e00, form=0xe2ff80) at form.c:458
#1  0x000000000044e9cc in formResetBuffer (buf=0xe22e00, formitem=0xe210e0) at form.c:268
#2  0x000000000042c54e in loadHTMLBuffer (f=0x7ffca7a4db90, newBuf=0xe22e00) at file.c:6750
#3  0x000000000042ec9b in openGeneralPagerBuffer (stream=0xdbf1b0) at file.c:7765
#4  0x0000000000406bcd in main (argc=4, argv=0x7ffca7a4ddb8, envp=0x7ffca7a4dde0) at main.c:923

This was found by afl-fuzz.
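
The gdb session above shows form->value being NULL when formUpdateBuffer dereferences it at form.c:458. Below is a minimal sketch of the kind of guard that would avoid the crash. DemoStr, DemoFormItem, form_value_or_empty and check_form_value_guard are hypothetical stand-ins written for this illustration, not w3m's real types or its actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for w3m's Str and FormItemList; only the
 * fields needed for the illustration are modelled. */
typedef struct {
    char *ptr;
    int length;
} DemoStr;

typedef struct {
    DemoStr *value;   /* may be NULL, as in the gdb output above */
} DemoFormItem;

/* Return the item's text, or "" when the item or its value is missing,
 * so callers never dereference a NULL value pointer. */
static const char *form_value_or_empty(const DemoFormItem *item)
{
    if (item == NULL || item->value == NULL || item->value->ptr == NULL)
        return "";
    return item->value->ptr;
}

/* Usage: one well-formed item and one item with a NULL value. */
static void check_form_value_guard(void)
{
    char text[] = "abc";
    DemoStr s = { text, 3 };
    DemoFormItem ok = { &s };
    DemoFormItem broken = { NULL };

    assert(strcmp(form_value_or_empty(&ok), "abc") == 0);
    assert(strcmp(form_value_or_empty(&broken), "") == 0);
}
```

Whether an empty string is the right fallback, or whether the malformed tag should be rejected earlier in the parser, is a design decision for the maintainers.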

Errors building from source

Just now I downloaded a zip file of the sources, and the 'make' has failed twice so far:

1] html.h is missing an "include" for <time.h>

2] terms.c is missing includes for "<term.h>" and "<curses.h>"

3] terms.c is still failing to compile. Here is the current output of my "make".


hgcc  -I. -I. -g -O2 -I./libwc  -DHAVE_CONFIG_H -DAUXBIN_DIR=\"/home/optimum/Projects/w3m/w3m-installed/libexec/w3m\" -DCGIBIN_DIR=\"/home/optimum/Projects/w3m/w3m-installed/libexec/w3m/cgi-bin\" -DHELP_DIR=\"/home/optimum/Projects/w3m/w3m-installed/share/w3m\" -DETC_DIR=\"/home/optimum/Projects/w3m/w3m-installed/etc\" -DCONF_DIR=\"/home/optimum/Projects/w3m/w3m-installed/etc/w3m\" -DRC_DIR=\"~/.w3m\" -DLOCALEDIR=\"/home/optimum/Projects/w3m/w3m-installed/share/locale\"   -c -o terms.o terms.c
In file included from terms.c:17:0:
hash.h:15:27: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘->’ token
   struct HashItem_##sym **tab; \
                           ^
hash.h:21:1: note: in expansion of macro ‘defhash’
 defhash(char *, int, si)
 ^
hash.h:15:27: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘->’ token
   struct HashItem_##sym **tab; \
                           ^
hash.h:22:1: note: in expansion of macro ‘defhash’
 defhash(char *, char *, ss)
 ^
hash.h:15:27: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘->’ token
   struct HashItem_##sym **tab; \
                           ^
hash.h:23:1: note: in expansion of macro ‘defhash’
 defhash(char *, void *, sv)
 ^
hash.h:15:27: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘->’ token
   struct HashItem_##sym **tab; \
                           ^
hash.h:24:1: note: in expansion of macro ‘defhash’
 defhash(int, void *, iv)
 ^
proto.h:288:41: error: expected ‘;’, ‘,’ or ‘)’ before ‘->’ token
 extern TabBuffer *deleteTab(TabBuffer * tab);
                                         ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:470:12: error: conflicting types for ‘initscr’
 extern int initscr(void);
            ^
In file included from terms.c:18:0:
/usr/include/curses.h:646:33: note: previous declaration of ‘initscr’ was here
 extern NCURSES_EXPORT(WINDOW *) initscr (void);    /* implemented */
                                 ^
proto.h:471:13: error: expected ‘)’ before ‘(’ token
 extern void move(int line, int column);
             ^
proto.h:475:13: error: expected ‘)’ before ‘(’ token
 extern void addch(char c);
             ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:478:26: error: macro "standout" passed 1 arguments, but takes just 0
 extern void standout(void);
                          ^
proto.h:478:13: error: ‘standout’ redeclared as different kind of symbol
 extern void standout(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:776:28: note: previous declaration of ‘standout’ was here
 extern NCURSES_EXPORT(int) standout (void);    /* generated */
                            ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:479:26: error: macro "standend" passed 1 arguments, but takes just 0
 extern void standend(void);
                          ^
proto.h:479:13: error: ‘standend’ redeclared as different kind of symbol
 extern void standend(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:777:28: note: previous declaration of ‘standend’ was here
 extern NCURSES_EXPORT(int) standend (void);    /* generated */
                            ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:493:25: error: macro "refresh" passed 1 arguments, but takes just 0
 extern void refresh(void);
                         ^
proto.h:493:13: error: ‘refresh’ redeclared as different kind of symbol
 extern void refresh(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:743:28: note: previous declaration of ‘refresh’ was here
 extern NCURSES_EXPORT(int) refresh (void);    /* generated */
                            ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:494:23: error: macro "clear" passed 1 arguments, but takes just 0
 extern void clear(void);
                       ^
proto.h:494:13: error: ‘clear’ redeclared as different kind of symbol
 extern void clear(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:603:28: note: previous declaration of ‘clear’ was here
 extern NCURSES_EXPORT(int) clear (void);    /* generated */
                            ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:502:26: error: macro "clrtoeol" passed 1 arguments, but takes just 0
 extern void clrtoeol(void);
                          ^
proto.h:502:13: error: ‘clrtoeol’ redeclared as different kind of symbol
 extern void clrtoeol(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:606:28: note: previous declaration of ‘clrtoeol’ was here
 extern NCURSES_EXPORT(int) clrtoeol (void);    /* generated */
                            ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:504:26: error: macro "clrtobot" passed 1 arguments, but takes just 0
 extern void clrtobot(void);
                          ^
proto.h:504:13: error: ‘clrtobot’ redeclared as different kind of symbol
 extern void clrtobot(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:605:28: note: previous declaration of ‘clrtobot’ was here
 extern NCURSES_EXPORT(int) clrtobot (void);    /* generated */
                            ^
In file included from terms.c:18:0:
proto.h:507:13: error: expected ‘)’ before ‘(’ token
 extern void addstr(char *s);
             ^
proto.h:508:13: error: expected ‘)’ before ‘(’ token
 extern void addnstr(char *s, int n);
             ^
In file included from fm.h:1242:0,
                 from terms.c:56:
proto.h:510:24: error: macro "crmode" passed 1 arguments, but takes just 0
 extern void crmode(void);
                        ^
proto.h:511:26: error: macro "nocrmode" passed 1 arguments, but takes just 0
 extern void nocrmode(void);
                          ^
proto.h:520:23: error: macro "getch" passed 1 arguments, but takes just 0
 extern char getch(void);
                       ^
proto.h:520:13: error: ‘getch’ redeclared as different kind of symbol
 extern char getch(void);
             ^
In file included from terms.c:18:0:
/usr/include/curses.h:631:28: note: previous declaration of ‘getch’ was here
 extern NCURSES_EXPORT(int) getch (void);    /* generated */
                            ^
In file included from terms.c:17:0:
proto.h:521:13: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘->’ token
 extern void bell(void);
             ^
terms.c:452:12: error: conflicting types for ‘tgetent’
 extern int tgetent(char *, char *);
            ^
In file included from terms.c:17:0:
/usr/include/term.h:765:28: note: previous declaration of ‘tgetent’ was here
 extern NCURSES_EXPORT(int) tgetent (char *, const char *);
                            ^
terms.c:453:12: error: conflicting types for ‘tgetnum’
 extern int tgetnum(char *);
            ^
In file included from terms.c:17:0:
/usr/include/term.h:767:28: note: previous declaration of ‘tgetnum’ was here
 extern NCURSES_EXPORT(int) tgetnum (NCURSES_CONST char *);
                            ^
terms.c:454:12: error: conflicting types for ‘tgetflag’
 extern int tgetflag(char *);
            ^
In file included from terms.c:17:0:
/usr/include/term.h:766:28: note: previous declaration of ‘tgetflag’ was here
 extern NCURSES_EXPORT(int) tgetflag (NCURSES_CONST char *);
                            ^
terms.c:455:14: error: conflicting types for ‘tgetstr’
 extern char *tgetstr(char *, char **);
              ^
In file included from terms.c:17:0:
/usr/include/term.h:763:31: note: previous declaration of ‘tgetstr’ was here
 extern NCURSES_EXPORT(char *) tgetstr (NCURSES_CONST char *, char **);
                               ^
terms.c:456:14: error: conflicting types for ‘tgoto’
 extern char *tgoto(char *, int, int);
              ^
In file included from terms.c:17:0:
/usr/include/term.h:764:31: note: previous declaration of ‘tgoto’ was here
 extern NCURSES_EXPORT(char *) tgoto (const char *, int, int);
                               ^
terms.c:457:12: error: conflicting types for ‘tputs’
 extern int tputs(char *, int, int (*)(char));
            ^
In file included from terms.c:17:0:
/usr/include/term.h:768:28: note: previous declaration of ‘tputs’ was here
 extern NCURSES_EXPORT(int) tputs (const char *, int, int (*)(int));
                            ^
terms.c:458:1: warning: parameter names (without types) in function declaration
 void clear(), wrap(), touch_line(), touch_column(int);
 ^
In file included from terms.c:18:0:
terms.c:458:6: error: conflicting types for ‘wclear’
 void clear(), wrap(), touch_line(), touch_column(int);
      ^
/usr/include/curses.h:815:28: note: previous declaration of ‘wclear’ was here
 extern NCURSES_EXPORT(int) wclear (WINDOW *);    /* implemented */
                            ^
terms.c:462:19: error: macro "clrtoeol" passed 1 arguments, but takes just 0
 void clrtoeol(void);  /* conflicts with curs_clear(3)? */
                   ^
terms.c:462:6: error: ‘clrtoeol’ redeclared as different kind of symbol
 void clrtoeol(void);  /* conflicts with curs_clear(3)? */
      ^
In file included from terms.c:18:0:
/usr/include/curses.h:606:28: note: previous declaration of ‘clrtoeol’ was here
 extern NCURSES_EXPORT(int) clrtoeol (void);    /* generated */
                            ^
terms.c:1138:1: error: conflicting types for ‘initscr’
 initscr(void)
 ^
In file included from terms.c:18:0:
/usr/include/curses.h:646:33: note: previous declaration of ‘initscr’ was here
 extern NCURSES_EXPORT(WINDOW *) initscr (void);    /* implemented */
                                 ^
terms.c:1161:1: error: expected ‘)’ before ‘(’ token
 move(int line, int column)
 ^
terms.c:1207:1: error: expected ‘)’ before ‘(’ token
 addch(char c)
 ^
terms.c:1413:14: error: macro "standout" passed 1 arguments, but takes just 0
 standout(void)
              ^
terms.c:1414:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:1419:14: error: macro "standend" passed 1 arguments, but takes just 0
 standend(void)
              ^
terms.c:1420:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:1527:13: error: macro "refresh" passed 1 arguments, but takes just 0
 refresh(void)
             ^
terms.c:1528:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:1726:11: error: macro "clear" passed 1 arguments, but takes just 0
 clear(void)
           ^
terms.c:1727:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:1856:14: error: macro "clrtoeol" passed 1 arguments, but takes just 0
 clrtoeol(void)
              ^
terms.c:1857:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {    /* Clear to the end of line */
 ^
terms.c:1927:14: error: macro "clrtobot" passed 1 arguments, but takes just 0
 clrtobot(void)
              ^
terms.c:1928:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
In file included from terms.c:18:0:
terms.c:1950:1: error: expected ‘)’ before ‘(’ token
 addstr(char *s)
 ^
terms.c:1967:1: error: expected ‘)’ before ‘(’ token
 addnstr(char *s, int n)
 ^
terms.c:2013:12: error: macro "crmode" passed 1 arguments, but takes just 0
 crmode(void)
            ^
terms.c:2015:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:2031:14: error: macro "nocrmode" passed 1 arguments, but takes just 0
 nocrmode(void)
              ^
terms.c:2033:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
terms.c:2134:11: error: macro "getch" passed 1 arguments, but takes just 0
 getch(void)
           ^
terms.c:2135:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 {
 ^
In file included from terms.c:17:0:
terms.c:2234:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘->’ token
 bell(void)
 ^
<builtin>: recipe for target 'terms.o' failed
make: *** [terms.o] Error 1
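
Many of the errors above have the form `macro "clear" passed 1 arguments, but takes just 0`, and the declaration of clear() in terms.c is reported as a conflict with wclear. Both suggest that the <curses.h> added in step 2 defines function-like macros with the same names as functions that terms.c and proto.h declare themselves. The sketch below reproduces that clash in isolation, using a hypothetical macro demo_clear, and shows that parenthesising the name stops the preprocessor from expanding it. This illustrates the mechanism only; it is not a proposed patch, and not adding the curses include in the first place may well be the better answer.

```c
#include <assert.h>

/* Hypothetical stand-in for a curses-style function-like macro. */
static int demo_wclear(void) { return 1; }
#define demo_clear() demo_wclear()

/* Writing `static int demo_clear(void)` here would be rejected, because
 * the preprocessor treats `demo_clear(void)` as a macro call with one
 * argument. A function-like macro is only expanded when its name is
 * directly followed by '(', so parenthesising the name avoids that. */
static int (demo_clear)(void) { return 2; }

static int call_macro(void)    { return demo_clear(); }   /* expands to demo_wclear() */
static int call_function(void) { return (demo_clear)(); } /* calls the function above */

/* Usage: the two spellings reach different code. */
static void check_macro_vs_function(void)
{
    assert(call_macro() == 1);
    assert(call_function() == 2);
}
```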

Stack seems smashed with large image inside table

How to reproduce

$ echo '<table>0<td rowspan=0 colspan=30><img width=900000 src=0 height=0>'  | ./w3m -T text/html -dump > /dev/null
*** stack smashing detected ***: ./w3m terminated

The behavior is not stable: w3m sometimes crashes and sometimes doesn't.
Usually it just segfaults; sometimes the stack protector reports that the stack was smashed.

I haven't debugged it, so I don't know why it is unstable or how the stack gets smashed. These are the steps I used to compile w3m:

env AFL_HARDEN=1 AFL_USE_ASAN=1 CC=afl-clang-fast ./configure --enable-image=no
make

This was found by afl-fuzz.

invalid write in popValue (textlist.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==23833==ERROR: AddressSanitizer: SEGV on unknown address 0x5bea00000011 (pc 0x000000723b5c bp 0x7ffc49de9380 sp 0x7ffc49de9380 T0)
==23833==The signal is caused by a WRITE memory access.
    #0 0x723b5b in popValue /tmp/w3m-0.5.3-git20161120/textlist.c:58:18
    #1 0x618c93 in print_item /tmp/w3m-0.5.3-git20161120/table.c:602:9
    #2 0x62fa8d in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1944:4
    #3 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #4 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #5 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #6 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #7 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #8 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #9 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #10 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #11 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #12 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #13 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #14 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #15 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #16 0x7f1dde66b61f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #17 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/w3m-0.5.3-git20161120/textlist.c:58:18 in popValue
==23833==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00080-w3m-invalidwrite-popValue

infinite recursion in HTMLlineproc0

00000000: 3c74 6162 6c65 3e3c 646c 3e3c 646c 3e3c  <table><dl><dl><
00000010: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000020: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000030: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000040: 646c 3e3c 646c 3e3c 646c 3e3c 446c 3e3c  dl><dl><dl><Dl><
00000050: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000060: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000070: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000080: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000090: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000a0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000b0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000c0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000d0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000e0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000000f0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000100: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000110: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000120: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000130: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000140: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000150: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000160: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000170: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000180: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000190: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000001a0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000001b0: 646c 3e3c 646c 3e3c 646c 3e3c 446c 3e3c  dl><dl><dl><Dl><
000001c0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000001d0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000001e0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
000001f0: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000200: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000210: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000220: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000230: 646c 3e3c 646c 3e3c 646c 3e3c 646c 3e3c  dl><dl><dl><dl><
00000240: 646c 3e3c 646c 3e3c 646c 3e3c 7461 626c  dl><dl><dl><tabl
00000250: 653e 303c 6361 7074 696f 6e3e 3c64 743e  e>0<caption><dt>
00000260: 3c64 6c3e 3c64 6c3e 3c64 6c3e 3030       <dl><dl><dl>00

gdb --args w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78800fe in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
(gdb) bt 30
#0  0x00007ffff78800fe in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#1  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#5  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#6  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#7  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#8  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#9  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#10 0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#11 0x00007ffff787cdcc in GC_generic_malloc_many () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#12 0x00007ffff7885ab9 in GC_malloc () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#13 0x0000000000479960 in Strnew_charp (p=0x1232cc0 "<b>") at Str.c:67
#14 0x000000000041e0c6 in flushline (h_env=0x7fffffffb5e0, obuf=0x7fffffffb770, indent=-128, force=0, width=1) at file.c:2789
#15 0x000000000042c3ce in HTMLlineproc0 (line=0x190cf72 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6615
#16 0x000000000042c3ec in HTMLlineproc0 (line=0x190cfc2 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#17 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb002 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#18 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb052 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#19 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb092 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#20 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb0e2 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#21 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb122 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#22 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb172 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#23 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb1b2 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#24 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb202 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#25 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb242 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#26 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb292 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#27 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb2d2 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#28 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb322 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
#29 0x000000000042c3ec in HTMLlineproc0 (line=0x18fb362 "", h_env=0x7fffffffb5e0, internal=1) at file.c:6619
(More stack frames follow...)

This was found by afl-fuzz.
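
The backtrace shows HTMLlineproc0 re-entering itself from file.c:6619 frame after frame until the stack runs out. A common defence against this class of bug is an explicit nesting limit. The sketch below shows the idea on a toy recursive function; MAX_NEST_DEPTH, process_nested and check_depth_limit are hypothetical names, and the limit of 64 is an arbitrary value chosen for the example, not something taken from w3m.

```c
#include <assert.h>

/* Hypothetical limit; a real parser would pick a value that suits its
 * stack budget. */
#define MAX_NEST_DEPTH 64

/* Walk `remaining` levels of nesting recursively, but refuse to go
 * deeper than MAX_NEST_DEPTH. Returns the depth actually reached. */
static int process_nested(int remaining, int depth)
{
    if (remaining <= 0)
        return depth;            /* input exhausted */
    if (depth >= MAX_NEST_DEPTH)
        return depth;            /* limit hit: stop instead of recursing */
    return process_nested(remaining - 1, depth + 1);
}

/* Usage: shallow input is processed fully, deep input is capped. */
static void check_depth_limit(void)
{
    assert(process_nested(10, 0) == 10);
    assert(process_nested(100000, 0) == MAX_NEST_DEPTH);
}
```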

heap-buffer-overflow read in wtf_strwidth()

input (xxd cases/tats-w3m-57)

00000000: 3c74 6162 6c65 3e26 ad                   <table>&.

how to reproduce:

ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 LD_LIBRARY_PATH=./notgc ./w3m-tats.asan -T text/html -dump cases/tats-w3m-57

stderr:

=================================================================
==3694265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000d0e0 at pc 0x000000751c95 bp 0x7ffe85924da0 sp 0x7ffe85924d98
READ of size 1 at 0x60300000d0e0 thread T0
    #0 0x751c94 in wtf_strwidth /targets/w3m-tats/libwc/wtf.c:124:12
    #1 0x5f7879 in visible_length /targets/w3m-tats/table.c:476:28
    #2 0x616721 in feed_table /targets/w3m-tats/table.c:3248:23
    #3 0x591bcc in HTMLlineproc0 /targets/w3m-tats/file.c:6424:14
    #4 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #5 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #6 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #7 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #8 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #9 0x7fc71cfc3f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #10 0x41bf25 in _start (/w3m-tats.asan+0x41bf25)

0x60300000d0e0 is located 0 bytes to the right of 32-byte region [0x60300000d0c0,0x60300000d0e0)
allocated by thread T0 here:
    #0 0x4c6288 in __interceptor_malloc (/w3m-tats.asan+0x4c6288)
    #1 0x7fc71e711c21 in GC_malloc_atomic /notgc/notgc.c:275
    #2 0x5f6e4d in visible_length /targets/w3m-tats/table.c:428:18
    #3 0x616721 in feed_table /targets/w3m-tats/table.c:3248:23
    #4 0x591bcc in HTMLlineproc0 /targets/w3m-tats/file.c:6424:14
    #5 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #6 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #7 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #8 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #9 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #10 0x7fc71cfc3f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-buffer-overflow /targets/w3m-tats/libwc/wtf.c:124:12 in wtf_strwidth
Shadow bytes around the buggy address:
  0x0c067fff99c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c067fff9a10: fa fa fa fa fa fa fa fa 00 00 00 00[fa]fa 00 00
  0x0c067fff9a20: 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a30: 00 00 00 01 fa fa 00 00 00 fa fa fa 00 00 00 00
  0x0c067fff9a40: fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa fd fd
  0x0c067fff9a50: fd fd fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a60: 00 00 00 fa fa fa 00 00 00 00 fa fa 00 00 02 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3694265==ABORTING

This was detected with the help of a dummy libgc wrapper. See http://github.com/kcwu/fuzzing-w3m/notgc for details.
For more details on how to reproduce, see http://github.com/kcwu/fuzzing-w3m

For your convenience, the gdb command line:
LD_LIBRARY_PATH=./notgc ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 gdb --args ./w3m-tats.asan -T text/html -dump cases/tats-w3m-57

This was found by afl-fuzz.

segfault with incorrect form_int fid

$ echo "000<button value='\"><form_int fid=4'>00000000000000000000000000000000000000000000000000000000000000000000000000000"  | ./w3m -T text/html -dump
Program received signal SIGSEGV, Segmentation fault.
0x000000000042a90e in HTMLlineproc2body (buf=0x7cee00, feed=0x427fa1 <textlist_feed>, llimit=-1) at file.c:6117
6117            forms[form_id]->next = forms[form_id - 1];
(gdb) l 6116, 6117
6116        for (form_id = 1; form_id <= form_max; form_id++)
6117            forms[form_id]->next = forms[form_id - 1];
(gdb) p form_max
$4 = 4
(gdb) p form_id
$1 = 1
(gdb) p forms[1]
$2 = (FormList *) 0x0
(gdb) bt
#0  0x000000000042a90e in HTMLlineproc2body (buf=0x7cee00, feed=0x427fa1 <textlist_feed>, llimit=-1) at file.c:6117
#1  0x000000000042aba1 in HTMLlineproc2 (buf=0x7cee00, tl=0x7cc5e0) at file.c:6173
#2  0x000000000042dd6e in loadHTMLstream (f=0x7fffffffd120, newBuf=0x7cee00, src=0x0, internal=0) at file.c:7258
#3  0x000000000042c597 in loadHTMLBuffer (f=0x7fffffffd120, newBuf=0x7cee00) at file.c:6755
#4  0x0000000000416a40 in loadSomething (f=0x7fffffffd120, loadproc=0x42c4b2 <loadHTMLBuffer>, defaultbuf=0x7cee00) at file.c:224
#5  0x000000000041c7e6 in loadGeneralFile (path=0x7c3ae0 "/tmp/zshrj3HcP", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#6  0x00000000004070d1 in main (argc=5, argv=0x7fffffffd448, envp=0x7fffffffd478) at main.c:1020

This was found by afl-fuzz.
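
According to the gdb output, the loop at file.c:6116 walks form_id from 1 to form_max (4 here) and dereferences forms[form_id] unconditionally, while forms[1] is NULL. A minimal sketch of a loop that tolerates empty slots follows. DemoForm, link_forms and check_link_forms are hypothetical names for this illustration; unlike the original loop, the sketch also starts at index 0 and links each entry to the nearest earlier non-NULL entry, which is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for w3m's FormList; only `next` is modelled. */
typedef struct DemoForm {
    struct DemoForm *next;
} DemoForm;

/* Chain the non-NULL entries of forms[0..form_max] so that each entry
 * points at the nearest earlier non-NULL entry. Returns the last
 * non-NULL entry (the head of the chain), or NULL if there is none. */
static DemoForm *link_forms(DemoForm **forms, int form_max)
{
    DemoForm *prev = NULL;
    for (int form_id = 0; form_id <= form_max; form_id++) {
        if (forms[form_id] == NULL)
            continue;            /* slot never filled, e.g. a bogus fid */
        forms[form_id]->next = prev;
        prev = forms[form_id];
    }
    return prev;
}

/* Usage: slots 1..3 are empty, as with the fid=4 input above. */
static void check_link_forms(void)
{
    DemoForm a = { NULL }, b = { NULL };
    DemoForm *forms[5] = { &a, NULL, NULL, NULL, &b };

    assert(link_forms(forms, 4) == &b);
    assert(b.next == &a);
    assert(a.next == NULL);
}
```

Skipping empty slots is only one option; rejecting a form_int tag whose fid does not refer to an existing form would be another.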

crash due to `bcopy` with negative size

How to reproduce

$ echo -e '<table><title><listing><body><table></internal>00000/000\n<td>000000<textarea rows=2>' | ./w3m -T text/html -dump
Program received signal SIGSEGV, Segmentation fault.
__memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1653
1653    ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: No such file or directory.
(gdb) bt
#0  __memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1653
#1  0x000000000044f003 in form_update_line (line=0x7d0ea0, str=0x7fffffffcc80, spos=10, epos=30, width=5, newline=1, password=0) at form.c:399
#2  0x000000000044f480 in formUpdateBuffer (a=0x7e7000, buf=0x7cee00, form=0x7e6f80) at form.c:482
#3  0x000000000044ea69 in formResetBuffer (buf=0x7cee00, formitem=0x7e4bc0) at form.c:268
#4  0x000000000042c5eb in loadHTMLBuffer (f=0x7fffffffd120, newBuf=0x7cee00) at file.c:6761
#5  0x0000000000416a40 in loadSomething (f=0x7fffffffd120, loadproc=0x42c4b2 <loadHTMLBuffer>, defaultbuf=0x7cee00) at file.c:224
#6  0x000000000041c7e6 in loadGeneralFile (path=0x7c3ae0 "", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#7  0x00000000004070d1 in main (argc=5, argv=0x7fffffffd448, envp=0x7fffffffd478) at main.c:1020
(gdb) up
#1  0x000000000044f003 in form_update_line (line=0x7d0ea0, str=0x7fffffffcc80, spos=10, epos=30, width=5, newline=1, password=0) at form.c:399
399         bcopy((void *)&line->lineBuf[epos], (void *)&buf[pos],
(gdb) l
394             if (*p == '\n')
395                 p++;
396         }
397         *str = p;
398
399         bcopy((void *)&line->lineBuf[epos], (void *)&buf[pos],
400               (line->len - epos) * sizeof(char));
401         bcopy((void *)&line->propBuf[epos], (void *)&prop[pos],
402               (line->len - epos) * sizeof(Lineprop));
403         line->lineBuf = buf;
(gdb) p line->len
$1 = 15
(gdb) p epos
$2 = 30

The crash occurs because bcopy is called with a negative size: line->len - epos is 15 - 30 = -15.

This was found by afl-fuzz.
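
With line->len = 15 and epos = 30, the length expression evaluates to -15, and once that is converted to the unsigned size type the copy routine expects, it becomes a huge value. The sketch below shows a clamped length computation. tail_length, copy_tail and check_tail_copy are hypothetical helpers, and memmove stands in for bcopy (note that its argument order is destination first).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes that remain from position `epos` in a line of
 * length `len`; never negative, so it is safe to use as a size_t. */
static size_t tail_length(int len, int epos)
{
    if (epos < 0 || epos >= len)
        return 0;                /* nothing valid left to copy */
    return (size_t)(len - epos);
}

/* Copy the tail of `src` (from `epos` on) to `dst` and return the
 * number of bytes copied. The caller must ensure that `dst` has room
 * for tail_length(len, epos) bytes. */
static size_t copy_tail(char *dst, const char *src, int len, int epos)
{
    size_t n = tail_length(len, epos);
    if (n > 0)
        memmove(dst, src + epos, n);
    return n;
}

/* Usage: the values from the gdb session (len 15, epos 30) copy nothing. */
static void check_tail_copy(void)
{
    char dst[8] = { 0 };

    assert((size_t)(15 - 30) > (size_t)15);  /* the unguarded size wraps around */
    assert(copy_tail(dst, "0123456789abcde", 15, 30) == 0);
    assert(copy_tail(dst, "0123456789abcde", 15, 10) == 5);
    assert(memcmp(dst, "abcde", 5) == 0);
}
```

Clamping avoids the out-of-bounds copy, but it does not explain how epos ended up past the end of the line; that would still need to be traced back through formUpdateBuffer.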

NULL pointer dereference in Strcat_charp_n (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==21198==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f2dbc9c6b50 bp 0x7ffc74d43d70 sp 0x7ffc74d43d28 T0)
==21198==The signal is caused by a READ memory access.
==21198==Hint: address points to the zero page.
    #0 0x7f2dbc9c6b4f  /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/string/../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:79
    #1 0x71685a in Strcat_charp_n /tmp/w3m-0.5.3-git20161120/Str.c:199:5
    #2 0x6184df in align /tmp/w3m-0.5.3-git20161120/table.c:578:2
    #3 0x618d3d in print_item /tmp/w3m-0.5.3-git20161120/table.c:615:2
    #4 0x62fa8d in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1944:4
    #5 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #6 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #7 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #8 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #9 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #10 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #11 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #12 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #13 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #14 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #15 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #16 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #17 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #18 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #19 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #20 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #21 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #22 0x7f2dbc8b261f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #23 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/string/../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:79 
==21198==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00086-w3m-nullptr-Strcat_charp_n

heap buffer out-of-bounds write in addMultirowsForm()

input

00000000: 3c74 6162 6c65 3e30 3c62 7574 746f 6e20  <table>0<button
00000010: 7661 6c75 653d 2722 3e30 3030 3030 3030  value='">0000000
00000020: 3030 3027 3e30 3030 3030 3030 3030 3030  000'>00000000000
00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000050: 3030 3030 3030 3030 3030 3030 3030 3c74  00000000000000<t
00000060: 6578 7461 7265 6120 726f 7773 3d32 3030  extarea rows=200
00000070: 3030 3e                                  00>
Program received signal SIGSEGV, Segmentation fault.
0x0000000000438b33 in calcPosition (l=0x9c1500 " ]          [         ", pr=0x5b009c7f60, len=22, pos=1, bpos=0, mode=0) at etc.c:515
515         if (pr[i] & PC_WCHAR2) {
(gdb) p pr
$1 = (Lineprop *) 0x5b009c7f60
(gdb) p i
$2 = 0
(gdb) p pr[i]
Cannot access memory at address 0x5b009c7f60

pr is corrupted. The correct address of pr is 0x9c7f60, but one of its upper bytes has been overwritten with 0x5b ('[').
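
The corrupted value can be reproduced arithmetically. The sketch below assumes a little-endian 64-bit pointer in which the byte at index 4 was hit by the stray '[' write; overwrite_byte is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace one byte of a 64-bit value.
 * index 0 is the least significant byte. */
uint64_t overwrite_byte(uint64_t value, unsigned index, unsigned char byte)
{
    assert(index < 8);
    uint64_t mask = (uint64_t)0xff << (index * 8);
    return (value & ~mask) | ((uint64_t)byte << (index * 8));
}
```

Calling overwrite_byte(0x9c7f60, 4, '[') yields 0x5b009c7f60, the address gdb reports for pr.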

This happens because the buffer overflows earlier, several times, in addMultirowsForm. The following assertion makes the overflow easier to catch.

diff --git a/anchor.c b/anchor.c
index 2d21bfa..3eb30d2 100644
--- a/anchor.c
+++ b/anchor.c
@@ -1,4 +1,5 @@
 /* $Id: anchor.c,v 1.33 2006/04/08 11:33:16 inu Exp $ */
+#include <assert.h>
 #include "fm.h"
 #include "myctype.h"
 #include "regex.h"
@@ -686,6 +687,7 @@ addMultirowsForm(Buffer *buf, AnchorList *al)
            a->y = a_form.y;
            a->end.pos = pos + ecol - col;
            l->lineBuf[pos - 1] = '[';
+           assert(a->end.pos < l->size);
            l->lineBuf[a->end.pos] = ']';
            for (k = pos; k < a->end.pos; k++)
                l->propBuf[k] |= PE_FORM;
w3m: anchor.c:690: addMultirowsForm: Assertion `a->end.pos < l->size' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff6c75c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6c75c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6c79028 in __GI_abort () at abort.c:89
#2  0x00007ffff6c6ebf6 in __assert_fail_base (fmt=0x7ffff6dbf3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x49cfc2 "a->end.pos < l->size",
    file=file@entry=0x49cfb9 "anchor.c", line=line@entry=690, function=function@entry=0x49d0f0 <__PRETTY_FUNCTION__.19282> "addMultirowsForm") at assert.c:92
#3  0x00007ffff6c6eca2 in __GI___assert_fail (assertion=0x49cfc2 "a->end.pos < l->size", file=0x49cfb9 "anchor.c", line=690,
    function=0x49d0f0 <__PRETTY_FUNCTION__.19282> "addMultirowsForm") at assert.c:101
#4  0x0000000000475a5c in addMultirowsForm (buf=0x7d3e00, al=0x7ea7c0) at anchor.c:690
#5  0x000000000042acde in HTMLlineproc2body (buf=0x7d3e00, feed=0x4282b8 <textlist_feed>, llimit=-1) at file.c:6142
#6  0x000000000042aeeb in HTMLlineproc2 (buf=0x7d3e00, tl=0x7cc2e0) at file.c:6195
#7  0x000000000042e0d5 in loadHTMLstream (f=0x7fffffffd130, newBuf=0x7d3e00, src=0x0, internal=0) at file.c:7281
#8  0x000000000042c8fe in loadHTMLBuffer (f=0x7fffffffd130, newBuf=0x7d3e00) at file.c:6778
#9  0x0000000000416ae0 in loadSomething (f=0x7fffffffd130, loadproc=0x42c7fc <loadHTMLBuffer>, defaultbuf=0x7d3e00) at file.c:224
#10 0x000000000041c8fe in loadGeneralFile (path=0x7bdf00 "triage.debug/min/3", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
    flag=0, request=0x0) at file.c:2245
#11 0x0000000000407171 in main (argc=5, argv=0x7fffffffd458, envp=0x7fffffffd488) at main.c:1020
(gdb) frame 4
#4  0x0000000000475a5c in addMultirowsForm (buf=0x7d3e00, al=0x7ea7c0) at anchor.c:690
690                 assert(a->end.pos < l->size);
(gdb) p a->end.pos
$1 = 33
(gdb) p l->size
$2 = 22

This is found by afl-fuzz.

Null pointer dereference with input_alt tag

Null pointer dereference

$  echo -e '<table>000000000000<b<>\x00<listing><input_alt>0' |  w3m -T text/html -dump
Program received signal SIGSEGV, Segmentation fault.
flushline (h_env=0x7fffffffc2f0, obuf=0x7fffffffc480, indent=0, force=0, width=12) at file.c:3025
3025            tmp = Sprintf("<INPUT_ALT hseq=\"%d\" fid=\"%d\" name=\"%s\" type=\"%s\" value=\"%s\">",
(gdb) l
3020        }
3021        if (!hidden_input && obuf->input_alt.in) {
3022            Str tmp;
3023            if (obuf->input_alt.hseq > 0)
3024                obuf->input_alt.hseq = - obuf->input_alt.hseq;
3025            tmp = Sprintf("<INPUT_ALT hseq=\"%d\" fid=\"%d\" name=\"%s\" type=\"%s\" value=\"%s\">",
3026                         obuf->input_alt.hseq,
3027                         obuf->input_alt.fid,
3028                         obuf->input_alt.name->ptr,
3029                         obuf->input_alt.type->ptr,
(gdb) p obuf->input_alt
$1 = {
  hseq = 0, 
  fid = -1, 
  in = 1, 
  type = 0x0, 
  name = 0x0, 
  value = 0x0
}
(gdb) bt
#0  flushline (h_env=0x7fffffffc2f0, obuf=0x7fffffffc480, indent=0, force=0, width=12) at file.c:3025
#1  0x000000000042bf99 in HTMLlineproc0 (line=0x7c3a72 "", h_env=0x7fffffffc2f0, internal=1) at file.c:6591
#2  0x00000000004423d3 in do_refill (tbl=0x7bf000, row=0, col=0, maxlimit=79) at table.c:798
#3  0x000000000044627f in renderTable (t=0x7bf000, max_width=79, h_env=0x7fffffffcb00) at table.c:1800
#4  0x000000000042b617 in HTMLlineproc0 (line=0x494fe1 "", h_env=0x7fffffffcb00, internal=1) at file.c:6426
#5  0x000000000042d1a8 in completeHTMLstream (h_env=0x7fffffffcb00, obuf=0x7fffffffcc90) at file.c:6995
#6  0x000000000042dbb3 in loadHTMLstream (f=0x7fffffffd120, newBuf=0x7cee00, src=0x0, internal=0) at file.c:7227
#7  0x000000000042c597 in loadHTMLBuffer (f=0x7fffffffd120, newBuf=0x7cee00) at file.c:6755
#8  0x0000000000416a40 in loadSomething (f=0x7fffffffd120, loadproc=0x42c4b2 <loadHTMLBuffer>, defaultbuf=0x7cee00) at file.c:224
#9  0x000000000041c7e6 in loadGeneralFile (path=0x7bdf00 "triage.debug/min/75", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#10 0x00000000004070d1 in main (argc=5, argv=0x7fffffffd448, envp=0x7fffffffd478) at main.c:1020

This was found by afl-fuzz.
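
The gdb output shows that type, name and value are all NULL when the format call dereferences them. A minimal sketch of a defensive approach, assuming a simplified stand-in for w3m's Str type (struct str, str_or_empty and format_input_alt are hypothetical names, not w3m code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for w3m's Str: only the ptr member matters here. */
struct str {
    char *ptr;
};

/* Return the string contents, or "" when the Str itself or its buffer is NULL. */
const char *str_or_empty(const struct str *s)
{
    return (s != NULL && s->ptr != NULL) ? s->ptr : "";
}

/* Format a tag without dereferencing NULL fields. */
int format_input_alt(char *out, size_t outlen,
                     const struct str *name, const struct str *type)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "<INPUT_ALT name=\"%s\" type=\"%s\">",
                    str_or_empty(name), str_or_empty(type));
}
```

Substituting an empty string is one design choice; skipping the tag entirely when the fields are unset would be another.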

dereference near-null pointer in formUpdateBuffer

input

00000000: 3c74 6162 6c65 3e30 3c6e 6f62 722f 3c3e  <table>0<nobr/<>
00000010: 303c 786d 703e 3c69 6e74 6572 6e61 6c3e  0<xmp><internal>
00000020: 3c73 656c 6563 7420 6d75 6c74 6970 6c65  <select multiple
00000030: 3e3c 6f70 7469 6f6e 3e                   ><option>

gdb --args ./w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
0x000000000044f741 in formUpdateBuffer (a=0x7e1000, buf=0x7d4e00, form=0x7e0f80) at form.c:445
445             if (spos >= buf->currentLine->len || spos < 0)
(gdb) p buf
$1 = (Buffer *) 0x7d4e00
(gdb) p buf->currentLine
$2 = (Line *) 0x0
(gdb) bt
#0  0x000000000044f741 in formUpdateBuffer (a=0x7e1000, buf=0x7d4e00, form=0x7e0f80) at form.c:445
#1  0x000000000044f04c in formResetBuffer (buf=0x7d4e00, formitem=0x7db500) at form.c:272
#2  0x000000000042c9cc in loadHTMLBuffer (f=0x7fffffffca80, newBuf=0x7d4e00) at file.c:6781
#3  0x0000000000416ae0 in loadSomething (f=0x7fffffffca80, loadproc=0x42c85e <loadHTMLBuffer>, defaultbuf=0x7d4e00) at file.c:224
#4  0x000000000041c952 in loadGeneralFile (path=0x7c4b00 "min/32", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#5  0x0000000000407171 in main (argc=5, argv=0x7fffffffcda8, envp=0x7fffffffcdd8) at main.c:1020

found by afl-fuzz

deref null pointer in shiftAnchorPosition()

input

00000000: 3c74 6162 6c65 3e30 3c62 7220 3c3e 303c  <table>0<br <>0<
00000010: 786d 703e c8ab 3c64 6976 3e3c 696e 7465  xmp>..<div><inte
00000020: 526e 616c 3e3c 696e 7075 745f 616c 7420  Rnal><input_alt
00000030: 6669 643d 303e 3c64 6c3e 303c 666f 726d  fid=0><dl>0<form
00000040: 3e                                       >
Program received signal SIGSEGV, Segmentation fault.
0x0000000000475b0e in shiftAnchorPosition (al=0x7db2c0, hl=0x0, line=3, pos=0, shift=-1) at anchor.c:554
554                 if (hl->marks[a->hseq].line == line)
(gdb) p hl
$1 = (HmarkerList *) 0x0

found by afl-fuzz

global-buffer-overflow in parseURL()

input

00000000: 3c41 2068 7265 663d 2f2f 3e30 3030 3030  <A href=//>00000
00000010: 3030 3c62 6173 6520 6872 6566 3d3a 3e30  00<base href=:>0
00000020: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000050: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000060: 3030 3030 3030 3030                      00000000

Build with AddressSanitizer. The run result:

==1331653==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000007903fc at pc 0x0000006a2b14 bp 0x7ffda749a3b0 sp 0x7ffda749a3a8
READ of size 4 at 0x0000007903fc thread T0
    #0 0x6a2b13 in parseURL /home/kcwu/w3m/url.c:844:16
    #1 0x6a43db in parseURL2 /home/kcwu/w3m/url.c:999:5
    #2 0x6b0ec0 in url_to_charset /home/kcwu/w3m/url.c:2278:2
    #3 0x6b0ec0 in url_encode /home/kcwu/w3m/url.c:2293
    #4 0x5a83e3 in HTMLlineproc2body /home/kcwu/w3m/file.c:5684:8
    #5 0x5afe54 in HTMLlineproc2 /home/kcwu/w3m/file.c:6198:5
    #6 0x5afe54 in loadHTMLstream /home/kcwu/w3m/file.c:7289
    #7 0x56b9ec in loadHTMLBuffer /home/kcwu/w3m/file.c:6781:5
    #8 0x560a80 in loadSomething /home/kcwu/w3m/file.c:224:16
    #9 0x560a80 in loadGeneralFile /home/kcwu/w3m/file.c:2241
    #10 0x4f901a in main /home/kcwu/w3m/main.c:1020:12
    #11 0x7f3210f82f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #12 0x41c095 in _start (/home/kcwu/w3m/w3m+0x41c095)

0x0000007903fc is located 36 bytes to the left of global variable '<string literal>' defined in 'url.c:1747:10' (0x790420) of size 6
  '<string literal>' is ascii string 'url.c'
0x0000007903fc is located 5 bytes to the right of global variable '<string literal>' defined in 'url.c:1747:10' (0x7903e0) of size 23
  '<string literal>' is ascii string 'isprint(w3m_reqlog[0])'
SUMMARY: AddressSanitizer: global-buffer-overflow /home/kcwu/w3m/url.c:844:16 in parseURL
Shadow bytes around the buggy address:
  0x0000800ea020: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 06 f9 f9 f9
  0x0000800ea030: f9 f9 f9 f9 02 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
  0x0000800ea040: f9 f9 f9 f9 05 f9 f9 f9 f9 f9 f9 f9 05 f9 f9 f9
  0x0000800ea050: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0000800ea060: f9 f9 f9 f9 02 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
=>0x0000800ea070: f9 f9 f9 f9 02 f9 f9 f9 f9 f9 f9 f9 00 00 07[f9]
  0x0000800ea080: f9 f9 f9 f9 06 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x0000800ea090: 00 00 00 00 00 00 00 00 00 00 00 07 f9 f9 f9 f9
  0x0000800ea0a0: 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 f9 f9 f9 f9 f9
  0x0000800ea0b0: 00 00 00 04 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
  0x0000800ea0c0: 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 02
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1331653==ABORTING
#11 0x0000000000463cfc in parseURL (url=0xd8d750 "//", p_url=0x7ffc377805d0, current=0xda2de0) at url.c:844
844             p_url->port = DefaultPort[p_url->scheme];
(rr) p p_url->scheme
$1 = 255

p_url->scheme is 255 (SCM_UNKNOWN), but DefaultPort has only 13 or 14 entries, so the index is out of bounds.
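
A bounds check before the table lookup would avoid the out-of-range read. This is a sketch only: port_for_scheme is a hypothetical helper, the table is passed in as a parameter, and returning 0 for an unknown scheme is an assumption rather than w3m's behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: look up a default port, returning 0 when the
 * scheme value does not index into the table (for example 255). */
int port_for_scheme(const int *ports, size_t n_ports, int scheme)
{
    assert(ports != NULL);
    if (scheme < 0 || (size_t)scheme >= n_ports)
        return 0;                 /* unknown scheme: no default port */
    return ports[scheme];
}
```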

This is found by afl-fuzz.

heap-buffer-overflow read in feed_table_tag()

input (xxd cases/tats-w3m-60)

00000000: 3c74 6162 6c65 3e3c 7468 2063 6f6c 7370  <table><th colsp
00000010: 616e 3d39 3030 3e3c 7468 3e              an=900><th>

how to reproduce:

ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 LD_LIBRARY_PATH=./notgc ./w3m-tats.asan -T text/html -dump cases/tats-w3m-60

stderr:

=================================================================
==3694446==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61500000fc80 at pc 0x000000618fc4 bp 0x7ffe43efa570 sp 0x7ffe43efa568
READ of size 2 at 0x61500000fc80 thread T0
    #0 0x618fc3 in feed_table_tag /targets/w3m-tats/table.c:2595:9
    #1 0x6145d5 in feed_table /targets/w3m-tats/table.c:3145:14
    #2 0x591bcc in HTMLlineproc0 /targets/w3m-tats/file.c:6424:14
    #3 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #4 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #5 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #6 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #7 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #8 0x7f1bf5abef44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #9 0x41bf25 in _start (/w3m-tats.asan+0x41bf25)

0x61500000fc80 is located 0 bytes to the right of 512-byte region [0x61500000fa80,0x61500000fc80)
allocated by thread T0 here:
    #0 0x4c6288 in __interceptor_malloc (/w3m-tats.asan+0x4c6288)
    #1 0x7f1bf720cc21 in GC_malloc_atomic /notgc/notgc.c:275
    #2 0x618e1c in feed_table_tag /targets/w3m-tats/table.c:2594:2
    #3 0x6145d5 in feed_table /targets/w3m-tats/table.c:3145:14
    #4 0x591bcc in HTMLlineproc0 /targets/w3m-tats/file.c:6424:14
    #5 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #6 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #7 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #8 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #9 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #10 0x7f1bf5abef44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-buffer-overflow /targets/w3m-tats/table.c:2595:9 in feed_table_tag
Shadow bytes around the buggy address:
  0x0c2a7fff9f40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2a7fff9f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c2a7fff9f90:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2a7fff9fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2a7fff9fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
  0x0c2a7fff9fe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3694446==ABORTING

This was detected with the help of a dummy libgc wrapper. See http://github.com/kcwu/fuzzing-w3m/notgc for details.
For more details on how to reproduce, see http://github.com/kcwu/fuzzing-w3m

For your convenience,
gdbline:
LD_LIBRARY_PATH=./notgc ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 gdb --args ./w3m-tats.asan -T text/html -dump cases/tats-w3m-60

This is found by afl-fuzz.

Null pointer dereference in formUpdateBuffer

Input file

00000000: 3c62 7574 746f 6e20 7661 6c75 653d 2722  <button value='"
00000010: 3e3c 696e 7075 745f 616c 7420 6669 643d  ><input_alt fid=
00000020: 3020 7479 7065 3d27 3e3c 2f68 323e       0 type='></h2>

gdb log

Program received signal SIGSEGV, Segmentation fault.
0x000000000044f6dd in formUpdateBuffer (a=0x7de000, buf=0x7d3e00, form=0x7dde00) at form.c:462
462                 p = form->value->ptr;
(gdb) p form
$1 = (FormItemList *) 0x7dde00
(gdb) p form->value
$2 = (Str) 0x0
(gdb) bt
#0  0x000000000044f6dd in formUpdateBuffer (a=0x7de000, buf=0x7d3e00, form=0x7dde00) at form.c:462
#1  0x000000000044ef56 in formResetBuffer (buf=0x7d3e00, formitem=0x7d8940) at form.c:272
#2  0x000000000042c952 in loadHTMLBuffer (f=0x7fffffffd110, newBuf=0x7d3e00) at file.c:6784
#3  0x0000000000416ae0 in loadSomething (f=0x7fffffffd110, loadproc=0x42c7fc <loadHTMLBuffer>, defaultbuf=0x7d3e00) at file.c:224
#4  0x000000000041c8fe in loadGeneralFile (path=0x7c3b00 "min/1", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2245
#5  0x0000000000407171 in main (argc=5, argv=0x7fffffffd438, envp=0x7fffffffd468) at main.c:1020

found by afl-fuzz

NULL pointer dereference in Strnew_size (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==21899==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fa05d9afd49 bp 0x7ffc9e6796d0 sp 0x7ffc9e6796a0 T0)
==21899==The signal is caused by a READ memory access.
==21899==Hint: address points to the zero page.
    #0 0x7fa05d9afd48 in GC_malloc /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:272
    #1 0x715019 in Strnew_size /tmp/w3m-0.5.3-git20161120/Str.c:50:13
    #2 0x583f2f in flushline /tmp/w3m-0.5.3-git20161120/file.c:2973:18
    #3 0x57f89e in push_render_image /tmp/w3m-0.5.3-git20161120/file.c:2625:2
    #4 0x62fd6b in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1956:6
    #5 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #6 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #7 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #8 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #9 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #10 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #11 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #12 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #13 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #14 0x7fa05c3ea61f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #15 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:272 in GC_malloc
==21899==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00087-w3m-nullptr-Strnew_size

stack overflow in deleteFrameSet() on malformed input

How to reproduce:

echo '<U><frameset><frameset>0000000000000000000000000<button type=>000<i></button>00000000000000000000000000000000000000000000' | ./w3m -T text/html -dump

ASAN output:

ASAN:SIGSEGV
=================================================================
==3819287==ERROR: AddressSanitizer: stack-overflow on address 0x7ffd7abd6ff8 (pc 0x0000006331a5 bp 0x0000006b65e5 sp 0x7ffd7abd7000 T0)
    #0 0x6331a4  (/w3m/run/w3m.afl-asan+0x6331a4)
    #1 0x633330  (/w3m/run/w3m.afl-asan+0x633330)
    #2 0x633330  (/w3m/run/w3m.afl-asan+0x633330)
(the last line repeats)

This is found by afl-fuzz.

heap corruption due to integer overflow in renderTable()

This bug is interesting since it triggered libgc's issue ivmai/bdwgc#135 as well.

How to reproduce

$ echo '<table width=333330000%><table width=300000><textarea rows=2>' > file.in
$ ./w3m -T text/html -dump file.in

gdb

(gdb) b Strnew_size if n < 0
Breakpoint 1 at 0x4794b8: file Str.c, line 50.
(gdb) r

Breakpoint 1, Strnew_size (n=-3414) at Str.c:50
50          Str x = GC_MALLOC(sizeof(struct _Str));
(gdb) n
51          x->ptr = GC_MALLOC_ATOMIC(n + 1);
(gdb) n
52          x->ptr[0] = '\0';
(gdb) p x->ptr
$1 = 0x7df000 ""

This demonstrates a bug in libgc. Here n + 1 == -3413, which libgc treats as an unsigned long: 18446744073709548203. The allocation should fail (either return NULL or abort the program), but it returns 0x7df000.
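
The conversion can be checked directly. The sketch below assumes a 64-bit unsigned size type, and the function names are hypothetical; it also shows one possible guard that rejects a negative n before it reaches the allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* What a negative int becomes when interpreted as a 64-bit unsigned size. */
uint64_t as_unsigned64(int n)
{
    return (uint64_t)(int64_t)n;
}

/* Hypothetical guard: n must be non-negative and n + 1 must not overflow. */
int size_is_valid(int n)
{
    return n >= 0 && n < INT_MAX;
}

/* Size to request for a string of length n plus its terminator. */
size_t checked_alloc_size(int n)
{
    assert(size_is_valid(n));
    return (size_t)n + 1;
}
```

Here as_unsigned64(-3413) equals 18446744073709548203, matching the value above, and size_is_valid(-3414) is false.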

If execution continues:

(gdb) c
Continuing.
Duplicate large block deallocation

Program received signal SIGABRT, Aborted.
0x00007ffff6c70c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

On further investigation, w3m's negative size comes from renderTable() in table.c, line 1733:

1733            t->tabwidth[0] = max_width;

where max_width=5632662 but tabwidth[0] is a short. After the assignment, tabwidth[0]=-3434.
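
The truncation can be reproduced with well-defined arithmetic. This sketch assumes a 16-bit two's-complement short, as on the reporter's platform; wrap_to_int16, fits_in_int16 and store_width are hypothetical names, not w3m functions.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 16 bits of value and reinterpret them as a signed
 * two's-complement number, mimicking a store into a 16-bit short. */
int wrap_to_int16(long value)
{
    uint16_t low = (uint16_t)((unsigned long)value & 0xffffUL);
    return low >= 0x8000 ? (int)low - 0x10000 : (int)low;
}

/* Hypothetical range check to run before the assignment. */
int fits_in_int16(long value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}

/* Store a width only when it is representable. */
short store_width(long value)
{
    assert(fits_in_int16(value));
    return (short)value;
}
```

For max_width = 5632662 the low 16 bits are 62102, which reads back as 62102 - 65536 = -3434.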

found by afl-fuzz

malformed HTML tag may crash w3m

How to reproduce

 echo '0000000000000000000000000000000000000000000000000000000000000>000000000000000000<button type=>0<i></button><div>0' | ./w3m -T text/html -dump

gdb log

Program received signal SIGSEGV, Segmentation fault.
0x0000000000473b09 in onAnchor (a=0x3030303030, line=2, pos=18) at anchor.c:109
109         if (bpcmp(bp, a->start) < 0)
(gdb) p a
$1 = (Anchor *) 0x3030303030
(gdb) bt
#0  0x0000000000473b09 in onAnchor (a=0x3030303030, line=2, pos=18) at anchor.c:109
#1  0x0000000000474f53 in shiftAnchorPosition (al=0x7d5c40, hl=0x7d5ca0, line=2, pos=18, shift=18) at anchor.c:538
#2  0x000000000044f54f in formUpdateBuffer (a=0x7d9000, buf=0x7cde00, form=0x7d8f80) at form.c:490
#3  0x000000000044ea36 in formResetBuffer (buf=0x7cde00, formitem=0x7d5c40) at form.c:268
#4  0x000000000042c5b8 in loadHTMLBuffer (f=0x7fffffffd140, newBuf=0x7cde00) at file.c:6752
#5  0x0000000000416a40 in loadSomething (f=0x7fffffffd140, loadproc=0x42c47f <loadHTMLBuffer>, defaultbuf=0x7cde00) at file.c:224
#6  0x000000000041c7e6 in loadGeneralFile (path=0x7bcf00 "button-type.html", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#7  0x00000000004070d1 in main (argc=3, argv=0x7fffffffd468, envp=0x7fffffffd488) at main.c:1017

It looks like something overflowed and overwrote the "a" pointer.

This is found by afl-fuzz

invalid read in Strnew (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==15200==ERROR: AddressSanitizer: SEGV on unknown address 0x000000202020 (pc 0x7f8c02463ccd bp 0x7ffc6c1cb8d0 sp 0x7ffc6c1cb8b0 T0)
==15200==The signal is caused by a READ memory access.
    #0 0x7f8c02463ccc in GC_malloc_atomic /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:242
    #1 0x714f33 in Strnew /tmp/w3m-0.5.3-git20161120/Str.c:40:14
    #2 0x618132 in align /tmp/w3m-0.5.3-git20161120/table.c:565:11
    #3 0x618d3d in print_item /tmp/w3m-0.5.3-git20161120/table.c:615:2
    #4 0x62fa8d in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1944:4
    #5 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #6 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #7 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #8 0x5660f6 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7252:2
    #9 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #10 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #11 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #12 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #13 0x7f8c00e9e61f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #14 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:242 in GC_malloc_atomic
==15200==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00088-w3m-invalidread-Strnew

global-buffer-overflow in wc_any_to_ucs()

00000000: 3c6d 6574 6120 6368 6172 7365 743d 7669  <meta charset=vi
00000010: 7363 6969 3e3c 7461 626c 653e 3c62 3c3e  scii><table><b<>
00000020: 003c 6c69 7374 696e 673e 3c74 6162 6c65  .<listing><table
00000030: 3e3c 7468 3e30 3030 3030 3030 0a30 3030  ><th>0000000.000
00000040: 3030 3030 3030 3030 3030 3030 3030 0430  00000000000000.0
00000050: 3030 3030 3030 3020 3030 3030 3082 3030  0000000 00000.00
00000060: ffff e530 3030 3030 3c74 643e 303c 7461  ...00000<td>0<ta
00000070: 626c 653e 3c74 643e 303c 7072 653e 3030  ble><td>0<pre>00
00000080: 3030 3030 3030 303c 6973 696e 6465 783e  0000000<isindex>
00000090: 3030 3030 3030 3002 3030 3030 3030 3c74  0000000.000000<t
000000a0: 643e 303c 7461 626c 653e 303c 7461 626c  d>0<table>0<tabl
000000b0: 653e 3d30 3030 3030 3030 3030 3030 3030  e>=0000000000000
000000c0: 3030 3c2f 696e 7465 726e 616c 3e30 3030  00</internal>000
000000d0: 3030 3030 3030 3030 3030 3030 3c70 3e30  000000000000<p>0

how to reproduce

  1. build w3m with AddressSanitizer (-fsanitize=address)
  2. w3m.asan -T text/html -dump file

Asan output

==3207532==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000aef600 at pc 0x000000735d1b bp 0x7fff1be69eb0 sp 0x7fff1be69ea8
READ of size 2 at 0x000000aef600 thread T0
    #0 0x735d1a in wc_any_to_ucs /home/kcwu/w3m/libwc/ucs.c:281:15
    #1 0x74dc71 in wc_push_to_utf8 /home/kcwu/w3m/libwc/utf8.c:276:14
    #2 0x6fdcbd in wc_conv_to_ces /home/kcwu/w3m/libwc/conv.c:93:6
    #3 0x6fcf70 in wc_Str_conv /home/kcwu/w3m/libwc/conv.c:23:9
    #4 0x5a90ee in _saveBuffer /home/kcwu/w3m/file.c:7654:8
    #5 0x5a8e2a in saveBuffer /home/kcwu/w3m/file.c:7672:5
    #6 0x4fdca8 in do_dump /home/kcwu/w3m/main.c:1360:2
    #7 0x4f9c7a in main /home/kcwu/w3m/main.c:1066:6
    #8 0x7ffb4e428f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #9 0x41bfc5 in _start (/home/kcwu/w3m/w3m.asan+0x41bfc5)

0x000000aef600 is located 32 bytes to the left of global variable 'vps1_ucs_map' defined in './map/vps_ucs.map:3:18' (0xaef620) of size 256
0x000000aef600 is located 0 bytes to the right of global variable 'viscii112_ucs_map' defined in './map/viscii11_ucs.map:22:18' (0xaef5c0) of size 64
SUMMARY: AddressSanitizer: global-buffer-overflow /home/kcwu/w3m/libwc/ucs.c:281:15 in wc_any_to_ucs
Shadow bytes around the buggy address:
  0x000080155e70: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
  0x000080155e80: f9 f9 f9 f9 00 00 00 00 00 00 00 00 f9 f9 f9 f9
  0x000080155e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080155ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080155eb0: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
=>0x000080155ec0:[f9]f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080155ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080155ee0: 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x000080155ef0: 00 00 00 00 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x000080155f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080155f10: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3207532==ABORTING

ASAN_OPTIONS=abort_on_error=1 gdb --args w3m.asan -T text/html -dump file

Program received signal SIGABRT, Aborted.
0x00007ffff642ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) frame 6
#6  0x0000000000735d1b in wc_any_to_ucs (cc=...) at ucs.c:281
281         cc.code = map[cc.code];
(gdb) p cc.code
$1 = 32

map is viscii112_ucs_map, which has only 32 entries, so index 32 is one past the end.
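
A range check on the code before indexing the map would turn the overflow into an error value. This is a sketch under assumptions: map_lookup and the UCS_ERROR sentinel are hypothetical, and the real libwc code may signal errors differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UCS_ERROR 0xFFFFu   /* hypothetical "no mapping" sentinel */

/* Look up code in a map of n_entries 16-bit values, refusing any
 * index at or beyond the end of the table. */
uint16_t map_lookup(const uint16_t *map, size_t n_entries, uint32_t code)
{
    assert(map != NULL);
    if (code >= n_entries)
        return UCS_ERROR;
    return map[code];
}
```

With a 32-entry table, a lookup for code 32 returns the sentinel instead of reading past the array.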

This is found by afl-fuzz.

heap-buffer-overflow read in HTMLlineproc0()

input (xxd cases/tats-w3m-59)

00000000: 3c6d 6574 6120 6368 6172 7365 743d 6762  <meta charset=gb
00000010: 3138 3033 303e 0a80                      18030>..

how to reproduce:

ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 LD_LIBRARY_PATH=./notgc ./w3m-tats.asan -T text/html -dump cases/tats-w3m-59

stderr:

=================================================================
==3694383==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000d0b0 at pc 0x000000592ab5 bp 0x7ffcaf2930b0 sp 0x7ffcaf2930a8
READ of size 1 at 0x60300000d0b0 thread T0
    #0 0x592ab4 in HTMLlineproc0 /targets/w3m-tats/file.c:6498:9
    #1 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #2 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #3 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #4 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #5 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #6 0x7f09caf77f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #7 0x41bf25 in _start (/w3m-tats.asan+0x41bf25)

0x60300000d0b0 is located 0 bytes to the right of 32-byte region [0x60300000d090,0x60300000d0b0)
allocated by thread T0 here:
    #0 0x4c6288 in __interceptor_malloc (/w3m-tats.asan+0x4c6288)
    #1 0x7f09cc6c5c21 in GC_malloc_atomic /notgc/notgc.c:275
    #2 0x590275 in HTMLlineproc0 /targets/w3m-tats/file.c:6320:14
    #3 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #4 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #5 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #6 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #7 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #8 0x7f09caf77f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-buffer-overflow /targets/w3m-tats/file.c:6498:9 in HTMLlineproc0
Shadow bytes around the buggy address:
  0x0c067fff99c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c067fff9a10: fa fa 00 00 00 00[fa]fa 00 00 01 fa fa fa 00 00
  0x0c067fff9a20: 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a30: 00 00 00 00 fa fa 00 00 00 fa fa fa 00 00 00 00
  0x0c067fff9a40: fa fa fd fd fd fd fa fa 00 00 00 00 fa fa fd fd
  0x0c067fff9a50: fd fd fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a60: 00 00 00 fa fa fa 00 00 00 00 fa fa 00 00 02 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3694383==ABORTING

This was detected with the help of a dummy libgc wrapper. See http://github.com/kcwu/fuzzing-w3m/notgc for details.
For more detail on how to reproduce, please see http://github.com/kcwu/fuzzing-w3m

For your convenience, the gdb command line:
LD_LIBRARY_PATH=./notgc ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 gdb --args ./w3m-tats.asan -T text/html -dump cases/tats-w3m-59

This is found by afl-fuzz.

heap buffer overflow and bad pointer deref in conv_symbol()

input

00000000: 3c74 6162 6c65 3e1b 3c61 3e30 3c74 6578  <table>.<a>0<tex
00000010: 7461 7265 6100 636f 6c73 3d32 3030 3e3c  tarea.cols=200><
00000020: 713c 7461 626c 653e 303c 7020 3d3e 303c  q<table>0<p =>0<
00000030: 6852 3e30 3c70 203d 3e30 3c70 0d3e 303c  hR>0<p =>0<p.>0<
00000040: 703e 303c 6852 3e30 3c70 3e30 3c68 523e  p>0<hR>0<p>0<hR>
00000050: 303c 7464 3e30 3c68 5220 616c 6967 6e3d  0<td>0<hR align=
00000060: 6d69 6464 6c65 3e30 3030 3030 1e30 3030  middle>00000.000
00000070: 3030 e430 3030 30ff 3030 3030 30a5 3030  00.0000.00000.00
00000080: 3030 303c 303c 3030 3030 3d30 3030 3030  000<0<0000=00000
00000090: 2f30 3030 3030 ff30 3030 3030            /00000.00000

gdb --args w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/x86_64/strlen.S:106
106     ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x0000000000479ffe in Strcat_charp (x=0x7f29a0, y=0x500000004 <error: Cannot access memory at address 0x500000004>) at Str.c:217
#2  0x000000000042ea11 in conv_symbol (l=0x7d6c60) at file.c:7614
#3  0x000000000042eb4c in _saveBuffer (buf=0x7d4e00, l=0x7d6c60, f=0x7ffff6ffe400 <_IO_2_1_stdout_>, cont=0) at file.c:7647
#4  0x000000000042eca3 in saveBuffer (buf=0x7d4e00, f=0x7ffff6ffe400 <_IO_2_1_stdout_>, cont=0) at file.c:7668
#5  0x0000000000408002 in do_dump (buf=0x7d4e00) at main.c:1360
#6  0x0000000000407433 in main (argc=5, argv=0x7fffffffcde8, envp=0x7fffffffce18) at main.c:1066
(gdb) frame 2
#2  0x000000000042ea11 in conv_symbol (l=0x7d6c60) at file.c:7614
7614                Strcat_charp(tmp, symbol[(int)c]);
(gdb) p symbol
$1 = (char **) 0x7ccd80
(gdb) p c
$2 = 81 'Q'

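symbol is allocated in update_utf8_symbol() and has only 46 entries, so symbol[c] with c = 81 is a heap buffer overflow read. The out-of-bounds value is then passed to Strcat_charp() as a string pointer, which is the bad pointer dereference seen in strlen().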
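
A small sketch of a checked lookup for a table indexed by a char, assuming the caller knows how many entries the table has. demo_symbol_or_fallback and its parameters are hypothetical names for illustration, not w3m functions. Converting through unsigned char also keeps bytes above 0x7f from turning into negative indexes on platforms where char is signed.

```c
#include <stddef.h>

/* Return symbol[c] if c is a valid index into a table of n entries,
 * otherwise return `fallback`. */
static const char *demo_symbol_or_fallback(const char *const *symbol, size_t n,
                                           char c, const char *fallback)
{
    unsigned char idx = (unsigned char)c;   /* never negative */

    if (symbol == NULL || idx >= n || symbol[idx] == NULL)
        return fallback;
    return symbol[idx];
}
```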

This is found by afl-fuzz.

unable to compile

Steps:

  1. wget https://github.com/tats/w3m/archive/debian/0.5.3-33.tar.gz
  2. tar xzf 0.5.3-33.tar.gz
  3. cd w3m-debian-0.5.3-33/
  4. ./configure && make -j1 V=1
gcc  -I. -I. -g -O2 -I./libwc   -DHAVE_CONFIG_H -DAUXBIN_DIR=\"/usr/local/libexec/w3m\" -DCGIBIN_DIR=\"/usr/local/libexec/w3m/cgi-bin\" -DHELP_DIR=\"/usr/local/share/w3m\" -DETC_DIR=\"/usr/local/etc\" -DCONF_DIR=\"/usr/local/etc/w3m\" -DRC_DIR=\"~/.w3m\" -DLOCALEDIR=\"/usr/local/share/locale\"   -c -o main.o main.c
In file included from html.h:10:0,
                 from fm.h:39,
                 from main.c:3:
istream.h:23:8: error: redefinition of 'struct file_handle'
 struct file_handle {
        ^
In file included from /usr/include/bits/fcntl.h:61:0,
                 from /usr/include/fcntl.h:35,
                 from istream.h:14,
                 from html.h:10,
                 from fm.h:39,
                 from main.c:3:
/usr/include/bits/fcntl-linux.h:333:8: note: originally defined here
 struct file_handle
        ^
main.c: In function 'main':
main.c:836:23: error: void value not ignored as it ought to be
     orig_GC_warn_proc = GC_set_warn_proc(wrap_GC_warn_proc);
                       ^
main.c: In function 'getChar':
main.c:2264:37: warning: passing argument 1 of 'wtf_parse1' from incompatible pointer type
     return wc_any_to_ucs(wtf_parse1(&p));
                                     ^
In file included from fm.h:44:0,
                 from main.c:3:
./libwc/wtf.h:71:19: note: expected 'wc_uchar **' but argument is of type 'char **'
 extern wc_wchar_t wtf_parse1(wc_uchar **p);
                   ^
main.c: In function 'execsh':
main.c:2091:2: warning: ignoring return value of 'system', declared with attribute warn_unused_result [-Wunused-result]
  system(cmd);
  ^
main.c: In function 'handleMailto':
main.c:2953:5: warning: ignoring return value of 'system', declared with attribute warn_unused_result [-Wunused-result]
     system(myExtCommand(Mailer, shell_quote(file_unquote(to->ptr)),
     ^
main.c: In function 'editBf':
main.c:2638:5: warning: ignoring return value of 'system', declared with attribute warn_unused_result [-Wunused-result]
     system(cmd->ptr);
     ^
main.c: In function 'editScr':
main.c:2661:5: warning: ignoring return value of 'system', declared with attribute warn_unused_result [-Wunused-result]
     system(myEditor(Editor, shell_quote(tmpf),
     ^
<builtin>: recipe for target 'main.o' failed
make: *** [main.o] Error 1

crash after allocating a string of negative size

input

00000000: 3c74 6162 6c65 3e3c 7461 626c 6520 6365  <table><table ce
00000010: 6c6c 7061 6464 696e 673d 3636 3036 3030  llpadding=660600
00000020: 3030 3030 3e30 3c74 643e 3c74 6578 7461  0000>0<td><texta
00000030: 7265 6120 726f 7773 3d34 3e3c 2f74 6162  rea rows=4></tab
00000040: 6c65 3c74 643e 3030                      le<td>00

crash location

Program received signal SIGSEGV, Segmentation fault.
0x000000000047967f in Strnew_size (n=-58) at Str.c:53
53  x->ptr[0] = '\0';
(gdb) p x->ptr
$1 = 0x0
(gdb) up
#1  0x000000000041e244 in flushline (h_env=0x7fff1b76ea60, obuf=0x7fff1b76ebf0, indent=0, force=0, width=-78) at file.c:2829
2829  o.line = Strnew_size(width + 20);
(gdb) bt
#0  0x000000000047967f in Strnew_size (n=-58) at Str.c:53
#1  0x000000000041e244 in flushline (h_env=0x7fff1b76ea60, obuf=0x7fff1b76ebf0, indent=0, force=0, width=-78) at file.c:2829
#2  0x000000000042c300 in HTMLlineproc0 (line=0x10ce1da "", h_env=0x7fff1b76ea60, internal=1) at file.c:6636
#3  0x0000000000442790 in do_refill (tbl=0x10ca2d0, row=0, col=1, maxlimit=-78) at table.c:798
#4  0x000000000044667b in renderTable (t=0x10ca2d0, max_width=23, h_env=0x7fff1b76f150) at table.c:1804
#5  0x0000000000445ea7 in renderCoTable (tbl=0x10c7e10, maxlimit=79) at table.c:1653
#6  0x00000000004465e8 in renderTable (t=0x10c7e10, max_width=78, h_env=0x7fff1b76f950) at table.c:1797
#7  0x000000000042b826 in HTMLlineproc0 (line=0x495779 "", h_env=0x7fff1b76f950, internal=1) at file.c:6444
#8  0x000000000042d3ec in completeHTMLstream (h_env=0x7fff1b76f950, obuf=0x7fff1b76fae0) at file.c:7013
#9  0x000000000042ddf7 in loadHTMLstream (f=0x7fff1b76ff70, newBuf=0x10c7770, src=0x0, internal=0) at file.c:7245
#10 0x000000000042c7db in loadHTMLBuffer (f=0x7fff1b76ff70, newBuf=0x10c7770) at file.c:6773
#11 0x0000000000416951 in loadSomething (f=0x7fff1b76ff70, loadproc=0x42c6c1 <loadHTMLBuffer>, defaultbuf=0x10c7770) at file.c:224
#12 0x000000000041c7c3 in loadGeneralFile (path=0x10c5160 "min/2", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#13 0x0000000000406fe2 in main (argc=5, argv=0x7fff1b770298, envp=0x7fff1b7702c8) at main.c:1020

With further debugging, I found that the value -78 comes from the result of LUsolve(), called at line 1754 of table.c in renderTable():

1754             LUsolve(mat, pivot, t->vector, newwidth);
(gdb) p newwidth->ve[0]
$1 = -78.050371113549431
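
One way to keep such a solver result from reaching Strnew_size() as a negative size is to clamp it when it is converted to an integer width. This is only a sketch under that assumption: demo_clamp_width and its limits are hypothetical, and the real fix may belong elsewhere in the table layout code.

```c
/* Convert a solver result to a usable column width. NaN, negative and
 * too-small values become min_width; huge values are capped at max_width. */
static int demo_clamp_width(double w, int min_width, int max_width)
{
    if (!(w >= (double)min_width))   /* written this way so NaN is caught too */
        return min_width;
    if (w > (double)max_width)
        return max_width;
    return (int)w;                   /* truncates toward zero */
}
```

For the value in the gdb session, demo_clamp_width(-78.050371113549431, 1, 1000) yields 1, so the later width + 20 stays positive.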

This is found by afl-fuzz.

HTMLlineproc0 infinite recursion

input

00000000: 3c74 6162 6c65 3e30 3c6e 6f62 722f 3c3e  <table>0<nobr/<>
00000010: 303c 786d 703e 3c74 6162 6c65 3e30 3030  0<xmp><table>000
00000020: 3030 3030 3030 303c 696e 7075 745f 616c  0000000<input_al
00000030: 7420 626f 7474 6f6d 5f6d 6172 6769 6e3d  t bottom_margin=
00000040: 3930 3030 3030 3e30 3030 3030 3030 3030  900000>000000000
00000050: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000060: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000070: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000080: 3030 3030 3030 3030 3030 3030 30         0000000000000

gdb --args w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78800fe in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
(gdb) bt 50
#0  0x00007ffff78800fe in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#1  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#5  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#6  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#7  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#8  0x00007ffff7880116 in GC_clear_stack_inner () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#9  0x00007ffff787cdcc in GC_generic_malloc_many () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#10 0x00007ffff7885b89 in GC_malloc_atomic () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#11 0x0000000000479fa7 in Strgrow (x=0xc37380) at Str.c:239
#12 0x0000000000439530 in read_token (buf=0xc37380, instr=0x7fffff802c98, status=0x7fffffffb624, pre=0, append=0) at etc.c:825
#13 0x000000000042b583 in HTMLlineproc0 (line=0xc365a0 '0' <repeats 70 times>, h_env=0x7fffffffb420, internal=1) at file.c:6359
#14 0x000000000042c363 in HTMLlineproc0 (line=0xc36046 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#15 0x000000000042c363 in HTMLlineproc0 (line=0xc368b6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#16 0x000000000042c363 in HTMLlineproc0 (line=0xc35c26 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#17 0x000000000042c363 in HTMLlineproc0 (line=0xc36d66 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#18 0x000000000042c363 in HTMLlineproc0 (line=0xc36a96 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#19 0x000000000042c363 in HTMLlineproc0 (line=0xc367c6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#20 0x000000000042c363 in HTMLlineproc0 (line=0xc364f6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#21 0x000000000042c363 in HTMLlineproc0 (line=0xc36226 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#22 0x000000000042c363 in HTMLlineproc0 (line=0xc36096 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#23 0x000000000042c363 in HTMLlineproc0 (line=0xc36186 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#24 0x000000000042c363 in HTMLlineproc0 (line=0xc36276 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#25 0x000000000042c363 in HTMLlineproc0 (line=0xc36366 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#26 0x000000000042c363 in HTMLlineproc0 (line=0xc36456 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#27 0x000000000042c363 in HTMLlineproc0 (line=0xc36546 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#28 0x000000000042c363 in HTMLlineproc0 (line=0xc36636 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#29 0x000000000042c363 in HTMLlineproc0 (line=0xc36726 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#30 0x000000000042c363 in HTMLlineproc0 (line=0xc36816 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#31 0x000000000042c363 in HTMLlineproc0 (line=0xc36906 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#32 0x000000000042c363 in HTMLlineproc0 (line=0xc369f6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#33 0x000000000042c363 in HTMLlineproc0 (line=0xc36ae6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#34 0x000000000042c363 in HTMLlineproc0 (line=0xc36bd6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#35 0x000000000042c363 in HTMLlineproc0 (line=0xc36cc6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#36 0x000000000042c363 in HTMLlineproc0 (line=0xc36db6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#37 0x000000000042c363 in HTMLlineproc0 (line=0xc36ea6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#38 0x000000000042c363 in HTMLlineproc0 (line=0xc36f96 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#39 0x000000000042c363 in HTMLlineproc0 (line=0xc353b6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#40 0x000000000042c363 in HTMLlineproc0 (line=0xc35686 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#41 0x000000000042c363 in HTMLlineproc0 (line=0xc35ef6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#42 0x000000000042c363 in HTMLlineproc0 (line=0xc35e06 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#43 0x000000000042c363 in HTMLlineproc0 (line=0xc35b36 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#44 0x000000000042c363 in HTMLlineproc0 (line=0xc35866 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#45 0x000000000042c363 in HTMLlineproc0 (line=0xc35596 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#46 0x000000000042c363 in HTMLlineproc0 (line=0xc352c6 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#47 0x000000000042c363 in HTMLlineproc0 (line=0xc35046 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#48 0x000000000042c363 in HTMLlineproc0 (line=0xc35136 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
#49 0x000000000042c363 in HTMLlineproc0 (line=0xc35226 "", h_env=0x7fffffffb420, internal=1) at file.c:6615
(More stack frames follow...)
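
A generic sketch of a recursion depth limit, which turns unbounded self-recursion into an error return before the stack is exhausted. demo_process and DEMO_MAX_DEPTH are hypothetical; this does not model what HTMLlineproc0() does at file.c:6615, only the guard pattern.

```c
/* Arbitrary limit chosen for this sketch. */
#define DEMO_MAX_DEPTH 64

/* Recurse once per remaining nesting level. Returns the depth reached,
 * or -1 if the depth limit was hit first. */
static int demo_process(int remaining_nesting, int depth)
{
    if (depth >= DEMO_MAX_DEPTH)
        return -1;                   /* give up instead of overflowing the stack */
    if (remaining_nesting == 0)
        return depth;
    return demo_process(remaining_nesting - 1, depth + 1);
}
```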

found by afl-fuzz

null pointer dereference in HTMLlineproc2body

input

00000000: 3c74 6162 6c65 3e3c 623c 3e00 3c6c 6973  <table><b<>.<lis
00000010: 7469 6e67 3e3c 7365 6c65 6374 5f69 6e74  ting><select_int
00000020: 2073 656c 6563 746e 756d 6265 723d 303e   selectnumber=0>
00000030: 3c2f 7365 6c65 6374 5f69 6e74 3e         </select_int>

gdb log

$ gdb --args w3m -T text/html -dump file
Program received signal SIGSEGV, Segmentation fault.
0x000000000042a5c4 in HTMLlineproc2body (buf=0x774740, feed=0x42817d <textlist_feed>, llimit=-1) at file.c:6061
6061                                (FormItemList *)a_select[n_select]->url;
(gdb) p a_select
$1 = (Anchor **) 0x0
(gdb) p n_select
$2 = 0
(gdb) bt
#0  0x000000000042a5c4 in HTMLlineproc2body (buf=0x774740, feed=0x42817d <textlist_feed>, llimit=-1) at file.c:6061
#1  0x000000000042adb0 in HTMLlineproc2 (buf=0x774740, tl=0x774c00) at file.c:6191
#2  0x000000000042dfb2 in loadHTMLstream (f=0x7fffffffcb00, newBuf=0x774740, src=0x0, internal=0) at file.c:7276
#3  0x000000000042c7db in loadHTMLBuffer (f=0x7fffffffcb00, newBuf=0x774740) at file.c:6773
#4  0x0000000000416951 in loadSomething (f=0x7fffffffcb00, loadproc=0x42c6c1 <loadHTMLBuffer>, defaultbuf=0x774740) at file.c:224
#5  0x000000000041c7c3 in loadGeneralFile (path=0x772130 "min/1", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0,
    request=0x0) at file.c:2241
#6  0x0000000000406fe2 in main (argc=5, argv=0x7fffffffce28, envp=0x7fffffffce58) at main.c:1020

This is found by afl-fuzz.

write access violation with '<button type=radio>'

$ echo '<button type=radio>' |  ./w3m -T text/html -dump
Program received signal SIGSEGV, Segmentation fault.
0x000000000044f19b in formUpdateBuffer (a=0x7d9000, buf=0x7cee00, form=0x7d8f80) at form.c:444
444                 buf->currentLine->lineBuf[spos] = ' ';
(gdb) p buf->currentLine->lineBuf
$1 = 0x495252 ""
(gdb) p spos
$2 = 0
(gdb) info files
....skip some lines...
       0x0000000000490b40 - 0x00000000004a0612 is .rodata

lineBuf (0x495252) points into the .rodata section, so the write at form.c:444 targets read-only memory and crashes.
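
A sketch of a checked write, assuming each line records how many bytes of its buffer are writable. struct demo_line and demo_put_char are hypothetical and much simpler than w3m's Line. An empty line that points at a shared "" literal has length 0, so every write to it is refused.

```c
#include <stddef.h>

/* Hypothetical, simplified line: a buffer plus its writable length. */
struct demo_line {
    char *buf;
    size_t len;
};

/* Store ch at position pos only if pos lies inside the line.
 * Returns 1 if the byte was written, 0 otherwise. */
static int demo_put_char(struct demo_line *l, size_t pos, char ch)
{
    if (l == NULL || l->buf == NULL || pos >= l->len)
        return 0;
    l->buf[pos] = ch;
    return 1;
}
```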

This is found by afl-fuzz

global-buffer-overflow write in formUpdateBuffer

input

00000000: 303c 6275 7474 6f6e 2076 616c 7565 3d27  0<button value='
00000010: 223e 3c69 6e74 6572 6e61 6c3e 273e 3c41  "><internal>'><A
00000020: 2068 7265 663d 3e3c 6832 3e3c 696e 7075   href=><h2><inpu
00000030: 7420 7479 7065 3d22 7261 6469 6f22 3e    t type="radio">

How to reproduce

Build with ASan:
export CC=clang-3.6
export CFLAGS='-g -O0 -fsanitize=address'
export ASAN_OPTIONS='abort_on_error=1:detect_leaks=0'
./configure --enable-image=no
make clean all

./w3m -T text/html -dump input

AddressSanitizer output

==2826401==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000a0eaa6 at pc 0x00000075f03c bp 0x7ffd06f92990 sp 0x7ffd06f92988
WRITE of size 1 at 0x000000a0eaa6 thread T0
    #0 0x75f03b in formUpdateBuffer /fuzz/w3m/form.c:448:6
    #1 0x7631ba in formResetBuffer /fuzz/w3m/form.c:272:2
    #2 0x5a4482 in loadHTMLBuffer /fuzz/w3m/file.c:6779:2
    #3 0x5aa4e0 in loadSomething /fuzz/w3m/file.c:224:16
    #4 0x595063 in loadGeneralFile /fuzz/w3m/file.c:2241:6
    #5 0x4f8bf9 in main /fuzz/w3m/main.c:1020:12
    #6 0x7f7afc164f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #7 0x447316 in _start (/fuzz/w3m/w3m.asan+0x447316)

0x000000a0eaa6 is located 58 bytes to the left of global variable '<string literal>' defined in 'buffer.c:62:21' (0xa0eae0) of size 7
  '<string literal>' is ascii string '*Null*'
0x000000a0eaa6 is located 5 bytes to the right of global variable '<string literal>' defined in 'buffer.c:18:18' (0xa0eaa0) of size 1
  '<string literal>' is ascii string ''
SUMMARY: AddressSanitizer: global-buffer-overflow /fuzz/w3m/form.c:448 formUpdateBuffer
Shadow bytes around the buggy address:
  0x000080139d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080139d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080139d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080139d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080139d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000080139d50: 00 00 00 00[01]f9 f9 f9 f9 f9 f9 f9 07 f9 f9 f9
  0x000080139d60: f9 f9 f9 f9 07 f9 f9 f9 f9 f9 f9 f9 00 00 03 f9
  0x000080139d70: f9 f9 f9 f9 00 00 02 f9 f9 f9 f9 f9 02 f9 f9 f9
  0x000080139d80: f9 f9 f9 f9 02 f9 f9 f9 f9 f9 f9 f9 02 f9 f9 f9
  0x000080139d90: f9 f9 f9 f9 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9
  0x000080139da0: 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 04
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2826401==ABORTING

gdb output

Program received signal SIGSEGV, Segmentation fault.
0x000000000044f4d9 in formUpdateBuffer (a=0x7e00d8, buf=0x7d3e00, form=0x7dfe00) at form.c:448
448                 buf->currentLine->lineBuf[spos] = ' ';
(gdb) p buf->currentLine
$1 = (Line *) 0x7d5de0
(gdb) p buf->currentLine->lineBuf
$2 = 0x495682 ""
(gdb) p spos
$3 = 6

found by afl-fuzz

NULL pointer dereference in Strclear (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==16911==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x0000007160f7 bp 0x7ffcd3c12640 sp 0x7ffcd3c12640 T0)
==16911==The signal is caused by a WRITE memory access.
==16911==Hint: address points to the zero page.
    #0 0x7160f6 in Strclear /tmp/w3m-0.5.3-git20161120/Str.c:118:15
    #1 0x5ece4d in read_token /tmp/w3m-0.5.3-git20161120/etc.c:804:2
    #2 0x5ad480 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6345:3
    #3 0x62e6b5 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1872:5
    #4 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #5 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #6 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #7 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #8 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #9 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #10 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #11 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #12 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #13 0x7fdd2498161f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #14 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/w3m-0.5.3-git20161120/Str.c:118:15 in Strclear
==16911==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00084-w3m-nullptr-Strclear

MSan may incorrectly report buffer overflow due to undefined behavior

I cannot recall the exact case that confused MSan, but it was something like this:

short bar[N];
short foo[len];  // len=0;

MSan may report a buffer overflow when bar is accessed.

Build w3m with clang -fsanitize=undefined

case1

00000000: 3c74 6162 6c65 3e30 3c74 643e            <table>0<td>
$ w3m.ubsan -T text/html -dump case1
table.c:1222:18: runtime error: variable length array bound evaluates to non-positive value 0

case2

00000000: 3c74 6162 6c65 3e                        <table>
$ w3m.ubsan -T text/html -dump case2
table.c:1574:20: runtime error: variable length array bound evaluates to non-positive value 0

I don't know whether these are security related. But since they are undefined behavior and may confuse MSan, I think they are worth fixing.
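
A minimal sketch of one way to avoid a zero-length variable length array: make the bound at least 1 and keep using n for the loop. demo_sum_copy is a hypothetical function, not the code at table.c:1222 or table.c:1574.

```c
/* Copy n values into a temporary array and sum them. The VLA bound is
 * forced to at least 1, so n == 0 no longer declares a zero-length array. */
static long demo_sum_copy(const short *src, int n)
{
    short tmp[n > 0 ? n : 1];
    long sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        tmp[i] = src[i];
        sum += tmp[i];
    }
    return sum;
}
```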

This is found by afl-fuzz.

'table cellpadding' consumes lots of memory

echo '<table cellpadding=1070000000>0</table>' | ./w3m -T text/html -dump

This needs more than 1GB of memory. Although w3m does finish rendering, I wonder whether it really needs to allocate that much memory. Maybe the number should be truncated instead?
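
A sketch of what truncating the number could look like, assuming the attribute is parsed from a decimal string. demo_parse_capped and DEMO_MAX_CELLPADDING are hypothetical, and the cap of 1000 is an arbitrary value chosen for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Arbitrary upper bound for this sketch. */
#define DEMO_MAX_CELLPADDING 1000

/* Parse a non-negative decimal attribute value and cap it.
 * Unparsable or negative input yields 0; anything above cap,
 * including values too large for long, yields cap. */
static int demo_parse_capped(const char *s, int cap)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || v < 0)
        return 0;
    if (errno == ERANGE || v > cap)
        return cap;
    return (int)v;
}
```

Both values from this report, 1070000000 and 40700000000, would be reduced to the cap.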

echo '<table cellpadding=40700000000 border>0' | ./w3m -T text/html -dump

This needs more than 4GB and fails with the following message:

GC Warning: Out of Memory! Heap size: 4539 MiB. Returning NULL!
Out of memory: 18446744073314787007 bytes unavailable!

18446744073314787007 looks like the result of an integer overflow?
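
The arithmetic supports that reading: 18446744073314787007 equals 2^64 - 394764609, which is what a size of -394764609 becomes when converted to an unsigned 64-bit type. The sketch below only checks that arithmetic; where the negative value arises inside w3m is not established here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Distance from n up to 2^64, i.e. the magnitude n would have if it
 * were a negative number stored in an unsigned 64-bit variable. */
static uint64_t demo_negated_magnitude(uint64_t n)
{
    return (uint64_t)0 - n;          /* unsigned arithmetic wraps modulo 2^64 */
}

/* What a negative signed size turns into once it is treated as unsigned. */
static uint64_t demo_as_unsigned(int64_t n)
{
    return (uint64_t)n;              /* well-defined modular conversion */
}
```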

My machine is 64bit ubuntu 14.04. w3m was compiled with default gcc.

This is found with afl-fuzz.

infinite recursion with nested table and textarea

$ xxd infinite-recursion
00000000: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000010: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000020: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 30e0  000000000000000.
00000050: 3030 3030 3030 3030 3c2f 3e30 3030 3030  00000000</>00000
00000060: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000070: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000080: 3030 3030 3030 3030 3030 303c 7461 626c  00000000000<tabl
00000090: 653e 3c74 6578 7461 7265 613e 3c74 6162  e><textarea><tab
000000a0: 6c65 2030 3030 303c 7465 7874 6172 6561  le 0000<textarea
000000b0: 3e3c 7461 626c 6520 3030 3030 3030 3030  ><table 00000000
000000c0: 3030 3030 3030 3030 303c 7465 7874 6172  000000000<textar
000000d0: 6561 3e3c 7461 626c 6520 303c 7465 7874  ea><table 0<text
000000e0: 6172 6561 3e3c 7461 626c 6520 3030 3030  area><table 0000
000000f0: 3030 3030 3030 303c 7465 7874 6172 6561  0000000<textarea
00000100: 3e3c 7461 626c 6520 3030 3030 3030 3030  ><table 00000000
00000110: 3030 3c74 6578 7461 7265 613e 3c74 6162  00<textarea><tab
00000120: 6c65 2030 3030 3030 3030 3030 3030 3c74  le 00000000000<t
00000130: 6578 7461 7265 613e 3c74 6162 6c65 3e3c  extarea><table><
00000140: 2f74 6162 6c65 3e                        /table>
$ gdb --args ./w3m -T text/html -dump infinite-recursion
Program received signal SIGSEGV, Segmentation fault.
0x0000000000445ad2 in renderCoTable (tbl=<error reading variable: Cannot access memory at address 0x7fffff7fedf8>, maxlimit=<error reading variable: Cannot access memory at address 0x7fffff7fedf4>) at table.c:1629
1629    {
(gdb) up
#1  0x000000000044643e in renderTable (t=0x7bf000, max_width=79, h_env=0x7fffff7ff500) at table.c:1794
1794        renderCoTable(t, h_env->limit);
(gdb)
#2  0x0000000000445d13 in renderCoTable (tbl=0x7bf000, maxlimit=79) at table.c:1654
1654            renderTable(t, maxwidth, &h_env);
(gdb)
#3  0x000000000044643e in renderTable (t=0x7bf000, max_width=79, h_env=0x7fffff7ffbe0) at table.c:1794
1794        renderCoTable(t, h_env->limit);

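I found that w3m had earlier called pushTable (tbl=0x7bf000, tbl1=0x7bf000), i.e. with the same table passed as both arguments, which explains why renderTable() and renderCoTable() keep calling each other on t=0x7bf000.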
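
A sketch of a guard that refuses to register a table as its own child. struct demo_table and demo_push_table are hypothetical, simplified stand-ins for illustration; w3m's real pushTable() and table structure are different. This check only covers direct self-reference, not longer cycles.

```c
#include <stddef.h>

#define DEMO_MAX_CHILDREN 8

/* Hypothetical, simplified table with a fixed-size child list. */
struct demo_table {
    struct demo_table *children[DEMO_MAX_CHILDREN];
    size_t n_children;
};

/* Add child to parent. Returns 1 on success and 0 if the request is
 * rejected (NULL argument, self-reference, or child list full). */
static int demo_push_table(struct demo_table *parent, struct demo_table *child)
{
    if (parent == NULL || child == NULL || parent == child)
        return 0;
    if (parent->n_children >= DEMO_MAX_CHILDREN)
        return 0;
    parent->children[parent->n_children++] = child;
    return 1;
}
```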

This is found by afl-fuzz

DICT server dictionary lookup [script attached]

Hello Tats,

I noticed that this package included a cgi-bin script for querying 'dictionary.goo.ne.jp', so I thought I'd offer a second script that queries any DICT server and can prioritize any DICT database.

#!/bin/sh
# w3mdict.cgi - A dictd dictionary query cgi for w3m
#
# REQUIREMENTS:
# + dict client software
# + an address of a dict server, for variable ${DICT_SERVER}
# + a name of a favorite database on that server, for variable
#   ${FAVORITE_DATABASE}
# OPTIONALLY:
# + locally install a dict server (eg. dictd) and a collection
#   of dict databases (eg. wordnet, aka "wn")
# INSTALLATION:
#1] Copy this file to your ~/.w3m/cgi-bin folder
#2] Mark it executable (ie chmod +x ~/.w3m/cgi-bin/w3mdict.cgi)
#3] From inside w3m, type 'O' for the options page, and for the
#   setting "URL of dictionary lookup command",
#   enter "file:/cgi-bin/w3mdict.cgi"
DICT_SERVER="localhost"
FAVORITE_DATABASE="wn"
RETURN_MESSAGE="\n\nPress 'B' to return to the previous page."
printf "Content-type: text/plain\n"
type dict \
|| {
  # Originally, we inconsiderately failed silently ...
  #     printf "W3m-control: BACK\n\n"
  printf "\n\nERROR: dict client software not found${RETURN_MESSAGE}"
  exit
  }
# First, we check only our best and favorite database. This is most
# likely to give us a best definition, and avoids displaying a long and
# cluttered page with entries from many databases.
dict --host "${DICT_SERVER}" \
     --database "${FAVORITE_DATABASE}" \
     "${QUERY_STRING}" 2>&1 \
&& {
  printf "${RETURN_MESSAGE}"
  } \
|| {
  # The initial attempt failed, so let's search ALL databases
  # available on the server.
  dict --host "${DICT_SERVER}" \
       "${QUERY_STRING}" 2>&1 \
  && {
    printf "${RETURN_MESSAGE}"
    } \
  || {
    # No definitions were found in any of the server's databases, so
    # let's return to the favorite database in order to retrieve its
    # guess of what we meant to type. Originally, for this case, we
    # pushed the user's default action to be entering another word for
    # a dict definition, so the print command was:
    #     printf "W3m-control: DICT_WORD\n\n"
    # Now, we need only print a blank line to separate the cgi header
    # from the page content.
    printf "\n"
    dict --host "${DICT_SERVER}" \
         --database "${FAVORITE_DATABASE}" \
         "${QUERY_STRING}" 2>&1
    printf "${RETURN_MESSAGE}"
    }
  }

null pointer deref due to bad form id in HTMLlineproc2body()

input

00000000: 303c 6275 7474 6f6e 2076 616c 7565 3d27  0<button value='
00000010: 223e 3c69 6e70 7574 5f61 6c74 2066 6964  "><input_alt fid
00000020: 3d36 3e3c 666f 726d 5f69 6e74 2066 6964  =6><form_int fid
00000030: 3d36 3027 3e30 3030 3030 3030 3030 3030  =60'>00000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000050: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000060: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000070: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000080: 3030 3030 30                             00000

gdb --args w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
0x0000000000429ca0 in HTMLlineproc2body (buf=0x7d4e00, feed=0x428395 <textlist_feed>, llimit=-1) at file.c:5862
5862                            if (!form->target)
(gdb) p form
$1 = (FormList *) 0x0
(gdb) bt
#0  0x0000000000429ca0 in HTMLlineproc2body (buf=0x7d4e00, feed=0x428395 <textlist_feed>, llimit=-1) at file.c:5862
#1  0x000000000042afd6 in HTMLlineproc2 (buf=0x7d4e00, tl=0x7cd280) at file.c:6197
#2  0x000000000042e1f4 in loadHTMLstream (f=0x7fffffffcac0, newBuf=0x7d4e00, src=0x0, internal=0) at file.c:7285
#3  0x000000000042ca01 in loadHTMLBuffer (f=0x7fffffffcac0, newBuf=0x7d4e00) at file.c:6779
#4  0x0000000000416b3f in loadSomething (f=0x7fffffffcac0, loadproc=0x42c8e7 <loadHTMLBuffer>, defaultbuf=0x7d4e00) at file.c:224
#5  0x000000000041c9b1 in loadGeneralFile (path=0x7c4b00 "min/10", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2241
#6  0x00000000004071c1 in main (argc=5, argv=0x7fffffffcde8, envp=0x7fffffffce18) at main.c:1020

I found that form is obtained from forms[form_id] earlier:

5836                            if (form_id < 0 || form_id > form_max || forms == NULL)
5837                                break;      /* outside of <form>..</form> */
5838                            form = forms[form_id];
(gdb) p form_id
$3 = 6
(gdb) p form_max
$4 = 60

The value of form_id is validated against form_max, but form_max itself is incorrectly obtained from user input, so the check does not protect the lookup.

This is found by afl-fuzz.
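
To illustrate the point above: the check at line 5836 compares form_id with a limit that the document itself controls. Below is a minimal sketch of a stricter lookup. It assumes a hypothetical forms_len that records how many slots the forms array really has (this is not a variable in w3m), and it uses a stand-in FormList type so that it compiles on its own.

```c
#include <stddef.h>

/* Stand-in for w3m's FormList; only here so the sketch compiles. */
typedef struct FormList { const char *target; } FormList;

/*
 * Hypothetical helper: return forms[form_id] only if the index is inside
 * the allocated array; otherwise return NULL.
 * forms_len is an assumed count of allocated slots, not w3m's form_max.
 */
static FormList *
lookup_form(FormList **forms, int forms_len, int form_id)
{
    if (forms == NULL || form_id < 0 || form_id >= forms_len)
        return NULL;
    return forms[form_id];      /* slot may be empty; caller checks for NULL */
}
```

A caller would still need to test the result for NULL before touching form->target, since an in-range slot can be unpopulated, as in the trace above.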

heap-buffer-overflow read in process_textarea()

input (xxd cases/tats-w3m-58)

00000000: 3c54 4558 5441 5245 4120 636f 6c73 3d3e  <TEXTAREA cols=>

how to reproduce:

ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 LD_LIBRARY_PATH=./notgc ./w3m-tats.asan -T text/html -dump cases/tats-w3m-58

stderr:

=================================================================
==3694326==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000d0ef at pc 0x000000573747 bp 0x7ffd68fc5570 sp 0x7ffd68fc5568
READ of size 1 at 0x60300000d0ef thread T0
    #0 0x573746 in process_textarea /targets/w3m-tats/file.c:4015:6
    #1 0x58cf06 in HTMLtagproc1 /targets/w3m-tats/file.c:5144:8
    #2 0x5927e3 in HTMLlineproc0 /targets/w3m-tats/file.c:6477:10
    #3 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #4 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #5 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #6 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #7 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #8 0x7fe58505ff44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #9 0x41bf25 in _start (/w3m-tats.asan+0x41bf25)

0x60300000d0ef is located 1 bytes to the left of 32-byte region [0x60300000d0f0,0x60300000d110)
allocated by thread T0 here:
    #0 0x4c6288 in __interceptor_malloc (/w3m-tats.asan+0x4c6288)
    #1 0x7fe5867adc21 in GC_malloc_atomic /notgc/notgc.c:275
    #2 0x6dabae in parse_tag /targets/w3m-tats/parsetagx.c:236:20
    #3 0x592754 in HTMLlineproc0 /targets/w3m-tats/file.c:6472:17
    #4 0x5a5a0c in loadHTMLstream /targets/w3m-tats/file.c:7252:2
    #5 0x55bbf8 in loadHTMLBuffer /targets/w3m-tats/file.c:6781:5
    #6 0x55ee64 in loadSomething /targets/w3m-tats/file.c:224:16
    #7 0x5535ac in loadGeneralFile /targets/w3m-tats/file.c:2241:6
    #8 0x4f9202 in main /targets/w3m-tats/main.c:1017:12
    #9 0x7fe58505ff44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-buffer-overflow /targets/w3m-tats/file.c:4015:6 in process_textarea
Shadow bytes around the buggy address:
  0x0c067fff99c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff99f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9a00: 00 00 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00
=>0x0c067fff9a10: fa fa 00 00 00 00 fa fa 00 00 00 00 fa[fa]00 00
  0x0c067fff9a20: 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a30: 00 00 00 00 fa fa 00 00 00 fa fa fa 00 00 00 00
  0x0c067fff9a40: fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa fd fd
  0x0c067fff9a50: fd fd fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x0c067fff9a60: 00 00 00 fa fa fa 00 00 00 00 fa fa 00 00 02 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3694326==ABORTING

This was detected with the help of a dummy libgc wrapper. See http://github.com/kcwu/fuzzing-w3m/notgc for details.
For more details on how to reproduce, see http://github.com/kcwu/fuzzing-w3m

For your convenience,
gdbline:
LD_LIBRARY_PATH=./notgc ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 gdb --args ./w3m-tats.asan -T text/html -dump cases/tats-w3m-58

This is found by afl-fuzz.
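
The read lands one byte before a buffer allocated in parse_tag, which is what indexing at strlen(p) - 1 would produce for the empty value in cols=. That is only an inference from the trace, not something confirmed against the source. Under that assumption, a last-character test that tolerates empty strings could look like this sketch; ends_with_char is a hypothetical helper, not a w3m function.

```c
#include <string.h>

/*
 * Hypothetical helper: report whether s ends with character c without
 * ever reading s[-1]. An empty or NULL string has no last character.
 */
static int
ends_with_char(const char *s, char c)
{
    size_t len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    return len > 0 && s[len - 1] == c;
}
```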

invalid read in Strgrow (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==21767==ERROR: AddressSanitizer: SEGV on unknown address 0x000002fc6000 (pc 0x7fdf411334d8 bp 0x7ffe0cec6220 sp 0x7ffe0cec61e8 T0)
==21767==The signal is caused by a READ memory access.
    #0 0x7fdf411334d7  /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/string/../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1545
    #1 0x716e1f in Strgrow /tmp/w3m-0.5.3-git20161120/Str.c:240:5
    #2 0x587396 in push_spaces /tmp/w3m-0.5.3-git20161120/file.c:2592:2
    #3 0x587396 in fillline /tmp/w3m-0.5.3-git20161120/file.c:2707
    #4 0x5a2494 in HTMLtagproc1 /tmp/w3m-0.5.3-git20161120/file.c:4778:6
    #5 0x5adcef in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6477:10
    #6 0x62e6b5 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1872:5
    #7 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #8 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #9 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #10 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #11 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #12 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #13 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #14 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #15 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #16 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #17 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #18 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #19 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #20 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #21 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #22 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #23 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #24 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #25 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #26 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #27 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #28 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #29 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #30 0x7fdf4101d61f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #31 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/string/../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1545 
==21767==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00085-w3m-invalidread-Strgrow

valgrind found many issues about uninitialised value

libgc produces a lot of noise. I don't know whether those reports are valid, so for now I just disable GC with GC_DONT_GC=1 and ignore memory leaks.

Minimal test case:

$ echo '<i>' | env GC_DONT_GC=1 valgrind ./w3m -T text/html -dump
(skip some lines)
==3705616== Conditional jump or move depends on uninitialised value(s)
==3705616==    at 0x4763F5: parse_tag (parsetagx.c:129)
==3705616==    by 0x42B6FA: HTMLlineproc0 (file.c:6446)
==3705616==    by 0x42DB3A: loadHTMLstream (file.c:7221)
==3705616==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==3705616==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==3705616==    by 0x406BCC: main (main.c:926)
(skip some lines)

There is a different issue for <dd>:

$ echo '<dd>' | env GC_DONT_GC=1 valgrind ./w3m -T text/html -dump
==74559== Conditional jump or move depends on uninitialised value(s)
==74559==    at 0x4247D9: HTMLtagproc1 (file.c:4678)
==74559==    by 0x42B728: HTMLlineproc0 (file.c:6451)
==74559==    by 0x42DB3A: loadHTMLstream (file.c:7221)
==74559==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==74559==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==74559==    by 0x406BCC: main (main.c:926)

Feeding w3m more complex input produces more reports. For example,

$ echo "<table>0<td rowspan=0 colspan=300>a"   | env GC_DONT_GC=1 valgrind ./w3m -T text/html -dump

(only a few examples are pasted here; the actual output is much longer)
==877622== Conditional jump or move depends on uninitialised value(s)
==877622==    at 0x443C54: check_compressible_cell (table.c:1174)
==877622==    by 0x4447EF: check_table_width (table.c:1261)
==877622==    by 0x446068: renderTable (table.c:1769)
==877622==    by 0x42B616: HTMLlineproc0 (file.c:6426)
==877622==    by 0x42D1A7: completeHTMLstream (file.c:6995)
==877622==    by 0x42DBB2: loadHTMLstream (file.c:7227)
==877622==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==877622==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==877622==    by 0x406BCC: main (main.c:926)

==877622== Conditional jump or move depends on uninitialised value(s)
==877622==    at 0x443597: correlation_coefficient (table.c:1054)
==877622==    by 0x44375F: recalc_width (table.c:1085)
==877622==    by 0x443D0C: check_compressible_cell (table.c:1175)
==877622==    by 0x4447EF: check_table_width (table.c:1261)
==877622==    by 0x446068: renderTable (table.c:1769)
==877622==    by 0x42B616: HTMLlineproc0 (file.c:6426)
==877622==    by 0x42D1A7: completeHTMLstream (file.c:6995)
==877622==    by 0x42DBB2: loadHTMLstream (file.c:7227)
==877622==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==877622==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==877622==    by 0x406BCC: main (main.c:926)

==877622== Conditional jump or move depends on uninitialised value(s)
==877622==    at 0x4E5DB08: sqrt (w_sqrt.c:27)
==877622==    by 0x4435B5: correlation_coefficient (table.c:1056)
==877622==    by 0x44375F: recalc_width (table.c:1085)
==877622==    by 0x443D0C: check_compressible_cell (table.c:1175)
==877622==    by 0x4447EF: check_table_width (table.c:1261)
==877622==    by 0x446068: renderTable (table.c:1769)
==877622==    by 0x42B616: HTMLlineproc0 (file.c:6426)
==877622==    by 0x42D1A7: completeHTMLstream (file.c:6995)
==877622==    by 0x42DBB2: loadHTMLstream (file.c:7227)
==877622==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==877622==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==877622==    by 0x406BCC: main (main.c:926)

==877622== Conditional jump or move depends on uninitialised value(s)
==877622==    at 0x4437CD: recalc_width (table.c:1092)
==877622==    by 0x443D0C: check_compressible_cell (table.c:1175)
==877622==    by 0x4447EF: check_table_width (table.c:1261)
==877622==    by 0x446068: renderTable (table.c:1769)
==877622==    by 0x42B616: HTMLlineproc0 (file.c:6426)
==877622==    by 0x42D1A7: completeHTMLstream (file.c:6995)
==877622==    by 0x42DBB2: loadHTMLstream (file.c:7227)
==877622==    by 0x42C596: loadHTMLBuffer (file.c:6755)
==877622==    by 0x42ED37: openGeneralPagerBuffer (file.c:7776)
==877622==    by 0x406BCC: main (main.c:926)

found with help of afl-fuzz

segfault during iso2022 parsing

$ echo -e '<meta charset=gb18030>\n<a target=\x80><meta charset=hz>\n\x1b' | ./w3m -T text/html -dump

Program received signal SIGSEGV, Segmentation fault.
0x0000000000481b3d in wc_push_to_iso2022 (os=0x7cc2c0, cc=..., st=0x7fffffffc7d0) at iso2022.c:408
408             g = cs94_gmap[WC_CCS_INDEX(cc.ccs) - WC_F_ISO_BASE];
(gdb) p cc.ccs
$1 = 256
(gdb) bt
#0  0x0000000000481b3d in wc_push_to_iso2022 (os=0x7cc2c0, cc=..., st=0x7fffffffc7d0) at iso2022.c:408
#1  0x000000000047edb2 in wc_conv_to_ces (is=0x7cc2e0, ces=2099217) at conv.c:93
#2  0x000000000047eacf in wc_Str_conv (is=0x7cc2e0, f_ces=3211264, t_ces=2099217) at conv.c:23
#3  0x000000000047eb32 in wc_Str_conv_strict (is=0x7cc2e0, f_ces=3211264, t_ces=2099217) at conv.c:37
#4  0x000000000042903c in HTMLlineproc2body (buf=0x7cee00, feed=0x427fa1 <textlist_feed>, llimit=-1) at file.c:5664
#5  0x000000000042aba1 in HTMLlineproc2 (buf=0x7cee00, tl=0x7cc5e0) at file.c:6173
#6  0x000000000042dd6e in loadHTMLstream (f=0x7fffffffd120, newBuf=0x7cee00, src=0x0, internal=0) at file.c:7258
#7  0x000000000042c597 in loadHTMLBuffer (f=0x7fffffffd120, newBuf=0x7cee00) at file.c:6755
#8  0x0000000000416a40 in loadSomething (f=0x7fffffffd120, loadproc=0x42c4b2 <loadHTMLBuffer>, defaultbuf=0x7cee00) at file.c:224
#9  0x000000000041c7e6 in loadGeneralFile (path=0x7c3ae0 "/tmp/zshWcHIdH", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
    flag=0, request=0x0) at file.c:2241
#10 0x00000000004070d1 in main (argc=5, argv=0x7fffffffd448, envp=0x7fffffffd478) at main.c:1020

found by afl-fuzz

heap out-of-bounds write due to negative array index

How to reproduce:

$ echo -e '<table><b<>\x00<listing><select_int selectnumber=90000000000>' | ./w3m -T text/html -dump > /dev/null
Segmentation fault
$ echo -e '<table><b<>\x00<listing><select_int selectnumber=-90000>' | ./w3m -T text/html -dump > /dev/null
Segmentation fault

Here, selectnumber can be negative, or a large positive value that overflows to a negative number.

The corresponding code snippet:

6033    if (parsedtag_get_value(tag, ATTR_SELECTNUMBER, &n_select)
6034        && n_select < max_select) {
6035        select_option[n_select].first = NULL;

n_select is the selectnumber mentioned above. It will crash at line 6035.

Similar code pattern at line 6015:

                      if (parsedtag_get_value(tag, ATTR_TEXTAREANUMBER,
                                              &n_textarea)
                          && n_textarea < max_textarea) {
                          textarea_str[n_textarea] = Strnew();

this is found by afl-fuzz
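
Both snippets above test only the upper bound. A minimal sketch of a check that also rejects negative indexes follows; index_in_range is a hypothetical helper name, not a w3m function.

```c
/*
 * Hypothetical helper: nonzero only when idx can safely index an array
 * with count elements. Rejects negative values, including those left
 * behind when a huge decimal string was converted to int and wrapped.
 */
static int
index_in_range(int idx, int count)
{
    return idx >= 0 && idx < count;
}
```

With such a helper, the conditions at lines 6034 and 6015 would become index_in_range(n_select, max_select) and index_in_range(n_textarea, max_textarea) respectively.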

memory exhausted due to repeatedly appending "</table>"

$ echo "<taBle><title><xmp><body><table><textarea><T<t='<v='>" | ./w3m -T text/html -dump > /dev/null
GC Warning: Repeated allocation of very large block (appr. size 25112576):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 107966464):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 268652544):
        May lead to memory leak and poor performance.
GC Warning: Out of Memory! Heap size: 2213 MiB. Returning NULL!
Out of memory: 18446744073314787007 bytes unavailable!

gdb

(gdb) b die_oom
Breakpoint 1 at 0x404e89: file main.c, line 390.
(gdb) r
Breakpoint 1, die_oom (bytes=18446744073314787007) at main.c:390
390         fprintf(stderr, "Out of memory: %lu bytes unavailable!\n", (unsigned long)bytes);
(gdb) bt
#0  die_oom (bytes=18446744073314787007) at main.c:390
#1  0x00007ffff787c489 in GC_core_malloc_atomic () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x0000000000479adc in Strgrow (x=0x7ba660) at Str.c:238
#3  0x00000000004394fc in read_token (buf=0x7ba660, instr=0x7fffffffc198, status=0x7fffffffc504, pre=2048, append=1) at etc.c:858
#4  0x000000000042b1fd in HTMLlineproc0 (line=0x4953e9 "</table>", h_env=0x7fffffffc300, internal=1) at file.c:6336
#5  0x000000000042d330 in completeHTMLstream (h_env=0x7fffffffc300, obuf=0x7fffffffc490) at file.c:7012
#6  0x000000000044268b in do_refill (tbl=0x7bf000, row=0, col=0, maxlimit=79) at table.c:804
#7  0x00000000004464d1 in renderTable (t=0x7bf000, max_width=79, h_env=0x7fffffffcb10) at table.c:1800
#8  0x000000000042b782 in HTMLlineproc0 (line=0x4953f1 "", h_env=0x7fffffffcb10, internal=1) at file.c:6442
#9  0x000000000042d330 in completeHTMLstream (h_env=0x7fffffffcb10, obuf=0x7fffffffcca0) at file.c:7012
#10 0x000000000042dd3b in loadHTMLstream (f=0x7fffffffd130, newBuf=0x7d3e00, src=0x0, internal=0) at file.c:7244
#11 0x000000000042c71f in loadHTMLBuffer (f=0x7fffffffd130, newBuf=0x7d3e00) at file.c:6772
#12 0x0000000000416a90 in loadSomething (f=0x7fffffffd130, loadproc=0x42c61d <loadHTMLBuffer>, defaultbuf=0x7d3e00) at file.c:224
#13 0x000000000041c8ae in loadGeneralFile (path=0x7c3ae0 "/tmp/zshzsw6kF", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2245
#14 0x0000000000407121 in main (argc=5, argv=0x7fffffffd458, envp=0x7fffffffd488) at main.c:1020
(gdb) up
#1  0x00007ffff787c489 in GC_core_malloc_atomic () from /usr/lib/x86_64-linux-gnu/libgc.so.1
(gdb)
#2  0x0000000000479adc in Strgrow (x=0x7ba660) at Str.c:238
238         x->ptr = GC_MALLOC_ATOMIC(newlen);
(gdb) p x[0]
$1 = {
  ptr = 0x7659c000 "<t='\n</table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></table></t"...,
  length = 386857375,
  area_size = 386857376
}
(gdb) p newlen
$2 = -394764609

Two issues

  1. Why does w3m repeatedly append "</table>" to the buffer?
  2. newlen overflows in Strgrow, so it tries to allocate a huge buffer.

found by afl-fuzz
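
On the second issue, the numbers in the gdb session are consistent with a growth step of length * 6 / 5 computed in 32-bit int: 386857375 * 6 = 2321144250 exceeds INT_MAX (2147483647), wraps to -1973823046, and dividing by 5 gives -394764609, the newlen shown above. The 6/5 factor is inferred from these values, not read from Str.c. Under that assumption, here is a sketch of an overflow-safe capacity computation; grow_capacity is a hypothetical helper.

```c
#include <limits.h>

/*
 * Hypothetical helper: compute a grown capacity of roughly len * 6 / 5
 * without overflowing int. Returns -1 when no larger capacity fits in
 * an int, so the caller can fail cleanly instead of requesting a
 * negative (and therefore huge unsigned) size.
 */
static int
grow_capacity(int len)
{
    long long want;

    if (len < 0)
        return -1;
    want = (long long)len * 6 / 5;   /* 64-bit product cannot overflow here */
    if (want <= len)
        want = (long long)len + 1;   /* guarantee progress for tiny strings */
    if (want > INT_MAX)
        return -1;
    return (int)want;
}
```

This addresses only the arithmetic. The unbounded growth of the buffer itself (the first issue) would still need a separate fix.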

null pointer deref in HTMLlineproc0()

input

00000000: 3c6d 6574 6120 6368 6172 7365 743d 6762  <meta charset=gb
00000010: 3138 3033 303e 0a3c 696e 7075 740b 7661  18030>.<input.va
00000020: 6c75 653d 803c 7461 626c 653e            lue=.<table>

gdb --args w3m -T text/html -dump file

Program received signal SIGSEGV, Segmentation fault.
0x000000000042b440 in HTMLlineproc0 (line=0x7beef5 "\n", h_env=0x7fffffffc3d0, internal=0) at file.c:6333
6333            int pre_mode = (obuf->table_level >= 0) ? tbl_mode->pre_mode :
(gdb) p obuf->table_level
$1 = 0
(gdb) p tbl_mode
$2 = (struct table_mode *) 0x0

This is found by afl-fuzz.

Mapping of external commands

MANUAL.html says that, for security reasons, w3m limits the directories available for cgi-bin. However, this seems to be circumvented by two settings in the options screen: "URL of directory listing command" and "URL of dictionary lookup command". Their defaults include the protocol specifier "file:/", which suggests that any directory can be used, and that the value might even be replaced with a remote URL using "http:/".

segfault due to write to lineBuf[-1] in addMultirowsForm

$ xxd crash
00000000: 303c 7461 626c 6520 7769 6474 683d 3230  0<table width=20
00000010: 3e30 3c74 6974 6c65 3e3c 6c69 7374 696e  >0<title><listin
00000020: 673e 3c62 6f64 793e 3c74 6162 6c65 3e3c  g><body><table><
00000030: 2f69 6e74 6572 6e61 6c3e 3c74 643e f830  /internal><td>.0
00000040: 30d1 3030 0430 30fa 3030 2030 303d 3030  0.00.00.00 00=00
00000050: 9b30 309b 3030 9b3c 696e 7465 726e 616c  .00.00.<internal
00000060: 3e3c 7465 7874 6172 6561 2072 6f77 733d  ><textarea rows=
00000070: 3230 3e                                  20>
$ gdb --args ./w3m -T text/html -dump crash
Program received signal SIGSEGV, Segmentation fault.
0x00000000004757ce in addMultirowsForm (buf=0x7d3e00, al=0x7f3540) at anchor.c:688
688                 l->lineBuf[pos - 1] = '[';
(gdb) p pos
$1 = 0
(gdb) bt
#0  0x00000000004757ce in addMultirowsForm (buf=0x7d3e00, al=0x7f3540) at anchor.c:688
#1  0x000000000042aaff in HTMLlineproc2body (buf=0x7d3e00, feed=0x4280d9 <textlist_feed>, llimit=-1) at file.c:6136
#2  0x000000000042ad0c in HTMLlineproc2 (buf=0x7d3e00, tl=0x7cc120) at file.c:6189
#3  0x000000000042def6 in loadHTMLstream (f=0x7fffffffd140, newBuf=0x7d3e00, src=0x0, internal=0) at file.c:7275
#4  0x000000000042c71f in loadHTMLBuffer (f=0x7fffffffd140, newBuf=0x7d3e00) at file.c:6772
#5  0x0000000000416a90 in loadSomething (f=0x7fffffffd140, loadproc=0x42c61d <loadHTMLBuffer>, defaultbuf=0x7d3e00) at file.c:224
#6  0x000000000041c8ae in loadGeneralFile (path=0x7c3b00 "exp", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, flag=0, request=0x0) at file.c:2245
#7  0x0000000000407121 in main (argc=5, argv=0x7fffffffd468, envp=0x7fffffffd498) at main.c:1020

This is found by afl-fuzz
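
The faulting statement writes to lineBuf[pos - 1] while pos is 0. A minimal sketch of a guarded write follows, assuming the caller knows the line length; put_before and its parameters are hypothetical and do not mirror w3m's Line structure.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write ch at line[pos - 1] only when that index
 * exists. Returns 1 if the byte was written, 0 if it was skipped.
 */
static int
put_before(char *line, size_t line_len, int pos, char ch)
{
    if (line == NULL || pos < 1 || (size_t)pos > line_len)
        return 0;
    line[pos - 1] = ch;
    return 1;
}
```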

segfault due to dereference near-null pointer in do_refill

input

00000000: 3c74 6162 6c65 3e3c 6c69 7374 696e 673e  <table><listing>
00000010: 3c74 6162 6c65 5f61 6c74 2074 6964 3d30  <table_alt tid=0
00000020: 3c2f 6c69 7374 696e 673c 7461 626c 653e  </listing<table>

gdb

Program received signal SIGSEGV, Segmentation fault.
0x00000000004423a4 in do_refill (tbl=0x7bf000, row=0, col=0, maxlimit=79) at table.c:768
768                     int limit = tbl->tables[id].indent + t->total_width;
(gdb) p t
$1 = (struct table *) 0x0

found by afl-fuzz

NULL pointer dereference in Strcopy_charp_n (Str.c)

# w3m -version
w3m version w3m/0.5.3+git20161120, options lang=en,m17n,image,color,ansi-color,mouse,menu,cookie,ssl,ssl-verify,external-uri-loader,w3mmailer,nntp,ipv6,alarm,mark
# w3m -T text/html -dump $FILE
ASAN:DEADLYSIGNAL
=================================================================
==21801==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000021 (pc 0x7f0e31c37e84 bp 0x000002db0000 sp 0x7fff93db1870 T0)
==21801==The signal is caused by a READ memory access.
==21801==Hint: address points to the zero page.
    #0 0x7f0e31c37e83 in GC_free /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:496
    #1 0x7164d1 in Strcopy_charp_n /tmp/w3m-0.5.3-git20161120/Str.c:173:2
    #2 0x57f56d in push_nchars /tmp/w3m-0.5.3-git20161120/file.c:2542:2
    #3 0x57f56d in push_render_image /tmp/w3m-0.5.3-git20161120/file.c:2622
    #4 0x62fd6b in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1956:6
    #5 0x62876b in renderCoTable /tmp/w3m-0.5.3-git20161120/table.c:1668:2
    #6 0x62d6a8 in renderTable /tmp/w3m-0.5.3-git20161120/table.c:1812:5
    #7 0x5b2020 in HTMLlineproc0 /tmp/w3m-0.5.3-git20161120/file.c:6452:3
    #8 0x5c1dc6 in completeHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7022:2
    #9 0x5663b3 in loadHTMLstream /tmp/w3m-0.5.3-git20161120/file.c:7258:5
    #10 0x57aebd in loadHTMLBuffer /tmp/w3m-0.5.3-git20161120/file.c:6781:5
    #11 0x56fadf in loadSomething /tmp/w3m-0.5.3-git20161120/file.c:224:16
    #12 0x56fadf in loadGeneralFile /tmp/w3m-0.5.3-git20161120/file.c:2241
    #13 0x5106a5 in main /tmp/w3m-0.5.3-git20161120/main.c:1017:12
    #14 0x7f0e3067261f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #15 0x41bd08 in _start (/usr/bin/w3m+0x41bd08)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/dev-libs/boehm-gc-7.4.2/work/gc-7.4.2/malloc.c:496 in GC_free
==21801==ABORTING

Testcase: https://github.com/asarubbo/poc/blob/master/00081-w3m-nullptr-Strcopy_charp_n

Since the last call is in GC_free, do you think that the issue resides in libgc.so instead of w3m?

segfault due to dereference near-null pointer in formUpdateBuffer

Input:

00000000: 3c74 6162 6c65 3e3c 623c 3e00 3c6c 6973  <table><b<>.<lis
00000010: 7469 6e67 3e3c 696e 7465 726e 616c 3e3c  ting><internal><
00000020: 696e 7075 743e                           input>

gdb trace:

Program received signal SIGSEGV, Segmentation fault.
0x000000000044f596 in formUpdateBuffer (a=0x7e0000, buf=0x7d3e00, form=0x7dff80) at form.c:474
474             col = COLPOS(l, a->start.pos);
(gdb) p l
$1 = (Line *) 0x0
(gdb) bt
#0  0x000000000044f596 in formUpdateBuffer (a=0x7e0000, buf=0x7d3e00, form=0x7dff80) at form.c:474
#1  0x000000000044ed05 in formResetBuffer (buf=0x7d3e00, formitem=0x7da360) at form.c:272
#2  0x000000000042c773 in loadHTMLBuffer (f=0x7fffffffd130, newBuf=0x7d3e00) at file.c:6778
#3  0x0000000000416a90 in loadSomething (f=0x7fffffffd130, loadproc=0x42c61d <loadHTMLBuffer>, defaultbuf=0x7d3e00) at file.c:224
#4  0x000000000041c8ae in loadGeneralFile (path=0x7c3b00 "to-report/1159", current=0x0, referer=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
    flag=0, request=0x0) at file.c:2245
#5  0x0000000000407121 in main (argc=5, argv=0x7fffffffd458, envp=0x7fffffffd488) at main.c:1020

This is found by afl-fuzz
