johnsaigle / scary-strings

Collection of wordlists containing dangerous function calls in many languages

License: GNU General Public License v3.0

Makefile 100.00%

source-code-analysis white-box-testing static-code-analysis bugbounty hacking infosec appsec pentesting penetration-testing penetration-testing-tools

scary-strings's Issues

Unicode issues

blorp

Traceback (most recent call last):
  File "/mnt/c/Users/JohnSaigle/bin/scary-strings/scary_strings.py", line 138, in <module>
    result = [scan_file(file_to_scan, function_names, language, args.scan_comments, comment_strings)
  File "/mnt/c/Users/JohnSaigle/bin/scary-strings/scary_strings.py", line 138, in <listcomp>
    result = [scan_file(file_to_scan, function_names, language, args.scan_comments, comment_strings)
  File "/mnt/c/Users/JohnSaigle/bin/scary-strings/scary_strings.py", line 51, in scan_file
    lines = list(map(str.strip, f.read().splitlines()))
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 1021: invalid start byte
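
One possible fix, sketched under the assumption that scan_file opens files in text mode with the default strict error handling: pass errors="replace" to open() so undecodable bytes become U+FFFD instead of raising. The helper name read_lines_lenient is hypothetical and not part of the current script.

```python
def read_lines_lenient(path):
    """Return stripped lines from a file, tolerating non-UTF-8 bytes."""
    # errors="replace" swaps each undecodable byte for U+FFFD rather than
    # raising UnicodeDecodeError, so one Latin-1 file cannot abort the scan.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.strip() for line in f.read().splitlines()]
```

The trade-off is that replaced characters are lost, which should not matter when only ASCII function names are being matched.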

Improve python wordlist

The WAHH book woefully does not include a wordlist for Python.

I've added some functions based on some blog posts I've read but the list is very small.

It would be great to increase support for Python here.

Add environment variables

aka steal this list: https://github.com/Puliczek/awesome-list-of-secrets-in-environment-variables/

Add entries for PHP wrappers

For example, this seems... risky
https://www.php.net/manual/en/wrappers.ssh2.php

Update python version

3.9 is fine. See if there's a way to specify e.g. 3.8 or higher
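
A minimal sketch of a runtime check, assuming 3.8 is the intended floor (the issue only gives it as an example). The names MIN_PYTHON and check_python_version are hypothetical. If the project is ever packaged with setuptools, the python_requires=">=3.8" metadata field expresses the same constraint at install time.

```python
import sys

MIN_PYTHON = (3, 8)  # assumed floor, taken from the "3.8 or higher" example


def check_python_version(current=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum."""
    if current is None:
        current = tuple(sys.version_info[:2])
    # Tuples compare element by element, so (3, 10) >= (3, 8) is True.
    return current >= minimum
```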

FileNotFoundError when running script outside of its path

Traceback (most recent call last):
  File "/home/jsaigle/.local/bin/scary-strings", line 134, in <module>
    args.path, build_list_of_file_extensions(language), build_list_of_exclude_dirs(language))
  File "/home/jsaigle/.local/bin/scary-strings", line 13, in build_list_of_file_extensions
    with open(filepath, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jsaigle/.local/bin/extensions/php'

It's possible the script only works within its own directory right now. May need to use os.path.realpath() here to resolve the issue.
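
A sketch of the os.path.realpath() idea, assuming the extensions and wordlist folders sit next to the real script file. The helper name data_path is hypothetical.

```python
import os


def data_path(script_file, *parts):
    """Build a path relative to the directory holding the real script file."""
    # realpath() resolves symlinks, so a link placed in ~/.local/bin still
    # leads back to the checkout that contains the data folders.
    base_dir = os.path.dirname(os.path.realpath(script_file))
    return os.path.join(base_dir, *parts)
```

Inside the script this would be called as data_path(__file__, "extensions", language). It only helps if the script is symlinked into the bin directory; a plain copy would still be missing its data folders.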

Add OS execution functions for all languages

Add make targets

Create a Makefile to:

Concatenate the wordlists into the all lists for each folder
Update the tree output in the README (this could be a Git pre-commit hook too)

Add C

e.g.

strcpy
gets

Cross reference with Rust semgrep support

https://github.com/semgrep/semgrep-rules/pull/2601/files

Examine comments as well

Comments are skipped in the current source code.

However, it could be useful to extend this tool to include scanning for TODO strings, e.g.:

TODO
FIXME
XXX
and other things that get highlighted in vim etc.

It would be good also to add a regex for strings inserted by editors. For example, VS Code has a JIRA integration that will prompt devs to attach their TODO message to a ticket. It auto-generates a URL to JIRA when it does this. It would be good to look for this and other similar things. Links to sites like GitHub, Jira, Redmine, etc. in general would be useful.

Some heuristic type words could be useful too. People sometimes write things like "This is broken and someone should fix it later" or "insecure, fix this after demo" or "check permissions on this later".
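
A rough sketch of how the three ideas above could be combined, assuming the comment text has already been extracted from the source line. The tracker hostnames and heuristic phrases are illustrative guesses, not a vetted list, and flag_comment is a hypothetical name.

```python
import re

# Marker words listed in this issue.
TODO_MARKERS = re.compile(r"\b(?:TODO|FIXME|XXX)\b")
# Links to issue trackers; the host fragments here are assumptions.
TRACKER_LINK = re.compile(
    r"https?://\S*(?:github\.com|atlassian\.net|jira|redmine)\S*", re.IGNORECASE
)
# Heuristic phrases loosely based on the example sentences above.
HEURISTIC_PHRASES = re.compile(
    r"\b(?:insecure|broken|fix (?:this|it) (?:later|after)|check permissions)\b",
    re.IGNORECASE,
)


def flag_comment(comment):
    """Return the labels of every pattern group that matches the comment."""
    labels = []
    if TODO_MARKERS.search(comment):
        labels.append("todo")
    if TRACKER_LINK.search(comment):
        labels.append("tracker-link")
    if HEURISTIC_PHRASES.search(comment):
        labels.append("heuristic")
    return labels
```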

Add rust slice panics

e.g. https://doc.rust-lang.org/std/primitive.slice.html#panics-31

Look for more on this page

copy_from_slice
clone_from_slice
flatten_mut
swap
windows
chunks
chunks_mut
chunks_exact
chunks_exact_mut
as_chunks
as_chunks_mut
as_rchunks
as_rchunks_mut
array_chunks
array_chunks_mut
array_windows
rchunks
rchunks_mut
rchunks_exact
rchunks_exact_mut
split_at
split_at_mut
split_array_ref
split_array_mut
rsplit_array_ref
rsplit_array_mut
select_nth_unstable
select_nth_unstable_by
rotate_left
rotate_right
copy_within
as_simd
repeat

Add some documentation about how to use the lists

Add examples for taking the list and using them in a clever way with grep or rg.

Linking to the gf repo would be a good idea too because it has good examples of grep patterns.

Add Go

tls.Config for cryptographic issues
Use of panic
....

Add go unsafe

Like Rust, Go has unsafe, which allows devs to break memory safety.
Check also https://github.com/jlauinger/go-geiger

The reflect package and/or functions are also important

Scan for lines that contain ignore directives for linters

Many linting tools let you write a comment that tells the scanner to suppress a warning. These comments are usually a good place to look for bugs, since each one marks a spot where the devs have silenced a warning.
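
A starting point could look like the sketch below. The directives are ones I know exist in the named tools, but the list is far from complete, and the names SUPPRESSION_DIRECTIVES and find_suppressions are hypothetical.

```python
import re

# Suppression directives from a handful of common tools (not exhaustive).
SUPPRESSION_DIRECTIVES = [
    r"#\s*noqa\b",            # flake8 / ruff
    r"#\s*nosec\b",           # bandit, gosec
    r"#\s*type:\s*ignore\b",  # mypy
    r"//\s*nolint\b",         # golangci-lint
    r"eslint-disable",        # ESLint
    r"@SuppressWarnings",     # Java
    r"#\[allow\(",            # Rust lint attributes
    r"phpcs:ignore",          # PHP_CodeSniffer
]
SUPPRESSION_RE = re.compile("|".join(SUPPRESSION_DIRECTIVES))


def find_suppressions(lines):
    """Return (line_number, stripped_line) for lines that silence a linter."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if SUPPRESSION_RE.search(line)
    ]
```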

Add Solana

Ideas:

invoke_signed (e.g. https://blog.neodyme.io/posts/solana_common_pitfalls/#arbitrary-signed-program-invocation)
anything to do with calling external programs
anything to do with transferring funds or assets

'Malformed UTF-8' results in crash

Description

Occurred while scanning https://github.com/josecl/cool-php-captcha which contains an accented character here, possibly in other places also:

https://github.com/josecl/cool-php-captcha/blob/dd8ea199c3fb238ea73781479415646c2542470b/captcha.php#L3

Program output

===>>>>>   SCARY STRINGS   <<<<<===

Source code analysis tool, Copyright (C) 2017 by John Saigle
Analyse source code for potentially dangerous APIs, or 'scary strings'!
This is free software. <https://github.com/johnsaigle/scary-strings>
Scanning 5 files...
Malformed UTF-8
  in sub MAIN at scary-strings.p6 line 42
  in block <unit> at scary-strings.p6 line 121

Add support for more programming languages

Currently the only existing wordlists are for PHP. It would be good to at least extend this to Perl and Perl6.

Add additional wordlists for dangerous APIs
Add exclude folders for common third-party libraries for these languages

Rewrite this in Python because no one uses Perl 6, you nerd

Add Rust

unsafe
panic (and variants) -- https://halborn.com/dont-panic-how-improper-error-handling-can-lead-to-blockchain-hacks/
extern
::with_capacity -- https://doc.rust-lang.org/std/vec/struct.Vec.html#panics
vec!

Usage of .clone()

Add more scary strings based on useful blog posts

e.g. https://btlr.dev/blog/how-to-find-vulnerabilities-in-code-bad-words

Add regex functions for all languages (to indicate ReDoS)

Add Go SQL functions

Extract SQL function calls from popular Go libraries, like https://github.com/stripe-archive/safesql#how-does-it-work but without the SAST component

Packages listed in the above link:
https://pkg.go.dev/database/sql#DB
https://github.com/jinzhu/gorm
https://github.com/jmoiron/sqlx

Any others? That repo has not been updated for years so maybe there are new popular packages that people are using.

Add Javascript

dangerouslySetInnerHTML (React)
eval

...?

Comments-only mode

Refactor arguments so that a language does not need to be specified. This will allow the script to e.g. check only comments rather than function calls.

Write output to a results/ directory with dates

Add PHP's randomness functions

e.g. mt_rand() is not cryptographically secure and shouldn't be used for anything security-related. This would be a good one to flag.

A new wordlist should probably be created called 'randomness' containing this, its aliases, and related functions.

Add some examples in the README

Some scenarios where this is useful and perhaps some sample input and output.

Add more linting directives

Solium/Ethlint https://github.com/duaraghav8/Ethlint

Add Solidity

Ideas:

selfdestruct
anything that moves assets, e.g. ERC20 and NFT functions to approve/send/transfer
transfer vs call
delegatecall

It would also be a good idea to check slither and the code4rena auto-audit tool for other function calls

Add PRNG functions for all languages

Add a list for generic secrets

Here's an example of some stuff: https://github.com/tomnomnom/gf/blob/master/examples/sec.json

asymmetric key pairs would be a good example, e.g. RSA PRIVATE and equivalents for other algorithms

If there are common patterns for API keys for various services that would be great too. Check the cloud version of hacktricks
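
For the private key case, a sketch of what a pattern might look like, assuming PEM-style headers. The algorithm labels in the optional group are a few common ones rather than a complete set, and find_private_keys is a hypothetical name.

```python
import re

# Matches PEM-style private key headers, with or without an algorithm label.
PRIVATE_KEY_HEADER = re.compile(
    r"-----BEGIN (?:(?:RSA|DSA|EC|OPENSSH|ENCRYPTED) )?PRIVATE KEY-----"
)


def find_private_keys(lines):
    """Return (line_number, stripped_line) for lines with a private key header."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if PRIVATE_KEY_HEADER.search(line)
    ]
```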

Add calls to weak crypto functions

e.g.

md5 should basically never be used
anything to do with mersenne-twisters
ECB mode (may be difficult to grep for)

Add web application error messages

à la this file in SecLists: https://github.com/danielmiessler/SecLists/blob/master/Pattern-Matching/errors.txt

Most of this repo is intended to be used in white-box source code analysis but a list of error messages like this would be a useful thing to import into Burp, for example. You could load this up and have Burp flag HTTP responses containing these strings.

Note that the list above is several years old and most of the errors seem to be related to .NET and PHP. It could be good to extend this to other technologies like Express, Go, and weird Java tech like Tomcat

Add Cosmos

Ideas

BeginBlocker and EndBlocker -- should be reviewed for code that can panic
Any module function from the SDK that involves transferring funds
UNIX time functions
Grants and authorizations

It should be non-overlapping with the Go wordlists. A user can run both Go and Cosmos rules on a Cosmos project

Add Java

Provide list of files not to scan

Would be nice to be able to pass a list of files to exclude. For example I scanned a project recently that had a giant wordlist that was in a .php file. This won't contain the dangerous API calls that this script wants to find, but will slow the execution significantly.
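
One way this could work, sketched with shell-style patterns from the standard fnmatch module. The function name filter_excluded and the idea of matching against both the full path and the bare file name are assumptions about the desired behaviour.

```python
import fnmatch
import os


def filter_excluded(paths, exclude_patterns):
    """Drop paths whose full path or file name matches any exclude pattern."""
    kept = []
    for path in paths:
        name = os.path.basename(path)
        excluded = any(
            fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(name, pattern)
            for pattern in exclude_patterns
        )
        if not excluded:
            kept.append(path)
    return kept
```

The patterns could come from a repeated command-line option or from a file with one pattern per line.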

Add concurrency to speed up results

The script must check every line in a file against every entry in a wordlist, and do this for all files. This takes a lot of time for longer files or against the default wordlist.

Adding concurrency could help speed up the results here.
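
A sketch of per-file concurrency using concurrent.futures. The scan_file here is a simplified stand-in, not the script's real function, and it assumes wordlist entries are identifier-like and non-empty. Threads are used because they need no pickling. Since the matching is CPU-bound, ProcessPoolExecutor (with a module-level function in place of the lambda) is probably what would give a real speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def scan_file(path, pattern):
    """Return (path, line_number, stripped_line) for every matching line."""
    hits = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if pattern.search(line):
                hits.append((path, number, line.strip()))
    return hits


def scan_files(paths, function_names, max_workers=4):
    """Scan files concurrently; results keep the order of the input paths."""
    # One alternation regex avoids looping over the wordlist for every line.
    pattern = re.compile(
        "|".join(r"\b" + re.escape(name) + r"\b" for name in function_names)
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file = pool.map(lambda path: scan_file(path, pattern), paths)
        return [hit for hits in per_file for hit in hits]
```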

Provide detailed help message on the command-line

Replace "Help message will go here" with real help message

Add Java

CSV report gets messed up when "Line of Code" cell contains commas

This is happening because of naive splitting on commas on a line of code.

The simplest solution would probably be to change the output to a TSV file instead of a CSV file.

TSVs still might contain problems if a line of code contains tab characters, but everybody uses spaces instead of tabs anyway, right?
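
An alternative to switching formats would be to let the standard csv module handle quoting, which copes with commas, quotes, and tabs inside a cell. A minimal sketch follows. Only the "Line of Code" column name comes from this issue. The other headers and the write_report name are hypothetical.

```python
import csv
import io


def write_report(rows, out):
    """Write scan results as CSV; csv.writer quotes cells containing commas."""
    writer = csv.writer(out)
    writer.writerow(["File", "Line Number", "Line of Code"])
    writer.writerows(rows)


# Usage: the line of code contains commas but stays in a single cell.
buffer = io.StringIO()
write_report([("a.php", 3, "mail($to, $subject, $body);")], buffer)
print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline="" so line endings are not translated twice.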

Create script to generate "all" wordlists

I've been copying and pasting function names as I add them. This is error-prone and easily automated by concatenating all the other wordlists together.
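
A sketch of such a script, assuming each language folder holds plain-text wordlists with one entry per line and that the combined file is named "all". The function name build_all_wordlist is hypothetical.

```python
import os


def build_all_wordlist(folder, output_name="all"):
    """Merge every wordlist in a folder into one sorted, de-duplicated file."""
    entries = set()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip the output file itself and anything that is not a regular file.
        if name == output_name or not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8") as f:
            entries.update(line.strip() for line in f if line.strip())
    out_path = os.path.join(folder, output_name)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(entries)) + "\n")
    return out_path
```

Sorting and de-duplicating changes the order relative to a plain concatenation, which keeps diffs stable when entries are added.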

Add LOTP strings

https://boostsecurityio.github.io/lotp/
https://github.com/boostsecurityio/lotp

Basically these can cause RCE in specific contexts. Could be interesting and simple to add the Go/Python examples here

Use ASTs to represent scanned code instead of going line-by-line

Currently the project scans each file into memory and examines every single line using a regex. This is fine for now but probably not very efficient and is error-prone.

It would be interesting to try to use something like PHP-AST to parse the code in an intelligent way and scan only the function names (or alternatively just the comments along with #11). It might be more efficient and would make this tool much more robust.
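
To illustrate the idea without PHP-AST, here is a sketch that uses Python's built-in ast module on Python source instead. It is a stand-in for the technique, not a PHP solution, and find_calls is a hypothetical name. It reports only real call sites, so a function name inside a comment or string literal is ignored.

```python
import ast


def find_calls(source, function_names):
    """Return sorted (line_number, name) pairs for calls to the listed names."""
    wanted = set(function_names)
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):          # eval(...)
            name = func.id
        elif isinstance(func, ast.Attribute):   # os.system(...)
            name = func.attr
        else:
            name = None
        if name in wanted:
            hits.append((node.lineno, name))
    return sorted(hits)
```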

Add C++

https://snyk.io/blog/top-5-c-security-risks/
https://snyk.io/blog/unintimidating-intro-to-c-cpp-vulnerabilities/
https://snyk.io/blog/exploring-3-types-of-directory-traversal-vulnerabilities-in-c-c/

e.g.
printf
gets
...
and more classics

It might be good to structure this to be non-overlapping with the C wordlist. It would be possible to audit a C++ project by concatenating some combination of C and C++ lists that way
