
go-flow-levee's Introduction

Go Flow Levee

This static analysis tool works to ensure your program's data flow does not spill beyond its banks.

An input program's data flow is explored using a combination of pointer analysis, static single assignment analysis, and taint analysis. "Sources" must not reach "sinks" without first passing through a "sanitizer." Additionally, source data can "taint" neighboring variables during a "propagation" function call, such as writing a source to a string. Such tainted variables also must not reach any sink.

Such analysis can be used to prevent the accidental logging of credentials or personally identifying information, defend against maliciously constructed user input, and enforce data communication restrictions between processes.

User Guide

See guides/ for guided introductions.

Motivation

Much data should not be freely shared. For instance, secrets (e.g., OAuth tokens, passwords), personally identifiable information (e.g., name, email or mailing address), and other sensitive information (e.g., user payment info, information regulated by law) should typically be serialized only when necessary and should almost never be logged. However, as a program's type hierarchy becomes more complex or as program logic grows to warrant increasingly detailed logging, it is easy to overlook which types might contain this sensitive data and which log statements might accidentally expose it.

Technical design

See design/.

Configuration

See configuration/ for configuration details.

Reporting bugs

Static taint propagation analysis is a hard problem. In fact, it is undecidable. Concretely, this means two things:

  • False negatives: the analyzer may fail to recognize that a piece of code is unsafe.
  • False positives: the analyzer may incorrectly claim that a safe piece of code is unsafe.

Since taint propagation is often used as a security safeguard, we care more deeply about false negatives. If you discover unsafe code that the analyzer is not recognizing as unsafe, please open an issue here. Conversely, false positives waste developer time and should also be addressed. If the analyzer produces a report for code that you consider to be safe, please open an issue here.

For general bug reports (e.g. crashes), please open an issue here.

Contributing

See CONTRIBUTING.md for details.

Developing

See DEVELOPING.md for details.

Disclaimer

This is not an officially supported Google product.


go-flow-levee's Issues

Detect incorrect sanitization by value

We may wish to be able to detect incorrect uses of sanitization by value. Consider the following sanitizer:

func Sanitize(s Source) Source {
    s.Data = "<redacted>"
    return s
}

Now consider the following incorrect use of this sanitizer:

func Oops(s Source) {
    Sanitize(s)
    Sink(s)
}

The issue is rather obvious: since Sanitize receives a copy of s, the original s is not sanitized.

Although the issue is obvious, we currently do not produce a report for it. For additional context, here is the correct way to use a "value" sanitizer:

func Value(s Source) {
    san := Sanitize(s)
    Sink(san)
}

The reason the incorrect case above does not yield a report is that our checks to determine whether a source was sanitized before reaching a sink rely on domination of the sink instruction by the sanitizer instruction. This allows us to handle cases like this:

func Pointer(s Source) {
    Sanitize(&s)
    Sink(s)
}

So it seems we will need some way to treat sanitizers differently depending on whether they take their argument by pointer or by value.

Detect field assignment to zero-value as sanitization

Consider the following:

type Source struct {
  secret string // `levee:"source"`
}

func Bar(s Source) {
  s.secret = ""
  Sink(s)  // This should not emit a Diagnostic.
}

Explicitly setting to zero-values all fields containing source data should be recognized as sanitization.

Handle calls to sinks in a recovery block

This issue replaces some TODOs that have been in the code for some time now:
levee.go:50

			if b == fn.Recover {
				// TODO Handle calls to sinks in a recovery block.
				continue // skipping Recover since it does not have instructions, rather a single block.
			}

source.go:323

		if b == fn.Recover {
			// TODO Handle calls to log in a recovery block.
			continue
		}

Currently, if a sink is called in a function's Recover block, we will not catch it. This is an issue, since such a sink call may leak source data, just like any other sink call.

Use propagation logic from the main analyzer in the fieldpropagator analyzer

Currently, the analysis performed by the fieldpropagator analyzer to determine if a given method on a Source type is a Field propagator is very basic, and it only works for simple getters. As just one example, the following method is not identified as a Field propagator:

func (s Source) DerivedFromData() string { // want DerivedFromData:"field propagator identified"
	derived := fmt.Sprintf("%s", s.data)
	return derived
}

The main analyzer would identify derived as being tainted by s.data, but the fieldpropagator analyzer does not. Currently, we do not propagate taint to the return value of a method on a Source type unless the method is a Field propagator. This means that we would not find a potential leak if it involves a Field propagator like the one above.

I think both analyzers should use the same propagation logic.

(This will also apply to interprocedural analysis when we get there.)

Improve `isProducedBySanitizer` to detect more cases of values produced by a call to a sanitizer

Consider this function:

func TestSanitizedSourceDoesNotTriggerFindingWhenTypeAsserted(s *core.Source) {
	sanitized := core.Sanitize(s)[0].(*core.Source)
	core.Sinkf("Sanitized %v", sanitized)
}

The relevant part of the SSA graph is:

(SSA graph image omitted.)

The issue here is that the TypeAssert is not identified as being produced by a sanitizer according to isProducedBySanitizer. We still don't produce a report here because the current propagation code finds the sanitizer and determines that the sinked value is dominated by the sanitizing call.

This incorrect behavior may lead to false positives.

Support "reverse" propagation through Store instructions

Consider this case:

func Test(str *string, src core.Source) {
	fmt.Sscan(src.Data, &str)
	core.Sink(str) // TODO() want "a source has reached a sink"
}

If you're not familiar with Sscan (I wasn't), it essentially splits its first argument (a string) on whitespace and places the resulting strings into its second argument (which has type ...interface{}). (See here for an example of using the closely related Sscanf function).

So, clearly, if the string is tainted, the varargs slice can also be tainted. This slice is implicit, however, and to construct it, in SSA form a slice is made and the arguments are placed into it via Store instructions. Since for Stores we only propagate from the value being stored to the storage location, we can't currently handle this case.

One solution would be to allow taint to propagate backwards through stores. This is likely to cause false positives, however.

Allow customization of reporting messages

It would likely be useful for users to be able to customize all or part of the report message. Currently, the report message is generic: <sink position>: a source has reached a sink: <source position>. This is an adequate message in general, but users may wish to add something to it to make it more actionable, e.g. by specifying what a developer who sees the message should do about it: (this is just an example) The sensitive value at <source position> must be sanitized before reaching the call at <sink position>.

Fix bugs involving sanitization

This issue describes some bugs involving sanitization that could lead to false positives.

Consider the following case:

func TestTaintedInIfButSanitizedBeforeIfExit() {
	var e interface{}
	if false {
		e = core.Source{}
		e = core.Sanitize(e)[0]
	}
	core.Sink(e)
}

With the current code, e is not considered sanitized when it reaches core.Sink. This is because:

  1. Traversal goes through the sanitizer, which gives a path to core.Sink.
  2. The call to core.Sanitize does not dominate the call to core.Sink, because one can get from var e interface{} to core.Sink(e) without going through the if body. However, on that path e is not tainted.

For this first case, correct behavior can be obtained by stopping traversal upon reaching a sanitizer. The fact that the sanitizer has been reached still needs to be recorded though, so that for other cases involving pointers, domination can be checked.

However, even if we don't traverse through the sanitizer, the following case still poses an issue:

func TestPointerTaintedInIfButSanitizedBeforeIfExit() {
	var e interface{}
	if false {
		s := &core.Source{}
		core.SanitizePtr(s)
		e = s
	}
	core.Sink(e)
}

In this case, even if we stop traversing when we hit SanitizePtr, there is still a path from s, to e (via e = s), to core.Sink. Then, because the call to core.SanitizePtr does not dominate the call to core.Sink, e is not considered sanitized.

I have opened a PR in an attempt to address this: #154. However, I am not sure how to fix the second bug without making the code much more convoluted. I also feel like many additional tests would be needed.

go core.Sink(source) does not create report

False negative report

Consider the following test which is expected to create a report but does not:

func Oops(source core.Source) {
    go core.Sink(source) // want "a source has reached a sink"
}

(We are assuming that core.Source has been configured as a source and core.Sink has been configured as a sink.)
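A plausible explanation (not stated in the issue) is that a go statement is compiled to an *ssa.Go instruction rather than an *ssa.Call, so sink detection that only inspects Call instructions will miss it. Below is a sketch of a check that also covers Go and Defer instructions, with isSinkCall standing in for the existing sink matching:

package sketch

import "golang.org/x/tools/go/ssa"

// callsSink reports whether instr invokes a sink, whether as an ordinary call,
// a go statement, or a deferred call. Sketch only; isSinkCall is hypothetical.
func callsSink(instr ssa.Instruction, isSinkCall func(*ssa.CallCommon) bool) bool {
	switch v := instr.(type) {
	case *ssa.Call:
		return isSinkCall(v.Common())
	case *ssa.Go: // go Sink(...)
		return isSinkCall(v.Common())
	case *ssa.Defer: // defer Sink(...)
		return isSinkCall(v.Common())
	}
	return false
}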

Reconsider traversal through Allocs

We currently do not traverse through Allocs, except when they are slice-like:

source.go

	case *ssa.Alloc:
		// An Alloc represents the allocation of space for a variable. If a Node is an Alloc,
		// and the thing being allocated is not an array, then either:
		// a) it is a Source value, in which case it will get its own traversal when sourcesFromBlocks
		//    finds this Alloc
		// b) it is not a Source value, in which case we should not visit it.
		// However, if the Alloc is an array, then that means the source that we are visiting from
		// is being placed into an array, slice or varargs, so we do need to keep visiting.
		if _, isArray := utils.Dereference(t.Type()).(*types.Array); isArray {
			s.visitReferrers(n, maxInstrReached, lastBlockVisited)
		}

At the time that this was implemented, avoiding Allocs was allowing us to avoid many false positives. With changes to the traversal code over time, it seems that this check may need to be revisited, as it can cause false negatives:

func TestPointerToNamedStructIsTainted(s core.Source, i core.Innocuous) {
	// This test is failing because &x introduces an Alloc,
	// and we don't traverse through non-array Allocs
	colocateInnocPtr(s, &i)
	core.Sink(i) // TODO(212) want "a source has reached a sink"
}

At the time of writing, removing this restriction produces only a single false positive:

func TestSelectSourceAndInnoc(sources <-chan core.Source, innocs <-chan core.Innocuous) {
	select {
	case s := <-sources:
		core.Sink(s) // want "a source has reached a sink"
	case i := <-innocs:
		core.Sink(i) // an undesired report is produced here
	}
}

In this case the incorrect behavior is likely caused not by traversal through Allocs, but by a mishandling of the SSA instructions associated with select statements.

Calling a function with a struct that has a tainted field should taint every field

The following test cases were introduced by #195 and are currently failing:

func TestCallWithStructReferenceTaintsEveryField(h Headers) {
	fooByPtr(&h)       // without interprocedural assessment, foo can do anything, so this call should taint every field on h
	core.Sink(h.Name)  // TODO want "a source has reached a sink"
	core.Sink(h.Other) // TODO want "a source has reached a sink"
}

func TestCallWithStructValueDoesNotTaintNonReferenceFields(h Headers) {
	foo(h) // h is passed by value, so only its reference-like fields should be tainted
	core.Sink(h.Name)
	core.Sink(h.Other) // TODO want "a source has reached a sink"
}

type Headers struct {
	Name  string
	Auth  map[string]string `levee:"source"`
	Other map[string]string
}

func fooByPtr(h *Headers) {}

func foo(h Headers) {}

The crux of the issue here is that since one of Headers's fields is a source, that source will be in scope inside the callee, so in theory it could taint every other field. Without interprocedural analysis, we have to assume that every field will be tainted.

In fact, we have to do this inference whenever a struct with a tainted field is passed to a call, not just when a field is a source.

See #195 for additional discussion.

Determine correct propagation behavior for ambiguous cases

PR #178 added explicit handling for each node type.

For most node types, it is obvious what the correct behavior is, e.g. when doing a MapUpdate, we shouldn't be traversing to the Key or the Value, only the Map.

For other node types, the correct behavior is not obvious.

Here are the cases that need further investigation, and likely, more tests:

  • Field: addressed by #201.
  • Store: addressed by #199.
  • Select and SelectCase: addressed by #235.
  • MakeInterface: see #209.
  • TypeAssert: see #210.
  • Lookup and Next.
  • Slice: see #236.
  • UnOp: see below.

Handle propagation via source methods

Consider the following (slightly contrived) test case:

func TestPropagationViaSourceMethod(s core.Source) {
	str := s.Propagate(s.Data)
	core.Sink(str) // TODO want "a source has reached a sink"
}

Currently, we do not produce a report in the above case. This is because of the following snippet in propagation.go:

// This is to avoid attaching calls where the source is the receiver, ex:
// core.Sinkf("Source id: %v", wrapper.Source.GetID())
if recv := t.Call.Signature().Recv(); recv != nil && sourcetype.IsSourceType(prop.config, prop.taggedFields, recv.Type()) {
	return
}

As far as I know, this check is to avoid incorrectly propagating through safe getters such as Source.GetID(). If a getter is not safe, it should be identified by the fieldpropagators analyzer, which will cause the Call to be identified as a source in source.go.

However, as seen in the test case at the beginning of this issue, this does not take into account methods on source types that can propagate taint received via one of their arguments. This could cause false negatives, and is inconsistent with how we handle calls in general: if a tainted argument goes in, we assume that whatever comes out is tainted.

Allow for partial taint propagation and sanitization through collections

Currently, any interaction between a source and a collection taints the entire collection. Explicit access of non-source elements of a tainted collection should not necessarily be considered tainted. For example:

func Foo(s Source, key string) {
  m := map[interface{}]interface{} {"one": 1, "two": 2}
  m["source"] = s  // this taints m
  Sink(m)          // this emits a Diagnostic, as well it should
  Sink(m["one"])   // this currently emits a Diagnostic, because m is tainted, but const access of this non-Source element should not necessarily emit.
  Sink(m[key])     // this is ambiguous, so this should emit a Diagnostic

  m["source"] = nil  // Related to #92, the propagated taint has been removed
  Sink(m)            // since the only taint has been removed, this should no longer emit a Diagnostic.
}

For reference: #92

Slices must be handled carefully, as the positions can shift if a user is using a slice as, e.g., a queue. See SliceTricks for potential pitfalls. Any ambiguity should propagate taint rather than halting it.

Deleting a key from a tainted map should not taint the key

Context

func TestDeletingFromTaintedMapDoesNotTaintTheKey(key string, sources map[string]core.Source) {
	delete(sources, key)
	// TODO: no report should be produced here
	core.Sink(key) // want "a source has reached a sink"
}

Discussion

The simplest way to handle this would be to check if the delete's associated Call's StaticCallee is the Builtin function delete, and avoid propagating if that is the case.
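A minimal sketch of that check follows; note that for builtin calls the callee appears as an *ssa.Builtin on the CallCommon rather than as a static *ssa.Function:

package sketch

import "golang.org/x/tools/go/ssa"

// isBuiltinDelete reports whether call invokes the builtin delete, in which case
// taint should not be propagated to the key operand.
func isBuiltinDelete(call *ssa.Call) bool {
	b, ok := call.Call.Value.(*ssa.Builtin)
	return ok && b.Name() == "delete"
}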

Struct fields should be taintable

The following test case was introduced by #195 and is currently failing:

func TestTaintFieldOnNonSourceStruct(s core.Source, i *core.Innocuous) {
	i.Data = s.Data
	core.Sink(i)      // TODO want "a source has reached a sink"
	core.Sink(i.Data) // TODO want "a source has reached a sink"
}

This behavior is inadequate: it should be possible to taint a field on a non-Source struct, and tainting this field should taint the struct.

The following test case is also failing:

func TestTaintNonSourceFieldOnSourceType(s core.Source, i *core.Innocuous) {
	s.ID, _ = strconv.Atoi(s.Data)
	core.Sink(s.ID) // TODO want "a source has reached a sink"
}

It should be possible to taint a non-source field on a source type.


This is related to this piece of propagation code:

		if !prop.config.IsSourceField(typPath, typName, fieldName) && !prop.taggedFields.IsSourceFieldAddr(t) {
			return
		}

Currently, we stop traversing when we reach a field unless the field is a Source.

See #195 for additional discussion.

Tainting an interface value should not taint the contained value

This issue relates to this test case:

func TestTaintingInterfaceValueDoesNotTaintContainedValue(s *core.Source, str string) {
	var x interface{} = str
	colocate(s, x)
	// TODO(209): no report should be produced below
	core.Sink(str) // want "a source has reached a sink"
}

func colocate(s *core.Source, x interface{}) {}

See #200 for additional context.

Enforce strict config unmarshalling

In order to minimize the risk of configuration errors, we would like the unmarshalling of the configuration to be strict. Specifically, it should disallow unknown fields in matchers. This avoids errors where a typo in a key name leads to an incorrect configuration. We don't want typos to lead to silent failures, e.g. a matcher is broken and the user thinks their code is safe when it is not.

There are already two test cases covering this behavior to some extent, in config/matcher_test.go:

		{
			desc: "Unmarshaling is strict",
			yaml: `
Blahblah: foo
PackageRE: bar`,
			shouldErrorOnLoad: false, // TODO true
		},

(It is the same case, in the tests for two different matchers.)
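For reference, here is one way to obtain the strict behavior described above. This is only a sketch using gopkg.in/yaml.v3 and is not the project's actual config loader; with KnownFields enabled, a typo'd key such as Blahblah produces a load error instead of being silently dropped.

package main

import (
	"bytes"
	"fmt"

	"gopkg.in/yaml.v3"
)

// matcher is a stand-in for one of the config matcher types.
type matcher struct {
	PackageRE string `yaml:"PackageRE"`
}

func main() {
	raw := []byte("Blahblah: foo\nPackageRE: bar\n")
	dec := yaml.NewDecoder(bytes.NewReader(raw))
	dec.KnownFields(true) // reject keys that do not correspond to a struct field
	var m matcher
	if err := dec.Decode(&m); err != nil {
		fmt.Println("config error:", err) // the unknown key is reported, not ignored
	}
}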

Improve error reporting when config is missing

Currently, when the analyzer is used through go vet, if no config flag is provided or the provided path is invalid, the analyzer produces this error message:

levee: failed prerequisites: fieldtags, source

This message does not provide any hints as to its cause, which will likely leave users scratching their head. (I know I've scratched my own head more than once when dealing with this issue.)

Additional context

If the config can't be read, the fieldtags and source analyzers will fail:

// fieldtags/analyzer.go
conf, err := config.ReadConfig()
if err != nil {
	return nil, err
}

Then, in the analysis package, running levee fails because its dependencies failed:

// Report an error if any dependency failed.
var failed []string
for _, dep := range act.deps {
	if dep.err != nil {
		failed = append(failed, dep.String())
	}
}
if failed != nil {
	sort.Strings(failed)
	act.err = fmt.Errorf("failed prerequisites: %s", strings.Join(failed, ", "))
	return
}

Note that when the analyzer is run directly (i.e. not through go vet), the error message is much better:

error reading analysis config: open config.yaml: no such file or directory

This message may still be a bit confusing to users who did not specify a config at all. The open config.yaml part is due to the fact that the config flag defaults to config.yaml.

Implement understanding of formatting verbs

Currently, the following can cause a report:

core.Sinkf("%T", source)

There is actually no issue in the above example: the type of a sensitive value is not sensitive.

We can likely borrow some logic from the printf analyzer.

If we want to be especially diligent, we could go through all the formatting verbs and determine how to handle each one. From a cursory glance, I think the relevant ones are %T, and maybe %s and %q. For the latter two, we could, e.g., do some kind of analysis of the appropriate String() method for the corresponding argument.
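As a rough illustration of the first step, a helper could map verbs to argument positions. typeOnlyArgPositions below is hypothetical and deliberately ignores flags, width/precision, and explicit argument indexes, which the printf analyzer handles properly.

package sketch

// typeOnlyArgPositions returns the 0-based positions of arguments consumed by a
// %T verb, i.e. arguments whose taint could be ignored. Illustrative only.
func typeOnlyArgPositions(format string) []int {
	var positions []int
	arg := 0
	for i := 0; i < len(format)-1; i++ {
		if format[i] != '%' {
			continue
		}
		verb := format[i+1]
		i++
		if verb == '%' {
			continue // literal %%, consumes no argument
		}
		if verb == 'T' {
			positions = append(positions, arg)
		}
		arg++
	}
	return positions
}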

Proposal for testdata convention - spoof source root with go.mod to assist IDEs

Problem statement

analysistest expects testdata to be a fake GOROOT/GOPATH. We currently have various tests at .../testdata/src/example.com/... to test the various analyzers.

IntelliJ expects our project to be configured with Go modules. This fake root is opaque to IntelliJ. As a consequence, the IDE can't resolve the linkage for packages within testdata (the screenshot illustrating this is omitted here).

This conflict makes development of tests stickier than it needs to be. Additionally, while I believe I'm the only one who does this, it also prevents general execution of an analyzer from the command line. I am partial to using ssadump when investigating issues, but this is prevented in any file with an unresolved import.

Proposal

I propose adding a trivial go.mod to each testdata/src/<some shared root> so that our IDEs can resolve these paths. See #273 and associated branch for an example within config/testdata. This allows proper linking within IntelliJ while retaining test compatibility.

For compatibility with the builder within analysistest, we would need each testdata/src/... to be self-contained. Cross-package imports are permitted, but not cross-module imports. This requires a shared root within testdata/src/. #273 demonstrates this as testdata/src/github.com/. In practice, we should avoid collisions with real Go modules and adopt testdata/src/<identifier>, e.g. testdata/src/levee_configtest/....

I'd love to hear your thoughts about adding this as a convention going forward. At your convenience, please pull my pseudo-module-testing branch and see if this works with your own workflow. (I had to execute config_test to trigger compiling and linkage.)

Enable exclusion of analysis by filename (rather than only package)

Currently, configuration permits exclusion of functions from levee analysis (but not upstream analyzers) via the Exclude: clause. However, a consumer may wish to exclude analysis based on filename. In particular, consumers that use a vendor/ directory may wish to exclude analysis and possible findings from packages outside their control. These vendor/ packages do not contain vendor in the package's path, so exclusion would currently need to be done on a per-package basis. While functional, it presents a potential pain point for consumers.

Refine identification of sources, sinks, and sanitizers

A user may wish to configure identification of sources, sinks, and sanitizers based on many criteria beyond package, type, and method names.

See previous discussion for context.

Identification of sources, sinks, and sanitizers

Much of our work is directly with ssa.Values and ssa.Instructions.
Whether those values or instructions represent one element or another,
the configuration to identify these members should be as uniform as possible.
In that light, specification should be as follows:

ValueSpecifier

A ValueSpecifier marks an ssa.Value, e.g., for identification as a source or as the safe result of a sanitizer.
Values should be specifiable via any or all of:

  • Specifier name: For identification during reporting
  • Type / Field: Package path and type name, optional field name
  • Field tags
  • Scope: local variable, free variable, or global
  • Context: within a specific package or function call. See CallSpecifier below.
  • Is a reference
  • Is "reference-like," e.g., slices, maps, chans.
  • Const value (e.g., "PASSWORD")
  • Argument position (for use below)

CallSpecifier

A CallSpecifier marks an *ssa.Call, e.g., as a sink or sanitizer.
Calls should be specifiable via any or all of:

  • Specifier name: For identification during reporting
  • Symbol: package path, function name, optional receiver name.
  • Context: A CallSpecifier indicating scope, such as package or specific functions.
  • Based on argument value and position (ValueSpecifier).
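Purely as an illustration, the proposal above could translate into configuration types along the following lines. None of these names exist in the codebase; the fields simply mirror the bullet points.

package sketch

// ValueSpecifier and CallSpecifier are hypothetical types mirroring the proposal.
type ValueSpecifier struct {
	Name            string // specifier name, used in reporting
	PackagePath     string // type/field matching
	TypeName        string
	FieldName       string
	FieldTags       []string
	Scope           string         // "local", "free", or "global"
	Context         *CallSpecifier // within a specific package or function call
	IsReference     bool
	IsReferenceLike bool   // slices, maps, chans
	ConstValue      string // e.g. "PASSWORD"
	ArgPosition     int    // for use by CallSpecifier
}

type CallSpecifier struct {
	Name         string // specifier name, used in reporting
	PackagePath  string
	FunctionName string
	ReceiverName string
	Context      *CallSpecifier  // scope, such as a package or specific functions
	Argument     *ValueSpecifier // match based on argument value and position
}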

Handle methods on non-struct source types

This issue was prompted by discussion on #264.

Currently, specifying a source type does not require specifying a field, so any named type can be identified as a Source in the configuration (using the package and type keys). Although we do not know of any users leveraging this currently, it is easy to imagine how/why users might want to configure sources that are not structs:

type Secret string

type Secret interface {
        ASecret()
}

Any named type can have methods declared on it. For structs, sensitive information is ultimately stored in fields. Since propagating taint to the results of any source method was too coarse, the fieldpropagator analyzer was introduced to identify methods on struct sources that could return values tainted by a sensitive field. Consider, e.g.:

type Source struct {
        SomethingPublic string
        SomethingPrivate string `levee:"source"`
}

func (s *Source) DoesNotReturnATaintedValue() string {
        return s.SomethingPublic
}

func (s *Source) ReturnsATaintedValue() string {
        return s.SomethingPrivate
}

In the first case, the return value isn't tainted (more on this later), because the field being returned is not a source. We do want the return value of the second method to be tainted, because it returns a source field. We have tests validating this behavior.

Currently, the behavior with respect to methods on Source types is:

  • If the method is a fieldpropagator, then it always returns a tainted value, and it will get its own propagation (i.e., the call will be identified as a source and a propagation will start from there).
  • If the method is not a fieldpropagator, we only propagate to the return value if one of the arguments (excluding the receiver, which is the Source value whose method is being called) is tainted.

For methods on values that are not Sources, we always consider the returned values to be tainted, no matter if the value itself is tainted or not. Since we don't scrutinize these methods' bodies at all, we assume that they always propagate taint, just like we do for regular functions.

Coming back to this statement:

If the method is a fieldpropagator, then it always returns a tainted value, and it will get its own propagation (i.e., the call will be identified as a source and a propagation will start from there).

If a source type has no fields, then its methods will not be identified as field propagators. However, I believe that methods on non-struct source types should always propagate taint. Indeed:

  • Non-struct, non-interface types are just values, so it is reasonable to assume that they are used to derive the return value of whatever method is defined on them.
  • For interfaces, we don't know what the value could be, so we should assume the worst.

Given this reasoning, I believe that we should:

  1. Mark all methods on non-structs as fieldpropagators.
  2. Rename the fieldpropagator analyzer to accurately reflect this behavior. Maybe something like methodpropagator.

Further discussion:

At the time of writing the fieldpropagator analyzer (and before), we were concerned about false positives. In general, there's no harm in a simple getter that returns a non-source field. However, in some cases it is possible that non-source fields may be tainted, which effectively turns any method that returns that field into a propagator. Continuing with the previous example:

func Oops(s *Source) {
        s.SomethingPublic = s.SomethingPrivate
        Sink(s.DoesNotReturnATaintedValue()) // OOPS, actually, it *does* return a tainted value, and it just reached a sink
}

Currently, we do not have a good way of modeling something like this. The fieldpropagator analyzer merely identifies methods as being propagators or not. It does not say which fields could be incorporated in any return values. In any case, the propagation code does not precisely model the taint status of fields either, because different FieldAddrs referring to the same field are different ssa.Values, so tainting one reference to a field does not taint the others.

Furthermore, consider this other example:

func (s *Source) MoreOops() {
        s.SomethingPublic = s.SomethingPrivate
}

This behavior would also be very difficult to model using our existing abstractions.

Ultimately, what I believe this points to is:

  • The fieldpropagator analyzer is currently doing a very limited form of IPA (interprocedural analysis), which only captures whether or not the function always returns a value derived from a source field. Implementing a more complete form of IPA (e.g., function summaries describing the behavior of functions in more detail) will likely help alleviate some of the issues outlined here.
  • We may benefit from some kind of abstraction over ssa values.

Enable specification of analysis scope

Currently, analysis is executed on all provided packages, as well as the transitive closure of all Go package dependencies.

A user may consider third-party packages, generated code, or specifically sanctioned packages to be out of scope for analysis. While upstream analysis on these may be necessary (e.g. buildssa), we should bypass our analysis of such specified packages as much as possible.

Ideally, this bypass should occur prior to entry into the various analyzer.Analyzers to reduce the size of the transitive dependency graph.

Handle recursive fieldpropagators

The fieldpropagator analyzer currently does not identify fieldpropagators that get their fields from another fieldpropagator, e.g.:

type Source struct { secret string }

func (s Source) Secret() string {
    return s.secret
}

func (s Source) FormattedSecret() string {
    return "the secret is: " + s.Secret()
}

The above example is admittedly a bit frivolous. I'm not sure how common these are.

The analysis is fact-based, so if a method has already been analyzed, its status as a fieldpropagator can be checked by calling pass.ImportObjectFact(obj, &isFieldPropagator{}).
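A rough sketch of that fact lookup follows, assuming it runs while visiting the instructions of a candidate method; the isFieldPropagator stub is included only so the snippet stands on its own.

package sketch

import (
	"golang.org/x/tools/go/analysis"
	"golang.org/x/tools/go/ssa"
)

// isFieldPropagator stands in for the analyzer's existing fact type.
type isFieldPropagator struct{}

func (isFieldPropagator) AFact() {}

// calleeIsFieldPropagator reports whether the static callee of call has already
// been marked as a field propagator by an earlier analysis.
func calleeIsFieldPropagator(pass *analysis.Pass, call *ssa.Call) bool {
	callee := call.Call.StaticCallee()
	if callee == nil || callee.Object() == nil {
		return false
	}
	return pass.ImportObjectFact(callee.Object(), &isFieldPropagator{})
}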

Update documentation - promises and goals

The following need to be included or updated in our documentation.

  • Explicitly document what taint propagation we do and do not attempt to track.
  • Document existing inference, i.e., given a source type Foo, what other types do we implicitly consider sources?
  • State that we do not handle reflection explicitly, though some cases may be handled by general taint propagation.

Add benchmarks

Particularly as we examine cross-function and whole-program analysis, it will be useful to be able to benchmark our overall runtime. In particular, anecdotal evidence suggests that the construction of the SSA graph by buildssa.Analyzer constitutes the majority of our runtime. It would be good to be able to put actual data to such claims, identify any significant performance regressions, and evaluate the runtime costs of future work.

Improve the granularity of taint propagation in collections

Currently, when a tainted value is placed in a collection, the entire collection is considered to be tainted. This is not unreasonable considering that if a collection containing a tainted value reaches a sink, the tainted value will almost certainly "reach the sink" as well, since e.g. printing a collection implies printing its contents.

This does have the unfortunate side effect that any key access on the collection will yield a value that is considered to be tainted. For example, analyzing the following instructions yields a diagnostic, even though the value that reaches the sink is not really tainted:

m := map[string]interface{}{"source": s}
core.Sink(m[""]) // diagnostic: "a source has reached a sink"

This sequence of instructions, in which a map is properly "sanitized" (ahem...) also yields a diagnostic:

m := map[string]interface{}{"source": s}
delete(m, "source")
core.Sink(m) // diagnostic: "a source has reached a sink"

A similar "sanitization" effect could be obtained by setting a slice or map key to nil.

We would like to be able to keep track of which keys/values in a collection are actually tainted, in order to avoid producing false positives.

Do not propagate taint through a channel that is tainted in one select branch and read from in another branch

Consider the following test case:

func TestTaintedAndSinkedInDifferentBranches(objects chan interface{}) {
	select {
	case objects <- core.Source{}:
	case s := <-objects:
		// TODO(211) want no report here, because objects is only tainted if the
		// other branch is taken, and only one branch can be taken
		core.Sink(s) // want "a source has reached a sink"
	}
}

Given the semantics of select, one of two things can happen:

  1. We put a source on the objects channel
  2. We get an interface{} value from the objects channel and we sink it

As indicated by the TODO, it is therefore not possible that the channel be tainted and that a tainted value is obtained from it in the same call to this function.

However, this code may still be unsafe, because the fact that one of the cases puts a source on the channel indicates that the channel may contain sources in general. Consider e.g. a case where this function is called concurrently in two different goroutines. Such a case would be unsafe code. Handling this behavior in full generality within the scope of a single function would require us to model every interaction with the channel. Alternatively, this could be handled via interprocedural analysis (#99).

This issue is somewhat related to #161 and #210, in the sense that it is related to a behavior where we may want to leverage

Improve handling of suppression comments in nested calls

#283 is adding the ability to suppress reports at call sites using a code comment. In most cases, the suppression behaves as expected. There is one case where the (lack of) suppression behavior is surprising:

	// TODO(#284): we don't actually want a report here
	fmt.Println(
		core.SinkAndReturn(s), // levee.DoNotReport // want "a source has reached a sink"
	)

This is because in the comment map, the // levee.DoNotReport comment is (unexpectedly) associated with the s identifier, not the core.SinkAndReturn call.

Since s is an argument to core.SinkAndReturn, we could find the identifier by traversing the tree under the CallExpr that corresponds to core.SinkAndReturn and find the suppressing comment on the identifier. However, looking for suppressing comments on the arguments may cause misleading behavior, as in other cases it would suggest that it is possible to suppress a report for a specific argument, which isn't the case.

See #283 (comment) for additional discussion.


Here is a bit of code that can be used to dump the comment map for a file, which may be handy in investigating this issue.

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"sort"
)

func main() {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, os.Args[1], nil, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	// type CommentMap map[Node][]*CommentGroup
	cm := ast.NewCommentMap(fset, f, f.Comments)
	nodes := make([]ast.Node, 0, len(cm))
	for node := range cm {
		if node.Pos() == f.Pos() {
			continue
		}
		nodes = append(nodes, node)
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Pos() < nodes[j].Pos() })
	for _, node := range nodes {
		comments := cm[node]
		fmt.Printf("%#v\n", node)
		fmt.Println(fset.Position(node.Pos()).Line)
		for _, c := range comments {
			fmt.Println(fset.Position(c.Pos()).Line)
			fmt.Println(c.Text())
		}
	}
}

Revisit tests involving source interface propagation

Bug report

Describe the bug
While iterating on #278, I noticed that internal/pkg/levee/testdata/test-config.yaml will pass all tests with the following Source specification removed:

Sources:
  # ...
  - Package: "levee_analysistest/example/core"
    Type: "SourceManipulator"

This was introduced in #264.

To Reproduce
Remove or comment out the above configuration. Observe that the tests still pass. See the first test run in #278 for a demonstration.

Adopt pointer or other package to identify sources passed as interface parameters.

A key limitation at present is in the following:

func Sinker(obj interface{}) {
  Sink(obj)
}

func Foo(s Source){
  Sinker(s)
}

During analysis of Sinker, the concrete type of obj is not known to be a Source. The pointer package could be used to produce a PointsTo set of possible concrete types which could reach Sink(obj).

Because whole-program analysis can be expensive, configuration to enable/disable this feature should be provided.
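Below is a rough sketch of the kind of query involved, assuming the SSA program has been built and the ssa.Value for Sinker's obj parameter is in hand; this is not an integration with the existing analyzers.

package sketch

import (
	"golang.org/x/tools/go/pointer"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/types/typeutil"
)

// dynamicTypesOf runs the pointer analysis over the program's main packages and
// returns the set of concrete types that may flow into the interface value v.
func dynamicTypesOf(mains []*ssa.Package, v ssa.Value) (*typeutil.Map, error) {
	cfg := &pointer.Config{Mains: mains}
	cfg.AddQuery(v) // ask for the points-to set of v
	result, err := pointer.Analyze(cfg)
	if err != nil {
		return nil, err
	}
	return result.Queries[v].PointsTo().DynamicTypes(), nil
}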

Determine which Basic types can't meaningfully be tainted

Currently, we are avoiding Booleans, because it is quite clear that those can't meaningfully be tainted. There are other Basic types we may want to avoid though, among the list of types.BasicKind values:

type BasicKind int

const (
	Invalid BasicKind = iota // type is invalid

	// predeclared types
	Bool
	Int
	Int8
	Int16
	Int32
	Int64
	Uint
	Uint8
	Uint16
	Uint32
	Uint64
	Uintptr
	Float32
	Float64
	Complex64
	Complex128
	String
	UnsafePointer

	// types for untyped values
	UntypedBool
	UntypedInt
	UntypedRune
	UntypedFloat
	UntypedComplex
	UntypedString
	UntypedNil

	// aliases
	Byte = Uint8
	Rune = Int32
)

(this is from go/types/type.go)

For additional context, see this discussion: #178 (comment)
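As a sketch, the predicate this issue asks to pin down might look like the following; which kinds beyond Bool belong in the switch is exactly the open question.

package sketch

import "go/types"

// cannotMeaningfullyBeTainted reports whether a value of type t should be skipped
// during taint propagation. Only booleans are excluded here, matching the current
// behavior; candidates such as UnsafePointer or the numeric kinds are undecided.
func cannotMeaningfullyBeTainted(t types.Type) bool {
	b, ok := t.Underlying().(*types.Basic)
	if !ok {
		return false
	}
	switch b.Kind() {
	case types.Bool, types.UntypedBool:
		return true
	}
	return false
}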

Expose an extensible point for custom identification.

A user may have highly-specific considerations outside the scope of configuration we provide, perhaps interacting with ssa, types, or library-specific packages. We should consider exposing an entrypoint between our analysis steps for users to provide their own identification methods.

If there is user interest in this feature, please respond here with a comment or your choice of affirming :reaction:.

Leverage type information provided by type assertions

This issue is related to #161.

Consider the following test case:

func TestPanicTypeAssertSource(i interface{}) {
	s := i.(core.Source)
	_ = s
	// The dominating type assertion would panic if i were not a source type.
	core.Sink(i) // TODO(210) want "a source has reached a sink"
}

Clearly, the only way that the call to core.Sink can be reached is if i holds a value of type core.Source. The above code is therefore unsafe.

The following case is more ambiguous, and currently we do not expect a report to be produced:

func TestOkayTypeAssert(i interface{}) {
	s, ok := i.(core.Source)
	_, _ = s, ok
	// The dominating type assertion will not panic.
	core.Sink(i)
}

Since the , ok form is used, it is possible that the type assertion failed and that i is in fact safe to sink. However, there is certainly a possibility that i contains a value of type core.Source, and therefore this code should perhaps be considered unsafe.

Source graph should continue beyond sanitizers.

Currently, when building a graph of a source we stop following the graph when a sanitizer is encountered.
However, there is no guarantee that such a sanitizer will dominate the yet-to-be-discovered sinks.
Therefore, we should build the graph beyond sanitizers.
Once we know all the sinks, we could trim the graph by removing sinks that are dominated by a sanitizer.
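A minimal sketch of that trimming step, assuming the sanitizer and sink call instructions collected during graph construction are available; a full implementation would also need to account for instruction order within a single block.

package sketch

import "golang.org/x/tools/go/ssa"

// pruneDominatedSinks drops the sinks whose block is dominated by the block of
// some sanitizer call, leaving only the sinks that should be reported.
func pruneDominatedSinks(sinks, sanitizers []*ssa.Call) []*ssa.Call {
	var kept []*ssa.Call
	for _, sink := range sinks {
		dominated := false
		for _, san := range sanitizers {
			if san.Block().Dominates(sink.Block()) {
				dominated = true
				break
			}
		}
		if !dominated {
			kept = append(kept, sink)
		}
	}
	return kept
}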

Infer as a source type any type that contains a source type as a field

Consider the following:

type Source struct {
  secret string // `levee:"source"`
}

type Container struct {
  ID int
  content *Source  // This field should implicitly be marked similarly as a source
}

While explicit instantiation c := Container{ID: 10, content: &Source{secret: "my-token"}} should currently be detected by taint propagation to the struct's field, this context may be lost outside the instantiation of the wrapper.
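A sketch of the inference being proposed, with isSourceType standing in for the existing configuration and field-tag checks:

package sketch

import "go/types"

// containsSourceField reports whether any field of s is, or points to, a source
// type, in which case the containing struct should implicitly be treated as a
// source as well.
func containsSourceField(s *types.Struct, isSourceType func(types.Type) bool) bool {
	for i := 0; i < s.NumFields(); i++ {
		ft := s.Field(i).Type()
		if ptr, ok := ft.(*types.Pointer); ok {
			ft = ptr.Elem() // *Source should count just like Source
		}
		if isSourceType(ft) {
			return true
		}
	}
	return false
}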

Improve reporting for Sources that don't have a Pos()

Related issue

This issue falls under the umbrella of #101.

The issue

Some ssa Instructions, for whatever reason, return token.NoPos when asked for their position (instruction.Pos()). If such an instruction is identified as a Source and we produce a report for it, the report will say ... source: -, which is rather unhelpful.

Context

I first noticed this when running levee on kubernetes. A report was being produced where a Source was an ssa.Extract instruction. According to my investigations, (ssa.Extract).Pos() always returns NoPos.

I wrote a small analyzer to find which instructions sometimes return NoPos, and which ones always return NoPos. Running on kubernetes, these are my findings:

The following instructions had no position at least 1 time:

*ssa.Alloc
*ssa.BinOp
*ssa.Call
*ssa.ChangeType
*ssa.Convert
*ssa.Field
*ssa.FieldAddr
*ssa.IndexAddr
*ssa.MakeClosure
*ssa.MakeInterface
*ssa.Panic
*ssa.Phi
*ssa.Return
*ssa.Slice
*ssa.Store
*ssa.TypeAssert
*ssa.UnOp

Many of these instructions could be Sources. (Some clearly could not). I have not deeply investigated why these instructions sometimes have a position and sometimes don't. At this time, I am not convinced that it is worth investigating.

The following instructions had no position every time:

*ssa.ChangeInterface
*ssa.Extract
*ssa.If
*ssa.Jump
*ssa.Next
*ssa.RunDefers

In my opinion, it would be sufficient to handle Extracts, since none of the other instructions could be Sources, except maybe Nexts, but these will always occur with related Extracts (I am 90% confident of this).

Summary

I think we should improve the reporting at least for Extracts, and possibly for other instructions too. For other instructions, I believe a deeper investigation would be needed to ensure that special handling is actually required, i.e. that other instructions could simultaneously 1. be identified as Sources, and 2. have no position.

For Extracts, I believe a reasonable approximation would be to use the position of the associated Tuple. PR here: #107
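The approximation amounts to something like the following sketch:

package sketch

import (
	"go/token"

	"golang.org/x/tools/go/ssa"
)

// reportPos falls back to the position of the associated Tuple when an Extract
// has no position of its own.
func reportPos(v ssa.Value) token.Pos {
	if e, ok := v.(*ssa.Extract); ok && e.Pos() == token.NoPos {
		return e.Tuple.Pos()
	}
	return v.Pos()
}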

Detect sources produced by non-CommaOk TypeAssert of a Pointer type.

Consider this test case:

func TestSourcePointerAssertedFromParameterEface(e interface{}) {
	s := e.(*core.Source)
	core.Sink(s) // TODO want "a source has reached a sink"
}

The SSA is:

func TestSourcePointerAssertedFromParameterEface(e interface{})
0: entry
	0(*ssa.TypeAssert     ): t0 = typeassert e.(*example.com/core.Source)
	1(*ssa.Alloc          ): t1 = new [1]interface{} (varargs)
	2(*ssa.IndexAddr      ): t2 = &t1[0:int]
	3(*ssa.MakeInterface  ): t3 = make interface{} <- *example.com/core.Source (t0)
	4(*ssa.Store          ): *t2 = t3
	5(*ssa.Slice          ): t4 = slice t1[:]
	6(*ssa.Call           ): t5 = example.com/core.Sink(t4...)
	7(*ssa.Return         ): return

If it were a CommaOk TypeAssert, there would be an Extract, so the Source would be detected in sources.go:sourcesFromBlocks.
If it were a TypeAssert to a Source value (not a pointer), there would be an Alloc, so again the source would be detected in sources.go:sourcesFromBlocks.
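A sketch of the additional detection this issue asks for, with isSourceType standing in for the existing source-type check:

package sketch

import (
	"go/types"

	"golang.org/x/tools/go/ssa"
)

// assertsSourcePointer reports whether t is a non-CommaOk type assertion to a
// pointer-to-source type, which produces a source value even though no Extract
// or Alloc appears in the SSA.
func assertsSourcePointer(t *ssa.TypeAssert, isSourceType func(types.Type) bool) bool {
	if t.CommaOk {
		return false
	}
	ptr, ok := t.AssertedType.(*types.Pointer)
	return ok && isSourceType(ptr.Elem())
}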

Allow suppression of false positives via code comments.

This is a continuation of #87.

At the time of writing, reporting of diagnostics can be suppressed at function-level granularity. While functional, this has the potential to suppress a wider breadth of reports than intended.

Suppression via a code comment, e.g. nolint:levee <Justification>, immediately preceding a sink call would allow fine-grained suppression. Optionally, we may consider suppression of source identification or taint propagation.

Infer from a source type any defined type using the source type as the underlying type

Let's deconstruct that Issue title. Consider the following:

type Source struct {
  secret string // `levee:"source"`
}

type Foo Source

The type Source will be identified as a source due to the field tag. The type Foo has the underlying type Source, and should also be identified as a source type without explicit configuration.

In this instance, the underlying struct literal contains a field tag, which is part of the type's identity. As such, Foo.secret also carries the field tag, and Foo will be identified as a source type.

However, consider instead

type Source struct {  // Suppose this source type is identified by package path and name regexp
  secret string
}

type Foo Source

Because Source is matched by regexp, if that regexp does not also match Foo, then Foo will not currently be considered a source. This should be corrected by this Issue.

Implementation detail: if the underlying struct literal were marked as a source type, rather than the named type Source, this would propagate more easily.
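A sketch of the inference, assuming the configured source types are available as named types:

package sketch

import "go/types"

// inferredSource reports whether the defined type t shares its underlying type
// with a configured source type, e.g. type Foo Source, even when the configured
// regexp only matches Source itself.
func inferredSource(t *types.Named, configuredSources []*types.Named) bool {
	for _, src := range configuredSources {
		if types.Identical(t.Underlying(), src.Underlying()) {
			return true
		}
	}
	return false
}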

Leverage type information available in type switch.

Within a type switch, non-default case blocks have definite type information that is otherwise hidden behind the interface type. That information could be carried within that block and the blocks dominated by that case entry for better reporting. This could possibly also be applied to type assertions, though nontrivial effort would be required to support typ, ok := foo.(bar) style type assertions.

A sample of (suppressed) test cases will be introduced by #149.

Improve Diagnostic reporting

At present, reporting of Diagnostics is a generic a source has reached a sink message with source and sink positions included. This can be useful in simple cases, but can be opaque in the event that a source has reached a sink via taint through many propagating steps.

While a full trace of propagation steps may be unnecessary, diagnostic reporting should be improved to the point that a developer reading a diagnostic does not need familiarity with SSA, taint propagation, or go-flow-levee to remediate the failure. For instance, a more general "<source name and position> has exceeded its data boundary by reaching <sink name and position>. Arguments to <sink name> must be sanitized."

[FALSE NEGATIVE]: go panic(source) and defer panic(source)

False negative report

levee treats panic as a sink, but does not create a report for the cases below when AllowPanicOnTaintedValues=false (the default config):

func Oops() {
    go panic(Source{"sensitive data"})
}

func OopsIDidItAgain() {
    defer panic(Source{"sensitive data"})
}

(We are assuming that Source has been configured as a source and Sink has been configured as a sink.)

Provide a mechanism by which false positives may be suppressed.

If analysis is integrated to become a blocking test, users will be dismayed if a false positive blocks progress. It should be possible to bypass these errors.

This could be done via an explicit allow-list (inclusive-)or in-code comment in the // NOLINT style. The former should perhaps be favored for the possibility of out-of-scope findings, e.g. a finding in a user's third-party dependencies or in generated code.
