Git Product home page Git Product logo

is593-language-based-security's People

Contributors

hyunsooda avatar jeehoonkang avatar kihongheo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

is593-language-based-security's Issues

[Homework 4] `of_allocsite` in the module `Location`

Hi.

Currently, of_allociste in the 84th line of domain.ml has one unused parameter, and it seems to need to be removed.

  let of_allocsite _ a = Allocsite (Label.make a)

->

  let of_allocsite a = Allocsite (Label.make a)

Thank you.

[Homework 4] `cmp` of `POWSET_DOMAIN`

Hi.

Currently, the signature of cmp in POWSET_DOMAIN is the following:

val cmp : Llvm.Icmp.t -> t -> t -> t

However, I believe that the result of abstract cmp should be an abstract integer. Could you consider changing the definitions of POWSET_DOMAIN and cmp?

The following seems to work, though I am not familar with OCaml:

diff --git a/src/domain.ml b/src/domain.ml
index abd69ba..2dbe93c 100644
--- a/src/domain.ml
+++ b/src/domain.ml
@@ -44,12 +44,13 @@ end

 module type POWSET_DOMAIN = sig
   module Elt : SET
+  module Numerical : NUMERICAL_DOMAIN

   include Set.S with type elt = Elt.t

   include CPO with type t := t

-  val cmp : Llvm.Icmp.t -> t -> t -> t
+  val cmp : Llvm.Icmp.t -> t -> t -> Numerical.t

   val filter : Llvm.Icmp.t -> t -> t -> t
 end
@@ -509,8 +510,9 @@ module Interval : INTERVAL = struct
     )
 end

-module PowSet (Elt : SET) : POWSET_DOMAIN with type Elt.t = Elt.t = struct
+module PowSet (Elt : SET) (Numerical : NUMERICAL_DOMAIN) : POWSET_DOMAIN with type Elt.t = Elt.t = struct
   module Elt = Elt
+  module Numerical = Numerical
   include Set.Make (Elt)

   let bottom = empty
@@ -538,13 +540,13 @@ module PowSet (Elt : SET) : POWSET_DOMAIN with type Elt.t = Elt.t = struct
     F.fprintf fmt "}"
 end

-module LocSet = PowSet (Location)
-module FunSet = PowSet (Function)

 module type VALUE = sig
   include CPO

   module Numerical : NUMERICAL_DOMAIN
+  module LocSet : POWSET_DOMAIN with type Elt.t = Location.t
+  module FunSet : POWSET_DOMAIN with type Elt.t = Function.t

   val make : Numerical.t * LocSet.t * FunSet.t -> t

@@ -565,6 +567,8 @@ end

 module Value (Numerical : NUMERICAL_DOMAIN) : VALUE = struct
   module Numerical = Numerical
+  module LocSet = PowSet (Location) (Numerical)
+  module FunSet = PowSet (Function) (Numerical)

Thank you.

[Homework5] segmentation fault causes example1.ll to become empty

Hi. After my first attempt at the implementation, I tested my code using ../debloater ./example1.sh example1.ll, and it somehow resulted in a segmentation fault, probably because of some mistake in my code.
On the next call of ../debloater ./example1.sh example1.ll, however, it returned The test script initially failed and upon checking I found that example1.ll had only the first two lines left.
Since example1.ll is there but just wrong, and example1.c contains no changes make does not fix up example1.ll, causing me to delete and rerun make.

My guess is that the segmentation fault is the cause, but it's very difficult to
cd .. make cd test rm -rf example1.ll make ../debloater ./example1.sh example.ll every single time in order to debug the code.
Is there any way to fix this behavior?
Thank you.

[HW6] How can I make the top-down algorithm faster?

Hi. My current top-down algorithm finds a solution at 6 mins 30 secs for example3, which doesn't feel like a "safe" time, since we should terminate in 10 mins for all test cases. This feels especially long compared to bottom-up, which takes under three seconds for every case.

I've tried various pruning/optimization techniques, such as 1) restricting the grammar in the unrolling stage, 2) creating a "simplifying" algorithm to simplify the program before checking equivalence / adding to the queue, 3) depth comparisons to "balance" unrolling, 4) changing most of the code and API calls to be tail-recursive as well as 5) the basic equivalence checking for various mathematically equivalent structures.
However, more sophisticated algorithms don't help the execution time very much, they often increase the time instead because of the added complexity.

Could you give some tips on how to shorten the execution time (which optimizations to focus on, which to ignore, etc.)? Would it be better to focus on improving the equivalence checking instead of trying these new approaches, like the original algorithm from the slides? Or are there other pruning techniques that I should explore?

Also, I'm curious about the bottom-up algorithm being so much faster, and whether it is possible to achieve such an execution time with the top-down algorithm.

Thank you.

[Homework 6] Travis CI failed

Here is the job. It fails when opam init

But I expect that maybe evaluation of my code will be OK because it works well on my local environment.

[Homework 3] About double_loop.interval test case

The code of double_loop test case is:

#include "homework.h"

int main()
{
    int ave = 0;

    for(int i = 1; i < 100; i++)
    {
        for(int j = 0; j < 100; j++)
        {
            int k = i + j;
            int w = i - j;

            int z = k / w;

            if(i >= 10)
            {
                int m = k / i;
            }
        }
    }
}

double_loop.interval.expected is:

Potential Division-by-zero @ advanced/double_loop.c:main:14:23, %sub = [-98, +oo]

However, I got the result:

Potential Division-by-zero @ advanced/double_loop.c:main:14:23, %sub = [-98, 99]

and it seems it's more precise than the expected range [-98, +oo].
Anyone can help me about this case?

[Presentation] Next speakers

Hi,

It was very interesting to see all your work. I am looking forward to see other interesting work next week.

Next speakers are

  • 김현수
  • Mehdi
  • 박지희
  • 조경민
  • 신현수

A few things:

  • Please finish your presentation within 10 min. I don't want to strictly cut your presentation after 10 min but it would better to have enough time for QnA.
  • When you embed example code in slides, please make sure it has enough size and resolution. Embedding screenshots are usually not a good choice. You can embed text instead, or have a high resolution and readable screen shot.

Thanks.

[Homework 4] Narrowing operator for values

Hi again.
Currently, the PowSet functor returns a failure with the narrowing operator.
In the case of Sign domain that doesn't support narrowing, we can just widen and terminate and not narrow at all, but for Interval domain that is not possible.
Since a value contains Numerical, LocSet and FunSet, how can I define a proper narrowing operator for Values?
Thank you.

[Homework5] grading question

Hi. Right now my debloater generates all results correctly, except for the "predecessor block" info that llvm automatically generates.
For instance, dune test gives me:

-if.then:        ;preds = %entry
+if.then:       ;preds = %entry, %entry

Is this okay? I'm using the given delete function, so I can't think of a way to update that info as well.

Also, is it okay to leave in debug code, since hw5 does not grade the program output but instead grades the debloated .ll file? I experienced some behavior changes when adding/removing print code (weirdly enough, sometimes adding a print saved me from seg fault..!) so I would like to leave it this way if possible.
Thank you.

[Homework 4] Section 4 in Homework document

In Section 4 of Homework.pdf, it says that abstract value is a pair of abstract integer, abstract stack memory address, and abstract heap memory address (Z#, P#, and H# respectively).

However, I believe the abstract value is a pair of abstract integer, abstract address, and abstract function. So instead of Z#, P#, and H#, should it be Z#, A#, and F#?

Thanks.

[Homework 4] Abstract semantics of a null dereference

Hi.

In homework 4, what is the abstract semantics of a null dereference? If the concretization of eval_E(m) contains a null pointer, then does x := load E make the abstract value of x become Top, like division by zero in the previous homework? In addition, what does happen when there is a store operation to a null pointer? If the concretization of eval_E1(m) contains a null pointer, then does store E0 E1 make the abstract values of all the variables become Top?

Thank you.

[Homework 4] a few more questions about function call

Hi. Sorry for the repetitive questions, but I'm facing many problems with function call.

I have two questions.

  1. Should we check the arity of functions and the calls? That is, how should we treat the case where f(int x) is called with two parameters?
  2. How can we get the label for the parameter of f? By using Llvm.operand, I only get the constant values that are passed as an argument. However, I want to get the x in f(x), and assign a value to this x. I thought Llvm.param could do that, but I'm getting segmentation faults even when the index seems to be within range.

Thank you.

[Homework 4] Format of input programs

Hi,
In the readme, there is a small mistake:

All values can have i32, i32*, i1, or function types in LLVM IR (i.e., no floating points, array, structures, enums, etc). You can ignore other types of values.

An input C program can have arbitrary-depth of pointers like int ****p.
If your implementation follows the formal definitions of the abstract semantics, I believe this does not change your implementation at all.

[Homework 4] new question about worklist algorithm

Hi.
In the narrowing phase of the worklist algorithm, what happens if there are no previous labels that can reach the current working label w, like for the first instruction of the main function?
Screen Shot 2020-05-13 at 8 36 30 AM

As far as I can guess, since there are no l that transition to w, at line 10 it assigns the empty set (bottom) to m_new, and since the if condition is satisfied, it applies narrowing. However, since our narrowing is meet, this results in the new m_new going to bottom as well.

This is exactly what my code is doing at the moment. I made a temporary fix by returning m_old when m_newis bottom, but I'm not sure if this is correct. Can you help me?

Thank you.

[Opam] opam commands not working in VM

Hi. It's my first time using the provided VM via kvpn. I'm encountering some issues with opam while trying to install ocamlformat and merlin.
opam install [something] and opam update opam upgrade all fail with the following output.

opam: "open" failed on /home/student/.opam/is593-4.08.0/.opam-switch/lock: Permission denied
Backtrace:
  Raised by primitive operation at file "src/core/opamSystem.ml", line 845, characters 13-73
  Called from file "src/core/opamFilename.ml" (inlined), line 372, characters 33-82
  Called from file "src/state/opamSwitchState.ml", line 99, characters 4-70
  Called from file "src/state/opamSwitchState.ml", line 797, characters 11-33
  Called from file "src/state/opamRepositoryState.ml", line 265, characters 14-18
  Re-raised at file "src/core/opamStd.ml", line 1164, characters 4-38
  Called from file "src/state/opamGlobalState.ml", line 162, characters 14-18
  Re-raised at file "src/core/opamStd.ml", line 1164, characters 4-38
  Called from file "src_ext/cmdliner/src/cmdliner_term.ml", line 27, characters 19-24
  Called from file "src_ext/cmdliner/src/cmdliner.ml", line 27, characters 27-34
  Called from file "src_ext/cmdliner/src/cmdliner.ml", line 106, characters 32-39
  Called from file "src_ext/cmdliner/src/cmdliner.ml", line 136, characters 18-36
  Called from file "src_ext/cmdliner/src/cmdliner.ml", line 251, characters 22-48
  Called from file "src/client/opamMain.ml", line 209, characters 8-15
  Called from file "src/client/opamMain.ml", line 136, characters 6-10

opam list works fine, and I can check that everything else is pre-installed correctly.
How can I fix this issue? Thank you.

[Final] Announcement

The final exam will happen on Jun 29 (9 - 10:30, including submission) over Zoom. It will be an open-book exam.

  • Instructions:
    • Checkout the repository for final exam: https://classroom.github.com/a/za8czkx-. Sign the document (regarding honor code and personal information collection) and submit before the exam.
    • Turn on your camera during the exam. The Zoom sessions will be recorded for preventing cheating.
    • Example papers will be distributed via Zoom's screen share during the session.
    • Submit your answer sheet to the GitHub repository after the exam. (e.g., upload photos of your handwritten answer sheets) The detailed instructions are described in the repository.
    • no coding questions
    • I will show you the exam paper via Zoom's screen sharing

For someone who is in a different time zone at this moment, we will have a separated session on Jun 29 (14 - 15:30). The detailed instructions will be the same.

If you have any further concerns and suggestions for fairness, feel free to contact me.

[Homework 4] how to handle join of callgraph in worklist algorithm

The worklist algorithm in Lecture 6 joins all memory states that can happen after a transition to current label w.
Screen Shot 2020-05-11 at 2 59 12 AM
However, since the transfer function of our ThriLLVM analyzer returns a pair (callgraph, (location, memory) list), so it seems like we may need to join the returned callgraphs as well.
Is this necessary? Or can we just take the current callgraph and transition according to that?
Thank you.

[Project] Typo in slides

Hi.
Currently, the slides for the project say that the presentation is 6/17, 6/22, 6/24 at one point and 6/15, 6/17, 6/22 at another. Which one is correct?
Thank you.

Safety condition 2 of sparse analysis

Michael and Hyunsoo asked an important question regarding the safety condition during and after today's class. So let me summarize that again here.

Q: How to compute D#pre that satisfies the condition?

In general, spurious defs are indistinguishable from real defs before the main analysis.
But in practice, we can compute safe yet precise def set and use set as follows:

  • If there is definitely no spurious def: for example x = y + 1, x is defined and we know that this is definitely a real def. In this case, the set of spurious defs is the empty set which is obviously a subset of the approximated use set. I.e., def#pre = {x}, use#pre = {y}.

  • If there exist indistinguishable elements: for example *x = *y + 1. In such a case, we can simply define use#pre to subsume def#pre. I.e., use#pre includes all elements in def#pre. Then obviously the condition holds.

[Homework 6] Is there any guideline for hidden test cases?

After implementing top-down algorithm as #41, I'm still not sure that this is enough to pass all the tests, including bottom-up algorithm. I think my bottom-up algorithm may suffer from test cases with many rows, and there may be no exception for other students.

I think it would be better if there's a guideline for other test cases. For example, the maximum number of examples in one csv file, or the maximum number of terms in answers.

Definition of ℙ?

I think ℙ = AllocaSite × 𝕀 so that every alloca in b1 yields the same ptr in each iteration.

while (1) {
    int i;
}
b1:
    %i = alloca i32
    br label %b1

But the specification suggests that it might not be the case. What is the correct definition of ℙ?

Memory is allocated; a pointer is returned. The allocated memory is uninitialized, and loading from uninitialized memory produces an undefined value. The operation itself is undefined if there is insufficient stack space for the allocation.’alloca’d memory is automatically released when the function returns. The ‘alloca’ instruction is commonly used to represent automatic variables that must have an address available. When the function returns (either with the ret or resume instructions), the memory is reclaimed. Allocating zero bytes is legal, but the returned pointer may not be unique. The order in which memory is allocated (ie., which way the stack grows) is not specified.
https://llvm.org/docs/LangRef.html#id201

[Homework 5] Dependency in shell scripts and a minor issue in README

I see problematic dependency in given shell scripts. (test/example*.sh) These scripts assume that target example*.ll file is located at the command execution path.

My suggestion is adding cd "{0%/*}" at the beginning of each shell scripts. For example, example1.sh will become:

#!/bin/bash
cd "${0%/*}"

NAME=example1

clang -o ${NAME} ${NAME}.ll >&/dev/null && ./${NAME}

Then, the output (without implementation of ddmin) becomes:

Iteration 1, # Instrs = 10
-- running dd_global
Fatal error: exception Failure("Not implemented")

My idea is fixing shell scripts in a way that is agnostic to the path being executed (by forcing cd to desired working directory). ref


Also, in README, I think Function is rather Global according to the code (src/debloat.ml):

[Homework 6] Segmentation fault in compilation of output .ll

In example 1, I came up with a program which is printed as follows:

If ((X == Y), If ((Y <= X), X, 1), If ((X == Y), 1, ((Y - 0) + X)))

Although it seems to satisfy the spec of example1.csv, the output .ll file (example1.topdown.ll) causes segmentation fault in compilation with clang.

Is there any limitations in converting pseudo-program (Language.t) to LLVM IR?

[Project] Announcements

Hi all,

I hope you enjoy your project. The project is due on this Sunday. Here are some announcement.

Submission

Please submit the following things in your private repository:

  • Source code
  • README.md: brief description about your info (name + student id), prerequisites, build commands, run commands, etc
  • test examples: under the test directory + test commands
  • Proposal and slides (pdf files): under the doc directory.

For slides, it is okey to upload after your presentation.

Presentation: 10 min + 3 min QnA

Please prepare a microphone and a camera. I will give you access to screen share.

Evaluation

Below is the evaluation sheet:

  • Motivation & Problem:

    • Is the problem well-motivated? [0-5]
    • Is the problem well-designed? [0-5]
    • Is the problem novel? [0-5]
  • Approach & Result:

    • Is the approach complete? [0-5]
    • Are the results significant? [0-5]
    • Is the approach promising? [0-5]
  • Presentation:

    • Is the flow logical and interesting? [0-5]
    • Are the speech and slides clear and readable? [0-5]
    • Are questions answered clearly? [0-5]

[Project & Lecture10] Abstract domains for relational analysis

Hi. I'm currently experiencing some difficulties in defining the abstract domain for my project.
My topic incorporates a type of (directed) relational analysis, and I want to ask how to define abstract domains for such analyses.

I think the octagon domain in Lecture 10 could be a good example to help grasp the intuition. How would one define the abstract domain of the octagon domain? It seems like there's a value for (x,y) instead of a value for x as usual, so I don't understand how the set of integers is mapped into this 2-dimensional domain.

[Homework 4] initial value of Allocsite

Hi. I have a question.

What should be initial value of Allocsite?
For example, what is the value of *a in following code?

#include "homework.h"

int main() {
  int* a = malloc(); // a -> l, l -> ?
  int k = 10 / *a; // error?
  return 0;
}

I guess (Bot, {}, {}) (Top, {}, {}), and ([0,0], {}, {}) are reasonable choices for initial value, but I'm not sure of it.

[Homework 3] Naive fixed point algorithm worklist

Hi, I'm Sungu Lee. I'm now trying working analysis by Naive fixed point algorithm, which seems a worklist algorithm but everytime worklist does not change.
So I tried with D.Worklist.fold trying to gather all of memories in the next step. I wanted to execute program with order (from start to end with respect to the input program) but D.Worklist.fold does not loop Worklist in program order.
So I was wondering if I have to gather "previous labels" even with Naive algorithm, or is there a good way to make sure D.Worklist.fold does loop in correct order.
Thanks!

Memory operation type

The type declarations of memory operation in MEMORY_DOMAIN seem to be wrong.
For example, the signature of weak update doesn't make sense because it accepts a single location.

val weak_update : Location.t -> Value.t -> t -> t

I think this and other functions should get LocSet or Value.t instead of Location.t.

Also the definition 𝕄♯ = 𝔸♯ → 𝕍♯ should be updated to the definition similar to the one in the lecture7 p10.

[General Llvm Question] Confusion between label and instruction

While doing homeworks, I'm still quite confused about difference between label and instruction. It seems like both uses "At_end", or "Before something" for finding the positions.
Are they actually a same thing in OCaml implementation, or is there a subtle difference between them?

Thanks.

[Project] Presenters

Hi,

We will have 5 (17th), 5 (22nd), and 6 (24th) presenters for each day.

I randomly picked up 5 presenters for Wednesday this week:

17th: 안승민, 이선구, 김은진, 최원우, 홍재민

Please leave an ack as a comment here once you read this announcement.

Thanks.

[Homework 3] if_for_if.interval testcase

In the given expected result of the if_for_if.interval analysis, it has got
Potential Division-by-zero @ advanced/if_for_if.c:main:23:31, %call = [0, +oo]
as one of the outputs.

However, my understanding is that since the outer for loop does comparison (i < x), x must be greater than zero (as i has initial value of zero) and so the division-by-zero cannot occur.

I believe this issue is related to one of the questions asked by "Jaehwang Jung" http://klms.kaist.ac.kr/mod/ubboard/article.php?id=347956&bwid=174013
. In other words, I think it is related the issue of filtering both sides when comparisons consist of variable at both sides.

1

[Homework 4] Different definition of abstract Value domain

Currently, V# is defined in Z# * A# * H#.

But LLVM IR is strongly typed, and all type conversion has explicit instructions.

Since we didn't implement type conversion instructions such as inttoptr,
we can assume that there is no type conversion in given IR, and it means that all variables can't be two or more values of Numerical, Location, Function simultaneously.

If the assumption is true, why do we use Z# * A# * H#, instead of Z# ∪ A# ∪ H#?

[Random Program Generator] not working

Hi. I can't seem to get the generator to work. After running make, the "generator" executable is generated. However, running ./generator 5 5 > example.csv returns the error zsh: no such file or directory: ./generator. What am I doing wrong here?

I posted an issue on the generator repo, but realized other students probably weren't following the repository. Has anyone encountered the same problem?

[Homework 4] function and parameters of call

Hello.
I tried to use Llvm.operand to get the function and the parameters of a call instruction, but the operand order doesn't seem to be as I anticipated. (for example 5, operand 0 was the parameter and operand 1 was the function. I'm not sure for cases with more parameters.)
Is there a better way to get the function and the parameters in a call instruction?

[Homework 4] How to define abstract semantics for powerset?

Hi. I'm a little confused with the semantic functions for powerset.
According to the lecture notes, all semantic functions are defined as kind-wise liftings.
However, as far as I understood, pointwise lifting defines a Galois connection for a function to domains connected with Galois connection.
So I'm still confused to how the semantic functions for an element, say cmp for Location, extends to the semantic function for the powerset, in this case cmp for P(Location). Can this also be explained by pointwise lifting? Because if so, I cannot see how.
Thanks.

[Project] Delayed Submission

A student asked me if a delayed submission is acceptable.

Yes. Delayed submissions are acceptable and the penalty is the same as usual.

[Homework 4] Abstract Domains in Documentation

1

I'm confused with the abstract domains given in the documentation.

According to the skeleton code, it seems

  1. M# = A -> V#
  2. V# = Z# x A# x F#
  3. A# = powerset({null} U X U P U H)

Am I missing something?

[HW5] The test script initially failed

Hi. I'm currently getting the error "The test script initially failed" when running ./debloater test/example1.sh test/example1.ll. However, as I understand, this error comes before my code for ddmin is run, so my code is not the cause. This same error happens for all 3 test cases.
What am I doing wrong? I ran make in both the root and the test directory.

[Project] proposal guidelines

Hi. Are there any recommended quidelines regarding the proposal? i.e) what should be in it, how detailed it should be, etc. Also, is it sufficient to submit a 1-page pdf? Thank you.

[Homework 4] correct and precise `filter` function in `Semantics`

In my opinion, filter function should applied recursively.

For example,

%a = alloca i32
%b = load i32, i32* %a
%cmp = icmp eq i32 %b, 0
br i1 %cmp, label %label1, label %label2

then should filter the Location (Variable %b), but also filter the Location (Alloca %a).

Implementing this feature will make analyzer more precise, but I don't know how to achieve it.
Is this possible with only current and next label?

[Homework 4] callgraph information to Semantics.filter

Hi.
Currently, the Semantics.filter function has signature Llvm.llvalue -> bool -> Memory.t -> Memory.t. However, since the function applies update based on the predicate, it seems like it might need the callgraph information to determine whether it should apply strong update or weak update.
What am I missing here?
Thank you.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.