
comp461905's Introduction

ELF dynamic linker

A course project for the latter part of COMP461905: Operating Systems, proudly at XJTU.

To make

git clone https://github.com/Qcloud1223/COMP461905.git
cd COMP461905
make

To get your hands on this project, check instructions/ for a step-by-step tutorial.

Evaluate your code

To review your submission as a whole, use:

# you may have to run it with python3 if python points to python2
python autograder.py

To debug an individual test case, use:

python autograder.py test-index

Asking questions

If you have any questions about the code or the environment, please raise an issue.

For questions about grading or something else, feel free to shoot me an e-mail.

Beyond this project

This project is meant to be supporting material for Computer Systems: A Programmer's Perspective, Chapter 7: Linking.

You don't have to be enrolled in our course to use this project. On the contrary, it is designed to be self-explanatory for learners all over the world (and is thus written in English).

Dynamic linking and loading are quite under-documented, and I hope this project improves the situation a little.

comp461905's People

Contributors

qcloud1223


comp461905's Issues

[problem] Offset and Alignment in ELF files

I used "readelf -l" to read the file (SimpleMul.so).
The result is below:

...
Program Headers:(there are 7 program headers, I just show some of them)
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000006f4 0x00000000000006f4  R E     20000
  LOAD           0x0000000000000e30 0x0000000000200e30 0x0000000000200e30
                 0x00000000000001f0 0x00000000000001f8  RW      20000
  DYNAMIC        0x0000000000000e48 0x0000000000200e48 0x0000000000200e48
                 0x0000000000000190 0x0000000000000190  R W     8
  NOTE           0x00000000000001c8 0x00000000000001c8 0x00000000000001c8
                 0x0000000000000024 0x0000000000000024  R       4
...

Here we can see that the p_align of the first and second (LOAD) entries is 20000.
According to the blog post https://blog.csdn.net/u011210147/article/details/54092405, the alignment should be a power of 2, such as 0x1000. But 20000 is not a power of 2.
What's more, the offset of the second entry is 0x0..0e30, which is not a multiple of 4096. The last argument of mmap, offset, must be a multiple of the page size (4 KiB); otherwise mmap fails with "Invalid argument" (the same problem as in a previous issue). In this case, should we align p_offset to the page size ourselves? By the way, mapping the first LOAD segment with mmap works without any problem.
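
A note on the arithmetic in this question, offered as a sketch rather than the project's reference solution. readelf prints the Align column in hexadecimal, so the value shown reads as 0x20000, which is a power of two. As for mmap, a common approach is to round both p_offset and p_vaddr down to a page boundary and map a little more than the segment itself. The helper names below (page_floor, page_ceil, span_for, struct map_span) are made up for illustration, and the page size is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Is x a nonzero power of two? */
static int is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Round x down / up to a multiple of align (align must be a power of two). */
static uint64_t page_floor(uint64_t x, uint64_t align) { return x & ~(align - 1); }
static uint64_t page_ceil(uint64_t x, uint64_t align)  { return (x + align - 1) & ~(align - 1); }

/* Hypothetical summary of what mmap needs for one PT_LOAD segment. */
struct map_span {
    uint64_t file_off;  /* page-aligned file offset to pass to mmap */
    uint64_t vaddr;     /* page-aligned virtual address, relative to the load base */
    uint64_t length;    /* bytes to map, including the padding before the segment */
};

static struct map_span span_for(uint64_t p_offset, uint64_t p_vaddr,
                                uint64_t p_memsz, uint64_t page)
{
    struct map_span s;
    assert(is_pow2(page));
    s.file_off = page_floor(p_offset, page);
    s.vaddr    = page_floor(p_vaddr, page);
    s.length   = page_ceil(p_vaddr + p_memsz, page) - s.vaddr;
    return s;
}
```

For the second LOAD segment above (Offset 0xe30, VirtAddr 0x200e30, MemSiz 0x1f8) and 4 KiB pages, span_for yields file offset 0x0, address 0x200000, and length 0x2000.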

[bug] Instruction.md(test0 part)

This concerns the test0 part of Instruction.md.
It provides the following template code:

// Elf64_Phdr *first_segment;
// int fd = open(path_to_library);
int prot = 0;
prot |= (first_segment->prot && PF_R)? PROT_READ : 0;  
prot |= (first_segment->prot && PF_W)? PROT_WRITE : 0;  
prot |= (first_segment->prot && PF_X)? PROT_EXEC : 0;  
// NULL means "allow OS to pick up address for you"
void *start_addr = mmap(NULL, ALIGN_UP(first_segment->p_memsz, getpagesize()), prot, 
     MAP_FILE | MAP_PRIVATE, fd, first_segment->offset);

Here, the value prot is used to hold the permissions (R, W, X).
You should use '&' rather than '&&' to get the right permissions!
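
To make the difference concrete, here is a minimal sketch of the flag translation with the bitwise operator. The function name flags_to_prot is invented for this example. Note also that the flags field of Elf64_Phdr in <elf.h> is called p_flags, and the offset field p_offset.

```c
#include <elf.h>       /* PF_R, PF_W, PF_X */
#include <stdint.h>
#include <sys/mman.h>  /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Translate ELF segment flags (p_flags) into mmap protection bits.
 * Bitwise '&' tests one bit at a time; logical '&&' would be true for
 * any nonzero p_flags and would therefore grant every permission. */
static int flags_to_prot(uint32_t p_flags)
{
    int prot = 0;
    if (p_flags & PF_R) prot |= PROT_READ;
    if (p_flags & PF_W) prot |= PROT_WRITE;
    if (p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}
```

With '&&', a read-only segment would end up mapped readable, writable, and executable, which hides the bug until a later test depends on the exact permissions.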

Test1: Wrong address returned by dlsym

When I use dlsym to look up the symbol "puts", the address returned is 0x7ffff785eaa0.
However, the base address in my program is 0x7ffff7ff5000, and given the offset of "puts", I expected the address to be 0x7fffff81f5018.
I do not know whether something is wrong; the returned address looks invalid to me.
(P.S. I did get the right address of dyn.)

mmap failed: Invalid argument

environment: VMWare20, Ubuntu 16

I encounter an "Invalid argument" error when using mmap(), as printed by perror().
I was iterating through the program headers, trying to map space for the PT_LOAD segments. I always get this error the second time I call mmap().
My code is here:

// MapLibrary.c
// ...
Elf64_Phdr *phdr = malloc(sizeof(Elf64_Phdr));
void *addr = NULL;
for(int i=0; i<ph_num; i++){
    <redacted>
    
    if(phdr->p_type == PT_LOAD){
        int prot = 0;
        // bitwise '&' tests a single flag bit; '&&' is true for any nonzero p_flags
        prot |= (phdr->p_flags & PF_R)? PROT_READ : 0;
        prot |= (phdr->p_flags & PF_W)? PROT_WRITE : 0;
        prot |= (phdr->p_flags & PF_X)? PROT_EXEC : 0;
        <redacted>
        DEBUG_PRINT(("addr: %d\n", addr));
        
    // ...
    }
}

and I get output like this

(gdb) r
Starting program: /work/COMP461905/build/run-openlib ./test_lib/SimpleMul.so multiply 1 2 3
addr: -134262784
mmap failed: Invalid argument   #error here
addr: -1

I wonder whether this is caused by a wrong argument, as the message indicates, or by something else, such as a defect of the VMware virtual environment.
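
One way to narrow this down is to test the offset in isolation. On Linux, mmap reports EINVAL when the file offset is not a multiple of the page size, independent of the virtualization software. The sketch below uses a made-up helper, try_map, that maps one page of an already open file at a given offset and returns the errno value on failure.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to map one page of fd at the given file offset.
 * Returns 0 on success, or the errno value that mmap reported. */
static int try_map(int fd, off_t offset)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, offset);
    if (p == MAP_FAILED) {
        int err = errno;        /* save it before perror can change it */
        perror("mmap failed");
        return err;
    }
    munmap(p, page);
    return 0;
}
```

Calling try_map(fd, 0) on a large enough file should succeed, while try_map(fd, 0xe30) should return EINVAL, because 0xe30 is not page-aligned. If the second PT_LOAD segment of the library has such an offset, that would explain why only the second call fails.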

Ubuntu 20.04 allocates virtual memory in a non-consecutive way

I had a problem similar to #10 (also a SIGSEGV in setup_hash()). I printed where mmap places each LOAD segment in virtual memory:
[Screenshot: addresses returned by mmap for each LOAD segment]
As the output shows, the default mapping starts at 0x7ffff7ffb000 but ends at 0x7ffff7fbf000. My understanding is that VirtAddr is the offset in virtual memory relative to the first segment. Does that mean a segmentation fault can occur because the dynamic address cannot be calculated the way the instructions describe? Should I hard-code the correct address, change the address argument of mmap to force a contiguous region, or map the virtual memory some other way?

Detect abuse of dlopen and dlsym

A compromise I made while writing this project: because it is nearly impossible for a non-system dynamic linker to load glibc correctly, I fall back on dlopen and dlsym to resolve glibc symbols.

This could cause problems, for example:

// MapLibrary.c
void *MapLibrary(const char *libpath)
{
    LinkMap *l = malloc(sizeof(LinkMap));
    return (l->fakeHandle = dlopen(libpath, RTLD_LAZY));
}

// RelocLibrary.c
void RelocLibrary(LinkMap *lib, int mode)
{
    return;
}

In short, one could call dlopen on every shared object and still pass the autograder.

I fully trust the students working on this project and will keep my grading policies as they are. However, I do want to fix this imperfection.

Possible solutions are as follows:

  1. Use a hash-based method to keep the 'fake load' code segment 'untainted'. The drawback is that this would restrict the way students write their code.
  2. Interpose dlopen. Use a wrapper to count how many times dlopen and dlsym are called, and print the counts to stderr. I can determine how many times these functions need to be called. This seems fine, but it requires making sure the Makefile is not modified.
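
The counting part of the second option could look roughly like the sketch below. Everything here is hypothetical: the wrapper takes the real function as a pointer so the example stays self-contained, and how that pointer gets bound to the actual dlopen (for instance through the linker's --wrap option or an LD_PRELOAD shim) is deliberately left out. fake_open is only a stand-in for a dry run.

```c
#include <stdio.h>

/* Signature shared by dlopen and by any stand-in used for a dry run. */
typedef void *(*open_fn)(const char *path, int mode);

static unsigned long open_calls;   /* how many times the wrapper has run */

/* Count the call, then forward it to the real function. */
static void *counted_open(open_fn real, const char *path, int mode)
{
    open_calls++;
    return real(path, mode);
}

/* Print the tally to stderr so a grading script can compare it
 * with the number of calls it expects. */
static void report_open_calls(void)
{
    fprintf(stderr, "dlopen calls: %lu\n", open_calls);
}

/* Stand-in for dlopen: loads nothing and returns NULL. */
static void *fake_open(const char *path, int mode)
{
    (void)path;
    (void)mode;
    return NULL;
}
```

A submission that calls dlopen on every shared object would then show a higher count than one that only uses it for glibc.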

(README) Ask questions via issues

I know that there are many bugs and defects in my code, and please feel free to directly ask anything about the project here. Your questions could be your fellow students'!

Before raising an issue, I humbly ask you to mind the following points. (In fact, the rules without an '*' also apply when you raise issues in other GitHub repos or ask questions on the Internet.)

  • Check the open and closed issues. Is your question already there?
  • Provide enough information about the cause, and ideally how to reproduce it.
  • Double-check before you post any code or pseudocode, and provide as little as possible. Do not spoil this project, alright?*
  • Only provide necessary help to your classmates (at least in the issues). Helping other students is encouraged because I may not be able to answer all your questions, but don't turn it into another step-by-step instruction. You don't want to watch a movie if you already know its plot.*
  • Keep your question consistent. Only add more information or ask follow-up questions if they are closely related to the issue; otherwise, open a new one.
  • Have a good title. Your title should be a brief summary of your question.

Plus, you can ask questions in both Chinese and English, though I prefer the latter :)

Thank you all who are working on this 🎉🎉🎉!

[test1 problem] Where to fill the symbol's address

I have read the instructions for test 1. I know there are 4 steps to complete:

  1. find relocation table;
  2. process each relocation entry;
  3. find the address of referred symbol (in test 1 the symbol is "puts");
  4. add the address with addend and fill (in test 1 the addend is 0)

We can see that in the .rela.plt section, the r_offset is 0x201018.
After reading the instructions, I have written a program that finds the address of "puts" successfully and can even print "puts".
My problem: where should I write the address? We already know the r_offset is 0x201018; should I write the address into the original file (lib1.so) or into virtual memory? I used gdb to print my result, as shown in the figures below. By the way, I currently write the address into virtual memory.
[Two screenshots of the gdb output]
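
In dynamic linking as it is usually done, relocations are applied to the mapped image in memory and the file on disk is left untouched, so r_offset is added to the load base. Below is a minimal sketch under that assumption. The function name apply_reloc is made up, and it only covers the simple case where the slot receives the symbol address plus the addend.

```c
#include <elf.h>
#include <stdint.h>

/* Apply one relocation of the "symbol address + addend" kind:
 * store the value into the 8-byte slot at load_base + r_offset,
 * that is, in the mapped image in memory and not in the .so file. */
static void apply_reloc(void *load_base, const Elf64_Rela *rela, uint64_t sym_addr)
{
    uint64_t *slot = (uint64_t *)((char *)load_base + rela->r_offset);
    *slot = sym_addr + (uint64_t)rela->r_addend;
}
```

With r_offset 0x201018 and an addend of 0, the address found for "puts" would be stored at base + 0x201018, where base is the address at which the library was mapped.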

(Submission) How to submit your work

Hi everyone, I hope you are doing well. If not, please do not hesitate to ask your classmates and me for help!

Since there are around 10 days left before the deadline, I want to announce the submission procedure now:

The following lines are deleted.

- You only need to submit your code under `src/` to me, and that's it. No report, no binaries.

- To do this, you should create an archive file via:

- tar -cf YourName-YourStudentID.tar src/

- To verify if you've done it right:

- tar -tf YourName-YourStudentID.tar

- and you should see:

- src/
- src/RelocLibrary.c
- src/trampoline.S
- src/Link.h
- src/RuntimeResolve.c
- src/OpenLibrary.c
- src/LoaderInternal.h
- src/FindSymbol.c
- src/Loader.h
- src/MapLibrary.c
- src/InitLibrary.c

Update: Since the university requires me to assign a report, the submission procedure has to change.

I know that many of you are using a VM, where writing a report would be inconvenient. Therefore, you can create a YourName-YourStudentID.zip file containing the src/ folder and a report named *.md/*.docx/*.doc/*.pdf.

The report should briefly summarize how you approached each test case you finished and how you implemented it (e.g. the arguments or the algorithm you used). Feel free to explain with code if you find it necessary. At the end of the report, you should also include a screenshot of your autograder evaluation to give me a lower bound on your score.

Sorry for the confusion caused.

TODO:
GitHub has features that enable automated grading of student submissions, but I'm too busy for that. Plus, networking is also an issue: connections to GitHub are often unstable.

Maybe in the future I can free myself from extracting students' submissions and building them. By then this issue can be closed :)
