Quartz: A DRAM-based performance emulator for NVM

Home Page: https://github.com/HewlettPackard/quartz

License: Other

CMake 1.39% C 86.98% Shell 9.20% Makefile 0.10% C++ 2.33%

quartz's Issues

How to set DRAM+NVM or single NVM?

The documentation does not make clear how to select the emulation mode. As I understand it, the nvmemul.ini file defines the parameters of the NVM, but how do we choose which mode of the emulator (DRAM+NVM or NVM only) to use? Thanks.

Feature request: NVM programming support for mmap()

In my case (running Jikes RVM on NVM), I need a specific virtual memory range mapped to NVM through an API such as pmmap().
Can you give me some hints on where to start patching Quartz?

wonder about statistics

Hello,
during experiments in pure PM mode,
I found that the number of NVM accesses differs greatly from trial to trial.
I only changed the read and write latency in nvmemul.ini.

Is there any other configuration I need to do to get correct emulation results?

The program uses malloc() and free(), and I run the script after loading the nvmemul module:

scripts/runenv.sh prog.exe args

The CPU information is as follows:

2-socket, Haswell, 2-way E5-4650v3

Usage of undocumented performance events on Haswell

Quartz uses the two encodings 0x530cd3 and 0x5303d3 for the events MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM and MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM, respectively. However, these encodings are only documented in the Intel manual for Ivy Bridge and not Haswell. Instead, on Haswell, the encodings to be used should be 0x5304d3 and 0x5301d3, respectively.
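
For readers comparing these values: each raw code follows the IA32_PERFEVTSELx register layout from the Intel SDM, with the event select in bits 0-7, the unit mask in bits 8-15, and control flags above that. The sketch below only illustrates that layout; the type and function names are hypothetical and are not part of Quartz. It shows that the four codes quoted above share event select 0xd3 and flag byte 0x53 and differ only in the unit mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Selected fields of an IA32_PERFEVTSELx value (layout per the Intel SDM).
 * The type and function names are illustrative, not Quartz APIs. */
typedef struct {
    unsigned event;  /* bits 0-7:   event select */
    unsigned umask;  /* bits 8-15:  unit mask */
    unsigned flags;  /* bits 16-23: USR, OS, E, PC, INT, ANY, EN, INV */
    unsigned cmask;  /* bits 24-31: counter mask */
} perfevtsel_fields;

/* Split a raw event code into its fields. */
perfevtsel_fields perfevtsel_decode(uint32_t raw)
{
    perfevtsel_fields f;
    f.event = raw & 0xffu;
    f.umask = (raw >> 8) & 0xffu;
    f.flags = (raw >> 16) & 0xffu;
    f.cmask = (raw >> 24) & 0xffu;
    return f;
}

/* Print one raw code with a label. */
void perfevtsel_print(const char *label, uint32_t raw)
{
    perfevtsel_fields f = perfevtsel_decode(raw);
    printf("%-32s raw=0x%06x event=0x%02x umask=0x%02x flags=0x%02x cmask=%u\n",
           label, (unsigned)raw, f.event, f.umask, f.flags, f.cmask);
}

/* Print the four codes quoted in the issue side by side. */
void perfevtsel_print_issue_codes(void)
{
    /* Both sets of codes select the same event; only the unit mask differs. */
    assert(perfevtsel_decode(0x530cd3).event == perfevtsel_decode(0x5304d3).event);
    perfevtsel_print("REMOTE_DRAM (used by Quartz)", 0x530cd3);
    perfevtsel_print("LOCAL_DRAM  (used by Quartz)", 0x5303d3);
    perfevtsel_print("REMOTE_DRAM (proposed, Haswell)", 0x5304d3);
    perfevtsel_print("LOCAL_DRAM  (proposed, Haswell)", 0x5301d3);
}
```

Calling perfevtsel_print_issue_codes() from a small main() prints the decoded fields, which makes it easier to check each unit mask against the event tables for a given microarchitecture.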

Releasing videocapture object in OpenCV takes more and more time.

I'm running an OpenCV program with Quartz. The program reads a large number of videos from a dataset and extracts some frames from each video. As the loop progresses, the VideoCapture object's release() function takes longer and longer: at the beginning release() takes a few milliseconds, later it takes hundreds of milliseconds, and eventually the program has to wait several seconds for release() to return.

Here is my program:

#include <fstream>
#include <iostream>
#include <string>
#include <cstdio>
#include <random>
#include <algorithm>

#include <opencv2/core/core.hpp>
#include <opencv2/core/version.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/highgui/highgui_c.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/opencv.hpp>

#include <sys/time.h>

#include "/home/liupai/hme-workspace/hme-opencv-test/quartz/src/lib/pmalloc.h"

using namespace std;

void ImageChannelToBuffer(const cv::Mat* img, char* buffer, int c)
{
    int idx = 0;
    for (int h = 0; h < img->rows; ++h) {
        for (int w = 0; w < img->cols; ++w) {
            buffer[idx++] = img->at<cv::Vec3b>(h, w)[c];
        }
    }
}
int data_size = 0;
int read_video_to_volume_datum(const char* filename, const int start_frm,
    const int label, const int length, const int height, const int width,
    const int sampling_rate, char** datum)
{
    cv::VideoCapture cap;
    cv::Mat img, img_origin;

    int offset = 0;
    int channel_size = 0;
    int image_size = 0;
    
    int use_start_frm = start_frm;

    cout << "\n#######Start!!!! cap.open file" << endl;
    cap.open(filename);
    if (!cap.isOpened()) {
        cout << "Cannot open " << filename << endl;
        return false;
    }

    int num_of_frames = cap.get(CV_CAP_PROP_FRAME_COUNT) + 1;
    if (num_of_frames < length * sampling_rate) {
        cerr << filename << " does not have enough frames; having "
             << num_of_frames << endl;
        return false;
    }

    offset = 0;
    if (use_start_frm < 0) {
        cerr << "start frame must be greater or equal to 0" << endl;
    }

    int end_frm = use_start_frm + length * sampling_rate - 1;
    if (end_frm > num_of_frames) {
        cerr << "end frame must be less or equal to num of frames, "
             << "filename: " << filename << endl;
    }

    if (use_start_frm) {
        cout << "\033[31m"
             << "use_start_frm: " << use_start_frm
             << ", end_frame: " << end_frm
             << ", num_of_frames: " << num_of_frames
             << ", filename: " << filename
             << "\033[0m" << endl;
        cap.set(CV_CAP_PROP_POS_FRAMES, use_start_frm - 1);
    }

    for (int i = use_start_frm; i <= end_frm; i += sampling_rate) {
        if (sampling_rate > 1) {
            cap.set(CV_CAP_PROP_POS_FRAMES, i);
        }

        if (height > 0 && width > 0) {
            cap.read(img_origin);
            if (!img_origin.data) {
                cerr << filename << " has no data at frame " << i << endl;
                if (*datum != NULL) {
                    // Free the buffer itself, not the address of the caller's pointer.
                    pfree(*datum, data_size);
                    *datum = NULL;
                }
                cap.release();
                return false;
            }
            cout << "resize img_origin" << endl;
            cv::resize(img_origin, img, cv::Size(width, height));
        } else {
            cap.read(img);
        }

        if (!img.data) {
            cerr << "Could not open or find file " << filename << endl;
            if (*datum != NULL) {
                // Free the buffer itself, not the address of the caller's pointer.
                pfree(*datum, data_size);
                *datum = NULL;
            }
            cap.release();
            return false;
        }

        if (i == use_start_frm) {
            image_size = img.rows * img.cols;
            channel_size = image_size * length;
            data_size = channel_size * 3;
            *datum = (char*)pmalloc(data_size*sizeof(char));
        }

        for (int c = 0; c < 3; c++) {
            ImageChannelToBuffer(&img, *datum + c * channel_size + offset, c);
        }
        cout << "offset = " << offset << endl;
        offset += image_size;
        img_origin.release();
    }
    cout << "\033[32mstart cap.release()\033[0m" << endl;
    struct timeval tv_begin, tv_end;
    gettimeofday(&tv_begin, NULL);
    cap.release();
    gettimeofday(&tv_end, NULL);
    cout << "cap.release(): " << 1000.0*(tv_end.tv_sec - tv_begin.tv_sec)
        + (tv_end.tv_usec - tv_begin.tv_usec)/1000.0 << " ms." << endl;
    cout << "\033[32mend cap.release()\033[0m" << endl;
    return true;
}

void shuffle_clips(vector<int>& shuffle_index){
    std::random_device rd;
    std::mt19937 g(rd());
    std::shuffle(shuffle_index.begin(), shuffle_index.end(), g);
}

int main()
{
    const string root_folder = "/home/liupai/hme-workspace/train-data/UCF-101/";
    const string list_file = "/home/liupai/hme-workspace/workspace/C3D/C3D-nvram/examples/c3d_ucf101_finetuning/train_02.lst";

    cout << "opening file: " << list_file << endl;
    std::ifstream list(list_file.c_str());

    vector<string> file_list_;
    vector<int> start_frm_list_;
    vector<int> label_list_;
    vector<int> shuffle_index_;

    int count = 0;
    string filename;
    int start_frm, label;
    while (list >> filename >> start_frm >> label) {
        file_list_.push_back(filename);
        start_frm_list_.push_back(start_frm);
        label_list_.push_back(label);
        shuffle_index_.push_back(count);
        count++;
    }
    shuffle_clips(shuffle_index_);

    const int dataset_size = shuffle_index_.size();
    const int batch_size = 30;
    const int new_length = 8;
    const int new_height = 128;
    const int new_width = 171;
    const int sampling_rate = 1;
    char* datum = NULL;
    int lines_id_ = 0;

    const int max_iter = 20000;
    for (int iter = 0; iter < max_iter; ++iter) {
        
        for (int item_id = 0; item_id < batch_size; ++item_id) {
            cout << "------> iter: " << iter << endl;
            bool read_status;
            int id = shuffle_index_[lines_id_];
            read_status = read_video_to_volume_datum((root_folder + file_list_[id]).c_str(), start_frm_list_[id],
                label_list_[id], new_length, new_height, new_width, sampling_rate, &datum);
            if (read_status) {
                pfree(datum, data_size);
                datum = NULL;  // do not leave a stale pointer for the next call
            }

            lines_id_++;
            if (lines_id_ >= dataset_size) {
                // We have reached the end. Restart from the first.
                cout << "Restarting data prefetching from start." << endl;
                lines_id_ = 0;
            }
        }
    }
    cout << "$$$$$$$$$$$$$$ read file finish!!!!!!!!!!!!" << endl;
}

Here is the output:

# At the beginning

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 65, end_frame: 72, num_of_frames: 179, filename: /home/liupai/hme-workspace/train-data/UCF-101/PlayingViolin/v_PlayingViolin_g24_c02.avi
...
start cap.release()
cap.release(): 3.018 ms.
end cap.release()

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 1, end_frame: 8, num_of_frames: 202, filename: /home/liupai/hme-workspace/train-data/UCF-101/TrampolineJumping/v_TrampolineJumping_g18_c01.avi
...
start cap.release()
cap.release(): 3.062 ms.
end cap.release()

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 81, end_frame: 88, num_of_frames: 296, filename: /home/liupai/hme-workspace/train-data/UCF-101/PommelHorse/v_PommelHorse_g12_c03.avi
...
start cap.release()
cap.release(): 2.453 ms.
end cap.release()

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 272, filename: /home/liupai/hme-workspace/train-data/UCF-101/StillRings/v_StillRings_g22_c04.avi
...
start cap.release()
cap.release(): 2.146 ms.
end cap.release()

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 225, end_frame: 232, num_of_frames: 252, filename: /home/liupai/hme-workspace/train-data/UCF-101/HeadMassage/v_HeadMassage_g08_c03.avi
...
start cap.release()
cap.release(): 2.136 ms.
end cap.release()

------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 106, filename: /home/liupai/hme-workspace/train-data/UCF-101/Bowling/v_Bowling_g19_c07.avi
...
start cap.release()
cap.release(): 3.315 ms.
end cap.release()

# After about 400 iterations

------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 113, end_frame: 120, num_of_frames: 376, filename: /home/liupai/hme-workspace/train-data/UCF-101/Kayaking/v_Kayaking_g13_c04.avi
...
start cap.release()
cap.release(): 301.021 ms.
end cap.release()

------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 141, filename: /home/liupai/hme-workspace/train-data/UCF-101/ApplyLipstick/v_ApplyLipstick_g20_c04.avi
...
start cap.release()
cap.release(): 301.74 ms.
end cap.release()

------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 209, end_frame: 216, num_of_frames: 230, filename: /home/liupai/hme-workspace/train-data/UCF-101/BlowDryHair/v_BlowDryHair_g18_c03.avi
...
start cap.release()
cap.release(): 302.311 ms.
end cap.release()

------> iter: 438
#######Start!!!! cap.open file
use_start_frm: 177, end_frame: 184, num_of_frames: 307, filename: /home/liupai/hme-workspace/train-data/UCF-101/BoxingPunchingBag/v_BoxingPunchingBag_g08_c01.avi
....
start cap.release()
cap.release(): 351.546 ms.
end cap.release()

------> iter: 438
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 113, filename: /home/liupai/hme-workspace/train-data/UCF-101/FrontCrawl/v_FrontCrawl_g21_c06.avi
...
start cap.release()
cap.release(): 292.598 ms.
end cap.release()
------> iter: 438

Only print Debug messages when I run my app!

I set latency=true and debug level=5.
When my program runs, only the Debug messages are printed; none of the program's own output appears.
Would you have any suggestions on how to resolve this?
Thank you!

Will the write latency in NVM-only mode or DRAM + NVM (hybrid) mode be the same as the write latency of DRAM?

Since Quartz does not implement write memory latency yet, as mentioned in the Limitations section of the README file, does this mean that any write operation performed in NVM-only mode or DRAM + NVM mode will have the same write latency as DRAM?

Errors regarding copy_from_user and libelf-dev are emitted during build

When I compile Quartz on Ubuntu 16.04 with kernel 4.15.0.29, I get three errors:

Makefile:976: "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel"

/home/hadi/code/quartz/build/src/dev/pmc.c: In function ‘pmc_ioctl_setcounter’:
/home/hadi/code/quartz/build/src/dev/pmc.c:171:9: error: implicit declaration of function ‘copy_from_user’ [-Werror=implicit-function-declaration]
if (copy_from_user(&q, (ioctl_query_setcounter_t*) arg, sizeof(ioctl_query_

/home/hadi/code/quartz/build/src/dev/pmc.c: In function ‘pmc_ioctl_getpci’:
/home/hadi/code/quartz/build/src/dev/pmc.c:224:17: error: implicit declaration of function ‘copy_to_user’ [-Werror=implicit-function-declaration]
if (copy_to_user((ioctl_query_setgetpci_t*) arg, &q, sizeof(ioctl_q

I resolved the first error by installing libelf-dev. Note that this library is not included in the script scripts/install.sh. I resolved the other two errors by modifying pmc.c so that it includes linux/uaccess.h instead of asm/uaccess.h.

After making these changes, the build completes successfully.

Segmentation Fault Error

Hello, I am trying to execute an application through the emulator. The application runs successfully on the native machine. I link it with the emulator by adding the following flags at compile time:

-I/NVMemul/quartz/src/lib/ -L/NVMemul/quartz/build/src/lib/ -lnvmemul

However, when I try to execute the app through the runenv.sh script I receive the following error:

../quartz/scripts/../build/src/lib/libnvmemul.so
../quartz/scripts/../nvmemul.ini
../quartz/scripts/runenv.sh: line 57: 25128 Segmentation fault (core dumped) $@

I have executed applications in the past successfully with these flags. Is there anything else that I am missing?

Need help to get quartz to work on skylake cpus

Hi, I'm trying to get Quartz to work on Skylake CPUs.
According to the paper, LDM_STALL is derived from L2 stalls, L3 hits and L3 misses, which in turn are derived from performance counter events that differ across CPU microarchitectures. I looked up the events used on Haswell (with the papi_native_avail command) and found that most of them still exist on Skylake, except CYCLE_ACTIVITY:STALLS_L2_PENDING. The closest event I know of is CYCLE_ACTIVITY:STALLS_L2_MISS, which counts execution stalls while at least one L2 demand load is outstanding, but I'm not sure. Does anyone know which Skylake event is equivalent?

By the way, I'm trying to access the native event counters directly instead of going through PAPI, for performance reasons. So I have to assemble the integer form of the event ID, similar to the number 0x55305a3 used here. Are there any useful references on how this event ID is encoded?
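
As a rough illustration of how such an integer can be assembled: raw event codes of this kind follow the IA32_PERFEVTSELx layout described in the Intel SDM (event select in bits 0-7, unit mask in bits 8-15, control flags in bits 16-23, counter mask in bits 24-31). The sketch below is an assumption-laden example, not Quartz code; the macro and function names are hypothetical. Under that layout, 0x55305a3 splits into event select 0xa3, unit mask 0x05, the USR, OS, INT and EN flags, and counter mask 5.

```c
#include <assert.h>
#include <stdint.h>

/* Control flag bits of IA32_PERFEVTSELx (per the Intel SDM).
 * Macro and function names here are illustrative only. */
#define PERFEVTSEL_USR (1u << 16)  /* count in user mode */
#define PERFEVTSEL_OS  (1u << 17)  /* count in kernel mode */
#define PERFEVTSEL_INT (1u << 20)  /* interrupt on overflow */
#define PERFEVTSEL_EN  (1u << 22)  /* enable the counter */

/* Assemble a raw event code from its fields. */
uint32_t perfevtsel_encode(uint8_t event, uint8_t umask,
                           uint8_t cmask, uint32_t flags)
{
    /* Only the control bits 16-23 may be passed in flags. */
    assert((flags & ~0x00ff0000u) == 0);
    return (uint32_t)event
         | ((uint32_t)umask << 8)
         | flags
         | ((uint32_t)cmask << 24);
}
```

For example, perfevtsel_encode(0xa3, 0x05, 5, PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_INT | PERFEVTSEL_EN) yields 0x055305a3. The event select and unit mask for a particular microarchitecture still have to come from the vendor's event tables.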

What is pflush()?

Hi,

I found there is a pflush() function in the code. Do we need to call it in our user programs in order to inject the PM latency we want?

Failed in Multiple emulated processes and MPI programs

I exported the EMUL_LOCAL_PROCESSES environment variable with the number of emulated processes on the host. I also tried both the NVM-only model and the DRAM+NVM model to run MPI programs and the PARSEC benchmark with multiple processes, but I can't get any results.

There is a fatal error in MPI_Finalize, and I don't know how to use Quartz to test multiple programs. How should I deal with this problem?

Can quartz support other cpus?

My CPU is a 7th-gen Core i5, which is Kaby Lake rather than one of the three CPUs mentioned in the article. Can I still build and run Quartz successfully?
Besides, can I run Quartz in a virtual machine with a Linux OS?

Is write latency mode supported?

Hi there!

Is write latency mode supported now?

If not, what does the write latency setting in the configuration file do?

Thanks!

no rule to make target pmc.o

Hi,
I met the following error while compiling the Quartz code:

[root@localhost build]# make
[ 8%] Built target cpu
[ 82%] Built target nvmemul
[ 86%] Device]
make[5]: *** No rule to make target `/home/sbl/Quartz/quartz-master/build/src/dev/pmc.o', needed by `/home/sbl/Quartz/quartz-master/build/src/dev/nvmemul.o'. Stop.
make[4]: *** [module/home/sbl/Quartz/quartz-master/build/src/dev] Error 2
gmake[3]: *** [all] Error 2
make[2]: *** [src/dev/nvmemul.ko] Error 2
make[1]: *** [src/dev/CMakeFiles/dev_build.dir/all] Error 2
make: *** [all] Error 2

The environment I use is a 2-socket Xeon 5600 / CentOS 7 / Linux 4.10 / gcc 4.8.5.
I have installed all the packages required by README.md, and compiled the code with the following steps:

mkdir build
cd build
cmake ..
make

and the aforementioned error occurs...
Any suggestions?

Thank you very much.

a question about emulation of DRAM+NVM mode

Hello, I have a question about emulating DRAM+NVM mode. Which type of memory (DRAM or NVM) is affected by the latency set in the nvmemul.ini file? When I use NVM-only mode, I find that performance changes with the latency set in ./nvmemul.ini, even if I don't use pmalloc and pfree. Can you describe how to emulate DRAM+NVM mode? Thank you!

0 NVM access

When I run a program I don't get correct output, especially for the NVM access count. Is it due to this error?
"tee: /sys/bus/event_source/devices/cpu/rdpmc: No file or folder of this type"

tee: /sys/bus/event_source/devices/cpu/rdpmc: No file or folder of this type

If you hit this error:
tee: /sys/bus/event_source/devices/cpu/rdpmc: No file or folder of this type

I think the cause might be the following:

Your Quartz is built on a virtual machine. According to https://stackoverflow.com/questions/19763070/ubuntu-12-10-perf-stat-not-supported-cycles/44253130#44253130, I guess RDPMC is still unavailable on most virtual machines (at least, I tried Ubuntu 14.04, 16.04 and 18.08 and CentOS 7.0 with Linux kernels 4.4 and 4.11 respectively).
I am still exploring other ways to support virtual machines within Quartz.
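
A quick way to check for this condition before loading the emulator is to test whether the sysfs file named in the error message can be read at all. The snippet below is a minimal sketch under the assumption of a Linux system; the helper names are hypothetical and are not part of Quartz. It returns the integer stored in the file, or -1 when the file is missing or unreadable, which is the case described above.

```c
#include <assert.h>
#include <stdio.h>

/* Path taken from the error message in this issue. */
#define RDPMC_SYSFS_PATH "/sys/bus/event_source/devices/cpu/rdpmc"

/* Read a single integer from a sysfs-style file.
 * Returns the value, or -1 if the file is missing or cannot be parsed. */
int read_sysfs_int(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 if the rdpmc control file exists and is readable, else 0. */
int rdpmc_sysfs_present(void)
{
    return read_sysfs_int(RDPMC_SYSFS_PATH) >= 0;
}
```

If rdpmc_sysfs_present() returns 0, the tee command in the setup script has nothing to write to, which matches the error shown in this issue.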

Unable to run Bandwidth Model

I am having difficulty running the bandwidth-model-building.sh where I am getting a segfault error. I have checked the configuration files to make sure things are set as instructed and with debugging find that the segfault occurs when the intel_xeon_ex_get_throttle_register's regs is set to 0x00 (image of work-space below).
Would you have any suggestions on how to resolve this?

Additional Info:

- I am able to run memlat-orig-lat-test.sh and memlat-bench-test-10M.sh without any issues.
- I have run this on an i7-3740QM (Ivy Bridge) and an i7-4700MQ (Haswell).

How to run with numactl?

When Quartz runs in DRAM+NVM mode, it emulates the NVM on one (remote) node and injects the latency (perhaps only read latency?).

So can I treat memory accesses to the remote node's DRAM as NVM accesses?

If so, can I use numactl or mbind to bind the app's memory to that node in order to run it in NVM? What should I change in nvmemul.ini?

Some issues in pure PM mode...

I set Quartz to pure PM mode by setting physical_nodes = "0" in nvmemul.ini, and set both the read and write latency to 1000. I then ran a test program, which calls malloc() more than 100000 times, through runenv.sh; its runtime is about 0.13 seconds. If I run it without runenv.sh, the runtime is about 0.12s. If I increase the read/write latencies to 10000 and run it through runenv.sh, the runtime is about 0.22s.

However, if I replace malloc()/free() with pmalloc()/pfree() in the program, the runtime is about 2.2s. This means that in pure PM mode there is an obvious performance gap between pmalloc() and malloc(). But based on my understanding of the README file, pmalloc() and malloc() should perform similarly in a pure PM environment. Am I missing something?

Executing an application in Quartz

Statistics showing 0 NVM accesses for a simple linked list code using pmalloc

I used the following sample code, which uses pmalloc for a linked list:

#include<stdio.h>
#include<stdlib.h>
#include "pmalloc.h"

typedef struct node
{
	int data;
	struct node *next;
}NODE;

void insertAtFront(NODE **head,int x)
{
	NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
	new_node->data = x;
	new_node->next = *head;
	*head = new_node;
}


void insertAfter(NODE *prev,int x)
{
	if(prev==NULL)
	{
		printf("prev can't be NULL\n");
		return;
	}
	NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
	new_node->data = x;
	new_node->next = prev->next;
	prev->next = new_node;
}

void append(NODE **head,int x)
{
	NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
	new_node->data = x;
	new_node->next = NULL;

	NODE *last = *head;
	if(*head==NULL)
	{
		*head = new_node;
		return;
	}
	while(last->next != NULL)
		last = last->next;
	last->next = new_node;
}

void printList(NODE *p)
{
	while(p)
	{
		printf("%d->",p->data);
		p = p->next;
	}
	printf("\n");
}

void deleteElement(NODE **p,int elem)
{
	NODE *temp=*p;
	NODE *prev=NULL;
	if(temp != NULL && temp->data == elem) // if elem is at first node
	{
		*p = temp->next;
		pfree(temp, sizeof(NODE)); // nodes come from pmalloc, so release them with pfree
		return;                    // do not touch temp after it has been freed
	}
	while(temp!=NULL && temp->data!=elem)
	{
		prev=temp;
		temp=temp->next;
	}
	if(temp==NULL) return; // no such element
	prev->next = temp->next;
	pfree(temp, sizeof(NODE));
}

void deleteAtPosition(NODE **p,int pos)
{
	if(*p==NULL) return;
	NODE *temp = *p;
	if(pos==0)
	{
		*p = temp->next;
		pfree(temp, sizeof(NODE)); // nodes come from pmalloc
		return;
	}
	int i;
	for(i=0;temp!=NULL && i<pos-1;i++)
		temp = temp->next;   // ultimately gets previous node of the node to be deleted
	if(temp==NULL || temp->next==NULL)
		return;
	NODE *next = temp->next->next;
	pfree(temp->next, sizeof(NODE)); // nodes come from pmalloc
	temp->next = next;
}

int getLength(NODE *p)
{
	int count = 0;
	while(p)
	{
		count++;
		p = p->next;
	}
	return count;
}

int getLengthRecursive(NODE *p)
{
	if(p==NULL)
		return 0;
	return 1 + getLengthRecursive(p->next);

}

void swapNodes(NODE **p,int x, int y)  
{
	if(x==y) 
		return;

	NODE *prevX=NULL, *prevY=NULL,*X=*p,*Y=*p;

	while(X!=NULL && X->data != x)
	{
		prevX = X;
		X = X->next;
	}

	while(Y!=NULL && Y->data !=	y)
	{
		prevY = Y;
		Y = Y->next;
	}
	
	if(X==NULL || Y == NULL)
		return;

	if(prevX==NULL)
		*p = Y;
	else
		prevX->next = Y;

	if(prevY==NULL)
		*p = X;
	else
		prevY->next = X;

	NODE *temp = X->next;
	X->next = Y->next;
	Y->next = temp;
	
}

void reverse(NODE **p)
{
	NODE *prev=NULL,*curr=*p,*next;
	while(curr!=NULL)
	{
		next = curr->next;
		curr->next=prev;
		prev = curr;
		curr = next;
	}
	*p = prev;	
}

void reverseRecursive(NODE **p)
{
	NODE *node = *p;
	if(node == NULL)
		return;
	NODE *rest = (*p)->next;
	if(rest==NULL)
		return;
	reverseRecursive(&rest);
	node->next->next = node;
	node->next = NULL;
	*p = rest;
}

int main()
{
	NODE *head = NULL;
	append(&head,1);
	insertAtFront(&head,2);
	append(&head,3);
	insertAfter(head->next,10);
	printList(head);
	printf("Length: %d \n",getLength(head));
	printf("Length Recursive: %d \n", getLengthRecursive(head));
	//deleteElement(&head,1);
	printList(head);
	//deleteAtPosition(&head,1);
	printList(head);
	printf("Length: %d \n",getLength(head));
	printf("Length Recursive: %d \n", getLengthRecursive(head));
	swapNodes(&head,2,1);
	printList(head);
	reverse(&head);
	printList(head);
	reverseRecursive(&head);
	printList(head);
	return 0;
}

The contents of my current directory look like this:

plinkedlist.c
src < src directory of quartz >
scripts < scripts directory of quartz>
build < build file of quartz>
nvmemul.ini
nvmemul.dox
nvmemul-orig.ini
a.out < the program executable>

I have compiled the file using the following commands
gcc -I src/lib/ plinkedlist.c -L build/src/lib/ -lnvmemul
sudo scripts/setupdev.sh load
scripts/runenv.sh ./a.out

I get the correct program output but in the statistics I get 0 NVM accesses, even though this is untrue.

Statistics Output:


===== STATISTICS (Thu Nov 23 22:22:17 2017) =====

PID: 18718
Initialization duration: 2136458 usec
Running threads: 0
Terminated threads: 1

== Running threads == 

== Terminated threads == 
	Thread id [18718]
		: cpu id: 0
		: spawn timestamp: 632629839714
		: termination timestamp: 632629839811
		: execution time: 97 usecs
		: stall cycles: 0
		: NVM accesses: 0
		: latency calculation overhead cycles: 0
		: injected delay cycles: 0
		: injected delay in usec: 0
		: longest epoch duration: 0 usec
		: shortest epoch duration: 0 usec
		: average epoch duration: 0 usec
		: number of epochs: 0
		: epochs which didn't reach min duration: 0
		: static epochs requested: 0

Is there any reason/mistake I'm making?

Compiling a C++ program with Quartz returns errors

I use Quartz with my own C++ file and compile it with the command:
g++ -I [Eumlator_Path]/quartz/src/lib/ myprogram.cpp -L [Eumlator_Path]/quartz/build/src/lib/ -lnvmemul,
(it works well for a .c file with the gcc compiler)
But it fails with these errors:
/usr/include/c++/6/ext/string_conversions.h: In constructor ‘__gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)::_Save_errno::_Save_errno()’:
/usr/include/c++/6/ext/string_conversions.h:63:27: error: ‘errno’ was not declared in this scope
_Save_errno() : _M_errno(errno) { errno = 0; }
^
/usr/include/c++/6/ext/string_conversions.h: In destructor ‘__gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)::_Save_errno::~_Save_errno()’:
/usr/include/c++/6/ext/string_conversions.h:64:23: error: ‘errno’ was not declared in this scope
~_Save_errno() { if (errno == 0) errno = _M_errno; }
^
/usr/include/c++/6/ext/string_conversions.h: In function ‘_Ret __gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)’:
/usr/include/c++/6/ext/string_conversions.h:72:16: error: ‘errno’ was not declared in this scope
else if (errno == ERANGE
^
In file included from /usr/include/c++/6/bits/basic_string.h:5420:0,
from /usr/include/c++/6/string:52,
from /usr/include/c++/6/bits/locale_classes.h:40,
from /usr/include/c++/6/bits/ios_base.h:41,
from /usr/include/c++/6/ios:42,
from /usr/include/c++/6/ostream:38,
from /usr/include/c++/6/iostream:39,
from /home/lishuai/fwang/quartz/reram_test.cpp:2:
/usr/include/c++/6/ext/string_conversions.h:72:25: error: ‘ERANGE’ was not declared in this scope
else if (errno == ERANGE

The original error output had many more "XXX was not declared in this scope" messages, and I have already fixed some of them. For the rest, I need help.
Has anybody met the same problem, or can anyone give some suggestions? Thank you.

wrmsr:pwrite: Input/output error Turbo Boost disabled for all CPUs?

Quartz affecting the application thread sleep time in hybrid (DRAM+NVM) mode.

I wrote a sample program that randomly allocates memory in DRAM (using malloc) and NVM (using pmalloc), with a background thread that is supposed to print the total bytes allocated in NVM and DRAM every second.

#include <iostream>
#include <cstdlib>
#include <chrono>
#include <thread>
#include <pthread.h>

using namespace std::chrono;

size_t nvm_size = 0;
size_t dram_size = 0;
high_resolution_clock::time_point start;
high_resolution_clock::time_point stop;
bool status = true;

void print_all() {
    stop = high_resolution_clock::now();
    milliseconds time = duration_cast<milliseconds>(stop-start);
    std::cout << time.count() << "\t" << nvm_size << "\t" << dram_size << std::endl;
}

void start_time() {
    start = high_resolution_clock::now();
    while (status) {
        print_all();
        std::this_thread::sleep_for(seconds(1));
    }
}

void stop_time() {
    status = false;
}

void add_nvm_size(size_t size) {
    nvm_size += size;
}

void remove_nvm_size(size_t size) {
    nvm_size -= size;
}

void add_dram_size(size_t size) {
    dram_size += size;
}

void remove_dram_size(size_t size) {
    dram_size -= size;
}

// void *allocate_nvm(size_t size) {
//     return pmalloc(size);
// }

void *allocate_dram (size_t size) {
    return malloc(size);
}

int main(int argc, char *argv[]) {
    std::thread (start_time).detach();

    int count=1;
    
    while(count<=10000000) {
        int random = rand() % 4;

        if (random==0) {

            allocate_dram (67108864);
            add_dram_size(67108864);
            // std::cout<<count<<"- Allocated in DRAM"<<"\tDRAM SIZE: "<<dram_size<<std::endl;


        }
        else if(random==1){

            allocate_dram (67108864);
            add_nvm_size(67108864);
            // std::cout<<count<<"- Allocated in NVRAM"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;

        }
        else if(random==2){

            if(dram_size>=67108864) {
                remove_dram_size(67108864);
                // std::cout<<count<<"- Freed from DRAM"<<"\tDRAM SIZE: "<<dram_size<<std::endl;

            }
            // else
                // std::cout<<count<<"- Not Enough Memory Allocated in DRAM to be freed"<<"\tDRAM SIZE: "<<dram_size<<std::endl;



        }
        else if(random==3){

            if(nvm_size>=67108864) {
                remove_nvm_size(67108864);
                // std::cout<<count<<"- Freed from NVRAM"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;

            }
            // else
                // std::cout<<count<<"- Not Enough Memory Allocated in NVRAM to be freed"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;


        }

        count++;

    }
    stop_time();
    return 0;
}

The program above produces correct output outside Quartz: it prints a line every second. The left column is the time in milliseconds, followed by the bytes allocated in NVM and the bytes allocated in DRAM.

time    NVM    DRAM
0	201326592	0
1000	2885681152	1275068416
2000	30735859712	3288334336
3000	16911433728	138512695296
4040	37983617024	191797133312
5042	14159970304	129654325248
6361	38453379072	189918085120
7363	33554432000	108045271040
8365	15099494400	109521666048
9366	24763170816	117306294272

When I run this program with Quartz in hybrid mode, it prints a line every 10 milliseconds.

0	268435456	7650410496
10	1879048192	10401873920
20	2415919104	6845104128
30	2483027968	12616466432
40	3556769792	11811160064
50	4496293888	11408506880
60	536870912	17783848960
70	11072962560	16575889408
80	8120172544	9663676416
90	8657043456	7583301632
100	4966055936	268435456
110	939524096	1006632960
120	1476395008	2617245696
130	1946157056	10066329600
140	1073741824	14898167808
150	2281701376	15502147584
160	1744830464	17448304640
...

So Quartz does not affect the functionality of the thread, but it does affect the thread's sleep time.
I have not set EMUL_LOCAL_PROCESSES. Do I need to? Also, why would Quartz affect only the sleep time of an application thread?
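
One way to narrow this down is to time the sleep with a clock and a retry loop that the test program controls itself. The sketch below is a diagnostic under stated assumptions, not a statement about how Quartz works: it assumes a POSIX system, the function name sleep_ms_measured is hypothetical, and whether signals or the clock source are actually involved is not established by this report. It restarts nanosleep() with the remaining time whenever the call returns early with EINTR, and reports the elapsed time measured on CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for at least `ms` milliseconds and return the elapsed time in
 * milliseconds as measured on CLOCK_MONOTONIC.
 * Hypothetical helper for diagnosis; not part of Quartz. */
long sleep_ms_measured(long ms)
{
    assert(ms >= 0);

    struct timespec begin, end;
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };

    clock_gettime(CLOCK_MONOTONIC, &begin);

    /* If a signal interrupts the sleep, nanosleep() stores the remaining
     * time in its second argument, so calling it again finishes the wait. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
        /* retry with the remaining time */
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    long long elapsed_ns = (long long)(end.tv_sec - begin.tv_sec) * 1000000000LL
                         + (end.tv_nsec - begin.tv_nsec);
    return (long)(elapsed_ns / 1000000LL);
}
```

Calling sleep_ms_measured(1000) from the background thread and printing the return value would show whether the thread really sleeps for about one second under the emulator, independently of the clock used for the time column in the output above.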

CPUs seem inactive

I tried to use Quartz on a machine with 12 CPUs. However, when I run htop after the emulation, only 2 CPUs seem to be active. How can I restore my CPUs to their initial state?

Broadwell Intel processors Not supported

Hello, my server's CPU is a Xeon E5-2630 v4 @ 2.20GHz, a Broadwell processor, and it is not supported.
Could you please modify the program to support this processor?
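
For anyone attempting to add a new processor, the family and model numbers that identify a microarchitecture can be derived from the EAX value returned by CPUID leaf 1. The sketch below shows that decoding as documented in the Intel SDM; it is not taken from Quartz, the type and function names are hypothetical, and the signature 0x406F1 is used only as an assumed example of a Broadwell server part (family 6, model 0x4F).

```c
#include <assert.h>
#include <stdint.h>

/* Display family/model/stepping derived from CPUID leaf 1, EAX.
 * Names are illustrative; this is not how Quartz is known to detect CPUs. */
typedef struct {
    unsigned family;
    unsigned model;
    unsigned stepping;
} cpu_signature;

cpu_signature cpu_signature_decode(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xfu;
    unsigned base_model  = (eax >> 4) & 0xfu;
    unsigned ext_family  = (eax >> 20) & 0xffu;
    unsigned ext_model   = (eax >> 16) & 0xfu;
    cpu_signature sig;

    sig.stepping = eax & 0xfu;
    /* The extended family is only added when the base family is 0xF. */
    sig.family = (base_family == 0xfu) ? base_family + ext_family : base_family;
    /* The extended model is only used for base family 6 or 0xF. */
    sig.model = (base_family == 0x6u || base_family == 0xfu)
              ? (ext_model << 4) | base_model
              : base_model;
    return sig;
}

/* Example: the assumed signature 0x406F1 decodes to family 6, model 0x4F. */
void cpu_signature_example(void)
{
    cpu_signature sig = cpu_signature_decode(0x000406F1u);
    assert(sig.family == 6 && sig.model == 0x4F && sig.stepping == 1);
}
```

A new entry for an unsupported processor would presumably be keyed on these family and model values, together with the performance event encodings that are valid for that microarchitecture.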

Make issue

Hi,
I hit the following error while compiling the Quartz code. When I ran make clean all following your steps, I got this:

[ 69%] Building C object src/lib/CMakeFiles/nvmemul.dir/stat.c.o
/home/ZHduan/quartz/src/lib/stat.c:19:20: fatal error: utlist.h: No such file or directory
#include "utlist.h"
^
compilation terminated.
make[2]: *** [src/lib/CMakeFiles/nvmemul.dir/stat.c.o] Error 1
make[1]: *** [src/lib/CMakeFiles/nvmemul.dir/all] Error 2
make: *** [all] Error 2

You said that no specific Linux distribution or kernel version is required, so what is wrong?
My environment is an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, CentOS with Linux 3.11.0, and gcc version 4.8.5.

Can we change the Memory capacity that we simulate?

Since the capacity of PCM can be larger than DRAM, can we set the NVM to be larger than the DRAM when we do the simulation?

Thank you very much.

Can’t run Qemu in Quartz.

When I run this:
./scripts/runenv.sh qemu-system --enable-kvm -cpu host -m 8192 -smp 2 -vcpu 0,affinity=0 -vcpu 1,affinity=1 -numa node,mem=4096,cpus=0 -numa node,mem=4096,cpus=1 -drive file=/home/temp/Dyang/centos7-200.qcow2,if=none,id=drive-virtio-disk,format=qcow2 -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk,id=virtio-disk -net nic,model=virtio -net tap,script=no -monitor telnet:10.192.168.118:4444,server,nowait -balloon virtio

I get an unexpected error.
qemu-system: ……qemu-gfn/qemu/accel/kvm/kvm-all.c:2380: kvm_ipi_signal: Assertion `kvm_immediate_exit' failed

I set the debug level to 5 but found nothing in the Quartz output.
But when I run qemu-system without Quartz, it works.

kvm_ipi_signal calls kvm_cpu_kick to atomic_set(&cpu->kvm_run->immediate_exit, 1).
According to this reference (https://patchwork.ozlabs.org/patch/732808/?tdsourcetag=s_pctim_aiomsg):

The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick" a VCPU out of KVM_RUN through a POSIX signal. A signal is attached to a dummy signal handler; by blocking the signal outside KVM_RUN and unblocking it inside, this possible race is closed:

    VCPU thread                       service thread

    check flag
                                      set flag
                                      raise signal
    (signal handler does nothing)
    KVM_RUN

However, one issue with KVM_SET_SIGNAL_MASK is that it has to take tsk->sighand->siglock on every KVM_RUN. This lock is often on a remote NUMA node, because it is on the node of a thread's creator. Taking this lock can be very expensive if there are many userspace exits (as is the case for SMP Windows VMs without Hyper-V reference time counter).
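
The block-outside, unblock-inside pattern described in the quoted text can be tried out without KVM. The following is a minimal sketch in plain POSIX C, not QEMU or Quartz code: sigsuspend() stands in for KVM_RUN as the only point where the signal is unblocked, the handler records a flag purely so the delivery can be observed, and the names kick_handler and run_kick_demo are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler so the demo can observe when the kick is delivered. */
static volatile sig_atomic_t kicked = 0;

/* Dummy handler: it only records that the kick arrived. */
static void kick_handler(int signo)
{
    (void)signo;
    kicked = 1;
}

/* Hypothetical demo. Returns 1 if a kick raised while the signal was
 * blocked is delivered inside the sigsuspend() window, 0 otherwise. */
int run_kick_demo(void)
{
    struct sigaction sa;
    sigset_t block, old, inside;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    /* Block the kick signal everywhere outside the "run" window. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return 0;

    kicked = 0;
    raise(SIGUSR1);       /* the kick arrives early ... */
    assert(kicked == 0);  /* ... but stays pending while blocked */

    /* Unblock the signal only for the duration of the "run" call. */
    inside = old;
    sigdelset(&inside, SIGUSR1);
    sigsuspend(&inside);  /* returns after the pending kick is handled */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return kicked == 1;
}
```

Because the signal can only be delivered inside the window, a kick raised between the flag check and the run call is not lost; it simply interrupts the run call as soon as it starts.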

Since Quartz generates IPI interrupt injection delays through remote NUMA node memory accesses, will this affect KVM? Does Quartz support QEMU? Does Quartz have any influence on KVM?

Setting read throttling register?

Hello,

Bandwidth throttling worked for me only after I also set the THROTTLE_DDR_READ registers in the __set_read_bw function, specifically for runs after the training phase, when the bandwidth model file is already present. Is this correct?

__set_read_bw() {
    ...
    node->cpu_model->set_throttle_register(regs, THROTTLE_DDR_ACT,
                                           read_bw_model.throttle_reg_val[point]);
    // Added statement
    node->cpu_model->set_throttle_register(regs, THROTTLE_DDR_READ,
                                           read_bw_model.throttle_reg_val[point]);
    ...
}

Not able to run scala programs

ERROR: ld.so: object 'scripts/../build/src/lib/libnvmemul.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
/usr/bin/scala: line 19: cd: /usr/share/scala/bin

Hi, I am getting the above error when I try to run Scala programs. There are no issues when I run Java and C applications.

This is the code I am trying to run:

object ForLoop {
  def main(args: Array[String]) {
    var a = 0;
    for( a <- 1 to 100){
      println( "Value of a: " + a );
    }
  }
}

The above code works as usual with plain scala, but when run with Quartz it returns an error. This is the command I used to run it:
$ scripts/runenv.sh scala ForLoop

Note:

I have extended Quartz to support Broadwell-based processors and ran benchmark tests to verify the configurations. The results looked fine.
C and Java applications have no issues with quartz.
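
One possible cause, not confirmed anywhere in this thread: the preload path in the error message is relative (scripts/../build/src/lib/libnvmemul.so), and the dynamic loader opens a relative LD_PRELOAD entry against the working directory of whichever process is starting. If the scala launcher script changes directory before starting the JVM, the library would no longer be found from there. A minimal C sketch of resolving the path once, up front, with realpath(); the helper names to_absolute_path and preload_absolute are hypothetical and are not part of Quartz.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Resolve `path` against the current working directory and copy the
 * canonical absolute form into `out`. Returns 0 on success, -1 if the
 * file does not exist or the result does not fit in `out_len` bytes. */
int to_absolute_path(const char *path, char *out, size_t out_len)
{
    char resolved[PATH_MAX];

    assert(path != NULL && out != NULL);
    if (realpath(path, resolved) == NULL)
        return -1;
    if (strlen(resolved) >= out_len)
        return -1;
    strcpy(out, resolved);
    return 0;
}

/* Hypothetical helper: export LD_PRELOAD as an absolute path so that a
 * later change of directory in a child launcher script cannot break it.
 * Returns 0 on success, -1 on failure. */
int preload_absolute(const char *lib_path)
{
    char resolved[PATH_MAX];

    if (to_absolute_path(lib_path, resolved, sizeof resolved) != 0)
        return -1;
    return setenv("LD_PRELOAD", resolved, 1);
}
```

A wrapper would call preload_absolute() from the Quartz root directory before starting the target program. If the error persists with an absolute path, the cause lies elsewhere.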

Some questions about bandwidth emulator in DRAM+NVM mode

I have read the README.md file and am confused about the bandwidth emulation.
Consider a dual-socket NUMA environment in which node1 is configured as a virtual NVM node. Does that mean all memory requests to node1's local memory are affected by the bandwidth emulation, even if the process is running on node0?
And what if a process running on node1 accesses the local memory of node0? Will it be affected by the bandwidth emulation?
Sorry if these questions seem basic; I'm not familiar with memory access in a NUMA environment.

Computer configuration

My computer is a very common personal computer: Intel Core i3, Ubuntu 14.04. Can I install Quartz successfully?

CPU support issue

Looks like by default, only 3 CPUs are supported:

In /src/lib/cpu/known_cpus.h line 21:

cpu_model_t* known_cpus[] = {
    &cpu_model_intel_xeon_ex_v3,
    &cpu_model_intel_xeon_ex_v2,
    &cpu_model_intel_xeon_ex,
    0
};

My question is: can we add our own CPU model names to this list without causing any trouble, as long as the CPU is in one of the three supported processor families (Sandy Bridge, Ivy Bridge, and Haswell)?
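
Several reports on this page (Broadwell parts hitting "No supported processor found") come down to a lookup in a fixed table like known_cpus[]. The sketch below illustrates how such a NULL-terminated table can be searched by CPUID family and model. It is not Quartz's actual matching code: the type cpu_model_sketch_t, its fields, and the function names are hypothetical, and whether Quartz matches on CPUID values or on the model name string is not stated here. The family 6 model numbers are the ones Intel uses for the server parts of each generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Quartz's cpu_model_t; fields are assumptions. */
typedef struct {
    const char *name;
    unsigned family;
    unsigned model;
} cpu_model_sketch_t;

static const cpu_model_sketch_t haswell_x      = { "Haswell server",      6, 0x3F };
static const cpu_model_sketch_t ivy_bridge_x   = { "Ivy Bridge server",   6, 0x3E };
static const cpu_model_sketch_t sandy_bridge_x = { "Sandy Bridge server", 6, 0x2D };

/* NULL-terminated table, mirroring the layout of known_cpus[]. */
static const cpu_model_sketch_t *const sketch_known_cpus[] = {
    &haswell_x,
    &ivy_bridge_x,
    &sandy_bridge_x,
    NULL
};

/* Decode the display family and model from CPUID leaf 1 EAX. */
void decode_cpuid_signature(uint32_t eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    *family = (base_family == 0xF) ? base_family + ext_family : base_family;
    *model  = (base_family == 0x6 || base_family == 0xF)
                  ? ((ext_model << 4) | base_model)
                  : base_model;
}

/* Return the matching table entry, or NULL if the CPU is not listed. */
const cpu_model_sketch_t *find_cpu_model(const cpu_model_sketch_t *const *table,
                                         uint32_t eax)
{
    unsigned family, model;

    assert(table != NULL);
    decode_cpuid_signature(eax, &family, &model);
    for (; *table != NULL; ++table) {
        if ((*table)->family == family && (*table)->model == model)
            return *table;
    }
    return NULL;
}
```

With this table, the signature 0x306F2 (a Haswell server part) resolves to an entry, while 0x406F1 (a Broadwell server part, model 0x4F) returns NULL. Adding an entry makes the lookup succeed, but it says nothing about whether the performance counters and throttling registers behind that entry behave the same on the new part.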

Do Scala programs run on Quartz?

I just wanted to know whether anyone has been able to run Scala programs on Quartz. If so, what changes did you make to get them running?

Running an application in Quartz

scripts/runenv.sh <your_app>
This is the command given for running an application.
I am trying to run the dhrystone-2.1 benchmark, but I am not sure how to run it under Quartz.
Please let me know how to run the dhrystone-2.1 benchmark with Quartz.

NVM delay does not work in the middle of program execution

Hello

I have a problem with the NVM read delay.

As the size of the data increases, the NVM read delay seems to stop working partway through program execution. If the data size is small, it works well.

I attached a picture, captured in debug mode, of the part where the delay did not work.

What should I do?

I look forward to your reply

My Experiment setup

Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz (2 socket)
Linux Kernel : 4.4.0-31-generic
Ubuntu 14.04.5 LTS
RAM : 256GB

Error in loading kernel module

Hey, I am getting the error "Unable to load kernel module" when I execute the command below:

sudo scripts/setupdev.sh load

How do I fix this?

Kernel module loading failed when run setupdev.sh load

Hi guys!
I ran into a problem when running the command "sudo scripts/setupdev.sh load". It reports that kernel module loading failed, and I don't know how to fix it. I was following the README step by step.
My OS is 4.15.0-46-generic #49-Ubuntu.
I believe the prerequisites are installed correctly, because when I run apt-get install for them it says:

cmake is already the newest version (3.10.2-1ubuntu2).
libconfig-dev is already the newest version (1.5-0.4).
libnuma-dev is already the newest version (2.0.11-2.1).
uthash-dev is already the newest version (2.0.2-1).

When I use "apt-get install linux-headers-$(uname -r)" to install the Linux headers, it says:
linux-headers-4.15.0-46-generic is already the newest version (4.15.0-46.49).

I don't know whether there is a version incompatibility problem. Could anybody help me?
Big Thanks!

xeon E5-2620 v4 @2.10GHz No supported processor found

Hello, I would like to ask a question. My server's CPU model is Xeon E5-2620 v4 @ 2.10GHz. When I run the runenv.sh script, it prints "[16811] ERROR: No supported processor found". I want to determine whether this processor meets the requirements.

The number of physical nodes is greater than the number of memory-controller pci buses

When I first ran the bandwidth benchmark in the benchmarktest directory, it printed "The number of physical nodes is greater than the number of memory-controller pci buses". The resulting output is shown below:

It says that the topology mc pci file was saved, but there is no data in /tmp/mc_pci_bus.
When I then ran the bandwidth benchmark again, it reported that no complete memory-controller pci topology could be found, followed by a segmentation fault. The resulting output is shown below:

Thanks in advance for any help! My CPU model is Haswell.

How to modify xeon-ex.h to support Core CPU?

Hi, I'm trying to get Quartz to work on my Core Skylake CPUs.
According to the paper, the bandwidth model utilizes thermal control registers. On Xeon, the corresponding register is THRT_PWR_DIMM[0:2]. I looked through the register documentation for Core and found no register named THRT_PWR_DIMM. There also appears to be no register on Core that sets the maximum number of transactions during the 1 usec throttling time frame per power throttling. Is it possible for the bandwidth model to work on Core CPUs?

Can DRAM+NVM mode run on Sandy Bridge ?

My CPU is an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz with two sockets.

When I load the module, it detects the processor as Sandy Bridge.

Does Quartz support DRAM+NVM mode on Sandy Bridge?

Unexplained Building Error

I've been trying to set up Quartz for a few days. After struggling far more than I should have, I've reached a dead end. When I attempt to run the make clean all command from within quartz-master/build, I run into the error pictured in the screenshot below. (This is my second time running the instruction, hence why it begins at 77%.)

I'm assuming the error is that nvmemul.ko is undefined. The only potential cause I can identify is that when I ran scripts/install.sh, the report says that 13 packages were not upgraded. I have configured my CMakeLists.txt file so that there were no errors during the cmake .. command. I haven't been able to find anything by searching, and a friend who is well-versed in Unix did not understand why this error occurred.

I have also tried doing the build instructions from within quartz-master/src. The instructions are not clear what is meant by "the emulator's source code root folder". However, this causes an error 4% into the cmake .. command, so I'm guessing using quartz-master/src is not the solution.

Computer Information:
Intel® Core™ i7-2600S CPU @ 2.80GHz × 8 (Sandy Bridge, I believe)
AMD® Turks / AMD® Turks
Ubuntu 21.04, 64-bit
The Ubuntu installation is a partition running natively on a ~2013 iMac.
