
nanocube's Introduction

Nanocubes: an in-memory data structure for spatiotemporal data cubes

Nanocubes are a fast, in-memory data structure for data cubes, developed at the Information Visualization department at AT&T Labs Research. Visualizations powered by nanocubes can explore datasets with billions of elements at interactive rates in a web browser, and in some cases a nanocube uses so little memory that it can run on a modern laptop.

About this branch

The master branch now contains a new implementation of Nanocubes in the C programming language (version 4.0 onward). The goal of this new implementation is much finer control over every aspect of the data structure, especially memory (allocation, layout). Our original C++ template-based implementation (up to version 3.3) was built on top of the C++ STL (standard library), and while this was a reasonable solution at the time, it had two important downsides: (1) complex serialization, which made it hard to save and load a Nanocube to and from files; and (2) variations in the internal memory layout of a Nanocube depending on the specific STL implementation used.

Here is a link to the new API

Docker Demo

$ git clone https://github.com/laurolins/nanocube.git
$ cd nanocube

# build the docker image
$ docker build . -t nanocube

# run the demo
$ docker run -it --rm -p 12345:80 nanocube

# open http://localhost:12345/ in a browser and zoom into Chicago

Compiling on Linux or Mac

# Dependencies for Ubuntu 18.04
# sudo apt install build-essential curl unzip
#
# Dependencies for Mac OS X 10.13.4
# Xcode

# get the v4 branch
curl -L -O https://github.com/laurolins/nanocube/archive/master.zip
unzip master.zip
cd nanocube-master

# modify INSTALL_DIR to point to another installation folder if needed
export INSTALL_DIR="$(pwd)/install"
./configure --with-polycover --prefix="$INSTALL_DIR"
make
make install

# Test that nanocube is working
$INSTALL_DIR/bin/nanocube

# Add nanocube binaries to the PATH environment variable
export PATH="$INSTALL_DIR/bin":$PATH

Creating and serving a nanocube index

# create a nanocube index for the Chicago Crime dataset (small example included)
# Inputs: (1) CSV data file, (2) mapping file (data/crime50k.map)
# Output: (1) nanocube index called data/crime50k.nanocube
nanocube create <(gunzip -c data/crime50k.csv.gz) data/crime50k.map data/crime50k.nanocube -header

# serve the nanocube index just created on port 51234
nanocube serve 51234 crimes=data/crime50k.nanocube &

# test querying the schema of the index
curl "localhost:51234/schema()"

# test querying the number of indexed records
curl "localhost:51234/format('text');q(crimes)"

# test querying the number of records per crime type
curl "localhost:51234/format('text');q(crimes.b('type',dive(1),'name'))"
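The three example queries above share the same URL shape: clauses joined with ';' after the host and port. A small Python helper makes that explicit (the `nanocube_url` function here is illustrative only, not part of the nanocube tooling):

```python
# Compose nanocube HTTP query URLs like the curl examples above.
# This helper is a hypothetical convenience, not a nanocube API.

def nanocube_url(host, port, *clauses):
    """Join query clauses with ';' as in the nanocube v4 API examples."""
    return "http://%s:%d/%s" % (host, port, ";".join(clauses))

# The three example queries from above:
schema_url = nanocube_url("localhost", 51234, "schema()")
count_url = nanocube_url("localhost", 51234, "format('text')", "q(crimes)")
by_type_url = nanocube_url(
    "localhost", 51234,
    "format('text')", "q(crimes.b('type',dive(1),'name'))")

print(count_url)
# http://localhost:51234/format('text');q(crimes)
```

Each of these URLs can then be fetched with curl or any HTTP client, as shown above.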

For more information on .map files go to mapping files

For more query examples go to API

Viewer

# Setup a web viewer on port 8000 for the crimes nanocube previously opened 
# on port 51234.
#
# Parameters:
#     -s         nanocube backend server (any HTTP-reachable machine)
#     --ncport   nanocube backend port
#     -p         port on localhost where the web viewer will be served
#

nanocube_webconfig -s http://`hostname -f` --ncport 51234 -p 8000

Zoom into the Chicago region to see a heatmap of crimes.


Extra

For more advanced information follow this link: extra

Branch with latest features: link

nanocube's People

Contributors

cscheid, domoritz, fabio-miranda, jklosow, laurolins, salivian


nanocube's Issues

Providing column schema via command line

nanocube-binning-csv expects the column names to be present in the first line of the file.
When dealing with files of 15-20 GB, it is time-consuming to insert that line.

It would be good if we could pass the column schema via the command line instead of as the first line of the file.
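Until such an option exists, one workaround is to prepend the header at read time instead of editing the large file. A minimal Python sketch (`with_header` is a hypothetical helper; the sample data is made up):

```python
import io
import itertools

def with_header(lines, header):
    """Yield a header line followed by the raw data lines, so a
    header-less CSV can be streamed without rewriting the file."""
    return itertools.chain([header + "\n"], lines)

# Example with an in-memory "file"; for a real 20 GB file you would
# pass open("huge.csv") instead and stream the result onward.
data = io.StringIO("1,Theft,41.8,-87.6\n2,Assault,41.9,-87.7\n")
rows = list(with_header(data, "id,type,latitude,longitude"))
print(rows[0].strip())  # id,type,latitude,longitude
```

In bash, process substitution achieves the same effect without a temporary file (the same trick the README uses with nanocube create), e.g. passing <(cat header.txt huge.csv) where a file path is expected.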

Problems about nanocube's configuration and running in ubuntu16.04 LTS

I installed all the dependencies as said in README as follows:

sudo apt-get install build-essential
sudo apt-get install automake
sudo apt-get install libtool
sudo apt-get install zlib1g-dev
sudo apt-get install libboost-all-dev
sudo apt-get install libcurl4-openssl-dev

Then I input the following commands into the console:

wget https://github.com/laurolins/nanocube/archive/3.2.1.zip
unzip 3.2.1.zip
cd nanocube-3.2.1
export NANOCUBE_SRC=`pwd`
./bootstrap
mkdir build
cd build
../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"

Then the problem appeared at the last command (configure).

The error is shown below; do I need to do some extra steps to solve it?

checking for Boost's header version... 
configure: error: invalid value: boost_major_version=

Any ideas about it? Thanks.

can you add a heatmap?

When there are very many points, you can't tell which points are hot. Maybe a heatmap would be useful.

different CORS headers

It would be a good idea to allow users to configure the cross-origin headers emitted by nanocube HTTP servers. Right now we're hard-coding "Access-Control-Allow-Origin: *", which means that if we ever want to add a layer of security, it needs to happen via a proxy. That is possible but painful for users to configure. We could add a command-line option letting users change * to whatever they want.
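Until such an option exists, the proxy route mentioned above can be sketched. The snippet below shows only the header-rewriting step; `rewrite_cors` is a hypothetical helper, and wiring it into an actual reverse proxy (nginx, or Python's http.server) is deployment-specific:

```python
# Sketch of the proxy approach: replace the Access-Control-Allow-Origin
# value that the nanocube server hard-codes to "*".

def rewrite_cors(headers, allowed_origin):
    """Return a copy of (name, value) header pairs with the
    Access-Control-Allow-Origin value replaced."""
    out = []
    for name, value in headers:
        if name.lower() == "access-control-allow-origin":
            value = allowed_origin
        out.append((name, value))
    return out

# Headers as they might arrive from the nanocube backend:
upstream = [("Content-Type", "application/json"),
            ("Access-Control-Allow-Origin", "*")]
print(rewrite_cors(upstream, "https://viewer.example.com"))
```

The viewer origin used here is made up; a real deployment would take it from the proxy's configuration.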

error when compiling on Debian unstable

checking for Boost headers version >= 1.48.0... yes
checking for Boost's header version... 1_58_0
checking for the flags needed to use pthreads... -pthread
checking for the toolset name used by Boost for g++... configure: WARNING: could not figure out which toolset name to use for g++

checking boost/system/error_code.hpp usability... yes
checking boost/system/error_code.hpp presence... yes
checking for boost/system/error_code.hpp... yes
checking for the Boost system library... yes
checking boost/thread.hpp usability... no
checking boost/thread.hpp presence... yes
configure: WARNING: boost/thread.hpp: present but cannot be compiled
configure: WARNING: boost/thread.hpp:     check for missing prerequisite headers?
configure: WARNING: boost/thread.hpp: see the Autoconf documentation
configure: WARNING: boost/thread.hpp:     section "Present But Cannot Be Compiled"
configure: WARNING: boost/thread.hpp: proceeding with the compiler's result
configure: WARNING:     ## ----------------------------------- ##
configure: WARNING:     ## Report this to [email protected] ##
configure: WARNING:     ## ----------------------------------- ##
checking for boost/thread.hpp... no
configure: error: cannot find boost/thread.hpp

~/nanocube-3.2.1/build$ uname -a
Linux mijn 2.6.32-042stab093.4 #1 SMP Mon Aug 11 18:47:39 MSK 2014 x86_64 GNU/Linux

Note I had to edit the configure script because it wasn't getting the Boost header version. Here's how I "fixed" that issue:

boost_cv_lib_version=1_58_0 #`cat conftest.i`

syntax errors timeout instead of failing

I'm generating my queries from a custom language, so I'm making a lot of syntax errors in my queries. Things like missing parentheses and extra parentheses.

Instead of getting an error, these queries wait for about a minute and then return an empty response. It would be more helpful for them to fail, even if they are not specific about the syntax error.

Expose Rest of RESTful API Methods

Noticed in src/nc.cc there are a fair number of unavailable API methods defined.

  • topd
  • unique
  • words
  • ids

The ids method is the one that concerns my team. We would prefer to use nanocubes for our solution, but without a way to retrieve references back to the original dataset records returned from a query, we will be unable to feed nanocube results into an external service.

When will the API be fully developed?

install/configuration on OS X: env variables issue

Dear @laurolins/nanocube support,
I've recently discovered your toolkit and I'm testing it.
The crimes demo on my machine at localhost now works perfectly, BUT...

I usually set exported variables in my bash .profile.
It seems that your configuration and installation depend on the names of the variables and not only on their content.
That's the case for NANOCUBE_SRC: the dependency seems to be on the variable's name rather than its content.
I suppose there is some "hardcoded" dependency on it; I tried renaming it to "NANOCUBE_HOME" in every parameter used during configuration, but it fails at the make/make install step.
I hope my report is helpful for your development.
Best regards
Luca

Compile error

Please could you advise how to get past this error?

mongoose.o: In function `load_dll':
/home/bencevans/Development/nanocube/src/mongoose.c:3834: undefined reference to `dlopen'
/home/bencevans/Development/nanocube/src/mongoose.c:3846: undefined reference to `dlsym'

Data streaming

Is it possible to save the tree to disk, or cut off a piece of it, so that it becomes viable for a data-streaming problem?

which version of pandas is used?

Hi guys,

I want to use nanocube-binning-csv, but find following errors:

AttributeError: 'module' object has no attribute 'isnumeric'

So I downgraded my pandas version to 0.15=6.2; however, it does not accept 'to_numeric'.
So my question is: which version should I use to make it work?

Regards,
Hawk

Running over HTTPS

Hi there! We want to deploy nanocubes through an iframe in web portal(s). We've implemented a replace rule on the application's nginx server turning "http://" and "https://" into just "//" (protocol-independent URLs), but this doesn't appear to be sufficient: the nanocube server-side JavaScript makes http requests, which break if the front-end portal is served over https. Equally, if we hard-code https URLs in the nanocube backend, we can't support portals running over http. Is there a protocol-independent solution we can implement on the nanocube server side? Any guidance would be most appreciated. Kind regards, Mark

Warn when insertions are happening out of order

It might be worth adding a warning to the user anytime an out-of-order (in the TimeSeries dimension) insertion happens. It's an easy mistake to make that triggers a 100x slowdown (it's accidentally just bitten us), and it's not completely obvious that the out-of-order time variable is the culprit.
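A cheap pre-flight check over the time column can catch this before a long build. A sketch in Python (`first_out_of_order` is a hypothetical helper, not part of the nanocube tooling):

```python
def first_out_of_order(timestamps):
    """Return the index of the first timestamp that is smaller than
    its predecessor, or None if the stream is sorted (the fast path
    for time-series insertion)."""
    prev = None
    for i, t in enumerate(timestamps):
        if prev is not None and t < prev:
            return i
        prev = t
    return None

print(first_out_of_order([1, 2, 5, 4, 6]))  # 3
print(first_out_of_order([1, 2, 3]))        # None
```

Running such a check (or sorting the input by time up front) avoids silently hitting the slow insertion path.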

More general querying infrastructure

Our "core" query language is a conjunction of clauses across dimensions. This core language is good in that if the query is resolution-bounded, so is the time it takes to answer the query.

At the same time, very common and important queries fall outside this. (Typical example: difference between two heatmaps).

This issue will track the progress of a general querying infrastructure.
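Until the core language supports it, a heatmap difference can be computed client-side from two separate count queries. A sketch in Python, assuming each result has already been parsed into a dict mapping a tile key to a count (the keys and helper name are hypothetical):

```python
def heatmap_diff(a, b):
    """Per-cell difference a - b of two heatmap results, where each
    result maps a tile/path key to a count. Cells missing from one
    side count as zero."""
    keys = set(a) | set(b)
    return {k: a.get(k, 0) - b.get(k, 0) for k in keys}

# Two made-up query results:
weekday = {"tile_a": 40, "tile_b": 10}
weekend = {"tile_a": 25, "tile_c": 5}
print(heatmap_diff(weekday, weekend))
```

The cost of this client-side approach is transferring both result sets, which is exactly what a server-side difference operator would avoid.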

Compiling problem

Running the following command :

../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"

Gives me the following output :

configure: Detected BOOST_ROOT; continuing with --with-boost=/usr/include/boost157
checking for Boost headers version >= 1.48.0... no
configure: error: cannot find Boost headers version >= 1.48.0

Before that, using the ./bootstrap command, I received the following error :

configure.ac:30: error: possibly undefined macro: LT_LIB_DLLOAD
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation

PS : I have installed boost in my machine before, following this tutorial

Obtain Coordinates of Polygon Corners

According to our project, we would like to know the coordinates of the polygon corners while the user is drawing it, in particular on click event on the map; so we would like to use this code of the library Leaflet JS or something similar:

L.ClickHandler = L.Handler.extend({
  addHooks: function() {
    L.DomEvent.on(document, 'click', this._captureClick, this);
  },

  removeHooks: function() {
    L.DomEvent.off(document, 'click', this._captureClick, this);
  },

  _captureClick: function(event) {
    var latLng = mymap.mouseEventToLatLng(event);
    alert(latLng);
    return latLng;
  }
});

L.Map.addInitHook('addHandler', 'click', L.ClickHandler);

var mymap = L.map('mapid', {
  click: true
}).setView([51.505, -0.09], 13);

and obtaining a similar result.

Our question is: where could we insert this function in your project? Where and how is the map defined in your project?

Thank you for your attention
Best regards,
Nicholas

I got this error while running make; what should I do to fix it?

[root@centos nanocube-master]# make
make all-recursive
make[1]: Entering directory '/tmp/nanocube-master'
Making all in src
make[2]: Entering directory '/tmp/nanocube-master/src'
depbase=echo ncdmp.o | sed 's|[^/]*$|.deps/&|;s|\.o$||';
g++ -DHAVE_CONFIG_H -I. -I.. -I../src -I/usr/include -I/usr/include -D_GLIBCXX_USE_NANOSLEEP -D_GLIBCXX_USE_SCHED_YIELD -pthread -DVERSION="2014.03.25_13:26" -g -O2 -std=c++0x -MT ncdmp.o -MD -MP -MF $depbase.Tpo -c -o ncdmp.o ncdmp.cc &&
mv -f $depbase.Tpo $depbase.Po
In file included from ncdmp_base.hh:1,
from ncdmp.cc:1:
DumpFile.hh:111: error: function definition does not declare parameters
DumpFile.hh: In member function 'bool dumpfile::DumpFileDescription::isBinary() const':
DumpFile.hh:105: error: 'encoding' was not declared in this scope
DumpFile.hh: In member function 'bool dumpfile::DumpFileDescription::isText() const':
DumpFile.hh:107: error: 'encoding' was not declared in this scope
ncdmp.cc: In function 'int main(int, char*)':
ncdmp.cc:226: error: expected initializer before ':' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ';' before '}' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ')' before '}' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ';' before '}' token
ncdmp.cc:232: error: 'struct dumpfile::DumpFileDescription' has no member named 'encoding'
ncdmp.cc:234: error: 'struct dumpfile::DumpFileDescription' has no member named 'encoding'
make[2]: *** [ncdmp.o] Error 1
make[2]: Leaving directory '/tmp/nanocube-master/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/tmp/nanocube-master'
make: *** [all] Error 2

configure problem

hello guys,

I met a stupid problem when I tried to configure it ....

configure: WARNING: boost/thread.hpp: present but cannot be compiled
configure: WARNING: boost/thread.hpp: check for missing prerequisite headers?
configure: WARNING: boost/thread.hpp: see the Autoconf documentation
configure: WARNING: boost/thread.hpp: section "Present But Cannot Be Compiled"
configure: WARNING: boost/thread.hpp: proceeding with the compiler's result

I googled a lot and tried all possible solutions, but still it does not work.

any clue?

system config:
os: ubuntu 14.04.1
gcc/g++ 4.8.2

thanks a lot.

nanocubes 1.0 compiled - now what?

Hi guys,

I compiled 1.0 successfully but now what shall I do to try out the webclient? I assume I have to give some data to stdin of the stree_serve binary?

Thanks so much!
Daniel

what is the coordinate system?

I am in China, and the lat and lng are shifted. I don't know what coordinate system nanocube uses. Is it WGS84?

extents

How does one retrieve the extents of different dimensions?

In particular, the time dimensions have the lower bound specified in the tbin metadata, but is there an upper bound? Are bounds known for the spatial dimensions?
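If the index itself does not report bounds, one workaround is a single pass over the raw input before (or after) building. A Python sketch with a hypothetical helper and made-up sample data:

```python
import csv
import io

def column_extents(rows, columns):
    """Min/max per named column over an iterable of dict rows --
    a one-pass workaround when the index does not report
    dimension bounds."""
    lo, hi = {}, {}
    for row in rows:
        for c in columns:
            v = float(row[c])
            lo[c] = v if c not in lo else min(lo[c], v)
            hi[c] = v if c not in hi else max(hi[c], v)
    return {c: (lo[c], hi[c]) for c in columns}

# In-memory stand-in for the original CSV:
sample = io.StringIO("lat,lon\n41.8,-87.6\n41.9,-87.7\n41.7,-87.5\n")
print(column_extents(csv.DictReader(sample), ["lat", "lon"]))
```

For the time dimension, the same scan would yield the upper bound that the tbin metadata does not provide.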

nanocube pseudocode obscurities

Hello,

I am trying to comprehend the algorithm to build a nanocube. My final goal is to multithread the building process.
To understand the pseudocode which comes with the nanocube paper, I tried to apply it to the illustration on page 2 (Fig. 2).
Adding the first point (o1) works fine and I get the exact same result as illustrated. If I try to add the second point (o2) I run into the following problem:
Starting from the nanocube #1 in Fig. 2, we want to add the second point o2 ((0,1), (01,10) ; IPhone). According to the pseudocode on page 3 (Fig. 3), the following instructions will be performed in this order:
updated_nodes = empty set
ADD(nano_cube, o2, 1, S, ltime, updated_nodes)
[l1, l2] = CHAIN(S, 1)
stack = TRAILPROPERPATH(nano_cube, [l1(o2), l2(o2)])
stack = STACK()
PUSH(stack, nano_cube)
node = nano_cube
child = CHILD(nano_cube, (0,1))
PUSH(stack, (0,1))
node = (0,1)
child = CHILD((0,1), (01,10))
PUSH(stack, (01,10))
node = (01,10)
return stack
child = null
node = POP(stack) // (01,10)
update = false
update = true // Content is proper
ADD(catNode, o2, 2, updated_nodes) // catNode is the node under (01,10)
[ld] = CHAIN(S, 2) //ld is the device labeling function
stack = TRAILPROPERPATH(catNode, [ld(o2)])
stack = STACK()
PUSH(stack, catNode)
node = catNode
child = CHILD(catNode, “IPhone”)
child = NEWPROPERCHILD(catNode, “IPhone”, NODE())
PUSH(stack, “IPhone”)
node = child
return stack
child = null
node = POP(stack) // “IPhone” node
update = false
SETPROPERCONTENT(“IPhone”, SUMMEDTABLETIMESERIES()) //IPhoneTimeSeries
update = true
INSERT(IPhoneTimeSeries, ltime(o2))
INSERT(updated_nodes, IPhoneTimeSeries)
child = “IPhone” // IPhone Node
node = POP(stack) // catNode
update = false
### Weirdness begins here ### Content of catNode is shared and not in updated_nodes
shallowCopy = SHALLOWCOPY(androidTimeSeries) //CONTENT(catNode) is the Android Timeseries, isn’t it?
node_sc = NODE()
SETSHAREDCONTENT(node_sc, CONTENT(androidTimeSeries)) //what is the value of the second argument in this case? What is the content of a timeseries?

Did I do something wrong?

nanocube.js has a bug

Line 168 has a bug: var n_records = data.byteLength / record_size should be changed to var n_records = Math.floor(data.byteLength / record_size);

Barchart selection and coordination bug on JS front-end for v1.0

In the BrightKite demo, if you select multiple bars from one histogram, it only uses the selection that is lowest in index. So if you select both 'Mon' and 'Tue', it will act as though only 'Mon' is selected, even though that isn't true (to see this, note that the time-series chart doesn't change).

OS X: nanocube-binning-csv could not find program

Hello there,
I've checked on the Nabble forum but haven't found a related topic soooo...
Here I am, let's try this beautiful stuff :)
So

  1. my env is:
    - OS X
    - python installed with pandas in a separate env (I followed your guide)

  2. my csv is (header and first line):

type,date,longitude,latitude,gender,age,ethnicity,outcome,clothes_removal
Person search,2014-12-01T00:10:00+00:00,-2.571604,51.414716,Male,over 34,White,Nothing found - no further action,

...many other lines

  3. I execute these commands in a brand new bash:
unset PYTHONHOME
unset PYTHONPATH
source $NANOCUBE_SRC/myPy/bin/activate

  4. creating the dump (some warnings about the date/deprecations etc., but it seems ok):
    nanocube-binning-csv --sep=',' --latcol='latitude' --loncol='longitude' --timecol='date' --catcol='type','gender','age','ethnicity','outcome','clothes_removal' path_to_csv/stop_search.csv > stop_search.dmp

  5. that's the .dmp header (first 37 lines + blank line)

name: same_path_to_csv/stop_search.csv
encoding: binary
metadata: location__origin degrees_mercator_quadtree25
field: location nc_dim_quadtree_25
field: type nc_dim_cat_1
valname: type 0 Person_and_Vehicle_search
valname: type 1 Person_search
field: gender nc_dim_cat_1
valname: gender 1 Male
valname: gender 2 Other
valname: gender 0 Female
field: age nc_dim_cat_1
valname: age 1 18-24
valname: age 2 25-34
valname: age 0 10-17
valname: age 4 under_10
valname: age 3 over_34
field: ethnicity nc_dim_cat_1
valname: ethnicity 3 White
valname: ethnicity 2 Other
valname: ethnicity 1 Black
valname: ethnicity 0 Asian
field: outcome nc_dim_cat_1
valname: outcome 0 Article_found_-Detailed_outcome_unavailable
valname: outcome 2 Nothing_found_-_no_further_action
valname: outcome 6 Suspect_arrested
valname: outcome 4 Offender_given_drugs_possession_warning
valname: outcome 7 Suspect_summonsed_to_court
valname: outcome 1 Local_resolution
valname: outcome 3 Offender_cautioned
valname: outcome 5 Offender_given_penalty_notice
field: clothes_removal nc_dim_cat_1
valname: clothes_removal 1 True
valname: clothes_removal 0 False
metadata: tbin 2014-12-01_00:00:00_3600s
field: date nc_dim_time_2
field: count nc_var_uint_4

  6. here come the problems: trying nanocube-leaf
    cat stop_search.dmp | nanocube-leaf -q 29512 -f 10000
    that's the output:

VERSION: 3.2.1 Could not find program: /Users/my_user_name/nanocube-3.2.1/bin/nc_q25_c1_c1_c1_c1_c1_c1_u2_u4

Where did I mess something up? :(
Thanks in advance,
Luca
Sorry for my verbosity, just wish it could help

"Invalid Path Size"

I'm trying a new dataset, and during the build process, I get

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid Path Size

I don't know if this error message is from FlatTree.hh or from FlatTreeN.hh, though.

I'm happy to share the dataset, but I wanted to confirm this is a bug and not a problem in my data formatting. It happens about 800k elements into the dataset, so some of the dataset does get processed correctly.

Blank page when try crime50k case

Hi All,
I followed the tutorial to install the nanocube from master branch.
I ran this in the scripts folder:
python csv2Nanocube.py --catcol='Primary Type' --latcol='Latitude' --loncol='Longitude' crime50k.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100
In the browser: http://localhost:8000
I can see the control panel, but the map side is blank, with the top-right and bottom-left parts gray.

Is that normal?
Thank you,
Colin

Include clean shutdown mechanism on master branch

Is there a more graceful way to stop a nanocube server than sending SIGQUIT or SIGKILL or whatever? I want to stop the server every night so that it can be reloaded with new data (there are updates released daily).

No, we don’t have that on the master branch. On v1.0 we had a mechanism by sending a shutdown request (something like http://host:port/shutdown=key). I will add an issue to include back this feature on the master branch.

dmp to nanocube (with SF Taxis example)

Hi,

I'm currently trying to pass a custom dmp file to nanocube in order to visualize my dataset.

But firstly, I wanted to test the example with the sftaxi.dmp given here :
https://github.com/laurolins/nanocube/blob/master/web/README

I followed the instructions, but when I go here:
http://localhost:8000/sftaxi_src.html

I see nothing and get the following error in firebug:
TypeError: str is undefined, leaflet-src.js (line 138)

I tried to update leaflet (to 0.7.3), but no more luck.

Do you have any ideas?

EDIT:
Another question, on the same thread. In the example.dmp given on the wiki, the data seems to be encoded, and there is no "header" with fields like in the sftaxi.dmp. Besides, in the sftaxi.dmp the data is written in the clear... I'm a bit lost.

Finally, could you explain the following command to me?

cat sftaxi.dmp | ncdmp --encoding=b \
    dim-dmq=src,src_lat,src_lon,25 \
    dim-dmq=dst,dst_lat,dst_lon,25 \
    dim-tbin=time,time,2008-01-01_1h,2 \
    var-one=count,4 \
  | ncserve --rf=100000 --port=29513

Thank you in advance

Julien

the server isn't running. (crime50k case)

Hi All,

I followed the tutorial to install nanocube
And I run this script in scripts folder:
python csv2Nanocube.py --catcol='Primary Type' crime50k.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100
and I got the following log:
//*********************************************
"VERSION: 2014.03.25_13:26
nc_dim_quadtree_25
quadtree dimension with 25 levels
nc_dim_cat_1
categorical dimension with 1 bytes
nc_dim_time_2
time dimension with 2 bytes
nc_var_uint_4
time dimension with 4 bytes
Dimensions: q25_c1
Variables: u2_u4
Registering handler: query
Registering handler: binquery
Registering handler: binqueryz
Registering handler: tile
Registering handler: tquery
Registering handler: bintquery
Registering handler: bintqueryz
Registering handler: stats
Registering handler: schema
Registering handler: valname
Registering handler: tbin
Registering handler: summary
Registering handler: graphviz
Registering handler: version
Registering handler: timing
Registering handler: start
Starting NanoCubeServer on port 29512
Mongoose starting 100 threads
ncserve: TaggedPointer.hh:47: void tagged_pointer::TaggedPointer<T>::setPointer(T*) [with T = quadtree::Node<flattree::FlatTree<timeseries::TimeSeries<nanocube::TimeSeriesEntryType<boost::mpl::vector<nanocube::u2, nanocube::u4> > > > >]: Assertion `data.aux.tag == 0 || data.aux.tag == 0xFF' failed."
*************************************************************************************************************//

I think that the nanocubes server is not running, and I tried to change the nanocubes server port by editing config.json (in the web folder), but this file does not exist.

Could you give me some advices?

P/s: when I ran the "make" command to compile nanocubes, I got:
//*********************************************
“In file included from ContentHolder.hh:3:0,
from QuadTreeNode.hh:11,
from QuadTree.hh:13,
from NanoCube.hh:34,
from nc.cc:12:
TaggedPointer.hh:14:41: warning: left shift count >= width of type [enabled by default]
static const UInt64 bit47 = (1UL << 47);”
***************************************************************************************************************//
Is that normal?

Best,
LinhTH

Nanocubes for time ranges?

Hi there! I'm grappling with an OLAP style problem and I'm hoping to apply nanocubes, but I'm not entirely sure how well my problem maps to this domain.

I've got an event stream representing changes to a set of entities. Something like 30 million entities, each of which might have a dozen dimensions. New events for each entity could arrive years or seconds apart. There is no spatial component to the data.

I mostly answer queries along the lines of 'at midnight every day between 2015-01-01 and 2015-07-31, how many entities had dimensions A = 1, B = 8, C = 3'. Maybe a colloquial way of stating the problem could be 'at midnight each day, how many people are watching netflix, eating popcorn, and wearing red socks'. My event stream only tells me when events change.

So in Postgres (after months of research into the validity of this approach) I end up building table partitions for each dimension, each row containing the entity id, the dimension's value, and the tsrange for which this fact was true. Then the problem reduces to intersecting time ranges, and building plain old macro scale cubes to cache aggregation results. But the bloat is staggering: ~6 GB of compressed data when unpacked this way and indexed tops 120 GB, and I'm not even considering all the possible dimensions yet. I feel like I'm forcing myself towards a big data problem I shouldn't have.

How might one introduce the concept of an event with a duration into a nanocube? If you can point me in the right direction I'll be sure to contribute some sample code back to the repo :)
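One classical way to reduce durations to point events, which may or may not fit nanocube's count semantics, is to emit a +1 event at the start of each interval and a -1 event at its end; a running sum over the sorted events then gives the number of active entities at any query time. A Python sketch (names and data are illustrative):

```python
def active_counts(intervals, query_times):
    """Count how many [start, end) intervals cover each query time,
    by turning each interval into +1/-1 delta events and integrating."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()
    out = []
    for qt in sorted(query_times):
        # O(n) per query for clarity; a real implementation would
        # walk the sorted events once with a running sum.
        active = sum(delta for t, delta in events if t <= qt)
        out.append((qt, active))
    return out

# Entities "wearing red socks" over [start, end) day ranges:
intervals = [(1, 10), (3, 5), (4, 8)]
print(active_counts(intervals, [2, 4, 6]))
# [(2, 1), (4, 3), (6, 2)]
```

In nanocube terms this would mean indexing two point events per state change rather than one event per interval, and letting the cumulative time-series aggregation do the integration.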

Issue with year dates

Hi,

I've got in my csv file a time column with years, from 1901 to 2014.

When I execute that command, I get no error.
python csv2Nanocube.py --sep="," --catcol='type' --latcol="lat" --loncol="lon" --timecol="year" file.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100

But in my browser, the time plot is not correct. It stops in 1908 and the curves are not correct.

How can I solve my problem? I tried to use the --datefmt but it didn't work.

Julien

parallelizing the build process

Hello,

We at YP Mobile Labs are currently working with this interesting technology to visualize some of our datasets.
We compiled ourselves a nc_q25_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_u2_u4 cube, which works fine, but it takes approximately two weeks to put ~200M points into it.
How can we speed up the build process? We thought about multithreading it. If that is a suitable approach, can you provide us with some information and tips on how to do this in the best way possible?
