I'll experiment. Have to see if it works with the slash on 'dataset'
from hdf5.node.
Should work without the slash; just the name
Slash or no slash, I keep getting the same error when I increase the array's length from 10000 to 100000. I'll try to bisect until I find the exact length that triggers this behaviour.
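As a side note, that bisection can be automated; here is a minimal sketch (the predicate below is a stand-in; in the real test it would attempt the makeDataset write at that length and report whether it errored):

```javascript
// Find the smallest length in (lo, hi] for which `fails` returns true,
// assuming failures are monotonic: every length above the threshold fails.
function bisectFailure(lo, hi, fails) {
  while (lo + 1 < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (fails(mid)) {
      hi = mid; // mid fails, so the threshold is at or below mid
    } else {
      lo = mid; // mid succeeds, so the threshold is above mid
    }
  }
  return hi;
}

// Stand-in predicate; a real one would try h5lt.makeDataset at this length.
const threshold = bisectFailure(10000, 100000, (n) => n >= 73902);
console.log(threshold); // smallest failing length under this predicate
```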
It breaks when going from a length of 73901 to 73902.
Also, when I examine the file with h5dump -d /datasetName
, I'm getting the JSON representation of the whole array as the first point in the dataset.
EDIT: I was wrong, apologies. It looks like JSON but it's not JSON.
This is the header for a Uint16 dataset within the same file:
DATASET "/station_id" {
DATATYPE H5T_STD_U16LE
DATASPACE SIMPLE { ( 100 ) / ( 100 ) }
ATTRIBUTE "type" {
DATATYPE H5T_STD_U32LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
}
}
This is the header for the string dataset:
DATASET "/station_name" {
DATATYPE H5T_ARRAY { [100] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} }
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
}
My code is based on the tutorial for variable length strings here: http://hdf-ni.github.io/hdf5.node/tut/dataset-tutorial.html.
What are you filling the Array entries with?
I suppose for a test a random string generator could be used. Or find a text document with over 80,000 lines...
Testing
The following code
var fs = require('fs');
var hdf5 = require('../common/hdf5').hdf5;
var h5lt = require('../common/hdf5').h5lt;
var h5gl = require('../common/hdf5').h5gl;
var path = require('path');
var shortid = require('shortid');
var filePath = path.join(__dirname, 'test-hdf5.h5');
var file = new hdf5.File(filePath, h5gl.Access.ACC_TRUNC);
var length = 10;
var dataset = new Array(length);
for (var i = 0; i < length; i++) {
dataset[i] = shortid.generate();
}
h5lt.makeDataset(file.id, 'test', dataset);
file.close();
produces a file that, when examined via
h5dump -d /test --stride 1 --start 0 --count 1 products/test-hdf5.h5
shows the following:
HDF5 "products/test-hdf5.h5" {
DATASET "/test" {
DATATYPE H5T_ARRAY { [10] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} }
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
SUBSET {
START ( 0 );
STRIDE ( 1 );
COUNT ( 1 );
BLOCK ( 1 );
DATA {
(0): [ "r1_oscv0", "SkeOjocDR", "Sy-doo5DC", "SJfujjcwR", "Bkmuio9PA", "BJNuoo9D0", "rkBdsjqPA", "ry8OssqvR", "ryv_ii5wR", "ryd_ojqw0" ]
}
}
}
}
This is what I was referring to before - it looks like the entire array of strings is being stored as the first point in the dataset, rather than each string being treated as a separate point.
I got a test case set up by reading in a PDB of the rat liver molecule from https://pdb101.rcsb.org/motm/114
It's close to a million lines and cuts out between 70000 and 80000.
So I'm able to repeat and test.
It might have to do with some handle limit on Linux.
For example, on my Ubuntu machine:
cat /proc/sys/fs/file-max
808097
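For reference, file-max is the system-wide limit on open file handles; the per-process descriptor limit is usually the one a single writer process hits first. Both are easy to check:

```shell
# System-wide limit on open file handles (Linux-specific; skipped elsewhere)
[ -r /proc/sys/fs/file-max ] && cat /proc/sys/fs/file-max

# Per-process soft limit on open file descriptors, which is usually the
# binding constraint for a single process
ulimit -n
```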
I guess there are two sides to this - the cut-out, and the dataset being typed as a single array of strings rather than as separate string points.
Happy to contribute in any way I can. Feel free to send tests my way. I'll check the fs limit as soon as I get back home.
> On 9 Oct 2016 6:28 p.m., rimmartin wrote:
> I got a test case set up by reading in a PDB of the rat liver molecule from https://pdb101.rcsb.org/motm/114
> It's close to a million lines and cuts out between 70000 and 80000.
> So able to repeat and test
filename = '/home/jacopo/data-backend/products/gistemp/gistemp.h5', file descriptor = 12, errno = 14, error message = 'Bad address', buf = 0x55c61fcac378, total write size = 422496, bytes this sub-write = 422496, bytes actually written = 18446744073709551615, offset = 1179648
filename = './roothaan.h5', file descriptor = 9, errno = 14, error message = 'Bad address', buf = 0x487f858, total write size = 98400, bytes this sub-write = 98400, bytes actually written = 18446744073709551615, offset = 1183744
The "bytes actually written" value is crazy in both your test and mine; but the error message in both cases is "Bad address".
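For what it's worth, that "bytes actually written" value isn't garbage: 18446744073709551615 is exactly 2^64 - 1, which is what a signed -1 (the usual write-failure return) looks like when printed as an unsigned 64-bit integer. A quick BigInt check:

```javascript
// 2^64 - 1: a signed 64-bit -1 reinterpreted as an unsigned integer,
// matching the "bytes actually written" value in the error output above.
const uint64Max = (1n << 64n) - 1n;
console.log(uint64Max === 18446744073709551615n); // true
```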
With my test as-is, i.e. using shortid.generate()
, I can go up to a length of 73862. A length of 73863 breaks one every two runs (more or less) and 73864 always breaks.
However, switching to the following filler loop only got me up to 73820, breaking on all runs from 73821 going upward.
for (var i = 0; i < length; i++) {
dataset[i] = 'hello ' + i;
}
Lengthening the string to 'helloworldhelloworld ' + i
still got me up to 73820. Curiously enough, inverting the order to i + ' hello'
got me to a different number, 73746.
There must be a pattern but I can't see it ATM. Perhaps we're hitting some kind of limit on how big an array of strings can be within an array of strings
-typed dataset (even though we shouldn't be getting an array of strings
-typed dataset in the first place).
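One way to hunt for the pattern is to total up the payload bytes at each breaking length and compare across fillers; a sketch for the 'hello ' + i filler (73821 is the first failing length reported above):

```javascript
// Total string bytes the dataset holds for the 'hello ' + i filler.
function totalBytes(length) {
  let total = 0;
  for (let i = 0; i < length; i++) {
    total += ('hello ' + i).length;
  }
  return total;
}

console.log(totalBytes(73821)); // byte total at the first failing length
```

Comparing these totals for the different fillers might show whether the threshold tracks total bytes rather than entry count.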
PS: My file-max is 200676.
PS: Can I store fixed-length strings using node.hdf5?
Yeah, I was testing with
dataset[i] = 'hello ' + '\0';
It feels like some limit is being hit; a heap, a stack, something. I may put the question to the HDF Group after I search their mailing list.
Yes, fixed-width was done for table columns. Let me test some; to make it clean I may add an option:
h5lt.makeDataset(file.id, '/dataset', dataset, {fixed_width: 7});
for example
Will continue to look at large sizes of everything, watching for breaks in the system.
That'd be lovely. Happy to test any solution you come up with.
Fixed width is coming. Need to test and work on the reading back to JavaScript.
For writing there is no need to fix the length of the strings; just know the maximum length of them all. If this is too short for one string entry in the Array, an exception will be thrown from the native side to ensure data doesn't get messed up.
h5lt.makeDataset(group.id, "Rat Liver", lines, {fixed_width: maxLength});
should commit this evening
Wonderful, wonderful, wonderful.
Hi, sorry for delay.
h5lt.makeDataset(group.id, "Rat Liver", lines, {fixed_width: maxLength});
now saves nearly 1 million lines from a text file for the rat liver pdb chemistry model. The fixed width is 80 in this case.
Need to test reading back to javascript yet
I'm building their C examples and extending them to work with large data. Otherwise I've mirrored these examples in this project. Their docs don't say chunking is necessary, but it may be needed.
Fixed width is now working. Tested on about a million entries and a ~74 MB h5 file:
h5lt.makeDataset(group.id, "Rat Liver", lines, {fixed_width: maxLength});
var readArray=h5lt.readDataset(group.id, "Rat Liver");
where the array is filled from a text file read and split on "\n":
const lineArr = ratLiver.trim().split("\n");
var lines = new Array(lineArr.length);
var index = 0;
var maxLength = 0;
/* Loop over every line, copying it and tracking the longest length. */
lineArr.forEach(function (line) {
    if (index < lines.length) {
        lines[index] = line;
        if (maxLength < line.length) maxLength = line.length;
    }
    index++;
});
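The same fill can be written more compactly; a sketch of an equivalent using slice and reduce (the sample input string here is made up for illustration):

```javascript
// Example input standing in for the file contents read from disk.
const ratLiver = "one\ntwo\nthree33\n";

const lineArr = ratLiver.trim().split("\n");

// Copy the lines and find the longest line length in a single pass.
const lines = lineArr.slice();
const maxLength = lineArr.reduce(
  (max, line) => Math.max(max, line.length), 0);

console.log(lines.length, maxLength); // 3 7
```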
Relooking at variable length
Variable-length I/O is now working.