Hey, so I am trying to make the system work with very specific data (non-randomised train/valid/test splits). No matter what, the accuracy never goes above 60%. With your randomised code on the same dataset it goes up to 90% accuracy, but I need to assign specific subsets for train/valid/test.
Any chance you could help somehow?
Newly created list looks something like this: https://gyazo.com/7c6222c37cc8176f3af541ef310709d6 .
Another thing I've noticed is that the IDs in the main data file are in a different order, so instead of "dask.array.image.imread('data/jpg/*.jpg')" I load each file individually and combine them. It still only gives me 60% accuracy. This code seems to work: the outputs of the *.jpg glob and the code below match (with 2 layers ordered as needed):
# Manually load all images into one dataset
def loadAllImages(filePath=""):
    removeFile(filePath)  # helper that deletes any previous output file
    d_list = pd.read_csv("data/list_full.csv")
    # Pre-allocate one array for all 18000 images of 192x192 pixels
    d = np.zeros(shape=(18000, 192, 192), dtype="uint8")
    for i, j in d_list.iterrows():
        di = dask.array.image.imread('data/jpg/' + j["filename"] + '.jpg')
        dn = di.compute()  # shape (1, 192, 192): imread adds a leading file axis
        d[i] = dn[0]       # drop the leading axis before assigning
    ddask = da.from_array(d, chunks=(1, 192, 192))
    ddask.to_hdf5(filePath, 'data')
    print("Finished compiling data", ddask, ddask.shape, ddask.size, type(ddask))
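For reference, here is a minimal self-contained sketch of the per-image assignment step, with tiny dummy arrays standing in for the real 18000 JPEGs (dask.array.image.imread stacks files along a leading axis, so a single-file read comes back as (1, 192, 192)):

```python
import numpy as np

# Dummy stand-ins for the real sizes (18000 images of 192x192)
n_images, h, w = 3, 4, 4
d = np.zeros((n_images, h, w), dtype="uint8")

for i in range(n_images):
    # Each single-file read carries a leading file axis: shape (1, h, w)
    dn = np.full((1, h, w), i, dtype="uint8")
    d[i] = dn[0]  # index away the leading axis before assigning

print(d.shape)  # (3, 4, 4)
```

Assigning the 3-D result directly into a 2-D slot would raise a broadcasting error, which is why the leading axis is indexed away first.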
And instead of the random picking of train/valid/test I've changed to this code, which should in theory reproduce it exactly. It matches the .csv list of files:
full_idx = np.arange(data_size * out_dim)
tr_idx = full_idx[:tr_size * out_dim]
va_idx = full_idx[tr_size * out_dim:(tr_size + va_size) * out_dim]
te_idx = full_idx[(tr_size + va_size) * out_dim:]
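As a quick sanity check that the sequential split is disjoint and covers every index (hypothetical sizes for illustration; in the real run data_size should equal tr_size + va_size + te_size):

```python
import numpy as np

# Hypothetical sizes for illustration only
data_size, tr_size, va_size, out_dim = 100, 70, 15, 2

full_idx = np.arange(data_size * out_dim)
tr_idx = full_idx[:tr_size * out_dim]
va_idx = full_idx[tr_size * out_dim:(tr_size + va_size) * out_dim]
te_idx = full_idx[(tr_size + va_size) * out_dim:]

# The three slices should be disjoint and cover every index exactly once
assert len(tr_idx) + len(va_idx) + len(te_idx) == len(full_idx)
assert not (set(tr_idx) & set(va_idx)) and not (set(va_idx) & set(te_idx))
```

Because the slices are taken in file order, any class ordering in list_full.csv carries straight into the split, which is worth checking if only certain classes end up in te_idx.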
I would love you forever if I could get any more of your help on this one, as this is an amazing system that I really wish to get working properly.