Comments (8)

fipoucat avatar fipoucat commented on August 25, 2024

This may be less an issue than a question about preparing data to match the HiClimR data structure. Would it be possible to attach a sample file?

from hiclimr.

hsbadr avatar hsbadr commented on August 25, 2024

The observations (time dimension) should not include any missing values. Since HiClimR does clustering based on correlation distance, all time steps for a specific location/point should be valid. It removes the rows (locations/points) that have any missing values, and that could be all rows if one or more years are missing. You need to remove all columns with missing values manually because otherwise the dissimilarity measure (correlation distance) will represent something else. For example, if you are interested in interannual correlations, it is important to keep valid data for every year instead of providing information at irregular intervals.

Solution: Make sure that you have enough rows (>2) with no missing values or handle missing values before passing the data to HiClimR.
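
The check described above can be sketched in base R before calling HiClimR. This is only an illustration: the matrix `x` below is made-up data (rows are locations, columns are years), and the variable names are assumptions, not part of HiClimR.

```r
# Hypothetical example data: 4 locations (rows) x 5 years (columns).
x <- matrix(c(1, 2, 3, 4, 5,
              2, NA, 4, 5, 6,
              3, 4, 5, 6, 7,
              4, 5, 6, 7, 8),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, as.character(1961:1965)))

# Count missing values per row (location) and per column (year).
na_per_row <- rowSums(is.na(x))
na_per_col <- colSums(is.na(x))

# Keep only the rows whose time series is complete.
complete_rows <- complete.cases(x)
x_complete <- x[complete_rows, , drop = FALSE]

cat("rows with no missing values:", sum(complete_rows), "of", nrow(x), "\n")
cat("years containing missing values:", names(na_per_col)[na_per_col > 0], "\n")
```

If the count of complete rows is very small, the missing values need to be handled (for example by dropping the affected years) before the data are passed on.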

fipoucat avatar fipoucat commented on August 25, 2024

fipoucat avatar fipoucat commented on August 25, 2024

Sorry Hamada,

I had to update the post because R had inserted NAs where the values are zero. I changed that, but there are still some problems. The file looks like this:
1961 1962 1963 1964 1965 1966
[1,] -35.25 -9.75 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[2,] -35.25 -9.25 151.78334045 135.20834351 129.99166870 161.87500000 111.40833282 121.28334045
[3,] -35.25 -8.75 157.24166870 135.23333740 141.16667175 215.67500305 144.05833435 153.30000305
[4,] -35.25 -8.25 161.11666870 129.23333740 139.60833740 189.05000305 129.08334351 168.54167175
[5,] -35.25 -7.75 152.60833740 103.01667023 107.14167023 183.56666565 124.19166565 160.69166565
[6,] -35.25 -7.25 140.84167480 98.59166718 115.30833435 219.06666565 128.42500305 145.00000000
[7,] -35.25 -6.75 132.85000610 87.50833893 113.50833130 203.16667175 121.45833588 124.03333282
[8,] -35.25 -6.25 115.58333588 77.18333435 93.69166565 183.38333130 121.65833282 110.37500000
[9,] -35.25 -5.75 99.72499847 72.81666565 68.29167175 156.61666870 115.56666565 89.47500610
[10,] -35.25 -5.25 80.71666718 61.26666641 52.80000305 130.15834045 95.60833740 65.12500000
[11,] -35.25 -4.75 0

The command I use ends with an error:

y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
             continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
             standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
             members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
             validClimR = TRUE, k = 5, minSize = 1, alpha = 0.01,
             plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

Checking Multivariate Clustering (MVC)...
---> x is a matrix
---> single-variate clustering: 1 variable
Checking data...
---> Checking dimensions...
---> Checking row names...
---> Checking column names...
Data filtering...
---> Computing mean for each row...
---> Checking rows with mean bellow meanThresh...
---> 5697 rows found, mean ≤ 10
---> Computing variance for each row...
---> Checking rows with near-zero-variance...
---> 0 rows found, variance ≤ 0
Data preprocessing...
---> Applying mask...
---> Checking columns with missing values...
---> Removing linear trend...
Error in x - t(fitted(lm(t(x) ~ as.integer(colnames(x))))) :
non-conformable arrays

I extracted a region from a global dataset. Is this a problem? I ask because I see you use a continent such as "Africa". How do I do it for a region? What do you think is still causing the non-conformable arrays error?

hsbadr avatar hsbadr commented on August 25, 2024

What's the size of your matrix? You set the mean threshold to 10, which masks out 5697 rows (try to use meanThresh = 0). Also, check the column names or try to use coarseR (change the steps as you wish, 1 means keeping the original data):

colnames(x) <- NULL
xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 1, latStep = 1)
lon <- xc$lon
lat <- xc$lat
x <- xc$x

Finally, disable standardization and detrending: detrend = FALSE, standardize = FALSE.

It seems to me that HiClimR can't find valid rows in the matrix you provided.
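
One way column names could produce this error can be reproduced in base R. The sketch below is an assumption about the cause, not a confirmed diagnosis: it supposes that the first two columns of the matrix hold lon and lat (as the printed sample suggests) and gives them the hypothetical names "lon" and "lat". The expression `x - t(fitted(lm(t(x) ~ as.integer(colnames(x)))))` is copied from the error message; everything else is made up for illustration.

```r
# Hypothetical matrix shaped like the printed sample: two coordinate
# columns followed by six yearly columns, for five locations.
set.seed(1)
m <- cbind(-35.25, seq(-9.75, -7.75, by = 0.5),
           matrix(runif(5 * 6, 50, 200), nrow = 5))
colnames(m) <- c("lon", "lat", 1961:1966)

# Time index as built in the failing expression: non-year names become NA.
yrs <- suppressWarnings(as.integer(colnames(m)))
print(yrs)

# lm() drops the NA entries, so fitted values exist for 6 of the 8 columns.
fit <- lm(t(m) ~ yrs)
print(dim(m))               # 5 x 8
print(dim(t(fitted(fit))))  # 5 x 6, so m - t(fitted(fit)) is non-conformable

# Possible fix under the same assumption: split coordinates from the data
# so that every remaining column name is a year.
lon <- m[, 1]
lat <- m[, 2]
x <- m[, -(1:2), drop = FALSE]
detrended <- x - t(fitted(lm(t(x) ~ as.integer(colnames(x)))))
print(dim(detrended))       # 5 x 6
```

If the real matrix has a different layout, the same comparison of `dim(x)` against the number of non-NA values in `as.integer(colnames(x))` should still show whether the column names are the source of the mismatch.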

fipoucat avatar fipoucat commented on August 25, 2024

With the settings you gave, it ran without error and produced a plot. My file has 57 years of rainfall data for a window from -10 to 25 lat and -30 to -25 lon.

y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
             continent = "Africa", meanThresh = 0, varThresh = 0, detrend = FALSE,
             standardize = FALSE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
             members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
             validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
             plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

Checking Multivariate Clustering (MVC)...
---> x is a matrix
---> single-variate clustering: 1 variable
Checking data...
---> Checking dimensions...
---> Checking row names...
---> Checking column names...
Data filtering...
---> Computing mean for each row...
---> Checking rows with mean bellow meanThresh...
---> 3735 rows found, mean ≤ 0
---> Computing variance for each row...
---> Checking rows with near-zero-variance...
---> 0 rows found, variance ≤ 0
Data preprocessing...
---> Applying mask...
---> Checking columns with missing values...
Agglomerative Hierarchical Clustering...
---> Computing correlation/dissimilarity matrix...
---> Starting clustering process...
---> Constructing dendrogram tree...
Calling cluster validation...
---> Computing cluster means...
---> Computing inter-cluster correlations...
---> Computing intra-cluster correlations...
---> Computing summary statistics...
Generating region map...

PROCESSING COMPLETED

Running Time:
user system elapsed
5.585 0.518 6.109
Time difference of 6.109582 secs

Maybe I need to adjust the settings so that more rows are considered.

hsbadr avatar hsbadr commented on August 25, 2024

You should be careful when setting thresholds for data processing. For example, meanThresh will mask out the points that receive less rainfall than the threshold value, which could be all of your data depending on the threshold value and the data range/unit. Invalid data with near-zero variance (roughly constant from year to year) will be excluded too.
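
Before choosing thresholds, it can help to count how many rows each candidate value would mask. The base R sketch below uses made-up rainfall values and mirrors the comparisons shown in the verbose log ("mean ≤ meanThresh", "variance ≤ varThresh"). It is an illustration under those assumptions and does not call HiClimR itself.

```r
# Hypothetical data: 6 locations (rows) x 4 years (columns) of rainfall.
x <- rbind(c(0, 0, 0, 0),          # always dry
           c(5, 8, 6, 9),          # mean 7
           c(12, 15, 9, 20),       # mean 14
           c(150, 135, 130, 162),
           c(100, 100, 100, 100),  # constant: zero variance
           c(80, 61, 53, 130))

row_mean <- rowMeans(x)
row_var <- apply(x, 1, var)

# Number of rows each candidate mean threshold would mask.
for (thr in c(0, 10, 50)) {
  cat("meanThresh =", thr, "->", sum(row_mean <= thr),
      "of", nrow(x), "rows masked\n")
}

# Number of rows with no year-to-year variation.
cat("rows with variance <= 0:", sum(row_var <= 0), "\n")
```

Running the same counts on the real matrix shows how much of the domain survives a given threshold, which depends on the units of the data (for example mm per month versus mm per year).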

I'm closing this issue now.
