kbenoit commented on June 22, 2024

@koheiw I have noticed this too, and you can see it in the current CRAN results for r-patched-linux-x86_64:

Version: 4.0.1
Check: tests
Result: ERROR 
    Running ‘spelling.R’ [0s/1s]
    Running ‘testthat.R’ [127s/147s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > Sys.setenv("R_TESTS" = "")
    > Sys.setenv("_R_CHECK_LENGTH_1_CONDITION_" = TRUE)
    > 
    > library(testthat)
    > library(quanteda)
    Package version: 4.0.1
    Unicode version: 15.0
    ICU version: 72.1
    Parallel computing: 3 of 32 threads used.
    See https://quanteda.io for tutorials and examples.
    > 
    > # for strong tests for Matrix deprecations
    > options(Matrix.warnDeprecatedCoerce = 2)
    > 
    > ops <- quanteda_options()
    > quanteda_options(reset = TRUE)
    > test_check("quanteda")
    b a c e f g 
    1 1 0 0 0 0 
    a c b e f g 
    1 0 1 0 0 0 
    [ FAIL 1 | WARN 0 | SKIP 26 | PASS 3099 ]
    
    ══ Skipped tests (26) ══════════════════════════════════════════════════════════
    • Behaviour changed - consider removing test (1): 'test-tokens.R:120:5'
    • Different behaviour for convert() v. old as.data.frame.dfm() (1):
      'test-as.dfm.R:131:5'
    • On CRAN (5): 'test-tokens-word4.R:108:5', 'test-tokens-word4.R:347:5',
      'test-tokens.R:373:5', 'test-tokens_ngrams.R:105:5', 'test-tokens_xptr.R:1:1'
    • as.numeric(stringi::stri_info()$Unicode.version) > 10 &&
      as.numeric(stringi::stri_info()$ICU.version) > (1): 'test-tokens.R:1036:5'
    • dplyr cannot be loaded (1): 'test-corpus.R:281:5'
    • lda cannot be loaded (1): 'test-convert.R:199:5'
    • not implemented yet (1): 'test-corpus.R:217:4'
    • purrr cannot be loaded (1): 'test-dfm.R:429:5'
    • requires spacyr installation to work (1): 'test-spacyr-methods.R:32:5'
    • skipping test of option setting when quanteda is not attached (1):
      'test-quanteda_options.R:52:5'
    • stm cannot be loaded (3): 'test-convert.R:6:5', 'test-convert.R:34:5',
      'test-convert.R:49:5'
    • text2vec cannot be loaded (1): 'test-fcm.R:2:5'
    • the verbose message has been changed (4): 'test-dfm.R:785:5',
      'test-tokens-word1.R:24:5', 'test-tokens-word4.R:607:5',
      'test-tokens.R:709:5'
    • tidytext cannot be loaded (1): 'test-as.dictionary.R:63:5'
    • topicmodels cannot be loaded (1): 'test-convert.R:174:5'
    • we no longer expect these to be the same (1): 'test-tokens.R:764:5'
    • whether these pass depends on the platform (1): 'test-tokens-custom.R:17:5'
    
    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Failure ('test-fcm.R:55:5'): fcm works with dfm and tokens in the same way ──
    diag(as.matrix(fcmat_toks_doc)) not equivalent to diag(as.matrix(fcmat_toks_win_ord)).
    2/6 mismatches (average diff: 1)
    [2] 1 - 0 ==  1
    [3] 0 - 1 == -1
    
    [ FAIL 1 | WARN 0 | SKIP 26 | PASS 3099 ]
    Error: Test failures
    Execution halted
Flavor: [r-patched-linux-x86_64](https://www.r-project.org/nosvn/R.check/r-patched-linux-x86_64/quanteda-00check.html)

You can see in 0a49557 that I already weakened the test, since I thought it might have something to do with how a new behaviour changed the vector element names (I suspected that one of the functions had deleted them). That is also why I printed them in test-fcm.R:53-54.

I should not have weakened that test - it should always pass with expect_equal(). This appears to be one of those random "multi-threading scrambles something in an unpredictable way" bugs. It didn't show up during the initial CRAN checks (or they would not have accepted it) but did arise, apparently randomly, during the most recent nightly check cycles. In case it disappears again, I've pasted the results above.

koheiw commented on June 22, 2024

I might need to do this in recompile() to make token IDs unique.

std::size_t I = xptr->texts[h].size();
Text text(I);
for (std::size_t i = 0; i < I; i++) {
    if (xptr->texts[h][i] == 0) {
        text[i] = 0;
        count_pad++;
    } else {
        if (asis) {
            text[i] = xptr->texts[h][i]; // for dictionary
        } else {
            if (ids[xptr->texts[h][i] - 1] == 0) {
                ids[xptr->texts[h][i] - 1] = id;
                id++;
            }
            text[i] = ids[xptr->texts[h][i] - 1];
        }
    }
}