"randomly" failing test-fcm (quanteda issue, closed)

bastistician commented on June 22, 2024

Comments (2)

kbenoit commented on June 22, 2024

@koheiw I have noticed this too, and you can see it in the current CRAN results for r-patched-linux-x86_64:

Version: 4.0.1
Check: tests
Result: ERROR 
    Running ‘spelling.R’ [0s/1s]
    Running ‘testthat.R’ [127s/147s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > Sys.setenv("R_TESTS" = "")
    > Sys.setenv("_R_CHECK_LENGTH_1_CONDITION_" = TRUE)
    > 
    > library(testthat)
    > library(quanteda)
    Package version: 4.0.1
    Unicode version: 15.0
    ICU version: 72.1
    Parallel computing: 3 of 32 threads used.
    See https://quanteda.io for tutorials and examples.
    > 
    > # for strong tests for Matrix deprecations
    > options(Matrix.warnDeprecatedCoerce = 2)
    > 
    > ops <- quanteda_options()
    > quanteda_options(reset = TRUE)
    > test_check("quanteda")
    b a c e f g 
    1 1 0 0 0 0 
    a c b e f g 
    1 0 1 0 0 0 
    [ FAIL 1 | WARN 0 | SKIP 26 | PASS 3099 ]
    
    ══ Skipped tests (26) ══════════════════════════════════════════════════════════
    • Behaviour changed - consider removing test (1): 'test-tokens.R:120:5'
    • Different behaviour for convert() v. old as.data.frame.dfm() (1):
      'test-as.dfm.R:131:5'
    • On CRAN (5): 'test-tokens-word4.R:108:5', 'test-tokens-word4.R:347:5',
      'test-tokens.R:373:5', 'test-tokens_ngrams.R:105:5', 'test-tokens_xptr.R:1:1'
    • as.numeric(stringi::stri_info()$Unicode.version) > 10 &&
      as.numeric(stringi::stri_info()$ICU.version) > (1): 'test-tokens.R:1036:5'
    • dplyr cannot be loaded (1): 'test-corpus.R:281:5'
    • lda cannot be loaded (1): 'test-convert.R:199:5'
    • not implemented yet (1): 'test-corpus.R:217:4'
    • purrr cannot be loaded (1): 'test-dfm.R:429:5'
    • requires spacyr installation to work (1): 'test-spacyr-methods.R:32:5'
    • skipping test of option setting when quanteda is not attached (1):
      'test-quanteda_options.R:52:5'
    • stm cannot be loaded (3): 'test-convert.R:6:5', 'test-convert.R:34:5',
      'test-convert.R:49:5'
    • text2vec cannot be loaded (1): 'test-fcm.R:2:5'
    • the verbose message has been changed (4): 'test-dfm.R:785:5',
      'test-tokens-word1.R:24:5', 'test-tokens-word4.R:607:5',
      'test-tokens.R:709:5'
    • tidytext cannot be loaded (1): 'test-as.dictionary.R:63:5'
    • topicmodels cannot be loaded (1): 'test-convert.R:174:5'
    • we no longer expect these to be the same (1): 'test-tokens.R:764:5'
    • whether these pass depends on the platform (1): 'test-tokens-custom.R:17:5'
    
    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Failure ('test-fcm.R:55:5'): fcm works with dfm and tokens in the same way ──
    diag(as.matrix(fcmat_toks_doc)) not equivalent to diag(as.matrix(fcmat_toks_win_ord)).
    2/6 mismatches (average diff: 1)
    [2] 1 - 0 ==  1
    [3] 0 - 1 == -1
    
    [ FAIL 1 | WARN 0 | SKIP 26 | PASS 3099 ]
    Error: Test failures
    Execution halted
Flavor: [r-patched-linux-x86_64](https://www.r-project.org/nosvn/R.check/r-patched-linux-x86_64/quanteda-00check.html)

You can see in 0a49557 that I already weakened the test, since I thought it might have something to do with a new behaviour changing the vector element names (I suspected that one of the functions had deleted them). That's also why I printed them in test-fcm.R:53-54.

I should not have weakened that test; it should always pass with expect_equal(). This appears to be one of those random "multi-threading scrambles something in an unpredictable way" bugs. It didn't show up during the initial CRAN checks (or they would not have accepted the package) but did arise, apparently randomly, during the most recent nightly check cycles. In case it disappears again, I've pasted the results above.


koheiw commented on June 22, 2024

I might need to do this in recompile() to make token IDs unique.

quanteda/src/tokens_xptr.cpp

Lines 142 to 159 in 8d0179b

```cpp
std::size_t I = xptr->texts[h].size();
Text text(I);
for (std::size_t i = 0; i < I; i++) {
    if (xptr->texts[h][i] == 0) {
        text[i] = 0;
        count_pad++;
    } else {
        if (asis) {
            text[i] = xptr->texts[h][i]; // for dictionary
        } else {
            if (ids[xptr->texts[h][i] - 1] == 0) {
                ids[xptr->texts[h][i] - 1] = id;
                id++;
            }
            text[i] = ids[xptr->texts[h][i] - 1];
        }
    }
}
```
