Comments (13)
@kbenoit,
Thank you for your quick response. I checked .libPaths() in both RStudio and the R console, and the two are now consistent: in both cases, packages are installed in my versioned library. Although the issue still persists in the R console, I am happy that quanteda works seamlessly in RStudio. I could have done more if I had more technical expertise, so I am closing this thread. Anyway, your support is greatly appreciated, and I am grateful for the prompt assistance! Wishing you the best in your endeavors.
from quanteda.
In the upcoming version, RcppParallel::defaultNumThreads(), which is causing the error, is not used. Can you install from GitHub and test?
from quanteda.
Hello, koheiw!
Thank you for your suggestions! I installed "RcppCore/RcppParallel" from GitHub. After that, I was able to install and load quanteda successfully. However, when I tried to tokenize, I hit the same segfault:
> library(RcppParallel)
library(quanteda)
Package version: 3.3.1
Unicode version: 14.0
ICU version: 71.1
Parallel computing: 8 of 8 threads used.
See https://quanteda.io for tutorials and examples.
texts <- c("I love programming in R.", "Text analysis is interesting.", "R is a powerful language.")
corpus <- corpus(texts)
tokens <- tokens(corpus)
*** caught segfault ***
address 0x9ffffffe7, cause 'invalid permissions'
Traceback:
1: qatd_cpp_tokens_select(x, type, ids, 2, padding, window[1], window[2], startpos, endpos)
2: tokens_select.tokens(x, ..., selection = "remove")
3: tokens_select(x, ..., selection = "remove")
4: tokens_remove(x, removals[["separators"]], valuetype = "regex", verbose = FALSE)
5: tokens.tokens(result, remove_punct = remove_punct, remove_symbols = remove_symbols, remove_numbers = remove_numbers, remove_url = remove_url, remove_separators = remove_separators, split_hyphens = FALSE, split_tags = FALSE, include_docvars = TRUE, padding = padding, verbose = verbose)
6: tokens.corpus(corpus)
7: tokens(corpus)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
I then tried to reduce the number of threads to 1. However, it still remained equal to 8, and the issue with the seg fault persisted:
> library(RcppParallel)
num_threads <- RcppParallel::defaultNumThreads()
cat("Default number of threads:", num_threads, "\n")
Default number of threads: 8
RcppParallel::setThreadOptions(numThreads = 1)
actual_num_threads <- RcppParallel::defaultNumThreads()
num_threads_after_setting = RcppParallel::defaultNumThreads()
cat("Number of threads after setting:", num_threads_after_setting, "\n")
Number of threads after setting: 8
Do you have any further ideas to resolve this issue? Thank you.
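A note on the check above, offered as a sketch rather than a confirmed diagnosis: to my understanding, RcppParallel::defaultNumThreads() reports the hardware default, so it would keep printing 8 regardless, while setThreadOptions() records the request in the RCPP_PARALLEL_NUM_THREADS environment variable. Both points are assumptions about RcppParallel's behavior, and the fallback branch below only imitates that assumed behavior when the package is not installed.

```r
# Request a single thread. Assumption: setThreadOptions() stores the request
# in the RCPP_PARALLEL_NUM_THREADS environment variable.
if (requireNamespace("RcppParallel", quietly = TRUE)) {
  RcppParallel::setThreadOptions(numThreads = 1)
} else {
  # Fallback that imitates the assumed behavior when RcppParallel is absent
  Sys.setenv(RCPP_PARALLEL_NUM_THREADS = "1")
}

# Read back the requested thread count instead of the hardware default
requested_threads <- Sys.getenv("RCPP_PARALLEL_NUM_THREADS", unset = NA)
cat("Requested number of threads:", requested_threads, "\n")
```

In quanteda itself, quanteda_options(threads = 1) is another way to lower the thread count, though a lower count is not guaranteed to avoid the segfault.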
from quanteda.
Your quanteda is still v3.x; please install it from GitHub too.
# remotes package required to install quanteda from GitHub
remotes::install_github("quanteda/quanteda")
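To confirm which version actually got installed afterwards, a small helper can compare it against a minimum. This is only a sketch: has_min_version() is a hypothetical helper, not part of quanteda or remotes, and 4.0.0 is the version number that appears in the build log later in this thread.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `min` or later.
# system.file() is used so the package itself is never loaded.
has_min_version <- function(pkg, min) {
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min)
}

# FALSE means the GitHub version is missing or an older copy is still found
has_min_version("quanteda", "4.0.0")
```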
from quanteda.
koheiw, thank you so much for your prompt response and valuable suggestion. I tried to install quanteda from GitHub, but the installation failed with a warning about a non-zero exit status:
Downloading GitHub repo quanteda/quanteda@HEAD
── R CMD build
building ‘quanteda_4.0.0.tar.gz’ation ...x/l8408syn1x95x6wvrp35d9g00000gn/T/RtmpM7FzQB/remotesef067af610b/quanteda-quanteda-cb80e23/DESCRIPTION’ ...
[...]
Warning message:
In i.p(...) :
installation of package ‘/var/folders/bx/l8408syn1x95x6wvrp35d9g00000gn/T//RtmpWKR3pI/filedbb6b127bb6/quanteda_4.0.0.tar.gz’ had non-zero exit status
from quanteda.
Do you have all the tools needed to compile the code installed on your machine?
https://cran.r-project.org/bin/macosx/
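A quick way to see whether a compiler toolchain is visible to R, sketched with base R only. The tool names below are an assumption about what a source build on macOS typically needs; the CRAN page linked above is the authoritative list.

```r
# Tools commonly needed to compile packages from source (assumed list)
tools <- c("make", "clang", "gfortran")

# Sys.which() returns the full path of each tool, or "" if it is not on PATH
found <- Sys.which(tools)
print(found)

# Name the tools that R cannot see
missing_tools <- names(found)[!nzchar(found)]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

If the pkgbuild package is installed, pkgbuild::has_build_tools(debug = TRUE) goes further by attempting a small test compile.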
from quanteda.
Hello again, koheiw! I'm sorry it took me longer to follow your suggestion and respond. I appreciate your assistance!
I've installed the R 4.3.2 binary for macOS 11 and XQuartz. When I looked through the "Big Sur" binaries for arm64-based Macs, I could not find quanteda in the "contrib" package list. I hope that is not the cause of the problem.
After that, I reinstalled quanteda from GitHub and got the same warning message as above.
Are there any specific binaries or tools from your link above that I need to install? Thanks a lot.
from quanteda.
@kbenoit do you have any idea?
from quanteda.
- Can you paste the first three lines of output from starting up R? e.g.
R version 4.3.2 (2023-10-31) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)
- What is the output of .libPaths()? e.g.
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library"
Sometimes conflicting installations can exist if you have a local path defined as well.
As for there not being an arm64 binary, that's not the case: that binary is definitely built and on CRAN.
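The checks requested above can be gathered in one snippet, which also lists every library that holds a copy of a given package. This is a sketch: installed_copies() is a hypothetical helper name, and "stats" stands in for "quanteda" so the code runs on any installation.

```r
# Report the running R version and platform
cat(R.version.string, "\n")
cat("Platform:", R.version$platform, "\n")

# Hypothetical helper: list every library path that contains a copy of `pkg`
installed_copies <- function(pkg, libs = .libPaths()) {
  libs[file.exists(file.path(libs, pkg, "DESCRIPTION"))]
}

# Replace "stats" with "quanteda" on the affected machine
copies <- installed_copies("stats")
print(copies)
```

More than one path in the result, or a path that belongs to a different R version, would point to conflicting installations.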
from quanteda.
Hello, @kbenoit! Thank you for your assistance.
- R version 4.3.2 (2023-10-31) -- "Eye Holes"
  Copyright (C) 2023 The R Foundation for Statistical Computing
  Platform: aarch64-apple-darwin20 (64-bit)
- [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"
Can it cause a problem if I have an older library path and a newer version of R?
from quanteda.
Yes, I suspect that is the problem. Check your .R, .Rprofile, etc. files to see if that library path is hardwired. If you remove the hard path reference, then packages will install to your versioned library automatically.
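A minimal sketch of that search, assuming the usual startup file locations (home directory, working directory, and R's etc folder). find_lib_settings() is a hypothetical helper, and the pattern only catches the most common ways of setting a library path.

```r
# Candidate startup files (the usual locations; adjust as needed)
startup_files <- c(
  file.path(Sys.getenv("HOME"), c(".Rprofile", ".Renviron")),
  file.path(getwd(), c(".Rprofile", ".Renviron")),
  file.path(R.home("etc"), c("Rprofile.site", "Renviron.site"))
)

# Hypothetical helper: return the lines that mention a library path setting,
# grouped by the file they were found in
find_lib_settings <- function(files) {
  files <- unique(files[file.exists(files)])
  hits <- lapply(files, function(f) {
    grep("libPaths|R_LIBS", readLines(f, warn = FALSE), value = TRUE)
  })
  names(hits) <- files
  hits[lengths(hits) > 0]
}

find_lib_settings(startup_files)
```

An empty list means none of the files that exist mention .libPaths() or an R_LIBS variable.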
from quanteda.
Thank you so much for your guidance on this issue.
I removed the hard path reference and updated R to the latest version. I also downloaded RStudio (previously, I used the R console only). Luckily, there has been no issue with quanteda in RStudio - I am very happy about it!!!
install.packages("quanteda")
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.3/quanteda_3.3.1.tgz'
Content type 'application/x-gzip' length 4232594 bytes (4.0 MB)
==================================================
downloaded 4.0 MB
The downloaded binary packages are in
/var/folders/bx/l8408syn1x95x6wvrp35d9g00000gn/T//RtmpZGYJkn/downloaded_packages
library(quanteda)
Package version: 3.3.1
Unicode version: 14.0
ICU version: 71.1
Parallel computing: 8 of 8 threads used.
See https://quanteda.io for tutorials and examples.
txt <- c(doc1 = "A sentence, showing how tokens() works.",
         doc2 = "@quantedainit and #textanalysis https://example.com?p=123.",
         doc3 = "Self-documenting code??",
         doc4 = "£1,000,000 for 50¢ is gr8 4ever \U0001f600")
tokens(txt)
Tokens consisting of 4 documents.
doc1 :
[1] "A" "sentence" "," "showing" "how" "tokens" "("
[8] ")" "works" "."
doc2 :
[1] "@quantedainit" "and"
[3] "#textanalysis" "https://example.com?p=123."
doc3 :
[1] "Self-documenting" "code" "?" "?"
doc4 :
[1] "£" "1,000,000" "for" "50" "¢" "is"
[7] "gr8" "4ever" "😀"
However, when I tried to run the same code on R Console, it showed the same seg fault I encountered before:
library(quanteda)
Package version: 3.3.1
Unicode version: 14.0
ICU version: 71.1
Parallel computing: 8 of 8 threads used.
See https://quanteda.io for tutorials and examples.
txt <- c(doc1 = "A sentence, showing how tokens() works.",
         doc2 = "@quantedainit and #textanalysis https://example.com?p=123.",
         doc3 = "Self-documenting code??",
         doc4 = "£1,000,000 for 50¢ is gr8 4ever \U0001f600")
tokens(txt)
*** caught segfault ***
address 0x9ffffffe7, cause 'invalid permissions'
Traceback:
1: qatd_cpp_tokens_select(x, type, ids, 2, padding, window[1], window[2], startpos, endpos)
2: tokens_select.tokens(x, ..., selection = "remove")
3: tokens_select(x, ..., selection = "remove")
4: tokens_remove(x, removals[["separators"]], valuetype = "regex", verbose = FALSE)
5: tokens.tokens(result, remove_punct = remove_punct, remove_symbols = remove_symbols, remove_numbers = remove_numbers, remove_url = remove_url, remove_separators = remove_separators, split_hyphens = FALSE, split_tags = FALSE, include_docvars = TRUE, padding = padding, verbose = verbose)
6: tokens.corpus(corpus(x), what = what, remove_punct = remove_punct, remove_symbols = remove_symbols, remove_numbers = remove_numbers, remove_url = remove_url, remove_separators = remove_separators, split_hyphens = split_hyphens, split_tags = split_tags, include_docvars = include_docvars, padding = padding, verbose = verbose, ...)
7: tokens.character(txt)
8: tokens(txt)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
Still, I am glad that it works well in RStudio. Any thoughts on why RStudio is doing a better job than the R console?
Thanks.
from quanteda.
For whatever reason, R console and RStudio are reading different environment variables that determine where your packages are installed. You can compare both using .libPaths() within each instance of running R (console v. RStudio). Check your various .R files to make sure there is not a different one somewhere that is affecting R console's R versus RStudio.
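One way to make that comparison concrete is to save a snapshot from each front end and compare the two files. This is a sketch: snapshot_r_env() is a hypothetical helper, and the set of environment variables it records is an assumption about which ones matter here.

```r
# Hypothetical helper: write library paths and related settings to a file
snapshot_r_env <- function(path) {
  vars <- Sys.getenv(c("R_HOME", "R_LIBS", "R_LIBS_USER", "R_LIBS_SITE"))
  lines <- c(
    R.version.string,
    paste("libPath:", .libPaths()),
    paste0(names(vars), "=", vars)
  )
  writeLines(lines, path)
  invisible(lines)
}

# Run once in each front end, using a different file name each time.
# tempdir() keeps the example self-contained; a fixed folder is more
# practical when the two files have to be compared afterwards.
snapshot_r_env(file.path(tempdir(), "r-env-console.txt"))
```

Any line that differs between the console file and the RStudio file shows where the two sessions disagree.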
from quanteda.