rdrpostagger's People

Contributors

jwijffels

rdrpostagger's Issues

Error > Transform .CSV into a sentence (POS tagging)

Hello, I'm a newbie and I'm trying to get RStudio working for my final project. I get an error while doing POS tagging: I want to tag the sentences stored in my .CSV data. The code:

# set the working directory
setwd("F:/SKRIPSI/POS TAGGING/RDRPOSTagger-master")

# clear the workspace
rm(list = ls())

#install.packages("rJava")
#install.packages("data.table")
#install.packages("RDRPOSTagger", repos = "http://www.datatailor.be/rcube", type = "source")
#install.packages("remotes")
#remotes::install_github("bnosac/RDRPOSTagger")

library(tm)
library(NLP)
library(RDRPOSTagger)
library(tokenizers)

models <- rdr_available_models()
models$POS$language
models$MORPH$language
models$UniversalPOS$language

#x <- c("Oleg Borisovich Kulik is a Ukrainian-born Russian performance artist")
#tagger <- rdr_model(language = "English", annotation = "POS")
#rdr_pos(tagger, x = x)

x <- c("aku mau makan ku ingat kamu aku mau tidur juga ku ingat kamu", 
       "aku sedang bosan ku ingat kamu oh tuhan mungkin kah ku jatuh cinta",
       "  ", "", NA)
#tagger <- rdr_model(language = "Indonesian", annotation = "MORPH")
#rdr_pos(tagger, x = x)

tagger <- rdr_model(language = "Indonesian", annotation = "UniversalPOS")
rdr_pos(tagger, x = x)

x <- read.csv("dataset_twitter_stopword.csv", stringsAsFactors = FALSE, header = FALSE)
tagger <- rdr_model(language = "Indonesian", annotation = "UniversalPOS")
rdr_pos(tagger, x = x)

This produces the following output (the tokens from the .csv file do not show up, and every POS tag is written as PUNCT):

# set the working directory
> setwd("F:/SKRIPSI/POS TAGGING/RDRPOSTagger-master")
> 
> # clear the workspace
> rm(list = ls())
> 
> #install.packages("rJava")
> #install.packages("data.table")
> #install.packages("RDRPOSTagger", repos = "http://www.datatailor.be/rcube", type = "source")
> #install.packages("remotes")
> #remotes::install_github("bnosac/RDRPOSTagger")
> 
> library(tm)
> library(NLP)
> library(RDRPOSTagger)
> library(tokenizers)
> 
> models <- rdr_available_models()
> models$POS$language
[1] "English"    "French"     "German"     "Hindi"      "Italian"    "Thai"       "Vietnamese"
> models$MORPH$language
[1] "Bulgarian"  "Czech"      "Dutch"      "French"     "German"     "Portuguese" "Spanish"    "Swedish"   
> models$UniversalPOS$language
 [1] "Ancient_Greek-PROIEL" "Ancient_Greek"        "Arabic"               "Basque"               "Belarusian"           "Bulgarian"            "Catalan"              "Chinese"             
 [9] "Coptic"               "Croatian"             "Czech-CAC"            "Czech-CLTT"           "Czech"                "Danish"               "Dutch-LassySmall"     "Dutch"               
[17] "English-LinES"        "English-ParTUT"       "English"              "Estonian"             "Finnish-FTB"          "Finnish"              "French-ParTUT"        "French-Sequoia"      
[25] "French"               "Galician-TreeGal"     "Galician"             "German"               "Gothic"               "Greek"                "Hebrew"               "Hindi"               
[33] "Hungarian"            "Indonesian"           "Irish"                "Italian-ParTUT"       "Italian"              "Japanese"             "Korean"               "Latin-ITTB"          
[41] "Latin-PROIEL"         "Latin"                "Latvian"              "Lithuanian"           "Norwegian-Bokmaal"    "Norwegian-Nynorsk"    "Old_Church_Slavonic"  "Persian"             
[49] "Polish"               "Portuguese-BR"        "Portuguese"           "Romanian"             "Russian-SynTagRus"    "Russian"              "Slovak"               "Slovenian-SST"       
[57] "Slovenian"            "Spanish-AnCora"       "Spanish"              "Swedish-LinES"        "Swedish"              "Tamil"                "Turkish"              "Urdu"                
[65] "Vietnamese"          
> 
> #x <- c("Oleg Borisovich Kulik is a Ukrainian-born Russian performance artist")
> #tagger <- rdr_model(language = "English", annotation = "POS")
> #rdr_pos(tagger, x = x)
> 
> x <- c("aku mau makan ku ingat kamu aku mau tidur juga ku ingat kamu", 
+        "aku sedang bosan ku ingat kamu oh tuhan mungkin kah ku jatuh cinta",
+        "  ", "", NA)
> #tagger <- rdr_model(language = "Indonesian", annotation = "MORPH")
> #rdr_pos(tagger, x = x)
> 
> tagger <- rdr_model(language = "Indonesian", annotation = "UniversalPOS")
> rdr_pos(tagger, x = x)
Column 1 ['doc.id'] of item 3 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for  backwards compatibility. See news item 5 in v1.12.2 for options to control this message.
   doc_id token_id   token  pos
1      d1        1     aku PRON
2      d1        2     mau  ADV
3      d1        3   makan VERB
4      d1        4      ku PRON
5      d1        5   ingat VERB
6      d1        6    kamu PRON
7      d1        7     aku PRON
8      d1        8     mau  ADV
9      d1        9   tidur VERB
10     d1       10    juga  ADV
11     d1       11      ku PRON
12     d1       12   ingat VERB
13     d1       13    kamu PRON
14     d2        1     aku PRON
15     d2        2  sedang  ADV
16     d2        3   bosan NOUN
17     d2        4      ku PRON
18     d2        5   ingat VERB
19     d2        6    kamu PRON
20     d2        7      oh NOUN
21     d2        8   tuhan NOUN
22     d2        9 mungkin  ADV
23     d2       10     kah NOUN
24     d2       11      ku PRON
25     d2       12   jatuh VERB
26     d2       13   cinta NOUN
27     d3        0    <NA> <NA>
28     d4        0    <NA> <NA>
29     d5        0    <NA> <NA>
> 
> x <- read.csv("dataset_twitter_stopword.csv", stringsAsFactors = FALSE, header = FALSE)
> tagger <- rdr_model(language = "Indonesian", annotation = "UniversalPOS")
> rdr_pos(tagger, x = x)
    doc_id token_id token   pos
1       d1        1    '' PUNCT
2       d1        2     , PUNCT
3       d1        3    '' PUNCT
4       d1        4     , PUNCT
5       d1        5    '' PUNCT
6       d1        6     , PUNCT
7       d1        7    '' PUNCT
8       d1        8     , PUNCT
9       d1        9    '' PUNCT
10      d1       10     , PUNCT
11      d1       11    '' PUNCT
12      d1       12     , PUNCT
13      d1       13    '' PUNCT
14      d1       14     , PUNCT
15      d1       15    '' PUNCT
16      d1       16     , PUNCT
17      d1       17    '' PUNCT
18      d1       18     , PUNCT
19      d1       19    '' PUNCT
20      d1       20     , PUNCT
21      d1       21    '' PUNCT
22      d1       22     , PUNCT
23      d1       23    '' PUNCT
24      d1       24     , PUNCT
25      d1       25    '' PUNCT
26      d1       26     , PUNCT
27      d1       27    '' PUNCT
28      d1       28     , PUNCT
29      d1       29    '' PUNCT
30      d1       30     , PUNCT
31      d1       31    '' PUNCT
32      d1       32     , PUNCT
33      d1       33    '' PUNCT
34      d1       34     , PUNCT
35      d1       35    '' PUNCT
36      d1       36     , PUNCT
37      d1       37    '' PUNCT
38      d1       38     , PUNCT
39      d1       39    '' PUNCT
40      d1       40     , PUNCT
41      d1       41    '' PUNCT
42      d1       42     , PUNCT
43      d1       43    '' PUNCT
44      d1       44     , PUNCT
45      d1       45    '' PUNCT
46      d1       46     , PUNCT
47      d1       47    '' PUNCT
48      d1       48     , PUNCT
49      d1       49    '' PUNCT
50      d1       50     , PUNCT
51      d1       51    '' PUNCT
52      d1       52     , PUNCT
53      d1       53    '' PUNCT
54      d1       54     , PUNCT
55      d1       55    '' PUNCT
56      d1       56     , PUNCT
57      d1       57    '' PUNCT
58      d1       58     , PUNCT
59      d1       59    '' PUNCT
60      d1       60     , PUNCT
61      d1       61    '' PUNCT
62      d1       62     , PUNCT
63      d1       63    '' PUNCT
64      d1       64     , PUNCT
65      d1       65    '' PUNCT
66      d1       66     , PUNCT
67      d1       67    '' PUNCT
68      d1       68     , PUNCT
69      d1       69    '' PUNCT
70      d1       70     , PUNCT
71      d1       71    '' PUNCT
72      d1       72     , PUNCT
73      d1       73    '' PUNCT
74      d1       74     , PUNCT
75      d1       75    '' PUNCT
76      d1       76     , PUNCT
77      d1       77    '' PUNCT
78      d1       78     , PUNCT
79      d1       79    '' PUNCT
80      d1       80     , PUNCT
81      d1       81    '' PUNCT
82      d1       82     , PUNCT
83      d1       83    '' PUNCT
84      d1       84     , PUNCT
85      d1       85    '' PUNCT
86      d1       86     , PUNCT
87      d1       87    '' PUNCT
88      d1       88     , PUNCT
89      d1       89    '' PUNCT
90      d1       90     , PUNCT
91      d1       91    '' PUNCT
92      d1       92     , PUNCT
93      d1       93    '' PUNCT
94      d1       94     , PUNCT
95      d1       95    '' PUNCT
96      d1       96     , PUNCT
97      d1       97    '' PUNCT
98      d1       98     , PUNCT
99      d1       99    '' PUNCT
100     d1      100     , PUNCT
101     d1      101    '' PUNCT
102     d1      102     , PUNCT
103     d1      103    '' PUNCT
104     d1      104     , PUNCT
105     d1      105    '' PUNCT
106     d1      106     , PUNCT
107     d1      107    '' PUNCT
108     d1      108     , PUNCT
109     d1      109    '' PUNCT
110     d1      110     , PUNCT
111     d1      111    '' PUNCT
112     d1      112     , PUNCT
113     d1      113    '' PUNCT
114     d1      114     , PUNCT
115     d1      115    '' PUNCT
116     d1      116     , PUNCT
117     d1      117    '' PUNCT
118     d1      118     , PUNCT
119     d1      119    '' PUNCT
120     d1      120     , PUNCT
121     d1      121    '' PUNCT
122     d1      122     , PUNCT
123     d1      123    '' PUNCT
124     d1      124     , PUNCT
125     d1      125    '' PUNCT
126     d1      126     , PUNCT
127     d1      127    '' PUNCT
128     d1      128     , PUNCT
129     d1      129    '' PUNCT
130     d1      130     , PUNCT
131     d1      131    '' PUNCT
132     d1      132     , PUNCT
133     d1      133    '' PUNCT
134     d1      134     , PUNCT
135     d1      135    '' PUNCT
136     d1      136     , PUNCT
137     d1      137    '' PUNCT
138     d1      138     , PUNCT
139     d1      139    '' PUNCT
140     d1      140     , PUNCT
141     d1      141    '' PUNCT
142     d1      142     , PUNCT
143     d1      143    '' PUNCT
144     d1      144     , PUNCT
145     d1      145    '' PUNCT
146     d1      146     , PUNCT
147     d1      147    '' PUNCT
148     d1      148     , PUNCT
149     d1      149    '' PUNCT
150     d1      150     , PUNCT
151     d1      151    '' PUNCT
152     d1      152     , PUNCT
153     d1      153    '' PUNCT
154     d1      154     , PUNCT
155     d1      155    '' PUNCT
156     d1      156     , PUNCT
157     d1      157    '' PUNCT
158     d1      158     , PUNCT
159     d1      159    '' PUNCT
160     d1      160     , PUNCT
161     d1      161    '' PUNCT
162     d1      162     , PUNCT
163     d1      163    '' PUNCT
164     d1      164     , PUNCT
165     d1      165    '' PUNCT
166     d1      166     , PUNCT
167     d1      167    '' PUNCT
168     d1      168     , PUNCT
169     d1      169    '' PUNCT
170     d1      170     , PUNCT
171     d1      171    '' PUNCT
172     d1      172     , PUNCT
173     d1      173    '' PUNCT
174     d1      174     , PUNCT
175     d1      175    '' PUNCT
176     d1      176     , PUNCT
177     d1      177    '' PUNCT
178     d1      178     , PUNCT
179     d1      179    '' PUNCT
180     d1      180     , PUNCT
181     d1      181    '' PUNCT
182     d1      182     , PUNCT
183     d1      183    '' PUNCT
184     d1      184     , PUNCT
185     d1      185    '' PUNCT
186     d1      186     , PUNCT
187     d1      187    '' PUNCT
188     d1      188     , PUNCT
189     d1      189    '' PUNCT
190     d1      190     , PUNCT
191     d1      191    '' PUNCT
192     d1      192     , PUNCT
193     d1      193    '' PUNCT
194     d1      194     , PUNCT
195     d1      195    '' PUNCT
196     d1      196     , PUNCT
197     d1      197    '' PUNCT
198     d1      198     , PUNCT
199     d1      199    '' PUNCT
200     d1      200     , PUNCT
201     d1      201    '' PUNCT
202     d1      202     , PUNCT
203     d1      203    '' PUNCT
204     d1      204     , PUNCT
205     d1      205    '' PUNCT
206     d1      206     , PUNCT
207     d1      207    '' PUNCT
208     d1      208     , PUNCT
209     d1      209    '' PUNCT
210     d1      210     , PUNCT
211     d1      211    '' PUNCT
212     d1      212     , PUNCT
213     d1      213    '' PUNCT
214     d1      214     , PUNCT
215     d1      215    '' PUNCT
216     d1      216     , PUNCT
217     d1      217    '' PUNCT
218     d1      218     , PUNCT
219     d1      219    '' PUNCT
220     d1      220     , PUNCT
221     d1      221    '' PUNCT
222     d1      222     , PUNCT
223     d1      223    '' PUNCT
224     d1      224     , PUNCT
225     d1      225    '' PUNCT
226     d1      226     , PUNCT
227     d1      227    '' PUNCT
228     d1      228     , PUNCT
229     d1      229    '' PUNCT
230     d1      230     , PUNCT
231     d1      231    '' PUNCT
232     d1      232     , PUNCT
233     d1      233    '' PUNCT
234     d1      234     , PUNCT
235     d1      235    '' PUNCT
236     d1      236     , PUNCT
237     d1      237    '' PUNCT
238     d1      238     , PUNCT
239     d1      239    '' PUNCT
240     d1      240     , PUNCT
241     d1      241    '' PUNCT
242     d1      242     , PUNCT
243     d1      243    '' PUNCT
244     d1      244     , PUNCT
245     d1      245    '' PUNCT
246     d1      246     , PUNCT
247     d1      247    '' PUNCT
248     d1      248     , PUNCT
249     d1      249    '' PUNCT
250     d1      250     , PUNCT
 [ reached 'max' / getOption("max.print") -- omitted 5831 rows ]

The .csv data:
https://drive.google.com/file/d/10x_mhVQWSsFM6zkCV9acSm2YJc4jnMoD/view?usp=sharing

I hope you can help me fix my problem. Thank you in advance!
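
One difference between the two calls in this report is the type of `x`: the working example passes a character vector, while `read.csv()` returns a data frame. The following is a minimal sketch, not a confirmed diagnosis, of pulling the text out of the data frame before tagging. It assumes the tweets sit in the first column; `csv_text_column` is a hypothetical helper name, and the temporary file only stands in for `dataset_twitter_stopword.csv`.

```r
# Hypothetical helper: return one column of a data frame as a character vector.
# Assumption: the text to tag is in the first column (V1 when header = FALSE).
csv_text_column <- function(df, column = 1) {
  stopifnot(is.data.frame(df))
  text <- trimws(as.character(df[[column]]))
  # Mark empty strings as NA so they are easy to spot later
  text[!is.na(text) & text == ""] <- NA_character_
  text
}

# Self-contained demonstration with a temporary file standing in for the real CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c("aku mau makan", "aku sedang bosan"), tmp)
df <- read.csv(tmp, stringsAsFactors = FALSE, header = FALSE)

x <- csv_text_column(df)
print(class(x))

# With RDRPOSTagger installed, the vector can be passed on as in the question:
# tagger <- rdr_model(language = "Indonesian", annotation = "UniversalPOS")
# rdr_pos(tagger, x = x)
```

If the text lives in another column, pass its name or position as `column`.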

installation failed because of Java?

My installation failed. Note that the command 'library(rJava)' works fine, and the R package 'wordnet', which imports rJava, also works fine. So I don't understand the problem, because Java doesn't seem to cause trouble with other packages.

`devtools::install_github("bnosac/RDRPOSTagger")
Downloading GitHub repo bnosac/RDRPOSTagger@master
√ checking for file 'C:\Users\Ludovic\AppData\Local\Temp\RtmpGqPYPs\remotes4de47ddc3d23\bnosac-RDRPOSTagger-af51e38/DESCRIPTION' ...

  • preparing 'RDRPOSTagger': (5.8s)
    √ checking DESCRIPTION meta-information ...
  • checking for LF line-endings in source and make files and shell scripts (3.1s)
  • checking for empty or unneeded directories
  • building 'RDRPOSTagger_1.1.tar.gz'

Installing package into ‘C:/Users/Ludovic/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)

  • installing source package 'RDRPOSTagger' ...
    ** using staged installation
    ** R
    ** inst
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    converting help for package 'RDRPOSTagger'
    finding HTML links ... done
    rdr_add_space_around_punctuations html
    rdr_available_models html
    rdr_model html
    rdr_pos html
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded from temporary location
    *** arch - i386
    Error: package or namespace load failed for 'rJava':
    .onLoad failed in loadNamespace() for 'rJava', details:
    call: fun(libname, pkgname)
    error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
    Error : package 'rJava' could not be loaded
    Error: loading failed
    Execution halted
    *** arch - x64
    ERROR: loading failed for 'i386'
  • removing 'C:/Users/Ludovic/Documents/R/win-library/3.6/RDRPOSTagger'
    Error in i.p(...) :
    (converted from warning) installation of package ‘C:/Users/Ludovic/AppData/Local/Temp/RtmpGqPYPs/file4de46a333221/RDRPOSTagger_1.1.tar.gz’ had non-zero exit status`
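
The log above fails while test-loading the package for the `i386` architecture, and the rJava message asks for matching R and Java architectures. The snippet below is a small diagnostic sketch for checking which architecture the current R session uses and whether `JAVA_HOME` is set. The commented-out install line is an assumption rather than a confirmed fix: it relies on extra arguments being forwarded to `install.packages()`, which accepts `INSTALL_opts`, so that the 32-bit test load is skipped.

```r
# Pointer size in bytes is 8 on 64-bit R and 4 on 32-bit R
r_bits <- 8L * .Machine$sizeof.pointer
cat("R architecture:", R.version$arch, "(", r_bits, "bit )\n")

# Sys.getenv() returns an empty string when the variable is not set
java_home <- Sys.getenv("JAVA_HOME")
cat("JAVA_HOME:", if (nzchar(java_home)) java_home else "<not set>", "\n")

# Assumption: install only for the architecture of the running R session,
# so the i386 load test that fails in the log is not attempted.
# remotes::install_github("bnosac/RDRPOSTagger", INSTALL_opts = "--no-multiarch")
```

Running this in the same session that loads rJava successfully shows which architecture needs a matching Java installation.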

Tokens dropped with quoted text

Thanks for writing this great package!

I am trying to parse tweets and came across an issue with the tagger when passing it quoted text. The tokens after and before the quotation marks are deleted in the tagging process.

Here is an example:

rdr_pos(rdr_model(language = "English", annotation = "UniversalPOS"), "Some guy asked -\"what is the issue\"")

The returned object is missing "what" and "issue".

For the time being, I am simply gsub'ing the \" but this would obviously be better addressed internal to the function.
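
The `gsub` workaround mentioned above could look roughly like this. It is a sketch of the pre-processing step only; `strip_double_quotes` is a hypothetical helper name, and it handles only the straight double quote from the example.

```r
# Hypothetical helper: remove straight double quotes before tagging
strip_double_quotes <- function(x) {
  gsub('"', "", x, fixed = TRUE)
}

tweet <- "Some guy asked -\"what is the issue\""
cleaned <- strip_double_quotes(tweet)
print(cleaned)

# With RDRPOSTagger installed, the cleaned text can then be tagged:
# tagger <- rdr_model(language = "English", annotation = "UniversalPOS")
# rdr_pos(tagger, x = cleaned)
```

Note that this changes the input text, so the quotation marks themselves are no longer available as tokens.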

wrong upos tag

Using the universal POS tagger on the phrase "supervise correctional procedures", the word "supervise" was tagged as a noun.

Installation failed: Command failed (1)

Hi,

I am trying to install the RDRPOSTagger package. When I run the code you provide, I get the following result.

devtools::install_github("bnosac/RDRPOSTagger", build_vignettes = TRUE)
Downloading GitHub repo bnosac/RDRPOSTagger@master
from URL https://api.github.com/repos/bnosac/RDRPOSTagger/zipball/master
Installing RDRPOSTagger
"C:/PROGRA~1/R/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build  \
  "C:\Users\X1\AppData\Local\Temp\RtmpMjuZpO\devtools2f7c59287379\bnosac-RDRPOSTagger-af51e38"  \
  --no-resave-data --no-manual 

* checking for file 'C:\Users\X1\AppData\Local\Temp\RtmpMjuZpO\devtools2f7c59287379\bnosac-RDRPOSTagger-af51e38/DESCRIPTION' ... OK
* preparing 'RDRPOSTagger':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'RDRPOSTagger_1.1.tar.gz'

"C:/PROGRA~1/R/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  "C:/Users/X1/AppData/Local/Temp/RtmpMjuZpO/RDRPOSTagger_1.1.tar.gz"  \
  --library="C:/Users/X1/Documents/R/win-library/3.5" --install-tests 

* installing *source* package 'RDRPOSTagger' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
Warning: package 'rJava' was built under R version 3.5.2
** help
*** installing help indices
  converting help for package 'RDRPOSTagger'
    finding HTML links ... done
    rdr_add_space_around_punctuations       html  
    rdr_available_models                    html  
    rdr_model                               html  
    rdr_pos                                 html  
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
Warning: package 'rJava' was built under R version 3.5.2
Error: package or namespace load failed for 'rJava':
 .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
Error : package 'rJava' could not be loaded
Error: loading failed
Execution halted
*** arch - x64
Warning: package 'rJava' was built under R version 3.5.2
ERROR: loading failed for 'i386'
* removing 'C:/Users/X1/Documents/R/win-library/3.5/RDRPOSTagger'
In R CMD INSTALL
Installation failed: Command failed (1)

Java Error

Hi,

While installing RDRPOSTagger, I get the error below. Can someone help me solve the problem?
image

Best...

String index out of range error with leading symbols

I came across an error when passing the tagger sentences that have a leading symbol like - or ?.

Here is an example:

rdr_pos(rdr_model(language = "English", annotation = "UniversalPOS"), "- what is wrong?")

Returns the following error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.StringIndexOutOfBoundsException: String index out of range: 0

It seems like this is an rJava error but I thought I'd post it here first.
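
Until the cause is found, one possible stopgap is to strip leading punctuation and whitespace before calling the tagger. This is only a sketch of a workaround that alters the input, not a fix for the underlying exception; `strip_leading_symbols` is a hypothetical helper name.

```r
# Hypothetical helper: drop punctuation and whitespace at the start of each string
strip_leading_symbols <- function(x) {
  sub("^[[:punct:][:space:]]+", "", x)
}

sentences <- c("- what is wrong?", "?why", "nothing to strip")
cleaned <- strip_leading_symbols(sentences)
print(cleaned)

# With RDRPOSTagger installed, the cleaned sentences can then be tagged:
# tagger <- rdr_model(language = "English", annotation = "UniversalPOS")
# rdr_pos(tagger, x = cleaned)
```

Punctuation inside or at the end of a sentence is left untouched, so the trailing question mark in the example is still passed to the tagger.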
