
Comments (10)

jwijffels commented on August 19, 2024

Tokens are linked to each other through token_id and head_token_id: the head_token_id of a token is the token_id of its head. The type of dependency relation linking the two words is given in dep_rel.

See http://universaldependencies.org/guidelines.html for details on the output and all possible values of upos/xpos/feats and dep_rel.

So for your example it shows that

  • the term weak has token_id 4, and economy depends on weak, because the head_token_id of economy is 4 (which is the token_id of weak).
  • the type of relationship given in dep_rel is nsubj. nsubj means nominal subject as defined in http://universaldependencies.org/u/dep/index.html,
    so economy is the nominal subject of weak.

The same holds for bright and outlook: you'll see in the table that outlook is the nominal subject of bright.

If you want to visualise this, you can use the igraph R package to plot the word network that follows from these links.

So with the dependency parsing output you can easily answer questions like
'What is bright?' Answer: outlook
'What is weak?' Answer: economy
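
To make the lookup concrete, here is a minimal sketch in base R. The data frame is written by hand to mimic the relevant columns of the udpipe output for the example sentence (only the two nsubj rows are confirmed above; the other rows are my assumption of a plausible parse), and what_is() is a hypothetical helper, not a udpipe function.

```r
# Hand-written annotation for illustration only; real values would come from
# as.data.frame(udpipe_annotate(...)).
x <- data.frame(
  token_id      = as.character(1:9),
  token         = c("The", "economy", "is", "weak", "but", "the", "outlook", "is", "bright"),
  head_token_id = c("2", "4", "4", "0", "9", "7", "9", "9", "4"),
  dep_rel       = c("det", "nsubj", "cop", "root", "cc", "det", "nsubj", "cop", "conj"),
  stringsAsFactors = FALSE
)

# Look up the head of every token by matching head_token_id to token_id.
# The root has head_token_id "0", which matches nothing and yields NA.
x$head_token <- x$token[match(x$head_token_id, x$token_id)]

# Keep only nominal subjects: each row answers "What is <what_is>?"
subjects <- x[x$dep_rel == "nsubj", c("head_token", "token")]
names(subjects) <- c("what_is", "answer")
print(subjects)

# Hypothetical helper: return the nominal subject(s) of a given word.
what_is <- function(word, annotation = subjects) {
  annotation$answer[annotation$what_is == word]
}
what_is("weak")
what_is("bright")
```

On real output with several sentences or documents, token_id restarts in every sentence, so the match would have to be done within each doc_id/sentence_id combination.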

from udpipe.

jwijffels commented on August 19, 2024

FYI, below are some examples of how to put the dependency network in a graph.

library(udpipe)
library(igraph)
## Download the English model and load it from the path that was actually downloaded
dl <- udpipe_download_model("english")
m <- udpipe_load_model(dl$file_model)
x <- udpipe_annotate(m, "The economy is weak but the outlook is bright")
x <- as.data.frame(x)
## One edge per token, pointing to its head; the root (head_token_id 0) has no head
edges <- subset(x, head_token_id != 0, select = c("token_id", "head_token_id", "dep_rel"))
edges$label <- edges$dep_rel
## token_id is the vertex name, the other columns become vertex attributes
g <- graph_from_data_frame(edges,
                           vertices = x[, c("token_id", "token", "lemma", "upos", "xpos", "feats")], 
                           directed = TRUE)
plot(g, vertex.label = x$token)

[Image: dep_example1]

library(ggraph)
library(ggplot2)
ggraph(g, layout = "fr") +
  geom_edge_link(aes(label = dep_rel), arrow = arrow(length = unit(4, 'mm')), end_cap = circle(3, 'mm')) + 
  geom_node_point(color = "lightblue", size = 5) +
  theme_void(base_family = "") +
  geom_node_text(ggplot2::aes(label = token), vjust = 1.8) +
  ggtitle("Showing dependencies")

More graphical visualisations here: https://www.data-imaginist.com/2017/ggraph-introduction-edges/

[Image: dep_example2]


jwijffels commented on August 19, 2024

To my knowledge there is no such tutorial.
I was hoping the NLP R community would build upon the output to generate whatever they had in mind, using the wealth of existing R packages.

About spacy, I made a comparison between spacy and udpipe yesterday, mainly comparing accuracy: https://github.com/jwijffels/udpipe-spacy-comparison


jwijffels commented on August 19, 2024

I believe the documentation at universaldependencies.org is rather good.
The question you have is: what can you do with the output? Let me list some use cases that come to mind directly.

Use cases of pos tagging & lemmatisation

  • better and easier exploratory text visualisations due to richer features
  • better topic modelling by taking only specific parts-of-speech tags in the topic model
  • automation of topic modelling for all languages instead of working with stopwords by using the right pos tags
  • using lemmatisation as a better replacement than stemming in topic modelling
  • noun phrase extraction or chunking
  • automatic text summarisation (e.g. using the textrank R package)
  • automatic keyword detection
  • look for co-occurrences between words which are relevant based on the POS tag
  • do better sentence or document similarities by using only the words of a specific POS tag
  • identification of authors
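
As a minimal sketch of the POS-filtering idea in the list above (base R only): the data frame is hand-written to mimic the token, lemma and upos columns of the udpipe output, and the choice of NOUN and ADJ as the tags to keep is just an assumption for illustration.

```r
# Hand-written annotation for illustration only; real values would come from
# as.data.frame(udpipe_annotate(...)).
x <- data.frame(
  doc_id = "doc1",
  token  = c("The", "economy", "is", "weak", "but", "the", "outlook", "is", "bright"),
  lemma  = c("the", "economy", "be", "weak", "but", "the", "outlook", "be", "bright"),
  upos   = c("DET", "NOUN", "AUX", "ADJ", "CCONJ", "DET", "NOUN", "AUX", "ADJ"),
  stringsAsFactors = FALSE
)

# Keep only content words by POS tag; no language-specific stopword list needed.
keep_tags <- c("NOUN", "ADJ")   # assumption: the tags of interest
content   <- x[x$upos %in% keep_tags, ]

# Count lemmas instead of raw tokens, so inflected forms are counted together.
lemma_freq <- as.data.frame(table(lemma = content$lemma), stringsAsFactors = FALSE)
print(lemma_freq)
```

The lemma counts per document could then serve as input to a topic model or to a co-occurrence analysis.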

Use cases of dependency parsing

  • question answering
  • semantic parsing & semantic role labelling
  • information extraction for example
    • finding the subject of negative sentiments
    • in protein-protein interaction extraction we may want to extract the subject and object of a verb such as phosphorylates
    • negation detection can be achieved by finding the governor of the negation word.
  • finding the subject of an object
  • chat bots
  • automatic generation of poetry
  • using all the annotation features as predictive elements in predictive models.
    E.g. if you have mails and you want to do topic detection, you want to filter out the header/footer elements of the mails.
    This can be done by using the features from the annotation as predictive input to find the location of the mail header/footer.
  • as input to machine translation
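
The negation-detection item above could be sketched as follows in base R. The annotation is hand-written for the hypothetical sentence "The outlook is not bright", and the small list of negation lemmas is my assumption; how a real model marks negation (dep_rel or feats) depends on the model version, so check your own output.

```r
# Hand-written annotation for illustration only; real values would come from
# as.data.frame(udpipe_annotate(...)).
x <- data.frame(
  token_id      = as.character(1:5),
  token         = c("The", "outlook", "is", "not", "bright"),
  lemma         = c("the", "outlook", "be", "not", "bright"),
  head_token_id = c("2", "5", "5", "5", "0"),
  dep_rel       = c("det", "nsubj", "cop", "advmod", "root"),
  stringsAsFactors = FALSE
)

# Assumption: negation words are identified by lemma from a small fixed list.
negators <- c("not", "no", "never")
neg <- x[x$lemma %in% negators, ]

# The governor of a negation word is the token its head_token_id points to.
neg$governor <- x$token[match(neg$head_token_id, x$token_id)]
print(neg[, c("token", "governor")])

# The nominal subject attached to the same governor tells us what is negated about whom.
subject <- x$token[x$dep_rel == "nsubj" & x$head_token_id %in% neg$head_token_id]
print(subject)
```

As with any lookup through head_token_id, on multi-sentence output the matching has to be done within each sentence.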

If you have other ideas about what can be done with the annotation results, feel free to add them.
About courses: follow the course at https://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r. It's given by me, so that answer is opinionated. About books: it's hard to find good ones on the topic of text mining.


randomgambit commented on August 19, 2024

thanks! extremely helpful! This package is amazing!!

Is there a tutorial somewhere to use the igraph package with udpipe?

Also, I remember you wanted to do some performance comparisons with spacy and the other competitors. Have you got the time to do it?

Thanks!


randomgambit commented on August 19, 2024

OK thanks! Will have a look shortly


randomgambit commented on August 19, 2024

really amazing, thanks!!!


arademaker commented on August 19, 2024

I suspect you can get much better results with graphviz, see https://eli.thegreenplace.net/2009/11/23/visualizing-binary-trees-with-graphviz


randomgambit commented on August 19, 2024

@jwijffels I guess one big picture question is: which textbooks/books/resources do you recommend to get the most out of this package (and NLP in general?). The resources on http://universaldependencies.org/u/dep/index.html are ... very light


randomgambit commented on August 19, 2024

thanks @jwijffels ! very clear - I was hoping that your course was free material though :) hehe

