
Comments (6)

TyJK commented on July 2, 2024

We would take a document-based approach: each website is assigned labels for sentiment and topic, and each post, comment, or entry (however the site organizes it) gets a unique document ID, most likely assigned by simple enumeration. So when it comes to model construction, a given document has its own unique ID but also belongs to larger groups through the other tags.
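A minimal sketch of that tagging scheme, under some assumptions: the record fields (`site`, `topic`, `sentiment`, `text`), the `site:`/`topic:`/`sentiment:` tag prefixes, and the example rows are all hypothetical. `TaggedDocument` here is a local stand-in with the same two fields (`words`, `tags`) as gensim's class of that name, so the snippet runs without gensim installed.

```python
from collections import namedtuple

# Local stand-in mirroring the (words, tags) fields of gensim's TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def tag_documents(records):
    """Give every record a unique enumerated ID plus its shared group tags."""
    tagged = []
    for doc_id, rec in enumerate(records):
        tags = [
            doc_id,                            # unique per document
            f"site:{rec['site']}",             # shared by all posts from one site
            f"topic:{rec['topic']}",           # shared by all posts on one topic
            f"sentiment:{rec['sentiment']}",   # shared by all posts with one label
        ]
        words = rec["text"].lower().split()    # naive whitespace tokenization
        tagged.append(TaggedDocument(words=words, tags=tags))
    return tagged


# Hypothetical example rows, for illustration only.
records = [
    {"site": "example-a", "topic": "energy", "sentiment": "positive",
     "text": "Solar costs keep falling"},
    {"site": "example-a", "topic": "energy", "sentiment": "positive",
     "text": "Wind farms expand offshore"},
]
tagged = tag_documents(records)
```

Each document keeps its own integer ID while the string tags tie it to the larger site, topic, and sentiment groups.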

from echoburst.

PSanni commented on July 2, 2024

Great. If we are using documents, then we need to select websites and topic content carefully, because a single topic or website is likely to contain diverse information, and that can easily throw off the model. We could add a subjectivity classification step, so that we can remove sentences or information that aren't useful.

I am not sure, but I think Word2Vec might be able to do this? I haven't tried it. Is anyone familiar with that?


PSanni commented on July 2, 2024

Classification: which classification approach are you planning to use, document-based or sentence-based? As you might know, a sentence-based approach requires a set of labeled sentences. :)


PikioopSo commented on July 2, 2024

@TyJK, the document ID could be assigned via date data, so that you can do an analysis across spans of time, but I wasn't quite sure what type of enumeration system you were going with.


TyJK commented on July 2, 2024

@PiReel I was going to use a simple count. Doc2Vec only requires that document tags be unique in order to keep documents separate (all documents sharing the same tag are treated as one document). It probably doesn't matter for the number of documents we'll get, but enumerating linearly saves memory. Luckily, in my experiments so far, the count naturally follows date order, since that's usually how the site archive is organized. I'll have a few examples of test runs up later today.
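One way the simple count and the date-based idea could coexist, as a rough sketch: sort the posts by date, enumerate them, and keep a lookup from ID back to the post so time-span analysis stays possible. The function names, the `date` and `text` fields, and the example posts are hypothetical.

```python
from datetime import date


def enumerate_by_date(posts):
    """Sort posts by date, then assign consecutive integer IDs starting at 0."""
    ordered = sorted(posts, key=lambda post: post["date"])
    return {doc_id: post for doc_id, post in enumerate(ordered)}


def ids_in_span(id_to_post, start, end):
    """Return the IDs of posts dated within [start, end], inclusive."""
    return [doc_id for doc_id, post in id_to_post.items()
            if start <= post["date"] <= end]


# Hypothetical example posts, deliberately out of date order.
posts = [
    {"date": date(2017, 3, 5), "text": "third"},
    {"date": date(2017, 1, 10), "text": "first"},
    {"date": date(2017, 2, 20), "text": "second"},
]
id_to_post = enumerate_by_date(posts)
february_onward = ids_in_span(id_to_post, date(2017, 2, 1), date(2017, 12, 31))
```

Because IDs follow date order, any span of time maps to a contiguous run of IDs.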


TyJK commented on July 2, 2024

Word2Vec probably could do this, but I think we would need to use it as a secondary filter rather than a primary one. That is, we could run it on each website or video transcript after it has been scraped and cleaned, to make sure nothing outside the category got through, but I don't know how we could use it to help in the selection process itself. Hopefully people will be careful; we do mention it numerous times, but if you have any suggestions for making it clearer to people, I'm all ears.

