Comments (8)

edsu avatar edsu commented on September 15, 2024 2

Hi @blas-ko -- it really ought to be possible to hydrate directly to CSV for people who don't want the JSON at all. Also, I think it would help in situations like this to write to a compressed file.

Unfortunately, until #15 is resolved, you will notice degraded performance as Hydrator works its way through very large tweet ID files. Hopefully it will be resolved soon.

Since you are using Linux, I'm guessing you have some familiarity with the command line? If so, for working with large files and having more control over how things are written, you can try our other tool, twarc.

For example, you can hydrate your IDs and write them to a gzip-compressed CSV file with the following command:

twarc --format csv hydrate ids.txt | gzip - > tweets.csv.gz

If you run this on a dedicated VM in a tmux or screen session, you can let it run for as long as it needs to. You can also sample the output while it is being written by streaming the data to another program:

zcat tweets.csv.gz | analyze.jl
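
For anyone who would rather read the compressed file from Python than pipe it through zcat, here is a minimal sketch using only the standard library. The helper names iter_csv_rows and count_rows are made up for illustration, and the column names depend on whatever header twarc writes, so none are assumed here.

```python
import csv
import gzip


def iter_csv_rows(path):
    """Yield each row of a gzip-compressed CSV file as a dict keyed by the header row."""
    # "rt" decompresses on the fly and decodes to text. newline="" is what the
    # csv module expects, so line breaks inside quoted fields are preserved.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            yield row


def count_rows(path):
    """Count data rows without loading the whole file into memory."""
    return sum(1 for _ in iter_csv_rows(path))
```

Rows are produced one at a time, so memory use stays flat even for very large files. If the file is still being written, the reader will likely raise EOFError when it reaches the end of the data flushed so far, so treat that as "caught up" rather than as corruption.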

from hydrator.

edsu avatar edsu commented on September 15, 2024 1

Wow, that's a new one! I am glad you figured out where the command was installed. It looks like your operating system's default encoding is not UTF-8, which is unusual these days but not unheard of.

I recall you are working on a shared system? So you may not have control over the default encoding. Could you try setting this in your shell before you run twarc?

export PYTHONIOENCODING="utf-8"

If that works you might want to add it to your ~/.profile so you don't have to remember to do it every time you open a new terminal session.

https://docs.python.org/3.8/using/cmdline.html#environment-variables
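
To confirm what Python thinks the terminal encoding is, before and after setting PYTHONIOENCODING, a small check like the following may help. It is only a sketch: can_encode is a hypothetical helper, and the test character is the same U+2728 that appears in the traceback.

```python
import sys

# The character twarc prints in its "Happy twarcing!" message.
SPARKLES = "\u2728"


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # sys.stdout.encoding reflects PYTHONIOENCODING when that variable is set.
    # It can be None when stdout is replaced by some tools, hence the fallback.
    encoding = sys.stdout.encoding or "ascii"
    print("stdout encoding:", encoding)
    print("can print U+2728:", can_encode(SPARKLES, encoding))
```

If the second line reports False, printing that character would fail with the same UnicodeEncodeError shown in the traceback.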

blas-ko avatar blas-ko commented on September 15, 2024

Hey @edsu! Thanks for pointing me to twarc! It's great :).

I've been able to install twarc on my personal computer successfully. However, on the server I have access to (where I don't have root privileges), I have only been able to import twarc from Python, not run it from the command line (I get a Command 'twarc' not found message).

Should this be addressed in a separate issue? Let me know! Thanks!

edsu avatar edsu commented on September 15, 2024

@blas-ko Did you install on your server with pip install --user twarc?

edsu avatar edsu commented on September 15, 2024

@blas-ko Just following up: if you did install with --user, you should be able to find the twarc executable in your "user base". This location is platform dependent, but you can find it by running this at the command line:

python3 -m site --user-base

(omit the 3 from python3 if you are using another version)

I suspect that the directory you see on output is not in your PATH. If you add it to your PATH then typing twarc on the command line will work. Let me know if you need any help adjusting your PATH to include that directory.
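
The same check can be done from Python. This is a minimal sketch under the assumption of a Linux or macOS layout, where pip install --user places scripts in a bin directory under the user base (on Windows the directory is named differently). The helper names user_bin_dir and is_on_path are hypothetical.

```python
import os
import site


def user_bin_dir(user_base=None):
    """Return the directory where --user installs place scripts (POSIX layout assumed)."""
    # site.getuserbase() returns the same value as `python3 -m site --user-base`.
    base = user_base if user_base is not None else site.getuserbase()
    return os.path.join(base, "bin")


def is_on_path(directory, path_value=None):
    """Return True if directory is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    return any(
        os.path.normpath(entry) == target
        for entry in path_value.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    bin_dir = user_bin_dir()
    print("user scripts directory:", bin_dir)
    print("already on PATH:", is_on_path(bin_dir))
```

If the second line prints False, adding that directory to PATH in ~/.profile should make the twarc command available in new shell sessions.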

blas-ko avatar blas-ko commented on September 15, 2024

Hey @edsu! I installed it both normally with pip3 install twarc and with the user flag, pip3 install --user twarc.

I tried python3 -m site --user-base with no success.

I found the file in ~/.local/bin/twarc and created an alias to it, and now I'm able to run it from the command line. However, when I try to configure twarc via twarc configure, I get the following error:

Traceback (most recent call last):
  File "/home/kolic/.local/bin/twarc", line 11, in <module>
    sys.exit(main())
  File "/home/kolic/.local/lib/python3.6/site-packages/twarc/command.py", line 219, in main
    t.configure()
  File "/home/kolic/.local/lib/python3.6/site-packages/twarc/client.py", line 939, in configure
    print('\n\u2728 \u2728 \u2728  Happy twarcing! \u2728 \u2728 \u2728\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2728' in position 1: ordinal not in range(128)

Any ideas on what could be happening? Maybe I shouldn't have done an alias?

Sorry for all the mess I'm making!

blas-ko avatar blas-ko commented on September 15, 2024

It worked!!!
Thanks a ton! :)

edsu avatar edsu commented on September 15, 2024

Nice, please feel free to open new issues here or over in the twarc repository if you run into more issues.
