Comments (8)
Hi @blas-ko -- it really ought to be possible to hydrate directly to CSV for people who don't want the JSON at all. Also, I think it would help in situations like this to write to a compressed file.
Unfortunately, until #15 is resolved you will notice degraded performance as Hydrator works its way through very large tweet id files. Hopefully it will be resolved soon.
Since you are using Linux, I'm guessing you have some familiarity with the command line? If so, you can try our other tool, twarc, which copes better with large files and gives you more control over how things are written.
For example, you can hydrate your ids and write them as a gzip-compressed CSV file with the following command:
twarc --format csv hydrate ids.txt | gzip - > tweets.csv.gz
If you run this on a dedicated VM in a tmux or screen session, you can let it run for as long as it needs to. You can also sample the output as it is written by streaming the data to another program:
zcat tweets.csv.gz | analyze.jl
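If you would rather peek at the file from Python than pipe it through zcat, something along these lines should work. It is only a sketch: the function name peek_rows is made up for illustration, and it assumes the CSV has a header row. Because the file may still be growing, the gzip stream can end abruptly, so the sketch treats that as "no more rows yet" rather than as a failure.

```python
import csv
import gzip
from itertools import islice


def peek_rows(path, limit=5):
    """Return up to `limit` rows from a gzip-compressed CSV as dictionaries.

    Hypothetical helper: if the file is still being written, the compressed
    stream may stop before its end-of-stream marker, which gzip reports as
    EOFError. In that case we return whatever was read so far.
    """
    rows = []
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        try:
            for row in islice(reader, limit):
                rows.append(row)
        except EOFError:
            # Truncated stream: the writer has not finished yet.
            pass
    return rows
```

Calling peek_rows("tweets.csv.gz", limit=3) would then give you the first three hydrated rows without decompressing the whole file.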
from hydrator.
Wow, that's a new one! I am glad you figured out where the command was installed. It looks like your operating system's default encoding is not UTF-8, which is unusual these days but not unheard of.
I recall you are working on a shared system, so you may not have control over the default encoding. Could you try setting this in your shell before you run twarc?
export PYTHONIOENCODING="utf-8"
If that works you might want to add it to your ~/.profile so you don't have to remember to do it every time you open a new terminal session.
https://docs.python.org/3.8/using/cmdline.html#environment-variables
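If you want to see the effect of that variable in isolation, here is a small sketch that is not part of twarc. It starts a child Python process that prints the same sparkles character reported in the traceback in this thread, once with PYTHONIOENCODING set to ascii and once with utf-8. The names CHILD_CODE and run_child are made up for this example.

```python
import os
import subprocess
import sys

# Child program: prints U+2728, the character from the reported traceback.
# The raw string keeps the backslash escape intact for the child to interpret.
CHILD_CODE = r"print('\u2728')"


def run_child(io_encoding):
    """Run the child program with PYTHONIOENCODING set to `io_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
```

With run_child("ascii") the child should exit with an error and a UnicodeEncodeError on stderr, while run_child("utf-8") should exit cleanly, which is the same difference the export line makes for twarc.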
from hydrator.
Hey @edsu! Thanks for pointing me to twarc! It's great :).
I've been able to install twarc on my personal computer successfully. However, on the server I have access to (where I don't have root privileges) I have only been able to import twarc from Python, not run it from the command line (I get a Command 'twarc' not found message).
Should this be addressed in a separate issue? Let me know! Thanks!
from hydrator.
@blas-ko Did you install on your server with pip install --user twarc?
from hydrator.
@blas-ko Just following up: if you did install with --user, you should be able to find the twarc executable in your "user base". This location is platform dependent, but you can find it by running this at the command line:
python3 -m site --user-base
(omit the 3 from python3 if you are using another version)
I suspect that the directory it prints is not in your PATH. The executable usually lives in a bin subdirectory of that location, so if you add that directory to your PATH, typing twarc on the command line will work. Let me know if you need any help adjusting your PATH to include it.
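Here is a rough way to check this from Python instead of by eye. It is a sketch that assumes a Linux-style layout where user-installed scripts land in the bin subdirectory of the user base (other platforms use different directory names), and the helper names user_bin_dir and is_on_path are made up for this example.

```python
import os
import site


def user_bin_dir():
    """Return the assumed location of user-installed scripts on Linux.

    site.getuserbase() reports the same directory as
    `python3 -m site --user-base`.
    """
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory, path_value=None):
    """Check whether `directory` is one of the entries in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)
```

If is_on_path(user_bin_dir()) comes back False, that would explain a "command not found" message even though the package imports fine.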
from hydrator.
Hey @edsu! I installed it both normally, with pip3 install twarc, and with the user flag, pip3 install --user twarc.
I tried python3 -m site --user-base with no success.
I found the file at ~/.local/bin/twarc and created an alias to it, and now I'm able to run it from the command line. However, when I try to configure twarc via twarc configure, I get the following error:
Traceback (most recent call last):
  File "/home/kolic/.local/bin/twarc", line 11, in <module>
    sys.exit(main())
  File "/home/kolic/.local/lib/python3.6/site-packages/twarc/command.py", line 219, in main
    t.configure()
  File "/home/kolic/.local/lib/python3.6/site-packages/twarc/client.py", line 939, in configure
    print('\n\u2728 \u2728 \u2728 Happy twarcing! \u2728 \u2728 \u2728\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2728' in position 1: ordinal not in range(128)
Any ideas on what could be happening? Maybe I shouldn't have done an alias?
Sorry for all the mess I'm making!
from hydrator.
It worked!!!
Thanks a ton! :)
from hydrator.
Nice! Please feel free to open new issues here or over in the twarc repository if you run into more problems.
from hydrator.