jeroenjanssens / data-science-at-the-command-line
Data Science at the Command Line
Home Page: https://datascienceatthecommandline.com
License: Other
Dear @jeroenjanssens,
First of all, thank you: I learned a lot from your book, and after reading it there is a bit more light in my brain.
I am trying to use scrape with an XML file that seems properly formatted. I use a valid XPath query, but I get an empty result.
This is my command:
curl 'http://referendum2016.comune.palermo.it/AFFLSEZ_1_82053_R1.xml' | scrape -be '//SV'
What's wrong with it?
Thank you
Hi,
I'm writing here because you have great experience with csvkit.
Have you ever tested running a csvsql process from cron? If so, were you able to produce an output file?
I always get a 0 KB output file.
I have just opened an issue, wireservice/csvkit#342. I think it could be related to the creation of the temporary SQLite file, but I can't read anything in the log either (it is a 0 KB file too).
Thank you very much
Hi, I am following your book Data Science at the Command Line and it's awesome. While most things have worked so far as I install individual components on CentOS, issues come up now and then. Is this a good place to ask? For example, the plotting code fashion.csv Rio -ge... throws the error: display: no decode delegate for this image format `/tmp/magick-02f7rn9B' @ error/constitute.c/ReadImage/544. I have tried reinstalling different versions of ImageMagick, without success. Your suggestions would be greatly appreciated. Thanks.
Hi,
I need to run this query:
curl -L -s -A "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0" "http://www.unionesulserio.it" | /usr/local/bin/nadir/scrape -be '(//a[(contains(.,"pretorio")) or (contains(.,"Pretorio"))])[1]'
I have tested the XPath query (//a[(contains(.,"pretorio")) or (contains(.,"Pretorio"))])[1]
with other tools and it seems to work, but scrape gives me "Invalid CSS selector".
Is my XPath really wrong?
Thank you
I installed everything according to http://datascienceatthecommandline.com/. After logging in with PuTTY, I simply can't run the dst command.
The -r flag loads the dplyr library, but I got an error:
cat iris.csv | Rio -e -r -v "df %>% group_by(species) %>% mean(sepal_length)"
Error: object 'r' not found
Execution halted
cat: /var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-OLVgmOU2.err: No such file or directory
This command does not produce the result shown in the book.
If I just type scrape -b -e 'table.wikitable', the content of the table is printed.
So how do I remove the first 'tr', as described in the book? Please give me some help.
Allow me to introduce my dateutils, along with its dateseq command, which produces sequences of dates (or date/times) and is not only faster but also more portable and flexible.
Your semantics (you specify dates as day differences relative to today) would have to go through dateadd first, though, as all tools in the toolkit take absolute dates. Your example from the book would hence become:
$ dateseq `dateadd today -2` today
2016-03-19
2016-03-20
2016-03-21
So I suppose one could write a wrapper so that your dseq tool wouldn't break its API.
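A wrapper along those lines could look like the sketch below. It is only an illustration under several assumptions: dseq_sketch is a hypothetical name, the argument convention (two day offsets relative to today) is inferred from the description above, and it uses GNU date -d instead of dateutils, so it will not work with the BSD date shipped on macOS.

```shell
# dseq_sketch FIRST LAST: print one ISO date per line for every day offset
# from FIRST to LAST, relative to today.
# Hypothetical helper; assumes GNU date (for -d) and seq are available.
dseq_sketch() {
  local first="$1" last="$2" offset
  for offset in $(seq "$first" "$last"); do
    # "today -2 days", "today 0 days", "today 1 days" are all valid for GNU date
    date -d "today ${offset} days" +%F
  done
}

# The day before yesterday through today, one date per line
dseq_sketch -2 0
```

Swapping the date -d call for dateadd would keep the same relative-offset interface while delegating the date arithmetic to dateutils.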
The toolbox is great, as is the book. Since both are so easy to use, perhaps it is worth adding the tools to the Homebrew repository as a single package? Or maybe add a README describing how to install the tools without installing the whole environment described in the book?
I just released jsontsv yesterday. It is similar to json2csv, which you mention in your original blog post "Seven command line tools for data science." But I believe jsontsv is, strictly speaking, more powerful, though not, strictly speaking, easier to use. Feedback is appreciated if you have time, but in any case thanks for writing about this whole topic.
Hi,
I am reading your book, Data Science at the Command Line.
I want to install some of the tools, such as 'Rio'. The book mentions that 'The installation instructions are for Ubuntu only'.
Would you be so kind as to tell me where to find these installation instructions?
Thanks in advance!
FengYu
This will dump a PNG file to the tty:
< iris.csv Rio-scatter sepal_length sepal_width species
You never want that. If the output is not redirected to a file, display it on the screen with R's native displayer.
To see if stdout is a terminal you can use:
if [ -t 1 ] ; then
...
fi
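As a rough sketch of how that check could guard a tool that writes binary data: the helper name write_binary is hypothetical, and instead of opening a viewer (as suggested above) it simply refuses to write to a terminal, which is a simplification.

```shell
# write_binary: copy stdin to stdout unless stdout is a terminal.
# Hypothetical helper; only the [ -t 1 ] test comes from the post above.
write_binary() {
  if [ -t 1 ]; then
    # Refuse to dump raw bytes on the screen; explain on stderr instead
    echo "stdout is a terminal; redirect to a file or pipe to a viewer" >&2
    return 1
  fi
  cat
}
```

When the output is redirected or piped, for example printf 'bytes' | write_binary > out.png, the data passes through unchanged; run directly at a prompt, the function prints the hint and returns a non-zero status.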
The example provided causes an error with paste when using Bash and Zsh on macOS. I believe this will cause an issue on all variants, but I haven't checked on Linux/Docker yet.
$ fac() { (echo 1; seq $1) | paste -s -d\* | bc; }
$ fac 5
> usage: paste [-s] [-d delimiters] file ...
My suggestion is to add a - character, which makes it run correctly, so I think this is just an unchecked typo.
$ fac() { (echo 1; seq $1) | paste -s -d\* - | bc; }
$ fac 5
> 120
Hello:
Very nice tools! I am running Rio on Mac OS X, but I always get the message “ARGUMENT … ignored”. How can I correct or remove this? Any help?
seq 100 | Rio -nf summary
ARGUMENT '',~+file='/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf')}else+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf');print(last);}' ignored
Resource http://www.gutenberg.org/cache/epub/76/pg76.txt is now stored as a compressed file.
That breaks the samples.
There are a couple of ways to fix it:
I think option one is better, because adding a new command would no longer match the samples in the book and might confuse readers.
Rio started out as a proof of concept (see: http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html). However, over time, this quick-and-dirty Bash script has proven to be useful to me (and who knows, perhaps even to others). Unfortunately, the code is quite messy, making it difficult to maintain and extend.
Because of this, and also because it's playing a role in my upcoming book, I believe that Rio deserves to be cleaned up. (I've attempted to add some whitespace and newlines to that horrible SCRIPT string, but either Rscript or Bash didn't like that.) Please let me know if you have any suggestions.
Hi,
I have created (with PyInstaller) a standalone version of scrape, because I need it on a PC with Python 3: https://github.com/aborruso/scrape-cli/releases
Scrape is a great tool; thank you to its author.
Hi,
First of all, this is a great book.
In the "Executing a Command-line Tool" section of chapter 2, page 21, there is a "cd book/ch02/" command, but this directory does not exist in my Vagrant installation.
Is this expected?
Thank you
The link in chapter 3.6 currently downloads something in binary. The correct link to Huckleberry Finn that produces the desired result is:
I'm trying to pull the latest version but getting an error message as shown below:
docker pull datascienceworkshops/data-science-at-the-command-line
Using default tag: latest
Error response from daemon: Get https://registry-1.docker.io/v2/datascienceworkshops/data-science-at-the-command-line/manifests/latest: unauthorized: incorrect username or password
My docker info output in case it may help:
docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.934GiB
Name: linuxkit-00155d01344c
ID: SUNZ:CTDE:UKFI:MPMJ:KJRG:ODSS:7LGL:MNUF:NKKI:GHRH:CSP3:DYLO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 19
Goroutines: 35
System Time: 2018-04-29T21:39:50.4780016Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
The last line of "header" fails if the header contains spaces.
Change from
print_header $OLDHEADER
to
print_header "${OLDHEADER}"
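A small illustration of why the quotes matter in Bash: an unquoted expansion is split on whitespace, so a header containing a space arrives as several arguments. Here count_args is a hypothetical stand-in for print_header, and the header value is made up for the example.

```shell
# count_args is a hypothetical stand-in for print_header:
# it reports how many arguments it received.
count_args() { echo "$#"; }

# Example header containing a space (made-up value)
OLDHEADER="sepal length,sepal width"

count_args $OLDHEADER        # unquoted: split on the space into 2 arguments
count_args "${OLDHEADER}"    # quoted: passed through as 1 argument
```

Note that Zsh does not split unquoted parameter expansions by default, so the difference shows up in Bash and POSIX sh but not necessarily in Zsh.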
It would be awesome to have a Dockerfile to automatically install and run the project. I'll open a pull request for this tomorrow.
Hi Jeroen, running vagrant up led to the following SSL error. Can you please update the certificates or guide me to a solution? Thanks.
[centos@localhost MyDataScienceToolbox]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'data-science-toolbox/data-science-at-the-command-line' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'data-science-toolbox/data-science-at-the-command-line'
default: URL: https://atlas.hashicorp.com/data-science-toolbox/data-science-at-the-command-line
==> default: Adding box 'data-science-toolbox/data-science-at-the-command-line' (v1.0.0) for provider: virtualbox
default: Downloading: https://atlas.hashicorp.com/data-science-toolbox/boxes/data-science-at-the-command-line/versions/1.0.0/providers/virtualbox.box
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.
SSL certificate problem: unable to get local issuer certificate
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Hi
I already have a VM from Mining the Social Web by Matthew Russell.
I would like to know what precautions to take when installing the Command Line VM.
Do I also have to run vagrant?
I am using Windows 8.1 and I don't have any issues with Matthew's VM.
Any suggestions or recommendations are appreciated.
Thanks for your help
Franco
@jeroenjanssens
On page 56 the following bash snippet does not work "out of the box" in the vagrant / virtualbox environment running Ubuntu 14.04 LTS as suggested in the book:
$ echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
1 foo\nbar\nfoo
$ echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count
value,count
foo\nbar\nfoo,1
The following snippet works correctly when the -e option is passed to echo:
$ echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
2 foo
1 bar
$ echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count
value,count
foo,2
bar,1
An alternative would be to use printf instead of echo. There may also be differences between distributions and versions of Unix in how echo handles the -e option.
The following snippet might be more universally acceptable:
$ echo $'foo\nbar\nfoo' | sort | uniq -c | sort -nr
2 foo
1 bar
$ echo $'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count
value,count
foo,2
bar,1
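The printf alternative mentioned above could look like this. It is a minimal sketch: count_values is a hypothetical helper name, and the final header -a step from the book is left out so the example only depends on standard tools.

```shell
# count_values: frequency count of input lines as "value,count" rows,
# most frequent first. Hypothetical helper wrapping the pipeline above.
count_values() {
  sort | uniq -c | sort -nr | awk '{print $2","$1}'
}

# printf interprets \n in its format string regardless of the shell's echo
printf 'foo\nbar\nfoo\n' | count_values
```

Because printf handles the escape sequences itself, the result does not depend on whether the shell's built-in echo supports -e.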
Hi Jeroen,
on page 39 you call an outdated Twitter API:
$ curlicue -f credentials \
> 'https://api.twitter.com/1/statuses/home_timeline.xml'
It returns
<?xml version="1.0" encoding="UTF-8"?>
<errors>
<error code="64">
The Twitter REST API v1 is no longer active. Please migrate to API v1.1.
https://dev.twitter.com/docs/api/1.1/overview.
</error>
</errors>
The current API endpoint is
https://api.twitter.com/1.1/statuses/home_timeline.json
So the proper call would be
$ curlicue -f credentials \
> 'https://api.twitter.com/1.1/statuses/home_timeline.json'
However, it returns JSON and not XML as expected from the v1.0 call. Since it is the end of the pipeline and the returned value is not processed, this is not a big issue. I guess the API was deactivated at the end of last year.
Here: https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/book/ch06/Drakefile
On the last line, change ed to sed.
Working from the book, to reproduce Figure 7-4, I ran this from the ch07 directory:
$ <data/tips.csv Rio -ge 'g+geom_bar(aes(factor(size)))'
ARGUMENT '',~+~file='/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png')}else~+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png');print(last);}' __ignored__
Loading required package: ggplot2
Loading required package: methods
I would expect the binary to stream to stdout so that I could pipe it to display or to a .png file, but I get only this. Also, if I run the following, I get the results below:
$ <data/tips.csv Rio -e 'head(df)'
ARGUMENT '',~+~file='/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png')}else~+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png');print(last);}' __ignored__
bill tip sex smoker day time size
1 16.99 1.01 Female No Sun Dinner 2
2 10.34 1.66 Male No Sun Dinner 3
3 21.01 3.50 Male No Sun Dinner 3
4 23.68 3.31 Male No Sun Dinner 2
5 24.59 3.61 Female No Sun Dinner 4
6 25.29 4.71 Male No Sun Dinner 4
What I'm not expecting is the line starting with ARGUMENT .... I could not figure out what is wrong. Any ideas?
I'm sure it must be possible to install this set of tools directly onto a server running on AWS; I don't know yet if the supplied information supports this directly.
I have the vagrant-aws plugin for vagrant running.
Ah, after some more searching: the directions (which are quite good) are at http://datasciencetoolbox.org/ under the "In the cloud" tab; that should be enough information to get it running.
$ seq 12 | Rio -e 'df**2'
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
/opt/Rio: line 115: $IN: ambiguous redirect
Rscript execution error: No such file or directory
Hi, do you mind writing a version of the scrape utility in Golang for better cross platform support? Thanks in advance!
It seems like cols is not working.
No matter how I run it, I always get this response:
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
mktemp [-d] [-q] [-u] -t prefix
mkfifo: /other_columns: Permission denied
/Applications/command-line-tools/cols: line 24: ${ARG~~}: bad substitution
tee: /other_columns: Permission denied
Hi,
I am having trouble starting the VM using vagrant up. I have tried on both Windows and Ubuntu.
I get to the password/private key authentication part and can progress no further. Working with the VirtualBox GUI doesn't help either.
Is there any other information I can provide to help diagnose the problem? My Vagrant config file is quite basic (mostly defaults, except that I tried to set a password instead of relying on the private key because of the error, which didn't help), but I can share it if needed.
Thanks a lot.
Rohail
The book Data Science at the Command Line has been getting some really good reviews on the O'Reilly product page. I have to say: it's great to get feedback from readers. (It's also a nice confirmation that I'm not the only person crazy enough to think that the command line can be used for doing data science.) ;)
On Amazon and other book websites, there are unfortunately currently very few or no reviews. This makes it difficult for someone to decide whether the book would be useful to them or not.
If you have read the book and you have an opinion about it, whether it's positive or negative, then it would be greatly appreciated if you would spend a few minutes writing it down and submitting it as a review to Amazon (or any other book website). More reviews means that potential readers can better inform themselves, which could potentially lead to more command-line users.
Thanks for helping me out!
Cheers,
Jeroen
PS. I realize that not everybody who's using this repository has actually read the book, so forgive me for posting this question here. If you're still wondering whether you should buy the book or not, the first chapter is available for free at O'Reilly.
PPS. Of course, if you have any feedback not suited for a review, then you can always open a GitHub issue or contact me on Twitter.
Hi there, I have a tab-delimited file, and when I use Rio -d "\t" -e "summary(df)", I get some errors.
cat iris.tsv | head | Rio -d"\t" -e "summary(df)"
ARGUMENT '',stringsAsFactors=F);summary(df);last<-.Last.value;if(is.matrix(last)){last<-as.data.frame(last)};if(is.data.frame(last)){write.table(last,'/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png',sep=',',quote=T,qmethod='double',row.names=F,col.names=T);}else~+if(is.vector(last)){cat(last,sep='\n',+file='/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png')}else+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png');print(last);}' ignored
Do you have any idea?
Thanks,
Ming Tang
I'm on Ubuntu 16.04.6 LTS
Trying to run the chapter 3.2.1 example on cols, I got strange output:
| day | bill | tip | sex | smoker | time | size |
| ---------- | ----- | ---- | ------ | ------ | ------ | ---- |
| 0001-01-07 | 16,99 | 1,01 | Female | False | Dinner | 2 |
| 0001-01-07 | 10,34 | 1,66 | Male | False | Dinner | 3 |
| 0001-01-07 | 21,01 | 3,50 | Male | False | Dinner | 3 |
| 0001-01-07 | 23,68 | 3,31 | Male | False | Dinner | 2 |
When swapping 'day' with 'sex', the upper-case operation works but the day column is still messed up:
| sex | bill | tip | smoker | day | time | size |
| ------ | ----- | ---- | ------ | ---------- | ------ | ---- |
| FEMALE | 16,99 | 1,01 | False | 0001-01-07 | Dinner | 2 |
| MALE | 10,34 | 1,66 | False | 0001-01-07 | Dinner | 3 |
| MALE | 21,01 | 3,50 | False | 0001-01-07 | Dinner | 3 |
| MALE | 23,68 | 3,31 | False | 0001-01-07 | Dinner | 2 |
I printed the head of my tips.csv (downloaded latest version) and it's all in order.
Any idea what went wrong?
There are some places that return a non-zero exit status (anything but 0) even when there is no error. For example, https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/header#L65
This is a big problem, because tools that coordinate several processes fail when they depend on a successful exit status. Should I submit a pull request?
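To make the consequence concrete, here is a minimal sketch. fake_header is a hypothetical stand-in, not the actual header tool: it writes valid output but returns a non-zero status, and the caller treats the run as failed.

```shell
# fake_header is a hypothetical stand-in for a tool that writes correct
# output but then exits with a non-zero status.
fake_header() {
  echo "value,count"
  return 1
}

# A caller that checks the exit status sees a failure,
# even though the output itself was fine.
if fake_header > /dev/null; then
  result="success"
else
  result="failure"
fi
echo "$result"
```

The same effect stops scripts running under set -e and short-circuits && chains, which is why a spurious non-zero status matters to callers.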
vagrant@data-science-toolbox:~$ sql2csv --db "mysql://user:[email protected]:3306/database" --query "select count(*) from session"
You don't appear to have the necessary database backend installed for connection string you're trying to use.. Available backends include:
Postgresql: pip install psycopg2
MySQL: pip install MySQL-python
For details on connection strings and other backends, please see the SQLAlchemy documentation on dialects at:
http://www.sqlalchemy.org/docs/dialects/
To fix this, I ran:
sudo apt-get update
sudo apt-get install libmysqlclient-dev
sudo pip install MySQL-python
I run Linux in a virtual machine on Windows 7.
After I installed VirtualBox and Vagrant and followed the environment setup steps, the terminal showed an error like this:
Progress: 90%
There was an error while executing VBoxManage, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.
Command: ["import", "/home/datatest1/.vagrant.d/boxes/data-science-toolbox-VAGRANTSLASH-data-science-at-the-command-line/1.0.0/virtualbox/box.ovf", "--vsys", "0", "--vmname", "packer-virtualbox-iso_1523945099207_28351", "--vsys", "0", "--unit", "7", "--disk", "/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk"]
Stderr: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Interpreting /home/datatest1/.vagrant.d/boxes/data-science-toolbox-VAGRANTSLASH-data-science-at-the-command-line/1.0.0/virtualbox/box.ovf...
OK.
0%...10%...20%...30%...40%...50%...60%...70%...
Progress state: VBOX_E_FILE_ERROR
VBoxManage: error: Appliance import failed
VBoxManage: error: Could not create the imported medium '/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk'.
VBoxManage: error: VMDK: cannot write allocated data block in '/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk' (VERR_DISK_FULL)
VBoxManage: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component ApplianceWrap, interface IAppliance
VBoxManage: error: Context: "RTEXITCODE handleImportAppliance(HandlerArg*)" at line 886 of file VBoxManageAppliance.cpp
How can I fix this?
Hi,
I have this CSV (input_out.csv):
nome,id,url,start,creato,venue_id,logo
Palermo (Sicilia) - 25 Maggio 2016 25/5/2016 Karaoke di beneficenza di Chi ama la Sicilia,25180566753,http://www.eventbrite.it/e/biglietti-palermo-sicilia-25-maggio-2016-2552016-karaoke-di-beneficenza-di-chi-ama-la-sicilia-25180566753?aff=ebapi,2016-05-25T20:00:00,2016-05-03T20:40:12Z,15149091,https://img.evbuc.com/https%3A%2F%2Fimg.evbuc.com%2Fhttp%253A%252F%252Fcdn.evbuc.com%252Fimages%252F20814649%252F175077385541%252F1%252Foriginal.jpg%3Frect%3D0%252C172%252C860%252C430%26s%3Dd9a4bfa29cc27f85d8428de320cc9b3c?h=200&w=450&s=7ed859da13004f17403a9a3b0e1b2f7b
Tech-Marketplace & StartupItalia! Open Summit Tour 2016,25161107550,http://www.eventbrite.it/e/biglietti-tech-marketplace-startupitalia-open-summit-tour-2016-25161107550?aff=ebapi,2016-05-17T15:00:00,2016-05-03T10:49:31Z,15134493,https://img.evbuc.com/https%3A%2F%2Fimg.evbuc.com%2Fhttp%253A%252F%252Fcdn.evbuc.com%252Fimages%252F20794591%252F68137449621%252F1%252Foriginal.jpg%3Frect%3D473%252C0%252C3674%252C1837%26s%3De156d1a692274f2b3e48fa61b9e3964d?h=200&w=450&s=cfa6f9b49d4dee5180e53c71931a167d
I would like to replace the & character that I have in the first column with &.
If I write:
cat input_out.csv | sed -e 's/[\/&]/test/g' > out.txt
I get what I want, but I would like to operate only on the first column. If I write:
< input_out.csv cols -c nome body sed -e 's/[\/&]/test/g' > out.txt
I get:
sed: -e expression #1, char 4: unterminated `s' command
What's wrong with my command?
Thank you
man sample
DESCRIPTION
sample is a command-line tool for gathering data about the running behavior
of a process. It suspends the process at specified intervals (by
default, every 1 millisecond), records the call stacks of all threads in
the process at that time, then resumes the process. The analysis done by
Running your example curl -s 'http://en.wikipedia.org/wiki/List_of_countries_and_territories_by_border/area_ratio' | scrape -b -e 'table.wikitable > tr:not(:first-child)' | head
I get:
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
scrape does not seem to work with Python 3:
args.expression = [e.decode('utf-8') for e in args.expression]
AttributeError: 'str' object has no attribute 'decode'
First of all, I would like to thank Crevax for solving the previous issue I had. I was able to pull the data and access the files in the book! But now I am running into another issue. This may not be a problem with the data or files, but I believe it is a problem for those who are new to the command line, because I am not able to resolve it myself.
Following the examples in Chapter 2 to get to the directory for chapter 2 and examine the 'movies.txt' file, I can see that the file is there, but when I try to run the command:
head -n 3 data/movies.txt
I get the error "head: cannot open 'data/movies.txt' for reading: No such file or directory" even though I can see the file with the ls command!
I have included a screenshot of my command line prompt for someone better versed in docker to see if I have made a mistake.
I think it is worth noting that I cannot exactly follow the docker run command from the book, as the ` symbol throws errors in the Windows cmd, so maybe my arbitrary directory naming of "dsacm" is the problem. Also, I have to take a roundabout path to get to the actual directory containing the data; I am not sure whether this has something to do with it either.
If this is an issue with my Docker experience, I would greatly appreciate some resources for learning Docker (preferably other than the official Docker website; its descriptions are a bit abstract for someone with no formal CS education). But really, if anyone could shed some light on the issue I am having, I would be very grateful. Thanks!
Additional info: I am running Windows 10 x64 with Docker version 17.12.0-ce, build c97c6d6
The online version of the book is missing the appendix, which helpfully lists all of the command-line tools mentioned in the book with a brief description of each. It is mentioned several times in the text but seems to be missing from this repository and the hosted version of the book.
This is a super useful reference that I refer to often when trying to figure out how to do something on the command line, or to remind myself of tools that I don't use often—it should be included in the online version!
Hi, Jeroen!
First of all, thanks for the awesome work and clear explanations about Data Science and Command Line Tools.
I found the site when reading HN today. Then I followed along through the first chapters until I got stuck at 2.2 Installing the Docker Image.
I'm aware that updating the online version of the book is currently a work in progress. But I was wondering if there is something I could do to keep going once I have run the Docker container. Right now, it seems the files you use are not there yet (e.g. the book/ folder), which makes it difficult to follow your first examples.
Am I missing something here? I would appreciate some help 😃
Thanks in advance!
I'm releasing a new tool that might be useful to the data science toolkit. (I don't know how else to inform you about it @jeroenjanssens other than a GitHub issue, since I don't use Twitter.)
https://github.com/danchoi/table
table formats lines of TSV, CSV, or DSV (delimiter-separated values) into a pretty plain text table, wrapping cells with long content to try to fit the table on the screen.
Perhaps the URL of the API on page 38 is outdated:
curl -s http://api.randomuser.me | jq '.'
It returns nothing at all.
Best,
Òscar
Hi all,
I believe I managed to get the environment up and running on Windows, for those of us who are interested. This requires the following steps; they are not too hard but can be time-consuming. I'll assume you're using Windows 10.
1.a) Make sure you are a Windows Insider user, explained here,
and 1.b) that you have WSL up and running, explained here.
I ran into the following issue, but the fix is in the thread.
2) Next, you'll need a valid Docker setup with WSL 2, which is explained
in 2.a) the Docker documentation.
2.b) The following article was also helpful; be sure to run Windows containers.
Finally, with Docker up and running as seen in the pictures and with a WSL prompt open, you can simply follow the steps described in point 2 of the book.
The only thing I didn't get right is mapping the book to a directory with a Docker volume; if anyone has any ideas, I'm all ears.
Hope it helps!