scraperwiki / cobalt
QuickCode service to run code in a sandboxed Unix shell account
Home Page: https://quickcode.io/
License: Other
This command
time curl -d apikey=THISISNOTTHESECRETKEY -d cmd='sleep 6%26' https://premium.scraperwiki.com/ecuaz6q/exec
(you'll have to use the right apikey) takes more than 6 seconds to complete. The exec endpoint is waiting for the background task to complete.
Right now, if you want to start a background task you have to detach it from stdin, stdout, and stderr by adding < /dev/null > /dev/null 2> /dev/null.
This is too awkward.
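A minimal sketch of the likely mechanism, in Python rather than cobalt's own code: a caller that reads a child's stdout and stderr through pipes does not see end-of-file until every process holding the pipe has exited, and that includes a backgrounded child. The timed_run helper is hypothetical and only illustrates the general behaviour; it is not taken from the exec endpoint.

```python
import subprocess
import time


def timed_run(cmd):
    """Run cmd through /bin/sh, capture its output via pipes, return elapsed seconds."""
    start = time.monotonic()
    # subprocess.run reads stdout/stderr until EOF, so it keeps waiting while
    # any process (including a background job) still holds the pipe open.
    subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return time.monotonic() - start


# The background sleep inherits the pipes, so the caller waits for it.
attached = timed_run("sleep 2 &")

# With all three streams redirected, the shell exits and the pipes close at once.
detached = timed_run("sleep 2 < /dev/null > /dev/null 2> /dev/null &")
```

On a typical Linux system the first call should take about as long as the sleep, while the second should return almost immediately.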
@drj11 and I speculate that a setTimeout is being called non-deterministically, so that mocha test/exec_endpoint.coffee frequently takes a very long time to complete when it is run.
This can be tested with http://linux.die.net/man/8/tcpkill.
This error checking needs improving. Currently it just panics. Instead it should recreate the database connection and retry. It may require thought about synchronization logic and resource leakage.
https://github.com/scraperwiki/cobalt/blob/master/go/daemons/check-token/main.go#L57
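A rough sketch of the reconnect-and-retry idea, written in Python for illustration (the daemon itself is Go). Everything here is an assumption: ReconnectingClient, its connect callable, and the use of ConnectionError as the failure signal are hypothetical names, not part of check-token. The lock addresses the synchronization concern by serializing callers, and the broken connection is closed before a new one is opened so that it does not leak.

```python
import threading
import time


class ReconnectingClient:
    """Hold one connection; on a connection error, close it, reconnect, retry."""

    def __init__(self, connect, retries=3, delay=0.0):
        self._connect = connect      # hypothetical factory returning a connection
        self._retries = retries
        self._delay = delay
        self._conn = None
        self._lock = threading.Lock()

    def _drop(self):
        # Close the broken connection so it is not leaked.
        if self._conn is not None:
            try:
                self._conn.close()
            except Exception:
                pass
            self._conn = None

    def call(self, operation):
        """Run operation(conn), reconnecting up to `retries` times on failure."""
        with self._lock:
            last_error = None
            for attempt in range(self._retries + 1):
                try:
                    if self._conn is None:
                        self._conn = self._connect()
                    return operation(self._conn)
                except ConnectionError as error:
                    last_error = error
                    self._drop()
                    if attempt < self._retries:
                        time.sleep(self._delay)
            raise last_error


class FakeConnection:
    """Stand-in connection used only to exercise the retry path."""
    opened = 0
    closed = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.serial = FakeConnection.opened

    def close(self):
        FakeConnection.closed += 1


def flaky_query(conn):
    # The first connection is treated as dead; later ones work.
    if conn.serial == 1:
        raise ConnectionError("connection lost")
    return "ok"


client = ReconnectingClient(FakeConnection)
result = client.call(flaky_query)
```

Holding the lock while sleeping keeps the sketch simple; a real daemon might prefer to release it between attempts.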
I've put the passwd files on ~frabcus/ on free. gen-passwd.sorted is one I made using scraperwiki-generate-extrafiles. passwd.sorted is the one I found in /var/lib/extrausers
[co]drj$ time mocha test/update_listener.coffee
done

Box update subscriptions
when we receive a message on the box channel
✓ subscribes to the correct channel pattern
1) execs the update hook in the boxes
Gracefully stopping...

✖ 1 of 2 tests failed:

1) Box update subscriptions when we receive a message on the box channel execs the update hook in the boxes:
AssertionError: expected false to be true
at Object.true (/home/drj/sw/cobalt/node_modules/should/lib/should.js:255:10)
at null.<anonymous> (/home/drj/sw/cobalt/test/update_listener.coffee:53:30)
at args.(anonymous function) [as _onTimeout] (/home/drj/sw/cobalt/node_modules/nodetime/lib/core/proxy.js:131:20)
at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)
I can't now remember which ones, but I've found a card on my desk, so I was going to put the notes here:
There were 3 possible packages to install to fix locales; I can't now find the website that listed them.
'export LANG=C' in ~/.bashrc is a workaround.
Edit: see 'Hints' section of help.ubuntu.com/community/DebootstrapChroot
11 Sept 2013: last night, between 2300 and 0600, the mailcheck cron job appears not to have sent any e-mail.
Sep 10 23:20:38 ds-live-0 postfix/smtpd[16532]: connect from localhost[127.0.0.1]
Sep 10 23:20:38 ds-live-0 postfix/smtpd[16532]: D9E4E427E1: client=localhost[127.0.0.1]
Sep 10 23:20:38 ds-live-0 postfix/cleanup[16535]: D9E4E427E1: message-id=<20130910232038.D9E4E427E1@ds-live-0.scraperwiki.net>
Sep 10 23:20:38 ds-live-0 postfix/qmgr[1226]: D9E4E427E1: from=<[email protected]>, size=475, nrcpt=1 (queue active)
Sep 10 23:20:38 ds-live-0 postfix/smtpd[16532]: disconnect from localhost[127.0.0.1]
Sep 10 23:20:38 ds-live-0 postfix/smtpd[16532]: connect from localhost[127.0.0.1]
Sep 10 23:20:38 ds-live-0 postfix/smtpd[16532]: F1ACC430A7: client=localhost[127.0.0.1]
Sep 10 23:20:39 ds-live-0 postfix/cleanup[16535]: F1ACC430A7: message-id=<20130910232038.F1ACC430A7@ds-live-0.scraperwiki.net>
Sep 10 23:20:39 ds-live-0 postfix/qmgr[1226]: F1ACC430A7: from=<[email protected]>, size=750, nrcpt=1 (queue active)
Sep 10 23:20:39 ds-live-0 postfix/smtpd[16532]: disconnect from localhost[127.0.0.1]
Sep 10 23:20:39 ds-live-0 postfix/local[16536]: F1ACC430A7: to=<[email protected]>, orig_to=<dmpky2q>, relay=local, delay=0.05, delays=0.04/0/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 10 23:20:39 ds-live-0 postfix/qmgr[1226]: F1ACC430A7: removed
Sep 10 23:20:39 ds-live-0 postfix/smtp[17834]: D9E4E427E1: to=<[email protected]>, relay=aspmx.l.google.com[173.194.76.27]:25, delay=0.67, delays=0.02/0.01/0.06/0.58, dsn=2.0.0, status=sent (250 2.0.0 OK 1378855248 d6si11759489qej.77 - gsmtp)
Sep 10 23:20:39 ds-live-0 postfix/qmgr[1226]: D9E4E427E1: removed
Sep 10 23:20:39 ds-live-0 postfix/smtpd[16532]: connect from localhost[127.0.0.1]
Sep 10 23:20:39 ds-live-0 postfix/smtpd[16532]: CCD34427E1: client=localhost[127.0.0.1]
Sep 10 23:20:39 ds-live-0 postfix/cleanup[16535]: CCD34427E1: message-id=<20130910232039.CCD34427E1@ds-live-0.scraperwiki.net>
Sep 10 23:20:39 ds-live-0 postfix/qmgr[1226]: CCD34427E1: from=<[email protected]>, size=665, nrcpt=1 (queue active)
Sep 10 23:20:39 ds-live-0 postfix/smtpd[16532]: disconnect from localhost[127.0.0.1]
Sep 10 23:20:39 ds-live-0 postfix/local[16536]: CCD34427E1: to=<[email protected]>, orig_to=<bxjbemy>, relay=local, delay=0.02, delays=0.01/0/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 10 23:20:39 ds-live-0 postfix/qmgr[1226]: CCD34427E1: removed
Sep 10 23:24:40 ds-live-0 postfix/smtpd[20928]: connect from localhost[127.0.0.1]
Sep 10 23:24:40 ds-live-0 postfix/smtpd[20928]: 29395427E1: client=localhost[127.0.0.1]
Sep 10 23:24:40 ds-live-0 postfix/cleanup[20931]: 29395427E1: message-id=<20130910232440.29395427E1@ds-live-0.scraperwiki.net>
Sep 10 23:24:40 ds-live-0 postfix/qmgr[1226]: 29395427E1: from=<[email protected]>, size=665, nrcpt=1 (queue active)
Sep 10 23:24:40 ds-live-0 postfix/smtpd[20928]: disconnect from localhost[127.0.0.1]
Sep 10 23:24:40 ds-live-0 postfix/local[20932]: 29395427E1: to=<[email protected]>, orig_to=<g23mp5i>, relay=local, delay=0.03, delays=0.02/0.01/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 10 23:24:40 ds-live-0 postfix/qmgr[1226]: 29395427E1: removed
[[ INTERRUPTION OF MAILS ]]
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: connect from localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: 50F7C43AA3: client=localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/cleanup[24181]: 50F7C43AA3: message-id=<20130911063038.50F7C43AA3@ds-live-0.scraperwiki.net>
Sep 11 06:30:38 ds-live-0 postfix/qmgr[1226]: 50F7C43AA3: from=<[email protected]>, size=475, nrcpt=1 (queue active)
Sep 11 06:30:38 ds-live-0 postfix/qmgr[1226]: 50F7C43AA3: from=<[email protected]>, size=475, nrcpt=1 (queue active)
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: disconnect from localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: connect from localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: 67347470A2: client=localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/cleanup[24181]: 67347470A2: message-id=<20130911063038.67347470A2@ds-live-0.scraperwiki.net>
Sep 11 06:30:38 ds-live-0 postfix/qmgr[1226]: 67347470A2: from=<[email protected]>, size=750, nrcpt=1 (queue active)
Sep 11 06:30:38 ds-live-0 postfix/smtpd[24178]: disconnect from localhost[127.0.0.1]
Sep 11 06:30:38 ds-live-0 postfix/local[24182]: 67347470A2: to=<[email protected]>, orig_to=<dmpky2q>, relay=local, delay=0.02, delays=0.01/0/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 11 06:30:38 ds-live-0 postfix/qmgr[1226]: 67347470A2: removed
Sep 11 06:30:39 ds-live-0 postfix/smtp[24800]: 50F7C43AA3: to=<[email protected]>, relay=aspmx.l.google.com[173.194.74.27]:25, delay=0.92, delays=0.01/0.01/0.07/0.83, dsn=2.0.0, status=sent (250 2.0.0 OK 1378881048 v15si13135669qef.44 - gsmtp)
Sep 11 06:30:39 ds-live-0 postfix/qmgr[1226]: 50F7C43AA3: removed
Sep 11 06:32:39 ds-live-0 postfix/smtpd[26553]: connect from localhost[127.0.0.1]
Sep 11 06:32:39 ds-live-0 postfix/smtpd[26553]: 3802943AA3: client=localhost[127.0.0.1]
Sep 11 06:32:39 ds-live-0 postfix/cleanup[26556]: 3802943AA3: message-id=<[email protected]>
Sep 11 06:32:39 ds-live-0 postfix/qmgr[1226]: 3802943AA3: from=<[email protected]>, size=688, nrcpt=1 (queue active)
Sep 11 06:32:39 ds-live-0 postfix/smtpd[26553]: disconnect from localhost[127.0.0.1]
Sep 11 06:32:39 ds-live-0 postfix/local[26557]: 3802943AA3: to=<[email protected]>, orig_to=<dmzeliy>, relay=local, delay=0.03, delays=0.02/0.01/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 11 06:32:39 ds-live-0 postfix/qmgr[1226]: 3802943AA3: removed
Suggest we put $HOME/.local/bin on the PATH for all boxes (add to /etc/basejail/etc/profile or equivalent).
This will mean that binaries installed via pip with pip install --user will just appear on the PATH.
The code for the tool has definitely been updated, but the output files haven't changed since July 16th. The scraper has run (and called the status endpoint) lots of times since then.
Currently cgroups is only configured for free, and a hack is put in place to turn it off for ds.
We'd like to be able to protect ds against e.g. a rogue script eating all of the resources, so we should have a configurable limit per service.
I can't work out what is making this - can't find any reference to cobalt-disk-usage in our source code, or in /etc on free. There's an email referencing something of the same name on the DS server from 9th April.
My guess is @pwaller has deleted it already since yesterday morning?
Date: Mon, 5 May 2014 01:02:51 +0000 (UTC)
From: Cron Daemon <[email protected]>
To: [email protected]
Subject: Cron <root@cobalt-f> ionice -c3 (echo date: $(date); time du -h --time --time-style=+%s -d1 /var/lib/cobalt/home | sort -hr) | tee -a /var/log/cobalt-disk-usage
/bin/sh: 1: Syntax error: "(" unexpected
It would potentially be useful if we could read from different sqlite files.
See also:
scraperwiki-archive/loggit#6
It's getting significant, see:
Attempting to wget a 2.7G zip file from a box appears to truncate it with no errors after:
1,128,729,092 bytes, 10m4s [as reported by wget]
1,243,913,260 bytes, 10m24s [as reported by wget; actual filesize same]
Previous attempts to download the same file have presumably succeeded in the past.
https://ds-ec2.scraperwiki.com/cyrtub8/vorlqzxnepf5kzg/http/aaib.zip (dated Sep 19 22:51, sounds about right.)
Some datasets are getting a 502 bad gateway for sqlite endpoint queries.
These look something like this in /var/log/nginx/error.log:
2015/07/08 21:01:31 [error] 959#0: *368 FastCGI sent in stderr: "Traceback (most recent call last):
File "/opt/dumptruck-web/dumptruck_web.py", line 271, in <module>
print sql()
File "/opt/dumptruck-web/dumptruck_web.py", line 167, in sql
body = json.dumps(body)
File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python2.7/json/encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <read-write buffer ptr 0x7fa6b5fe3a30, size 40 at 0x7fa6b5fe39f0> is not JSON serializable" while reading response header from upstream
The problem appears to be that dumptruck barfs with the latest python setup if you attempt to view a dataset with BLOB columns.
We were accidentally creating blob columns briefly, but I don't believe that ever made it to production, so I guess these datasets may always have had that issue (though it may only just now be manifesting in this way).
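The failure can be reproduced in isolation. The sketch below uses Python 3, where sqlite3 returns BLOB values as bytes (the traceback above is from Python 2.7, where they arrive as buffer objects); in both cases json.dumps has no encoding for them. The encode_blob hook and the base64 wrapping are one possible workaround chosen for illustration, not what dumptruck-web does.

```python
import base64
import json
import sqlite3


def encode_blob(value):
    """json.dumps fallback: wrap binary values as base64 text (an assumed format)."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return {"base64": base64.b64encode(bytes(value)).decode("ascii")}
    raise TypeError(repr(value) + " is not JSON serializable")


# Build a throwaway table with a BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER, payload BLOB)")
conn.execute("INSERT INTO example VALUES (?, ?)", (1, b"\x00\x01binary"))
rows = [
    dict(zip(("id", "payload"), row))
    for row in conn.execute("SELECT id, payload FROM example")
]
conn.close()

# Plain json.dumps rejects the binary value, as in the nginx log.
try:
    json.dumps(rows)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With the fallback hook the same rows serialize cleanly.
encoded = json.dumps(rows, default=encode_blob)
```

Whether clients of the sqlite endpoint would accept a wrapped value like this is an open question; returning null or omitting the column are other options.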
As per the BSC chat with @zarino this morning, I think the sql.meta endpoint should return status 200 in this case, with a JSON object that describes the missing database.
Should probably make the result look something like this:
{"databaseType": "none",
"table": {}
}
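A small sketch of that behaviour, assuming the endpoint can tell that the database is missing by checking for the file. The function name sql_meta_response and the describe_database callback are hypothetical; only the JSON shape for the missing case comes from the proposal above.

```python
import json
import os


def sql_meta_response(database_path, describe_database):
    """Return (status, body) for sql.meta; a missing database is not an error."""
    if not os.path.exists(database_path):
        # Shape proposed above for a box with no database yet.
        body = {"databaseType": "none", "table": {}}
    else:
        # Hypothetical callback that builds the normal metadata object.
        body = describe_database(database_path)
    return 200, json.dumps(body)


status, body = sql_meta_response(
    "/nonexistent/hypothetical.sqlite", lambda path: {}
)
```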
It was removed from 69e142b. Code cleanup needed.
This is bad.
Please can you install fastkml and its requirements in order that the geojson tool can process KML.
As a ScraperWiki DevOp
I want to handle more than 30,000 boxes on each server
So that we can scale to tens of thousands of datasets/tools
We need coffeescript, and whatever other hipster languages we can think of.
Sometimes a box doesn't get its unix username in the passwd file. The result is bad for users, because they see lots of errors like:
My tweet scraping setups say "No passwd entry for user.." for a random username
Looks like someone forgot to install cgroups on cobalt-dev2?
How can we fix our process so that this doesn't happen?
/usr/share/libpam-script/pam_script_ses_open: line 51: cgcreate: command not found
/usr/share/libpam-script/pam_script_ses_open: line 54: /sys/fs/cgroup/memory/bvbojti/memory.limit_in_bytes: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 60: /sys/fs/cgroup/cpu/bvbojti/cpu.shares: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 72: /sys/fs/cgroup/cpu/bvbojti/tasks: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 73: /sys/fs/cgroup/memory/bvbojti/tasks: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 74: /sys/fs/cgroup/cpuacct/bvbojti/tasks: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 51: cgcreate: command not found
/usr/share/libpam-script/pam_script_ses_open: line 54: /sys/fs/cgroup/memory/bvbojti/memory.limit_in_bytes: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 60: /sys/fs/cgroup/cpu/bvbojti/cpu.shares: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 72: /sys/fs/cgroup/cpu/bvbojti/tasks: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 73: /sys/fs/cgroup/memory/bvbojti/tasks: No such file or directory
/usr/share/libpam-script/pam_script_ses_open: line 74: /sys/fs/cgroup/cpuacct/bvbojti/tasks: No such file or directory
Absolute symbolic links in the http directory don't work because they look different when seen by nginx outside of the chroot.
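To make the mismatch concrete, here is a small path calculation. An absolute link target is resolved against the root directory of whichever process follows it, so the same link lands in two different places. The jail path and file names are invented for the example and do not reflect cobalt's real layout.

```python
import posixpath


def effective_target(link_target, process_root):
    """Host path that an absolute symlink target reaches for a given process root."""
    if not posixpath.isabs(link_target):
        raise ValueError("only absolute link targets are affected")
    # Normalising first keeps any leading ".." from climbing above the root.
    relative = posixpath.normpath(link_target).lstrip("/")
    return posixpath.normpath(posixpath.join(process_root, relative))


# Hypothetical link inside a box: http/report.csv -> /home/data/report.csv
seen_in_box = effective_target("/home/data/report.csv", "/jails/examplebox")
seen_by_nginx = effective_target("/home/data/report.csv", "/")
```

Here seen_in_box is /jails/examplebox/home/data/report.csv while seen_by_nginx is /home/data/report.csv on the host, which is why the link works in the shell but not over HTTP. Relative links avoid the problem as long as they stay inside the box.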
See https://tools.ietf.org/html/rfc3875#section-4.1.5:
It identifies the resource or sub-resource to be returned by
the CGI script, and is derived from the portion of the URI path
hierarchy following the part that identifies the script itself
So basically, PATH_INFO should be the extra bit after SCRIPT_NAME. It should not be, as it is now, SCRIPT_NAME plus some more junk.
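As an illustration of the RFC 3875 rule, the split can be written as below. The function split_cgi_path and the example paths are hypothetical; in cobalt the values are presumably set in the nginx FastCGI configuration rather than computed in Python.

```python
def split_cgi_path(request_path, script_name):
    """Split a request path into (SCRIPT_NAME, PATH_INFO) per RFC 3875."""
    if request_path == script_name:
        # Nothing follows the script, so PATH_INFO is empty.
        return script_name, ""
    if request_path.startswith(script_name + "/"):
        # PATH_INFO is only the part after the script, starting with "/".
        return script_name, request_path[len(script_name):]
    raise ValueError("request path does not start with the script name")


script_name, path_info = split_cgi_path("/examplebox/sql/extra/bit", "/examplebox/sql")
```

With these inputs path_info is "/extra/bit", and SCRIPT_NAME followed by PATH_INFO reconstructs the original request path.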