src-d / gitcollector
License: GNU General Public License v3.0
Right now only docker containers are built.
For src-d organization the github api returns 134 public repositories:
{
"login": "src-d",
"id": 15128793,
"node_id": "MDEyOk9yZ2FuaXphdGlvbjE1MTI4Nzkz",
"url": "https://api.github.com/orgs/src-d",
"repos_url": "https://api.github.com/orgs/src-d/repos",
"events_url": "https://api.github.com/orgs/src-d/events",
"hooks_url": "https://api.github.com/orgs/src-d/hooks",
"issues_url": "https://api.github.com/orgs/src-d/issues",
"members_url": "https://api.github.com/orgs/src-d/members{/member}",
"public_members_url": "https://api.github.com/orgs/src-d/public_members{/member}",
"avatar_url": "https://avatars2.githubusercontent.com/u/15128793?v=4",
"description": "",
"name": "source{d}",
"company": null,
"blog": "https://sourced.tech",
"location": "Remote first",
"email": "[email protected]",
"is_verified": false,
"has_organization_projects": true,
"has_repository_projects": false,
"public_repos": 134,
"public_gists": 0,
"followers": 0,
"following": 0,
"html_url": "https://github.com/src-d",
"created_at": "2015-10-14T17:13:24Z",
"updated_at": "2018-12-14T18:32:06Z",
"type": "Organization"
}
gitcollector only fetches 133:
time="2019-06-18T17:59:52.113977357+01:00" level=info msg="metrics updated" discover=133 download=133 fail=0 metrics=library source="metrics/metrics.go:207" update=0
Likely the bug is in the GHProvider iterator.
In an organization with 169 repositories only 80 are downloaded + 27 errors.
Some orgs might have repos that are convenient to exclude. For instance, they might be big datasets without code, which take a long time to process and make it harder to extract insight from the org (e.g. they might skew statistics).
Reported by @se7entyse7en
It seems that the problem is the library used to discover repositories:
Organization src-d works, but others like google or jenkinsci don't. Maybe using an older version solves the problem:
This function takes a location and a token and fetches from all of the location's remotes. It uses only one transaction.
If there are no new changes, do not perform the commit. If a fetch fails, log the error but do not take any other action.
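The behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the `Location`, `Tx` and `Remote` interfaces and the fakes are hypothetical stand-ins, not the go-borges or gitcollector API, and rolling back when nothing changed is one possible reading of "do not perform the commit".

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Remote is a hypothetical abstraction of a git remote. Fetch reports
// whether any new objects or references were received.
type Remote interface {
	Name() string
	Fetch(token string) (changed bool, err error)
}

// Tx is a hypothetical transaction over a location.
type Tx interface {
	Remotes() []Remote
	Commit() error
	Rollback() error
}

// Location is a hypothetical stand-in for a siva location.
type Location interface {
	Begin() (Tx, error)
}

// UpdateLocation fetches every remote of loc inside a single transaction.
// It commits only if at least one fetch brought new changes. Fetch errors
// are logged and otherwise ignored.
func UpdateLocation(loc Location, token string) error {
	tx, err := loc.Begin()
	if err != nil {
		return fmt.Errorf("begin transaction: %w", err)
	}

	changed := false
	for _, r := range tx.Remotes() {
		ok, err := r.Fetch(token)
		if err != nil {
			log.Printf("fetch %s failed: %v", r.Name(), err)
			continue
		}
		changed = changed || ok
	}

	if !changed {
		// Nothing new: discard the transaction instead of committing.
		return tx.Rollback()
	}
	return tx.Commit()
}

// The fakes below exist only to make the sketch runnable.
type fakeRemote struct {
	name    string
	changed bool
	err     error
}

func (r fakeRemote) Name() string               { return r.name }
func (r fakeRemote) Fetch(string) (bool, error) { return r.changed, r.err }

type fakeTx struct {
	remotes               []Remote
	committed, rolledBack bool
}

func (t *fakeTx) Remotes() []Remote { return t.remotes }
func (t *fakeTx) Commit() error     { t.committed = true; return nil }
func (t *fakeTx) Rollback() error   { t.rolledBack = true; return nil }

type fakeLocation struct{ tx *fakeTx }

func (l fakeLocation) Begin() (Tx, error) { return l.tx, nil }

func main() {
	tx := &fakeTx{remotes: []Remote{
		fakeRemote{name: "origin", changed: true},
		fakeRemote{name: "broken", err: errors.New("unreachable")},
	}}
	if err := UpdateLocation(fakeLocation{tx}, "token"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("committed:", tx.committed)
}
```

A failing remote does not abort the loop, so one unreachable remote cannot block updates from the others.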
Hi,
Can we also fetch users' repositories, not only organizations', with a plain command?
Export the number of repos to download and the number already downloaded to an external DB.
Sometimes during metrics tests from PR #73 empty metrics are written
https://travis-ci.com/src-d/gitcollector/jobs/235932304
--- FAIL: TestPostgres (5.66s)
--- FAIL: TestPostgres/testPostgresSendMetricsSuccess (0.22s)
require.go:157:
Error Trace: postgres_test.go:97
postgres_test.go:63
Error: Not equal:
expected: []integration.metric{integration.metric{org:"git-fixtures", discovered:8, downloaded:7, updated:0, failed:1}}
actual : []integration.metric{integration.metric{org:"git-fixtures", discovered:0, downloaded:0, updated:0, failed:0}}
Diff:
--- Expected
+++ Actual
@@ -3,6 +3,6 @@
org: (string) (len=12) "git-fixtures",
- discovered: (int) 8,
- downloaded: (int) 7,
+ discovered: (int) 0,
+ downloaded: (int) 0,
updated: (int) 0,
- failed: (int) 1
+ failed: (int) 0
}
I suspect gitcollector of data loss
Cannot reproduce locally
Need more info
Update
OK, it seems the reason is the following:
[2019-09-17T14:30:38.901550957+03:00] WARN discovery stopped: rate limit requests exceeded: GET https://api.github.com/orgs/git-fixtures/repos?per_page=100: 403 API rate limit exceeded for 212.7.22.138. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.) [rate reset in 60m00s] source=subcmd/download.go:226
This will be fixed in CI as soon as #74 and https://github.com/src-d/infrastructure/issues/1154 are closed.
I have doubts about this behavior because it may confuse users. Maybe it would be better to add an errors field to the metrics schema, so it is clearer why all metrics are 0?
wdyt @mcarmonaa @jfontan ?
They point to the same file:
time="2019-06-13T16:53:16.698967058+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=41441ce5-149a-4df6-be0e-3ea5c187eac4 job=download source="downloader/download.go:61" url="https://github.com/src-d/core-retrieval"
time="2019-06-13T16:53:16.702707441+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=eb7040ac-3148-4a61-a86c-bdf811ab294e job=download source="downloader/download.go:61" url="https://github.com/src-d/notebooks"
time="2019-06-13T16:53:16.709913362+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=d1075a87-50fe-421e-9cda-b1c40ef5a14e job=download source="downloader/download.go:61" url="https://github.com/src-d/platform"
time="2019-06-13T16:53:16.724095577+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=7ef265a5-659a-4223-b790-d00e69f438a5 job=download source="downloader/download.go:61" url="https://github.com/src-d/guide"
Client also fails reading:
MySQL [(none)]> select * from repositories;
ERROR 1105 (HY000): unknown error: index read failed: seek /opt/repos/d2/d27a55c20fc832063a510bf70c90e2a7e977005a.siva: invalid argument
Improve testing generally, for example with the cases commented in #62 (comment)
Add log traces using https://github.com/src-d/go-log
Make use of the contexts in download/update functions for cancelation/timeouts
Currently, gitcollector has a flag --no-updates which disables updates for existing siva files but still downloads new repositories.
For srcd-ce we need to disable updates completely once the initial import has completed
(in short, if there are siva files or something is written in the status table, don't do anything).
You can find more information about why we need it in the issue: src-d/sourced-ce#27
Please let us know what you think. Is it possible?
Maybe a limit-cpu option which divides the number of workers by the given number.
Implement the command to run gitcollector
related to src-d/ghsync#54
When source{d} CE is initialized with a local workdir, gitcollector is started and then it fails trying to get org repos.
$ GITCOLLECTOR_LIBRARY=/tmp/gitcollector GITHUB_ORGANIZATIONS= gitcollector download
WARN GET https://api.github.com/orgs//repos?per_page=100: 404 Not Found []
INFO collection finished in 436.35442ms
imo, if no org is passed, it should exit 0 with a warning, e.g. "no organizations to scrape", instead of trying to get repos from a no-org.
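A possible shape for that check, as a sketch only: `parseOrgs` is a hypothetical helper and the warning text is taken from the suggestion above; the real command parses its flags and environment differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseOrgs splits a comma-separated list of organization names and
// drops empty entries, so an empty variable yields an empty list
// instead of a single "" organization.
func parseOrgs(raw string) []string {
	var orgs []string
	for _, name := range strings.Split(raw, ",") {
		if name = strings.TrimSpace(name); name != "" {
			orgs = append(orgs, name)
		}
	}
	return orgs
}

func main() {
	orgs := parseOrgs(os.Getenv("GITHUB_ORGANIZATIONS"))
	if len(orgs) == 0 {
		fmt.Fprintln(os.Stderr, "WARN no organizations to scrape")
		return // returning from main exits with status 0
	}
	fmt.Println("organizations:", orgs)
}
```

Filtering before discovery starts avoids the request to https://api.github.com/orgs//repos shown in the log above.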
Queries executed one after another (first gitbase):
Logs:
$ docker logs -f srcd-tuxvbknvzgu_gitcollector_1
time="2019-06-18T12:23:13.5011812Z" level=debug msg="temporal dir: /tmp/gitcollector-downloader309492289" source="subcmd/download.go:59"
time="2019-06-18T12:23:13.5014106Z" level=debug msg="acces token found" source="subcmd/download.go:70"
time="2019-06-18T12:23:13.5017067Z" level=debug msg="allow updates on downloads: true" source="subcmd/download.go:80"
time="2019-06-18T12:23:13.5280681Z" level=debug msg="number of workers in the pool 2" source="subcmd/download.go:114"
time="2019-06-18T12:23:13.5283908Z" level=debug msg="worker pool is running" source="subcmd/download.go:117"
time="2019-06-18T12:23:13.5287481Z" level=debug msg="github provider started" source="subcmd/download.go:127"
time="2019-06-18T12:23:14.5927277Z" level=info msg=started id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:80" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:14.6009875Z" level=info msg=started id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:80" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1232431Z" level=debug msg=cloned elapsed=2.5220538s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:121" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1249695Z" level=debug msg="head commit found" head=76ed928082383174bd469ece51352dd17ac98a00 id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:136" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1265827Z" level=debug msg="root commit found" elapsed=1.5009ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download root=15e56329858f20e662085ab51b9e3872e9589b03 source="downloader/download.go:148" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1509094Z" level=debug msg=copied elapsed=16.4413ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:183" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.6263435Z" level=debug msg=cloned elapsed=3.0334211s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:121" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6277775Z" level=debug msg="head commit found" head=8f85636900147b9550e64beb1e05a7e532cbd67c id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:136" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6307986Z" level=debug msg="root commit found" elapsed=2.8949ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download root=a9796d6f1577e59f55eba5ce1df8f769f1fdc9ff source="downloader/download.go:148" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6480528Z" level=debug msg=copied elapsed=16.5896ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:183" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7432751Z" level=debug msg=fetched elapsed=1.093811s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:217" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7507575Z" level=debug msg=commited elapsed=7.173ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:225" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7542478Z" level=info msg=finished elapsed=4.1613843s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:95" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7555873Z" level=info msg=started id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.0736851Z" level=debug msg=fetched elapsed=1.9188181s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:217" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.0802461Z" level=debug msg=commited elapsed=6.1763ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:225" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.092771Z" level=info msg=finished elapsed=4.4916263s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:95" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.0935319Z" level=info msg=started id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:19.6686057Z" level=debug msg=cloned elapsed=906.594ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6696591Z" level=debug msg="head commit found" head=95dd16974e21c2076a583748107719a1830b8885 id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6704667Z" level=debug msg="root commit found" elapsed="510.3µs" id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download root=3855424c699a2262fea3d2aa8549cbfd57ea4da2 source="downloader/download.go:148" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6723609Z" level=debug msg=copied elapsed=1.2275ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1712836Z" level=debug msg=fetched elapsed=498.4842ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.175682Z" level=debug msg=commited elapsed=3.1023ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1781269Z" level=info msg=finished elapsed=1.422356s id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1804704Z" level=info msg=started id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:21.1734223Z" level=debug msg=cloned elapsed=2.0794445s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1739694Z" level=debug msg="head commit found" head=f6fb85ea05ad96535cc6cc087f133830ab6b26c1 id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1745408Z" level=debug msg="root commit found" elapsed="338.9µs" id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download root=f22be3e739b784c0daa27d3d4aac022a0459a14e source="downloader/download.go:148" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1799215Z" level=debug msg=copied elapsed=4.9723ms id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.1977548Z" level=debug msg=cloned elapsed=3.0166444s id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.1982726Z" level=debug msg="head commit found" head=4e9eaeb6e7d3687cc73e73ae0fdf513f4e429e83 id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.199415Z" level=debug msg="root commit found" elapsed=1.0163ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download root=fc688735a2a79acfb3365f4423878c8fd9536c30 source="downloader/download.go:148" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.2109243Z" level=debug msg=copied elapsed=11.0515ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6000845Z" level=debug msg=fetched elapsed=388.3955ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6152337Z" level=debug msg=commited elapsed=14.7819ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6190663Z" level=info msg=finished elapsed=3.437998s id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6199417Z" level=info msg=started id=5faf236b-7043-4b59-9eb5-5cbed55474bb job=download source="downloader/download.go:80" url="https://github.com/MLonCode/ggnn_graph_classification"
time="2019-06-18T12:23:23.7704598Z" level=debug msg=fetched elapsed=2.5900296s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.7784882Z" level=debug msg=commited elapsed=7.8475ms id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.780564Z" level=info msg=finished elapsed=4.6866314s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.7816651Z" level=info msg=started id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5383275Z" level=debug msg=cloned elapsed=1.7562361s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5398515Z" level=debug msg="head commit found" head=d303f59581a4b42e39938677441d4a75ad05313e id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.542616Z" level=debug msg="root commit found" elapsed=2.6189ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download root=cb50054904c2f63654d65bf1d1f472efdb7ed5cf source="downloader/download.go:148" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5446253Z" level=debug msg=copied elapsed=1.4631ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6332225Z" level=debug msg=fetched elapsed=1.0879254s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6345237Z" level=debug msg=commited elapsed=1.1933ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6354998Z" level=info msg=finished elapsed=2.8535362s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6365526Z" level=info msg=started id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7212567Z" level=debug msg=cloned elapsed=1.1199898s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.721759Z" level=debug msg="head commit found" head=9b0535ce760ad7af293b0de5b82c4070d900b915 id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7220215Z" level=debug msg="root commit found" elapsed="8.9µs" id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download root=9b0535ce760ad7af293b0de5b82c4070d900b915 source="downloader/download.go:148" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7401005Z" level=debug msg=copied elapsed=1.2315ms id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7461138Z" level=debug msg=fetched elapsed=1.005425s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7500071Z" level=debug msg=commited elapsed=2.8174ms id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7547999Z" level=info msg=finished elapsed=2.153722s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:33.7560767Z" level=debug msg="waiting new metrics" metrics=library source="metrics/metrics.go:138"
... many repeated "waiting new metrics" ...
These repos don't share a common initial commit:
time="2019-06-13T16:53:16.698967058+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=41441ce5-149a-4df6-be0e-3ea5c187eac4 job=download source="downloader/download.go:61" url="https://github.com/src-d/core-retrieval"
time="2019-06-13T16:53:16.702707441+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=eb7040ac-3148-4a61-a86c-bdf811ab294e job=download source="downloader/download.go:61" url="https://github.com/src-d/notebooks"
time="2019-06-13T16:53:16.709913362+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=d1075a87-50fe-421e-9cda-b1c40ef5a14e job=download source="downloader/download.go:61" url="https://github.com/src-d/platform"
time="2019-06-13T16:53:16.724095577+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=7ef265a5-659a-4223-b790-d00e69f438a5 job=download source="downloader/download.go:61" url="https://github.com/src-d/guide"
caused by https://github.com/src-d/backlog/issues/1442
Since source{d} CE expects a homogeneous way to import repos and metadata between ghsync and gitcollector, and status is needed to let the user know about the progress of the process, gitcollector should fail if the progress status could not be updated.
If I'm not wrong, if gitcollector cannot persist the metric, it just logs a warning and continues the import (metrics/metrics.go#L170).
Is support for GitLab self-hosted and public something that would be useful for this project?
I've used this before which worked well: https://github.com/xanzy/go-gitlab
We can make GHProvider accept different iterators, as with orgReposIter, so it would be easier to build new Providers.
Add a --no-forks option to skip forked repos.
See: src-d/sourced-ce#109
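A minimal sketch of what such an option could do during discovery. The `Repo` type and `filterForks` helper are hypothetical; the only outside fact relied on is that the GitHub API marks forked repositories with a boolean `fork` field.

```go
package main

import "fmt"

// Repo is a hypothetical, reduced view of a discovered repository.
// Fork mirrors the boolean "fork" field of the GitHub API response.
type Repo struct {
	CloneURL string
	Fork     bool
}

// filterForks returns repos unchanged when noForks is false, and only
// the non-fork repositories when it is true.
func filterForks(repos []Repo, noForks bool) []Repo {
	if !noForks {
		return repos
	}
	kept := make([]Repo, 0, len(repos))
	for _, r := range repos {
		if !r.Fork {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	repos := []Repo{
		{CloneURL: "https://github.com/example/original"},
		{CloneURL: "https://github.com/example/forked", Fork: true},
	}
	for _, r := range filterForks(repos, true) {
		fmt.Println(r.CloneURL)
	}
}
```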
summary: one of the repository downloads fails with a timeout error
environment:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
GOMAXPROCS is 8
8 CPU 16G RAM
steps to reproduce: run this command
./gitcollector download --library=/home/lwsanty/gitcollector/ --orgs=src-d,bblfsh
expected: all corresponding repositories are downloaded
observed: one of the downloads fails with a timeout
[2019-09-10T16:54:41.685787348+03:00] ERROR failed error=timeout exceeded: unable to retrieve repository from location cc761bed408a1d6a20457d48d8bf807d3531db4f in transactional mode.
github.com/src-d/go-borges/siva.(*transactioner).Start
/home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/transactioner.go:45
github.com/src-d/go-borges/siva.(*Location).repository
/home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/location.go:528
github.com/src-d/go-borges/siva.(*Location).Init
/home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/location.go:242
github.com/src-d/gitcollector/downloader.PrepareRepository
/home/lwsanty/goproj/lwsanty/gitcollector/downloader/git.go:199
github.com/src-d/gitcollector/downloader.downloadRepository
/home/lwsanty/goproj/lwsanty/gitcollector/downloader/download.go:171
github.com/src-d/gitcollector/downloader.Download
/home/lwsanty/goproj/lwsanty/gitcollector/downloader/download.go:78
github.com/src-d/gitcollector/library.(*Job).Process
/home/lwsanty/goproj/lwsanty/gitcollector/library/job.go:57
github.com/src-d/gitcollector.(*worker).consumeJob.func1
/home/lwsanty/goproj/lwsanty/gitcollector/worker.go:61
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337 id=72f28f0b-50f1-40e1-8457-e27d138525e4 job=download url=https://github.com/src-d/go-mysql-server
note: the issue was reproduced only once and could not be reproduced again in 3 attempts
question: it looks like the problem is in the transaction timeout of borges. We could avoid this problem if we propagate the transaction timeout from gitcollector to borges by adding a new flag to gitcollector.
Create a downloader package with functionality extracted from:
https://github.com/jfontan/borges/tree/new_borges
Add error control and logging. Part of the logic is in the main package.
time="2019-06-24T17:18:51.153575758Z" level=debug msg="temporal dir: /tmp/gitcollector-downloader097521558" source="subcmd/download.go:70"
time="2019-06-24T17:18:51.15442459Z" level=debug msg="acces token found" source="subcmd/download.go:82"
time="2019-06-24T17:18:51.154750993Z" level=debug msg="allow updates on downloads: true" source="subcmd/download.go:98"
time="2019-06-24T17:18:51.178458962Z" level=debug msg="metrics collection activated: sync timeout 30" source="subcmd/download.go:126"
time="2019-06-24T17:18:51.178528544Z" level=debug msg="number of workers in the pool 2" source="subcmd/download.go:132"
time="2019-06-24T17:18:51.1787704Z" level=debug msg="worker pool is running" source="subcmd/download.go:135"
time="2019-06-24T17:18:51.179911077Z" level=debug msg="opennebula organization provider started" source="subcmd/download.go:208"
time="2019-06-24T17:18:52.5755099Z" level=info msg=started id=b0443e10-2dc8-4a94-a014-6cf9c3abb22b job=download source="downloader/download.go:80" url="https://github.com/OpenNebula/addon-iscsi"
time="2019-06-24T17:18:52.578585097Z" level=info msg=started id=0067d43d-1e97-4f09-b326-b65066969cef job=download source="downloader/download.go:80" url="https://github.com/OpenNebula/one"
time="2019-06-24T17:18:53.663989463Z" level=debug msg="opennebula organization provider stopped" source="subcmd/download.go:204"
time="2019-06-24T17:18:54.05826056Z" level=debug msg=cloned elapsed=1.482295251s id=b0443e10-2dc8-4a94-a014-6cf9c3abb22b job=download source="downloader/download.go:153" url="https://github.com/OpenNebula/addon-iscsi"
[...]
time="2019-06-24T17:20:28.19196274Z" level=debug msg="root commit found" elapsed=775ns id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download root=2569c757c1eb9dc17414399419097caf63770dfb source="downloader/download.go:180" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.197779036Z" level=debug msg=copied elapsed=5.381465ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:215" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.547910983Z" level=debug msg=fetched elapsed=349.103447ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:249" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.551868571Z" level=debug msg=commited elapsed=3.746051ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:257" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.554792675Z" level=info msg=finished elapsed=1.322057606s id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:96" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:29.281474141Z" level=debug msg=fetched elapsed=1.434356058s id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:249" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.282272046Z" level=debug msg=commited elapsed="755.43µs" id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:257" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.2827552Z" level=info msg=finished elapsed=2.610350605s id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:96" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.282858124Z" level=info msg="metrics updated" discover=0 download=0 fail=0 metrics=library org=opennebula source="metrics/metrics.go:204" update=0
time="2019-06-24T17:20:29.282886831Z" level=debug msg="worker pool stopped successfully" source="subcmd/download.go:140"
time="2019-06-24T17:20:29.282920121Z" level=info msg="collection finished in 1m38.131065017s" source="subcmd/download.go:143"
--- SKIP: TestGitHubSkipForks (0.00s)
github_test.go:94: test running on travis CI but couldn't find GITHUB_TOKEN
PASS
Use a real postgresql to test.
It will get an organization name and a token. It should work similarly to a Rovers provider, support rate limiting, and provide an iterator interface. It should return, at least, the https endpoint.
It should be contained in the discovery package for later reuse.
See #17 (comment)
Hey, congrats on the first stable release!
I have a few questions that may be documented somewhere and I just did not find them, sorry in advance! What is the difference between this tool and the borges CLI tool? Do both support the same use cases? When should one be used rather than the other?
Thanks in advance!
It should check jobs in two queues, download and update:
The download queue takes precedence over the update queue. This can be implemented with a channel for each queue, selecting over the download queue first to check whether there is any job waiting.
Also create the "cron" scheduler for filling the update queue from time to time. The time between updates should be configurable.
Note: It may be interesting to only add locations to the update queue that are not already there or only add new locations if it's empty but only as a bonus for now. Probably not needed for the first version.