
gitcollector's People

Contributors

alexpdp7, dpordomingo, erizocosmico, jfontan, lwsanty, mcarmonaa

gitcollector's Issues

Discovered gh repositories number doesn't look correct

For src-d organization the github api returns 134 public repositories:

{
"login": "src-d",
"id": 15128793,
"node_id": "MDEyOk9yZ2FuaXphdGlvbjE1MTI4Nzkz",
"url": "https://api.github.com/orgs/src-d",
"repos_url": "https://api.github.com/orgs/src-d/repos",
"events_url": "https://api.github.com/orgs/src-d/events",
"hooks_url": "https://api.github.com/orgs/src-d/hooks",
"issues_url": "https://api.github.com/orgs/src-d/issues",
"members_url": "https://api.github.com/orgs/src-d/members{/member}",
"public_members_url": "https://api.github.com/orgs/src-d/public_members{/member}",
"avatar_url": "https://avatars2.githubusercontent.com/u/15128793?v=4",
"description": "",
"name": "source{d}",
"company": null,
"blog": "https://sourced.tech",
"location": "Remote first",
"email": "[email protected]",
"is_verified": false,
"has_organization_projects": true,
"has_repository_projects": false,
"public_repos": 134,
"public_gists": 0,
"followers": 0,
"following": 0,
"html_url": "https://github.com/src-d",
"created_at": "2015-10-14T17:13:24Z",
"updated_at": "2018-12-14T18:32:06Z",
"type": "Organization"
}

gitcollector only fetches 133:

time="2019-06-18T17:59:52.113977357+01:00" level=info msg="metrics updated" discover=133 download=133 fail=0 metrics=library source="metrics/metrics.go:207" update=0

Likely the bug is in the GHProvider iterator.

Allow excluding repos

Some orgs might have repos that are convenient to exclude. For instance, they might be big datasets without code, which can take a long time to process and make it harder to extract insight from the org (e.g. they might skew statistics).
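A minimal sketch of what such an exclusion filter could look like. This is not gitcollector's implementation: the function name `excludeRepos` and the idea of passing the exclusion list as plain repository names are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// excludeRepos (hypothetical helper) returns the repository names that are
// not listed in excluded. Names are compared case-insensitively.
func excludeRepos(repos, excluded []string) []string {
	skip := make(map[string]struct{}, len(excluded))
	for _, name := range excluded {
		skip[strings.ToLower(strings.TrimSpace(name))] = struct{}{}
	}

	kept := make([]string, 0, len(repos))
	for _, repo := range repos {
		if _, ok := skip[strings.ToLower(repo)]; ok {
			continue // excluded: never reaches the download queue
		}
		kept = append(kept, repo)
	}
	return kept
}

func main() {
	repos := []string{"gitcollector", "datasets", "go-borges"}
	fmt.Println(excludeRepos(repos, []string{"Datasets"}))
}
```

Filtering at discovery time, before a download job is created, would also keep the excluded repos out of the discovered/downloaded metrics.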

Add location update function to downloader package

This function takes a location and a token and fetches from all of the location's remotes, using a single transaction.

If there are no new changes, do not perform the commit. On error, log it but do not take any further action.

Export metrics

Export to an external DB the number of repos to download and the number already downloaded.

Empty metrics on metrics tests

Sometimes, during the metrics tests from PR #73, empty metrics are written:
https://travis-ci.com/src-d/gitcollector/jobs/235932304

--- FAIL: TestPostgres (5.66s)
    --- FAIL: TestPostgres/testPostgresSendMetricsSuccess (0.22s)
        require.go:157: 
                Error Trace:    postgres_test.go:97
                                                        postgres_test.go:63
                Error:          Not equal: 
                                expected: []integration.metric{integration.metric{org:"git-fixtures", discovered:8, downloaded:7, updated:0, failed:1}}
                                actual  : []integration.metric{integration.metric{org:"git-fixtures", discovered:0, downloaded:0, updated:0, failed:0}}
                                
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -3,6 +3,6 @@
                                   org: (string) (len=12) "git-fixtures",
                                -  discovered: (int) 8,
                                -  downloaded: (int) 7,
                                +  discovered: (int) 0,
                                +  downloaded: (int) 0,
                                   updated: (int) 0,
                                -  failed: (int) 1
                                +  failed: (int) 0
                                  }

I suspect data loss in gitcollector.
I cannot reproduce it locally.

More info is needed.

Update
OK, it seems the reason is the following:

[2019-09-17T14:30:38.901550957+03:00]  WARN discovery stopped: rate limit requests exceeded: GET https://api.github.com/orgs/git-fixtures/repos?per_page=100: 403 API rate limit exceeded for 212.7.22.138. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.) [rate reset in 60m00s] source=subcmd/download.go:226

This will be fixed in CI as soon as #74 and https://github.com/src-d/infrastructure/issues/1154 are closed.

I have a doubt about this behavior because it may confuse users. Maybe it would be better to add an errors field to the metrics schema, so that it is clearer why all metrics are 0?

wdyt @mcarmonaa @jfontan ?

Errors reading/writing while downloading

They point to the same file:

time="2019-06-13T16:53:16.698967058+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=41441ce5-149a-4df6-be0e-3ea5c187eac4 job=download source="downloader/download.go:61" url="https://github.com/src-d/core-retrieval"
time="2019-06-13T16:53:16.702707441+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=eb7040ac-3148-4a61-a86c-bdf811ab294e job=download source="downloader/download.go:61" url="https://github.com/src-d/notebooks"
time="2019-06-13T16:53:16.709913362+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=d1075a87-50fe-421e-9cda-b1c40ef5a14e job=download source="downloader/download.go:61" url="https://github.com/src-d/platform"
time="2019-06-13T16:53:16.724095577+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=7ef265a5-659a-4223-b790-d00e69f438a5 job=download source="downloader/download.go:61" url="https://github.com/src-d/guide"

Client also fails reading:

MySQL [(none)]> select * from repositories;
ERROR 1105 (HY000): unknown error: index read failed: seek /opt/repos/d2/d27a55c20fc832063a510bf70c90e2a7e977005a.siva: invalid argument

Allow to disable update completely

Currently, gitcollector has a flag --no-updates which disables updates for existing siva files but still downloads new repositories.
For srcd-ce we need to disable updates completely once the initial import has completed
(in short: if there are siva files, or something is written in the status table, don't do anything).

You can find more information about why we need it in the issue src-d/sourced-ce#27.

Please let us know what you think. Is it possible?

gitcollector should not start when no org is passed

related to src-d/ghsync#54

When source{d} CE is initialized with a local workdir, gitcollector is started and then fails trying to get the org repos.

$ GITCOLLECTOR_LIBRARY=/tmp/gitcollector GITHUB_ORGANIZATIONS= gitcollector download
WARN GET https://api.github.com/orgs//repos?per_page=100: 404 Not Found []
INFO collection finished in 436.35442ms

imo, if no org is passed, it should exit 0 with a warning (e.g. "no organizations to scrape") instead of trying to get repos from a non-existent org.
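A rough sketch of the proposed guard, assuming the organizations arrive as a comma-separated list in GITHUB_ORGANIZATIONS as in the command above. The helper name `parseOrgs` and the warning text are illustrative and not taken from the gitcollector code base.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseOrgs (hypothetical helper) splits a comma-separated list of
// organizations and drops empty entries, so an unset or empty variable
// yields an empty slice instead of a single "" organization.
func parseOrgs(raw string) []string {
	var orgs []string
	for _, org := range strings.Split(raw, ",") {
		if org = strings.TrimSpace(org); org != "" {
			orgs = append(orgs, org)
		}
	}
	return orgs
}

func main() {
	orgs := parseOrgs(os.Getenv("GITHUB_ORGANIZATIONS"))
	if len(orgs) == 0 {
		// Nothing to do: warn and return normally (exit status 0)
		// instead of requesting /orgs//repos.
		fmt.Fprintln(os.Stderr, "WARN no organizations to scrape")
		return
	}
	fmt.Println("organizations to scrape:", orgs)
}
```

The key detail is that splitting an empty string yields one empty element, which is presumably how the request for `/orgs//repos` comes about; dropping empty entries avoids it.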

Progress export works incorrectly

Queries executed one after another (first gitbase):

Screenshot 2019-06-18 at 14 27 18

Screenshot 2019-06-18 at 14 27 27

Logs:

$ docker logs -f srcd-tuxvbknvzgu_gitcollector_1
time="2019-06-18T12:23:13.5011812Z" level=debug msg="temporal dir: /tmp/gitcollector-downloader309492289" source="subcmd/download.go:59"
time="2019-06-18T12:23:13.5014106Z" level=debug msg="acces token found" source="subcmd/download.go:70"
time="2019-06-18T12:23:13.5017067Z" level=debug msg="allow updates on downloads: true" source="subcmd/download.go:80"
time="2019-06-18T12:23:13.5280681Z" level=debug msg="number of workers in the pool 2" source="subcmd/download.go:114"
time="2019-06-18T12:23:13.5283908Z" level=debug msg="worker pool is running" source="subcmd/download.go:117"
time="2019-06-18T12:23:13.5287481Z" level=debug msg="github provider started" source="subcmd/download.go:127"
time="2019-06-18T12:23:14.5927277Z" level=info msg=started id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:80" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:14.6009875Z" level=info msg=started id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:80" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1232431Z" level=debug msg=cloned elapsed=2.5220538s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:121" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1249695Z" level=debug msg="head commit found" head=76ed928082383174bd469ece51352dd17ac98a00 id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:136" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1265827Z" level=debug msg="root commit found" elapsed=1.5009ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download root=15e56329858f20e662085ab51b9e3872e9589b03 source="downloader/download.go:148" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.1509094Z" level=debug msg=copied elapsed=16.4413ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:183" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:17.6263435Z" level=debug msg=cloned elapsed=3.0334211s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:121" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6277775Z" level=debug msg="head commit found" head=8f85636900147b9550e64beb1e05a7e532cbd67c id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:136" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6307986Z" level=debug msg="root commit found" elapsed=2.8949ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download root=a9796d6f1577e59f55eba5ce1df8f769f1fdc9ff source="downloader/download.go:148" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:17.6480528Z" level=debug msg=copied elapsed=16.5896ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:183" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7432751Z" level=debug msg=fetched elapsed=1.093811s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:217" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7507575Z" level=debug msg=commited elapsed=7.173ms id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:225" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7542478Z" level=info msg=finished elapsed=4.1613843s id=0b17df5c-e47a-4c54-9ca6-33ece8fa658b job=download source="downloader/download.go:95" url="https://github.com/MLonCode/memlayout"
time="2019-06-18T12:23:18.7555873Z" level=info msg=started id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.0736851Z" level=debug msg=fetched elapsed=1.9188181s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:217" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.0802461Z" level=debug msg=commited elapsed=6.1763ms id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:225" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.092771Z" level=info msg=finished elapsed=4.4916263s id=8772b554-d4f9-446e-8463-b3ffdf55092c job=download source="downloader/download.go:95" url="https://github.com/MLonCode/sonic"
time="2019-06-18T12:23:19.0935319Z" level=info msg=started id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:19.6686057Z" level=debug msg=cloned elapsed=906.594ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6696591Z" level=debug msg="head commit found" head=95dd16974e21c2076a583748107719a1830b8885 id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6704667Z" level=debug msg="root commit found" elapsed="510.3µs" id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download root=3855424c699a2262fea3d2aa8549cbfd57ea4da2 source="downloader/download.go:148" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:19.6723609Z" level=debug msg=copied elapsed=1.2275ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1712836Z" level=debug msg=fetched elapsed=498.4842ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.175682Z" level=debug msg=commited elapsed=3.1023ms id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1781269Z" level=info msg=finished elapsed=1.422356s id=bdb59e73-d630-4bfc-8557-086b5f164408 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/evop"
time="2019-06-18T12:23:20.1804704Z" level=info msg=started id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:21.1734223Z" level=debug msg=cloned elapsed=2.0794445s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1739694Z" level=debug msg="head commit found" head=f6fb85ea05ad96535cc6cc087f133830ab6b26c1 id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1745408Z" level=debug msg="root commit found" elapsed="338.9µs" id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download root=f22be3e739b784c0daa27d3d4aac022a0459a14e source="downloader/download.go:148" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:21.1799215Z" level=debug msg=copied elapsed=4.9723ms id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.1977548Z" level=debug msg=cloned elapsed=3.0166444s id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.1982726Z" level=debug msg="head commit found" head=4e9eaeb6e7d3687cc73e73ae0fdf513f4e429e83 id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.199415Z" level=debug msg="root commit found" elapsed=1.0163ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download root=fc688735a2a79acfb3365f4423878c8fd9536c30 source="downloader/download.go:148" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.2109243Z" level=debug msg=copied elapsed=11.0515ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6000845Z" level=debug msg=fetched elapsed=388.3955ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6152337Z" level=debug msg=commited elapsed=14.7819ms id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6190663Z" level=info msg=finished elapsed=3.437998s id=1a194f8e-b912-4329-aeb3-353175172ec2 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/ncc"
time="2019-06-18T12:23:23.6199417Z" level=info msg=started id=5faf236b-7043-4b59-9eb5-5cbed55474bb job=download source="downloader/download.go:80" url="https://github.com/MLonCode/ggnn_graph_classification"
time="2019-06-18T12:23:23.7704598Z" level=debug msg=fetched elapsed=2.5900296s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.7784882Z" level=debug msg=commited elapsed=7.8475ms id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.780564Z" level=info msg=finished elapsed=4.6866314s id=4739b6a6-e2da-4f01-9fab-19e0da252c45 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/code2seq"
time="2019-06-18T12:23:23.7816651Z" level=info msg=started id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5383275Z" level=debug msg=cloned elapsed=1.7562361s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5398515Z" level=debug msg="head commit found" head=d303f59581a4b42e39938677441d4a75ad05313e id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.542616Z" level=debug msg="root commit found" elapsed=2.6189ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download root=cb50054904c2f63654d65bf1d1f472efdb7ed5cf source="downloader/download.go:148" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:25.5446253Z" level=debug msg=copied elapsed=1.4631ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6332225Z" level=debug msg=fetched elapsed=1.0879254s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6345237Z" level=debug msg=commited elapsed=1.1933ms id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6354998Z" level=info msg=finished elapsed=2.8535362s id=82d7765e-6ad8-4115-86e5-db6a81610158 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/astminer"
time="2019-06-18T12:23:26.6365526Z" level=info msg=started id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:80" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7212567Z" level=debug msg=cloned elapsed=1.1199898s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:121" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.721759Z" level=debug msg="head commit found" head=9b0535ce760ad7af293b0de5b82c4070d900b915 id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:136" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7220215Z" level=debug msg="root commit found" elapsed="8.9µs" id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download root=9b0535ce760ad7af293b0de5b82c4070d900b915 source="downloader/download.go:148" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:27.7401005Z" level=debug msg=copied elapsed=1.2315ms id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:183" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7461138Z" level=debug msg=fetched elapsed=1.005425s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:217" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7500071Z" level=debug msg=commited elapsed=2.8174ms id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:225" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:28.7547999Z" level=info msg=finished elapsed=2.153722s id=bf3cfe8b-2597-42f8-8f64-b8dc2c7146d9 job=download source="downloader/download.go:95" url="https://github.com/MLonCode/structured-neural-summarization"
time="2019-06-18T12:23:33.7560767Z" level=debug msg="waiting new metrics" metrics=library source="metrics/metrics.go:138"
... many repeated "waiting new metrics" ...

Error reading siva files, wrong offset

These repos don't share a common initial commit:

time="2019-06-13T16:53:16.698967058+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=41441ce5-149a-4df6-be0e-3ea5c187eac4 job=download source="downloader/download.go:61" url="https://github.com/src-d/core-retrieval"
time="2019-06-13T16:53:16.702707441+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=eb7040ac-3148-4a61-a86c-bdf811ab294e job=download source="downloader/download.go:61" url="https://github.com/src-d/notebooks"
time="2019-06-13T16:53:16.709913362+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=d1075a87-50fe-421e-9cda-b1c40ef5a14e job=download source="downloader/download.go:61" url="https://github.com/src-d/platform"
time="2019-06-13T16:53:16.724095577+02:00" level=error msg=failed error="index read failed: seek /home/jfontan/work/gitcollector/sivas-new/f2/f281ab6f2e0e38dcc3af05360667d8f530c00103.siva: invalid argument" id=7ef265a5-659a-4223-b790-d00e69f438a5 job=download source="downloader/download.go:61" url="https://github.com/src-d/guide"

Make gitcollector fail if metric could not be updated

caused by https://github.com/src-d/backlog/issues/1442

Since source{d} CE expects a homogeneous way to import repos and metadata between ghsync and gitcollector, and status is something needed to let the user know about the progress of the process, gitcollector should fail if the progress status could not be updated.

If I'm not wrong, when gitcollector cannot persist the metric it just logs a warning and continues the import (see metrics/metrics.go#L170).

Transaction timeout on download

summary: one of the repository downloads fails with a timeout error

environment

Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic
GOMAXPROCS is 8
8 CPU 16G RAM

steps to reproduce
Just run this command:

./gitcollector download --library=/home/lwsanty/gitcollector/ --orgs=src-d,bblfsh

expected: all corresponding repositories are downloaded
observed: one of the downloads fails with a timeout

[2019-09-10T16:54:41.685787348+03:00] ERROR failed error=timeout exceeded: unable to retrieve repository from location cc761bed408a1d6a20457d48d8bf807d3531db4f in transactional mode.

github.com/src-d/go-borges/siva.(*transactioner).Start
        /home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/transactioner.go:45
github.com/src-d/go-borges/siva.(*Location).repository
        /home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/location.go:528
github.com/src-d/go-borges/siva.(*Location).Init
        /home/lwsanty/goproj/gopath/pkg/mod/github.com/src-d/go-borges@v0.0.0-20190704083038-44867e8f2a2a/siva/location.go:242
github.com/src-d/gitcollector/downloader.PrepareRepository
        /home/lwsanty/goproj/lwsanty/gitcollector/downloader/git.go:199
github.com/src-d/gitcollector/downloader.downloadRepository
        /home/lwsanty/goproj/lwsanty/gitcollector/downloader/download.go:171
github.com/src-d/gitcollector/downloader.Download
        /home/lwsanty/goproj/lwsanty/gitcollector/downloader/download.go:78
github.com/src-d/gitcollector/library.(*Job).Process
        /home/lwsanty/goproj/lwsanty/gitcollector/library/job.go:57
github.com/src-d/gitcollector.(*worker).consumeJob.func1
        /home/lwsanty/goproj/lwsanty/gitcollector/worker.go:61
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1337 id=72f28f0b-50f1-40e1-8457-e27d138525e4 job=download url=https://github.com/src-d/go-mysql-server

note: the issue was reproduced only once and could not be reproduced again in 3 attempts

question: it looks like the problem is the transaction timeout in borges. We could avoid this problem by propagating a transaction timeout from gitcollector to borges through a new gitcollector flag.

If repositories are already downloaded metrics are not updated

time="2019-06-24T17:18:51.153575758Z" level=debug msg="temporal dir: /tmp/gitcollector-downloader097521558" source="subcmd/download.go:70"
time="2019-06-24T17:18:51.15442459Z" level=debug msg="acces token found" source="subcmd/download.go:82"
time="2019-06-24T17:18:51.154750993Z" level=debug msg="allow updates on downloads: true" source="subcmd/download.go:98"
time="2019-06-24T17:18:51.178458962Z" level=debug msg="metrics collection activated: sync timeout 30" source="subcmd/download.go:126"
time="2019-06-24T17:18:51.178528544Z" level=debug msg="number of workers in the pool 2" source="subcmd/download.go:132"
time="2019-06-24T17:18:51.1787704Z" level=debug msg="worker pool is running" source="subcmd/download.go:135"
time="2019-06-24T17:18:51.179911077Z" level=debug msg="opennebula organization provider started" source="subcmd/download.go:208"
time="2019-06-24T17:18:52.5755099Z" level=info msg=started id=b0443e10-2dc8-4a94-a014-6cf9c3abb22b job=download source="downloader/download.go:80" url="https://github.com/OpenNebula/addon-iscsi"
time="2019-06-24T17:18:52.578585097Z" level=info msg=started id=0067d43d-1e97-4f09-b326-b65066969cef job=download source="downloader/download.go:80" url="https://github.com/OpenNebula/one"
time="2019-06-24T17:18:53.663989463Z" level=debug msg="opennebula organization provider stopped" source="subcmd/download.go:204"
time="2019-06-24T17:18:54.05826056Z" level=debug msg=cloned elapsed=1.482295251s id=b0443e10-2dc8-4a94-a014-6cf9c3abb22b job=download source="downloader/download.go:153" url="https://github.com/OpenNebula/addon-iscsi"
[...]
time="2019-06-24T17:20:28.19196274Z" level=debug msg="root commit found" elapsed=775ns id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download root=2569c757c1eb9dc17414399419097caf63770dfb source="downloader/download.go:180" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.197779036Z" level=debug msg=copied elapsed=5.381465ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:215" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.547910983Z" level=debug msg=fetched elapsed=349.103447ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:249" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.551868571Z" level=debug msg=commited elapsed=3.746051ms id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:257" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:28.554792675Z" level=info msg=finished elapsed=1.322057606s id=9322eda0-1938-4e24-a0a2-28025aa9d905 job=download source="downloader/download.go:96" url="https://github.com/OpenNebula/addon-terraform"
time="2019-06-24T17:20:29.281474141Z" level=debug msg=fetched elapsed=1.434356058s id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:249" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.282272046Z" level=debug msg=commited elapsed="755.43µs" id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:257" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.2827552Z" level=info msg=finished elapsed=2.610350605s id=281adc17-2260-4cde-b7b5-e13b8fe73046 job=download source="downloader/download.go:96" url="https://github.com/OpenNebula/addon-linstor_un"
time="2019-06-24T17:20:29.282858124Z" level=info msg="metrics updated" discover=0 download=0 fail=0 metrics=library org=opennebula source="metrics/metrics.go:204" update=0
time="2019-06-24T17:20:29.282886831Z" level=debug msg="worker pool stopped successfully" source="subcmd/download.go:140"
time="2019-06-24T17:20:29.282920121Z" level=info msg="collection finished in 1m38.131065017s" source="subcmd/download.go:143"

GitHub client to retrieve repositories from an organization

It takes an organization name and a token. It should work similarly to a Rovers provider, support rate limiting, and provide an iterator interface. It should return, at least, the HTTPS endpoint of each repository.

It should live in the discovery package for later reuse.
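One possible shape for the iterator interface, sketched under assumptions: the names `RepoIterator`, `pageFunc` and `ErrDone` are hypothetical, and the actual GitHub call (with its token and rate-limit handling) is hidden behind `pageFunc` and replaced by an in-memory fake here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDone is returned by Next when there are no more repositories.
var ErrDone = errors.New("no more repositories")

// pageFunc fetches one page of repository HTTPS endpoints and returns the
// number of the next page, or 0 when the current page is the last one.
// A real implementation would call the GitHub API here and could wait or
// return an error when the rate limit is exceeded.
type pageFunc func(page int) (urls []string, next int, err error)

// RepoIterator walks all pages lazily, one repository at a time.
type RepoIterator struct {
	fetch pageFunc
	buf   []string
	next  int
	done  bool
}

func NewRepoIterator(fetch pageFunc) *RepoIterator {
	return &RepoIterator{fetch: fetch, next: 1}
}

// Next returns the next endpoint, fetching a new page only when the
// current one is exhausted. The last page is fully drained before
// ErrDone is reported.
func (it *RepoIterator) Next() (string, error) {
	for len(it.buf) == 0 {
		if it.done {
			return "", ErrDone
		}
		urls, next, err := it.fetch(it.next)
		if err != nil {
			return "", err
		}
		it.buf, it.next, it.done = urls, next, next == 0
	}
	url := it.buf[0]
	it.buf = it.buf[1:]
	return url, nil
}

func main() {
	// Fake two-page listing standing in for the GitHub API.
	pages := map[int][]string{
		1: {"https://github.com/src-d/gitcollector", "https://github.com/src-d/go-borges"},
		2: {"https://github.com/src-d/guide"},
	}
	it := NewRepoIterator(func(page int) ([]string, int, error) {
		next := page + 1
		if _, ok := pages[next]; !ok {
			next = 0
		}
		return pages[page], next, nil
	})
	for {
		url, err := it.Next()
		if errors.Is(err, ErrDone) {
			break
		}
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(url)
	}
}
```

Keeping the page fetch behind a function value makes the iterator easy to test without network access, and leaves the choice of GitHub client library open.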

Difference between gitcollector and borges

Hey, congrats on the first stable release!

I have a few questions that may be documented somewhere and I just did not find them, sorry in advance! What is the difference between this tool and the borges CLI tool? Do both support the same use cases? When should one be used over the other?

Thanks in advance!

Worker pool for download and update jobs

It should check jobs in two queues, download and update:

  • Download queue will be driven by discovery of new repositories.
  • Update queue is filled from time to time with all locations.

The download queue takes precedence over the update queue. This can be implemented with a channel for each queue, selecting over the download queue first to check whether there is any job waiting.

Also create the "cron" scheduler for filling the update queue from time to time. The time between updates should be configurable.

Note: It may be interesting to only add locations to the update queue that are not already there or only add new locations if it's empty but only as a bonus for now. Probably not needed for the first version.
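The two-channel priority selection described above could be sketched as follows. The `Job` type and the `nextJob` function are hypothetical names used only for this illustration; shutdown, channel closing and the periodic scheduler that fills the update queue are left out.

```go
package main

import "fmt"

// Job is a hypothetical unit of work for the pool.
type Job struct {
	Kind string // "download" or "update"
	URL  string
}

// nextJob returns a waiting download job if there is one. Otherwise it
// blocks until a job arrives on either queue.
func nextJob(download, update <-chan Job) Job {
	// Non-blocking check first: a waiting download job always wins.
	select {
	case job := <-download:
		return job
	default:
	}

	// Nothing waiting in the download queue: take whatever comes first.
	select {
	case job := <-download:
		return job
	case job := <-update:
		return job
	}
}

func main() {
	download := make(chan Job, 2)
	update := make(chan Job, 2)

	// Update jobs are queued first, but download jobs are served first.
	update <- Job{Kind: "update", URL: "https://github.com/src-d/guide"}
	update <- Job{Kind: "update", URL: "https://github.com/src-d/platform"}
	download <- Job{Kind: "download", URL: "https://github.com/src-d/gitcollector"}
	download <- Job{Kind: "download", URL: "https://github.com/src-d/go-borges"}

	for i := 0; i < 4; i++ {
		job := nextJob(download, update)
		fmt.Println(job.Kind, job.URL)
	}
}
```

A single `select` over both channels would pick randomly when both are ready, which is why the non-blocking check on the download channel comes first. The update queue could then be filled from a `time.Ticker` with a configurable interval.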
