
harwest-tool's Introduction

📦 Harwest ⛏


Harwest takes away the hassle of managing your submission files on different online-judges by automating the entire process of collecting and organizing your code submissions in one single Git repository.

Here's a sample repository created using Harwest: harwest-sample

Highlights

  • Fully automated collection of all your submissions with minimal setup effort
  • Simple and easy to use interface to get you started in minutes
  • Extensive traceability for your submissions with reference to the problem, tags, submission date and more
  • Single commit for each submission stamped with the original submission date for building rich and accurate contributions graph
  • Automated git pushes to the remote repository with every update
  • Requires little to no knowledge of Git (though learning it is strongly recommended)
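Backdating each commit to its submission time can be done with Git's standard `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables. The sketch below is a hypothetical illustration of that idea, not Harwest's actual code: `backdated_env` and `commit_submission` are made-up names, and it assumes the submission time is available as a Unix timestamp (the Codeforces API, for example, reports it as `creationTimeSeconds`).

```python
import os
import subprocess


def backdated_env(epoch_seconds: int) -> dict:
    """Copy the current environment with both Git dates set to the given Unix time (UTC)."""
    stamp = f"{epoch_seconds} +0000"  # Git's internal date format: <unix-timestamp> <tz-offset>
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def commit_submission(repo_dir: str, path: str, message: str, epoch_seconds: int) -> None:
    """Hypothetical helper: stage one solution file and commit it with its original date."""
    subprocess.run(["git", "add", path], cwd=repo_dir, check=True)
    subprocess.run(
        ["git", "commit", "-m", message],
        cwd=repo_dir,
        env=backdated_env(epoch_seconds),
        check=True,
    )
```

Setting both dates keeps the author and committer timestamps consistent, so the history reads as if each solution had been committed on the day it was submitted.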

Platforms

Harwest currently has extensive support for the following platforms:

  • Codeforces
  • AtCoder

Integration with various other OJs is still in the kitchen. Contributions are always welcome.

Installation

You will need Python 3.5+ and pip3 to install and use Harwest. Refer to the pip documentation for installing pip on Windows, Ubuntu/Linux or macOS.

The package is available on PyPI at https://pypi.python.org/pypi/harwest

Run the following command in the terminal to install the package:

$ pip3 install harwest

Getting Started

After installing the package, run the following command in the terminal:

$ harwest

If you're using Harwest for the first time, you'll be greeted with a set of configuration steps to complete before you can use the tool.

  • Step [1] asks you to choose a name for the directory where all your code submissions will be stored. The directory is created under the path from which you ran the command.

    If you'd like to set up the directory somewhere else, press <Ctrl>+<C> to exit the setup and run the command again from the desired location.

  • Step [2] is straightforward: enter your full name and email address, which will be used to set up the Git repository.

    NOTE: For the contributions to show up in the contributions graph, the email address you provide must match the one associated with your GitHub/Bitbucket account.

  • Step [3] is optional and saves you even the effort of pushing changes to the Git repository. To use this feature, create an empty repository on GitHub or Bitbucket (without any README, .gitignore or license) and paste its Git remote URL as the input for this step.

    If you don't want automated pushes for your repository, leave the input empty and press <enter>. You can always push the repository to the remote manually.

nellex@HQ:~$ harwest

      __  __                              __
     / / / /___ _______      _____  _____/ /_
    / /_/ / __ `/ ___/ | /| / / _ \/ ___/ __/
   / __  / /_/ / /   | |/ |/ /  __(__  ) /_
  /_/ /_/\__,_/_/    |__/|__/\___/____/\__/

  ==========================================

Hey there! 👋 Looks like you're using Harwest for the first time. Let's get you started 🚀

[1] We'll need to create a directory to store all your files
    The directory will be created as /home/nellex/<your-input>
> So, what would you like your directory to be called? accepted
πŸ‘ Alright, so you're directory will be created at /home/nellex/accepted

[2] Then let's build your author tag which will appear in your Git commits as:
    Author: Steve Jobs <[email protected]>
> So what would your beautiful (Author) Full Name be? Nilesh Sah
> And of course, your magical (Author) Email Address? [email protected]

[3] Guess what? We can automate the Git pushes for you too! 🎉
   In case you'd like that, then please specify the remote Git Url for an "empty" repository
   It would be somewhat like https://github.com/nileshsah/harwest-tool.git
   But it's optional, in case you'd like to skip then leave it empty and just hit <enter>
> (Optional) So, what would be the remote url for the repository again? https://github.com/nileshsah/accepted.git

 🥳 You rock! We're all good to go now

Once the initial setup is complete, run the command

$ harwest <platform>
$ harwest codeforces # example

to harvest your submissions from Codeforces. The first time you run the command, you'll be prompted for your Codeforces handle:

> So what's your prestigious Codeforces Handle Name? nellex

Harwest then scrapes all your submissions, from page 1 through to the last page.

nellex@HQ:~$ harwest codeforces

      __  __                              __
     / / / /___ _______      _____  _____/ /_
    / /_/ / __ `/ ___/ | /| / / _ \/ ___/ __/
   / __  / /_/ / /   | |/ |/ /  __(__  ) /_
  /_/ /_/\__,_/_/    |__/|__/\___/____/\__/

  ==========================================

⛏ ️Harvesting Codeforces (nellex) Submissions to /home/nellex/accepted
⌛  Currently scanning page #1: (24/24) Phoenix and Beauty https://codeforces.com/contest/1348/problem/B
Username for 'https://github.com': nileshsah
Password for 'https://[email protected]':
👌 The updates were automatically pushed to the remote repository
✅ The repository was successfully updated!

If scanning stops at some page because of a server-side error, you can resume scraping from the failed page by running:

$ harwest <platform> --start-page N
$ harwest codeforces --start-page 3 # example

or force Harwest to re-scan the entire submission list for the platform by running:

$ harwest <platform> --full-scan
$ harwest atcoder --full-scan # example

Reconfigure

Harwest settings can be reconfigured by running the following command, which restarts the entire configuration process.

$ harwest --init

Harwest can also reuse an existing directory previously created by the tool for further updates.

To change the handle name for a specific platform, you can run:

$ harwest <platform> --setup
$ harwest codeforces --setup # example

License

MIT License

harwest-tool's People

Contributors

ngthanhtrung23, nileshsah, s-i-d-d-i-s, sainad2222, siddhant-k-code


harwest-tool's Issues

harwest atcoder does not work for Java submissions

Source : https://codeforces.com/blog/entry/85788?#comment-747260

The error log says:

ValueError: ("Please provide correct file extension for the language 'Java8 (OpenJDK 1.8.0)' in", 'C:\\Users\\bleh0\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\harwest\\lib\\resources\\language.json', 'file')

Possible fix:

Add the entry "Java8 (OpenJDK 1.8.0)": "java" to language.json.
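The possible fix above is a one-line edit of language.json. A small script for it could look like the sketch below, which assumes (based only on the suggested entry) that language.json is a flat JSON object mapping language names to file extensions; `add_language_extensions` is a hypothetical helper. Pass it the path shown in your own error message.

```python
import json
from pathlib import Path


def add_language_extensions(language_json: Path, new_entries: dict) -> dict:
    """Add missing language -> extension mappings, leaving existing ones untouched."""
    mapping = json.loads(language_json.read_text(encoding="utf-8"))
    for language, extension in new_entries.items():
        mapping.setdefault(language, extension)  # never overwrite an existing mapping
    language_json.write_text(json.dumps(mapping, indent=2, ensure_ascii=False), encoding="utf-8")
    return mapping
```

For example, `add_language_extensions(Path(r"<path from the error>"), {"Java8 (OpenJDK 1.8.0)": "java"})`. Since the file lives inside the installed package, reinstalling or upgrading Harwest will likely overwrite the change.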

Hell of an error...

Traceback (most recent call last):
File "/home/imskanand/.local/bin/harwest", line 8, in <module>
sys.exit(main())
File "/home/imskanand/.local/lib/python3.8/site-packages/harwest/harwest.py", line 115, in main
args.func(args)
File "/home/imskanand/.local/lib/python3.8/site-packages/harwest/harwest.py", line 70, in codeforces
process_platform(args, "Codeforces", CodeforcesWorkflow)
File "/home/imskanand/.local/lib/python3.8/site-packages/harwest/harwest.py", line 90, in process_platform
workflow(configs).run(start_page_index=args.start_page, full_scan=full_scan)
File "/home/imskanand/.local/lib/python3.8/site-packages/harwest/lib/abstractworkflow.py", line 105, in run
self.repository.push()
File "/home/imskanand/.local/lib/python3.8/site-packages/harwest/lib/utils/repository.py", line 52, in push
self.git.push(*args)
File "/home/imskanand/.local/lib/python3.8/site-packages/git/cmd.py", line 542, in <lambda>
return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
File "/home/imskanand/.local/lib/python3.8/site-packages/git/cmd.py", line 1005, in _call_process
return self.execute(call, **exec_kwargs)
File "/home/imskanand/.local/lib/python3.8/site-packages/git/cmd.py", line 822, in execute
raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git push origin master
stderr: 'fatal: protocol 'git remote add origin https' is not supported'
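The stderr shows that the whole `git remote add origin ...` command ended up stored as the remote URL, so Git parsed `git remote add origin https` as the protocol. A hypothetical input check for step [3] that would catch this is sketched below; `normalize_remote` is an illustrative name, and the accepted URL shapes (HTTPS and SCP-style SSH) are assumptions rather than Harwest's rules.

```python
import re

# HTTPS remotes, optionally with a user name (https://[user@]host/owner/repo[.git])
_HTTPS = re.compile(r"^https://([\w.-]+@)?[\w.-]+/[\w.-]+/[\w.-]+/?$")
# SCP-style SSH remotes (git@host:owner/repo[.git])
_SSH = re.compile(r"^git@[\w.-]+:[\w.-]+/[\w.-]+$")


def normalize_remote(user_input: str) -> str:
    """Return a bare remote URL, or '' to skip automated pushes; reject anything else."""
    text = user_input.strip()
    if not text:
        return ""  # step [3] is optional
    # Tolerate a pasted `git remote add origin <url>` command by keeping only the URL
    if text.startswith("git remote add "):
        text = text.split()[-1]
    if _HTTPS.match(text) or _SSH.match(text):
        return text
    raise ValueError(f"Not a Git remote URL: {user_input!r}")
```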

The timestamp on Readme.md is incorrect


The README commit is dated 00:00:00 2000.

If this is intended, letting the user specify the date might be better. As it stands, it adds a lot of extra years beside the contribution graph.
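As the issue suggests, the README commit could use a date the user chooses instead of a fixed one. A hypothetical sketch, assuming the date is entered as YYYY-MM-DD and read as UTC midnight, that produces a value usable for `GIT_AUTHOR_DATE`/`GIT_COMMITTER_DATE`; falling back to the current time on empty input is an illustrative choice, not existing Harwest behavior.

```python
from datetime import datetime, timezone


def readme_commit_date(user_date: str = "") -> str:
    """Convert an optional YYYY-MM-DD date (read as UTC midnight) to Git's internal date format.

    An empty input falls back to the current time rather than a fixed date in 2000.
    """
    if user_date:
        moment = datetime.strptime(user_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    else:
        moment = datetime.now(timezone.utc)
    return f"{int(moment.timestamp())} +0000"
```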


ValueError when running "harwest atcoder --full-scan"

File "/home/sainath/.local/bin/harwest", line 8, in <module>
sys.exit(main())
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/harwest.py", line 115, in main
args.func(args)
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/harwest.py", line 74, in atcoder
process_platform(args, "AtCoder", AtcoderWorkflow)
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/harwest.py", line 90, in process_platform
workflow(configs).run(start_page_index=args.start_page, full_scan=full_scan)
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/lib/abstractworkflow.py", line 96, in run
response.append(self.__add_submission(submission))
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/lib/abstractworkflow.py", line 29, in __add_submission
solution_file_path = self.__get_solution_path(submission)
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/lib/abstractworkflow.py", line 60, in __get_solution_path
lang_ext = config.get_language_extension(submission_lang)
File "/home/sainath/.local/lib/python3.8/site-packages/harwest/lib/utils/config.py", line 49, in get_language_extension
raise ValueError(
ValueError: ("Please provide correct file extension for the language 'Python3 (3.4.3)' in", '/home/sainath/.local/lib/python3.8/site-packages/harwest/lib/resources/language.json', 'file')

On `harwest codeforces` pull the (latest) unaccepted submission too

When running harwest codeforces, is there a way to also pull the (latest) unaccepted submission?

  • I would love to be able to see how my code changes as I keep progressing.
  • Right now it only pulls accepted solutions. Is this a Codeforces API limitation?
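For context on the last question: the Codeforces API method `user.status` (for example `https://codeforces.com/api/user.status?handle=<handle>&from=1&count=100`) returns submissions of every verdict, each with a `verdict` field (`"OK"` for accepted), but it does not include source code, which has to be read from the submission page. A hypothetical sketch of picking the newest non-accepted submission per problem from that response; `fetch_status` and `latest_unaccepted` are illustrative names, not Harwest functions.

```python
import json
import urllib.parse
import urllib.request


def fetch_status(handle: str, count: int = 100) -> list:
    """Fetch the newest `count` submissions (all verdicts) via the Codeforces user.status method."""
    query = urllib.parse.urlencode({"handle": handle, "from": 1, "count": count})
    with urllib.request.urlopen(f"https://codeforces.com/api/user.status?{query}") as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "Codeforces API error"))
    return payload["result"]


def latest_unaccepted(submissions: list) -> dict:
    """Map (contestId, index) to the newest submission whose verdict is not OK."""
    latest = {}
    for sub in submissions:
        if sub.get("verdict") == "OK":
            continue  # accepted submissions are what Harwest already collects
        problem = sub["problem"]
        key = (problem.get("contestId"), problem["index"])
        if key not in latest or sub["creationTimeSeconds"] > latest[key]["creationTimeSeconds"]:
            latest[key] = sub
    return latest
```

Usage would be `latest_unaccepted(fetch_status("nellex"))`; each returned submission's code would still need to be scraped separately.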

feat: Automate using GitHub Actions

From what I understand, keeping the repo updated still requires periodically running harwest codeforces or harwest atcoder. Instead, we could offer to automatically set up a GitHub Action that runs harwest and updates the repo daily (using the cron directive in the .yml file).

A few ideas I have in mind:

  • During initialization, we detect if the repository url is a GitHub repo. If yes, we prompt whether they are okay with automatic updates by a GitHub Action.
  • If yes, we also add a .github/workflows/harwest.yml to the initial repo.
  • Currently, the configuration data is stored in lib/resources. This will have to be changed to the repo itself (maybe a .config folder).
  • If the user runs harwest codeforces or the like locally, we pull first to get the latest version of the repo.

P.S. Great work with the project - really slick!
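The first idea above, detecting whether the configured remote points at GitHub, could be a simple host check. A hypothetical sketch covering HTTPS and SCP-style SSH remotes (`is_github_remote` is an illustrative name):

```python
from urllib.parse import urlparse


def is_github_remote(url: str) -> bool:
    """True if an HTTPS or SCP-style SSH remote URL points at github.com."""
    url = url.strip()
    if url.startswith("git@"):
        host = url[len("git@"):].split(":", 1)[0]  # git@github.com:owner/repo.git
    else:
        host = urlparse(url).hostname or ""  # https://github.com/owner/repo.git
    return host.lower() == "github.com"
```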

Support for Atcoder OJ

Support for AtCoder can be easily added using the Kenkoooo API.

Here's an example repo combining both Codeforces and AtCoder: Link
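A possible starting point, assuming the Kenkoooo (AtCoder Problems) v3 endpoint `https://kenkoooo.com/atcoder/atcoder-api/v3/user/submissions`, which takes `user` and `from_second` parameters and returns a JSON list whose entries carry fields such as `id`, `epoch_second`, `contest_id`, `problem_id`, `language` and `result` (`"AC"` for accepted). Please verify this against the current AtCoder Problems API documentation. The helper names are hypothetical, and since the API does not return source code, the code itself would still have to be fetched from `https://atcoder.jp/contests/{contest_id}/submissions/{id}`.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://kenkoooo.com/atcoder/atcoder-api/v3/user/submissions"


def submissions_url(user: str, from_second: int = 0) -> str:
    """Build the request URL for a user's submissions since `from_second` (Unix time)."""
    return f"{API_URL}?{urllib.parse.urlencode({'user': user, 'from_second': from_second})}"


def accepted_only(submissions: list) -> list:
    """Keep only accepted submissions."""
    return [s for s in submissions if s.get("result") == "AC"]


def fetch_accepted(user: str, from_second: int = 0) -> list:
    """Download one batch of submissions and keep the accepted ones."""
    with urllib.request.urlopen(submissions_url(user, from_second)) as resp:
        return accepted_only(json.load(resp))
```

If the API caps the batch size, repeating the call with `from_second` set just past the last `epoch_second` would page through the full history.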

Harwest only creates folders and doesn't download submissions (Codeforces)

Submissions aren't downloaded. Instead, it only creates empty contest folders.

submissions.js is left empty.

No error message whatsoever.

Tried:

  • Other cf handle
  • Full scan
  • Specifying start page
  • cmd and PowerShell (admin)

It works fine for AtCoder.

Windows 10 19044, Python 3.9

Issue in GitHub Login

Facing this issue:

Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\harwest.exe\__main__.py", line 9, in <module>
  File "d:\python\lib\site-packages\harwest\harwest.py", line 106, in main
    args.func(args)
  File "d:\python\lib\site-packages\harwest\harwest.py", line 77, in codeforces
    CodeforcesWorkflow(configs).run(start_page_index=args.start_page)
  File "d:\python\lib\site-packages\harwest\lib\codeforces\workflow.py", line 92, in run
    self.repository.push()
  File "d:\python\lib\site-packages\harwest\lib\utils\repository.py", line 52, in push
    self.git.push(*args)
  File "d:\python\lib\site-packages\git\cmd.py", line 542, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "d:\python\lib\site-packages\git\cmd.py", line 1005, in _call_process
    return self.execute(call, **exec_kwargs)
  File "d:\python\lib\site-packages\git\cmd.py", line 822, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git push origin master
  stderr: 'Logon failed, use ctrl+c to cancel basic credential prompt.
bash: /dev/tty: No such device or address
error: failed to execute prompt script (exit code 1)
fatal: could not read Username for 'https://github.com': No such file or directory'

I filled in the correct details in both places!

C++20 file extension

raise ValueError(
ValueError: ("Please provide correct file extension for the language 'GNU C++20 (64)'
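This issue, the Java8 one and the `Python3 (3.4.3)` one above all stem from an exact-match lookup of the language name. A hypothetical fallback (not part of Harwest) that tries the exact name first and then a few ordered keyword rules; the rule list is illustrative and deliberately incomplete.

```python
# Ordered (keyword, extension) rules, checked case-insensitively.
# Order matters: "JavaScript" must be tested before "Java".
FALLBACK_RULES = [
    ("C++", "cpp"),
    ("C#", "cs"),
    ("JavaScript", "js"),
    ("Java", "java"),
    ("PyPy", "py"),
    ("Python", "py"),
    ("Kotlin", "kt"),
    ("Rust", "rs"),
    ("Haskell", "hs"),
]


def guess_extension(language: str, known: dict) -> str:
    """Exact lookup first, then the first keyword rule contained in the language name."""
    if language in known:
        return known[language]
    lowered = language.lower()
    for keyword, extension in FALLBACK_RULES:
        if keyword.lower() in lowered:
            return extension
    raise ValueError(f"No file extension known for the language {language!r}")
```

Keeping the exact table as the first choice means existing mappings still win; the keyword rules only cover names the table has not caught up with yet.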

Newer submission gets crawled first

Hello,

I found that when I have multiple submissions on Codeforces for the same problem, the newer submissions get crawled first, then the older ones. As a result, the newest commit, which sits on top, turns out to be the oldest submission instead of the newest one.

Is there a way so that my newest submission will appear as the latest commit?
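One way to get the newest submission on top, sketched under the assumption that all new submissions are collected before any of them is committed: sort them by submission time, oldest first. The field names follow the Codeforces API (`creationTimeSeconds`, `id`); `in_commit_order` is hypothetical.

```python
def in_commit_order(submissions: list) -> list:
    """Sort submissions oldest-first (ties broken by id) so the newest is committed last."""
    return sorted(submissions, key=lambda s: (s["creationTimeSeconds"], s["id"]))
```

The trade-off is that commits can no longer be written page by page while pages are still being scanned.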

[Feature Request] Crawl submissions in gym & virtual contest

These submissions require logging in, which should be possible with a requests.Session. I've hacked around and this login method works:

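    import random

    def __login(self):
        # self.session is assumed to be a requests.Session; replace the credentials with your own
        username = 'I_love_Hoang_Yen'
        password = '<redacted>'
        # Extra values the Codeforces login form submits alongside the credentials
        bfaa = 'f1b3f18c715565b589b7823cda7448ce'
        ftaa = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=18))
        LOGIN_URL = 'https://codeforces.com/enter'
        # Load the login page first to extract its CSRF token
        r = self.session.get(LOGIN_URL)
        csrf = r.text.split("csrf_token' value='")[1].split("'")[0]

        data = {
            "csrf_token": csrf,
            "action": "enter",
            "ftaa": ftaa,
            "bfaa": bfaa,
            "handleOrEmail": username,
            "password": password,
            "_tta": "176",
            "remember": "on",
        }
        # Cookies set by this POST keep self.session logged in for later requests
        r = self.session.post(LOGIN_URL, data=data, headers={'X-Csrf-Token': csrf})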

After that, the submission URL also needs to change: for contest IDs > 100k, it should be /gym/{contest_id}/submission/{submission_id}.
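The URL rule above as a small hypothetical helper; the 100k threshold is taken from the issue text, and regular contests use the `/contest/` path.

```python
def submission_url(contest_id: int, submission_id: int) -> str:
    """Build a Codeforces submission URL, using /gym/ for contest IDs above 100k."""
    section = "gym" if contest_id > 100_000 else "contest"
    return f"https://codeforces.com/{section}/{contest_id}/submission/{submission_id}"
```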

Feature to push code automatically to any branch mentioned during init process

Recent GitHub projects use main as the default branch instead of master. So when I use Harwest's feature to automatically push code to GitHub, it fails because my project doesn't have a master branch.

cmdline: git push origin master stderr: 'error: src refspec master does not match any'

This is the error you get
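`src refspec master does not match any` means the local repository has no branch called `master`, so `git push origin master` has nothing to push. A hypothetical fix is to push whichever branch is checked out. The sketch below reads it from `.git/HEAD` using only the standard library (GitPython's `Repo.active_branch` gives the same information); it assumes `.git` is a directory, which is not the case for worktrees or submodules.

```python
from pathlib import Path


def current_branch(repo_dir: str):
    """Return the checked-out branch name from .git/HEAD, or None for a detached HEAD."""
    head = (Path(repo_dir) / ".git" / "HEAD").read_text(encoding="utf-8").strip()
    prefix = "ref: refs/heads/"
    return head[len(prefix):] if head.startswith(prefix) else None


def push_command(repo_dir: str, remote: str = "origin") -> list:
    """Build a `git push` command for the current branch instead of a hardcoded master."""
    branch = current_branch(repo_dir)
    if branch is None:
        raise RuntimeError("HEAD is detached; refusing to guess which branch to push")
    return ["git", "push", remote, branch]
```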

Cannot push Solution

⌛ Currently scanning page #2: (5/5) 1111gal password https://atcoder.jp/contests/abc242/tasks/abc242_c
Traceback (most recent call last):
File "c:\users\singh\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\singh\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\singh\AppData\Local\Programs\Python\Python39\Scripts\harwest.exe\__main__.py", line 7, in <module>
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\harwest\harwest.py", line 115, in main
args.func(args)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\harwest\harwest.py", line 74, in atcoder
process_platform(args, "AtCoder", AtcoderWorkflow)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\harwest\harwest.py", line 90, in process_platform
workflow(configs).run(start_page_index=args.start_page, full_scan=full_scan)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\harwest\lib\abstractworkflow.py", line 105, in run
self.repository.push()
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\harwest\lib\utils\repository.py", line 52, in push
self.git.push(*args)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\git\cmd.py", line 542, in <lambda>
return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\git\cmd.py", line 1005, in _call_process
return self.execute(call, **exec_kwargs)
File "c:\users\singh\appdata\local\programs\python\python39\lib\site-packages\git\cmd.py", line 822, in execute
raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git push origin master
stderr: '
Unhandled Exception: System.ComponentModel.Win32Exception: The directory name is invalid
at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
at System.Diagnostics.Process.Start()
at GitCredentialManager.GitProcess.get_Version()
at GitCredentialManager.GitProcessConfiguration.GetCanonicalizeTypeArg(GitConfigurationType type)
at GitCredentialManager.GitProcessConfiguration.TryGet(GitConfigurationLevel level, GitConfigurationType type, String name, String& value)
at GitCredentialManager.Settings.d__5.MoveNext()
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at GitCredentialManager.Settings.TryGetSetting(String envarName, String section, String property, String& value)
at GitCredentialManager.Authentication.MicrosoftAuthentication.CanUseBroker(ICommandContext context)
at GitCredentialManager.Program.Main(String[] args)
bash: /dev/tty: No such device or address
error: failed to execute prompt script (exit code 1)
fatal: could not read Username for 'https://github.com': No such file or directory'

This is the error I am getting while pushing the solutions.

Workflow stops if a submission page only has gym submissions

How to reproduce:

  • Set CF handle to I_love_Hoang_Yen,
  • Run harwest codeforces -p 5

What happens: the crawler stops without crawling anything, even though I have 150+ pages of submissions.

I think this is because page 5 contains only my non-AC or gym submissions, so self.client.get_user_submissions returns an empty array, which stops the crawler.
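One hypothetical way to avoid this: stop only when the raw, unfiltered page is empty, and apply the AC/gym filter afterwards. In the sketch below, `fetch_page` and `keep` are illustrative stand-ins for the client call and the filter, not Harwest's actual API.

```python
from typing import Callable, Iterator, List


def crawl(
    fetch_page: Callable[[int], List[dict]],
    keep: Callable[[dict], bool],
    start_page: int = 1,
) -> Iterator[dict]:
    """Yield kept submissions page by page; stop only when a page is truly empty."""
    page = start_page
    while True:
        raw = fetch_page(page)  # unfiltered submissions on this page
        if not raw:
            return  # past the last page
        # A page where nothing passes the filter (e.g. only gym or non-AC) no longer ends the crawl
        for submission in raw:
            if keep(submission):
                yield submission
        page += 1
```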
