Comments (8)
Hi @rajurajvijay619, did you try using just a single language for the experiments?
E.g., for Java, I find the total of 500k samples from 184 MB of .gz files
to be very comfortably manageable on a laptop.
As one can see from the published analysis example,
languages like Go, JS, or Ruby would give even smaller dataset sizes and fit on almost any local machine.
Hope this helps, and good luck with your experiments!
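For anyone who wants to check how large a single-language download actually is before training, here is a minimal sketch. It assumes the data for one language sits under a local directory as gzip-compressed JSON-lines files matching `*.jsonl.gz`; the directory path and the `count_records` helper are hypothetical names used only for illustration, not part of the repo.

```python
import gzip
import json
from pathlib import Path


def count_records(data_dir, pattern="*.jsonl.gz"):
    """Count JSON-lines records and compressed bytes under data_dir.

    Assumption: each non-empty line of every matching file is one JSON record.
    """
    n_records = 0
    n_bytes = 0
    for path in sorted(Path(data_dir).rglob(pattern)):
        n_bytes += path.stat().st_size  # size of the compressed file on disk
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # fail early on a corrupt record
                    n_records += 1
    return n_records, n_bytes


if __name__ == "__main__":
    # "data/java" is a hypothetical location; point it at your own download.
    records, size = count_records("data/java")
    print(f"{records} records in {size / 1e6:.1f} MB of .gz files")
```

The files are streamed line by line, so memory use stays small even if the decompressed data is much larger than the .gz size suggests.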
from codesearchnet.
I'll go ahead and close this issue; please let me know if there are any more questions.
from codesearchnet.
@bzz just one question: when running for a single language (on a local machine), does the setup still require GPUs?
from codesearchnet.
@sara-02 you can download the data without GPUs; however, running the default models in this repo will be painfully slow without GPUs. You can try training on a smaller sample of the data as @bzz proposes, and you can also set this parameter to limit the size of the data.
Also, Google Colab notebooks are great for free GPUs. Thanks for getting involved with this project ❤️
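If you would rather shrink the data on disk than rely on a training option, one way to build a smaller sample is reservoir sampling over a compressed JSON-lines file. This is only a sketch under assumptions: the file names are hypothetical, `sample_jsonl_gz` is an illustrative helper rather than anything shipped in the repo, and it is independent of the parameter mentioned above.

```python
import gzip
import random


def sample_jsonl_gz(src, dst, k, seed=0):
    """Write a uniform random sample of up to k lines from src to dst.

    Uses reservoir sampling (Algorithm R), so the input is read once
    and only k lines are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    with gzip.open(src, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            line = line.rstrip("\n") + "\n"  # normalize a missing final newline
            if i < k:
                reservoir.append(line)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = line
    with gzip.open(dst, "wt", encoding="utf-8") as out:
        out.writelines(reservoir)
    return len(reservoir)


if __name__ == "__main__":
    # Both paths are hypothetical; adjust them to your local layout.
    kept = sample_jsonl_gz("java_train_0.jsonl.gz", "java_train_small.jsonl.gz", 10_000)
    print(f"kept {kept} records")
```

Fixing the seed keeps the sample reproducible between runs, which helps when comparing experiments on the reduced data.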
from codesearchnet.
@rajurajvijay619 can you describe your constraints a bit more? Is it disk size for downloading the dataset? Can you download the entire dataset and just sample from that?
Thanks for your feedback
from codesearchnet.
Thanks. I will look into Colab as well as running it locally with only one language. I was hesitant to start because the first step in the setup states: "Additionally, you must install Nvidia-Docker to satisfy GPU-compute related dependencies."
So, I thought the code might not run as-is on a local system without GPUs.
from codesearchnet.
@sara-02 you are correct regarding Docker. I think in the end it could make your life easier to use the Docker setup, as installing all the dependencies by hand can become very cumbersome and brittle.
Let me know where you are struggling with Docker and I will be more than happy to help! I wrote this tutorial regarding Docker in case a gentle introduction is useful.
Looking forward to seeing what you do with this dataset! Please do not be shy about asking questions!
from codesearchnet.
If you are using Colab, I do not believe you will be able to use Docker; in that case, you will have to install
all the dependencies defined in the Dockerfile via pip in the Colab notebook.
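To avoid copying the dependencies over by hand, a rough sketch like the one below can pull the `pip install` commands out of a Dockerfile's text. It assumes the dependencies are installed in `RUN` instructions that call pip directly; `pip_commands` is a hypothetical helper, and it deliberately ignores anything installed another way (apt packages, for example), so treat its output as a starting point to review.

```python
import re


def pip_commands(dockerfile_text):
    """Return the pip install commands found in RUN instructions."""
    # Join lines that end with a backslash continuation.
    joined = re.sub(r"\\[ \t]*\n", " ", dockerfile_text)
    commands = []
    for line in joined.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        # A RUN line may chain several shell commands with && or ;
        for part in re.split(r"&&|;", line[4:]):
            part = " ".join(part.split())  # collapse runs of whitespace
            if re.match(r"(pip3?|python3? -m pip)\b.*\binstall\b", part):
                commands.append(part)
    return commands


if __name__ == "__main__":
    # Assumes a file named "Dockerfile" in the current directory.
    with open("Dockerfile", encoding="utf-8") as fh:
        for cmd in pip_commands(fh.read()):
            print(cmd)
```

In a Colab cell, each printed command can then be run by prefixing it with `!`.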
from codesearchnet.