Comments (2)
Hey there,
I currently use this (mostly self-written) script for downloading.
It needs ffmpeg, youtube-dl, proxychains (if you download through a proxy), GNU parallel, and wget to work.
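Before running it, a quick check that the tools are on your PATH can save a failed run. This is only a minimal sketch: `missing_tools` is a hypothetical helper name, not part of the script, and the tool list is simply the set of commands the script below invokes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands invoked by the download script.
required=(youtube-dl ffmpeg proxychains parallel wget)
missing=$(missing_tools "${required[@]}")
if [ -n "$missing" ]; then
    # Report only; deciding whether to abort is left to the caller.
    echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all five commands were found.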
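# Script for downloading AudioSet
# Requirements: proxychains, ffmpeg, GNU parallel, wget, and youtube-dl
# pip install youtube-dl
# Usage: ./0_download_audioset.sh [base_dir] [njobs]
# The number of parallel jobs is the second argument, e.g. ./0_download_audioset.sh ./data 8
base_dir=${1:-"./data"}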
njobs=${2:-4}
SAMPLE_RATE=16000
EXTENSION="wav"
balanced_dir=${base_dir}/audio/balanced/
eval_dir=${base_dir}/audio/eval/
csv_dir=${base_dir}/csvs
log_dir=${base_dir}/logs
fetch_clip() {
# echo "Fetching $1 ($2 to $3)..."
outname="$1_$2_$3"
outdir=${4}
output_path=${outdir}/${outname}.${EXTENSION}
# Do not redownload already existing file
if [ -f "${output_path}" ]; then
return
fi
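# youtube-dl -g prints one URL per stream; the second line is usually the audio stream.
# Test the result itself: $? after a pipeline only reports awk's exit status.
link=$(youtube-dl -g "https://youtube.com/watch?v=$1" | awk 'NR==2{print}')
if [ -n "$link" ]; then
proxychains -q ffmpeg -loglevel quiet -i "$link" -ar "$SAMPLE_RATE" -ac 1 \
-ss "$2" -to "$3" "${output_path}"
fi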
}
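function parallel_download() {
if [[ $# != 2 ]]; then
echo "Usage: parallel_download [csv_segments] [output_dir]"
exit 1
fi
csv_segments=${1}
output_dir=${2}
echo "Downloading ${csv_segments} subset using ${njobs} workers"
# Skip comment lines, then pass the first three CSV columns to fetch_clip.
# Each CSV gets its own job log so that --resume does not skip jobs recorded for the other subset.
joblog="${log_dir}/$(basename "${csv_segments}" .csv).log"
grep "^[^#;]" "${csv_segments}" | parallel --bar --resume --joblog "${joblog}" -j "$njobs" --colsep=, fetch_clip {1} {2} {3} "${output_dir}" > /dev/null
}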
export SAMPLE_RATE
export EXTENSION
export -f fetch_clip
mkdir -p ${csv_dir} ${balanced_dir} ${eval_dir} ${log_dir}
wget --continue http://storage.googleapis.com/us_audioset/youtube_corpus/v1/csv/balanced_train_segments.csv -O ${csv_dir}/balanced_train_segments.csv
wget --continue http://storage.googleapis.com/us_audioset/youtube_corpus/v1/csv/eval_segments.csv -O ${csv_dir}/eval_segments.csv
wget --continue http://storage.googleapis.com/us_audioset/youtube_corpus/v1/csv/class_labels_indices.csv -O ${csv_dir}/class_labels_indices.csv
parallel_download ${csv_dir}/balanced_train_segments.csv ${balanced_dir}
parallel_download ${csv_dir}/eval_segments.csv ${eval_dir}
echo "Finished Downloading data"
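To make the column handling clearer, here is a rough sketch of how one CSV row maps to an output file name, written in plain bash instead of GNU parallel. The sample row is a made-up placeholder in the `YTID, start_seconds, end_seconds, positive_labels` layout, and `trim` is a hypothetical helper; I'm assuming the fields may carry a space after each comma, which is why they get trimmed here.

```shell
#!/usr/bin/env bash
# Placeholder row (not a real clip) in the segment CSV layout:
# YTID, start_seconds, end_seconds, positive_labels
line='abcdefghijk, 30.000, 40.000, "/m/label_a,/m/label_b"'

# Hypothetical helper: strip leading and trailing whitespace from a field.
trim() {
    local s="$1"
    s="${s#"${s%%[![:space:]]*}"}"
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}

# Split on commas; the quoted label list ends up in `rest` and is ignored,
# just as the script only uses columns {1} {2} {3}.
IFS=, read -r ytid start end rest <<< "$line"
outname="$(trim "$ytid")_$(trim "$start")_$(trim "$end")"
printf '%s\n' "${outname}.wav"
```

The name follows the same `id_start_end.wav` pattern that `fetch_clip` builds from its first three arguments.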
from datadriven-gpvad.
Thank you very much!