malariagen / ag1000g-phase3-data-paper
License: Other
The crosses metadata in the release bucket does not match the file here: the 'cross' column is missing.
update
There is some confusion here. @hardingnj, the file you linked in your email (see above) is actually the phase2 crosses metadata. I don't know what it is doing in vector_ops; we really need to put the phase in the file names!
There is a file in vector_ops, however, that looks more helpful - I will see how this compares with the crosses genotypes for number of samples.
Either way, we still don't have an up-to-date crosses metadata file in the release bucket.
From https://github.com/malariagen/ag1000g-phase3-data-paper/actions/runs/355697703
The set-env command is deprecated and will be disabled on November 16th. Please upgrade to using Environment Files.
The add-path command is deprecated and will be disabled on November 16th. Please upgrade to using Environment Files.
GitHub Actions: Deprecating set-env and add-path commands
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
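For reference, the Environment Files mechanism works by appending lines to the files whose paths GitHub Actions exposes in the GITHUB_ENV and GITHUB_PATH environment variables. The sketch below shows how a Python step might do that; the helper names set_env and add_path are hypothetical, and it only covers single-line values (multi-line values need the delimiter syntax described in the GitHub docs).

```python
import os


def set_env(name: str, value: str) -> None:
    """Append NAME=value to the file GitHub Actions exposes as $GITHUB_ENV.

    Later steps in the same job then see NAME in their environment.
    """
    with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def add_path(directory: str) -> None:
    """Append a directory to the file GitHub Actions exposes as $GITHUB_PATH.

    Later steps in the same job then have the directory on PATH.
    """
    with open(os.environ["GITHUB_PATH"], "a", encoding="utf-8") as fh:
        fh.write(f"{directory}\n")
```

Third-party actions that still print the old ::set-env or ::add-path commands would need upgrading to a release that uses these files instead.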
Not something that needs any discussion now, but might be worth talking at some point about how the population definitions are stored. Currently they're stored as either a YAML or CSV file that maps population IDs like "ANG_1_coluzzii_2009" to sets of sample IDs. Some thoughts:
E.g., population_definitions.yml could be something like:
BF_bana_2012_coluzzii:
  label: Burkina Faso, Bana, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location = "Bana" and species_aim = "coluzzii"
BF_pala_2012_coluzzii:
  label: Burkina Faso, Pala, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location = "Pala" and species_aim = "coluzzii"
# etc.
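A rough sketch of how definitions in this shape could be resolved to sample IDs is given below. It is an illustration only: the dictionary stands in for the parsed YAML, the metadata rows and the sample_id column are made up, and the tiny parser handles only the 'field = "value" and ...' form used in the example above, not a general query language.

```python
import re

# Definitions as they might look after loading the YAML into Python.
POPULATIONS = {
    "BF_bana_2012_coluzzii": {
        "label": "Burkina Faso, Bana, 2012, An. coluzzii",
        "samples": [
            {
                "sample_set": "AG1000G-BF-A",
                "query": 'location = "Bana" and species_aim = "coluzzii"',
            },
        ],
    },
}

# Hypothetical metadata rows; real rows would come from the sample metadata files.
METADATA = [
    {"sample_id": "S1", "sample_set": "AG1000G-BF-A", "location": "Bana", "species_aim": "coluzzii"},
    {"sample_id": "S2", "sample_set": "AG1000G-BF-A", "location": "Pala", "species_aim": "coluzzii"},
    {"sample_id": "S3", "sample_set": "AG1000G-BF-A", "location": "Bana", "species_aim": "gambiae"},
    {"sample_id": "S4", "sample_set": "AG1000G-BF-B", "location": "Bana", "species_aim": "coluzzii"},
]

# Matches one 'field = "value"' term.
_TERM = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_query(query):
    """Turn 'a = "x" and b = "y"' into {'a': 'x', 'b': 'y'}."""
    return dict(_TERM.findall(query))


def resolve(pop_id, populations, metadata):
    """Return the sample IDs matching every rule of one population."""
    ids = []
    for rule in populations[pop_id]["samples"]:
        wanted = parse_query(rule["query"])
        wanted["sample_set"] = rule["sample_set"]
        ids.extend(
            row["sample_id"]
            for row in metadata
            if all(row.get(key) == value for key, value in wanted.items())
        )
    return ids
```

Storing queries rather than explicit sample lists would keep the definitions short and let them pick up metadata corrections automatically, at the cost of needing the metadata to hand whenever a population is resolved.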
ref: https://github.com/malariagen/ag1000g-phase3-data-paper/tree/master/content/tables/site-filters
https://github.com/malariagen/ag1000g-phase3-data-paper/pull/49/checks?check_run_id=1385871856
https://github.com/malariagen/ag1000g-phase3-data-paper/pull/51/checks?check_run_id=1381519389
[INFO] Running filter pandoc-manubot-cite
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/manubot/bin/pandoc-manubot-cite", line 8, in <module>
    sys.exit(main())
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/manubot/pandoc/cite_filter.py", line 214, in main
    doc = pf.load(input_stream=args.input)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/elements.py", line 1353, in from_json
    return Doc(*items, api_version=api, metadata=meta)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/elements.py", line 66, in __init__
    raise TypeError("invalid api version", api_version)
TypeError: ('invalid api version', [1, 20])
Error running filter pandoc-manubot-cite:
Filter returned error status 1
Error: Process completed with exit code 83.
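The TypeError indicates that pandoc handed the filter a JSON document whose API version, [1, 20], the installed panflute does not accept. One way to confirm what a given pandoc install emits is to read the "pandoc-api-version" field from its JSON output (for example from pandoc -t json). The sketch below only does that check; the helper name is hypothetical, and which versions a particular panflute release accepts is not shown in the log above.

```python
import json


def pandoc_api_version(ast_json: str) -> tuple:
    """Return the pandoc-api-version recorded in a pandoc JSON document."""
    doc = json.loads(ast_json)
    return tuple(doc["pandoc-api-version"])


# A minimal stand-in for real pandoc JSON output.
sample = '{"pandoc-api-version": [1, 20], "meta": {}, "blocks": []}'
print(pandoc_api_version(sample))  # prints (1, 20)
```

If the reported version differs from what panflute expects, pinning pandoc and panflute to mutually compatible releases in the build environment is the usual remedy.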
Looking at the popstructure_count-alleles notebook I notice one of the species groups is called "gamb_colu_arab_pca" but this refers to data within the 2La inversion. Shouldn't this be called "gamb_colu_arab_2La" instead to avoid confusion?
I'm unsure of the complete context here, but I assume this is a repeat of the phase 2 analyses.
Discussion of results.
The calculation of scaled_lat and scaled_long in latlong_to_rgb_hex_via_lab() assumes lat and long are int, but they may be pd.core.series.Series.
lat and long are int in spot checks, so the diagnostics miss this.
This results in a miscalculation of the colour space when lat and long are pd.core.series.Series, which is not always noticeable.
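One defensive option, assuming the function is only meant to handle scalar coordinates, is to reject anything that is not a plain number so the problem fails loudly instead of silently producing a wrong colour. The sketch below is not the code from latlong_to_rgb_hex_via_lab(); the scale helper and the bounds used are hypothetical.

```python
import numbers


def scale(value, lower, upper):
    """Linearly map a scalar from [lower, upper] onto [0, 1].

    Raises TypeError for non-scalar input (a pandas Series, a list, ...),
    so that a caller passing a whole column is caught immediately.
    """
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError(f"expected a scalar number, got {type(value).__name__}")
    return (value - lower) / (upper - lower)


# Hypothetical bounds: latitude spans -90..90 degrees.
scaled_lat = scale(0, -90, 90)
```

The alternative is to make the calculation explicitly vectorised and test it with Series input as well as scalars, so the diagnostics cover both cases.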
ag3.py pulls metadata from vo_agam_release/v3/metadata/general/AG1000G-X/samples.meta.csv.
The new crosses metadata with cross info lives at vo_agam_release/v3/metadata/crosses/samples.crosses.csv.
Should we point ag3.py at the new file or replace the original file with the new one - which would you prefer?
The crosses genotypes have data from 699 samples, as does the (old) crosses metadata at vo_agam_release/v3/metadata/general/AG1000G-X/samples.meta.csv.
The new crosses metadata, however, has only 519 rows.
Also, in the text we talk about 15 crosses, five of which are new. If I call df.cross_id.unique() on the new metadata, I get 24 named crosses.
@hardingnj, any ideas what has happened here?
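A quick way to pin down the discrepancy is to compare the two metadata files directly: row counts, which sample IDs are absent from the new file, and how many distinct cross IDs it holds. The sketch below uses only the standard library; the cross_id column comes from the call quoted above, while the sample_id column name and the summarise helper are assumptions.

```python
import csv
import io


def summarise(old_csv, new_csv, id_col="sample_id", cross_col="cross_id"):
    """Compare two metadata CSV files given as open text streams."""
    old_rows = list(csv.DictReader(old_csv))
    new_rows = list(csv.DictReader(new_csv))
    old_ids = {row[id_col] for row in old_rows}
    new_ids = {row[id_col] for row in new_rows}
    return {
        "n_old": len(old_rows),
        "n_new": len(new_rows),
        # Samples present in the old file but absent from the new one.
        "missing_from_new": sorted(old_ids - new_ids),
        # Number of distinct cross identifiers in the new file.
        "n_crosses": len({row[cross_col] for row in new_rows}),
    }


# Tiny made-up inputs standing in for the real files.
old = io.StringIO("sample_id\nA\nB\nC\n")
new = io.StringIO("sample_id,cross_id\nA,x1\nB,x2\n")
report = summarise(old, new)
```

Listing the missing sample IDs alongside their old metadata rows should show whether the dropped rows share a pattern, such as a particular cross or parent samples.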
Report prevalence of:
PCA complete for gamb_colu only.
Still needs discussion of major results
ref