
ag1000g-phase3-data-paper's People

Contributors

adam3smith adebali agapow agitter alimanfoo cclarkson cgreene ctb dhimmel dsiddy evancofer gwaybio hardingnj leehart michaelmhoffman olgabot petebachant rgieseke rhagenson slochower vincerubinetti vsmalladi

Forkers

alimanfoo d-kwiat

ag1000g-phase3-data-paper's Issues

phase 3 crosses meta data - release bucket version incorrect

Crosses meta data in the release bucket does not match the file here - the 'cross' column is missing.

update
There is some confusion here. @hardingnj, the file you linked in your email (see above) is actually the phase 2 crosses meta data; I don't know what it is doing in vector_ops. We really need to put the phase in the file names!

There is a file in vector_ops, however, that looks more helpful - I will see how this compares with the crosses genotypes for number of samples.

Either way, we still don't have an up-to-date crosses meta data file in the release bucket.

Manubot workflow run failure

From https://github.com/malariagen/ag1000g-phase3-data-paper/actions/runs/355697703


The set-env command is deprecated and will be disabled on November 16th. Please upgrade to using Environment Files.

The add-path command is deprecated and will be disabled on November 16th. Please upgrade to using Environment Files.


GitHub Actions: Deprecating set-env and add-path commands
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
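If any workflow step issues these commands itself (when they come from a third-party action, upgrading that action is the usual route), the replacement described in the changelog above is to append to the files named by the `GITHUB_ENV` and `GITHUB_PATH` environment variables. A minimal Python sketch of that mechanism, assuming single-line values (multi-line values need the delimiter syntax from the GitHub docs); the helper names `set_env` and `add_path` are hypothetical:

```python
import os


def set_env(name: str, value: str) -> None:
    """Hypothetical stand-in for the deprecated `::set-env` command.

    Appends NAME=value to the file named by $GITHUB_ENV; later steps in the
    same job then see NAME as an environment variable. Single-line values only.
    """
    with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def add_path(directory: str) -> None:
    """Hypothetical stand-in for the deprecated `::add-path` command.

    Appends the directory to the file named by $GITHUB_PATH; later steps in
    the same job get it prepended to PATH.
    """
    with open(os.environ["GITHUB_PATH"], "a", encoding="utf-8") as fh:
        fh.write(f"{directory}\n")
```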

Improve results text: re sample QC

  • Improve the flow of the text.
  • Double-check the numbers, because the AG1000G-X samples are now excluded.
  • Possibly include the Sankey diagram NH produced a while ago, for discussion?
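For the double-check, a minimal sketch that tallies samples per sample set from a metadata CSV while skipping the crosses (AG1000G-X). The `sample_set` column name and the function name `count_samples` are assumptions for illustration, not the actual ag3 metadata layout:

```python
import csv
import io
from collections import Counter


def count_samples(csv_text: str, exclude=("AG1000G-X",)) -> Counter:
    """Tally samples per sample set, skipping excluded sets (the crosses by default).

    Assumes a header row containing a "sample_set" column (hypothetical here).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["sample_set"] for row in reader if row["sample_set"] not in exclude
    )
```

`sum(count_samples(text).values())` then gives the total that the QC text should report once the crosses are left out.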

Population definitions suggestions

Not something that needs any discussion now, but it might be worth talking at some point about how the population definitions are stored. Currently they're stored as either a YAML or a CSV file that maps population IDs like "ANG_1_coluzzii_2009" to sets of sample IDs. Some thoughts:

  • Re the population IDs...
    • Suggest sticking with standard two-letter (ISO 3166-1 alpha-2) country codes like "AO" rather than custom country codes like "ANG".
    • Also it might be convenient to use some kind of site name or abbreviation, rather than a number, just to make it easier to remember. So, e.g., "ANG_1_coluzzii_2009" might become "AO_luanda_2009_coluzzii".
  • It would be useful to have a human-readable label for each population, and for this to be included in the population definitions somehow. These can then be used in tables and labelling plots etc. E.g., the label for "ANG_1_coluzzii_2009" would probably be something like "Angola, Luanda, 2009, An. coluzzii".
  • I wonder if it would be more convenient to store the queries that selected the samples, rather than the sample IDs. E.g., in the population definitions file, rather than listing sample IDs explicitly, give the sample set ID and the query that selected the samples.

E.g., population_definitions.yml could be something like:

BF_bana_2012_coluzzii:
  label: Burkina Faso, Bana, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location == "Bana" and species_aim == "coluzzii"
BF_pala_2012_coluzzii:
  label: Burkina Faso, Pala, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location == "Pala" and species_aim == "coluzzii"
# etc.
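To show how such a file could be turned back into sample IDs, here is a minimal sketch. It assumes the definitions have already been parsed (e.g. with PyYAML's `yaml.safe_load`; hard-coded below to stay dependency-free), that queries use `==` comparisons (valid in both Python and pandas query syntax), and that metadata rows carry `sample_id`, `sample_set`, `location` and `species_aim` columns. `resolve_population` is a hypothetical helper, and the toy rows are not real samples:

```python
# Definitions as they might look after loading population_definitions.yml.
POPULATIONS = {
    "BF_bana_2012_coluzzii": {
        "label": "Burkina Faso, Bana, 2012, An. coluzzii",
        "samples": [
            {"sample_set": "AG1000G-BF-A",
             "query": 'location == "Bana" and species_aim == "coluzzii"'},
        ],
    },
}


def resolve_population(definition, metadata):
    """Return the sample IDs selected by each (sample_set, query) entry.

    Each query is evaluated with the metadata row's columns as variables.
    eval is acceptable here only because the definitions file is trusted.
    """
    selected = []
    for entry in definition["samples"]:
        for row in metadata:
            if row["sample_set"] != entry["sample_set"]:
                continue
            if eval(entry["query"], {"__builtins__": {}}, row):
                selected.append(row["sample_id"])
    return selected
```

Storing queries rather than IDs keeps the file short, but it ties the definitions to the metadata columns; if a column is renamed, the queries break loudly (NameError) rather than silently selecting different samples.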

Error during Build Manuscript step in Manubot workflow, running filter pandoc-manubot-cite

https://github.com/malariagen/ag1000g-phase3-data-paper/pull/49/checks?check_run_id=1385871856
https://github.com/malariagen/ag1000g-phase3-data-paper/pull/51/checks?check_run_id=1381519389

[INFO] Running filter pandoc-manubot-cite
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/manubot/bin/pandoc-manubot-cite", line 8, in <module>
    sys.exit(main())
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/manubot/pandoc/cite_filter.py", line 214, in main
    doc = pf.load(input_stream=args.input)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/elements.py", line 1353, in from_json
    return Doc(*items, api_version=api, metadata=meta)
  File "/usr/share/miniconda3/envs/manubot/lib/python3.7/site-packages/panflute/elements.py", line 66, in __init__
    raise TypeError("invalid api version", api_version)
TypeError: ('invalid api version', [1, 20])
Error running filter pandoc-manubot-cite:
Filter returned error status 1
Error: Process completed with exit code 83.
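The traceback suggests a version mismatch: panflute refuses to build its `Doc` because the pandoc JSON it received declares API version `[1, 20]`, which the installed panflute does not accept. A minimal sketch for checking which API version the installed pandoc emits; `pandoc_api_version` is a hypothetical helper name, while the `"pandoc-api-version"` key is part of pandoc's JSON output:

```python
import json


def pandoc_api_version(pandoc_json: str) -> tuple:
    """Return the "pandoc-api-version" of a pandoc JSON AST, e.g. (1, 20)."""
    return tuple(json.loads(pandoc_json)["pandoc-api-version"])
```

Feeding it the output of `pandoc <manuscript files> -t json` and comparing the result with the versions supported by the panflute release in the manubot environment should confirm or rule out the mismatch before pinning either package.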

SNP discovery

  • Add text and numbers
  • Some leeway on precise numbers to report.
  • Decide on an informative figure, perhaps similar to the "private" etc. figures in phase 1.
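If "private" is taken to mean a SNP whose alternate allele is observed in only one population, the count behind such a figure can be sketched as below. This assumes biallelic SNPs and per-population alternate allele counts in the same SNP order; `count_private_snps` is a hypothetical name and the input layout is an assumption, not the phase 1 pipeline:

```python
from typing import Dict, List


def count_private_snps(alt_counts: Dict[str, List[int]]) -> Dict[str, int]:
    """Count SNPs whose alternate allele is seen in exactly one population.

    alt_counts maps population -> alternate allele count per SNP,
    with every list in the same SNP order.
    """
    pops = list(alt_counts)
    n_snps = len(alt_counts[pops[0]])
    private = {p: 0 for p in pops}
    for i in range(n_snps):
        carriers = [p for p in pops if alt_counts[p][i] > 0]
        if len(carriers) == 1:
            private[carriers[0]] += 1
    return private
```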

Confusing name in PCA data

Looking at the popstructure_count-alleles notebook I notice one of the species groups is called "gamb_colu_arab_pca" but this refers to data within the 2La inversion. Shouldn't this be called "gamb_colu_arab_2La" instead to avoid confusion?

Gene drive results

Unsure of the complete context here, but I assume this is a repeat of the phase 2 analyses.

Discussion of results.

ag3.py does not pull down the new crosses meta data

ag3.py pulls meta data from vo_agam_release/v3/metadata/general/AG1000G-X/samples.meta.csv.

The new crosses metadata with cross info lives at vo_agam_release/v3/metadata/crosses/samples.crosses.csv.

Should we point ag3.py at the new file or replace the original file with the new one - which would you prefer?
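One possible middle ground, sketched under assumptions: keep reading the general AG1000G-X file (so every genotyped sample keeps a row) and left-join the cross assignment from the new crosses file. The `sample_id` key and `cross_id` column names are assumptions here, and `add_cross_column` is a hypothetical helper, not part of ag3.py:

```python
import csv
import io


def add_cross_column(general_csv, crosses_csv, key="sample_id", cross_col="cross_id"):
    """Left-join the cross column from the crosses file onto the general metadata.

    Every row of the general file is kept; samples absent from the crosses
    file get an empty cross value instead of being dropped.
    """
    crosses = {row[key]: row for row in csv.DictReader(io.StringIO(crosses_csv))}
    merged = []
    for row in csv.DictReader(io.StringIO(general_csv)):
        extra = crosses.get(row[key], {})
        merged.append({**row, cross_col: extra.get(cross_col, "")})
    return merged
```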

new crosses meta data has fewer samples than crosses genotypes

The crosses genotypes have data from 699 samples, and so does the (old) crosses meta data at vo_agam_release/v3/metadata/general/AG1000G-X/samples.meta.csv.

The new crosses meta, however, has only 519 rows.

Also, in the text we talk about 15 crosses, five of which are new. If I run df.cross_id.unique() on the new meta data, I get 24 named crosses?

@hardingnj, any ideas what has happened here?
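A quick diagnostic sketch that may help narrow this down: compare the sample IDs in the genotypes with those in the new meta data, look for duplicated rows, and count distinct non-empty cross IDs. How the ID lists are loaded is left out, and `diagnose` is a hypothetical helper:

```python
from collections import Counter


def diagnose(genotype_ids, metadata_ids, cross_ids):
    """Summarise mismatches between genotyped samples and a metadata table."""
    geno, meta = set(genotype_ids), set(metadata_ids)
    return {
        "missing_from_metadata": sorted(geno - meta),
        "not_in_genotypes": sorted(meta - geno),
        "duplicated_in_metadata": sorted(
            s for s, n in Counter(metadata_ids).items() if n > 1
        ),
        "n_named_crosses": len({c for c in cross_ids if c}),
    }
```

If "missing_from_metadata" holds 180 IDs (699 minus 519) and they share a pattern, that would point at which crosses were dropped; a cross count above 15 would suggest the new file uses a different cross naming, e.g. one ID per family rather than per cross.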

PCA results

PCA is complete for gamb_colu only.
Still needed:

  • PCA of arab
  • PCA of gamb_colu_arab
  • discussion of major results of the above.
