

msstats - read data from ms via stdin, calculate common summary statistics

Copyright (C) 2002 Kevin Thornton

msstats is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Comments are welcome.

- Kevin Thornton <[email protected]>

Deprecated

The final release of this software is version 0.3.5, which is compatible with libsequence 1.9.2. No further updates will occur, and the GitHub repository is archived/read-only.

# Usage

    [coalescent simulation] | msstats | gzip > stats.gz

"Coalescent simulation" is assumed to be any program that prints biallelic marker output in the format of Dick Hudson's ms (http://home.uchicago.edu/~rhudson1).

This format looks like this:

    //
    segsites: 2
    positions: 0.1 0.9
    01
    10

The above example shows 2 "SNPs" in a sample of size n = 2.
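Because the format is line-oriented, it is easy to read programmatically. The following is a minimal Python sketch, not part of msstats (which is written in C++ on top of libsequence). The function name `parse_ms` and its dictionary layout are hypothetical; the sketch assumes that, after a `//` line, any line made only of 0s and 1s is a haplotype, and it ignores everything else (the ms command line, seeds, trees).

```python
def parse_ms(text):
    """Parse ms-format output into a list of replicates (hypothetical helper).

    Each replicate is a dict with keys "segsites" (int), "positions"
    (list of float) and "haplotypes" (list of '0'/'1' strings, one per
    sampled chromosome).
    """
    replicates = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            # "//" starts a new replicate
            current = {"segsites": 0, "positions": [], "haplotypes": []}
            replicates.append(current)
        elif current is None or not line:
            # command line and seeds before the first "//", or blank separators
            continue
        elif line.startswith("segsites:"):
            current["segsites"] = int(line.split(":", 1)[1])
        elif line.startswith("positions:"):
            current["positions"] = [float(x) for x in line.split(":", 1)[1].split()]
        elif set(line) <= {"0", "1"}:
            # one haplotype: one character per segregating site
            current["haplotypes"].append(line)
        # any other line (e.g. trees printed by ms -T) is ignored
    return replicates
```

To read from a pipe as msstats does, one could call `parse_ms(sys.stdin.read())` after `import sys`.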

# Installation

    ./configure
    make
    sudo make install

## If dependent libraries are in non-standard locations (for example, "/opt")

    CXXFLAGS=-I/opt/include LDFLAGS="$LDFLAGS -L/opt/lib" ./configure
    make
    sudo make install

## Installing somewhere other than /usr/local/bin

    ./configure --prefix=/path/to/where/you/want/it

For example,

    ./configure --prefix=$HOME
    make
    make install

will result in msstats being in ~/bin.

## Notes on calculations for metapopulations

If the input data contain ordered samples from multiple demes, you may pass that info to msstats as follows:

    -I D n0 n1 ... n(D-1)

where D is the number of demes, n0 is the sample size in the first deme, n1 is the sample size in the second deme, and so on.

When -I is used, summary statistics will be calculated for each deme within each replicate. The columns "rep" and "pop" will tell you which deme in which replicate each statistic corresponds to.

The program will exit with an error if the sum of deme sizes does not equal the input sample size read from STDIN.
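The deme bookkeeping described above can be illustrated with a short Python sketch. The names `split_demes` and `drop_invariant` are hypothetical and do not come from msstats; the sketch relies only on the stated rule that samples are ordered by deme. Dropping sites that are monomorphic within a deme before computing per-deme statistics is an assumption here, suggested by the `RemoveInvariantColumns` call visible in the compile log in the issues further down this page.

```python
def split_demes(haplotypes, deme_sizes):
    """Split an ordered sample into demes, as with -I D n0 n1 ... (hypothetical helper).

    Raises ValueError when the deme sizes do not sum to the sample size,
    mirroring the error msstats reports in that situation.
    """
    total = sum(deme_sizes)
    if total != len(haplotypes):
        raise ValueError(
            f"deme sizes sum to {total}, but {len(haplotypes)} sequences were read"
        )
    demes, start = [], 0
    for size in deme_sizes:
        demes.append(haplotypes[start:start + size])
        start += size
    return demes


def drop_invariant(haplotypes):
    """Remove sites that are monomorphic within this subsample (assumed behavior)."""
    if not haplotypes:
        return []
    keep = [j for j in range(len(haplotypes[0]))
            if len({h[j] for h in haplotypes}) > 1]
    return ["".join(h[j] for j in keep) for h in haplotypes]
```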

When the -I option is used, msstats will report Hudson, Slatkin, and Maddison's Fst statistic. The reference for this statistic is:

Hudson, Slatkin and Maddison (1992) Estimation of levels of gene flow from population data. Genetics 132:583-589
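Hudson, Slatkin and Maddison (1992) define Fst = 1 - Hw/Hb, where Hw is the mean number of differences between sequences sampled from the same deme and Hb is the mean number of differences between sequences sampled from different demes. The Python sketch below computes this for one pair of demes. It averages the two within-deme means with equal weight; that weighting is an assumption, and msstats (via libsequence) may weight demes differently, so treat this as an illustration rather than a reimplementation of the hsmij columns. The function names are hypothetical.

```python
from itertools import combinations


def n_diffs(a, b):
    """Number of sites at which two haplotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(deme):
    """Mean pairwise differences among sequences from one deme (needs >= 2)."""
    pairs = list(combinations(deme, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def mean_between(deme_i, deme_j):
    """Mean differences between one sequence from each of two demes."""
    total = sum(n_diffs(a, b) for a in deme_i for b in deme_j)
    return total / (len(deme_i) * len(deme_j))


def hsm_fst(deme_i, deme_j):
    """Fst = 1 - Hw/Hb, with Hw the unweighted mean of the two within-deme values."""
    hw = (mean_within(deme_i) + mean_within(deme_j)) / 2
    hb = mean_between(deme_i, deme_j)
    return 1 - hw / hb
```

Computed this way, Fst can be negative when sequences from different demes differ less than sequences within demes, and it is undefined when Hb = 0.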

For every pair of the D demes, the output contains a column with header hsmij, the Fst statistic calculated between demes i and j (for example, hsm01 compares the first and second demes). Because these values are repeated on the line for each deme within a replicate, you should condition on the lines for just one population to get the actual distribution of Fst across replicates. For example:

    ms 30 100 -t 10 -I 3 10 10 10 1 | ./src/msstats -I 3 10 10 10 > output

Then, using R:

    x <- read.table("output", header = TRUE)
    mean(x$hsm01[x$pop == 0])
    [1] 0.3511204

# The output

## For a single population

There are 21 columns in the output:

  1. S = the number of "segregating sites", aka mutations. (Watterson, 1975)
  2. n1 = the number of singletons in the data. This is the number of mutations at frequency 1 and n-1 in the sample.
  3. next = the number of "external mutations" (sensu Fu) = the number of derived singletons.
  4. theta = Watterson's estimate of theta
  5. pi = "sum of site heterozygosity" = sum of 2pq over the S sites (Nei, Tajima, others).
  6. thetaH = Fay and Wu's estimator of theta. Their H statistic is pi - thetaH, hence no column for it.
  7. Hprime = Zeng et al.'s normalized Fay and Wu's H.
  8. tajd = Tajima's D
  9. fulif = Fu and Li's F
  10. fulid = Fu and Li's D
  11. fulifs = Fu and Li's F-star
  12. fulifds = Fu and Li's D-star
  13. rm = Hudson and Kaplan's Rmin
  14. rmmg = Myers and Griffiths' simplest lower bound on Rmin, the minimum number of recombination events
  15. nhap = Number of distinct haplotypes. Depaulis and Veuille
  16. hdiv = haplotype diversity. Depaulis and Veuille
  17. wallB = Jeff Wall's B
  18. wallQ = Jeff Wall's Q
  19. rosarf = Ramos-Onsins and Rozas Rf statistic
  20. rosaru = Ramos-Onsins and Rozas Ru statistic
  21. zns = Kelly's Zns = average pairwise r-squared.
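Several of these columns have standard closed-form definitions. The following Python sketch computes S, Watterson's theta, pi, thetaH, Tajima's D, nhap and hdiv for one sample, assuming ms coding (0 = ancestral, 1 = derived), which is what makes thetaH computable. It computes pi as the mean number of pairwise differences, i.e. the sum over sites of 2pq multiplied by n/(n-1); whether msstats applies that n/(n-1) factor is an assumption not settled here. The function name `summary_stats` is hypothetical, and the remaining columns (Fu and Li's tests, Rmin, Wall's statistics, Zns, and so on) are not covered.

```python
import math
from collections import Counter


def summary_stats(haplotypes):
    """Compute a subset of the single-population columns (hypothetical helper).

    Assumes ms coding: '0' = ancestral, '1' = derived, one string per chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    nsites = len(haplotypes[0])
    # derived-allele count at each site; keep only sites polymorphic in this sample
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(nsites)]
    seg = [k for k in counts if 0 < k < n]
    S = len(seg)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1  # Watterson's estimator

    pair_denom = n * (n - 1)
    pi = sum(2.0 * k * (n - k) / pair_denom for k in seg)  # mean pairwise differences
    theta_h = sum(2.0 * k * k / pair_denom for k in seg)   # Fay and Wu's thetaH

    # Tajima's (1989) normalizing constants
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    tajd = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")

    # haplotype statistics: distinct haplotypes and n/(n-1) * (1 - sum p_i^2)
    freqs = Counter(haplotypes)
    nhap = len(freqs)
    hdiv = n / (n - 1.0) * (1.0 - sum((c / n) ** 2 for c in freqs.values()))

    return {"S": S, "theta": theta_w, "pi": pi, "thetaH": theta_h,
            "tajd": tajd, "nhap": nhap, "hdiv": hdiv}
```

Fay and Wu's H is then `pi - thetaH`, matching the note on column 6. Tajima's D is reported as NaN when its variance term is zero, as happens for n = 2.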


msstats's Issues

issue compiling

I'm having trouble compiling msstats and I'm not familiar enough with c++ to troubleshoot. Do you have any suggestions?

Just running './configure' and 'make'

    make[2]: Entering directory '/home/haekel/Desktop/msstats/src'
    g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include -DNDEBUG -O3 -Wall -W -g -O2 -std=c++11 -MT msstats.o -MD -MP -MF .deps/msstats.Tpo -c -o msstats.o msstats.cc
    msstats.cc: In function ‘int main(int, char**)’:
    msstats.cc:166:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for(int i = 0 ; i < config.size() ; ++i)
       ^
    msstats.cc:172:25: error: no matching function for call to ‘Sequence::SimData::assign(double*, unsigned int, __gnu_cxx::__alloc_traits<std::allocator<std::__cxx11::basic_string<char> > >::value_type*, __gnu_cxx::__alloc_traits<std::allocator<int> >::value_type&)’
       &d[sum],config[i]);
       ^
    In file included from /usr/local/include/Sequence/SimData.hpp:65:0,
                     from msstats.cc:29:
    /usr/local/include/Sequence/PolyTable.hpp:188:10: note: candidate: bool Sequence::PolyTable::assign(Sequence::PolyTable::const_site_iterator, Sequence::PolyTable::const_site_iterator)
       bool assign(PolyTable::const_site_iterator beg,
       ^
    /usr/local/include/Sequence/PolyTable.hpp:188:10: note: candidate expects 2 arguments, 4 provided
    /usr/local/include/Sequence/PolyTable.hpp:197:10: note: candidate: bool Sequence::PolyTable::assign(const std::vector<double>&, const std::vector<std::__cxx11::basic_string<char> >&)
       bool assign( const std::vector<double> & __positions,
       ^
    /usr/local/include/Sequence/PolyTable.hpp:197:10: note: candidate expects 2 arguments, 4 provided
    /usr/local/include/Sequence/PolyTable.hpp:203:10: note: candidate: bool Sequence::PolyTable::assign(std::vector<double>&&, std::vector<std::__cxx11::basic_string<char> >&&)
       bool assign( std::vector<double> && __positions,
       ^
    /usr/local/include/Sequence/PolyTable.hpp:203:10: note: candidate expects 2 arguments, 4 provided
    msstats.cc:175:34: error: ‘RemoveInvariantColumns’ was not declared in this scope
       RemoveInvariantColumns(&d2);
       ^
    msstats.cc:65:7: warning: unused variable ‘total’ [-Wunused-variable]
       int total = std::accumulate(config.begin(),config.end(),0,plus<int>());
       ^
    msstats.cc: In function ‘void calcstats(const Sequence::SimData&, const unsigned int&)’:
    msstats.cc:222:85: error: invalid initialization of reference of type ‘const bool&’ from expression of type ‘std::vector<double>’
       while( Recombination::Disequilibrium(&d,LDSTATS,&site1,&site2,false,0,mincount))
       ^
    In file included from /usr/local/include/Sequence/PolySNP.hpp:80:0,
                     from /usr/local/include/Sequence/PolySIM.hpp:41,
                     from msstats.cc:30:
    /usr/local/include/Sequence/Recombination.hpp:96:38: note: in passing argument 2 of ‘std::vector<Sequence::PairwiseLDstats> Sequence::Recombination::Disequilibrium(const Sequence::PolyTable*, const bool&, const unsigned int&, const unsigned int&, double)’
       std::vector<PairwiseLDstats> Disequilibrium(
       ^
    Makefile:406: recipe for target 'msstats.o' failed
    make[2]: *** [msstats.o] Error 1
    make[2]: Leaving directory '/home/haekel/Desktop/msstats/src'
    Makefile:352: recipe for target 'all-recursive' failed
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory '/home/haekel/Desktop/msstats'
    Makefile:292: recipe for target 'all' failed
    make: *** [all] Error 2

I also tried the following command, but it didn't help:

    ./configure CC=gcc CXXFLAGS='-Wno-sign-compare'

markdown formatting

Line endings in the markdown cause a typo in the README.
The ms format is not:

// segsites: 2 positions: 0.1 0.9 01 10

but instead:

//
segsites: 2
positions: 0.1 0.9
01
10

Cannot compile msstats on my MacBook

Dear Kevin,

I am trying to install msstats on my MacBook Pro (OS X Yosemite 10.10.5), but I am unable to do it. I don't have a computational background and I don't have any C++ knowledge.

I followed the instructions provided in GitHub: cd into the msstats directory and run ./configure. I get this:

Lucass-MacBook-Pro:msstats-master lucas$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/Users/lucas/Dropbox/puc/Softwares/msstats-master/missing: Unknown `--is-lightweight' option
Try `/Users/lucas/Dropbox/puc/Softwares/msstats-master/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for an ANSI C-conforming const... yes
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking whether g++ supports C++11 features by default... no
checking whether g++ supports C++11 features with -std=c++11... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for main in -lgslcblas... no
GSL BLAS runtime library not found

Also, when I write "make":

Lucass-MacBook-Pro:msstats-master lucas$ make
make: *** No targets specified and no makefile found. Stop.

Or "make install" or "sudo make install" I get:

Lucass-MacBook-Pro:msstats-master lucas$ make install
make: Nothing to be done for `install'.

Lucass-MacBook-Pro:msstats-master lucas$ sudo make install
Password:
make: Nothing to be done for `install'.

Do you have any idea of why I am unable to compile msstats?

Thank you very much in advance
Best regards,
Lucas
