Table Of Contents

Previous topic

1. Introduction

Next topic

3. Workflows

This Page

2. Data

We make a number of data files available (temporarily) on this site. The data files are as follows.

2.1. uce_probe.sqlite.bz2

This is a bzip archive of an sqlite database containing the data we used during our probe design process and the probes that results from this process. The tables are:

  • cons - this table contains matches that we identified in alignments (MAF) between chicken and lizard using the initial version (0.1) of the genome/summary.py`code in this repository (see :ref:`tagging below).
  • blast - this table contains results from a blast of the matches from cons onto the zebra finch genome. For these matches we used the 0.1 tagged genome/summaryBlast.py code
  • gallus_refseq - this is a table of the refseq genes in galGal3. it would likely be best to download a recent version of these data from NCBI or UCSC rather than using what is here for anything.
  • probes - this is a table of the probes that we designed from the UCEs we located in chicken, lizard, and zebra finch. We designed these probes using design/sure_select_tiler.py

2.2. uce_bed.bz2

This is a BED format file of the UCEs we located that are shared between chicken, lizard, and zebra finch. The locations within this file are relative to chicken (galGal3).

2.3. uce_probes.bed.bz2

This is a BED format file of the locations of the probes we designed. The locations within this file are relative to chicken (galGal3).

2.4. db1.sql.bz2

This is a bzipped dump file of the mysql database in which we stored our alignments of each probe sequence to other genomes. Initially and for error checking purposes, we conducted the align, clean, and insert steps by hand (as seen in Simulation:steps). This process has now largely been automated in future:run_mutiple_lastz.py. The tables are:

2.4.1. group data tables

  • all_orgs - this table contains data from all of the matches to individual species listed in probe matches to individual species

  • cons - this table is identical to the cons table from uce_probe.sqlite. it is repeated here for ease of use

  • group_XAB_5 - this contains the list of probes found in all members of

    • monDom5
    • loxAfr3
    • choHof1
    • hg19
    • mm9
  • group_Bats_5 - this contains the list of probes found in all members of

    • eriEur1
    • equCab2
    • bosTau4
    • canFam2
    • pteVam1
  • group_Bats_7 - this contains the list of probes found in all members of

    • eriEur1
    • equCab2
    • bosTau4
    • canFam2
    • pteVam1
    • myoLuc1
    • ailMel1
  • group_Bats_8 - this contains the list of probes found in all members of

    • eriEur1
    • pteVam1
    • myoLuc1
    • bosTau4
    • vicPac1
    • equCab2
    • canFam2
    • ailMel1
  • group_Bats_12 - this contains the list of probes found in all members of

    • eriEur1
    • pteVam1
    • myoLuc1
    • bosTau4
    • vicPac1
    • susScr9
    • oviAri1
    • turTru1
    • equCab2
    • canFam2
    • felCat3
    • ailMel1
  • group_elephants_7 - this contains the list of probes found in all members of
    • loxAfr3
    • canFam2
    • echTel1
    • choHof1
    • dasNov2
    • hg19
    • monDom5
  • group_size_19 - this contains the list of probes found in all members of

    • anoCar2
    • bosTau4
    • calJac3
    • canFam2
    • cavPor3
    • chinese
    • equCab2
    • gorGor3
    • hg19
    • korean
    • loxAfr3
    • mm9
    • monDom5
    • oryCun2
    • panTro2
    • ponAbe2
    • rheMac2
    • taeGut1
    • venter
  • group_size_23 - this contains the list of probes found in all memebrs of

    • bosTau4
    • canFam2
    • cavPor3
    • chinese
    • dasNov2
    • echTel1
    • equCab2
    • gorGor3
    • hg19
    • korean
    • loxAfr3
    • mm9
    • monDom5
    • ochPri2
    • oryCun2
    • panTro2
    • ponAbe2
    • pteVam1
    • rn4
    • speTri1
    • tarSyr1
    • tupBel1
    • venter
  • group_size_25 - this contains the list of probes found in all members of

    • anoCar2
    • bosTau4
    • calJac3
    • canFam2
    • cavPor3
    • chinese
    • dipOrd1
    • equCab2
    • gorGor3
    • hg19
    • korean
    • loxAfr3
    • mm9
    • monDom5
    • ornAna1
    • oryCun2
    • panTro2
    • ponAbe2
    • pteVam1
    • rheMac2
    • rn4
    • taeGut1
    • tarSyr1
    • venter
    • vicPac1
  • group_size_29 - this contains the list of probes found in all members of

    • anoCar2
    • bosTau4
    • calJac3
    • canFam2
    • cavPor3
    • chinese
    • choHof1
    • dipOrd1
    • echTel1
    • equCab2
    • eriEur1
    • gorGor3
    • hg19
    • korean
    • loxAfr3
    • mm9
    • monDom5
    • ornAna1
    • oryCun2
    • panTro2
    • ponAbe2
    • pteVam1
    • rheMac2
    • rn4
    • taeGut1
    • tarSyr1
    • tupBel1
    • venter
    • vicPac1
  • probe_distribution - this contains a binary “matrix” indicating presence/absence (1/0) of probe matches by species

  • probes - this is a table providing the ids of the probes we designed

  • species - this table provides information on the genome build of each organism to which we aligned probes

  • sureselect - this table is identical to the probes table from uce_probe.sqlite

2.4.2. probe matches to individual species

These tables provide the individual matches of probes to difference organisms, as found by using lastz to align our probe sequences to the individual genomes. We provide the build version, name, etc. of each genome sequence in the species table of the database dump described in group data tables.

  • ailMel1
  • anoCar2
  • bosTau4
  • calJac3
  • canFam2
  • cavPor3
  • chinese
  • choHof1
  • danRer6
  • dasNov2
  • dipOrd1
  • echTel1
  • equCab2
  • eriEur1
  • felCat3
  • gasAcu1
  • gorGor3
  • hg19
  • korean
  • loxAfr3
  • macEug1
  • micMur1
  • mm9
  • monDom5
  • myoLuc1
  • ochPri2
  • ornAna1
  • oryCun2
  • otoGar1
  • oviAri1
  • panTro2
  • ponAbe2
  • proCap1
  • pteVam1
  • rheMac2
  • rn4
  • sorAra1
  • speTri1
  • susScr9
  • taeGut1
  • tarSyr1
  • tetNig1
  • tupBel1
  • turTru1
  • venter
  • vicPac1
  • xenTro2