General Genomic Information

You’ll need to know which cutter (restriction enzyme) you are using so that you can use the appropriate barcode and splitting scripts. Each sample/file will end up labeled with a biotin plate barcode and an individual well barcode.

These will look something like (for Sbf1):

  • PlateBarcode-CC_CCTGCA_GG_FRAGMENT

Cutters

The two cutters we primarily use are:

  • Six cutter (5' - CTGCA_G - 3') Pst1
  • Eight cutter (5' - CCTGCA_GG - 3') Sbf1

Phred Quality (Q) Scores

The initial files are in .fastq format, which stores a Phred quality (Q) score for every base alongside the sequence.

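  • There are 4 lines per sequence read: a header starting with @, the sequence, a + separator, and the quality string
  • The quality string has one ASCII character per base; from lowest to highest score, the characters run roughly symbols -> numbers -> capital letters -> lowercase letters
  • Q represents the quality score of each nucleotide generated by automated DNA sequencing
    • Q = -10 log10(P), where P = base-calling error probability
    • Q50 corresponds to 99.999% base-call accuracy; Q20 to 99%; Q10 to 90%
    • The most common encoding offset is +33 (Phred+33): the ASCII code of each quality character is Q + 33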
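
As a rough illustration of the formula and the +33 offset, here is a minimal Python sketch. The function names are made up for this example and are not part of any of our scripts, and the code assumes Phred+33 encoding.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred score Q to the base-calling error probability P."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into a list of integer Q scores.

    Assumes Phred+33 by default: ASCII code of each character = Q + offset.
    """
    return [ord(ch) - offset for ch in qual_string]

if __name__ == "__main__":
    # '!' is ASCII 33 (Q0), '5' is 53 (Q20), '?' is 63 (Q30), 'I' is 73 (Q40)
    for q in decode_quality("!5?I"):
        print(q, phred_to_error_prob(q))
```

If decoded scores come out negative or implausibly high, the file may be using a different offset than +33.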

Add Metadata/SiteNames (Optional)

If you want to add metadata to your filenames, you can do that here, but it is optional. A metadata file should contain the name of each individual, its location on the plate, etc. This part can be done now or later, but be absolutely sure that your individual names, their plate locations (A1, B1, etc.), and the appropriate barcodes are all in sync. For example, the barcodes might be ordered A1, A2, A3 (across a plate row) while your sample names are ordered A1, B1, C1 (down a plate column).
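
One way to catch an across-versus-down mismatch is to compare the well order of the barcode list against the well order of the sample list. The sketch below is only an illustration: it assumes a standard 96-well plate (rows A-H, columns 1-12), and the function names are hypothetical rather than taken from our scripts.

```python
ROWS = "ABCDEFGH"      # assumed 96-well plate: 8 rows
COLS = range(1, 13)    # and 12 columns

def wells_across_rows():
    """Well IDs ordered across each row: A1, A2, A3, ..."""
    return [f"{r}{c}" for r in ROWS for c in COLS]

def wells_down_columns():
    """Well IDs ordered down each column: A1, B1, C1, ..."""
    return [f"{r}{c}" for c in COLS for r in ROWS]

def find_mismatches(barcode_wells, sample_wells):
    """Return (position, barcode_well, sample_well) wherever the two lists disagree."""
    return [
        (i, b, s)
        for i, (b, s) in enumerate(zip(barcode_wells, sample_wells))
        if b != s
    ]

if __name__ == "__main__":
    mismatches = find_mismatches(wells_across_rows(), wells_down_columns())
    if mismatches:
        print("Barcodes and samples are NOT in the same well order.")
        print("First mismatch:", mismatches[0])
```

An empty result from find_mismatches means the two lists use the same well order; anything else means one of them needs to be reordered before barcodes are paired with sample names.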