2022.01.16 00:34

SSR Locator: tool for simple sequence repeat discovery


The algorithm reads the file generated by the previous module (SSR locus, forward and reverse primers, and original amplicon) and then searches for sequences containing primer annealing sites.
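The search for primer annealing sites can be illustrated with a minimal Python sketch. It is not the authors' Delphi/Perl code: the function names, the exact-match rule, and the `max_len` cutoff are assumptions made for illustration. Real PCR simulation would usually tolerate some mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(text: str, pattern: str) -> list[int]:
    """Return every start index of pattern in text, including overlapping hits."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def virtual_pcr(template: str, forward: str, reverse: str, max_len: int = 1000) -> list[str]:
    """Return amplicons bounded by exact forward and reverse primer sites.

    Hypothetical helper: exact matching and max_len are illustrative choices.
    """
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we look for its reverse complement.
    reverse_site = reverse_complement(reverse)
    amplicons = []
    for fwd_start in find_all(template, forward):
        for rev_start in find_all(template, reverse_site):
            end = rev_start + len(reverse_site)
            # Keep only products where the reverse site lies downstream
            # of the forward primer and the product is short enough.
            if rev_start >= fwd_start + len(forward) and end - fwd_start <= max_len:
                amplicons.append(template[fwd_start:end])
    return amplicons
```

Running `virtual_pcr` over every sequence in a second file and counting the returned amplicons per primer pair gives the raw material for a redundancy check.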

For the global alignment between paralog and original amplicon sequences and the score calculations (match, mismatch, gaps), a routine was written in Delphi using the algorithms of Needleman and Wunsch [20] and Smith and Waterman [21]. In the same module, amplicon identities were calculated according to Waterman [22] and Vingron and Waterman [23].
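As an illustration of the global alignment step, here is a short Python sketch of Needleman-Wunsch alignment with a linear gap penalty. The scoring values (match 1, mismatch -1, gap -2) and the identity definition (matching columns divided by alignment length) are assumptions; the source does not give the parameters used in the Delphi routine, and this sketch does not cover the Smith-Waterman part.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_a.append(a[i - 1])
                aligned_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences carry the same base."""
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

Other identity definitions (for example, dividing by the shorter sequence length) give different percentages, so the denominator should be stated whenever identities are compared.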

The two-language hybrid design was chosen for two reasons: (i) Perl handles large text files faster than Delphi, and (ii) Perl is better suited to generating the combinatorial strings to be located. The Perl module was compiled into an executable file, so Perl libraries do not need to be installed with the program.
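To make the idea of generating combinatorial strings and locating them concrete, the following Python sketch enumerates every motif of a given length and searches for tandem runs with a regular expression. It is written in Python rather than Perl, and the function names and the `min_repeats` threshold are hypothetical; it is not the SSRLocator search module.

```python
import re
from itertools import product


def motifs_of_length(k: int) -> list[str]:
    """All 4**k strings of length k over the DNA alphabet."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'AGAG' or 'AA')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence: str, k: int, min_repeats: int) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, repeat_count) for tandem runs of k-mer motifs."""
    sequence = sequence.upper()
    hits = []
    for motif in filter(is_primitive, motifs_of_length(k)):
        # e.g. (?:AG){5,} matches five or more consecutive copies of AG.
        pattern = re.compile(f"(?:{motif}){{{min_repeats},}}")
        for m in pattern.finditer(sequence):
            hits.append((m.start(), motif, len(m.group()) // k))
    return hits if not hits else sorted(hits)
```

Because every rotation of a motif is searched separately, one tract can be reported more than once (an AG run also contains a shorter GA run), so a real tool needs a rule for merging overlapping hits.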

The graphical interface, which integrates input and output windows with the Windows operating system, was built with the Turbo Delphi suite; a menu system calls each of the previously described modules. A total of 28 rice Oryza sativa ssp. A flow chart representing the different steps performed by the software is shown in Figure 1.

Figure 1. Flow chart showing the functional structure of SSR Locator. (A) Perl script to search SSRs; (B) text file where information from detected SSRs is stored; (C) module for the statistical calculations for SSR motif occurrence; (D) module that formats text files into standard Primer3 input files; (E) running of Primer3; (F) module for running Virtual-PCR (using a second sequence file as a template); (G) module performing global alignment between homologous amplicons; (H) identity and alignment score calculations between homologous amplicons; and (I) file containing SSR, primer, homologous amplicons, identity, and score information.

The mono-, 4-mer, 6-mer, 7-mer, 8-mer, 9-mer, and 10-mer repeats were identical for the three programs. Considering the fl-cDNA sequences, in In sequences, two loci were detected; in seven sequences, three loci; and only one sequence had four loci, adding up to occurrences. Among the types analyzed, SSRs (mono- to 6-mer repeats) and minisatellites (7- to 10-mer repeats) comprised The distribution of occurrences detected by SSRLocator consisted of monomers, 2-mers, 3-mers, 4-mers, 5-mers, 6-mers, 82 7-mers, 6 8-mers, 25 9-mers, and 5 10-mers, corresponding to rates of 3.

In nonredundant transcripts from the TIGR database, Overall occurrences of Frequencies closer to those found in this study were described for CDS regions of Rosaceae species, with an average of For monomer, 2-mer, and 3-mer repeats, all possible arrangements are shown, while for 4-mer to 10-mer repeats, only the ten most frequent motifs are shown.

In the overall distribution, the monomers represent 3. In maize, barley, rice, sorghum, and wheat ESTs, the motif AG was described as the most frequent [6, 16, 28, 29, 31, 32]. However, in some studies, the most frequent motif was GA [30, 33]. Repeats composed of guanine and cytosine were the most abundant among trimers, with occurrences of Many reports indicate the 3-mer CCG as the most frequent in maize, barley, wheat, sorghum, and rye [6, 16, 28, 32], sugarcane [27], and rice [29, 31].
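Reports naming AG or GA as the most frequent motif can be hard to compare, because a tract such as AGAGAGAG contains both readings, and the opposite strand reads CT or TC. One common convention, assumed here and not described in the source, is to group a motif with all of its rotations and the rotations of its reverse complement, and to label the group by its alphabetically smallest member. A minimal Python sketch:

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def canonical_motif(motif: str) -> str:
    """Smallest string among all rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        for shift in range(len(strand)):
            # Rotating the motif corresponds to reading the same tract in another frame.
            candidates.append(strand[shift:] + strand[:shift])
    return min(candidates)
```

Under this convention AG, GA, CT, and TC are counted as one class, as are CCG, CGC, GCC, CGG, GGC, and GCG. Whether the cited studies grouped motifs this way is not stated in the text.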

Among 4-mers, different arrangements were found, where the motifs GATC 7. These motifs add up to For all remaining repeats (minisatellites), the occurrences are widely distributed, with low percentages for each arrangement.

For 7-mer, 8-mer, 9-mer, and 10-mer repeats, the totals of occurrences were 57, 5, 23, and 5, respectively. A module in SSRLocator checks for primer redundancy. A total of primer pairs amplified only the fragment from their original locus (specific amplicons) and pairs amplified one or more regions besides the original locus.

From these, pairs amplified two fragments, one from the original site and a second from another (paralogous) region. In this case, specific amplicons plus redundant amplicons were detected. A total of , 90, 2, and 5 primer pairs generated three (two redundancies), four (three redundancies), five (four redundancies), and six (five redundancies) fragments, respectively.
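The bookkeeping behind these categories can be sketched in a few lines of Python. The input is a hypothetical mapping from primer-pair identifier to the number of fragments found by the PCR simulation; the names and the example data in the usage note are illustrative and do not come from the study.

```python
from collections import Counter


def classify_primer_pairs(amplicon_counts: dict[str, int]) -> dict[str, list[str]]:
    """Split primer pairs into no product, specific (one fragment), and redundant (more than one)."""
    groups = {"no_product": [], "specific": [], "redundant": []}
    for pair_id, n_fragments in amplicon_counts.items():
        if n_fragments == 0:
            groups["no_product"].append(pair_id)
        elif n_fragments == 1:
            groups["specific"].append(pair_id)
        else:
            groups["redundant"].append(pair_id)
    return groups


def redundancy_histogram(amplicon_counts: dict[str, int]) -> Counter:
    """Map number of redundancies (fragments minus one) to the number of primer pairs."""
    return Counter(n - 1 for n in amplicon_counts.values() if n > 1)
```

For example, with `{"p1": 1, "p2": 2, "p4": 3}`, pair `p1` is specific, while `p2` and `p4` are redundant with one and two redundancies, respectively.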

The final product of primers with more than one anchoring region resulted in specific amplicons and redundant amplicons, adding up to fragments. To investigate the ability of these primers to amplify genomic sequences, an extra experiment was performed against the whole rice genomic sequence available at NCBI.

The different groups of redundant and nonredundant primer sets, that is, those amplifying one, two, three, or more times in the cDNA database, were tested against the genomic sequence. From the nonredundant primers, only amplified a locus in the genomic sequence. This difference was expected because of difficulties in amplifying genomic regions: if a primer anneals to a boundary region between two exons in the cDNA, the presence of an intron makes this annealing site unavailable in the genomic sequence.

Only one primer set amplified more than two loci. These results indicate that SSR Locator performance was consistent between the two databases regarding the nonredundant loci; that is, loci that could be amplified in both databases kept their nonredundant status. The changes observed for the redundant loci can be attributed to many causes, including redundancy in the cDNA database, but also to biological reasons due to primer positioning.

Results of a global alignment between amplicons from original and redundant sites are shown in Table 3. Among the redundant amplifications, The fact that such a high percentage of redundant loci show high identity is probably a consequence of the genome fraction chosen, that is, expressed sequences. This fraction is under tight selection pressure and should not accumulate variations such as substitutions or indels at a high rate.

As expected, comparisons to the whole genome generated a great deal of polymorphism, due to the inclusion of intronic regions in the alignments (data not shown).


Authors (as listed): Luciano Carlos da Maia, Palmieri DA, Kopp MM, Costa de Oliveira A. Int J Plant Genomics.

Abstract: Microsatellites, or SSRs (simple sequence repeats), are ubiquitous short tandem duplications occurring in eukaryotic organisms.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Algorithms. The algorithms used for the searches, alignment, and homology estimates are described separately.

Primer design. An algorithm written in Delphi calls Primer3 [19], which performs the primer design.
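As a rough illustration of formatting SSR hits into Primer3 input, the Python sketch below writes one Boulder-IO record per locus. It uses tag names from current Primer3 releases (`SEQUENCE_ID`, `SEQUENCE_TEMPLATE`, `SEQUENCE_TARGET`, `PRIMER_PRODUCT_SIZE_RANGE`), which may differ from those of the Primer3 version the authors called; the function name, the default size range, and the example values are assumptions.

```python
def primer3_record(seq_id: str, template: str, ssr_start: int, ssr_length: int,
                   product_range: str = "100-300") -> str:
    """Build one Boulder-IO record asking Primer3 for primers that flank the SSR.

    ssr_start is interpreted by Primer3 according to PRIMER_FIRST_BASE_INDEX;
    this sketch does not set that tag and simply passes the value through.
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # The target region must lie inside the amplified product.
        f"SEQUENCE_TARGET={ssr_start},{ssr_length}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_range}",
        "=",  # a line holding only '=' terminates a record
    ]
    return "\n".join(lines) + "\n"
```

Records built this way can be concatenated into a single input file and passed to the `primer3_core` command-line program on standard input.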


Table 3. Distribution of amplicon alignments for specific and redundant amplicons with varying identity levels.

Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nature Genetics.
DNA polymerase III proofreading mutants enhance the expansion and deletion of triplet repeat sequences in Escherichia coli. Journal of Biological Chemistry.
Ellegren H. Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics.
Mirkin SM. DNA structures, repeat expansions and human hereditary disorders. Current Opinion in Structural Biology.
Analysis on frequency and density of microsatellites in coding sequences of several eukaryotic genomes.
Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biology.
Genic microsatellite markers in plants: features and applications. Trends in Biotechnology.
A software program combining sequence motif searches with keywords for finding repeats containing DNA sequences.
Improved tools for biological sequence comparison.
Basic local alignment search tool. Journal of Molecular Biology.
Abajian C.
RepeatMasker Open
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research.
TROLL: tandem repeat occurrence locator.
Theoretical and Applied Genetics.
Computational and experimental analysis of microsatellites in rice Oryza sativa L. Genome Research.
Schuler GD. Sequence mapping by electronic PCR.
Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers.
