GenBank Research Reference Overviews

Background Reference

General Strategies Reference

Potential Research Reference

Syntax Reference

Semantics Reference

Redundancy Reference

Inconsistency Reference

Irrelevancy Reference

Development Reference

Others

Background Reference
GenBank (1999), Dennis A. Benson, Mark S. Boguski, David J. Lipman, James Ostell, B. F. Francis Ouellette, Barbara A. Rapp, et al. Nucleic Acids Research

http://citeseer.nj.nec.com/516025.html

http://www.psc.edu/general/software/packages/genbank/genbank.html

http://www.cas.org/ONLINE/DBSS/genbankss.html

http://www.bio-mirror.net/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+GENBANK



http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
Data cleaning paper and research group

http://www.dbis.informatik.hu-berlin.de/research/bioinformatics/papers/data_cleansing.html
GenBank Documentation

http://www.genome.ad.jp/dbget-bin/show_man?genbank


Sample records

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=L00727[pacc]&doptcmdl=GenBank

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

http://www.cas.org/ONLINE/DBSS/genbankss.html


Bad data warning over public gene databases

http://www.itworld.com/Tech/2987/020506genedatabase/pfindex.html

Journal article on the need to clean up GenBank
Other BioDB collections

GeneDB (curated) http://www.genedb.org/genedb/navHelp.jsp

Swiss-Prot (curated)



Peter Sterk and Stephan Beck

The Up-to-Date Status of Major Genome Sequencing Projects: The Genome MOT


http://www2.ebi.ac.uk/embnet.news/vol5_2/EMBnet-MOT.html
Pursuant to agreements made at their 2002 Collaborative Meeting, DDBJ/EMBL/GenBank have undertaken the collection of a new class of sequence data: Third-Party Annotation (TPA).

\Document GenBank.htm


In order to assure that the sequence annotation is of high quality, it is required that TPA records be associated with a study published in a peer-reviewed journal before the data is released to the public.

\Document GenBank.htm

FASTA format description

http://www.ncbi.nlm.nih.gov/BLAST/fasta.html


>gi|22136741|gb|AY133756.1| Arabidopsis thaliana clone U18350 putative copper/zinc superoxide dismutase (At2g28190) mRNA, complete cds
ATGGCTGCCACCAACACAATCCTCGCATTCTCATCTCCTTCTCGTCTTCTCATTCCTCCTTCCTCCAATC
CTTCAACTCTCCGTTCCTCTTTCCGCGGCGTCTCTCTCAACAACAACAATCTCCACCGTCTCCAATCTGT
TTCCTTCGCCGTTAAAGCTCCGTCGAAAGCGTTGACAGTTGTTTCCGCGGCGAAGAAGGCTGTTGCAGTG
CTTAAAGGTACTTCTGATGTCGAAGGAGTTGTTACTTTGACCCAAGATGACTCAGGTCCTACAACTGTGA
ATGTTCGTATCACTGGTCTCACTCCAGGGCCTCATGGATTTCATCTCCATGAGTTTGGTGATACAACTAA
TGGATGTATCTCAACAGGACCACATTTCAACCCTAACAACATGACACACGGAGCTCCAGAAGATGAGTGC
CGTCATGCGGGTGACCTGGGAAACATAAATGCCAATGCCGATGGCGTGGCAGAAACAACAATAGTGGACA
ATCAGATTCCTCTGACTGGTCCTAATTCTGTTGTTGGAAGAGCCTTTGTGGTTCACGAGCTTAAGGATGA
CCTCGGAAAGGGTGGCCATGAGCTTAGTCTGACCACTGGAAACGCAGGCGGGAGATTGGCATGTGGTGTG
ATTGGCTTGACGCCGCTCTAAGTCAGAGGCTAAGCAAGTACTCTTATGTCTA
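
A record like the one above can be read with a few lines of code. The following is a minimal sketch, not an official NCBI parser: the function names `parse_fasta` and `split_ncbi_defline` are hypothetical, and the header layout `gi|<number>|<db>|<accession>| description` is assumed from the sample record only.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_ncbi_defline(header):
    """Split an assumed 'gi|<number>|<db>|<accession>| description' header."""
    parts = header.split("|", 4)
    if len(parts) == 5 and parts[0] == "gi":
        return {
            "gi": parts[1],
            "db": parts[2],
            "accession": parts[3],
            "description": parts[4].strip(),
        }
    # Headers that do not follow the assumed layout are kept as plain text.
    return {"description": header.strip()}
```

Headers that do not match the assumed layout fall back to a plain description, so the sketch does not fail on other FASTA sources.
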
A New File Format and Tools for the Large-Scale Data Submission to DNA Data Bank of Japan (DDBJ)

recomb2000.ims.u-tokyo.ac.jp/Posters/pdf/31.pdf


Data Sequence Data Sequence Databases Genbank

genome.microbio.uab.edu/MIC753/files/04_Data.pdf


Entrez based resource http://www.sdsc.edu/pb/edu/pharm207/4/

steps and tips to download GenBank

sdmc.krdl.org.sg:8080/kleisli/psZ/biokleisli-tutorial5.ps.gz


NCBI's Genome Annotation Pipeline



www.sanger.ac.uk/HGP/havana/docs/ncbi.ppt

Biological database fundamentals

http://www.ii.uib.no/bio/seminars/sem97db

The BioCatalog

http://corba.ebi.ac.uk/Biocatalog/Database_and_analysis.html


Dr Ian Collet, bioinformatics lecturer at Queensland University of Technology


More than 71% of all GenBank entries and 40% of the individual nucleotides in the database are derived from EST sequences

Schuler, G.D. 1997. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75:694-698.




General Strategies Reference

http://bioinfo.pl/

Extensive list of resource links: http://bioinfo.pl/index.php?page=html/links.html

related tool:

http://bioinfo.pl/links/tools.html
Bioinformatics Laboratory,

BioInfoBank Institute

BioInfo.PL is the home page of a group of Polish scientists working in the field of Bioinformatics. The site is meant to promote our scientific and academic activity. It contains several useful bioinformatics links and local services focused mainly on the prediction and analysis of the structure and function of proteins or genes.
http://metalife.online.bg

http://metalife.orbitel.bg/
At the beginning of 2002, a team of biologists and programmers launched a new free bioinformatics resource. The site offers:

- Information collected in searchable databases [incl. GBK, SPRT, PIR and many of the major databases available];

- Algorithms [BLAST, ClustalW, 3D modeler, 2D prediction and many others];

- Users can save the files generated by algorithms and search processes.


Servers are placed in Bulgaria, at the following address: http://metalife.online.bg
DNannotator (Chunyu Liu, 2001)

Tools for integration of annotation for regional genomic sequences

Special uses of terms by DNannotator

Annotation: Used in its narrow sense meaning mapping of features to genomic DNA sequences.

Customized: Users supply their own annotation source data, such as SNPs, genes, STSs, oligos etc., and their preferred target gDNA sequence for annotation.

High Throughput: Maps batches of source data (prepared by users) onto one gDNA sequence.

Genomic region: A genomic region sized < ~ 30 Mb. DNannotator is a supplement to public annotation efforts such as NCBI's Map Viewer, UCSC's Genome Browser or Sanger's Ensembl. The user can merge annotation from all sources of public annotation, and from his own findings, onto the genomic region of interest.

http://sky.bsd.uchicago.edu/Overview.htm



Potential Research Reference
R. Apweiler, P. Kersey, V. Junker, A. Bairoch (AKJB01)

Technical comment to "Database verification studies of SWISS-PROT and GenBank" by Karp et al.

Bioinformatics, 2001, 17, 6, 533-534


P.D. Karp, S. Paley, J. Zhu (KPZ01)

Database verification studies of SWISS-PROT and GenBank.

Bioinformatics, 2001, 17, 6, 526-532
Late-Night Thoughts on the Sequence Annotation Problem

Sarah J. Wheelan and Mark S. Boguski

sullivan.bu.edu/kasif/seminar/rosetta-168.pdf

Syntax Reference

Sequence tools

GI Retrieval - A script to extract GI numbers from BLAST output

Batch Entrez - Get GenBank records using GI

Name Formatter - Format GenBank DEFINITION entry

NN - Secondary structure prediction. NOTE: This method is in development, so confidence is very limited.

GB Format - GenBank data formatting

get UNF - Get sequence from unfinished genomes
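
The first tool in the list above pulls GI numbers out of BLAST output. The sketch below illustrates the idea and is not the bioinfo.pl script: it assumes hit lines carry identifiers of the form `gi|<number>|`, as in NCBI FASTA deflines, and the function name `extract_gi_numbers` is hypothetical.

```python
import re

# Assumed identifier layout: "gi|<digits>|", as used in NCBI deflines.
GI_PATTERN = re.compile(r"\bgi\|(\d+)\|")


def extract_gi_numbers(blast_text):
    """Return GI numbers found in the text, in first-seen order, without repeats."""
    seen = set()
    ordered = []
    for gi in GI_PATTERN.findall(blast_text):
        if gi not in seen:
            seen.add(gi)
            ordered.append(gi)
    return ordered
```

The resulting list could then be passed to a batch retrieval service to fetch the matching GenBank records.
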


related tool:

http://bioinfo.pl/links/tools.html


GenBank tool

http://corba.ebi.ac.uk/Biocatalog/Database_and_analysis.html


Genome Project Submission Account guidelines

http://www.sander.embl-ebi.ac.uk/Services/GenomeSubm/#step5
Comments and tips on Java XML-based parsers for GenBank: BioJava, Sun's JAXP API, jaxp.jar, parser.jar, crimson.jar, Xerces

http://www.biojava.org/pipermail/biojava-l/2002-February/002230.html

http://www.biojava.org/pipermail/biojava-l/2002-February/002232.html



td2@sanger.ac.uk

http://www.sanger.ac.uk/
GenBank parser problem in Biopython

http://biopython.org/pipermail/biopython-dev/2002-January/000810.html


GenBank parser problem in BioPerl

http://bioperl.org/pipermail/bioperl-l/2003-February/011022.html

archive.develooper.com/beginners@perl.org/msg41005.html

news.gmane.org/thread.php?group=gmane.comp.lang.perl.bio.general


General GenBank parser in Perl

www.stanford.edu/class/gene211/PS2_2003.pdf
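
For orientation, a GenBank flat-file record can be read with very little code. The following is a minimal Python sketch, not the Perl parser from the linked problem set nor any library's API. It assumes the usual flat-file layout: top-level keywords start in the first column, continuation lines are indented, the sequence follows ORIGIN, and `//` ends the record. The function name `parse_genbank_record` is hypothetical.

```python
def parse_genbank_record(text):
    """Parse one GenBank flat-file record into a dict of top-level fields.

    Simplifications: indented sub-keywords and feature lines are appended to
    the enclosing top-level field, and a repeated keyword (for example
    REFERENCE) overwrites the earlier value.
    """
    fields = {}
    seq_parts = []
    current = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # Sequence lines carry a position number and spaced blocks of bases.
            seq_parts.append("".join(c for c in line if c.isalpha()))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isupper():
            # New top-level keyword in the first column.
            parts = line.split(None, 1)
            current = parts[0]
            fields[current] = parts[1].strip() if len(parts) > 1 else ""
        elif current is not None and line.strip():
            # Indented continuation of the current field.
            fields[current] += " " + line.strip()
    fields["sequence"] = "".join(seq_parts).upper()
    return fields
```

For production use, an established parser such as those in BioPerl, Biopython or BioJava is the safer choice, since real records contain many more cases than this sketch handles.
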


GenBank tool Genquire

http://bioinformatics.org/pipermail/genquire-users/2002-January/000015.html


Sequin is a stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. It is capable of handling simple submissions which contain a single short mRNA sequence, and complex submissions containing long sequences, multiple annotations, segmented sets of DNA, or phylogenetic and population studies.

http://www.ncbi.nlm.nih.gov/Sequin/


Data cleanup before submitting to GenBank.

http://www-shgc.stanford.edu/Seq/doepages/methodology.html

Semantics Reference


PubCrawler - Automated Retrieval of PubMed and GenBank Reports

http://pubcrawler.gen.tcd.ie/pubcrawler_pod.html




Redundancy Reference

SPTR - A comprehensive, non-redundant and up-to-date view of the protein sequence world



http://www.dl.ac.uk/CCP/CCP11/newsletter/vol2_3/sptr.html
J. Gorodkin, C. Zwieb, B. Knudsen (GZK01)

Semi-automated update and cleanup of structural RNA alignment databases.

Bioinformatics, 2001, 17, 7, 642-645

http://www.birc.dk/Publications/Articles/Gorodkin_2001c.html

http://www.bioinf.au.dk/rnadbtool/

www.bioinf.kvl.dk/~gorodkin/record/Papers/rnadbtool/rnadb_long_final.ps

http://www.informatik.uni-trier.de/~ley/db/journals/bioinformatics/bioinformatics17.html
DNannotator (Chunyu Liu, 2001)

http://sky.bsd.uchicago.edu/Overview.htm



CLEANUP (Grillo G., Attimonelli M., Liuni S., and Pesole G.)

Grillo, G., Attimonelli, M., Liuni, S., and Pesole G. (1996). CABIOS 12, 1-8.

CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases


http://embnet.angis.org.au/vol3_2/software.html

http://www2.ebi.ac.uk/embnet.news/vol5_2/EMBnet-MOT.html
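
To make the idea of redundancy removal concrete, here is a deliberately simplified sketch. It does not attempt to reproduce the published CLEANUP algorithm: it only drops sequences that are identical to, or fully contained in, a longer sequence already kept. The function name `remove_redundant` and the dict-based input are assumptions made for illustration.

```python
def remove_redundant(records):
    """Drop exact duplicates and sequences contained in a longer kept sequence.

    records: dict mapping a record name to its nucleotide sequence.
    Returns a dict of the kept records with sequences upper-cased.
    """
    kept = {}
    # Visit longest sequences first so shorter contained ones are seen later;
    # ties are broken by name to keep the result deterministic.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        candidate = seq.upper()
        if any(candidate in longer for longer in kept.values()):
            continue  # identical to or contained in a kept sequence
        kept[name] = candidate
    return kept
```

This pairwise substring check scales quadratically with the number of records, so it is only suitable for small collections; it also ignores reverse complements and near-identical sequences.
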


NRDB (Warren Gish)

ftp://ncbi.nlm.nih.gov/pub/nrdb


ICAass (Jeremy Parsons)

ICAtools: Medium-to-large scale DNA sequencing analysis

http://www.littlest.co.uk/software/bioinf/old_packages/icatools/

http://www.littlest.co.uk/software/bioinf/index.html




Inconsistency Reference



DNannotator (Chunyu Liu, 2001)

http://sky.bsd.uchicago.edu/Overview.htm


A utility that prepares raw DNA sequence fragments for sequence assembly. This sequence cleanup program includes quality assessment, confidence reassurance, vector trimming and vector removal. The software is freely available.

http://www.cs.jhu.edu/~salzberg/appendixa.html
M.Y. Galperin, E.V. Koonin (GaKo98)

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement, and operon disruption.

In Silico Biology, 1998
S.E. Brenner (Bre99)

Errors in genome annotation

Trends in Genetics, 1999, 15, 4, 132-133
A. Felsenfeld, J. Peterson, J. Schloss, M. Guyer (FPSG99)

Assessing the quality of the DNA sequence from The Human Genome Project.

Genome Research, 1999, 9, 1-4
C. Medigue, M. Rose, A. Viari, A. Danchin (MRVD99)

Detecting and Analyzing DNA sequencing errors: Toward higher quality of the Bacillus subtilis genome sequence.

Genome Research, 1999, 9, 1116-1127
P. Bork (Bor00)

Power and pitfalls in sequence analysis: The 70% hurdle

Genome Research, 2000, 10, 398-400
R. Guigo, P. Agarwal, J.F. Abril, M. Burset, J.W. Fickett (GAABF00)

An assessment of gene prediction accuracy in large DNA sequences.

Genome Research, 2000, 10, 1631-1642
D. Devos, A. Valencia (DeVa01)

Intrinsic errors in genome annotation.

Trends in Genetics, 2001, 17, 8, 429-431

Graziano Pesole, Sabino Liuni, Giorgio Grillo and Cecilia Saccone


UTRdb: a specialized database of 5'- and 3'-untranslated regions of eukaryotic mRNAs

bighost.area.ba.cnr.it/BioWWW/PDF/NARUTRdb1998.pdf


J. Posfai, R.J. Roberts (PoRo92)

Finding errors in DNA sequences.

Proc. Natl. Acad. Sci. USA, 1992, 89, 4698-4702
J.-M. Claverie (Cla93)

Detecting frame shifts by amino acid sequence comparison.

J. Mol. Biol., 1993, 234, 1140-1157
G.A. Fichant, Y. Quentin (FiQu95)

A frameshift error detection algorithm for DNA sequencing projects.

Nucleic Acids Research, 1995, 23, 15, 2900-2908
S. Schweigert, P.V.G. Herde, P.R. Sibbald (SHS95)

Issues in incorporating semantic integrity in molecular biological object-oriented databases.

Comp. Appl. Biosci., 1995, 11, 4, 339-347
P. Bork, A. Bairoch (BoBa96)

Go hunting in sequence databases but watch out for the traps.

Trends in Genetics, 1996, 12, 10, 425-427
U. Bhatia, K. Robinson, W. Gilbert (BRG97)

Dealing with Database Explosion: A cautionary note.

Science, 1997, 276, 1724-1725


Irrelevancy Reference

http://www.birc.dk/Publications/Articles/Gorodkin_2001c.html

http://www.bioinf.au.dk/rnadbtool/

www.bioinf.kvl.dk/~gorodkin/record/Papers/rnadbtool/rnadb_long_final.ps

http://www.informatik.uni-trier.de/~ley/db/journals/bioinformatics/bioinformatics17.html
QIAGEN product line

PCR (Polymerase Chain Reaction) cleanup

Gel extraction, enzymatic reaction cleanup

Nucleotide removal

Dye-terminator removal.

http://www.qiagen.com/literature/index.asp


reaction cleanup

A concise guide to cDNA microarray analysis. BioTechniques, 29(3), Sept. 2000, 548-562

BiotechniquesCookbook.pdf
Qbio Gene product line

Geneclean.

http://www.qbiogene.com/products/geneclean/geneclean-overview.shtml
PerkinElmer product line

MultiPROBE

lifesciences.perkinelmer.com/
Promega

MagneSil™ Sequencing CleanUp

www.promega.com/
MoBio

Ultra Clean PCR Cleanup kit (MoBio Laboratories), free kit

http://www.mobio.com/

Development Reference

Development http://www.bioinformatics.org/bradstuff/bp/api/Bio/GenBank/


ftp://area.ba.cnr.it/pub/embnet/software
A set of Unix utilities called filtersites, for genome data manipulation and cleanup processing, is available at

http://bioweb.pasteur.fr/docs/softgen.html#FILTERSITES

http://bioweb.pasteur.fr/intro-uk.html#log

http://inka.mssm.edu/docs/molmod/guide.html

inka.mssm.edu/endo/guide.html


Some cleanup software can be downloaded for free at

http://www.millipore.com/forms.nsf/autoregister
Bioinformatics free software

http://www.ebioinfogen.com/pcsoft.htm



Others


R. Kimball (Kim96)

Dealing with dirty data. DBMS, September 1996



A. Maydanchik (May99)

Challenges of Efficient Data Cleansing.

Published in DM Direct in September 1999


J.I. Maletic, A. Marcus (MaMa00)

Data Cleansing: Beyond Integrity Analysis.

Proceedings of the Conference on Information Quality, October 2000
E. Rahm, Hong Hai Do (RaDo00)

Data Cleaning: Problems and current approaches.

IEEE Bulletin of the Technical Committee on Data Engineering, 2000, 23, 4
D. Bitton, D.J. DeWitt (BDeW83)

Duplicate record elimination in large data files.

ACM Transactions on Database Systems, 1983, 8, 2, 255-265
M.A. Hernandez, S.J. Stolfo (HeSt95)

The merge/purge problem for large databases.

Proceedings of the ACM SIGMOD Conference, 1995
A.E. Monge, C.P. Elkan (MoEl97)

An efficient domain-independent algorithm for detecting approximately duplicate database records.

Proceedings of the SIGMOD 1997 workshop on data mining and knowledge discovery, 1997
Mong Li Lee, Hongjun Lu, Tok Wang Ling, Yee Teng Ko (LLLK99)

Cleansing data for mining and warehousing.

Proceedings of the 10th International Conference on Database and Expert Systems Applications, Florence, Italy, August 1999
H. Galhardas, D. Florescu, D. Shasha, E. Simon (GFSS99)

An extensible framework for data cleaning.

INRIA Technical Report, 1999
H. Galhardas, D. Florescu, D. Shasha, E. Simon (GFSS00a)

Declaratively cleaning your data using AJAX.

16èmes Journées Bases de Données Avancées (BDA), Blois, France, October 2000
H. Galhardas, D. Florescu, D. Shasha, E. Simon (GFSS00b)

AJAX: An extensible data cleaning tool.

Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, May 2000

H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita (GFSSS01a)

Improving data cleaning quality using a data lineage facility.

Proceedings of the 3rd International Workshop on Design and Management of Data Warehouses, Interlaken, Switzerland, June 2001


H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita (GFSSS01b)

Declarative data cleaning: Language, model, and algorithms.

Proceedings of the 27th VLDB Conference, Roma, Italy, 2001
Mong Li Lee, Tok Wang Ling, Wai Lup Low (LLL00)

IntelliClean: A knowledge-based intelligent data cleaner.



Proceedings of the ACM SIGKDD, Boston, USA, 2000
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html

VecScreen is a system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. NCBI developed VecScreen to combat the problem of vector contamination in public sequence databases. This web page is designed to help researchers identify and remove any segments of vector origin prior to sequence analysis or submission.

