
dc.contributor.advisorPeterson, G. Daniel
dc.contributor.advisorBridges, Susan
dc.contributor.authorSaha, Surya
dc.date2009
dc.date.accessioned2019-09-17T15:04:15Z
dc.date.available2019-09-17T15:04:15Z
dc.date.issued2009-07-01
dc.identifier.urihttps://hdl.handle.net/11668/15408
dc.description.abstractOur knowledge discovery algorithm employs a combination of association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data, where the data is viewed as a one-dimensional space. We apply techniques and metrics from association rule mining to identify frequently co-occurring features in genomes, followed by graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we have shown that our algorithm, ProxMiner, can be successfully applied to identify weakly conserved patterns among features in genomic data. The application of pairwise spatial relationships increases the sensitivity of our algorithm, while the use of a confidence threshold based on false discovery rate reduces the noise in our results. Unlike available defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families to identify larger, more complete repeat families. ProxMiner will increase the effectiveness of repeat discovery techniques for newly sequenced genomes where ab initio repeat finders are only able to identify partial repeat families. In this dissertation, we provide two detailed examples of novel repeat families discovered by ProxMiner and one example of a known rice repeat family that has been extended by ProxMiner. These examples encompass some of the different types of repeat families that can be discovered by our algorithm. We have also discovered many other potentially interesting novel repeat families that can be further studied by biologists.
dc.publisherMississippi State University
dc.subject.lcshNucleotide sequence--Computer simulation.
dc.subject.lcshSequence alignment (Bioinformatics)--Computer simulation.
dc.subject.lcshData mining.
dc.subject.lcshComputer algorithms.
dc.subject.lcshRice--Genetics.
dc.subject.lcshPlant genomes.
dc.subject.lcshGenomics.
dc.subject.otherassociation rule mining
dc.subject.otherspatial rules
dc.subject.otherrepeat
dc.subject.otherdefragmentation
dc.subject.othergraph mining
dc.subject.othernovel repeat regions
dc.subject.otherDNA
dc.titleProximity based association rules for spatial data mining in genomes
dc.typeDissertation
dc.publisher.departmentDepartment of Computer Science and Engineering.
dc.publisher.collegeEngineering
dc.date.authorbirth1978
dc.subject.degreeComputer Science
dc.subject.majorComputer Science
dc.contributor.committeePerkins, Andy
dc.contributor.committeeHansen, Eric
dc.contributor.committeeHodges, Julia
dc.contributor.committeeDandass, Yoginder

