FastPval: a fast and memory efficient program to calculate very low p-values from empirical distribution

Li, J; Wang, JJ

Appears in Collections:
- Biochemistry: Conference papers

Conference Paper: FastPval: a fast and memory efficient program to calculate very low p-values from empirical distribution

Title	FastPval: a fast and memory efficient program to calculate very low p-values from empirical distribution
Authors	Li, J; Wang, JJ
Issue Date	2011
Publisher	Chinese Academy of Sciences and Sun Yat-Sen University.
Citation	5th IEEE International Conference on Systems Biology (ISB 2011), Zhuhai, China, 2-4 September 2011, p. Paper ID: 28
Abstract	Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, obtaining a very low p-value requires a very large number of resamples, at which point computing speed, memory and storage consumption become bottlenecks and the calculation sometimes becomes infeasible, even on a computer cluster. We have developed a multiple-stage p-value calculating program called FastPval that can efficiently calculate very low p-values (as low as 10^-9) from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute p-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets on which the conventional method fails. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. We have applied our method to finding transcription factor binding sites (TFBS) in promoter regions and to genome-wide association studies (GWAS). It has proved to offer better computational efficiency and to be broadly applicable to statistical measurement in bioinformatics. The FastPval executable file, the Java GUI and source code, and the Java Web Start server with example data and an introduction are available at http://wanglab.hku.hk/pvalue.
Persistent Identifier	http://hdl.handle.net/10722/252640
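
For orientation, the conventional baseline that the abstract compares against (sort the resampled values once, then find each observed value by binary search) can be sketched as follows. This is a minimal illustration, not the FastPval multiple-stage algorithm, whose details are not given in this record. The function name, the upper-tail convention (fraction of resampled values greater than or equal to the observation) and the simulated Normal data are all assumptions made for the example. With n resampled values, the smallest nonzero p-value this approach can report is 1/n, which is why p-values near 10^-9 need on the order of 10^9 resamples.

```python
import bisect
import random


def empirical_pvalue(sorted_null, observed):
    """Upper-tail empirical p-value (assumed convention): the fraction of
    resampled values that are greater than or equal to the observed value.

    sorted_null must already be sorted in ascending order.
    """
    n = len(sorted_null)
    # bisect_left returns the index of the first element >= observed,
    # so everything from that index onward counts toward the tail.
    first_ge = bisect.bisect_left(sorted_null, observed)
    return (n - first_ge) / n


# Hypothetical data: 100,000 draws from a standard Normal distribution
# stand in for the resampled measurements.
random.seed(0)
# Python's built-in sort is Timsort rather than quicksort; it plays the
# same role as the one-off sorting step in the conventional method.
null_values = sorted(random.gauss(0.0, 1.0) for _ in range(100_000))

for observed in (1.0, 2.0, 3.0):
    print(observed, empirical_pvalue(null_values, observed))
```

The whole sorted array has to stay in memory for the lookups, which is the cost the abstract identifies as the bottleneck at large resampling sizes.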

DC Field	Value	Language
dc.contributor.author	Li, J	-
dc.contributor.author	Wang, JJ	-
dc.date.accessioned	2018-04-27T04:10:12Z	-
dc.date.available	2018-04-27T04:10:12Z	-
dc.date.issued	2011	-
dc.identifier.citation	5th IEEE International Conference on Systems Biology (ISB 2011), Zhuhai, China, 2-4 September 2011, p. Paper ID: 28	-
dc.identifier.uri	http://hdl.handle.net/10722/252640	-
dc.language	eng	-
dc.publisher	Chinese Academy of Sciences and Sun Yat-Sen University.	-
dc.relation.ispartof	IEEE International Conference on Systems Biology (ISB)	-
dc.title	FastPval: a fast and memory efficient program to calculate very low p-values from empirical distribution	-
dc.type	Conference_Paper	-
dc.identifier.email	Wang, JJ: junwen@hku.hk	-
dc.identifier.authority	Wang, JJ=rp00280	-
dc.identifier.hkuros	208323	-
dc.identifier.spage	Paper ID: 28	-
dc.identifier.epage	Paper ID: 28	-
dc.publisher.place	Zhuhai, China	-
