HKU has designated the DataHub as the institutional repository for the long-term storage and preservation of HKU research data. HKU researchers and research postgraduate students may deposit datasets to the DataHub. These may be replication datasets for journal articles or theses, or standalone datasets produced by research projects.

Datasets will be linked to relevant authors, publications, patents, and grants, thereby raising the visibility and the chance of discovery of all linked items. The DataHub will mint a DataCite DOI for each published dataset and organize the dataset metadata so that it is searchable by search engines. It also supports publishing data under conditions that protect data privacy, such as confidential files, embargoed files, and metadata-only datasets.

For more information about the DataHub, please see the DataHub LibGuide.

HKU Data Policy

The HKU Policy on the Management of Research Data and Records sets several conditions for the retention and storage of research data. Among these are:

  • Research data and records should be retained for as long as they are of continuing value to the researcher and the wider research community, and as long as specified by research funder, patent law, legislative and other regulatory requirements.  The minimum retention period for research data and records is three years after publication or public release of the work of the research.  In many instances, researchers will resolve to retain research data and records for a longer period than the minimum requirement.
  • Researchers are responsible for [..] planning for the ongoing custodianship (at the University or using third-party services) of their data after the completion of the research or, in the event of their departure or retirement from the University, reaching agreement with the head of department/faculty (or his/her nominee) as to where such data will be located and how this will be stored;

To fulfill these requirements, HKU research data can be deposited into the HKU DataHub, or other repositories, some of which are described below.

What to Deposit?

The emphasis of the HKU RDM initiative is on "research integrity". Research results claimed in publications must be reproducible. Replication datasets must be preserved to enable this later reproducibility. All data, scripts, questionnaires, codebooks etc. necessary for a third party to arrive at the same research results claimed must be preserved.

As part of the data deposit, please indicate which datafiles are raw data (i.e. data that indicate the original data collection process such as questionnaires) and which are processed data (i.e. data ready for analysis in publications) – both are needed eventually, but raw data files are essential for any completion report.
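
One way to record this distinction is a small manifest file deposited alongside the data. The sketch below is illustrative only: the `raw/` and `processed/` folder names and the `build_manifest` and `write_manifest` helpers are assumptions made for this example, not DataHub requirements.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical folder names; adjust to your own project layout.
CATEGORIES = ("raw", "processed")


def build_manifest(root):
    """Return one row per file found under root/raw and root/processed."""
    rows = []
    for category in CATEGORIES:
        folder = Path(root) / category
        if not folder.is_dir():
            continue
        for path in sorted(p for p in folder.rglob("*") if p.is_file()):
            rows.append({
                "file": path.relative_to(root).as_posix(),
                "category": category,
                "size_bytes": path.stat().st_size,
                # Reads the whole file into memory; fine for a sketch,
                # but very large files would need chunked hashing.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return rows


def write_manifest(rows, out_path):
    """Write the manifest rows to a CSV file."""
    fields = ["file", "category", "size_bytes", "sha256"]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_manifest(build_manifest("my_project"), "manifest.csv")` would produce a table that states, for every file, whether it is raw or processed, together with a checksum that lets a third party confirm the file is unchanged.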

Raw data may contain personal identifiers, and therefore must be stored under "restricted access". If the data contain sensitive, confidential or restricted data per the HKU Policy on Research Ethics, the researcher may also choose to prepare an anonymized version of the data for open access, with the approval of the relevant IRBs or ethics committees.

  • Data Management Plan (DMP)
  • Dataset(s): quantitative and/or qualitative, raw and/or processed
  • Metadata about all the data files, including file formats (please use open formats wherever possible), codebook (i.e. description of variables), etc.
  • Readme file, giving particulars of data
  • Grant funder, name of grant, and number
  • Publication(s) if any, DOI, etc.
  • PI(s) and Co-I(s), identifiers (ORCID or Hub ID)
If data includes personal data, the data should be put under restricted access,
  • Personal data from clinical research (i.e. Institutional Review Board (IRB) approved)
    • provide the approval code, consent forms, and ethical application form when available. Please state the risk of re-identification from the different datafiles and how the risk has been minimised for any dataset intended for sharing.
  • Personal data from non-clinical research (i.e. Human Research Ethics Committee (HREC) approved)
    • provide the approval code, consent forms, and ethical application form. Please state the risk of re-identification from the different datafiles and how the risk has been minimised for any dataset intended for sharing.
If data includes interviews,
  • Interview transcripts
  • Blank questionnaire & interviewer guidelines
If field research data,
  • provide a copy of the field research notebook in digital format, preferably machine readable.
If lab research data,
  • provide a copy of working papers and/or lab research notebooks in digital format, preferably machine readable.
If simulated data,
  • how was it generated? Please either explain or provide a link.
If other types of data, such as Image or video data, Creative or Design data,
  • please explain what type of data it is and how it was collected/generated.
If software is needed to read or analyze any of the datafiles,
  • please provide full details of the software name, the version needed, and any instructions necessary to obtain the software. If you have written your own script for analyzing the data, please also include this script in the final deposit.
When ready,
  • final project reports and publications

Other Options, External

There are many options for data storage external to HKU. If your research team includes PIs/Co-Is from other institutions, those institutions may have recommendations or requirements for where to store and how to share research data from your research project. Your funder(s) and intended journal(s) for publication may have similar requirements. The DCC has produced a guide to evaluating repositories for long-term data preservation. Some options are listed below. If you do deposit to an external repository, please also deposit the data, or the metadata only, in the HKU Scholars Hub.

  • The Registry of Research Data Repositories lists 1,500 research data repositories. It can be searched by subject, content type, and country, and filtered by many more categories.
  • The DataCite Repository Selector tool is hosted by DataCite and queries the re3data registry of research data repositories.
  • Nature's recommended lists of data repositories.
  • OpenDOAR, a global directory of Open Access repositories and their policies.
  • Open Access Directory's list of Disciplinary Repositories.
  • The University of Minnesota Libraries webpage lists some of the more popular subject-specific repositories by subject area.
  • Figshare allows individuals to deposit up to 20 GB of data for free (single file size limit of 5 GB), create a collaborative workspace with other members of their research team, and mint a DOI for citing their dataset.
  • Dryad welcomes data files associated with any published article in the sciences or medicine, as well as software scripts and other files important to the article.
  • Zenodo
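
Before choosing a repository with size limits, it can help to check a dataset folder against them. The following is a minimal sketch that uses the Figshare figures quoted above (20 GB in total, 5 GB per file) as defaults. The `check_deposit` helper is hypothetical, the decimal definition of a gigabyte is an assumption, and current limits should be confirmed with the repository itself.

```python
from pathlib import Path

GB = 1000 ** 3  # assumption: decimal gigabytes; the text does not say GB vs GiB

# Limits as quoted in the list above; confirm current values with the repository.
TOTAL_LIMIT = 20 * GB
SINGLE_FILE_LIMIT = 5 * GB


def check_deposit(folder, total_limit=TOTAL_LIMIT, file_limit=SINGLE_FILE_LIMIT):
    """Return (total_bytes, oversized_files, fits) for all files under folder."""
    sizes = {p: p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
    total = sum(sizes.values())
    # Files that individually exceed the per-file limit.
    oversized = sorted(str(p) for p, size in sizes.items() if size > file_limit)
    fits = total <= total_limit and not oversized
    return total, oversized, fits
```

For example, `total, oversized, fits = check_deposit("my_dataset")` reports the folder's total size, lists any files above the per-file limit, and indicates whether the folder fits within both limits.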