Research Data Management

What is Research Data Management (RDM)?

Research Data Management (RDM) is part of a lifecycle that aims to facilitate effective and efficient research. It typically starts from data planning and proposal writing and continues through to dissemination and archiving.

This word cloud highlights some RDM work areas. Some may seem obvious and easy to tackle, while others might be harder to get your head around. In any case, a good RDM workflow requires careful and thoughtful planning, and a methodical approach will pay off in the long term and help you avoid last-minute panic.

Invest in Planning

Planning forces you to think through your objectives and avoid future headaches, particularly if you are not a naturally methodical person. Planning is also important because funders, both local and international, now increasingly require Data Management Plans (DMPs) to be submitted as part of grant applications.

Think back to the number of times you lost data simply due to negligence and lack of planning, only to regret it afterwards. Planning not only safeguards your data, it also allows you to monitor and adapt to changes and facilitates sharing and long-term preservation, helping to ensure data longevity.

Start with a Data Management Plan (DMP)

A Data Management Plan (DMP) is not difficult to write. It is simply a document that describes the people behind the data, what data will be collected, and how they will be handled during and after the project. Here are the components of a typical DMP:

You can also download the HKBU DMP template to make a start. The two free DMP tools below can be helpful too, particularly if you are applying for overseas funding as they provide various funder templates to choose from. 

Tool Recommendations

DMP Tool

This tool has a selection of sample plans from funders such as the National Science Foundation (NSF) and the National Institutes of Health (NIH). Its click-through wizard takes you through creating a DMP that complies with funder requirements.

DMPOnline

Powered by the U.K.'s Digital Curation Centre (DCC) and based on funders' policy requirements, DMPOnline allows you to download templates, including those for the European Commission (Horizon 2020) and the Arts and Humanities Research Council (AHRC).

Organize and Document Your Data

As you start the process of creating and collecting data, organizing and documenting your work will become increasingly demanding. The following are a few areas to be aware of. You can also get more tips on managing research data here.

Metadata

Recording metadata means listing information about your data. This can include researcher names, dates and other project details. Lab Notebooks (paper or digital) and Data Dictionaries are both helpful tools. Clear metadata will help you refer back to old files and avoid painful searching later on.
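As a rough illustration of what a data dictionary can look like, the sketch below renders a list of variable descriptions as CSV text. The column names (variable, description, type, units) and the function name are assumptions chosen for this example, not a prescribed standard; adapt them to your discipline.

```python
import csv
import io


def data_dictionary_csv(variables):
    """Render a list of variable descriptions as CSV text.

    Each item is a dict with the (assumed) keys below. Missing keys are
    written as empty cells.
    """
    fieldnames = ["variable", "description", "type", "units"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(variables)
    return buffer.getvalue()
```

Saving the returned text next to the data file (for example as a hypothetical data_dictionary.csv) keeps the description of each variable together with the data it documents.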

File Formats

Save data in non-proprietary (open) formats whenever possible. Formats listed in Data Best Practices: Format Files include JPEG, PNG, PDF/A, MP3 and MOV. It is also good practice to include a readme.txt file in your directory that records the name and version of the software used and the company that produced it.
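A readme.txt of this kind can be written by hand, but a small script keeps it consistent across projects. The following is a minimal sketch only; the function name, the field labels and the layout of the file are assumptions made for illustration.

```python
from pathlib import Path


def write_readme(directory, software_name, software_version, vendor):
    """Write a plain-text readme.txt describing the software used to create the data."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    lines = [
        f"Software: {software_name}",
        f"Version: {software_version}",
        f"Produced by: {vendor}",
    ]
    readme_path = directory / "readme.txt"
    # Plain UTF-8 text stays readable without any special software.
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path
```

For example, calling write_readme("survey_data", "ExampleStats", "1.2", "Example Software Ltd") (all placeholder values) creates survey_data/readme.txt with three labelled lines.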

Terminology

Use the standardized terminology of your field to help avoid confusion. One example is the ICPSR Glossary of Social Science Terms. This Guide to developing taxonomies for effective data management provides an overview of the taxonomy concept.

File Naming Convention

Be consistent and descriptive when naming files (see Data Best Practices: Name Files) so that the data can be found later on. Set up a clear Directory Structure that includes information like project title, date and unique identifiers. Ending file names with a version date in YYYYMMDD or YYMMDD form will help sort them into chronological order.
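One way to keep names consistent is to generate them rather than type them. The sketch below assumes a project_description_YYYYMMDD.ext pattern; the pattern, the function name and the choice of separators are illustrative assumptions, not a required convention.

```python
import re
from datetime import date


def build_file_name(project, description, file_date, extension):
    """Return a file name of the form project_description_YYYYMMDD.ext."""
    def slug(text):
        # Lowercase and replace runs of non-alphanumeric characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    stamp = file_date.strftime("%Y%m%d")  # four-digit year first, so names sort by date
    return f"{slug(project)}_{slug(description)}_{stamp}.{extension.lstrip('.')}"
```

Because the date is written year first, files that share the same project and description sort chronologically in an ordinary alphabetical listing.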

Quality Control

Data cleaning is almost always a painful process, so aim to reduce how much of it you need to do. Exercise quality control from the start to ensure that you have Tidy Data. Remember, prevention is always easier than cure!
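Quality control at the point of data entry can be as simple as checking each record before it is saved. The sketch below is one possible approach under stated assumptions: records are dictionaries, and the field names and ranges used in the usage note are hypothetical.

```python
def validate_record(record, required_fields, numeric_ranges):
    """Return a list of problems found in one record (an empty list means it passed)."""
    problems = []
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing value for '{field}'")
    for field, (low, high) in numeric_ranges.items():
        value = record.get(field)
        if value is None or str(value).strip() == "":
            continue  # nothing to range-check; reported above if the field is required
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"'{field}' is not a number: {value!r}")
            continue
        if not low <= number <= high:
            problems.append(f"'{field}' out of range [{low}, {high}]: {number}")
    return problems
```

For instance, validate_record({"participant_id": "P01", "age": "34"}, ["participant_id", "age"], {"age": (18, 99)}) returns an empty list, while a record with a blank participant_id or an age of 340 returns a description of each problem.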

Why Share?

Increasingly, publishers and funding agencies require you to share your data. Data sharing encourages connection and collaboration, avoids duplication of effort, and is likely to increase citation rates. The National Science Foundation (NSF), for example, has required every proposal to include a data management plan describing how research results will be shared since 2011. Another example is Springer Nature's Data Policies, which mandate that authors submit datasets to recommended repositories such as figshare or the Dryad Digital Repository at submission and specify that manuscripts will not be reviewed otherwise. In Hong Kong, various Research Grants Council (RGC) schemes such as HSSPFS also encourage sharing by giving extra points to applicants who are willing to share data.


Why not share? "Organising data in a presentable and useful way" was the most frequently cited reason, according to a 2018 Report from Springer Nature. While sharing practices differ widely across disciplines, the pressure on faculty to comply with funder mandates will only increase over time. This report not only indicates a serious gap in data management support and education, but also identifies the pressing need for a significant RDM culture change.

How to Cite Data

Just as with any other scholarly resource, data require citations to acknowledge the original author/producer and to help other researchers find them. A dataset citation includes the same components as other citations. DataCite recommends the following formats:

Creator (Publication Year). Title. Publisher. Identifier

Creator (Publication Year). Title. Version. Publisher. ResourceType. Identifier
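The two patterns above can be assembled mechanically from their components. The helper below is a minimal sketch of that assembly; the function name and parameter names are assumptions for illustration, and the values in the usage note are placeholders, not a real dataset.

```python
def format_data_citation(creator, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a citation string following the patterns listed above.

    version and resource_type are optional; when both are omitted the
    result follows the shorter pattern.
    """
    parts = [f"{creator} ({year})", title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    return ". ".join(parts) + ". " + identifier
```

With placeholder values, format_data_citation("Doe, J.", 2020, "Example Survey Data", "Example Repository", "https://doi.org/10.1234/example") produces a string in the first pattern; passing version and resource_type as well produces the second.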

The Digital Curation Centre's (DCC) How to Cite Datasets and Link to Publications guide is very helpful, particularly for those working on data-led research.

Data Management and Sharing Snafu in Short Acts

Watch this hilarious 4-minute data management "horror" story, which shows what shouldn't happen when one researcher asks another to share data! Topics include storage, documentation, and file formats.

RDM Common Sense Approach

Extra Points for RGC Funds

Score extra points by sharing your research data! The Explanatory Notes of many HK Research Grants Council (RGC) schemes such as the Humanities & Social Sciences Prestigious Fellowship Scheme (HSSPFS), (Senior) Research Fellow Scheme and Early Career Scheme (ECS) all specify that:

"Awardees should assess data archive potential and opportunities for data sharing. Due additional weight will be given to an application where the applicants are willing to make research data available to others."

RGC also provides training videos and training materials on research data management. Enter your email address here to access these materials.