Hong Kong Baptist University Library
 

A Guide to Data Analytics and Software: Home

Overview

Data is a set of values of qualitative or quantitative variables that scholars draw upon to support their claims and/or produce new knowledge.

 

 

This guide walks through the six steps of the Data Life Cycle and recommends tools for each step.


 

 


Step 1: Data Creation

Before collecting data, it is best to plan ahead and ask yourself: What types and formats of data will be collected? Are any copyright issues involved? What are the best approaches to storing and backing up the data? See the Research Data Management library guide for more information.

Data can be collected:

  • Through observation – generally collected once and unique
  • By experimenting – generated through experiments; can generally be repeated
  • By simulation – generated by test models; can usually be reproduced
  • By researching sources – derived from literature, manuscripts, publications, etc.
  • By data processing – combining, reprocessing, (re)grouping, etc. of previously created data
  • By using existing data

 

 Library Services: 

If you have difficulty filling out the data management plans (DMPs) requested by publishers or funding agencies, please feel free to contact our Scholarly Communications Librarian, Pauline Lam, at lamlhp@hkbu.edu.hk.


Step 2: Data Processing

This step involves data entry (if the raw data was not collected in a digital format), data conversion (from one system or format to another), and data cleaning.
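As an illustration of format conversion, the following is a minimal Python sketch that turns CSV text into JSON using only the standard library. The function name `csv_text_to_json` and the sample data are hypothetical and used for illustration only; a real project may need to handle encodings, data types, and much larger files.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text (first row = field labels) to a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical sample data, for illustration only.
sample_csv = "country,year,value\nUnited States,2018,3.5\nHong Kong,2018,2.1\n"
converted = csv_text_to_json(sample_csv)
print(converted)
```

Note that the `csv` module reads every value as text, so numbers such as the year remain strings in the JSON output unless you convert them explicitly.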

Data cleaning requires tedious and time-consuming manual work, but its importance should not be underestimated. Proper data cleaning spares researchers from having to return to this step at a later stage of the research and helps them avoid drawing false conclusions.

The following data cleaning tips can serve as a starting point:

  • Clear field labeling – make sure you can still understand the labels a year later
  • Remove unwanted observations – including duplicate or irrelevant observations
  • Filter unwanted outliers – but only suspicious measurements that are unlikely to be accurate
  • Handle missing data – by dropping observations with missing values or imputing missing values based on other observations
  • Fix structural errors – including typos, inconsistent capitalization, and inconsistent name formats
  • Controlled vocabularies may help – e.g., develop a small dictionary to remind yourself to use "United States" (instead of "USA" or "America") or "computer" (instead of "computers" or "PC") throughout the document
  • Beware of strange characters – especially when you copy and paste web content directly into Excel; an invisible character is often added at the end of a sentence

Some of these points are also covered by EliteDataScience; see that site for a more comprehensive explanation.
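Several of the tips above can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a complete cleaning pipeline: the field names (`id`, `country`, `score`), the vocabulary entries, and the plausible score range of 0 to 100 are all hypothetical, and missing values are simply dropped rather than imputed.

```python
import unicodedata

# Hypothetical controlled vocabulary: lower-case variant -> preferred term.
CONTROLLED_VOCAB = {
    "usa": "United States",
    "america": "United States",
    "united states": "United States",
    "hong kong": "Hong Kong",
}


def clean_text(value: str) -> str:
    """Remove invisible characters and trim surrounding whitespace."""
    value = value.replace("\u00a0", " ")  # non-breaking space -> ordinary space
    # Unicode categories starting with "C" cover control and format characters,
    # such as the zero-width space that web pages sometimes carry along.
    visible = "".join(ch for ch in value if unicodedata.category(ch)[0] != "C")
    return visible.strip()


def normalize_country(value: str) -> str:
    """Map spelling and capitalization variants to the preferred term."""
    cleaned = clean_text(value)
    return CONTROLLED_VOCAB.get(cleaned.lower(), cleaned)


def clean_records(records, valid_range=(0.0, 100.0)):
    """Drop missing, implausible, and duplicate observations."""
    low, high = valid_range
    seen = set()
    cleaned = []
    for rec in records:
        if rec["score"] is None:          # handle missing data by dropping
            continue
        score = float(rec["score"])
        if not low <= score <= high:      # filter implausible measurements
            continue
        row = (rec["id"], normalize_country(rec["country"]), score)
        if row in seen:                   # remove duplicate observations
            continue
        seen.add(row)
        cleaned.append({"id": row[0], "country": row[1], "score": row[2]})
    return cleaned


# Hypothetical raw data, for illustration only.
raw = [
    {"id": 1, "country": "USA\u200b", "score": "72"},
    {"id": 1, "country": "USA\u200b", "score": "72"},        # duplicate
    {"id": 2, "country": " america ", "score": "65"},
    {"id": 3, "country": "Hong Kong", "score": None},        # missing value
    {"id": 4, "country": "Hong Kong", "score": "720"},       # outside plausible range
    {"id": 5, "country": "hong kong\u00a0", "score": "81"},
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```

Whether to drop or impute missing values, and what counts as an implausible measurement, depends on your data; the choices here are only placeholders.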

 

 Software Recommendations: 


Step 3: Data Analysis

This is the most challenging but also most exciting part of the cycle. It can involve quantitative analysis, qualitative analysis, machine learning, etc.

This guide does not cover basic statistics, which can easily be found on the Internet. (If you do not know which sites to use, you may start with Statistics How To.) Instead, we introduce commonly used software tools.

 

 Software Recommendations: 

Quantitative Analysis Software

Qualitative Analysis Software

Programming Languages Providing Integrated Support from Data Preparation to Web Applications

The following two programming languages are quite powerful and can support many aspects of the data life cycle, including web crawling, statistics, data manipulation, machine learning, data visualization, web applications, etc.

 

 Library Services: 

Stay tuned for our semester-based Research Data Tools Series training if you want to learn how to use this software. We also offer a limited number of course-embedded basic training sessions each year.


Step 4: Data Storage

This step involves short-term measures, such as proper file version control during a research project, and long-term archiving measures that migrate data to the best format and store it on the most suitable medium for your own or your company's future use. You may learn more about this through TechTarget.
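As a simple illustration of keeping file versions during a project, the following Python sketch copies a working file to a dated, numbered snapshot. The naming pattern and the function name `save_version` are assumptions made for illustration; this is a manual convention and not a substitute for a dedicated version control tool.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def save_version(path: Path, version: int, day: date) -> Path:
    """Copy `path` to a sibling such as survey_2018-05-01_v02.csv and return the copy."""
    versioned = path.with_name(
        f"{path.stem}_{day.isoformat()}_v{version:02d}{path.suffix}"
    )
    if versioned.exists():
        # Never overwrite an earlier snapshot silently.
        raise FileExistsError(f"{versioned} already exists; use a new version number")
    shutil.copy2(path, versioned)  # copy2 also preserves file timestamps
    return versioned


# Demonstration in a temporary folder with hypothetical data.
with tempfile.TemporaryDirectory() as tmp:
    working = Path(tmp) / "survey.csv"
    working.write_text("id,score\n1,72\n", encoding="utf-8")
    snapshot = save_version(working, version=2, day=date(2018, 5, 1))
    snapshot_name = snapshot.name
    snapshot_matches = (
        snapshot.read_text(encoding="utf-8") == working.read_text(encoding="utf-8")
    )
    print(snapshot_name)
```

Putting the date in year-month-day order keeps the snapshots sorted chronologically in a file listing.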

 

 Tool Recommendations: 

Version Control Tool


Step 5: Data Sharing

Data storage mainly serves the internal use of data, whereas data sharing refers to open data that the public can access and re-use for free. Open data is not only a trend but also an obligation that researchers are encouraged to meet for the benefit of academia and society. Some major journals, e.g., Nature and Science, also ask authors to share their data.

Data can be shared in its original form (after removing private and sensitive information) through publicly accessible data repositories. Researchers can also choose to share their data through data visualizations or by developing interactive web applications.
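Removing private information before sharing can be sketched as follows. This is a minimal Python example under stated assumptions: the field names (`participant_id`, `name`, `email`), the sample records, and the salt value are hypothetical. Dropping direct identifiers and replacing IDs with pseudonyms does not by itself guarantee anonymity, because combinations of the remaining fields may still identify a person.

```python
import hashlib

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace an identifier with a short, repeatable pseudonym."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def deidentify(records, salt):
    """Drop direct identifiers and pseudonymize the participant ID."""
    shared = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        row["participant_id"] = pseudonymize(str(rec["participant_id"]), salt)
        shared.append(row)
    return shared


# Hypothetical records, for illustration only.
records = [
    {"participant_id": "P001", "name": "Alice Example",
     "email": "alice@example.org", "age_group": "25-34", "score": 72},
    {"participant_id": "P002", "name": "Bob Example",
     "email": "bob@example.org", "age_group": "35-44", "score": 65},
    {"participant_id": "P001", "name": "Alice Example",
     "email": "alice@example.org", "age_group": "25-34", "score": 78},
]
shared = deidentify(records, salt="keep-this-salt-private")
for row in shared:
    print(row)
```

The same participant always receives the same pseudonym, so repeated responses can still be linked in the shared data set, while the salt should stay private so that the pseudonyms cannot be recomputed from known IDs.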

 

 Tool Recommendations: 

Data Repositories

There are many data repositories available online for you to share data sets; some are subject-based, material-type specific, or region specific. If you are new to this area, you may want to start from the following three platforms:

You can also develop your own data management / sharing systems using open source data platforms:

  • CKAN (https://ckan.org/) – both the US and Hong Kong governments use this open source platform to share government data   FREE

Data Visualization Software

 

 Library Services: 

The Library provides Digital Scholarship Services to help faculty members develop interactive web applications for public access. We offer a Digital Scholarship Grant as well as a non-grant application track.


Step 6: Re-use of Data

There are many free and subscription-based data resources available for researchers to re-use. We will prepare another library guide on data resources. Please stay tuned.

Watch this!

Why Do They Share Data?

 Watch these videos on why and how HKBU researchers share data.


Why Does Data Management Matter?

 Watch this video on the importance of data storage, documentation, and file formats.


Top Analytics Software, 2016-18 (developed by KDnuggets)

 

Read full article here.


Find out more

Feel free to contact me if you have questions about Research Data Services.

Rebekah Wong
Head, Digital & Multimedia Services
rebekahw@hkbu.edu.hk