Bayliss Data Bazaar 2020

Introduction to Data Management and Processing

Main Organizer: Dr. Akila Wijerathna-Yapa | Office Hours: Wednesdays 2:00 pm - 3:00 pm @ School of Molecular Sciences, Room 4.16, or by appointment

Co-Organizer: Prof. Valerie Verhasselt | Office Hours: by appointment

Overview

This workshop will familiarise you with the basics of R through the RStudio interface and the tidyverse suite of R packages. You will be introduced to modern approaches to data analysis and visualization. The focus is on mastering basic skills and showing you where to go for help so that you can undertake future analyses independently. By the end of this workshop you will know how to create and organize new “projects” in RStudio; read in data files; visualize data using the popular ggplot2 package; perform various data manipulation, summarization and modelling tasks; and create reproducible reports for bioinformatics analysis pipelines.

The course also discusses the practical issues and techniques involved in importing, tidying, transforming and modelling data, and offers a gentle introduction to techniques for processing big data. Programming is a cross-cutting aspect of the course. Participants will gain experience with data science tools through short assignments, and the course work includes a term project based on real-world data.

Required topics include: Data management and processing: definition and background; Data transformation; Data import; Data cleaning; Data modelling; Relational and analytic databases; Basics of Excel; Programming in R; Data processing pipelines that connect multiple data management and analysis components; Interaction between the capabilities and requirements of data analysis methods (data structures, basic statistics) and the choice of data storage and management tools; Repeatable and reproducible data analysis.

Most bioinformatics tools are designed to be run from the command line, so the ability to run simple command line programs is an essential bioinformatics skill. This workshop will familiarise you with the basics of the Unix command line interface. We will show you how to navigate the file structure, run simple programs with arguments and open files. To keep it relevant to bioinformatics, we will demonstrate the samtools program and learn how to peer inside some common bioinformatics file formats (e.g. BAM and FASTQ files).
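As a rough illustration of the command line skills described above, the sketch below navigates to a directory, runs a few standard programs with arguments and looks inside a FASTQ file. It is a minimal example and not taken from the workshop material: the file name reads.fastq and the two reads in it are made up, and only standard Unix tools are used so that it runs without samtools installed.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a tiny two-read FASTQ file (made-up reads, for illustration only).
cat > reads.fastq <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCA
+
IIIII
EOF

# Navigate and inspect: where am I, and what is in this directory?
pwd
ls -l

# Peek at the first record (each FASTQ record spans four lines).
head -n 4 reads.fastq

# Count reads: total number of lines divided by four.
line_count=$(wc -l < reads.fastq)
read_count=$((line_count / 4))
echo "reads: $read_count"

# Print only the sequence lines (line 2 of every four-line record).
sequences=$(awk 'NR % 4 == 2' reads.fastq)
echo "$sequences"
```

BAM files are binary, so they cannot be read with head directly. With samtools installed, a command such as `samtools view -h example.bam | head` (example.bam being a hypothetical file name) prints the header and the first alignments as text.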

In this course, we will introduce you to the different types of scientific articles and provide an outline of how to write an original research article describing your own scientific research. The aim is to enable you to write scientific research articles that can be published, and to equip you with the skills to cite and manage references and bibliographies while writing, using a reference management tool such as EndNote.

Course Material and Schedule

Course Material: Bayliss Data Bazaar Instructions Manual

Schedule: 8:00 am - 5:00 pm

Location: 373 Economics & Commerce Conference Room; PHYS: [215] Lecture Room

Dates: 28th - 30th August 2020

Teaching team

TA: Assoc. Prof. Josh Mylne

TA: Dr. Jakob Petereit

TA: Dr. Brady Johnston

TA: Dr. Patricia Macchiaverni

TA: Farley Kwok van der Giezen

TA: Dr. Saskia Freytag

TA: Dr. Joanne Edmondston

TA: Dr. James Lloyd

TA: Dr. Dulce Vargas Landin

TA: Kylie Black

Required Textbooks:

R for Data Science by Hadley Wickham and Garrett Grolemund

Text Mining with R by Julia Silge and David Robinson

Advanced R by Hadley Wickham

R Packages by Hadley Wickham

Timetable