CRIS Spotlight on …

Introduction to Advanced Research Computing (ARC)

This spotlight introduces the advanced research computing (ARC) resources (platforms, tools, software, supports, and training) available through the University of Toronto.

Advanced Computing Platforms Available Through U of T

  • SciNet is U of T’s primary high performance computing facility. SciNet offers computational resources, support, and expertise for Canadian researchers, as well as free education and training programs for students and users in advanced computing.
  • SciNet systems:
    • Niagara (a large homogeneous cluster)
    • Mist (the GPU extension of Niagara)
    • High Performance Storage System (HPSS) (a tape-backed hierarchical storage system)
    • Teach cluster (a small, homogeneous cluster)
  • Any qualified researcher at a Canadian university may use SciNet’s systems free of charge. SciNet is part of Compute Canada; you must register with Compute Canada to access SciNet platforms.
  • For more information, support, or consultation email support@scinet.utoronto.ca.
  • Compute Canada deploys Advanced Research Computing (ARC) systems and storage, and provides support for large-scale, data-intensive research projects.
  • In addition to SciNet, there are other Compute Canada sites across Canada that also provide ARC resources. Compute Canada’s resources include:
    • Three heterogeneous general-purpose clusters spread across the country
    • One homogeneous cluster (SciNet’s Niagara) and its GPU cluster (SciNet’s Mist)
    • Multiple cloud systems
  • Compute Canada resources are free. Rapid Access Service allows users to access a modest amount of storage and computing resources without having to apply to a Resource Allocation Competition. Rapid Access Service computing resources are available only for opportunistic use. If you need a priority CPU or GPU allocation, or require larger quantities of storage and computing resources than available via Rapid Access Service, you must apply to one of Compute Canada’s Resource Allocation Competitions.
  • You must register with the Compute Canada Database (CCDB) to access computing systems and storage.
  • The Southern Ontario Smart Computing Innovation Platform (SOSCIP) is a consortium that promotes collaborations between small- and medium-sized enterprises and academia, including the University of Toronto.
  • SOSCIP’s advanced computing platforms:
    • GPU-Accelerated Platform (runs on SciNet’s Mist Cluster)
    • Parallel CPU Platform (homogeneous high-performance system attached to SciNet’s Niagara)
    • Cloud Analytics Platform (provides access to a broad array of IBM software tools)
  • To use SOSCIP resources, you must apply for access by submitting a project proposal. Projects must be an industry-academia collaboration; have advanced compute needs; and offer clear and realizable commercialization objectives. To find out more about SOSCIP project requirements, visit SOSCIP’s project guide.
  • SOSCIP also offers access to its computing platforms on a fee-for-service basis.
  • Need more information? Email info@soscip.org
  • The ITS Private Cloud service is an on-site U of T server and storage virtualization platform similar to public cloud providers. At the core of the ITS Private Cloud server virtualization environment is VMware’s vSphere platform, upon which ITS has developed services for self‐deployment and management of virtual machines (VMs).
  • Resources are available to meet your specific needs and can be added incrementally according to the pricing schedule outlined online.
  • The ITS Private Cloud is best suited to hosting individual VMs running web servers, databases, etc. and is not designed to accommodate large-scale server clusters or “high performance computing” (HPC) applications.
  • The ITS Data Centre & Private Cloud Overview document provides detailed information on the ITS data centre and the server and storage architecture which underpin the ITS Private Cloud service.
  • For more information, or to inquire about the service, email hosting@utoronto.ca.

Tools, Software, and Resources

Project Jupyter

  • Jupyter is an open-source tool that supports interactive data science and scientific computing across programming languages.
  • Some Compute Canada regional initiatives offer access to computing resources through Jupyter Hubs, including a SciNet Niagara node that has been designated a Jupyter Hub for research use. There is also a U of T Jupyter Hub available for teaching.
  • A recording of the Jupyter Hub for Researchers 101 information session is available online. The session gives a basic overview of Jupyter, shows how Jupyter Notebooks and Labs are used to create interactive computing environments, and introduces the resources available through SciNet to support Jupyter Hub notebook sessions. It explores Jupyter through several use cases.

Other Software Options

  • The Licensed Software Office (LSO) negotiates and administers software license agreements with many vendors in order to decrease overall software costs to the University. Examples of software licenses negotiated through the LSO include SAS, SPSS, and MATLAB.
  • The Information Commons offers a list of available software.
  • The Map and Data Library offers a curated list of Tools & Tutorials.

Git and GitHub

  • Git is a distributed version control system (DVCS) commonly used for software development. DVCSs allow full access to every file, branch, and iteration of a project, and allow every user access to a full and self-contained history of all changes.
  • GitHub is a code hosting platform for version control and collaboration.
  • GitHub Guides offers several training guides and videos to help you get started with and understand Git and GitHub. GitHub’s Hello World tutorial covers essentials like repositories, branches, commits, and Pull Requests.
  • The University of Toronto Libraries can be found on GitHub under the username utlib.
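
To make the idea of a distributed history concrete, here is a minimal sketch in Python. It is a toy model only, not Git's actual storage format or commands: the `Repo` class, its methods, the branch names, and the commit messages are all hypothetical names invented for this illustration. It shows that a clone carries every commit and branch, so work can continue locally without contacting the original copy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: a store of commits plus named branch pointers."""
    commits: dict = field(default_factory=dict)   # commit id -> (parent id, message)
    branches: dict = field(default_factory=dict)  # branch name -> latest commit id

    def commit(self, branch, message):
        """Record a new commit on `branch` and move the branch pointer to it."""
        parent = self.branches.get(branch)
        commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()[:8]
        self.commits[commit_id] = (parent, message)
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, new, source):
        """Create branch `new` pointing at the same commit as `source`."""
        self.branches[new] = self.branches[source]

    def clone(self):
        """Copy the whole history: every commit and every branch pointer."""
        return Repo(dict(self.commits), dict(self.branches))

    def log(self, branch):
        """Return commit messages on `branch`, newest first."""
        messages = []
        commit_id = self.branches.get(branch)
        while commit_id is not None:
            parent, message = self.commits[commit_id]
            messages.append(message)
            commit_id = parent
        return messages


# A shared repository with two branches (hypothetical example data).
origin = Repo()
origin.commit("main", "Add analysis script")
origin.commit("main", "Fix plotting bug")
origin.branch("experiment", "main")
origin.commit("experiment", "Try new model")

# A collaborator's clone holds the full history and can diverge locally.
laptop = origin.clone()
laptop.commit("main", "Update README")

print(laptop.log("experiment"))  # history of a branch created before the clone
print(laptop.log("main"))        # includes the new local commit
print(origin.log("main"))        # the original copy is unchanged
```

In real Git, the corresponding steps are `git commit`, `git branch`, `git clone`, and `git log`; sharing local work back to a host such as GitHub is a separate step that this sketch leaves out.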

Research Data Management

  • Research Data Management (RDM) refers to the processes applied through the lifecycle of a research project to guide the collection, documentation, storage, sharing and preservation of research data. RDM is an important part of any research project and should be considered during planning stages.
  • Review the Tri-Agency Research Data Management Policy. The University of Toronto is developing an institutional RDM policy that is compliant with the Tri-Agency policy. All grant proposals submitted to one of the Tri-Agency funders (the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC)) should include methodologies that reflect best practices in RDM.
  • The University of Toronto Libraries offers information and guidance on research data management online. Librarians and library staff have expertise in finding, managing, visualizing, analyzing, preserving, and sharing all kinds of data. You can take advantage of the Map & Data Library’s Drop-In Hours or complete a Request Form for support.
  • U of T’s research data repository is Dataverse. U of T researchers can use Dataverse to deposit and share research data. To learn more about U of T Dataverse and other repositories, watch the Data Repositories @ U of T video.
  • Preservation planning may also be a part of your RDM planning. Watch the Digital Preservation Resources @ U of T video to learn about the basics of digital research preservation in the context of the research data lifecycle and the institution-wide digital preservation resources available at U of T.
  • The Portage Network offers a suite of tools and resources, including the DMP (Data Management Plan) Assistant (an online data management planning tool) and discipline- and methodology-specific Data Management Plan (DMP) templates, such as the DMP Template for Advanced Research Computing.
  • The Federated Research Data Repository (FRDR) is a platform for digital RDM and discovery that offers repository, discovery, and preservation services. You can use FRDR to search for and download data across Canadian repositories. Faculty members from Canadian post-secondary institutions may use FRDR to publish their data.
  • For additional RDM resources, check out the CRIS Spotlight on Working Securely: Remote Data Collection and Storage and RDM resources in the CRIS Resource Hub.
  • The University of Toronto Libraries offers many sources for data, both in its collections and through subscriptions. You can start your search for datasets through the Map and Data Library website.

Training

  • The University of Toronto Libraries (UTL) regularly offers workshops in the areas of Geographic Information Systems (GIS), Data & Statistics, Digital Tools, and Programming & Software. UTL also offers several self-paced modules and recorded workshops online. Other self-paced data- and GIS-specific tutorials are available on the Map and Data Library website.
  • You can also find more resources and services in the areas of copyright and licensing; data discovery and cleaning; data visualization; digital collections and exhibits; digital publishing; GIS; research data management; statistics and data analysis; and text and data mining at UTL.
  • The Carpentries builds global capacity in essential data and computational skills for conducting efficient, open, and reproducible research. The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.