Advanced research computing (ARC) resources—platforms, tools, software, supports, and training—available through the University of Toronto.
What is Advanced Research Computing (ARC)?
Advanced research computing (ARC) includes the use of high performance computing, cloud computing, data storage and management, networking, and visualization to help solve complex problems.
You may need ARC if:
Your data is so large that you can’t store it locally, let alone process it;
Your simulations require very large amounts of computing power;
You need help planning your workflows so that they finish in finite time; or
You want to use interactive computing that can be shared with others.
The Advanced Research Computing @ U of T video describes institution-wide ARC resources available through the University. Resources profiled include: SciNet and Compute Canada; The Southern Ontario Smart Computing Innovation Platform (SOSCIP); and Project Jupyter.
Advanced Computing Platforms Available Through U of T
SciNet is U of T’s primary high performance computing facility. SciNet offers computational resources, support, and expertise for Canadian researchers, as well as free education and training programs for students and users in advanced computing.
Compute Canada deploys Advanced Research Computing (ARC) systems and storage, and provides support for large-scale, data-intensive research projects.
In addition to SciNet, there are other Compute Canada sites across Canada that also provide ARC resources. Compute Canada’s resources include:
Three heterogeneous general purpose clusters spread across the country
One homogeneous cluster (SciNet’s Niagara) and its GPU cluster (SciNet’s Mist)
Multiple cloud systems
Compute Canada resources are free. Rapid Access Service allows users to access a modest amount of storage and computing resources without having to apply to a Resource Allocation Competition. Rapid Access Service computing resources are available only for opportunistic use. If you need a priority CPU or GPU allocation, or require larger quantities of storage and computing resources than are available via Rapid Access Service, you must apply to one of Compute Canada’s Resource Allocation Competitions.
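The Southern Ontario Smart Computing Innovation Platform (SOSCIP) provides access to the following platforms:
GPU-Accelerated Platform (runs on SciNet’s Mist Cluster)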
Parallel CPU Platform (homogeneous high-performance system attached to SciNet’s Niagara)
Cloud Analytics Platform (provides access to a broad array of IBM software tools)
To use SOSCIP resources, you must apply for access by submitting a project proposal. Projects must be an industry-academia collaboration; have advanced compute needs; and offer clear and realizable commercialization objectives. To find out more about SOSCIP project requirements, visit SOSCIP’s project guide.
The ITS Private Cloud service is an on-site U of T server and storage virtualization platform similar to public cloud providers. At the core of the ITS Private Cloud server virtualization environment is VMware’s vSphere platform, upon which ITS has developed services for self‐deployment and management of virtual machines (VMs).
Resources are available to meet your specific needs and can be added incrementally according to the pricing schedule outlined online.
The ITS Private Cloud is best suited to hosting individual VMs running web servers, databases, and similar services. It is not designed to accommodate large-scale server clusters or high performance computing (HPC) applications.
Researchers can apply for access to data by submitting a detailed proposal through the MicroData Access Portal. Research Data Centre (RDC) staff members oversee the administrative aspects of projects that have already been approved to access RDC data holdings. For more information, including how to submit a proposal and which datasets are available, visit Statistics Canada’s RDC Program website.
A recording of the Jupyter Hub for Researchers 101 information session is available online. This session provides a basic overview of Jupyter, explains how to use Jupyter Notebooks and Labs to create interactive computing environments, and introduces the resources available through SciNet to support Jupyter Hub notebook sessions. The session explores Jupyter through different use cases.
Other Software Options
The Licensed Software Office (LSO) negotiates and administers software license agreements with many vendors in order to decrease overall software costs to the University. Examples of software licenses negotiated through the LSO include SAS, SPSS, and MATLAB.
Git is a distributed version control system (DVCS) commonly used for software development. DVCSs allow full access to every file, branch, and iteration of a project, and allow every user access to a full and self-contained history of all changes.
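To make the idea of a "full and self-contained history" concrete, here is a minimal sketch in Python. It is a toy model only: the `ToyRepo` class, its methods, and the file names are hypothetical illustrations and do not reflect Git's actual storage format or commands. It shows that a clone carries every earlier commit, and that new work on the clone does not alter the original.

```python
import copy
import hashlib
import json


class ToyRepo:
    """A toy, in-memory model of a distributed repository (not Git's real format)."""

    def __init__(self):
        self.commits = {}  # commit id -> {"parent", "message", "files"}
        self.head = None   # id of the most recent commit, or None if empty

    def commit(self, message, files):
        """Record a snapshot of `files` and return an id derived from its content."""
        record = {"parent": self.head, "message": message, "files": dict(files)}
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        commit_id = hashlib.sha1(payload).hexdigest()
        self.commits[commit_id] = record
        self.head = commit_id
        return commit_id

    def clone(self):
        """Return an independent copy that holds the complete history."""
        other = ToyRepo()
        other.commits = copy.deepcopy(self.commits)
        other.head = self.head
        return other

    def log(self):
        """List commit messages from newest to oldest by following parent links."""
        messages = []
        current = self.head
        while current is not None:
            messages.append(self.commits[current]["message"])
            current = self.commits[current]["parent"]
        return messages


# A shared repository with two commits.
origin = ToyRepo()
origin.commit("Add analysis script", {"analysis.py": "print('v1')"})
origin.commit("Fix plot labels", {"analysis.py": "print('v2')"})

# A collaborator's clone has the whole history and can keep working offline.
laptop = origin.clone()
laptop.commit("Try a new model", {"analysis.py": "print('v3')"})

print(laptop.log())  # ['Try a new model', 'Fix plot labels', 'Add analysis script']
print(origin.log())  # ['Fix plot labels', 'Add analysis script']
```

In real Git, the equivalent steps are cloning a repository, committing locally, and later pushing or pulling to exchange commits; the sketch leaves out that exchange step.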
GitHub is a code hosting platform for version control and collaboration.
GitHub Guides offers several training guides and videos to help you get started with and understand Git and GitHub. GitHub’s Hello World tutorial covers essentials like repositories, branches, commits, and Pull Requests.
The University of Toronto Libraries can be found on GitHub under the username utlib.
Research Data Management (RDM) refers to the processes applied through the lifecycle of a research project to guide the collection, documentation, storage, sharing and preservation of research data. RDM is an important part of any research project and should be considered during planning stages.
Review the Tri-Agency Research Data Management Policy. The University of Toronto is developing an institutional RDM policy that is compliant with the Tri-Agency policy. All grant proposals submitted to a Tri-Agency funder (the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), or the Social Sciences and Humanities Research Council of Canada (SSHRC)) should include methodologies that reflect best practices in RDM.
U of T’s research data repository is Dataverse. U of T researchers can use Dataverse to deposit and share research data. To learn more about U of T Dataverse and other repositories, watch the Data Repositories @ U of T video.
Preservation planning may also be a part of your RDM planning. Watch the Digital Preservation Resources @ U of T video to learn about the basics of digital research preservation in the context of the research data lifecycle and the institution-wide digital preservation resources available at U of T.
The Federated Research Data Repository (FRDR) is a platform for digital RDM and discovery that offers repository, discovery, and preservation services. You can use FRDR to search for and download data across Canadian repositories. Faculty members from Canadian post-secondary institutions may use FRDR to publish their data.
The University of Toronto Libraries (UTL) also offer resources and services in the areas of copyright and licensing; data discovery and cleaning; data visualization; digital collections and exhibits; digital publishing; GIS; research data management; statistics and data analysis; and text and data mining.
The Carpentries builds global capacity in essential data and computational skills for conducting efficient, open, and reproducible research. The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.