CRIS Spotlight on …
Introduction to Advanced Research Computing (ARC)
Advanced research computing (ARC) resources available through the University of Toronto: platforms, tools, software, support services, and training.
What is Advanced Research Computing (ARC)?
- Advanced research computing (ARC) includes the use of high performance computing, cloud computing, data storage and management, networking, and visualization to help solve complex problems.
- You may need ARC if:
  - Your data is so large that you can’t store it locally, let alone process it;
  - Your simulations require huge amounts of compute;
  - You need help planning your workflows so that they finish in a reasonable time; or
  - You want to use interactive computing that can be shared with others.
- The Advanced Research Computing @ U of T video describes institution-wide ARC resources available through the University. Resources profiled include SciNet and Compute Canada; the Southern Ontario Smart Computing Innovation Platform (SOSCIP); and Project Jupyter.
Advanced Computing Platforms Available Through U of T
SciNet
- SciNet is U of T’s primary high performance computing facility. SciNet offers computational resources, support, and expertise for Canadian researchers, as well as free education and training programs for students and users in advanced computing.
- SciNet systems:
  - Niagara (a large homogeneous cluster)
  - Mist (the GPU extension of Niagara)
  - High Performance Storage System (HPSS) (a tape-backed hierarchical storage system)
  - Teach cluster (a small homogeneous cluster)
- Any qualified researcher at a Canadian university is eligible to use SciNet’s systems free of charge. SciNet is part of Compute Canada; you must register with Compute Canada to access SciNet platforms.
- For more information, support, or consultation email support@scinet.utoronto.ca.
Compute Canada
- Compute Canada deploys Advanced Research Computing (ARC) systems and storage, and provides support for large-scale, data-intensive research projects.
- In addition to SciNet, other Compute Canada sites across Canada also provide ARC resources. Compute Canada’s resources include:
  - Three heterogeneous general-purpose clusters spread across the country
  - One homogeneous cluster (SciNet’s Niagara) and its GPU cluster (SciNet’s Mist)
  - Multiple cloud systems
- Compute Canada resources are free. Rapid Access Service allows users to access a modest amount of storage and computing resources without having to apply to a Resource Allocation Competition. Rapid Access Service computing resources are available only for opportunistic use. If you need a priority CPU or GPU allocation, or require larger quantities of storage and computing resources than available via Rapid Access Service, you must apply to one of Compute Canada’s Resource Allocation Competitions.
- You must register with the Compute Canada Database (CCDB) to access computing systems and storage.
Southern Ontario Smart Computing Innovation Platform (SOSCIP)
- The Southern Ontario Smart Computing Innovation Platform (SOSCIP) is a consortium that promotes collaborations between small- and medium-sized enterprises and academia, including the University of Toronto.
- SOSCIP’s advanced computing platforms:
  - GPU-Accelerated Platform (runs on SciNet’s Mist cluster)
  - Parallel CPU Platform (homogeneous high-performance system attached to SciNet’s Niagara)
  - Cloud Analytics Platform (provides access to a broad array of IBM software tools)
- To use SOSCIP resources, you must apply for access by submitting a project proposal. Projects must be an industry-academia collaboration; have advanced compute needs; and offer clear and realizable commercialization objectives. To find out more about SOSCIP project requirements, visit SOSCIP’s project guide.
- SOSCIP also offers access to its computing platforms on a fee-for-service basis.
- Need more information? Email info@soscip.org.
Information Technology Services (ITS) Private Cloud
- The ITS Private Cloud service is an on-site U of T server and storage virtualization platform, similar to those offered by public cloud providers. At the core of the ITS Private Cloud server virtualization environment is VMware’s vSphere platform, on which ITS has developed services for self-deployment and management of virtual machines (VMs).
- Resources are available to meet your specific needs and can be added incrementally according to the pricing schedule outlined online.
- The ITS Private Cloud is best suited to hosting individual VMs running web servers, databases, and similar services; it is not designed to accommodate large-scale server clusters or high performance computing (HPC) applications.
- The ITS Data Centre & Private Cloud Overview document provides detailed information on the ITS data centre and the server and storage architecture which underpin the ITS Private Cloud service.
- For more information, or to inquire for service, email hosting@utoronto.ca.
Toronto Region Statistics Canada Research Data Centre (RDC)
- Research Data Centres (RDCs) are secure computer labs where approved researchers can access and analyze confidential microdata.
- The Toronto Region Statistics Canada RDC is part of the Canadian Research Data Centre Network (CRDCN), a national network of centres offering secure access to Statistics Canada’s detailed microdata, including census data, as well as to an increasing number of administrative data sets.
- Researchers can apply for access to data by submitting a detailed proposal through the MicroData Access Portal. RDC staff members oversee the administrative aspects of projects that have already been approved to access RDC data holdings. For general information, instructions on submitting a proposal, and a list of available datasets, visit Statistics Canada’s RDC Program website.
- For general RDC enquiries, email info@utoronto.ca.
Tools, Software, and Resources
Software
- Jupyter is an open-source tool that supports interactive data science and scientific computing across programming languages.
- Some Compute Canada regional initiatives offer access to computing resources through Jupyter Hubs, including a SciNet Niagara node that has been designated a Jupyter Hub for research use. There is also a U of T Jupyter Hub available for teaching.
- A recording of the Jupyter Hub for Researchers 101 information session is available online. The session provides a basic overview of Jupyter and of using Jupyter Notebooks and JupyterLab to create interactive computing environments, introduces the resources available through SciNet to support Jupyter Hub notebook sessions, and explores the use of Jupyter through different use cases.
Other Software Options
- The Licensed Software Office (LSO) negotiates and administers software license agreements with many vendors in order to decrease overall software costs to the University. Examples of software licenses negotiated through the LSO include SAS, SPSS, and MATLAB.
- The Information Commons offers a list of available software.
- The Map and Data Library offers a curated list of Tools & Tutorials.
Git and GitHub
- Git is a distributed version control system (DVCS) commonly used for software development. In a DVCS, every user’s copy of a project contains every file, branch, and revision, so each user has a full, self-contained history of all changes.
- GitHub is a code hosting platform for version control and collaboration.
- GitHub Guides offers several training guides and videos to help you get started with and understand Git and GitHub. GitHub’s Hello World tutorial covers essentials such as repositories, branches, commits, and pull requests.
- The University of Toronto Libraries can be found on GitHub under the username utlib.
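The idea that every copy of a DVCS project carries the full history can be illustrated with a small toy model in Python. This is a simplified sketch, not how Git actually stores data: the `Commit` and `Repository` classes and their `commit`, `log`, and `clone` methods are hypothetical names chosen for illustration. As in Git's default setup, though, each commit is identified by a SHA-1 hash that covers its parent's ID, so commits form a linked history.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One immutable snapshot of a project (toy model, not Git's real object format)."""
    message: str
    files: Tuple[Tuple[str, str], ...]  # sorted (path, content) pairs
    parent: Optional[str]               # ID of the previous commit; None for the first

    def digest(self) -> str:
        # Content-addressed ID: hashing the parent's ID too means that changing
        # any earlier commit would change the ID of every later one.
        payload = repr((self.message, self.files, self.parent)).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


class Repository:
    def __init__(self) -> None:
        self.objects: Dict[str, Commit] = {}  # every commit ever made, keyed by ID
        self.head: Optional[str] = None       # ID of the latest commit

    def commit(self, message: str, files: Dict[str, str]) -> str:
        c = Commit(message, tuple(sorted(files.items())), self.head)
        cid = c.digest()
        self.objects[cid] = c
        self.head = cid
        return cid

    def log(self) -> List[str]:
        # Walk parent links back from HEAD using only this copy's local data.
        messages = []
        cid = self.head
        while cid is not None:
            c = self.objects[cid]
            messages.append(c.message)
            cid = c.parent
        return messages

    def clone(self) -> "Repository":
        # A clone receives every commit, not just the latest snapshot.
        # Commits are immutable, so a shallow copy of the dict is enough.
        other = Repository()
        other.objects = dict(self.objects)
        other.head = self.head
        return other


origin = Repository()
origin.commit("Add analysis script", {"analysis.py": "print('v1')"})
origin.commit("Fix bug", {"analysis.py": "print('v2')"})

mine = origin.clone()  # full, self-contained history
mine.commit("Try new model", {"analysis.py": "print('v3')"})

print(mine.log())    # ['Try new model', 'Fix bug', 'Add analysis script']
print(origin.log())  # ['Fix bug', 'Add analysis script']
```

Because `mine` holds its own copy of every commit, it can show the complete log and record new work without contacting `origin`. In real Git, that new work would be shared back through a push, or through a pull request on a hosting platform such as GitHub.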
Research Data Management
- Research Data Management (RDM) refers to the processes applied through the lifecycle of a research project to guide the collection, documentation, storage, sharing and preservation of research data. RDM is an important part of any research project and should be considered during planning stages.
- Review the Tri-Agency Research Data Management Policy. The University of Toronto is developing an institutional RDM policy that is compliant with the Tri-Agency policy. All grant proposals submitted to the Tri-Agencies (the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC)) should include methodologies that reflect best practices in RDM.
- The University of Toronto Libraries offers information and guidance on research data management online. Librarians and library staff have expertise in finding, managing, visualizing, analyzing, preserving, and sharing all kinds of data. You can take advantage of the Map & Data Library’s Drop-In Hours or complete a Request Form for support.
- U of T’s research data repository is Dataverse. U of T researchers can use Dataverse to deposit and share research data. To learn more about U of T Dataverse and other repositories, watch the Data Repositories @ U of T video.
- Preservation planning may also be a part of your RDM planning. Watch the Digital Preservation Resources @ U of T video to learn about the basics of digital research preservation in the context of the research data lifecycle and the institution-wide digital preservation resources available at U of T.
- The Portage Network offers a suite of tools and resources, including the DMP (Data Management Plan) Assistant (an online data management planning tool) and discipline- and methodology-specific Data Management Plan (DMP) templates, such as the DMP Template for Advanced Research Computing.
- The Federated Research Data Repository (FRDR) is a platform for digital RDM and discovery that offers repository, discovery, and preservation services. You can use FRDR to search for and download data across Canadian repositories. Faculty members from Canadian post-secondary institutions may use FRDR to publish their data.
- For additional RDM resources, check out the CRIS Spotlight on Working Securely: Remote Data Collection and Storage and RDM resources in the CRIS Resource Hub.
Other Resources
- The University of Toronto Libraries offers many data sources, both in its collections and through subscriptions. You can start your search for datasets on the Map and Data Library website.
Training
SciNet
- SciNet offers training in research computing and data science through a large offering of cross-disciplinary, hands-on, skill-based workshops and lecture series. Users and students can earn a certificate in Scientific Computing, Data Science, or High Performance Computing once they have completed enough SciNet credit-hours. A list of courses, along with recordings and slides from past courses, is freely available on the SciNet education website.
The University of Toronto Libraries (UTL)
- The University of Toronto Libraries (UTL) regularly offers workshops in the areas of Geographic Information Systems (GIS), Data & Statistics, Digital Tools, and Programming & Software. UTL also offers several self-paced modules and recorded workshops available online. There are other self-paced data and GIS specific tutorials available on the Map and Data Library website.
- You can also find more resources and services in the areas of copyright and licensing; data discovery and cleaning; data visualization; digital collections and exhibits; digital publishing; GIS; research data management; statistics and data analysis; and text and data mining at UTL.
The Carpentries
- The Carpentries builds global capacity in essential data and computational skills for conducting efficient, open, and reproducible research. The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.