University of Surrey

I started work for the Faculty of Engineering and Physical Sciences (FEPS) at the University of Surrey on 13 February 2017 as a Senior Systems Administrator in a DevOps-focused Systems Administration team of six.

I am responsible for the delivery and development of the FEPS IT Infrastructure to support teaching, research and professional services. My primary focus is on Linux systems, comprising a large server estate and a desktop/lab estate running Ubuntu (LTS). I am an active developer on the FEPS Chef infrastructure which underpins our platforms, allowing for fast, responsive and collaborative development of our estate using an ‘Infrastructure as Code’ approach.

As part of the FEPS Systems Administration team, we are jointly responsible for third-line support of tickets escalated to us from first/second-line User Support. I am responsible for ensuring tickets are completed within the expected Service Level Agreement. Where I am unable to resolve a ticket immediately, I research the problem, develop a solution, and then ensure the solution is documented in our Knowledge Base.

I also handle license management and deployment through various license managers (including FlexLM) on both Windows and Linux, in both Terminal Server and Workstation environments, as well as identifying the need for new licenses and servers.

I am also responsible for developing and supporting the Windows Desktop and Server infrastructure in use throughout FEPS.

DevOps

As systems administrators in the faculty, we manage our entire Linux infrastructure through the automation tool Chef. This is underpinned by our in-house hosted version control system, GitLab Enterprise.

Environments

Our Chef infrastructure is split into two environments: Production and Test. Production covers around 95% of our workstations, labs and servers, while the Test environment covers a small selection of computers from each category along with all 3rd-line FEPS IT workstations and test VMs.

Each environment has an associated GitLab branch:

  Git branch     Chef environment
  Master         Production
  Test           Test
  Development    n/a

Changes are always built and submitted to development branches, and then merged into Test. Test is then merged into Master once changes have been verified (see Change Management for details). Direct commits to Master and Test are prevented through the use of Protected Branches.
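Each Chef environment can pin the cookbook versions it receives, which is what the -E flag in the deployment command (see Continuous Integration) effectively updates. As a minimal sketch only, with hypothetical cookbook names and version numbers rather than the actual FEPS pins, a Test environment definition in the Chef Ruby DSL might look like:

    # environments/test.rb - hypothetical Chef environment definition.
    # Uploading a cookbook with `knife cookbook upload -E test --freeze`
    # moves a pin like these to the newly frozen version.
    name 'test'
    description 'FEPS Test environment: selected machines, 3rd-line workstations and test VMs'

    cookbook 'feps_base',        '= 2.4.1'
    cookbook 'feps_workstation', '= 1.9.0'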

Change Management

When a change is made, it is first tested using Test Kitchen, or by using chef-zero to apply it directly to one of our own systems. Once the change is self-verified, it is pushed to GitLab and submitted as a Merge Request to the Test branch. A stringent peer review is then undertaken by another member of the Systems Administration team; this could involve running the update on a local system or a VM, or simply vetting the code visually. Once complete, the change is merged into the Test environment and immediately checked on various platforms. If no issues are discovered, the change automatically rolls out to other machines within a two-hour window.

Once the change has been in the Test environment for a suitable amount of time (a minimum of two weeks), it is merged into the Production environment. This is typically done after another peer review, and the final merge into Production is rarely performed by the change author.

Throughout the change process we ensure that we can always back out of any change we make, both through the GitLab version control system and through Chef's built-in cookbook version control.
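Because cookbooks are uploaded frozen (see the deployment stage under Continuous Integration), a known-good version always remains available, so backing out is largely a matter of re-pinning the affected environment to the previous version. A minimal sketch of the versioning side, using a hypothetical cookbook name and numbers:

    # metadata.rb - hypothetical cookbook metadata. Each merged change bumps
    # the version; '--freeze' stops that version being overwritten later, so
    # an environment can always be re-pinned to a known-good release.
    name       'feps_base'
    maintainer 'FEPS IT Systems Administration'
    version    '2.4.2'   # bumped from 2.4.1 by this change
    supports   'ubuntu'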

Continuous Integration

Through the use of GitLab Continuous Integration runners (CI runners), all Chef cookbook changes are put through three stages: syntax checking, linting and deployment. This is driven by a .gitlab-ci.yml file that runs these stages automatically.

All changes pushed or merged into branches are automatically checked for syntax and linting errors. The following commands are run against the files in the branch:

  Stage     Checker   Command
  Syntax    Ruby      find . -name "*.rb" | xargs -I {} -t bash -c "ruby -c {}"
  Syntax    YAML      find . -name "*.yml" | xargs -I {} -t bash -c "ruby -ryaml -e \"YAML.parse(File.read('{}'))\""
  Syntax    JSON      find . -name "*.json" | xargs -I {} -t bash -c "ruby -rjson -e \"JSON.parse(File.read('{}'))\""
  Linting   Ruby      rubocop -f s -D -S .
  Linting   Chef      foodcritic -P .

Any issues detected during the submission of code automatically fail the pipeline; the errors are both emailed to the developer and pushed to Slack for instant notification.

Once these stages have passed, the deployment stage is entered, but only for pipelines run against the Master or Test branches:

knife cookbook upload $CI_PROJECT_NAME -E <environment> --freeze
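The actual gate lives in the .gitlab-ci.yml, but the branch-to-environment logic can be sketched as a small Ruby helper. The script and the lowercase environment names below are assumptions for illustration; only CI_PROJECT_NAME and CI_COMMIT_REF_NAME are standard GitLab CI variables.

    #!/usr/bin/env ruby
    # deploy_cookbook.rb - hypothetical helper showing the deployment gate:
    # only the Master and Test branches map onto a Chef environment.
    branch  = ENV.fetch('CI_COMMIT_REF_NAME', '')
    project = ENV.fetch('CI_PROJECT_NAME')

    environment = { 'master' => 'production', 'test' => 'test' }[branch.downcase]

    unless environment
      puts "Branch '#{branch}' does not deploy; nothing to do."
      exit 0
    end

    # Upload and freeze the cookbook, pinning it into the matching environment.
    system('knife', 'cookbook', 'upload', project, '-E', environment, '--freeze') \
      or abort("knife cookbook upload failed for #{project} (#{environment})")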

Storage Administration

In December 2017 I began working with the existing Senior Storage Administrator in FEPS to support the faculty's extensive storage systems.

Quantum StorNext

The Quantum StorNext filesystem in use at the University of Surrey is in place to provide high-performance, high-capacity storage to Research Computing services within FEPS.

The Quantum StorNext system takes a different approach to storage, separating object data and metadata onto two separate arrays, the benefit of which is higher performance. The platform offers only unstructured storage (files).

The University of Surrey Quantum StorNext system is split into two areas of protection – FlexSync and StorageManager.

FlexSync backups replicate to a second Quantum array housed in an off-site datacentre.

StorageManager archives data to Quantum Lattus object storage, which is not optimised for handling large quantities of small files.

Dell FluidFS


OnStor Bobcat


Areas of responsibility

Faculty of Engineering and Physical Sciences

Departments including:

  • Department of Chemistry
  • Department of Physics
  • Department of Computer Science
  • Department of Chemical and Process Engineering
  • Department of Mechanical Engineering Sciences
  • Department of Civil and Environmental Engineering
  • Department of Mathematics
  • Department of Electrical and Electronic Engineering
  • Centre for Environment and Sustainability

Research Centres:

  • Advanced Technology Institute (ATI)
  • Centre for Vision, Speech and Signal Processing (CVSSP)
  • Institute of Communication Systems (ICS), including the 5G Innovation Centre
  • Ion Beam Centre

Platforms

Chef

Chef Automate, Compliance, InSpec

GIT

Administrator of a locally hosted GitLab instance, used in conjunction with Chef for cookbook development. Responsibilities include CI (continuous integration via runners) and administration of projects, groups and the server itself.

SCCM

SCCM 2012: packaging, software deployment, and OS installation/imaging for FEPS labs.

SVN

Administration, Maintenance

Apache2

Migration, management, security, etc

Physical Platforms

  • OnStor Bobcat – SAN-to-network gateways
  • Quantum Dot Hill – storage trays
  • QLogic Storage Switches

Projects

FEPS IT Ubuntu 16.04 Security Audit

Information coming soon!

HAProxy / KeepAliveD – High Availability

In mid-2017 I designed a new High Availability platform using HAProxy and KeepAliveD. This platform allows us to deliver a service from multiple end-points while presenting a single, common service to our customers. KeepAliveD underpins the reliability of HAProxy by allowing multiple HAProxy gateways to exist in a redundant (active/passive) configuration. The servers were then placed on different virtual infrastructures to further improve reliability and reduce dependence on any single piece of infrastructure that could lead to outages.

For this project I was required to look at the department's upcoming requirements and to engineer a solution that could scale to meet them. HAProxy allows us to provide a basic High Availability front-end for services such as SSH, but also allows a more advanced implementation for HTTP/S services such as our external/internal web hosting, or our internal Apt-Repo, Apt-mirror and YUM-Repo services.

The HAProxy and KeepAliveD configurations were built within Chef and then bootstrapped using PXE/Kickstart. This approach (along with the use of templates within Chef) allows us to dynamically build/destroy HAProxy gateways at will.
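As a rough illustration of how such configurations might be expressed in Chef (the recipe, template and attribute names here are hypothetical, not the actual FEPS cookbook), a gateway recipe could look like this:

    # recipes/gateway.rb - hypothetical HAProxy/KeepAliveD gateway recipe.
    %w(haproxy keepalived).each do |pkg|
      package pkg
    end

    # Render front-ends/back-ends from node attributes so gateways can be
    # rebuilt or destroyed at will from the same cookbook.
    template '/etc/haproxy/haproxy.cfg' do
      source 'haproxy.cfg.erb'
      variables(services: node['feps_ha']['services'])
      notifies :reload, 'service[haproxy]'
    end

    # KeepAliveD manages the floating IP shared by the active/passive pair.
    template '/etc/keepalived/keepalived.conf' do
      source 'keepalived.conf.erb'
      variables(
        state:      node['feps_ha']['keepalived_state'],   # MASTER or BACKUP
        virtual_ip: node['feps_ha']['virtual_ip']
      )
      notifies :restart, 'service[keepalived]'
    end

    service 'haproxy' do
      action [:enable, :start]
    end

    service 'keepalived' do
      action [:enable, :start]
    end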

HA SSH with Two-Factor Authentication service

Delivery of an SSH service for access to FEPS systems while enforcing a new security requirement that all external users must authenticate using two-factor methods.

Leveraging the existing centralised authentication and storage infrastructure, I deployed the Google two-factor authentication PAM module to two new SSH servers. Both were placed behind the load balancers so that users are directed to each server on a round-robin basis. Deploying the Google 2FA module to all FEPS Linux desktops and terminal servers allows users to generate and store Google 2FA tokens, which are accessed by the SSH servers at login.

Next I deployed two more SSH servers without the 2FA PAM module, adding them to a different resource pool and configuring the load balancer to direct internal logons to those instead, thus bypassing the 2FA requirements for users already inside the University networks.

The new SSH servers were built using Chef, with the 2FA PAM Module being turned on/off using Chef tags and templated configuration files. This method (along with bootstrapping the server build process into Kickstart/PXE) allows us to build/destroy SSH servers as needs arise, with new servers being automatically inserted into the pool of servers controlled by HAProxy.
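A minimal sketch of that tag-driven toggle, assuming hypothetical template, attribute and tag names rather than the real FEPS cookbook:

    # recipes/ssh_server.rb - hypothetical sketch: a Chef tag decides whether
    # the Google 2FA PAM module is enabled on this SSH server.
    use_2fa = tagged?('ssh-2fa')

    # The PAM module is only needed on the externally facing pool.
    package 'libpam-google-authenticator' if use_2fa

    # One templated PAM stack for both pools; the flag controls whether the
    # pam_google_authenticator line is rendered.
    template '/etc/pam.d/sshd' do
      source 'pam_sshd.erb'
      variables(two_factor: use_2fa)
      notifies :restart, 'service[ssh]'
    end

    template '/etc/ssh/sshd_config' do
      source 'sshd_config.erb'
      variables(challenge_response: use_2fa)  # drives ChallengeResponseAuthentication
      notifies :restart, 'service[ssh]'
    end

    service 'ssh' do
      action [:enable, :start]
    end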

Web Server Migration / consolidation

CentOS deployment through Chef

  • Reduction of file specificity within Chef to better support an OS-agnostic approach (sketched briefly below).
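A short, hypothetical illustration of that idea (not the actual cookbook): rather than keeping separate copies under Chef's file-specificity directories, one recipe can branch on platform family and render a single template.

    # Hypothetical example: one template plus value_for_platform_family replaces
    # per-OS copies kept under files/ubuntu/, files/centos/ and so on.
    repo_config = value_for_platform_family(
      'debian' => '/etc/apt/sources.list.d/feps.list',
      'rhel'   => '/etc/yum.repos.d/feps.repo'
    )

    template repo_config do
      source 'feps_repo.erb'
      variables(mirror: node['feps']['mirror_url'])
    end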

HA Web Server platform

Upcoming

Storage management

Coming soon

ICS