[LINK] Cybersecurity Training

Thu Jan 6 12:46:11 AEDT 2022

Toolset for Collecting Shell Commands and Its Application in Hands-on Cybersecurity Training

By Valdemar Švábenský, Jan Vykopal, Daniel Tovarňák & Pavel Čeleda, Masaryk University, Brno, Czech Republic

https://arxiv.org/pdf/2112.11118.pdf

Abstract:

This Full Paper in the Innovative Practice category presents and evaluates a technical innovation for hands-on classes.

When learning cybersecurity, operating systems, or networking, students perform practical tasks using a broad range of command-line tools.

Collecting and analyzing data about the command usage can reveal valuable insights into how students progress and where they make mistakes. However, few learning environments support recording and inspecting command-line inputs, and setting up an efficient infrastructure for this purpose is challenging.

To aid engineering and computing educators, we share the design and implementation of an open-source toolset for logging commands that students execute on Linux machines. Compared to basic solutions, such as shell history files, the toolset’s novelty and added value are threefold.

First, its configuration is automated so that it can be easily used in classes on different topics. Second, it collects metadata about the command execution, such as a timestamp, hostname, and IP address. Third, all data are instantly forwarded to central storage in a unified, semi-structured format. This enables automated processing of the data, both in real-time and post hoc, to enhance the instructors’ understanding of student actions.

The toolset works independently of the teaching content, the training network’s topology, or the number of students working in parallel. We demonstrated the toolset’s value in two learning environments at four training sessions. Over two semesters, 50 students played educational cybersecurity games using a Linux command-line interface. Each training session lasted approximately two hours, during which we recorded 4439 shell commands. The semiautomated data analysis revealed different solution patterns, used tools, and misconceptions of students.

Our insights from creating the toolset and applying it in teaching practice are relevant for instructors, researchers, and developers of learning environments. We provide the software and data resulting from this work so that others can use them in their hands-on classes.

I. INTRODUCTION

Hands-on training is vital for gaining expertise in computing disciplines. Topics such as cybersecurity, operating systems, and networking must be practiced in a computer environment so that students can try various tools and techniques. Such a learning environment is called a sandbox. It contains networked hosts that may be intentionally vulnerable to allow practicing cyber attacks and defense. These skills are grounded in the current cybersecurity curricular guidelines [1] to address the increasing shortage of cybersecurity workforce [2].

For the training, each student receives an isolated sandbox hosted locally or in a cloud. To solve the training tasks, students work with many tools, both in a graphical user interface (GUI) and a command-line interface (CLI). This paper focuses on the Linux CLI, which is common in higher computing education as well as in industrial software development.

Analyzing CLI interactions opens opportunities for educational research and classroom innovation. In traditional face-to-face classes, instructors must look at the students’ computer screens to observe the learning process. However, this approach does not scale for large classes, and it becomes difficult for distance education.

Instead, if the students’ executed commands are logged, instructors and researchers may leverage them to support learning. By employing the methods of educational data mining [3] and learning analytics [4], the CLI data can help achieve important educational goals, such as to:

• better understand students’ approaches to learning, both in face-to-face and remote classes,
• objectively assess learning, and
• provide targeted instruction and feedback.

This paper examines the following research question relevant for instructors: What can we infer from students’ command histories that is indicative of their learning processes? Specifically, our goal is to understand how students solve cybersecurity assignments by analyzing their CLI usage. To address this question, we propose a generic method for collecting CLI logs from hands-on training. Then, we evaluate this method by gathering the logs from 50 students at four training sessions and investigating three sub-questions / use cases of the data:

1) What does the command distribution indicate about the students’ approach to solving the tasks? Our motivation is to analyze which tools are commonly used and how effective they are with respect to the training tasks.

2) Which commands are used immediately after the student accesses the learning environment? We can observe if the students started solving the initial task, familiarized themselves with the environment, or displayed off-task behavior. This allows for providing suitable scaffolding.

3) How much time do students spend on the tasks, and how often do they attempt an action? Observing the time differences between successive commands can indicate the students’ skill level and support assessment.
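
The three use cases above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' analysis pipeline: the record fields (`student`, `timestamp`, `cmd`), the helper names, and the sample data are all hypothetical and were invented for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample records; the field names are assumptions,
# not the schema used by the toolset described in the paper.
LOGS = [
    {"student": "s1", "timestamp": "2021-05-01T10:00:00", "cmd": "ls -la"},
    {"student": "s1", "timestamp": "2021-05-01T10:00:30", "cmd": "nmap -sV 10.0.0.5"},
    {"student": "s1", "timestamp": "2021-05-01T10:02:30", "cmd": "nmap -p 22 10.0.0.5"},
    {"student": "s2", "timestamp": "2021-05-01T10:01:00", "cmd": "man nmap"},
    {"student": "s2", "timestamp": "2021-05-01T10:05:00", "cmd": "ssh admin@10.0.0.5"},
]


def by_student(logs):
    """Group records per student and sort each group by timestamp."""
    grouped = defaultdict(list)
    for rec in logs:
        grouped[rec["student"]].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return dict(grouped)


def tool_distribution(logs):
    """Use case 1: count how often each tool (first word of a command) appears."""
    return Counter(rec["cmd"].split()[0] for rec in logs if rec["cmd"].strip())


def first_commands(logs, n=1):
    """Use case 2: the first n commands each student ran."""
    return {s: [r["cmd"] for r in recs[:n]] for s, recs in by_student(logs).items()}


def time_gaps(logs):
    """Use case 3: seconds between successive commands, per student."""
    gaps = {}
    for s, recs in by_student(logs).items():
        times = [datetime.fromisoformat(r["timestamp"]) for r in recs]
        gaps[s] = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return gaps


if __name__ == "__main__":
    print(tool_distribution(LOGS).most_common())
    print(first_commands(LOGS))
    print(time_gaps(LOGS))
```

Taking the first word of a command as the "tool" is a deliberate simplification; it ignores pipes, `sudo` prefixes, and shell built-ins, which a real analysis would likely need to handle.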

Although there are many cybersecurity learning environments, which we review in Section II, their logging support is often limited or non-existent. The current solutions do not allow instructors to uniformly collect CLI data and metadata with minimal setup and then correlate the logs from multiple sandboxes for advanced analyses.

This paper addresses this gap by presenting and evaluating a technical innovation for hands-on classes that employ CLI tools. We created a toolset that collects Linux shell commands in physical or virtual learning environments and stores them in a unified format. Compared to the previous practice, where observing students’ learning was difficult or even impossible, the proposed innovation enables understanding student approaches at scale. It also allows employing educational data mining and learning analytics techniques to gain further insights.
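
To make the idea of a unified format more concrete, the sketch below builds one log record that carries the metadata named in the abstract (timestamp, hostname, and IP address) alongside the command, and serializes it as a single JSON line. The field names and the `make_record` and `to_json_line` helpers are hypothetical; the toolset's actual schema and forwarding mechanism may differ.

```python
import json
import socket
from datetime import datetime, timezone


def make_record(cmd, hostname=None, ip="127.0.0.1", now=None):
    """Build one log record.

    The field names here are assumptions for illustration,
    not the schema used by the toolset.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "hostname": hostname or socket.gethostname(),
        "ip": ip,
        "cmd": cmd,
    }


def to_json_line(record):
    """Serialize a record as one JSON line, a common semi-structured log format."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical host name and target address.
    print(to_json_line(make_record("nmap -sV 10.0.0.5", hostname="attacker")))
```

Because every record shares the same keys regardless of which sandbox produced it, logs from many students can be concatenated and processed together, which is the property the paper's unified format is meant to provide.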

The toolset design is explained in Section III. In Section IV, we introduce a study that deploys the toolset in practice and evaluates it in authentic educational contexts. Section V presents the results of the study and addresses the questions above. Section VI discusses the study and proposes multiple research ideas that further leverage the collected data. Finally, Section VII summarizes our contributions. We also publish the toolset as open-source software. Instructors, researchers, and developers can use it to enhance computing classes, such as teaching cybersecurity, operating systems, and networking. (snip)