# README.md

## Introductory Information

**Title of the Dataset:**
Data from JELAI usage at high school, learning Python through an online environment.

**Dataset Author & Contact:**
Stijn Risseeuw
*Contact:* srisseeuw@tudelft.nl

**General Introduction:**
This dataset contains fine-grained interaction logs from 61 high-school students (ages 15–17) in an introductory Python course. The data was collected over one semester as students used JELAI, an open-source Jupyter-based learning environment.

The dataset includes approximately 27,000 code execution events, 650,000 code editing events, and 1,200 chatbot question-answer pairs.

This data was collected to support the report "From prompt to progress: measuring impacts of student LLM queries on learning." The study aimed to analyze student interaction patterns (e.g., prompt classification, code edit analysis) and trace their relationship to learning goals and final course grades.

**File Formats:**
All data files are provided in Comma-Separated Values (`.csv`) format.

**File Relationships:**
The files are relational and can be linked using common identifiers:

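* `User ID`: Identifies the student. It is the primary key in `users.csv` and a foreign key in `edits.csv`, `executions.csv`, `file_versions.csv`, and `messages.csv`. The files `execution_errors.csv` and `execution_outputs.csv` do not contain it; they reach a student through `Execution ID`.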
* `Execution ID`: Links execution events. It is the primary key in `executions.csv` and a foreign key in `execution_errors.csv` and `execution_outputs.csv`.
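
As an illustration of how the two identifiers chain together, the following minimal sketch attaches a `User ID` to each error row by looking up its parent execution. It uses only the Python standard library. The sample rows, the user labels (`u01`, `u02`), the filenames, and the timestamp format are invented placeholders, and the column headers are assumed to match the field names listed in this README; check the actual CSV headers before reusing it.

```python
import csv
import io

# Invented sample rows standing in for executions.csv and execution_errors.csv.
# Assumption: CSV headers match the field names used in this README.
EXECUTIONS_CSV = """Execution ID,User ID,Datetime,Filename
1,u01,2024-01-10 09:00:00,exercise1.ipynb
2,u01,2024-01-10 09:02:00,exercise1.ipynb
3,u02,2024-01-10 09:05:00,exercise1.ipynb
"""

ERRORS_CSV = """Execution error ID,Execution ID,Error name,Error value,Traceback
1,2,NameError,name 'x' is not defined,(traceback text)
2,3,SyntaxError,invalid syntax,(traceback text)
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def errors_with_users(executions, errors):
    """Attach the User ID of the parent execution to each error row."""
    user_by_execution = {row["Execution ID"]: row["User ID"] for row in executions}
    joined = []
    for err in errors:
        merged = dict(err)
        # None marks an error whose execution is missing from executions.csv.
        merged["User ID"] = user_by_execution.get(err["Execution ID"])
        joined.append(merged)
    return joined


joined = errors_with_users(read_rows(EXECUTIONS_CSV), read_rows(ERRORS_CSV))
for row in joined:
    print(row["User ID"], row["Error name"])
```

To run this against the real files, replace `read_rows(...)` with a `csv.DictReader` over the opened file. The same lookup pattern applies to `execution_outputs.csv`.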

## Methodological Information

**Data Collection Method:**
The data was collected from 61 high-school students aged 15–17 during their informatics classes. Students were learning basic Python concepts (variables, strings, conditionals, loops, and functions).

## Data-Specific Information

### Dataset Structure and File Descriptions

#### `edits.csv`
Contains a high-resolution record of every code-editing event.

* **Edit ID:** (Primary Key) Unique identifier for each edit event.
* **User ID:** (Foreign Key) Identifier of the student who made the edit.
* **Datetime:** Timestamp of the edit.
* **Event type:** Type of edit event (e.g., 'typing', 'copy-pasting').
* **Filename:** Name of the file being edited.
* **Selection:** Text segment involved in the edit (e.g., the pasted content).
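
A common first step with this file is counting how often each event type occurs per student. The sketch below shows one way to do that with the standard library. The sample rows are invented placeholders, and both the column headers and the exact event type strings (`typing`, `copy-pasting`) are assumptions based on the descriptions above.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample rows standing in for edits.csv.
# Assumption: headers and event type strings match this README's descriptions.
EDITS_CSV = """Edit ID,User ID,Datetime,Event type,Filename,Selection
1,u01,2024-01-10 09:00:00,typing,exercise1.ipynb,p
2,u01,2024-01-10 09:00:01,typing,exercise1.ipynb,r
3,u01,2024-01-10 09:00:09,copy-pasting,exercise1.ipynb,print('hi')
4,u02,2024-01-10 09:01:00,typing,exercise1.ipynb,x
"""


def event_counts_per_user(rows):
    """Return {User ID: Counter({Event type: number of edit events})}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["User ID"]][row["Event type"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(EDITS_CSV)))
counts = event_counts_per_user(rows)
for user_id, counter in counts.items():
    print(user_id, dict(counter))
```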

#### `executions.csv`
Summarizes all code execution events initiated by students.

* **Execution ID:** (Primary Key) Unique identifier for the execution.
* **User ID:** (Foreign Key) Identifier of the student executing the code.
* **Datetime:** Timestamp of execution.
* **Filename:** Name of the executed file.

#### `execution_errors.csv`
Logs all errors encountered during code execution events.

* **Execution error ID:** (Primary Key) Unique identifier for each recorded error.
* **Execution ID:** (Foreign Key) Links to the associated event in `executions.csv`.
* **Error name:** Type or name of the error (e.g., 'SyntaxError', 'RuntimeError').
* **Error value:** Descriptive value or message associated with the error.
* **Traceback:** Full traceback output from the error.

#### `execution_outputs.csv`
Contains standard output (e.g., text written by `print()` calls) generated by code executions.

* **Execution output ID:** (Primary Key) Unique identifier for each output event.
* **Execution ID:** (Foreign Key) Links to the related execution in `executions.csv`.
* **Output type:** Category of output (e.g., 'stream').
* **Output text:** The actual text output produced by the program.

#### `file_versions.csv`
Stores complete snapshots of files at different points in time, typically upon execution or saving.

* **File version ID:** (Primary Key) Unique identifier for each file version.
* **User ID:** (Foreign Key) Identifier of the student who created the version.
* **Datetime:** Timestamp of version creation.
* **Filename:** Name of the file.
* **Code:** Full code content of the file at that version.
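
Because each row holds a full snapshot, changes between two versions of the same file have to be derived by comparing their `Code` values. The following sketch uses `difflib` from the standard library for this; the two snapshots are invented placeholders, and `diff_versions` is a hypothetical helper name, not part of the dataset or of JELAI.

```python
import difflib

# Invented snapshots standing in for the "Code" values of two consecutive
# rows in file_versions.csv (same User ID and Filename, ordered by Datetime).
older = "x = 1\nprint(x)\n"
newer = "x = 2\nprint(x)\nprint('done')\n"


def diff_versions(old_code, new_code):
    """Return the unified diff between two code snapshots as a list of lines."""
    return list(
        difflib.unified_diff(
            old_code.splitlines(),
            new_code.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


diff = diff_versions(older, newer)
# Skip the "+++" / "---" file header lines when counting changed lines.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
print("\n".join(diff))
print(len(added), "lines added,", len(removed), "lines removed")
```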

#### `messages.csv`
Contains all messages exchanged between students and the integrated chatbot.

* **Message ID:** (Primary Key) Unique identifier for each message.
* **User ID:** (Foreign Key) Identifier of the message sender.
* **Datetime:** Timestamp of the message.
* **Body:** Message text content.
* **Automated:** Boolean flag indicating the sender. `True` = chatbot, `False` = student.
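
One possible way to reconstruct question-answer pairs from this file is to order each student's messages by time and match every student message with the next chatbot message. This is a heuristic sketch, not the procedure used in the study. It assumes that the `User ID` of a chatbot message identifies the student in that conversation, that `Automated` is stored as the text `True` or `False`, and that `Datetime` is in a format `datetime.fromisoformat` can parse. The sample rows are invented placeholders.

```python
import csv
import io
from datetime import datetime

# Invented sample rows standing in for messages.csv.
# Assumptions: headers match this README; Automated is "True"/"False" text;
# Datetime is ISO-like; chatbot rows carry the User ID of the student.
MESSAGES_CSV = """Message ID,User ID,Datetime,Body,Automated
1,u01,2024-01-10 09:00:00,How do I write a for loop?,False
2,u01,2024-01-10 09:00:05,Start with the keyword for and indent the body.,True
3,u02,2024-01-10 09:01:00,What is a string?,False
4,u01,2024-01-10 09:03:00,Thanks,False
5,u02,2024-01-10 09:01:04,A string is a sequence of characters.,True
"""


def pair_questions_with_answers(rows):
    """Pair each student message with the next chatbot message of that user."""
    by_user = {}
    for row in rows:
        by_user.setdefault(row["User ID"], []).append(row)

    pairs = []
    for user_id, msgs in by_user.items():
        msgs.sort(key=lambda r: datetime.fromisoformat(r["Datetime"]))
        pending = None  # most recent student message still awaiting a reply
        for msg in msgs:
            is_bot = msg["Automated"].strip().lower() == "true"
            if not is_bot:
                pending = msg
            elif pending is not None:
                pairs.append((user_id, pending["Body"], msg["Body"]))
                pending = None
    return pairs


rows = list(csv.DictReader(io.StringIO(MESSAGES_CSV)))
pairs = pair_questions_with_answers(rows)
for user_id, question, answer in pairs:
    print(user_id, "|", question, "|", answer)
```

With this heuristic, a student message that never receives a reply (such as the final `Thanks` in the sample) produces no pair, and if a student sends several messages in a row, only the last one before the reply is paired.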

#### `users.csv`
Lists user-level metadata for all consenting participants.

* **User ID:** (Primary Key) Unique identifier for each participant.
* **Grade:** The student’s final grade in the course.