
7.3 Applying Filters and Safety Controls (Content Filters)

How to configure content filtering for safe use.

Content filters ensure CurricuLLM only provides safe, age-appropriate, and curriculum-aligned responses for different groups of users.

What content filters are

  • A content filter is a set of rules that controls what CurricuLLM will allow or block.
  • Toggling a filter rule on blocks that category of content, whether entered by users or generated by the AI models. Toggling a rule off does not guarantee the content will be produced; output is still governed by the alignment tuning of the underlying AI models.
  • Administrators can create multiple filters and assign them to different roles (e.g. younger students vs. staff).
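The relationships above (a filter as a named set of on/off rules, assigned to one or more roles) can be sketched as a simple data model. This is an illustrative sketch only; the class and field names are assumptions, not CurricuLLM's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ContentFilter:
    """A named set of rules; each rule is a category that is on (blocked) or off."""
    name: str
    rules: dict[str, bool] = field(default_factory=dict)  # category -> enabled?

    def active_count(self) -> str:
        """Summary shown in the filter view, e.g. '11/15 active'."""
        enabled = sum(self.rules.values())
        return f"{enabled}/{len(self.rules)} active"

# Different roles can be assigned different filters.
strict = ContentFilter("Full filtering", {"Hate": True, "Profanity": True, "Self-Harm": True})
staff = ContentFilter("Staff", {"Hate": True, "Profanity": False, "Self-Harm": True})
role_filters = {"Younger students": strict, "Teachers": staff}
```

Here `strict.active_count()` returns `"3/3 active"`, mirroring the count shown in the top-right of the filter view.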

Managing content filters in CurricuLLM

  • View filters
    • The sidebar lists all available filters by name, with the number of roles using each; a search bar at the top lets you locate a filter quickly.
    • Select a filter to view and edit its rules.
    • The active count is displayed in the top-right corner (e.g. "11/15 active").
  • Create a new filter
    • Click + Add Filter and give the filter a unique name.
  • Edit filter rules
    • Toggle individual filter rules on or off using the switches next to each category.
    • Each rule shows the category name and a description of what it covers.
  • Delete a filter
    • Filters can only be deleted if no role is currently using them.
    • If a filter is assigned to a role, reassign that role to a different filter first.
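The deletion rule above (a filter can only be removed when no role uses it) amounts to a reference check before deleting. A minimal sketch, assuming filters are a set of names and role assignments are a simple role-to-filter mapping; both structures are hypothetical, not CurricuLLM's internals:

```python
def delete_filter(filters: set[str], role_assignments: dict[str, str], name: str) -> None:
    """Remove a filter only if no role is currently assigned to it."""
    in_use = [role for role, assigned in role_assignments.items() if assigned == name]
    if in_use:
        raise ValueError(f"Reassign roles {in_use} before deleting filter {name!r}")
    filters.discard(name)

filters = {"Full filtering", "High school"}
roles = {"Year 7": "Full filtering", "Staff": "High school"}

roles["Year 7"] = "High school"          # reassign the role first...
delete_filter(filters, roles, "Full filtering")  # ...then deletion succeeds
```

Attempting the delete before the reassignment would raise the error instead, which is the behaviour the admin UI enforces.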

Filter rule categories

  • Violent Crimes — Responses that enable, encourage, or endorse the commission of violent crimes, including unlawful violence toward people (e.g. terrorism, genocide, murder, child abuse, assault, battery, kidnapping) and unlawful violence toward animals (e.g. animal abuse).
  • Non-Violent Crimes — Responses that enable, encourage, or endorse the commission of non-violent crimes, including personal crimes (e.g. labour trafficking, threats, intimidation), financial crimes (e.g. fraud, scams, money laundering), property crimes (e.g. theft, arson, vandalism), drug crimes, weapons crimes, and cyber crimes (e.g. hacking).
  • Sex-Related Crimes — Responses that enable, encourage, or endorse the commission of sex-related crimes, including sex trafficking, sexual assault, sexual harassment, and prostitution.
  • Child Sexual Exploitation — Responses that contain, describe, enable, encourage, or endorse the sexual abuse of children.
  • Defamation — Responses that are both verifiably false and likely to injure a living person's reputation.
  • Specialized Advice — Responses that contain specialised financial, medical, or legal advice, or that indicate dangerous activities or objects are safe.
  • Privacy — Sharing personal or identifying information.
  • Intellectual Property — Copyright or trademark infringement.
  • Indiscriminate Weapons — Instructions for creating weapons of mass destruction.
  • Hate — Hate speech or discrimination against protected groups.
  • Self-Harm — Encouragement or facilitation of self-harm or suicide.
  • Sexual Content — Explicit or erotic content involving adults.
  • Elections — False or misleading information about elections.
  • Code Interpreter Abuse — Attempts to misuse system capabilities (e.g. security bypasses).
  • Profanity — Use of offensive or vulgar language, even if not linked to other categories.

Note: Jailbreak filtering uses code J. When content is blocked, the code will be prepended with Input- or Output- to identify whether the content was filtered by the input filter or the output filter. For example: Input-J means the user's message was blocked by jailbreak filtering, while Output-S11 would indicate the AI's response was blocked by the Self-Harm filter.
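The note above implies every block code is a direction prefix (Input- or Output-) plus a category code (J for jailbreak, or an S-code such as S11 for Self-Harm). A small parser sketch; the function name and the assumption that the code always splits on the first hyphen are mine, not part of CurricuLLM:

```python
def parse_block_code(code: str) -> tuple[str, str]:
    """Split a block code like 'Input-J' or 'Output-S11' into (direction, category)."""
    direction, _, category = code.partition("-")
    if direction not in ("Input", "Output") or not category:
        raise ValueError(f"Unrecognised block code: {code!r}")
    return direction, category
```

For example, `parse_block_code("Output-S11")` yields `("Output", "S11")`: the AI's response was blocked, by the Self-Harm filter.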

Tips for administrators

  • Apply stricter filters to roles for younger students.
  • Allow more permissive filters for staff roles, while still blocking clearly harmful categories; staff need the flexibility to discuss sensitive topics in a teaching context.
  • Name filters clearly (e.g. "Full filtering," "High school," "Input none") so they are easy to manage.
  • Use No content filter sparingly and only for roles where no additional filtering is needed.
  • Review filters regularly to make sure they match your school's safety policies.

What this means for schools

Content filters are the backbone of safe use. By setting them correctly, schools can trust that every interaction stays age-appropriate, safe, and aligned with teaching and learning goals.
