Architecture

The Chirps application executes scans against a target.

What is in a Scan?

A scan executes one or more policies against one or more assets. A policy is a list of rules. Each rule has a query, which is executed against the asset(s), and a regular expression that is used to search the results of that query. Every match is flagged as a finding.

When a user kicks off a scan, a Celery task is queued. If multiple assets are selected, multiple tasks are queued. The scan task, found in ./scan/tasks.py, will iterate through each rule in a policy, executing the queries against the scan asset. Results are stored in the database via the Result and Finding models.
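The rule loop described above can be sketched in plain Python. This is a minimal illustration, not the code in ./scan/tasks.py: the dataclasses stand in for the Django models, and run_rules and fake_search are hypothetical names introduced here.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    query_string: str
    regex_test: str


@dataclass
class Finding:
    offset: int  # start of the match within the result text
    length: int  # number of characters matched


@dataclass
class Result:
    rule: Rule
    text: str
    findings: List[Finding] = field(default_factory=list)


def run_rules(rules: List[Rule], search: Callable[[str], List[str]]) -> List[Result]:
    """Run each rule's query through `search` and record regex matches."""
    results = []
    for rule in rules:
        pattern = re.compile(rule.regex_test)
        for document in search(rule.query_string):
            result = Result(rule=rule, text=document)
            for match in pattern.finditer(document):
                length = match.end() - match.start()
                result.findings.append(Finding(match.start(), length))
            results.append(result)
    return results


def fake_search(query: str) -> List[str]:
    # Hypothetical stand-in for an asset provider; returns canned documents.
    return ["Contact: 555-12-3456 on file", "nothing sensitive here"]


ssn_rule = Rule("SSN", "social security number", r"\d{3}-\d{2}-\d{4}")
results = run_rules([ssn_rule], fake_search)
```

Here every returned document produces a Result, and only documents with regex matches carry findings. Whether the real task stores results for documents without matches is an assumption.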

What are Assets?

An asset is a destination that rule queries are executed against. Asset providers are responsible for executing the queries and handing back the results to the scan task.

Policy Application

The Policy application provides functionality for managing scanning policies and rules. A Policy consists of a set of rules that define the steps to be executed when scanning an asset. Policies can be created by users or preloaded as templates.

Models

Policy

The Policy model represents a scanning policy. It contains the following fields:

  • name: A CharField with a maximum length of 256 characters.
  • description: A TextField for storing a detailed description of the policy.
  • is_template: A BooleanField indicating whether the policy is a template for other policies.
  • user: A ForeignKey to the User model, binding the policy to a specific user if it isn’t a template. This field is nullable and can be left blank.
  • archived: A BooleanField indicating whether the policy has been archived.
  • current_version: A ForeignKey to the PolicyVersion model, binding the policy to a specific version. This field is nullable and can be left blank.

PolicyVersion

The PolicyVersion model represents a particular version of a Policy. It contains the following fields:

  • number: An IntegerField that keeps track of the policy’s version.
  • created_at: A DateTimeField indicating when the PolicyVersion was created.
  • policy: A ForeignKey to the Policy model.

Rule

The Rule model represents a step to be executed within a policy. It contains the following fields:

  • name: A CharField with a maximum length of 256 characters.
  • query_string: A TextField for storing the query to be run against the asset.
  • query_embedding: A TextField for storing the embedding of the query string. This field is nullable and can be left blank.
  • regex_test: A TextField for storing the regular expression to be run against the response documents.
  • severity: An IntegerField indicating the severity of the problem if the regex test finds results in the response documents.
  • policy: A ForeignKey to the Policy model, indicating the policy this rule belongs to.

Views

dashboard

The dashboard view renders the dashboard for the Policy application. It fetches a list of all available template policies and paginates the results, displaying a default of 25 policies per page.
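As an illustration of the 25-per-page behavior, the following plain-Python sketch slices a list into pages. The paginate helper is hypothetical. Django ships django.core.paginator.Paginator for this purpose, but this document does not state which mechanism the view uses.

```python
from math import ceil
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, per_page: int = 25) -> List[T]:
    """Return the items on a 1-indexed page, clamping out-of-range pages."""
    total_pages = max(1, ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return list(items[start:start + per_page])


# Sixty made-up template names spread over three pages.
templates = [f"policy-{n}" for n in range(1, 61)]
first_page = paginate(templates, 1)
last_page = paginate(templates, 3)
```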

create

The create view renders the form for creating a new policy.

Loading Policies from JSON Files

Policies can be loaded from JSON files stored in the fixtures/policy directory. All policies are automatically loaded when running the ./manage.py initialize_app command. To load a new policy added to the fixtures directory, use the following command:

./manage.py loaddata /policy/fixtures/policy/<new_plan>.json
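A fixture file for loaddata is a JSON list of objects with "model", "pk", and "fields" keys. The sketch below builds one in Python using the field names documented above. The model labels "policy.policy" and "policy.rule", the primary keys, and all field values are assumptions for illustration, not copied from the shipped fixtures.

```python
import json

fixture = [
    {
        "model": "policy.policy",  # assumed app label and model name
        "pk": 1000,  # arbitrary example key
        "fields": {
            "name": "Example policy",
            "description": "Flags strings that look like US Social Security numbers.",
            "is_template": True,
            "archived": False,
        },
    },
    {
        "model": "policy.rule",  # assumed app label and model name
        "pk": 1000,
        "fields": {
            "name": "SSN",
            "query_string": "social security number",
            "regex_test": r"\d{3}-\d{2}-\d{4}",
            "severity": 1,  # example value; the severity scale is not documented here
            "policy": 1000,  # foreign key to the policy above
        },
    },
]

# Serialize to the text that would be saved as a .json fixture file.
fixture_json = json.dumps(fixture, indent=2)
```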

Scan Application

Overview

The Scan application provides functionality for managing scans and their results. Scans are executed against one or more assets using selected policies, each of which consists of a set of rules. The results of the scan include the findings for each rule.

Models

Scan

The Scan model represents a single scan run against one or more assets. It contains the following fields:

  • started_at: A DateTimeField indicating the start time of the scan, automatically set when the scan is created.
  • finished_at: A DateTimeField indicating the completion time of the scan. This field is nullable.
  • description: A TextField for storing a description of the scan.
  • policies: A ManyToManyField to the Policy model.
  • celery_task_id: A CharField with a maximum length of 256 characters, used for storing the associated Celery task ID. This field is nullable.
  • user: A ForeignKey to the User model, indicating the user who initiated the scan. This field is nullable.
  • status: A CharField with a maximum length of 32 characters, storing the status of the scan. The options are ‘Queued’, ‘Running’, ‘Complete’, ‘Failed’, with ‘Queued’ being the default.

Additional methods of Scan model:

  • __str__: Returns the description of the scan.
  • progress: Computes the progress of the scan.
  • duration: Calculates the duration the scan has run.
  • asset_count: Fetches the number of scan assets associated with this scan.
  • findings_count: Fetches the number of findings associated with this scan.
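One plausible way to compute progress and duration is sketched below. It assumes that scan progress is the average of the per-asset progress percentages and that a scan still in flight measures its duration against the current time. Neither rule is confirmed by this document, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def scan_progress(asset_progress: List[int]) -> int:
    """Average the per-asset percentages; a scan with no assets is at 0."""
    if not asset_progress:
        return 0
    return sum(asset_progress) // len(asset_progress)


def scan_duration(
    started_at: datetime,
    finished_at: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> timedelta:
    """Elapsed time; falls back to `now` while finished_at is unset."""
    end = finished_at or now or datetime.now(timezone.utc)
    return end - started_at


started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
finished = started + timedelta(minutes=5)
```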

ScanAsset

The ScanAsset model represents a single asset that was scanned. It contains the following fields:

  • started_at: A DateTimeField indicating the start time of the scan of the asset, automatically set when the scan is created.
  • finished_at: A DateTimeField indicating the completion time of the scan of the asset. This field is nullable.
  • scan: A ForeignKey to the Scan model, with the related name ‘scan_run_assets’.
  • asset: A ForeignKey to the BaseAsset model.
  • celery_task_id: A CharField with a maximum length of 256 characters, used for storing the associated Celery task ID. This field is nullable.
  • progress: An IntegerField for storing the progress percentage of the scan of the asset, with a default of 0.

Additional methods of ScanAsset model:

  • __str__: Returns the name of the asset.
  • celery_task_status: Fetches the status of the Celery task associated with this scan asset.
  • celery_task_output: Fetches the output of the Celery task associated with this scan asset.

Result

The Result model represents a single result from a rule. It contains the following fields:

  • scan_asset: A ForeignKey to the ScanAsset model, with the related name ‘results’.
  • text: An EncryptedTextField for storing the raw text that was scanned.
  • rule: A ForeignKey to the Rule model.

Additional methods of Result model:

  • has_findings: Returns True if the result has findings, False otherwise.
  • findings_count: Returns the number of findings associated with this result.
  • __str__: Returns the rule name and scan ID as a string.

Finding

The Finding model identifies the location of a finding within a result. It contains the following fields:

  • result: A ForeignKey to the Result model, with the related name ‘findings’.
  • offset: An IntegerField indicating the starting position of the finding in the result text.
  • length: An IntegerField indicating the length of the finding in the result text.

Additional methods of Finding model:

  • __str__: Returns the offset and length as a string, separated by a colon.
  • text: Returns the text of the finding.
  • surrounding_text: Returns the text of the finding, with some surrounding context, highlighted with the ‘text-danger’ CSS class.
  • with_highlight: Returns the entire text searched by the finding’s rule, with the finding highlighted with the ‘bg-danger text-white’ CSS class.
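The offset and length fields are enough to recover and highlight a finding. The sketch below shows the idea with standalone functions. The 20-character default context window, the use of a span element, and the HTML escaping are assumptions, and the real methods live on the Finding model.

```python
from html import escape


def finding_text(text: str, offset: int, length: int) -> str:
    """Return the matched substring."""
    return text[offset:offset + length]


def surrounding_text(text: str, offset: int, length: int, context: int = 20) -> str:
    """Return the match plus up to `context` characters on each side."""
    start = max(0, offset - context)
    end = min(len(text), offset + length + context)
    before = escape(text[start:offset])
    match = escape(finding_text(text, offset, length))
    after = escape(text[offset + length:end])
    return f'{before}<span class="text-danger">{match}</span>{after}'


def with_highlight(text: str, offset: int, length: int) -> str:
    """Return the whole text with the match wrapped in a highlight span."""
    before = escape(text[:offset])
    match = escape(finding_text(text, offset, length))
    after = escape(text[offset + length:])
    return f'{before}<span class="bg-danger text-white">{match}</span>{after}'


sample = "Contact: 555-12-3456 on file"
```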

Tasks

scan_task

The scan_task is a Celery task that performs the scan process. It iterates through a policy’s rules and executes them against the asset. The results and findings are then persisted in the database.

Views

finding_detail

The finding_detail view renders the finding detail page. It retrieves a specific finding based on the provided finding_id.

result_detail

The result_detail view renders the scan result detail page. It retrieves specific results based on the provided scan_id, policy_id, and rule_id.

view_scan

The view_scan view renders the details for a particular scan based on the provided scan_id. It aggregates the results and findings of the scan, as well as the severity count for display on the scan detail page.
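The severity count can be pictured as a tally of findings grouped by the severity of the rule that produced them. This sketch assumes the count is taken per finding rather than per rule, which the document does not specify. The severity_counts helper and its input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def severity_counts(results: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Tally findings by severity from (rule severity, findings count) pairs."""
    counts: Counter = Counter()
    for severity, findings in results:
        if findings:  # results without findings do not contribute
            counts[severity] += findings
    return dict(counts)


# Four results: two at severity 1, one at severity 2, one at 3 with no findings.
example_counts = severity_counts([(1, 2), (3, 0), (1, 1), (2, 4)])
```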

create

The create view renders the scan creation page and handles the creation of new scans. When a new scan is created, it initiates a scan_task Celery task for each selected asset.
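The one-task-per-asset fan-out can be sketched as follows. The enqueue callable is a hypothetical stand-in for a Celery call such as scan_task.delay(...), and the arguments the real task takes are not documented here.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from uuid import uuid4


def queue_scan_tasks(
    scan_id: int,
    asset_ids: Iterable[int],
    enqueue: Callable[[int, int], str],
) -> Dict[int, str]:
    """Queue one task per asset and return a mapping of asset ID to task ID."""
    return {asset_id: enqueue(scan_id, asset_id) for asset_id in asset_ids}


queued: List[Tuple[int, int, str]] = []


def fake_enqueue(scan_id: int, asset_id: int) -> str:
    # Stand-in for the task queue: records the call and returns a task ID.
    task_id = str(uuid4())
    queued.append((scan_id, asset_id, task_id))
    return task_id


task_ids = queue_scan_tasks(scan_id=1, asset_ids=[10, 11, 12], enqueue=fake_enqueue)
```

Keeping the returned task IDs per asset mirrors the celery_task_id field on ScanAsset, which lets the status views look up each task later.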

dashboard

The dashboard view renders the scan dashboard. It displays the user’s scans, paginated with a default of 25 scans per page.

status

The status view returns the status of a scan job. It responds with the Celery task status and the progress percentage of the scan.

asset_status

The asset_status view returns the status of a particular scan asset job. It responds with the Celery task status and the progress percentage of the scan asset.

findings_count

The findings_count view returns the number of findings associated with a particular scan. The response is the count of findings for the scan.

Asset Application

Overview

The Asset application provides functionality for managing and interfacing with various vector database services used for storing and searching document embeddings. The supported asset types include Mantium, Redis, and Pinecone.

Models

BaseAsset

The BaseAsset model is a polymorphic base class that all asset models inherit from. It contains the following fields:

  • name: A CharField with a maximum length of 128 characters.
  • user: A ForeignKey to the User model. On delete, it follows the cascade strategy. This field is nullable.
  • html_logo: A string attribute that defines the path to the asset’s logo. The path should be within the static directory. It defaults to None.
  • REQUIRES_EMBEDDINGS: A boolean attribute that indicates whether the model requires embeddings. It defaults to False.

Each derived asset model should implement the search(), test_connection(), and logo_url() methods.

The logo_url() method fetches the logo URL for the asset.
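The contract between the base class and its derived asset types can be sketched with an abstract base class. This is plain Python rather than a Django polymorphic model. The method signatures, the default logo_url() behavior, and the InMemoryAsset class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseAssetSketch(ABC):
    """Illustrative stand-in for BaseAsset (not a Django model)."""

    html_logo: Optional[str] = None
    REQUIRES_EMBEDDINGS: bool = False

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def search(self, query: str, max_results: int) -> List[str]:
        """Run a query against the asset and return matching documents."""

    @abstractmethod
    def test_connection(self) -> bool:
        """Return True if the asset can be reached."""

    def logo_url(self) -> Optional[str]:
        # Assumed behavior: expose the configured logo path unchanged.
        return self.html_logo


class InMemoryAsset(BaseAssetSketch):
    """Hypothetical asset that searches a list of strings."""

    html_logo = "asset/in-memory-logo.png"

    def __init__(self, name: str, documents: List[str]) -> None:
        super().__init__(name)
        self.documents = documents

    def search(self, query: str, max_results: int) -> List[str]:
        hits = [doc for doc in self.documents if query.lower() in doc.lower()]
        return hits[:max_results]

    def test_connection(self) -> bool:
        return True


asset = InMemoryAsset("demo", ["Alpha key", "beta"])
```

Because search() and test_connection() are abstract, instantiating the base class directly raises TypeError, which makes a missing implementation fail early.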

Derived Asset Models

MantiumAsset

The MantiumAsset model represents a Mantium asset. It contains the following fields:

  • app_id: A CharField with a maximum length of 256 characters.
  • client_id: A CharField with a maximum length of 256 characters.
  • client_secret: An EncryptedCharField with a maximum length of 256 characters.
  • top_k: An IntegerField with a default value of 100.
  • html_logo: A string field that represents the path to the logo of the Mantium asset. The path should be within the static directory.
  • html_name: A string field that stores the name of the Mantium asset.
  • html_description: A string field that stores a description of the Mantium asset.

The search() method performs a vector database search against the Mantium asset.

PineconeAsset

The PineconeAsset model represents a Pinecone asset. It contains the following fields:

  • api_key: An EncryptedCharField with a maximum length of 256 characters. This field is editable.
  • environment: A CharField with a maximum length of 256 characters. This field is nullable and can be left blank.
  • index_name: A CharField with a maximum length of 256 characters. This field is nullable and can be left blank.
  • project_name: A CharField with a maximum length of 256 characters. This field is nullable and can be left blank.
  • metadata_text_field: A CharField with a maximum length of 256 characters. This field is not nullable.
  • embedding_model: A CharField with a default value of ‘text-embedding-ada-002’ and a maximum length of 256 characters.
  • embedding_model_service: A CharField with a default value of ‘OpenAI’ and a maximum length of 256 characters.
  • html_logo: A string field that represents the path to the logo of the Pinecone asset. The path should be within the static directory.
  • html_name: A string field that stores the name of the Pinecone asset.
  • html_description: A string field that stores a description of the Pinecone asset.

The search() method performs a search against the Pinecone asset.

RedisAsset

The RedisAsset model represents a Redis asset. It contains the following fields:

  • host: A CharField with a maximum length of 1048 characters.
  • port: A PositiveIntegerField.
  • database_name: A CharField with a maximum length of 256 characters.
  • username: A CharField with a maximum length of 256 characters.
  • password: A CharField with a maximum length of 2048 characters. This field is nullable and can be left blank.
  • index_name: A CharField with a maximum length of 256 characters.
  • text_field: A CharField with a maximum length of 256 characters.
  • embedding_field: A CharField with a maximum length of 256 characters.
  • embedding_model: A CharField with a default value of ‘text-embedding-ada-002’ and a maximum length of 256 characters.
  • embedding_model_service: A CharField with a default value of ‘OpenAI’ and a maximum length of 256 characters.
  • html_logo: A string field that represents the path to the logo of the Redis asset. The path should be within the static directory.
  • html_name: A string field that stores the name of the Redis asset.
  • html_description: A string field that stores a description of the Redis asset.

The search() method performs a search against the Redis asset.

Views

dashboard

The dashboard view renders the asset dashboard. It displays the user’s assets, paginated with a default of 25 assets per page.

create

The create view renders the asset creation page and handles the creation of new assets.

ping

The ping view tests the connection to a RedisAsset database using the test_connection() function.

delete

The delete view deletes an asset from the database.

Providers

These files contain the logic for interfacing with each asset type.

Account Application

Overview

The Account application provides user authentication and account management features. It allows users to sign up, log in, and update their profile information. The user’s profile includes a custom field for storing an OpenAI API key, which is hashed before being saved to the database.

Models

Profile

The Profile model is a custom user profile model that extends Django’s built-in User model with a one-to-one relationship. It contains the following field:

  • openai_api_key: A CharField with a maximum length of 100 characters. This field is optional and can be left blank. It is used to store the user’s OpenAI API key, which is hashed before being saved to the database.

Forms

ProfileForm

ProfileForm is a ModelForm for the Profile model. It includes a custom method clean_openai_key() to hash the OpenAI API key before saving it to the database.

LoginForm

LoginForm is a simple form for handling user logins. It contains two CharFields, username and password, both with a maximum length of 256 characters.

SignupForm

SignupForm is a form for handling user registration. It includes the following fields:

  • username: A CharField with a maximum length of 256 characters.
  • email: An EmailField with a maximum length of 256 characters.
  • password1: A CharField with a maximum length of 256 characters, displayed as a password input field.
  • password2: A CharField with a maximum length of 256 characters, displayed as a password input field. This field is used for password confirmation.
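The checks implied by these fields can be sketched without Django as follows. Treating every field as required and the exact error messages are assumptions. The real form relies on Django’s form validation.

```python
from typing import Dict, List

MAX_LENGTH = 256  # maximum length documented for every SignupForm field


def validate_signup(data: Dict[str, str]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for name in ("username", "email", "password1", "password2"):
        value = data.get(name, "")
        if not value:
            errors.append(f"{name} is required")
        elif len(value) > MAX_LENGTH:
            errors.append(f"{name} exceeds {MAX_LENGTH} characters")
    password1 = data.get("password1")
    password2 = data.get("password2")
    if password1 and password2 and password1 != password2:
        errors.append("passwords do not match")
    return errors


valid_form = {
    "username": "alice",
    "email": "alice@example.com",
    "password1": "s3cret!",
    "password2": "s3cret!",
}
```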

Views

profile

The profile view handles rendering the user’s profile page and processing updates to the profile. If the request method is POST, the view updates the user’s profile with the submitted data. If the request method is GET, the view renders the profile page with the user’s current profile information.

signup

The signup view handles rendering the user registration page and processing new user registrations. If the request method is POST, the view validates the submitted data and creates a new user account and corresponding profile if the data is valid. If the request method is GET, the view renders the registration page.

login_view

The login_view handles rendering the login page and processing user logins. If there are no users in the database, the view redirects to an installation page. If the request method is POST, the view authenticates the user and logs them in if the provided credentials are valid. If the request method is GET, the view renders the login page.
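The branching in login_view can be summarized as a small decision function. The return labels are hypothetical placeholders for the redirects and templates the real view uses, and the handling of invalid credentials is an assumption.

```python
def login_flow(method: str, user_count: int, credentials_valid: bool = False) -> str:
    """Return a label for the action the login view would take."""
    if user_count == 0:
        # No users exist yet, so send the visitor to the installation page.
        return "redirect:install"
    if method == "POST":
        if credentials_valid:
            return "redirect:home"
        # Assumed behavior: show the login page again with an error.
        return "render:login_with_error"
    return "render:login"
```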