
Panic with max_extracted_len #8

@CodedNil

Extraction with rs-trafilatura 0.2.2 panics with the following message:

      21:04:43 [server] thread 'tokio-rt-worker' (104310) panicked at /home/dan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rs-trafilatura-0.2.2/src/extract.rs:1115:29:
      21:04:43 [server] assertion failed: self.is_char_boundary(new_len)

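The code below can truncate in the middle of a multi-byte UTF-8 character. String::truncate requires the new length to fall on a char boundary and panics otherwise: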

      // Apply maximum length limit
      if result.content_text.len() > options.max_extracted_len {
          result.content_text.truncate(options.max_extracted_len);
          result.warnings.push(format!(
              "Content truncated to max length: {}",
              options.max_extracted_len
          ));
      }
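A minimal standard-library-only reproduction: "é" takes two bytes in UTF-8, so byte index 2 of "héllo" falls inside it, and truncating there trips the same is_char_boundary assertion. The helper truncate_panics is hypothetical. It only wraps the call in std::panic::catch_unwind so the demo can report the panic instead of terminating, although the default panic hook still prints the message to stderr.

```rust
use std::panic;

/// Hypothetical helper: returns true if `String::truncate(len)` panics for this input.
fn truncate_panics(text: &str, len: usize) -> bool {
    let owned = text.to_string();
    panic::catch_unwind(move || {
        let mut s = owned;
        s.truncate(len); // panics if `len` is not on a char boundary
    })
    .is_err()
}

fn main() {
    let text = "héllo"; // "é" (U+00E9) is encoded as two bytes
    assert_eq!(text.len(), 6);
    assert!(!text.is_char_boundary(2)); // byte 2 is the second byte of "é"
    assert!(truncate_panics(text, 2)); // same failure as in the report
    assert!(!truncate_panics(text, 3)); // "hé" ends on a boundary
}
```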

This can be fixed with str::floor_char_boundary, stabilized in Rust 1.91, or with another approach if you want to keep supporting older rustc versions:

      result.content_text.truncate(result.content_text.floor_char_boundary(options.max_extracted_len));
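For older rustc, one possible fallback walks backwards from the limit with str::is_char_boundary until it reaches a boundary. Index 0 is always a boundary, so the loop terminates, and because a UTF-8 character is at most 4 bytes it steps back at most 3 times. This is a sketch assuming the limit is measured in bytes, as max_extracted_len is compared against String::len in the code above. floor_char_boundary and truncate_to_boundary here are hypothetical free functions, not part of rs-trafilatura or the standard library.

```rust
/// Hypothetical stand-in for `str::floor_char_boundary` (stable since Rust 1.91):
/// the largest byte index <= `index` that lies on a char boundary of `s`.
fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Step back into the start of the current character (at most 3 steps).
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical helper: truncate `text` to at most `max_len` bytes without
/// splitting a character. Returns true if anything was removed.
fn truncate_to_boundary(text: &mut String, max_len: usize) -> bool {
    if text.len() <= max_len {
        return false;
    }
    let cut = floor_char_boundary(text.as_str(), max_len);
    text.truncate(cut);
    true
}

fn main() {
    let mut s = String::from("héllo");
    assert!(truncate_to_boundary(&mut s, 2)); // byte 2 is inside "é"
    assert_eq!(s, "h");
}
```

At the call site, the existing check could become truncate_to_boundary(&mut result.content_text, options.max_extracted_len), pushing the warning only when it returns true.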
