Skip to content

Add optional support for digit separators and cpp prefixes#369

Open
zaewc wants to merge 1 commit into
fastfloat:mainfrom
zaewc:main
Open

Add optional support for digit separators and cpp prefixes#369
zaewc wants to merge 1 commit into
fastfloat:mainfrom
zaewc:main

Conversation

@zaewc

@zaewc zaewc commented Feb 8, 2026

Copy link
Copy Markdown
Contributor

Add optional support in from_chars_advanced to:

  • skip a configurable digit separator (e.g. ')
  • skip standard cpp prefixes (0x/0X, 0b/0B) before parsing (decimal-only; no hex/binary floats)

Resolves #124

no measurable regression on standard inputs (canada.txt / mesh.txt).

@lemire

lemire commented Feb 15, 2026

Copy link
Copy Markdown
Member

Thanks, to be reviewed.

@lemire

lemire commented Jun 6, 2026

Copy link
Copy Markdown
Member

I rebased the PR here : https://github.com/fastfloat/fast_float/tree/pr-369-rebase

I get that this PR slows down the performance by 5% to 10%.

We could hack things so that the it is performance neutral.

But I am not convinced. I don't know who needs this feature. I am concerned about adding a function that nobody would use but that we would need to maintain.

zaewc added a commit to zaewc/fast_float that referenced this pull request Jun 18, 2026
Re-implements the optional digit-separator and base-prefix parsing
(originally PR fastfloat#369) on top of the current store_spans hot-path
architecture, with the goal of zero overhead when the features are not
used.

Key changes vs. the original PR:

- has_separator is now a *compile-time* template parameter on
  parse_number_string, dispatched once (from_chars_advanced ->
  parse_number_string_options) based on options.digit_separator. The
  has_separator==false instantiation that every default caller uses is
  byte-for-byte the separator-free parser: no separator comparison ever
  enters a digit loop and the SIMD eight-digit fast path is preserved.
  The separator-aware code lives only in the cold true instantiation.

- The store_spans no-span hot path (added to main after the original PR
  was branched) is preserved.

- parse_options_t fields are ordered so the two new single-byte fields
  (digit_separator, format_options) fall into the existing padding for
  UC == char. sizeof(parse_options_t<char>) stays 16 bytes, so the
  struct is still register-passed and the call boundary is unchanged.

Result: for the default (no separator, no prefix) path, the generated
assembly of from_chars<double> is identical to main, and benchmarks show
no measurable regression (the original PR was 5-10% slower).

Adds basictest coverage for digit separators (including the >19-digit
overflow re-scan paths) and prefix skipping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-implements the optional digit-separator and base-prefix parsing
(originally PR fastfloat#369) on top of the current store_spans hot-path
architecture, with the goal of zero overhead when the features are not
used.

Key changes vs. the original PR:

- has_separator is now a *compile-time* template parameter on
  parse_number_string, dispatched once (from_chars_advanced ->
  parse_number_string_options) based on options.digit_separator. The
  has_separator==false instantiation that every default caller uses is
  byte-for-byte the separator-free parser: no separator comparison ever
  enters a digit loop and the SIMD eight-digit fast path is preserved.
  The separator-aware code lives only in the cold true instantiation.

- The store_spans no-span hot path (added to main after the original PR
  was branched) is preserved.

- parse_options_t fields are ordered so the two new single-byte fields
  (digit_separator, format_options) fall into the existing padding for
  UC == char. sizeof(parse_options_t<char>) stays 16 bytes, so the
  struct is still register-passed and the call boundary is unchanged.

Result: for the default (no separator, no prefix) path, the generated
assembly of from_chars<double> is identical to main, and benchmarks show
no measurable regression (the original PR was 5-10% slower).

Adds basictest coverage for digit separators (including the >19-digit
overflow re-scan paths) and prefix skipping.
@zaewc

zaewc commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@lemire

I think the useful precedent here is rust-lexical. Its float parser uses an Eisel-Lemire implementation, so this is not a random general-purpose parser with unrelated tradeoffs.

lexical explicitly supports this class of use case: configurable digit separators, base prefixes/radices, and predefined number formats for data formats and language literal grammars such as JSON, TOML, YAML, Rust, Python, C++, C#, FORTRAN, COBOL, and others.

“who would use this”

-> users parsing numeric literals or config/data formats whose grammar is intentionally wider than std::from_chars.

@zaewc

zaewc commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@lemire

The performance concern is also handled in the same spirit. In lexical, the number format is a compile-time format specification; in this PR, the separator path is similarly isolated behind a compile-time has_separator template parameter. The default has_separator == false instantiation used by normal callers does not put a separator comparison in the digit loop, preserves the existing SIMD fast path, and in the current revision generates the same assembly for the default from_chars<double> path as main.

So I agree this feature should not regress the default parser. My point is that the current design confines the extra behavior to an opt-in cold instantiation, while keeping the standard-number hot path unchanged. That seems to address both concerns: no runtime cost for default users, and a bounded maintenance surface for users who actually need language/config literal parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature?: ignored characters `'_ / allowing prefixs 0x and 0b etc...

2 participants