Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm. #86761
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon. Please see the contribution instructions for more information. |
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
#31109 |
I expect that this PR would fix #31407 |
AsciiStr I've updated AsciiStr to use the slice primitive internally, and also deduplicated a lot of code. Name Changes I've also updated numerous names for methods in common.rs to better reflect safety/unsafety. For example, Testing A few UI regression tests and core regression tests have been added, to address #31109 and #31407, while a few UI tests have been deleted (since they are no longer relevant). Deleted UI Tests Added UI Tests // run-pass
// ignore-tidy-linelength
// Regression test for #31109 and #31407.
// This test should compile and produce expected output.
pub fn main() {
let n: f64 = 0.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333;
assert_eq!(n.to_bits(), 0x3fd5555555555555);
let n: f64 = 1234567890123456789012345678901234567890e-340;
assert_eq!(n.to_bits(), 0x1752a64e34ba0d3);
} Added Core Tests #[test]
fn issue31109() {
// Regression test for #31109.
// Ensure the test produces a valid float with the expected bit pattern.
let p = dec2flt::<f64>(
"0.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
);
assert_eq!(p.map(|x| x.to_bits()), Ok(0x3fd5555555555555));
}
#[test]
fn issue31407() {
// Regression test for #31407.
// Ensure the test produces a valid float with the expected bit pattern.
let p = dec2flt::<f64>("1234567890123456789012345678901234567890e-340");
assert_eq!(p.map(|x| x.to_bits()), Ok(0x1752a64e34ba0d3));
} A few things, however, I am unsure about, and maybe @est31, @estebank, or @jyn514 might help enlighten me on this: --- a/compiler/rustc_mir_build/src/thir/cx/mod.rs
+++ b/compiler/rustc_mir_build/src/thir/cx/mod.rs
@@ -67,12 +67,6 @@ fn new(tcx: TyCtxt<'tcx>, def: ty::WithOptConstParam<LocalDefId>) -> Cx<'tcx> {
match self.tcx.at(sp).lit_to_const(LitToConstInput { lit, ty, neg }) {
Ok(c) => c,
- Err(LitToConstError::UnparseableFloat) => {
- // FIXME(#31407) this is only necessary because float parsing is buggy
- self.tcx.sess.span_err(sp, "could not evaluate float literal (see issue #31407)");
- // create a dummy value and continue compiling
- self.tcx.const_error(ty)
- }
--- a/compiler/rustc_mir_build/src/thir/constant.rs
+++ b/compiler/rustc_mir_build/src/thir/constant.rs
@@ -47,7 +47,7 @@
trunc(if neg { (*n as i128).overflowing_neg().0 as u128 } else { *n })?
}
(ast::LitKind::Float(n, _), ty::Float(fty)) => {
- parse_float(*n, *fty, neg).map_err(|_| LitToConstError::UnparseableFloat)?
+ parse_float(*n, *fty, neg).map_err(|_| LitToConstError::Reported)?
}
--- a/compiler/rustc_mir_build/src/thir/pattern/mod.rs
+++ b/compiler/rustc_mir_build/src/thir/pattern/mod.rs
@@ -563,10 +562,6 @@ fn lower_lit(&mut self, expr: &'tcx hir::Expr<'tcx>) -> PatKind<'tcx> {
LitToConstInput { lit: &lit.node, ty: self.typeck_results.expr_ty(expr), neg };
match self.tcx.at(expr.span).lit_to_const(lit_input) {
Ok(val) => *self.const_to_pat(val, expr.hir_id, lit.span, false).kind,
- Err(LitToConstError::UnparseableFloat) => {
- self.errors.push(PatternError::FloatBug);
- PatKind::Wild
- }
Err(LitToConstError::Reported) => PatKind::Wild,
Err(LitToConstError::TypeError) => bug!("lower_lit: had type error"),
} Likewise, --- a/compiler/rustc_mir_build/src/thir/pattern/check_match.rs
+++ b/compiler/rustc_mir_build/src/thir/pattern/check_match.rs
@@ -84,7 +84,7 @@ fn visit_param(&mut self, param: &'tcx hir::Param<'tcx>) {
}
impl PatCtxt<'_, '_> {
- fn report_inlining_errors(&self, pat_span: Span) {
+ fn report_inlining_errors(&self) {
for error in &self.errors {
match *error {
PatternError::StaticInPattern(span) => {
@@ -96,14 +96,6 @@ fn report_inlining_errors(&self, pat_span: Span) {
PatternError::ConstParamInPattern(span) => {
self.span_e0158(span, "const parameters cannot be referenced in patterns")
}
- PatternError::FloatBug => {
- // FIXME(#31407) this is only necessary because float parsing is buggy
- rustc_middle::mir::interpret::struct_error(
- self.tcx.at(pat_span),
- "could not evaluate float literal (see issue #31407)",
- )
- .emit();
- }
--- a/compiler/rustc_mir_build/src/thir/pattern/mod.rs
+++ b/compiler/rustc_mir_build/src/thir/pattern/mod.rs
@@ -31,7 +31,6 @@
AssocConstInPattern(Span),
ConstParamInPattern(Span),
StaticInPattern(Span),
- FloatBug,
NonConstPath(Span),
} The major question I have is:
Questions 2 and 3 refers to parse_float here. Update:
|
@Alexhuszagh fixing bugs does not require approval from t-lang. I don't know about the other questions but it's ok to make changes as follow-ups if this one gets too large. |
Ok, if there's any other changes required (or this PR is too large to adequately review), I'm more than happy to break it into multiple PRs. Let me know, and I'll gladly make the changes as is seen fit. My goal here is to make (what is obviously a fairly large PR with some complex logic) as easy and simple to understand (and audit) as is possible. |
This comment has been hidden.
This comment has been hidden.
The section in the PR description should be updated to use github's issue closing keywords for the fixed issues. |
I got about halfway through and ran out of time; I will probably have similar comments for dec2flt::slow
though.
In general, you have a lot of unsafe here, mostly related to indexing. I tried to review it all but I'm sure I missed some. Have you benchmarked it without the unsafe indexing? It would make me feel better if this had more bounds checks ...
You should be able to fix the test failures with |
I have not, I'm going to try to remove a lot of the unsafe now, and then benchmark it vs. the original implementation. I was trying to remove some unsafe, and then work on making the rest easy to audit, but obviously, that has its own limitations (fast-float-rust is littered with unsafe code that doesn't need to be unsafe and deceptively marked as safe, but the code is well fuzzed). I'm going to do some extensive assembly checks to ensure with decent optimization levels that everything looks similar, and then check benchmarks to ensure that in the macro sense, nothing changes. |
In which sense did the old code rely on UB? |
I meant the old code in fast-float-rust, not in Rust core. There was a function or two (with PRs currently waiting approval) that rely on UB. I've updated this in case it wasn't clear. |
Oh I see, thanks. Once this lands, Miri will also test it. You can try it yourself with the scripts and instructions at https://github.com/RalfJung/miri-test-libstd. If you looked at the float parsing tests i libcore you probably saw the |
@bors r+ Approving for merging now, but still putting this on blast at @rust-lang/compiler in case anyone has any objections and they wish to stop this before merge, or discuss on the next meeting (next thursday) and potentially revert then. |
|
@Alexhuszagh I don't suppose you have any insight on whether Rust could be doing any better in the other direction, from floats into strings? cc #52811 |
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm. # Summary Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust). Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s. In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively. Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count. ```rust // Issue in Rust's dec2fly. "2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid }) ``` # Solution This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core. **Documentation** Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows: ```rust // Round-to-even only happens for negative values of q // when q ≥ −4 in the 64-bit case and when q ≥ −17 in // the 32-bitcase. // // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10. // // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64 // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case) // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64 // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11 // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase). // // Thus we have that we only need to round ties to even when // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10] // (in the 32-bit case). In both cases,the power of five(5^|q|) // fits in a 64-bit word. const MIN_EXPONENT_ROUND_TO_EVEN: i32; const MAX_EXPONENT_ROUND_TO_EVEN: i32; ``` This ensures maintainability of the code base. **Improvements for Disguised Fast-Path Cases** The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation. However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in rust-lang#85198. **Digit Parsing Improvements** Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems. **Unsafe Changes** Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality. ```rust impl AsciiStr { #[inline] pub fn step_by(&mut self, n: usize) -> &mut Self { unsafe { self.ptr = self.ptr.add(n) }; self } } ... #[inline] fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { // the first character is 'e'/'E' and scientific mode is enabled let start = *s; s.step(); ... } ``` The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees. ```rust impl AsciiStr { /// Advance the view by n, advancing it in-place to (n..). pub unsafe fn step_by(&mut self, n: usize) -> &mut Self { // SAFETY: same as step_by, safe as long n is less than the buffer length self.ptr = unsafe { self.ptr.add(n) }; self } } ... /// Parse the scientific notation component of a float. fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { let start = *s; // SAFETY: the first character is 'e'/'E' and scientific mode is enabled unsafe { s.step(); } ... } ``` This allows us to trivially demonstrate the new implementation of dec2flt is safe. **Inline Annotations Have Been Removed** In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)). **Fixed Correctness Tests** Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched. **Undefined Behavior** An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows: ```rust #[inline] pub fn check_len(&self, n: usize) -> bool { unsafe { self.ptr.add(n) <= self.end } } ``` And the new implementation is as follows: ```rust /// Check if the slice at least `n` length. fn check_len(&self, n: usize) -> bool { n <= self.as_ref().len() } ``` Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29). **Inferring Binary Exponents** Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8). # Code Size The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu. **new** Using rustc version 1.55.0-dev. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|400k|300K 1|396k|292K 2|392k|292K 3|392k|296K s|396k|292K z|396k|292K **old** Using rustc version 1.53.0-nightly. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|3.2M|304K 1|3.2M|292K 2|3.1M|284K 3|3.1M|284K s|3.1M|284K z|3.1M|284K # Correctness The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats. # Issues Addressed This will fix and close the following issues: - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15
I do have some insight, but Rust is already decently good (IIRC, it uses Grisu3, a decent algorithm by default, and for precision formatting, it uses Dragon4, which is slower but always rounds correctly). The Ryu algorithm, which Dtolnay has an implementation of, and an even faster one (the Schubfach algorithm). I'd be happy to write an implementation (I'm already planning to), but this will take a while. A fast implementation based on it exists here. |
Implementation is based off fast-float-rust, with a few notable changes. - Some unsafe methods have been removed. - Safe methods with inherently unsafe functionality have been removed. - All unsafe functionality is documented and provably safe. - Extensive documentation has been added for simpler maintenance. - Inline annotations on internal routines has been removed. - Fixed Python errors in src/etc/test-float-parse/runtests.py. - Updated test-float-parse to be a library, to avoid missing rand dependency. - Added regression tests for rust-lang#31109 and rust-lang#31407 in core tests. - Added regression tests for rust-lang#31109 and rust-lang#31407 in ui tests. - Use the existing slice primitive to simplify shared dec2flt methods - Remove Miri ignores from dec2flt, due to faster parsing times. - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15
@bors r=estebank |
|
|
Just a heads up: This PR was flagged as causing a 3.2% regression to the incr-patched compile-times on style-servo-debug. I do see that a perf run was done on this PR locally which did not predict any regression to style-servo-debug. So I'm inclined to assume that the regression was just noise. But I did want to make a note of it, nonetheless. (And of course this PR did also have a bunch of performance improvements, which were predicted from the perf run done on the PR itself.) |
@pnkfelix those regressions are from |
Alexhuszagh commentedJun 30, 2021
•
edited
Summary
Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x slower than external libraries such as lexical and fast-float-rust.
Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the canada dataset, 10x faster than Rust's 130 MiB/s.
In addition, edge-cases in Rust's dec2flt algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow AlgorithmM and Bellepheron, which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.
Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the
Big32x40
, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.Solution
This pull request implements the Eisel-Lemire algorithm, modified from fast-float-rust (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.
Documentation
Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:
This ensures maintainability of the code base.
Improvements for Disguised Fast-Path Cases
The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range
[-22, 22]
(for an f64). This is similar to the existing dec2flt implementation.However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as
1.23e25
. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in #85198.Digit Parsing Improvements
Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length here. This leads to significant performance improvements, and is implemented for both big and little-endian systems.
Unsafe Changes
Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.
The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.
This allows us to trivially demonstrate the new implementation of dec2flt is safe.
Inline Annotations Have Been Removed
In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact performance.
Fixed Correctness Tests
Numerous compile errors in
src/etc/test-float-parse
were present, due to deprecation oftime.clock()
, as well as the crate dependencies withrand
. The tests have therefore been reworked as a crate, and any errors inruntests.py
have been patched.Undefined Behavior
An implementation of
check_len
which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:And the new implementation is as follows:
Note that this has since been fixed in fast-float-rust.
Inferring Binary Exponents
Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store 611 i16s.
Code Size
The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is significantly smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.
new
Using rustc version 1.55.0-dev.
old
Using rustc version 1.53.0-nightly.
Correctness
The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float tests and Hrvoje Abraham strtod_tests. Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.
Issues Addressed
This will fix and close the following issues:
The text was updated successfully, but these errors were encountered: