You should absolutely not need to handle ISO 8601 and RFC 3339 manually. They are supported via the Display and FromStr trait implementations on every main type in Jiff (Span, Zoned, Timestamp, civil::DateTime, civil::Date and civil::Time). It's technically an implementation of a mixture of ISO 8601, RFC 3339 and RFC 9557, but the grammar is specified precisely by Temporal. See: https://docs.rs/jiff/latest/jiff/fmt/temporal/index.html
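To make "supported via the Display and FromStr trait implementations" concrete, here is a minimal std-only sketch of the pattern. ToyDate is a hypothetical stand-in and is not Jiff's civil::Date: it only accepts the plain YYYY-MM-DD form and does no real calendar validation. The point is just that once a type implements both traits, callers get `.parse()` and `.to_string()` without writing any format code themselves.

```rust
use std::fmt;
use std::str::FromStr;

/// A toy calendar date. Hypothetical, and far simpler than Jiff's `civil::Date`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyDate {
    year: u16,
    month: u8,
    day: u8,
}

impl FromStr for ToyDate {
    type Err = String;

    // Accepts only `YYYY-MM-DD`. No leap-year or month-length checks.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split('-').collect();
        let widths_ok = parts.len() == 3
            && parts[0].len() == 4
            && parts[1].len() == 2
            && parts[2].len() == 2;
        let digits_ok = parts.iter().all(|p| p.bytes().all(|b| b.is_ascii_digit()));
        if !widths_ok || !digits_ok {
            return Err(format!("expected YYYY-MM-DD, got {s:?}"));
        }
        let year: u16 = parts[0].parse().map_err(|e| format!("bad year: {e}"))?;
        let month: u8 = parts[1].parse().map_err(|e| format!("bad month: {e}"))?;
        let day: u8 = parts[2].parse().map_err(|e| format!("bad day: {e}"))?;
        if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
            return Err(format!("month or day out of range in {s:?}"));
        }
        Ok(ToyDate { year, month, day })
    }
}

impl fmt::Display for ToyDate {
    // Prints the same zero-padded form that `from_str` accepts.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:04}-{:02}-{:02}", self.year, self.month, self.day)
    }
}
```

With this in place, `"2024-07-11".parse::<ToyDate>()` yields a value whose `to_string()` gives the same text back. With Jiff the calls have the same shape, just with its own types as the parse target.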
How are you doing a date/time library without platform dependencies like libc or windows-sys? Are you rolling your own bindings in order to get the local time zone? (Or perhaps you aren't doing that at all.)
Ah gotcha, thanks!
Disclosure: I'm the author of the memchr crate.
You mention the memchr crate, but you don't seem to have benchmarked it. Instead, you benchmarked the needle crate (last updated 7 years ago). Can you explain a bit more about your methodology?
The memchr crate in particular doesn't just use Rabin-Karp. It also uses Two-Way. And SIMD (with support for x86-64, aarch64 and wasm32).
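For readers who haven't seen Rabin-Karp, here is a minimal std-only sketch of the rolling-hash idea. To be clear, this is not the memchr crate's implementation: the function name, the hash base and the 32-bit wrapping hash are all assumptions picked for illustration, and it has none of the Two-Way or SIMD machinery mentioned above.

```rust
/// Returns the byte offset of the first occurrence of `needle` in `haystack`.
/// Hypothetical helper illustrating Rabin-Karp; not taken from any crate.
fn rabin_karp_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Arbitrary base chosen for this sketch; arithmetic wraps modulo 2^32.
    const BASE: u32 = 257;

    let n = needle.len();
    if n == 0 {
        return Some(0);
    }
    if n > haystack.len() {
        return None;
    }

    // BASE^(n-1), used to remove the outgoing byte from the window hash.
    let mut high = 1u32;
    for _ in 1..n {
        high = high.wrapping_mul(BASE);
    }

    // Polynomial hash of a byte slice.
    let hash = |bytes: &[u8]| {
        bytes
            .iter()
            .fold(0u32, |h, &b| h.wrapping_mul(BASE).wrapping_add(b as u32))
    };

    let needle_hash = hash(needle);
    let mut window_hash = hash(&haystack[..n]);
    let mut start = 0;
    loop {
        // A hash match is only a candidate; confirm it byte for byte.
        if window_hash == needle_hash && &haystack[start..start + n] == needle {
            return Some(start);
        }
        if start + n >= haystack.len() {
            return None;
        }
        // Slide the window: drop haystack[start], append haystack[start + n].
        window_hash = window_hash
            .wrapping_sub(high.wrapping_mul(haystack[start] as u32))
            .wrapping_mul(BASE)
            .wrapping_add(haystack[start + n] as u32);
        start += 1;
    }
}
```

Each step updates the hash in constant time instead of rehashing the whole window, which is the entire trick. A byte-at-a-time loop like this is still a very different thing from a vectorized search.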
Both Perl and Python use backtracking regex engines and are thus susceptible to similar problems as discussed in the OP.
Cross-posting from reddit:
The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 mac mini while searching a 5.5GB file.
This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):
$ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055
real 8.208
user 7.731
sys 0.467
maxmem 5600 MB
faults 191
$ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055
real 1.137
user 0.695
sys 0.430
maxmem 5904 MB
faults 203
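As a rough illustration of "you can enumerate them", here is a std-only sketch that lists every ASCII case variant of a short literal. The function name is hypothetical, and this is not how the regex engine extracts literals internally. It only shows why the set stays manageable for a short prefix and blows up for a long one: each cased letter doubles the count.

```rust
/// Enumerates every ASCII case variant of `literal`.
/// Hypothetical helper for illustration; ignores Unicode case folding.
fn ascii_case_variants(literal: &str) -> Vec<String> {
    let mut variants = vec![String::new()];
    for ch in literal.chars() {
        let lower = ch.to_ascii_lowercase();
        let upper = ch.to_ascii_uppercase();
        let mut next = Vec::with_capacity(variants.len() * 2);
        for prefix in &variants {
            next.push(format!("{prefix}{upper}"));
            // Characters without case (digits, spaces) contribute one variant.
            if lower != upper {
                next.push(format!("{prefix}{lower}"));
            }
        }
        variants = next;
    }
    variants
}
```

For a three-letter prefix like "She" this gives 8 literals, while all 15 characters of 'Sherlock Holmes' would give 2^14 (the space has no case). That is why only a short prefix is a sensible candidate for this kind of multi-literal search.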
And of course, using multiple literals explicitly also uses this optimization:
$ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804
real 9.055
user 8.580
sys 0.474
maxmem 4912 MB
faults 11
$ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804
real 1.121
user 0.697
sys 0.422
maxmem 4832 MB
faults 11
And it doesn't just work for prefixes, it works for inner literals too:
$ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
773
real 9.065
user 8.586
sys 0.477
maxmem 6384 MB
faults 11
$ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
773
real 1.124
user 0.702
sys 0.421
maxmem 6784 MB
faults 11
If you're curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.
Shortly after we resigned, the top-level team leads, project directors to the Foundation, core team members and the new mods got together to form an interim leadership cohort, sometimes called the "leadership chat." That then evolved into the Leadership Council by way of an RFC on governance.
Yeah this is a tough one. I'm not sure the right thing to do is for me to go around blasting PRs at those projects. They're probably already carrying support for both chrono and time, and asking them to support a third that is brand new is a bit of a stretch I think. Especially since I've promised breaking changes in the not-too-distant future. (Although I would like to do a Jiff 1.0 release about 1 year from now and commit to stability.) At least, I know I'd be hesitant if I were on the other side of it. But maybe folks are more flexible than me, I'm not sure.
I've been noodling on just adding these integrations to jiff itself. I do worry that if I do that, then the integrations will always stay with Jiff, even at 1.0. But maybe there just isn't another feasible choice.
But, why do you mention humantime? humantime doesn't have any integrations with time or chrono. humantime is more like a thin wrapper on top of std::time::Duration and std::time::SystemTime to make parsing and printing a bit nicer.