this post was submitted on 05 May 2024
13 points (100.0% liked)

Code Review

152 readers
1 users here now

Welcome to the programming.dev code review community! This is a place where you can review other people's code or request reviews on your own

Before posting please read the sections below (click on one to expand it)

What is a code review?

What is a code review?

A code review is a process where people who did not write a piece of code look over it to find errors, suggest improvements, and make sure it follows coding standards

When doing a code review look for these aspects:

  1. Correctness: Does the code behave as expected and meet the requirements?
  2. Readability: Is the code easy to understand and maintain?
  3. Efficiency: Are there more efficient ways to accomplish the same task?
  4. Best Practices: Does the code follow established coding standards and best practices?
  5. Security: Are there potential security vulnerabilities?
  6. Scalability: Will the code perform well as the application scales?

Post Guidelines

Guidelines

  • Put the programming language you used for your code in square brackets ahead of your post title. e.g. [Python], [React], [C#], etc.
  • Provide a link to a spot where people can view your code (e.g. github, etc.) in your post body
  • In your post title put a small description of what the thing is that you want reviewed and then you can expand on that in more details if you want in your post body

Rules

Rules

Any code review requests you post after your first must be done after you have reviewed two other people's code (and then you need another two for each request after that as well).

Youre free to post your first code review request in this community though without needing to follow this


Credits

Credits

Icon base by Lorc under CC BY 3.0 with modifications to add a gradient




Wormhole

[email protected]

founded 11 months ago
MODERATORS
 

I came up with an algorithm and implemented it in Rust.
I want to ensure the code is of sufficient quality before releasing it on Crates.io and GitHub. You can also find more info about the algorithm and its purpose there.

I'm eager to hear your feedback and thoughts and am open to ANY suggestions that may help me improve the code further.
The code should be performant, clean, easy to use, idiomatic, etc.
I'm also looking for potentially better names for certain variables and stuff.

Note that some things have changed a bit since the last release (for example - the hash calculation has been changed from blake3(blake3(OBJECT) || blake3(DATE)) to blake3::keyed_hash(DATE, OBJECT) to improve performance by eliminating 2 out of 3 hash calculations), but the README is still valid (and hopefully at least somewhat decent source of info about the algorithm - I did my best trying to explain stuff, but I'm still looking to improve it further in the future).

The functions can panic if they are given a date where the year is outside of the 0000 to 9999 range.
I'm not sure if I should make the function return the result/option instead of panicking since I'd like to avoid having to unwrap() the output and would prefer if the function returned just a simple u32/chrono::NaiveTime (but I'm open to changing my mind on this one).

I'm also curious if there is a cleaner way to create a key from a date. Previously, I've been using format("%Y-%m-%d") to convert it into a string slice and copy it into the key array, but I've found that approach to be very slow (over 50% of the CPU time was being spent just for that alone), so I opted out for an approach you can see below.

The newest function rdvt() is not yet documented anywhere, but I'll do my best to explain it here:   In addition to an object and a date (same as rdv()), it also takes a rank (which is a positive integer - u32).
The function calculates the keyed blake3 hash of an object, using the date and a rank as key, and uses the resulting (pseudorandom) bits to generate and return a random uniform time (chrono::NaiveTime) between 0h and 24h (down to nanosecond precision).
Time is calculated by taking the first bit of the hash and in case it's a binary one - 12h is added to the time, then we add 6h if the 2nd bit of the hash is a one, 3h for the 3rd bit, 1.5h for 4th and so on until the increment reaches the small enough value where it doesn't contribute anything to the time (when it becomes less than 1ns, essentially).   This means if all of the bits in the hash were zeros - time would be zero, and if they were all ones - time would be 23:59:59:999:999:999h (the very last and highest possible value). The code short-circuits and stops earlier than going through all 256 bits since we usually only need around 46 bits before the increment becomes smaller than 1ns (the code stops only in case the sum of tiny sub 1ns increments can't contribute enough to change the last digit in the total time (even if all of the rest of the bits in the hash were to be ones))):  

Feel free to ask ANY questions regarding the code, the algorithm, its function, use cases, or anything else you'd like explained or clarified.  

Here's the code (I haven't written any tests yet, so they are not included for review):  

//! The official Rust implementation of the [RANDEVU](https://github.com/TypicalHog/randevu) algorithm
//!
//! # Example
//! ```rust
//! use chrono::Utc;
//! use randevu::{rdv, rdvt};
//!
//! fn main() {
//!     let object = "THE_SIMPSONS";
//!     let date = Utc::now();
//!     let rdv = rdv(object, &date);
//!     let rdvt = rdvt(0, object, &date);
//!
//!     println!("Object {} has RDV{} today with RDVT0 at {:?}", object, rdv, rdvt);
//! }
//! ```

use blake3;
use chrono::{DateTime, Datelike, NaiveTime, TimeDelta, Utc};
use itoa;

/// Returns the 32-byte KEY `[u8; 32]` created from a given DATE `&DateTime<Utc>` and an optional RANK `Option<u32>`
fn create_key(date: &DateTime<Utc>, rank: Option<u32>) -> [u8; 32] {
    let mut key = [0u8; 32];

    let mut year = Datelike::year(date);
    let mut month = Datelike::month(date);
    let mut day = Datelike::day(date);

    let mut year_len = 4;
    let mut prefix_len = 0;

    // Add a prefix (-/+) if the year is not between 0 and 9999 (-YYYY-MM-DD / +YYYY-MM-DD)
    if year < 0 {
        key[0] = b'-';
        prefix_len = 1;

        year = year.abs(); // Make year positive
    } else if year > 9999 {
        key[0] = b'+';
        prefix_len = 1;
    }

    // Adjust year_len for very large years (both positive and negative)
    if year > 9999 {
        year_len += 1;
        if year > 99999 {
            year_len += 1;
        }
    }

    let full_year_len = prefix_len + year_len;

    // If a rank is provided, write it into the key after the date, separated by an '_'
    if rank != None {
        let mut buffer = itoa::Buffer::new();
        let rank_str = buffer.format(rank.unwrap());
        key[7 + full_year_len..7 + full_year_len + rank_str.len()]
            .copy_from_slice(&rank_str.as_bytes()[..rank_str.len()]);

        key[6 + full_year_len] = b'_';
    }

    // Write the day into the key
    key[5 + full_year_len] = b'0' + (day % 10) as u8;
    day /= 10;
    key[4 + full_year_len] = b'0' + day as u8;

    key[3 + full_year_len] = b'-';

    // Write the month into the key
    key[2 + full_year_len] = b'0' + (month % 10) as u8;
    month /= 10;
    key[1 + full_year_len] = b'0' + month as u8;

    key[full_year_len] = b'-';

    // Write the year into the key
    for i in (prefix_len..full_year_len).rev() {
        key[i] = b'0' + (year % 10) as u8;
        year /= 10;
    }

    key
}

/// Returns the RDV value `u32` for an OBJECT `&str` on a specific DATE `&DateTime<Utc>`
///
/// **RDV = number of leading zero bits in blake3::keyed_hash(key: DATE, data: OBJECT)**
pub fn rdv(object: &str, date: &DateTime<Utc>) -> u32 {
    let hash = blake3::keyed_hash(&create_key(date, None), object.as_bytes());

    // Count the number of leading zero bits in the hash
    let mut rdv = 0;
    for &byte in hash.as_bytes() {
        rdv += byte.leading_zeros();

        if byte != 0 {
            break;
        }
    }

    rdv
}

/// Returns the RDVT time `DateTime<Utc>` of a given RANK `u32` for an OBJECT `&str` on a specific DATE `&DateTime<Utc>`
pub fn rdvt(rank: u32, object: &str, date: &DateTime<Utc>) -> DateTime<Utc> {
    let hash = blake3::keyed_hash(&create_key(date, Some(rank)), object.as_bytes());

    // Calculate the time using bits from the hash
    let mut total: f64 = 0.0;
    let mut increment = 12.0 * 60.0 * 60.0 * 1_000_000_000.0; // 12h in nanoseconds
    for (i, byte) in hash.as_bytes().iter().enumerate() {
        for j in (0..8).rev() {
            let bit = (byte >> j) & 1;
            if bit == 1 {
                total += increment;
            }
            increment /= 2.0;
        }
        // Stop once increments become too small to affect the total
        if i > 4 && (2.0 * increment) < (1.0 - total.fract()) {
            break;
        }
    }

    // Construct the RDVT time from total
    let rdvt = date.with_time(NaiveTime::MIN).unwrap() + TimeDelta::nanoseconds(total as i64);

    rdvt
}

top 10 comments
sorted by: hot top controversial new old
[–] [email protected] 3 points 6 months ago* (last edited 6 months ago) (2 children)

I haven't finished reading the post yet, but I must say: consider publishing it on GitHub (or any other public repository) as it is, no need to make the code super ideal first. Collaboration is exactly what those platforms are good for.

Edit: I mistook you text as not desiring to publish at all before refining. I see that you already have a version published, maybe a branch with updated code could do?

Edit2: NaiveDate omits timezone, I would expect that to cause problems if the code is used to get same rendezvous times independently on different devices

Edit3: instead of bit arithmetic I would've used a precomputed table or a macro that generates the same code as a table. So that it will be something like

match byte {
    0 => {}
    1 => /* add increment*/
    …
    8 => /* add increment*(2 - 1/2⁷) */
}
/* divide increment by 2⁸ */

Maybe it makes sense to do bitshift instead of dividing, maybe it makes sense to instead go from lowest to highest and multiply, but then stop condition should be calculated in advance

Edit4: also, I don't understand the stopping condition of 2*increment > 1 - total.fraq()

[–] [email protected] 1 points 6 months ago (2 children)

I don't know if it's because I'm stupid but I read through this post and the GitHub and I still don't really understand what it does. Could you eli5?

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago)

It's because I suck at explaining stuff.

Imagine if you and a bunch of other people were huge fans or something, let's say some videogame, but there are never enough people playing that game at the same time for it to function (imagine it's a multiplayer game). Now, instead of a single person from the group remembering the game and playing it alone at random times when no one else is - imagine there was a "thing" where you put in the name of the object (in this example a game) and the "thing" spits out a number for that object each day. And every single person who has put in the same exact object (yes, kinda like a password) will get the same numbers out as everyone else. This thing outputs numbers randomly each day (but once again the random number is the same for everyone who put in the same object). There is a 50% chance the system will output 0, 25% chance it will output 1, 12.5% for 2, 6.25% for 3 and so on. Each number is twice as rare as the previous. This means we can expect a 0 about every other day on average, 1 every 4 days, 2 every 8 days, 3 -> 16, 4 -> 32, 5 -> 64, and so on. Ok, so now that we have multiple people getting the same numbers for that game each day - we can decide we'll all come online and play the game when the system gives us a number higher than a certain threshold. For someone would would like to play that game about once a month, they might set the threshold to 5 (2^5 = 32), because we can expect the system to give us a number 5 or more about every 32 days. Another person who is a more hardcore fan might set it at 3 (2^3 = 8) and another user might have it set to like 8 (2^8 = 256). So, when the number hits 5 first 2 users would be notified and might decide to come play the game. When it hits 8 (every 256 days on average), a whole bunch of people would be notified (everyone who has the threshold set to 8 or less). Instead of random people playing the game on completely random and uncoordinated days and there never being a decent amount of people online at the same time. If everyone used this system - even though the game will be dead for most of the time, on certain days, when people get notified to play it they could all come online and play it at the same time. There might be no people playing it for months, but every once in a while (let's say every 64 (2^6) days - when the system outputs 6) everyone interested could come play it together with everyone else who is using the system and the previously dead game could come back alive, if only once every 64ish days.

This spreadsheet I made for a Minecraft server could maybe help with understanding the thing: https://bit.ly/CJRDV

UPDATE: I have an analogy. Imagine you're tossing a coin each day for a certain object until you get tails. Now, imagine if everyone else did the same, but the coin was magical and landed on the same side for all people who tossed it. People could toss the coin each day and decide to come do something, like play an obscure game or watch their favorite movie or YT video on a day where the coins lands head a certain number of time in a row (we stop tosses when we get tails). One day we may get 0 heads (aka we got tails on first throw), tomorrow we may get 2 heads before we get tails. In a week there may come a day where we got 6 heads in a row and assuming out threshold was 6 or less - we decide to come play the game. We can do this for any amount of objects, each object can be thought of as a different password which gives completely different and independent coins tosses. So, everyone who had it set to XONOTIC (a game), would see the coin land the same way.

[–] [email protected] 1 points 6 months ago (1 children)

It lets several people choose the same time for contacting each other when everyone knows some specific event name.

At least I think so, maybe there's more to it, I wasn't very attentive

[–] [email protected] 1 points 6 months ago (1 children)

Oh so it's like knowing the password for a repeating event?

[–] [email protected] 1 points 6 months ago

Maybe ¯\_(ツ)_/¯

[–] [email protected] 1 points 6 months ago* (last edited 6 months ago) (1 children)

Thank you for the answer!

This version is a huge overhaul of the currently live version. I could've made a separate branch for it, but I've never done that before since I haven't really had a need yet. I definitely need to learn how to do it tho, eventually (learn git branches).
I've decided to create version 2.0 and just publish it once I think it's good enough and when I'm confident I've finally decided on the final version of the algorithm.

The algorithm uses NaiveDate and NaiveTime because it's UTC-based. Users need to make sure they pass UTC time to the algorithm. Are you saying problems could arise if people pass a timezonelss local time to the function instead of UTC time?

I might try your suggestion about using the match statement, just not sure how to go about it or if it's even possible to do considering the way total is calculated, but it seems like it might not be very clean (or perhaps it will be even cleaner, idk). I haven't really used macros yet. I don't think we can go from lowest to highest due to the nature of the calculation, except if we go over all 32 bytes.

2.0 * increment < 1 - total.fraq() checks if the sum of all increments from the current point on are able to contribute enough to make the whole part of the total increase. Btw, note the < instead of >.

Example:
Let's imagine the increment is 0.1 and the total is 123.2. Since 2 * 0.1< 1 - 0.2 is true, we can conclude that even if we added all of the rest of the increments to the total (0.1 + 0.05 + 0.025 + 0.0125 + 0.00625 + ... = 0.2), will never be able to change total enough to cause its integer part to increase from 123 to 124, thus, we stop. If the increment was 0.1 and the total was 123.9, we would continue because it is possible the rest of the increments (somewhere between 0.0 to 0.2) could make 123.9 go over 124.

[–] [email protected] 1 points 6 months ago (1 children)

Regarding NaiveDate: if there is a type for date and time that enforces UTC or includes timezone information, use it. Documenting that the user must pass UTC time is a weak spot, whatever you can enforce in types, you should enforce in types.

Regarding the stopping condition: yeah, I got it, it may be a good place to add a comment about why (although it is quite straightforward, so I might have been too stupid with that)

Regarding the macro usage: well, I'm not sure how that should be, but I imagine using a macro to precalculate what are possible coefficients multiplied by increment. Since coefficients will be calculated in compile-time, you may ignore the extra precision (it will not require additional computation) and only check per byte.

Also, it may be so, that removing precision check and doing the math for all the positions you care about will be more efficient because of how optimisations work. Less branching is usually better, and separate bytes may be processed in parallel. That depends on the compiler and the target processor, and I'm far from being expert, so you may want to research that.

[–] [email protected] 2 points 6 months ago* (last edited 6 months ago) (1 children)

Hi, I've modified my code and implemented your suggestion about enforcing UTC with types. I edited the post to include the newest code. I'm just not sure if I should still call the variables/arguments "date" since now it's a DateTime.

[–] [email protected] 2 points 6 months ago

If you feel like datetime or date_time would be clearer, go with it

I will try to take a fresh look and tell how it feels some time later