support HAR exporter (http) layer in rama #357

Open
GlenDC opened this issue Dec 4, 2024 · 12 comments

@GlenDC
Member

GlenDC commented Dec 4, 2024

HAR File wikipedia: https://en.wikipedia.org/wiki/HAR_(file_format)

Only official looking doc I could find was https://w3c.github.io/web-performance/specs/HAR/Overview.html

The goal of this issue is to support recording and exporting HAR files.
The export target does not have to be a file,
but in the most common use case it will be.

This feature can be used by:

  • any developer who wants insight into the traffic flowing through their rama-based client/server/proxy
  • some (UA-emulation-related) rama tests for http data preservation (e.g. header casing, order, pseudo-headers, cookies, ...)

Requirements

Both service and layer will take the following input:

  • a Trigger trait, for now probably only implemented for FnOnce + Clone. It is used to create a future (signal) that toggles between record and stop; on stop the data is exported.

The service starts in stopped mode (i.e. not active) and is activated once the signal fires. This can be spawned (gracefully) as a task, with the service just keeping track of an AtomicBool. It is important that the background task exits when the context is cancelled.

When the service is in record mode it needs to buffer each incoming http request in a prepared, HAR-ready form.

This data can already be kept in custom structs (which implement serde::Serialize + Deserialize).

When the toggle stops the recording, the service needs to serialize the buffer as a JSON blob and write it to the Writer, which is basically just an async io writer.
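A minimal, std-only sketch of the record/stop flag, the buffer, and the export step described above. Everything here is an assumption made for illustration: `Entry`, `RecordState` and `json_str` are hypothetical names, the entry is reduced to two fields (a real HAR entry and log object carry many more), the JSON is written by hand instead of through serde, and a blocking `std::io::Write` stands in for the async writer.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical, heavily reduced entry (not a complete HAR entry).
#[derive(Debug, Clone)]
pub struct Entry {
    pub method: String,
    pub url: String,
}

/// Hypothetical state shared between the service and its background task.
#[derive(Default)]
pub struct RecordState {
    recording: AtomicBool,
    entries: Mutex<Vec<Entry>>,
}

/// Quotes and escapes a string for JSON output.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl RecordState {
    /// Flips between record and stop; returns `true` if recording is now on.
    pub fn toggle(&self) -> bool {
        // fetch_xor(true) flips the flag and returns the previous value.
        !self.recording.fetch_xor(true, Ordering::SeqCst)
    }

    /// Buffers the entry only while recording is on.
    pub fn record(&self, entry: Entry) {
        if self.recording.load(Ordering::SeqCst) {
            self.entries.lock().unwrap().push(entry);
        }
    }

    /// Drains the buffer and writes it out as a single JSON object.
    pub fn export<W: Write>(&self, mut w: W) -> io::Result<()> {
        let entries = std::mem::take(&mut *self.entries.lock().unwrap());
        let body: Vec<String> = entries
            .iter()
            .map(|e| {
                format!(
                    r#"{{"method":{},"url":{}}}"#,
                    json_str(&e.method),
                    json_str(&e.url)
                )
            })
            .collect();
        write!(w, r#"{{"log":{{"version":"1.2","entries":[{}]}}}}"#, body.join(","))
    }
}
```

In this sketch a request seen while the flag is off is simply ignored, and `export` leaves the buffer empty so the next recording starts clean.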

Required input for the Service/Layer:

  • Trigger object
  • Writer

Custom options for the service/layer:

  • meta information such as service info, author info, ...
  • custom comments

This feature should also be well unit tested.

We also require an example file which checks that the HAR file exists and is valid. This can be as basic as verifying the default format; it does not need to be a 100% fail-proof test. If the file exists and is a valid JSON object containing the expected root object and entries, with one entry verified a bit more in depth, that is probably good enough as far as the e2e example test is concerned.

@GlenDC GlenDC added good first issue Good for newcomers easy An easy issue to pick up for anyone. mentor available A mentor is available to help you through the issue. labels Dec 4, 2024
@GlenDC GlenDC modified the milestone: v0.2 Dec 4, 2024
@GlenDC GlenDC added this to the v0.2 milestone Dec 5, 2024
@hafihaf123
Contributor

Hello,

I am very interested in contributing to this issue. Although I am new to Rust and this would be my first time contributing to open-source, I am eager to learn and would greatly appreciate any guidance to help me address this task. While I will do my best to make progress promptly, I may require additional time and support to ensure the solution meets the project’s standards.

Thank you for considering my request!

@GlenDC
Member Author

GlenDC commented Dec 7, 2024

Hi @hafihaf123, this issue is still available. It might however be a bit much, both in terms of complexity and the amount of work required, if I read your self-described background correctly.

I just created a new ticket. Would you be interested in trying that one (first)? It also has me available as a mentor.

#358

If you want to pick that one up you can let it be known in that issue.

Nice to meet you btw!

@Hrushi20
Contributor

Hrushi20 commented Jan 1, 2025

Hey! I'm interested in working on this issue. Looks fun to solve it. Is it open for grabs?

@GlenDC
Member Author

GlenDC commented Jan 1, 2025

Yes, but I would prefer that you finish your existing issue prior to starting a new one.
If by then it is still available it is all yours.

@GlenDC
Member Author

GlenDC commented Jan 2, 2025

Do ask if something is not clear btw @Hrushi20 or if you need help/guidance in any way. This is by no means a small issue :) But at the same time be aware of scope creep. The goal of this issue is minimal enough.

@ASamedWalker

Hi,
I am very interested in contributing to this issue. Although I am new to Rust and this would be my first time contributing to open-source, I am eager to learn and would greatly appreciate any guidance to help me address this task. While I will do my best to make progress promptly, I may require additional time and support to ensure the solution meets the project’s standards. Thank you for considering my request!

@GlenDC
Member Author

GlenDC commented Jan 4, 2025

I believe @Hrushi20 was going to pick this one up already. Is that still the case @Hrushi20 ?

@Hrushi20
Contributor

Hrushi20 commented Jan 4, 2025

Yeah. I was going through the rama documentation to understand things on a high level

@Hrushi20
Contributor

Hrushi20 commented Jan 4, 2025

Hey! I just wanted to clarify my understanding of the problem statement:

  • The trigger trait which needs to be created takes a function as input. This function is responsible for starting/stopping HAR file recording. We control the Trigger Trait via the Layer. The Layer communicates the Trigger to the Service.
  • The HAR service is spawned as a new task listening to requests.
  • The HAR service internally has an Atomic Boolean. Toggling the boolean ensures tracking the state of recording of HAR files. What does the stopped modulus mean here?
  • The HAR service is also responsible for observing incoming requests, validating whether they are HAR files (or can we assume the HAR service always receives HAR files as part of the request?), and writing the data using a writer.
  • I will create HAR data structs as per the W3C spec.

@GlenDC
Member Author

GlenDC commented Jan 4, 2025

Hmmm you need to take a step back I think.

Take a look at https://github.com/plabayo/rama/blob/main/rama-http/src/layer/traffic_writer/request.rs, as the HarService and HarLayer are very similar to this.

The HarLayer is a rama::Layer which produces a HarService (struct) that implements rama::Service. That's the basic gist of it. In pseudo-real code, the HarService Service implementation would look something like this:

impl<State, S, W, ReqBody, ResBody> Service<State, Request<ReqBody>> for HarService<S, W>
where
    State: Clone + Send + Sync + 'static,
    // the inner service must accept the same request type that we receive
    S: Service<State, Request<ReqBody>, Response = Response<ResBody>, Error: Into<BoxError>>,
    W: RequestWriter,
    ReqBody: http_body::Body<Data = Bytes, Error: Into<BoxError>> + Send + Sync + 'static,
    ResBody: Send + 'static,
{
    type Response = Response<ResBody>;
    type Error = BoxError;

    async fn serve(
        &self,
        ctx: Context<State>,
        req: Request<ReqBody>,
    ) -> Result<Self::Response, Self::Error> {
        match self.recorder() {
            // recording: capture the request, forward it, then capture the response
            Some(recorder) => {
                recorder.record_request(&req)?;
                let response = self.inner.serve(ctx, req).await.map_err(Into::into)?;
                recorder.record_response(&response)?;
                Ok(response)
            }
            // not recording: pass the request straight through
            None => self.inner.serve(ctx, req).await.map_err(Into::into),
        }
    }
}

Just an example though, but the high level idea is as simple as that.

The recorder would be purely internal, nothing we expose, as this is not a generic service. Whether or not there is an active recorder depends on whether or not we are recording.

So the layer and service could have the following configuration options:

And that's about it.

The Recorder would be there to collect the request and send it, together with the response, to the internal collector when it drops. This also means it can send just the request in case there was never a response; that's fine as well. It is internal to the HAR service, so there is no need to expose it.
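The send-on-drop behaviour of the Recorder can be sketched with std only. This is an illustration under assumptions, not the actual design: `Exchange` and the `Recorder` fields are hypothetical, requests and responses are reduced to strings, and a `std::sync::mpsc` channel stands in for the internal collector.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical request/response pair; `response` stays `None`
/// when the inner service never produced one.
#[derive(Debug, PartialEq)]
pub struct Exchange {
    pub request: String,
    pub response: Option<String>,
}

/// Hypothetical internal recorder: forwards what it collected when dropped.
pub struct Recorder {
    tx: Sender<Exchange>,
    request: Option<String>,
    response: Option<String>,
}

impl Recorder {
    pub fn new(tx: Sender<Exchange>) -> Self {
        Self { tx, request: None, response: None }
    }

    pub fn record_request(&mut self, req: &str) {
        self.request = Some(req.to_owned());
    }

    pub fn record_response(&mut self, res: &str) {
        self.response = Some(res.to_owned());
    }
}

impl Drop for Recorder {
    fn drop(&mut self) {
        // Nothing to report if no request was ever recorded.
        if let Some(request) = self.request.take() {
            // The collector may already be gone; that is not treated as an error.
            let _ = self.tx.send(Exchange {
                request,
                response: self.response.take(),
            });
        }
    }
}
```

Because the send happens in `Drop`, an early return or an error in the inner service still yields a request-only entry.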

When HarService gets created you would spawn a task (using the Guard or tokio::spawn) which has a loop:

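// declared outside the loop so the state survives iterations
let mut active = false;
loop {
    tokio::select! {
        // assumes `shutdown` is a pinned future that can be polled by reference
        _ = &mut shutdown => { /* clean up + exit */ break; }
        _ = toggle.toggle() => { active = !active; }
    }

    if active { /* ... */ } else { /* ... */ }
}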

(It's in this loop that you'll use the configured HarSink, sending the HAR object to it upon exit or toggle-off.)
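A synchronous, std-only stand-in for that loop, to show the state handling and when the sink gets called. The assumptions are loud ones: `Event` and `run` are hypothetical, a single `std::sync::mpsc` channel replaces the two `select!` branches, entries are plain strings, and the sink is another channel rather than a HarSink.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical events; in the async version these are the `select!` branches.
pub enum Event {
    Toggle,
    Shutdown,
    /// A recorded entry, reduced to a string here.
    Entry(String),
}

/// Runs until `Shutdown` (or until all event senders are dropped), flushing
/// the buffer to `sink` on toggle-off and on exit while recording.
pub fn run(events: Receiver<Event>, sink: Sender<Vec<String>>) {
    let mut active = false;
    let mut buffer = Vec::new();
    for event in events {
        match event {
            Event::Toggle => {
                active = !active;
                if !active {
                    // Toggle-off: hand over everything recorded so far.
                    let _ = sink.send(std::mem::take(&mut buffer));
                }
            }
            Event::Entry(entry) if active => buffer.push(entry),
            Event::Entry(_) => {} // not recording: ignore the entry
            Event::Shutdown => break,
        }
    }
    if active {
        // Exit while still recording: flush what is left.
        let _ = sink.send(buffer);
    }
}
```

Keeping the buffer owned by the loop means no lock is needed around it in this variant; the trade-off is that every entry has to travel through the channel.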

Toggle could be as simple as:

trait Toggle {
    // needs to be cancel safe; takes &mut self so that receivers such as tokio's Signal can implement it
    fn toggle(&mut self) -> impl Future<Output = ()> + Send + '_;
}

There's for example Signal in tokio: https://docs.rs/tokio/latest/tokio/signal/unix/struct.Signal.html

so we can easily implement already Toggle for it as:

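#[cfg(unix)]
impl Toggle for tokio::signal::unix::Signal {
    #[inline]
    async fn toggle(&mut self) {
        // Signal::recv takes &mut self and returns Option<()>, so toggle needs &mut self as well
        let _ = self.recv().await;
    }
}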

And the same goes for the Windows equivalents when on Windows.

You can also implement it already for https://docs.rs/tokio/latest/tokio/sync/mpsc/index.html's Receiver

Finally, you can also implement it for any Fn which returns a future.

Finally the HarSink would be something like:

trait HarSink {
    type Error;

    // only return an error if it's fatal: this makes sure the loop exits and the `HarService`
    // will never be in record mode again from that point onward
    fn send_har(&mut self, har: Har) -> impl Future<Output = Result<(), Self::Error>>;
}

You can already implement this HarSink for:

  • mpsc::Sender<Har>: just send it
  • tokio::io::AsyncWrite: serialize the Har object as bytes and write it to the writer
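The two sink implementations listed above can be sketched synchronously with std only. Assumptions for illustration: `Har` is a hypothetical stand-in holding already-serialized JSON, `send_har` returns a plain `Result` instead of a future, `std::sync::mpsc::Sender` and `std::io::Write` replace their tokio counterparts, and `WriterSink` is a hypothetical newtype.

```rust
use std::io::Write;
use std::sync::mpsc::{SendError, Sender};

/// Hypothetical stand-in for the HAR root object (already serialized here).
#[derive(Debug, Clone, PartialEq)]
pub struct Har {
    pub json: String,
}

/// Synchronous variant of the sink trait; the real one would return a future.
pub trait HarSink {
    type Error;

    /// Should only fail on fatal errors, since a failure ends recording for good.
    fn send_har(&mut self, har: Har) -> Result<(), Self::Error>;
}

/// Channel sink: just send it.
impl HarSink for Sender<Har> {
    type Error = SendError<Har>;

    fn send_har(&mut self, har: Har) -> Result<(), Self::Error> {
        self.send(har)
    }
}

/// Newtype around a writer, so this impl cannot overlap with the channel impl.
pub struct WriterSink<W>(pub W);

/// Writer sink: write the serialized bytes and flush.
impl<W: Write> HarSink for WriterSink<W> {
    type Error = std::io::Error;

    fn send_har(&mut self, har: Har) -> Result<(), Self::Error> {
        self.0.write_all(har.json.as_bytes())?;
        self.0.flush()
    }
}
```

The newtype is a design choice of this sketch: a blanket impl over every writer next to an impl for a foreign channel type tends to run into Rust's coherence rules.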

And that's about it.

I hope this makes it a bit more clear.

I would say keep it simple for now though, as there's a lot of information that can be written to a HAR file that you don't have yet and that does not need to be in this pass. Some extra things that you can already do are:

  • provide a HarComment struct that can be created as HarComment::new(fmt::Display) which can be injected into a Context<S>'s Extensions so you can add it as a "comment" property for that request/response pair's entry object :)
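A minimal sketch of such a comment type. Only the `HarComment::new(fmt::Display)` shape comes from the description above; the tuple-struct layout and `as_str` are assumptions, and storing it in (or reading it back from) the context's extensions is left out.

```rust
use std::fmt;

/// Hypothetical comment attached to a single request/response entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HarComment(String);

impl HarComment {
    /// Accepts anything printable and stores its rendered form.
    pub fn new(comment: impl fmt::Display) -> Self {
        Self(comment.to_string())
    }

    /// The text that would end up in the entry's "comment" property.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for HarComment {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Rendering to a `String` at construction time keeps the type `Send + Sync + 'static`, which is usually what an extensions map requires.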

But some things are just a lot harder for now, and I wouldn't worry about them:

  • e.g. the timing objects, which can have a lot of granularity (including dns, etc.); I would just leave a lot or most of that empty
  • etc etc

See this ticket mostly as an MVP (Minimum Viable Product). We can always expand the supported HAR feature set as time progresses and the need arises. For now, if the basic timings and the recording of the request and response objects are there, it would already be a pretty great start.

@Hrushi20
Contributor

Hrushi20 commented Jan 8, 2025

I've started the implementation and was thinking about the Sink and Toggle traits. As per my understanding:

  • The Toggle trait is used by clients to enable/disable the HAR service. It ensures the Recorder is enabled/disabled in the service.
  • The Sink trait is used to finally output the data on toggle-off (false) or when shutting down the process.

@GlenDC
Member Author

GlenDC commented Jan 9, 2025

Yes, @Hrushi20, that sounds about right. Think of it like a record button on a tape recorder. Of course, nothing stops you from running it indefinitely, but that's a bit futile since it will all be stored in volatile memory.

Browser dev tools work the same way; at least Chrome allows you to click "record" and "stop recording," which makes sense to me.

Sadly, HAR is a bit of a silly format because it can't really be streamed, as it's all one large object. Of course, you could stream it by working with an intermediate export step, so there are possibilities, but for this initial implementation, it's fine to just keep it in RAM.

If there's ever a need for it to work in long-running sessions via streaming, we can address that issue at the appropriate time later on.

4 participants