Git Product home page Git Product logo

Comments (10)

dholroyd avatar dholroyd commented on June 29, 2024

Useful feedback on this topic from @scottlamb in #3 (comment)

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

Continuing from the comment you mentioned: I'm working on my RTSP crate (now in its own repository and MIT/Apache-licensed). I've realized that on the wire, almost all the VCL NALU are fragmented. Right now I accumulate the fragments via copying into a bytes::BytesMut, but I eventually plan to use multi-chunk bytes::Bufs instead that reference the original bytes::Bytes via reference-counting.

I'd also like to eventually decode the slice headers (to track reference frames). I think my ideal is something like this:

  • Write a h264_reader::rbsp::RbspBufDecoder which takes a &mut dyn Buf and provides a std::io::Read implementation that lazily processes the Buf, skipping the emulation prevention three bytes. It could be behind a bytes feature to avoid adding a required dependency on bytes.
  • Have h264_reader::rbsp::RbspBitReader take a std::io::Read instead of a slice. I think would require a switch from bitreader to bitstream-io, but anyway bitstream-io is (a) more popular (b) used in eg your adts-reader crate already, and (c) faster, according to the benchmarks here.

This would allow parsing the slice headers without allocation or copying for RBSP processing. Most of the NAL wouldn't be even examined because presumably the slice headers are a very short prefix of it.

I might eventually send you a PR for this if you like this plan. I've got a bunch of other code to write before I get too serious about processing the slice headers though.

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

Actually, I have a much more radical idea. The only emulation prevention three byte-removing logic can be a std::io::Read wrapper around std::io::BufRead, to be used by push parser case and my case (giving it a full NAL, TBD whether that's a single &[u8] or a non-contiguous bytes::Buf). It can keep the RbspDecoder name.

The push parser case would give it a std::io::BufRead implementation that accumulates NAL bytes on push. It'd return ErrorKind::WouldBlock on hitting the end of the buffer if end hasn't been called. RbspDecoder would pass that through, as would RbspBitDecoder (both in read_* and in has_more_rbsp_data), and any slice parsers that want to be capable of trying to parse partial data.

For most NAL types it'd make sense to only try parsing once in end. For slice NALs, where SliceHeader parsing can succeed without most of the bytes, it'd probably make sense to instead try on each push call and stop buffering once parsing returns non-incomplete (success or failure). It'd try from scratch each time.

I think this would simplify the logic for the push parsing as well as sharing more of it with the already-buffered parsing. There'd be just one place push parsing buffers instead of three and counting (PicParameterSetNalHandler, SeqParameterSetNalHandler, and SeiBuffer). And it'd allow finishing the SliceLayerWithoutPartitioningRbsp implementation which I suspect is unfinished in part because ParseState::Continue(header) is awkward to deal with.

Also, I imagine some of the time the caller wants to both parse and keep the NAL bytes to send onward (similar to my RTSP library). This might allow them to share the buffer (at the cost of some more API complexity).

A code sketch:

pub trait RbspHandler {
    type Read: RbspBitRead;
    type Ctx;

    /// Processes a partially buffered NAL unit. `read` always reads from the beginning.
    /// Returns false if no additional calls are desired for this NAL unit.
    fn partial(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        true
    }

    /// Processes a fully buffered NAL unit. `read` always reads from the beginning.
    /// Called iff every prior `partial` call for this NAL unit (zero or more) returned true.
    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) {}
}

impl<Read: RbspBitRead, Ctx> RbspHandler for PicParameterSetNalHandler<Ctx> {
    type Read = Read;
    type Ctx = Ctx;

    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) {
        match PicParameterSet::read(ctx, read) {
            Ok(pps) => ctx.put_pic_param_set(pps),
            Err(e) => error!("pps: {:?}", e),
        }
    }
}

impl<Read: RbspBitRead, Ctx> RbspHandler for SliceNalHandler<Ctx> {
    type Read = Read;
    type Ctx = Ctx;

    fn partial(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        match SliceHeader::read(ctx, read) {
            Ok(header) => {
                info!("TODO: expose to caller: {:#?}", header);
                false
            },
            Err(SliceHeaderError::Incomplete) => true,
            Err(e) => {
                error!("slice_header() error: SliceHeaderError::{:?}", e);
                false
            },
        }
    }

    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        match SliceHeader::read(ctx, read) {
            Ok(pps) => info!("TODO: expose to caller: {:#?}", header),
            Err(e) => error!("slice_header() error: SliceHeaderError::{:?}", e),
        }
    }
}

What do you think?

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

Also, I imagine some of the time the caller wants to both parse and keep the NAL bytes to send onward (similar to my RTSP library). This might allow them to share the buffer (at the cost of some more API complexity).

On second thought, I don't think it's any more complex. Also, I think we could get rid of NAL/SEI switches, RefCell, user_context, and the need to add plumbing for error reporting. Altogether I find the API sketch below a lot easier to understand than the status quo.

NAL handlers could be a lambda rather than a trait; a little lighter-weight but less explicit. I could go either way on that part.

mod nal {
    /// A single, partially- or completely-buffered NAL.
    /// The push module uses a [ByteSliceNal], but it's possible to implement
    /// this interface with any type that can produce a `std::io::BufRead`.
    pub trait Nal {
        type BufRead: std::io::BufRead;

        /// Returns if the NAL is completely buffered.
        pub fn complete(&self) -> bool;

        /// Returns the NAL header or error if missing/corrupt.
        pub fn header(&self) -> Result<NalHeader, NalHeaderError> { ... }

        /// Returns the bytes in NAL form (including the header byte and
        /// any emulation-prevention-three-bytes). If the NAL is incomplete,
	/// reads may fail with `ErrorKind::WouldBlock`.
        pub fn reader(&self) -> Self::BufRead;

        /// Returns the bytes in RBSP form (skipping header byte and
        /// emulation-prevention-three-bytes).
        pub fn rbsp_bytes(&self) -> rbsp::ByteReader<Self::BufRead> { ... }

        /// Returns a helper to access bits within the RBSP.
        pub fn rbsp_bits(&self) -> rbsp::BitReader<rbsp::ByteReader<Self::BufRead>> { ... }
    }

    /// A NAL which stores its bytes contiguously.
    pub struct ByteSliceNal<'a> { buf: &'a [u8], complete: bool }
    impl<'a> Nal for ByteSliceNal<'a> { ... }
    impl<'a> AsRef<u8> for ByteSliceNal<'a> { ... }
}

mod push {
    /// The results of a NAL handler during push parsing.
    pub enum HandlerResult {
        /// If this NAL is incomplete, continue buffering it for a following call.
        ReadMore,

        /// Stop buffering this NAL; move on to the next one immediately.
        /// This is no different than `ReadMore` when the NAL is complete.
        Skip,

        /// Abort the push call with the given error. Following pushes will also fail.
        AbortPush(Box<dyn Error>),
    }

    /// Accumulator for NAL push parsers.
    /// This is meant to be used by parsers for a specific format: Annex B, AVC, MPEG-TS, RTP, etc.
    /// Accumulates NALs in an internal buffer and delegates to a caller-supplied handler.
    pub struct Accumulator<'a> {
        buf: Vec<u8>,
        state: AccumulatorState,
        handler: &'a mut FnMut(ByteSliceNal),
    }

    impl Accumulator {
        pub fn new(nal_handler: &mut FnMut(ByteSliceNal) -> HandlerResult) -> Self;

        pub fn start(&mut self);
        // TODO: maybe should be push(..., end: bool) instead to avoid an extra !complete call to nal_handler.
        pub fn push(&mut self, buf: &[u8]) -> Result<(), Error>;
        pub fn end(&mut self);
    }

    /// Push parser for an Annex B form bytestream
    /// (one with NALs separated by `00 00 01` or `00 00 00 01`).
    pub struct AnnexBParser<'a> { accumulator: Accumulator<'a>, state: AnnexBParseState }

    impl<'a> AnnexBParser<'a> {
        pub fn new(nal_handler: &mut FnMut(nal: ByteSliceNal) -> HandlerResult) -> Self;
        pub fn push(&mut self, buf: &[u8]) -> Result<(), Error>;
        pub fn end_units(&mut self) -> Result<(), Error>;
    }
}

/// RBSP stuff, independent of push or pull parsing.
mod rbsp {
    /// Strips header byte and `emulation-prevention-three-byte` elements from the single-NAL input
    /// to yield the RBSP.
    pub struct ByteReader<N: std::io::BufRead> { ... }
    impl<N: std::io::BufRead> std::io::Read for ByteReader<N> { ... }
    impl<N: std::io::BufRead> std::io::BufRead for ByteReader<N> { ... }

    /// Reads the RBSP as bits given a byte-level reader.
    // (This only needs `BufRead` rather than `Read` for `has_more_rbsp_data`.)
    pub struct BitReader<R: std::io::BufRead>(bitstream_io::Reader<R, bitstream_io::BigEndian>);
    impl<R: std::io::BufRead> BitReader<R> {
        ...
    }
}

// Example usage
fn my_annexb_reader() {
    let mut ctx = Context::new();
    let mut parser = h264_reader::push::AnnexBParser::new(|nal| {
        match nal.header().unit_type() {
            Ok(UnitType::PicParameterSet) if !nal.complete() => return NalResult::ReadMore,
            Ok(UnitType::PicParameterSet) => {
                match PicParameterSet::read(ctx, nal.rbsp_bits()) {
                    Ok(pps) => ctx.put_pic_param_set(pps),
                    Err(e) => error!("Bad pps: {:?}", e),
                }
            },
            Ok(UnitType::SliceWithPartitioningIdr) => {
                match SliceHeader::read(ctx, nal.rbsp_bits()) {
                    Ok(hdr) => info!("Slice header: {:#?}", hdr),
                    Err(SliceHeaderError::NeedMoreData) => return NalResult::ReadMore,
                    Err(e) => error!("Bad slice header: {:?}", e),
                }
            },
            Ok(_) => {},
            Err(e) => error!("bad NAL header: {:#?}", e),
        }
        NalResult::Skip
    });
    let f = File::open("foo.h264").unwrap();
    let mut reader = BufReader::new(f);
    loop {
        let buf = reader.fill_buf().unwrap();
        if buf.is_empty() {
            break;
        }
        parser.push(buf);
        read.consume(buf.len());
    }
}

from h264-reader.

dholroyd avatar dholroyd commented on June 29, 2024

Does the underlying use of Read / BufRead mean that the implementation must end up copying all bytes out of the source buffers and into some destination buffer? It seems to me that it would, but I might be missing a trick! :)

My hope has always been to be able to extract metadata (sps + pps now; eventually slice_header() too) while avoiding copying the bytes of slice_data( ).

All this said, if there is an irreconcilable choice between either API usability or avoiding slice_data() copies, I think API usability needs to win.

from h264-reader.

dholroyd avatar dholroyd commented on June 29, 2024

@scottlamb I've added you to the project.

Please continue to PR changes; however if I don't respond for a few days and you are blocked then please know that I trust your sense of taste and am ok with you merging in order to move forward.

Don't feel that this places an obligation on you to pick up maintenance work if you don't have the time or inclination - I'm just aiming to make life easy.

For my part I will try to get into the habit of PR'ing my own changes rather than committing direct to master.

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

Does the underlying use of Read / BufRead mean that the implementation must end up copying all bytes out of the source buffers and into some destination buffer? It seems to me that it would, but I might be missing a trick! :)

After some tweaking of the details (now in draft PR #26): it copies on push only after the handler requests to see an incomplete NAL again. So if the whole NAL is in one push, there's no copying. If the SliceHeader can be parsed from the first push alone, there's no copying. If the SliceHeader can be parsed on the second push, it copies the first but not the second.

I think this is strictly an improvement over what PicParameterSetNalHandler, SeqParameterSetNalHandler, and SeiBuffer were doing and means most of a slice's bytes aren't copied even when examining the headers.

@scottlamb I've added you to the project.

Thanks for the vote of confidence!

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

For my part I will try to get into the habit of PR'ing my own changes rather than committing direct to master.

Are you looking for me to review/merge your PRs as you'd do for mine? Or is that just to kick the CI into running before you commit?

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

PR #26 is now ready to review with a significant API overhaul.

from h264-reader.

scottlamb avatar scottlamb commented on June 29, 2024

@dholroyd Could I talk you into taking a look at the PR? This one is much more invasive than previous changes so I've held off on merging it to get a sense how comfortable you are with the new interface.

from h264-reader.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.