C++ Senior Freelancer — Multithreading, Real-Time, Performance (Jetson Nano)
Upwork

Remote
• 1 day ago
• No applications
About
Project Overview

I need an advanced C++ engineer to harden and optimize a real-time video + telemetry pipeline on NVIDIA Jetson Nano for my PhD project. The system captures camera frames, processes them (CPU/GPU), displays original/processed streams, and records SoC & power metrics. A key task is implementing time-windowed, clipped trapezoidal integration for accurate, per-frame energy/CPU averages from sparse samples (1–10 Hz).

Deadline: November 30, 2025 (hard).
Location/Time zone: Remote, with a preference for overlapping a few hours with America/Manaus (UTC−4).

Tech Stack & Context

• C++17/20, modern concurrency (std::thread, atomics; lock-free patterns a plus)
• Linux/POSIX, real-time patterns (priority scheduling, CPU affinity, zero-copy)
• V4L2 camera capture (YUYV), zero-copy buffers via mmap + custom deleters
• Queues for fan-out: cam2Alg, cam2Disp, alg2Disp
• Display: SDL2 (side-by-side original/processed)
• Optional GPU: CUDA filters and CPU/GPU switching
• System metrics: tegrastats parsing (SoCConcrete), Lynsyn power board (PowerConcrete)
• Central SystemMetricsAggregator (frameId + timestamps) → CSV + NDJSON
• Target: robust, time-weighted averages/energy via trapezoidal integration with clipping to [start, end] frame windows

Key Tasks

Clipped Trapezoidal Integration (core)
• Given sparse samples (t_i, v_i), compute energy (J) and time-weighted averages within [t_start, t_end] using trapezoids.
• Clip the first/last segments to the window boundaries; handle edge cases (no points inside, single point, duplicates, out-of-order).
• Clean C++ API, unit tests, and benchmarks.

Multithreading & Real-Time Hardening
• Audit queues/back-pressure; eliminate stalls & priority inversions.
• Optional: set CPU affinity/priority, reduce jitter, ensure graceful shutdown.

Performance Optimization
• Profiling (perf, std::chrono, custom counters), micro-opts on hot paths.
• Zero-copy correctness, minimal copies, cache-friendly data layouts.

Metrics Correlation
• Ensure the aggregator joins by frameId + canonical captureTime.
• Tolerant join windows, missing/future samples, monotonicity checks.

Logging & Validation
• Deterministic CSV/NDJSON schemas; invariants + assertions.
• Small visualization hooks (optional): dump debug series for quick plots.

Deliverables

• Header-only or small lib implementing clipped_trapezoid() utilities + tests
• Integration PRs to the pipeline (clear commits + code comments)
• Benchmarks (throughput, latency, allocation counts) & a one-page perf summary
• README with API, edge cases, and usage examples

Must-Have Experience

• Advanced C++17/20 (RAII, move semantics, templates, chrono, alloc awareness)
• Multithreading (lock-free or low-contention designs, condition vars, atomics)
• Real-time or near-real-time systems on Linux
• Numerical integration / signal processing on irregular time series
• Strong testing discipline (GoogleTest/Catch2/doctest)

Nice-to-Have

• Jetson (Nano/Xavier) experience, CUDA basics
• V4L2, SDL2, CSV/NDJSON pipelines
• Parsers for tegrastats / embedded power monitors (Lynsyn or similar)

Timeline & Availability

Start ASAP. Ship the integration and tests before Nov 30, 2025. Minimum 10–20 focused hours in the first two weeks.

Budget

Competitive; open to hourly or fixed-price milestones for core integration + optimization.

How to Apply (DM only)

Please send:
• 3–5 bullets on relevant C++/real-time wins (with measurable outcomes)
• A short note on how you'd implement clipped trapezoid robustly (edge cases)
• Links to a code sample or gist (ideally multithreaded or numeric)
• Availability through Nov 30, hourly rate (or milestone proposal)
• Any Jetson/V4L2/SDL2/CUDA experience (brief)

Short Screening Task (2–3 hours)

Implement integrate_window(samples, start, end) with:
• Strict clipping, stable double math, monotonic time checks, O(n) single pass.
• Unit tests: no in-window points, single boundary touch, duplicated timestamps, descending input (should sort or reject), large gaps.
• Bonus: micro-bench with 1e6 samples (synthetic), report ns/sample.

Success Criteria

• Correct, numerically stable energy/average within ≤1% of high-resolution ground truth in synthetic tests.
• No frame drops attributable to the new code; bounded latency and memory.
• Clean API, readable code, reproducible tests.

High-Level Architecture

The provided code implements a complex, high-performance, real-time video processing and metrics-gathering pipeline. The architecture is modern, modular, and asynchronous, built around a "staged" pipeline design.

Data Flow
• Camera (DataConcrete) runs in its own thread, capturing YUYV frames from a V4L2 device. It uses a zero-copy mechanism (mmap'd buffers managed by std::shared_ptr custom deleters) to push frame data to two separate queues: cam2Alg (for processing) and cam2Disp (for original display).
• Algorithm (AlgorithmConcrete) runs in its own thread, consuming frames from the cam2Alg queue. It performs a selected image processing operation (either on CPU or on GPU via CUDA) and places the newly created (processed) frame data into a third queue: alg2Disp.
• Display (SdlDisplayConcrete) runs on the main thread. It "drains" both the cam2Disp and alg2Disp queues (taking only the newest frame from each) to prevent display lag. It then renders the original and processed frames side-by-side.

Metrics Flow (Parallel)
• SoC (SoCConcrete) and Power (LynsynMonitorConcrete) modules run in their own threads, independently polling for hardware statistics (CPU/GPU temps, power usage, etc.).
• All modules (Camera, Algorithm, Display, SoC, Power) push their individual statistics to a central Aggregator (SystemMetricsAggregatorConcrete_v3_2).
• The Aggregator, which runs its own flush thread, correlates these disparate stats using frameId and timestamps. It assembles them into complete SystemMetricsSnapshot objects, which are then written to both CSV and NDJSON files for logging and analysis.
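
The per-frame correlation step is where the clipped trapezoidal integration from Key Tasks would plug in. As a rough sketch only, integrate_window could look like the code below. The Sample and WindowResult types, the lerp_at helper, the seconds-as-double time base, the choice to reject (rather than sort) descending input, and the choice not to extrapolate beyond the sampled range are all assumptions made for illustration; none of them come from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical types for illustration; not taken from the project.
struct Sample {
    double t;  // timestamp in seconds
    double v;  // value, e.g. power in watts
};

struct WindowResult {
    double integral;  // e.g. energy in joules when v is power in watts
    double average;   // time-weighted mean over the covered part of the window
    double covered;   // seconds of [start, end] that had bracketing samples
};

// Linear interpolation on the segment a->b at time t (requires a.t < b.t).
inline double lerp_at(const Sample& a, const Sample& b, double t) {
    return a.v + (b.v - a.v) * ((t - a.t) / (b.t - a.t));
}

// Integrates the piecewise-linear signal over [start, end] in one O(n) pass.
// Returns nullopt if the window is empty or inverted, if any timestamp
// decreases, or if no segment overlaps the window.
inline std::optional<WindowResult> integrate_window(
        const std::vector<Sample>& samples, double start, double end) {
    if (!(end > start)) return std::nullopt;
    double sum = 0.0;
    double covered = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const Sample& a = samples[i - 1];
        const Sample& b = samples[i];
        if (b.t < a.t) return std::nullopt;  // out-of-order input: reject
        if (b.t == a.t) continue;            // duplicate timestamp: zero width
        if (b.t <= start) continue;          // segment ends before the window
        if (a.t >= end) continue;            // segment starts after the window
        // Clip the segment to the window and interpolate the clipped ends.
        const double lo = std::max(a.t, start);
        const double hi = std::min(b.t, end);
        assert(hi > lo);
        const double vlo = lerp_at(a, b, lo);
        const double vhi = lerp_at(a, b, hi);
        sum += 0.5 * (vlo + vhi) * (hi - lo);  // trapezoid area
        covered += hi - lo;
    }
    if (covered <= 0.0) return std::nullopt;
    return WindowResult{sum, sum / covered, covered};
}
```

With power in watts and time in seconds, integral is the energy in joules for the frame window. Dividing by the covered span instead of end - start keeps the average meaningful when samples only partly cover the window; whether that convention is the desired one is a design choice to confirm.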

Orchestration

The SampleTestIntegrationCode_v13.cpp file contains the main() function and a ConfigManager class. This class is responsible for reading a config.json file, building the entire pipeline (instantiating all modules, queues, and the aggregator), and injecting all dependencies (like queues and the aggregator) into the modules that need them. A central ThreadManager is used to track all worker threads, allowing for a clean, graceful shutdown when the application receives a signal (like SIGINT).

GitHub: https://github.com/AntonioRodriguezUFAM/ERL_Stage_1_Framework_38.git
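
The shutdown path described under Orchestration could follow the common pattern of a signal handler that only sets a flag, with the owning thread then stopping and joining the workers. The following is a minimal sketch under assumptions: WorkerGroup, g_stop, on_signal, install_signal_handlers, and wait_and_shutdown are hypothetical names that stand in for the project's ThreadManager and are not taken from it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Process-wide stop flag (hypothetical). Stores to a lock-free atomic are
// safe to perform from inside a signal handler.
std::atomic<bool> g_stop{false};
static_assert(std::atomic<bool>::is_always_lock_free,
              "the signal handler needs a lock-free flag");

// Signal handler: does nothing except set the flag.
extern "C" void on_signal(int) { g_stop.store(true); }

void install_signal_handlers() {
    std::signal(SIGINT, on_signal);
    std::signal(SIGTERM, on_signal);
}

// Hypothetical stand-in for a thread manager: owns the worker threads and a
// stop flag that every worker is expected to poll.
class WorkerGroup {
public:
    // The worker body receives the stop flag and must return once it is set.
    void spawn(std::function<void(const std::atomic<bool>&)> body) {
        assert(!stop_.load() && "spawn after stop was requested");
        threads_.emplace_back([this, body = std::move(body)] { body(stop_); });
    }
    void request_stop() { stop_.store(true); }
    void join_all() {
        for (auto& t : threads_) {
            if (t.joinable()) t.join();
        }
        threads_.clear();
    }
    ~WorkerGroup() {
        request_stop();
        join_all();
    }

private:
    std::atomic<bool> stop_{false};
    std::vector<std::thread> threads_;
};

// Blocks until a signal has set g_stop, then stops and joins all workers.
void wait_and_shutdown(WorkerGroup& group) {
    while (!g_stop.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    group.request_stop();
    group.join_all();
}
```

In the actual pipeline the display loop on the main thread would presumably poll the flag itself instead of sleeping in wait_and_shutdown, and workers blocked on a queue would also need to be woken (for example by closing the queue) before the join can complete.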




