Topic | Advancements in Embodied AI

Author: 与阳光共进早餐 | Published 2023-12-02 05:21

    1. Preface

    A brief overview of recent progress in LLM-based embodied AI.

    2. paper:Embodied Task Planning with Large Language Models (arxiv23)

    2.1 basic info

    • task: embodied task planning
    • model: the TaPA (TAsk Planning Agent) framework is proposed.
    • main idea: aligns large language models (LLMs) with visual perception models to generate executable plans in physical environments.

    2.2 main contribution

    1. Multimodal Dataset Construction
    • a dataset containing <indoor scene, instruction, action plan> triplets (a minimal sketch of this structure follows the list)
    2. Grounded Plan Tuning
    • finetuning pre-trained LLMs for grounded planning that respects the physical constraints of the scene
    3. Extending Open-Vocabulary Object Detection
    • enhanced detection over multi-view RGB images, crucial for understanding scene context
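
    To make the dataset format concrete, here is a minimal sketch of what one <indoor scene, instruction, action plan> triplet might look like. The class name, field names, and example values are my own illustrative assumptions, not the paper's actual schema or data.

```python
# Illustrative sketch only: field names and example values are assumptions,
# not the paper's actual schema or data.
from dataclasses import dataclass
from typing import List

@dataclass
class PlanningTriplet:
    scene_objects: List[str]   # objects present in the indoor scene
    instruction: str           # the human instruction
    action_plan: List[str]     # executable steps grounded in the scene

example = PlanningTriplet(
    scene_objects=["fridge", "counter", "mug", "coffee machine"],
    instruction="Make me a cup of coffee.",
    action_plan=[
        "Walk to the counter",
        "Pick up the mug",
        "Place the mug under the coffee machine",
        "Turn on the coffee machine",
    ],
)
print(example.instruction, "->", example.action_plan)
```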

    2.3 main idea

    The TaPA framework integrates LLMs with visual information from open-vocabulary object detectors. It processes human instructions and available object lists to generate feasible action plans for navigation and manipulation tasks.
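
    As a rough illustration of this pipeline (a sketch under my own assumptions, not the paper's code), the detector's object list and the human instruction can be packed into a single planning prompt for the finetuned LLM. `build_prompt` and `run_llm` below are hypothetical helpers; `run_llm` stands in for whatever LLM backend is actually used.

```python
from typing import List

def build_prompt(detected_objects: List[str], instruction: str) -> str:
    """Combine the scene's detected object list with the instruction into a planning prompt."""
    return (
        "Objects visible in the scene: " + ", ".join(detected_objects) + "\n"
        + f"Instruction: {instruction}\n"
        + "Generate a numbered, step-by-step action plan that only uses the objects above."
    )

def run_llm(prompt: str) -> str:
    # Hypothetical placeholder: in practice this would call the grounded, finetuned LLM.
    return "1. Walk to the counter\n2. Pick up the mug\n3. ..."

detected = ["counter", "mug", "coffee machine", "fridge"]
print(run_llm(build_prompt(detected, "Make me a cup of coffee.")))
```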

    2.4 results

    3. paper: Large Language Models as Generalizable Policies for Embodied Tasks (arxiv23)

    3.1 basic info

    • task: visual embodied tasks
    • model: Large Language model Reinforcement Learning Policy (LLaRP)
    • main idea: integrates pre-trained LLMs with egocentric visual observations to directly output actions in the environment.

    3.2 main contribution

    1. LLaRP Framework
    • A new framework that combines LLMs with reinforcement learning for embodied AI tasks.
    2. Generalization Capabilities
    • Demonstrated robustness to paraphrased instructions and ability to generalize to novel tasks.
    3. Language Rearrangement Benchmark
    • Introduction of a new benchmark comprising 150,000 training tasks and 1,000 test tasks for language-conditioned rearrangement.

    3.3 main idea

    • A pre-trained, frozen LLM processes the text instruction together with egocentric visual observations;
    • a small set of additional modules (the blocks highlighted in red in the paper's architecture figure) is trained through reinforcement learning;
    • the frozen LLM together with these trained modules can then generalize to novel tasks (a minimal sketch of this setup follows).
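
    Below is a minimal PyTorch sketch of this parameter split, not the paper's code: a small stand-in module replaces the real frozen LLM, and the only trainable parameters are an observation encoder and an action head (in LLaRP these are optimized with reinforcement learning; a dummy loss is used here only to show where gradients flow). All module names and sizes are illustrative assumptions.

```python
# Sketch of a frozen-backbone policy: only the observation encoder and action head train.
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a pre-trained, frozen LLM (not a real language model)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

dim, num_actions = 256, 10
backbone = FrozenBackbone(dim)
backbone.requires_grad_(False)             # backbone weights stay fixed

obs_encoder = nn.Linear(512, dim)          # trainable: maps visual features into the backbone's space
action_head = nn.Linear(dim, num_actions)  # trainable: maps backbone output to action logits

optimizer = torch.optim.Adam(
    list(obs_encoder.parameters()) + list(action_head.parameters()), lr=1e-4
)

visual_features = torch.randn(1, 8, 512)          # dummy egocentric observation features
tokens = obs_encoder(visual_features)             # project observations into token space
hidden = backbone(tokens)                         # frozen forward pass
logits = action_head(hidden[:, -1])               # action distribution for this step
loss = -torch.log_softmax(logits, dim=-1)[0, 3]   # dummy policy-gradient-style loss
loss.backward()                                   # gradients reach only the two trainable modules
optimizer.step()
```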

    4. Other papers

    • GOAT: GO to Any Thing
    • CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
    • Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions

    (If I get the chance, I may later write more detailed notes on each of these papers.)
