Debian is the latest in an ever-growing list of projects to wrestle (again) with the question of LLM-generated contributions; the current debate started in mid-February, after Lucas Nussbaum opened a discussion with a draft general resolution (GR) on whether Debian should accept AI-assisted contributions. It seems, mostly, to have subsided without a GR being put forward or any decisions being made, but the conversation was illuminating nonetheless.

Nussbaum said that Debian probably needed to have a discussion “to understand where we stand regarding AI-assisted contributions to Debian” based on some recent discussions, though it was not clear what discussions he was referring to. Whatever the spark was, Nussbaum put forward the draft GR to clarify Debian’s stance on allowing AI-assisted contributions. He said that he would wait a couple of days to collect feedback before formally submitting the GR.

His proposal would allow “AI-assisted contributions (partially or fully generated by an LLM)” if a number of conditions were met. For example, it would require explicit disclosure if “a significant portion of the contribution is taken from a tool without manual modification”, and labeling of such contributions with “a clear disclaimer or a machine-readable tag like ‘[AI-Generated]’.” It also spells out that contributors should “fully understand” their submissions and would be accountable for the contributions, “including vouching for the technical merit, security, license compliance, and utility of their submissions”. The GR would also prohibit using generative-AI tools with non-public or sensitive project information, including private mailing lists or embargoed security reports.
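The draft does not spell out what the machine-readable tag should look like beyond the "[AI-Generated]" example. As a purely hypothetical sketch of how such a disclosure could work in practice, one option would be a Git commit trailer, which standard Git tooling can already filter on (the trailer name and wording below are illustrative, not anything Debian has adopted):

```shell
# Hypothetical sketch: record AI assistance as a Git commit trailer.
# The "AI-Generated:" trailer name is illustrative only.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# An empty commit whose second -m paragraph carries the trailer
git -c user.name=demo -c user.email=demo@example.org commit -q --allow-empty \
    -m "Fix off-by-one in parser" \
    -m "AI-Generated: partially (boilerplate drafted by an LLM)"

# git log --grep matches the commit message line by line,
# so reviewers could list all commits carrying the trailer:
git log --grep='^AI-Generated:' --oneline
```

A trailer-based convention would have the advantage of being greppable by existing review tooling without any new infrastructure, but nothing in the draft GR commits to this or any other specific format.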

  • ProdigalFrog@slrpnk.net · 2 days ago

    I’m surprised they’re not adamantly against it, since AI tools so frequently spit out garbage code, yet Debian is known for its stability. How can they not see the conflict here, especially the risk to their reputation?

    • Venator@lemmy.nz · 23 hours ago

      It can be good at generating boilerplate code or copying an existing solution, so it might be useful for less critical parts, such as adding a GUI for some feature that was previously limited to the command line…

      • ProdigalFrog@slrpnk.net · 19 hours ago (edited)

        Unless that AI is hosted locally and trained exclusively on public-domain or GPL code (AFAIK, no such model exists), it’s both unethical to use and risks corrupting the code base with proprietary code copied from somewhere else.

        If a developer uses an AI hosted in a datacenter, then every use of it encourages the waste of water and fossil fuels to run the datacenter, encourages more of them to be built in vulnerable neighborhoods that can’t do anything about the pollution they generate, and enriches the pocketbooks of the techno-fascists that run those datacenters.

        • Venator@lemmy.nz · 19 hours ago (edited)

          it enriches the pocketbooks of the techno-fascists that run those datacenters.

          Depends: in a lot of cases it costs them more money to service the query than they charge 😅.

          Although it’s all borrowed money so it doesn’t matter to them…

          Still causing all that havoc on the environment and potentially poisoning code bases with proprietary stolen code though…

          They’re gonna be running those data centers regardless though, as most of the compute time is spent on training new models…

          • ProdigalFrog@slrpnk.net · 19 hours ago

            Although it’s all borrowed money so it doesn’t matter to them…

            Right, and they tend to convince investors to give them more by pointing to the number of users adopting and using the product, which they can then extrapolate into wild projections to coax even more investment out of venture capital.

            The more people abandon it entirely, the less venture capital they may be able to entice into their trap.

          • Venator@lemmy.nz · 19 hours ago

            The copyright concerns can be mitigated somewhat by prompting it to follow existing patterns in the codebase (and double-checking that it has done so when reviewing the generated code).