
Invalid file information in SPDX documents #1240

@armintaenzertng

Description


Note: This uses the new version of the SPDX generation introduced in #1233. The old version has the same errors, plus a few more that have already been fixed in the new version.

Describe the bug
SPDX output that includes file information has several validation issues:

  • some files have no checksum (possibly only empty files); for now, the SHA1 of the empty string is used as a fallback so that the SpdxDocument can at least be generated
  • some files have invalid SPDX IDs such as SPDXRef-v2"-None, or are referenced by IDs that do not exist in the document, such as SPDXRef-None-None
  • some license references from LicenseInfoInFile are not present in the ExtractedLicensingInfo section
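Because the full pyspdxtools run is slow on a document this large, a quick pre-check along these lines might help narrow things down. This is a minimal sketch, not part of Tern or pyspdxtools. It assumes the SPDX 2.3 JSON field names (files, SPDXID, checksums, licenseInfoInFiles, hasExtractedLicensingInfos, packages, snippets, relationships). The function name find_file_issues and the issue labels are made up for illustration. SPDXRef-None-None is syntactically valid, so the sketch reports it as a dangling reference, assuming it shows up in the relationships array.

```python
import hashlib
import re

# ID rule as quoted in the pyspdxtools error message: "SPDXRef-" prefix,
# then only letters, digits, "." and "-".
SPDX_ID_RE = re.compile(r"^SPDXRef-[A-Za-z0-9.\-]+$")
LICENSE_REF_RE = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")
# SHA1 of zero bytes, i.e. the fallback checksum mentioned above.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def find_file_issues(doc):
    """Return (kind, subject, detail) tuples for an SPDX 2.3 JSON dict (hypothetical helper)."""
    issues = []
    # LicenseRef IDs actually declared in the document.
    declared = {i.get("licenseId") for i in doc.get("hasExtractedLicensingInfos", [])}

    for f in doc.get("files", []):
        spdx_id = f.get("SPDXID", "")
        name = f.get("fileName")
        if not SPDX_ID_RE.match(spdx_id):
            issues.append(("invalid-id", spdx_id, name))
        checksums = f.get("checksums", [])
        if not checksums:
            issues.append(("no-checksum", spdx_id, name))
        elif any(c.get("checksumValue") == EMPTY_SHA1 for c in checksums):
            # Either a genuinely empty file or the empty-string fallback.
            issues.append(("empty-sha1", spdx_id, name))
        for expr in f.get("licenseInfoInFiles", []):
            for ref in LICENSE_REF_RE.findall(expr):
                if ref not in declared:
                    issues.append(("undeclared-license-ref", spdx_id, ref))

    # Every SPDX ID defined in this document.
    defined = {doc.get("SPDXID")}
    for key in ("packages", "files", "snippets"):
        defined.update(e.get("SPDXID") for e in doc.get(key, []))

    # Relationship endpoints that point at IDs not defined anywhere.
    for rel in doc.get("relationships", []):
        for key in ("spdxElementId", "relatedSpdxElement"):
            ref = rel.get(key)
            if ref is None or ref in ("NONE", "NOASSERTION") or ref.startswith("DocumentRef-"):
                continue  # special values and external references are skipped
            if ref not in defined:
                issues.append(("dangling-ref", ref, key))
    return issues
```

For example, running it on json.load(open("output.json")) and printing the result should list the same kinds of problems as the pyspdxtools output below, in a fraction of the time.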

To Reproduce
I used tern report -i golang:1.12-alpine -f spdxjson -sv 2.3 -o output.json to produce the output and then ran pyspdxtools -i output.json on it (note that validation takes a while because the SPDX document is large).
I'm not sure whether -x scancode is also required, since I recall that the above command previously did not produce any file information. In case you have trouble reproducing this, I attached my output.json as output.txt (GitHub does not seem to accept JSON attachments).

Error in terminal
Here are the validation issues:

Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-1c734cf. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1c734cf
Unrecognized license reference: LicenseRef-1b79b75. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1b79b75
Unrecognized license reference: LicenseRef-fa9fd02. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-fa9fd02
Unrecognized license reference: LicenseRef-39c3ee0. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-39c3ee0
Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-4ccf56f. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-4ccf56f
Unrecognized license reference: LicenseRef-45c771b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-45c771b
Unrecognized license reference: LicenseRef-ca2312b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-ca2312b
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
did not find the referenced spdx_id "SPDXRef-None-None" in the SPDX document
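The SPDXRef-v2"-None and SPDXRef-None-None IDs look like name fragments joined without sanitizing them, with missing values rendered as the string None. The issue does not confirm how Tern builds these IDs, so the following is only a sketch of one possible approach. make_spdx_id is a hypothetical helper, not an existing Tern function. It replaces characters that the rule above disallows and raises on missing parts instead of emitting "None".

```python
import hashlib
import re

# Anything outside the allowed SPDX ID alphabet (letters, digits, ".", "-").
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9.\-]+")


def make_spdx_id(*parts):
    """Join name parts into an SPDX ID (hypothetical helper, not Tern's API).

    Disallowed characters become "-"; a missing part (None or empty)
    raises ValueError instead of silently turning into "None".
    """
    cleaned = []
    for part in parts:
        if part is None or str(part) == "":
            raise ValueError("cannot build an SPDX ID from a missing value")
        fragment = _INVALID_CHARS.sub("-", str(part)).strip("-")
        if not fragment:
            # The part consisted only of disallowed characters: use a short hash.
            fragment = hashlib.sha256(str(part).encode("utf-8")).hexdigest()[:7]
        cleaned.append(fragment)
    return "SPDXRef-" + "-".join(cleaned)
```

With this, make_spdx_id('v2"', "1.2") yields SPDXRef-v2-1.2. Note that sanitizing can map distinct names (v2" and v2) to the same ID, so a real fix would probably also need to guarantee uniqueness, for example by appending a hash of the original name.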

Expected behavior
Tern's generated SPDX documents with file information should be valid.

Environment you are running Tern on

  • tern at 047e1cb
  • Ubuntu 22.04
  • Python 3.10.6
