Curated PyPI Sources
This section provides an overview of what curated-pypi sources are, why they are useful, and how to use them. If you'd just like to get started, view the quick-start guide on how to create your first curated subset of PyPI.
Overview
Curated PyPI sources are based on a current mirror of PyPI. To understand how Package Manager's PyPI source works, see the PyPI Mirror documentation. Because PyPI hosts over 400,000 packages, it can be useful to include only certain packages and versions in a source. This is especially helpful for package security, where only verified sets of packages are allowed.
Creating a Curated PyPI Source
$ rspm create source --name=pypi-subset --type=curated-pypi
<< Source 'pypi-subset':
<< Type: Curated PyPI
Curated PyPI sources don't need to be pinned to a specific snapshot date at the time of creation; any date can be picked when adding packages with rspm update (described below). Once the source has been created, be sure to subscribe a repository to the source to make the packages available to users:
# Create a repository:
$ rspm create repo --name=pypi --type=python --description='Access curated PyPI packages'
# Subscribe a repository to the curated-pypi source:
$ rspm subscribe --repo=pypi --source=pypi-subset
Including Packages in a Curated PyPI Source
Packages are included in a curated-pypi source by uploading a requirements.txt definition with rspm update. First, we'll look at how a requirements file is defined. Then, we will look at how to use a requirements file to include packages in a curated-pypi source.
Requirements Files
A requirements.txt can be created from scratch, or you can reuse an existing file that your organization or team already uses to define local environments. Each line of the file names a package, optionally followed by one or more version constraints, or references another requirements file.
As an example, a requirements.txt file could request the following:
- All available versions of shiny
- All versions of tensorflow greater than or equal to 2.4.0 and less than 2.5, explicitly excluding 2.4.2
- Only numpy version 1.24.2
- All packages from requirements2.txt
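A file matching the list above might be written as follows. This is a reconstruction from the description, so the exact line order is an assumption; the heredoc simply writes the file to the current directory.

```shell
# Write an example requirements.txt matching the description above.
# The line order is an assumption; requirements2.txt is referenced
# but not created here.
cat > requirements.txt <<'EOF'
shiny
tensorflow>=2.4.0,<2.5,!=2.4.2
numpy==1.24.2
-r requirements2.txt
EOF

# Show the file that would be passed to rspm update.
cat requirements.txt
```

Placing all three tensorflow constraints on one comma-separated line means they are evaluated together.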
As shown in the example above, a package doesn't need to have any version constraints defined. It can also have as many version constraints as needed. The versions made available to Package Manager will depend on what is available at the snapshot date specified when updating the source.
All Python version parsing and matching criteria are based on PEP 440; refer to that specification for information on version formatting and constraints. For more information on the Requirements File Format, refer to pip's documentation.
Note
Not everything defined in the Requirements File Format specification is supported in Package Manager. The curated-pypi source only parses package names, version ranges, and recursive file references. Any other definitions (e.g., extras, option flags, environment markers) within an uploaded requirements.txt file are ignored.
The requirements.txt file also supports listing the same package more than once with different version constraints. The curated-pypi source treats these entries as an OR operator: a version is included if it satisfies the constraints of any one entry. For instance, two entries that between them match versions 2.4.2 and 2.4.3 cause Package Manager to pull in both versions. This can be helpful when combining requirements.txt files from multiple sources, ensuring that all the versions you expect are included.
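As a sketch of the OR behavior described above, assume the same package is pinned on two separate lines. The package name (tensorflow), the file name, and the use of exact == pins are assumptions chosen to match the 2.4.2 and 2.4.3 versions mentioned in the text.

```shell
# Hypothetical file: the same package listed twice with different pins.
cat > requirements-or.txt <<'EOF'
tensorflow==2.4.2
tensorflow==2.4.3
EOF

# Under the OR rule, the source would evaluate these entries as:
#   tensorflow==2.4.2 OR tensorflow==2.4.3
# so both versions are included.
cat requirements-or.txt
```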
Note
Be careful when referencing a package multiple times while using a != constraint. If an exclusion such as !=2.4.2 sits on its own line, separate from the package's other constraints, version 2.4.2 is still included: the lines are evaluated with OR, and 2.4.2 can satisfy one of the other lines. To guarantee that version 2.4.2 is excluded, put all version constraints on a single line so Package Manager evaluates them together.
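The pitfall in this note can be sketched with two hypothetical files: one that splits the constraints across lines and one that combines them. The file names and the >=2.4.0 lower bound are assumptions for illustration. The awk helper is a simple local check, not part of rspm, that flags packages listed on more than one line; it does not normalize package-name case or punctuation.

```shell
# Hypothetical split form: evaluated as (>=2.4.0) OR (!=2.4.2),
# so 2.4.2 is still included because it satisfies >=2.4.0.
cat > requirements-split.txt <<'EOF'
tensorflow>=2.4.0
tensorflow!=2.4.2
EOF

# Combined form: both constraints are evaluated together, excluding 2.4.2.
cat > requirements-combined.txt <<'EOF'
tensorflow>=2.4.0,!=2.4.2
EOF

# List package names that appear on more than one line of a requirements file.
# Blank lines and lines starting with '-' (options such as -r) or '#' are skipped.
find_repeated_packages() {
  awk -F'[<>=!~ ]' '/^[^-#]/ { count[$1]++ }
       END { for (p in count) if (count[p] > 1) print p }' "$1"
}

find_repeated_packages requirements-split.txt     # prints: tensorflow
find_repeated_packages requirements-combined.txt  # prints nothing
```

Running a check like this before uploading a merged requirements file can help spot entries that should be collapsed onto one line.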
Using Pipfiles Instead
If you already have a Pipfile or Pipfile.lock defined, then you may prefer to use that. Although Package Manager doesn't support uploading the Pipfile directly, there are a few methods to convert them to the requirements.txt format.
One method is to run pip freeze from within the defined pipenv environment and save the output as requirements.txt.
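A minimal sketch of the pip freeze approach, assuming pipenv is installed and the command is run from the directory that contains the Pipfile. The guard around the command is only there so the snippet fails with a clear message when pipenv is missing.

```shell
# Run pip freeze inside the pipenv-managed environment and save the result.
# Assumes the current directory contains the project's Pipfile.
if command -v pipenv >/dev/null 2>&1; then
  pipenv run pip freeze > requirements.txt
else
  echo "pipenv is not installed or not on PATH" >&2
fi
```

Keep in mind that pip freeze lists every package installed in the environment, including transitive dependencies, typically pinned to an exact version.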
Another method is to use jq to parse the Pipfile.lock file and turn it into a requirements.txt file.
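One possible jq filter is sketched below. It assumes the usual Pipfile.lock layout, where the default section maps each package name to an object whose version field already includes the == operator. The sample lock file, its versions, and the output file name are made up for illustration.

```shell
# Hypothetical minimal lock file, for illustration only.
cat > sample-Pipfile.lock <<'EOF'
{
  "default": {
    "numpy": {"version": "==1.24.2"},
    "shiny": {"version": "==0.2.10"}
  },
  "develop": {}
}
EOF

# Emit "name==version" for each default (non-dev) package that has a version.
if command -v jq >/dev/null 2>&1; then
  jq -r '.default | to_entries[] | select(.value.version != null) | .key + .value.version' \
    sample-Pipfile.lock > requirements-from-lock.txt
  cat requirements-from-lock.txt
else
  echo "jq is not installed or not on PATH" >&2
fi
```

Entries without a version field (for example, packages installed from a VCS URL) are skipped by the select filter rather than written as malformed lines.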
These methods should be useful to get your package specifications into a format that Package Manager can handle.
Updating a Curated PyPI Source
To make packages available in a Curated PyPI source, run rspm update with a requirements file and a specific PyPI snapshot date. Without the --commit flag, the command performs a dry run, so you can preview the changes before committing them to the source:
# Do a dry-run to visualize the changes to the source before doing them
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24
A preview of the changes is presented:
Packages from 'requirements.txt' to update source 'pypi-subset' at PyPI snapshot date '2023-03-24':
Name Version
numpy 1.24.2
shiny 0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4
If the output above looks correct, execute this command again with the --commit flag to update the source with the new set of packages.
Note
If your requirements.txt file includes more than 1,000 packages, the output of the update command is simplified for performance purposes.
To commit the changes, repeat the command, adding the --commit flag:
# Now commit the changes to the source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24 --commit
The finalized contents of the source are then printed:
Successfully updated source 'pypi-subset' at PyPI snapshot date '2023-03-24' with the following packages from 'requirements.txt':
Name Version
numpy 1.24.2
shiny 0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4
Note
Running rspm update on a Curated PyPI source will overwrite the source with only the packages defined in your requirements.txt file. However, previous snapshots of the source are still available with a pinned repo URL.
To update the source to a different snapshot date, use the update command again:
# Update packages in a curated-pypi source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2021-02-03 --commit
Curated PyPI sources can be pinned to any date for which Posit has a PyPI snapshot (typically, once per weekday). Curated PyPI sources also support using any date, regardless of the previously used snapshot dates. If the source was initially set to 2021-02-03, it can then be set to a later date with --snapshot=2022-06-01. If later you would like to pin it back to the original date used, that can be done by running rspm update again with --snapshot=2021-02-03.
Tip
This allows you to set the Curated PyPI source to any date where a PyPI snapshot has been taken on our servers. If you are trying to pin to a version of a package that doesn't exist on PyPI anymore, try pinning to a date when it existed.