Mirroring a large number of GitHub or Bitbucket repositories
Use case:
I want to create a local clone of a large number of repositories from an organization in GitHub / Bitbucket.
Assumptions
- the user has an SSH key registered with GH / BB and can clone using the SSH protocol
- the user has successfully cloned at least one repo from the command line before (so ~/.ssh/known_hosts already contains the host keys for BB and GH)
- the user has a modern version of Python 3 installed (3.6 or later)
- the user has a local installation of a reasonably modern Git
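The tool-related assumptions above can be checked before starting a long run. The following is only a minimal sketch: the function name `check_prerequisites` is hypothetical and not part of `process_repos.py`, and it only verifies that the interpreter is recent enough and that `git` and `ssh` are on the PATH. It does not verify that the SSH key is registered with either platform.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: checks the Python version and tool availability only.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or later is required")
    for tool in ("git", "ssh"):
        # shutil.which returns None when the executable is not on the PATH
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Returning a list instead of raising lets the caller decide whether a missing tool is fatal.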
Steps
First, I use ./bb-repo-links.sh and ./gh-repo-links.sh to generate the lists of repositories.
See these blog articles: How to list repositories in Bitbucket and How to list repositories in github
The generated lists are plain text files structured like this:
EXFO|git@bitbucket.org:thinkwrap/exfo-tw-docker-jenkins.git
FARO|git@bitbucket.org:thinkwrap/faro-infrastructure.git
H3|git@bitbucket.org:thinkwrap/headless-api-gateway.git
H3|git@bitbucket.org:thinkwrap/headless-cart-service.git
H3|git@bitbucket.org:thinkwrap/headless-catalog-service.git
H3|git@bitbucket.org:thinkwrap/headless-connector-atg.git
H3|git@bitbucket.org:thinkwrap/headless-connector-contentful.git
The first field is the project prefix, and the second is the SSH URL for cloning the repository. They are separated by the | character.
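To make the format concrete, here is a minimal sketch of how such lines could be parsed. The function name `parse_control_lines` is hypothetical and this is not necessarily how `process_repos.py` does it. Skipping blank lines is an assumption, since the format description does not mention them.

```python
from typing import Iterable, Iterator, Tuple


def parse_control_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (project_prefix, ssh_url) pairs from PREFIX|URL lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first '|' only; everything after it is the URL
        prefix, sep, url = line.partition("|")
        if not sep or not url:
            raise ValueError(f"expected PREFIX|URL, got: {line!r}")
        yield prefix.strip(), url.strip()


sample = [
    "EXFO|git@bitbucket.org:thinkwrap/exfo-tw-docker-jenkins.git",
    "H3|git@bitbucket.org:thinkwrap/headless-api-gateway.git",
]
pairs = list(parse_control_lines(sample))
```

The same function works on an open file object, because iterating over a file yields its lines.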
Cloning is trivial:
python3 ./process_repos.py NAME-OF-THE-CONTROL-FILE ....
e.g.
python3 ./process_repos.py pivotree_h3.bb-repo2 pvtrlabs.gh-repo
The Python script uses ../MIRROR as the starting point for the mirrored tree.
This can be changed by exporting the variable MIRROR_ROOT (note that the shell does not allow spaces around the = sign):
export MIRROR_ROOT='../MIRROR2'
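On the Python side, honoring such an override usually amounts to reading the environment with a fallback. This is a sketch only: the names `DEFAULT_MIRROR_ROOT` and `get_mirror_root` are hypothetical, and treating an empty value as "not set" is an assumption.

```python
import os

DEFAULT_MIRROR_ROOT = "../MIRROR"  # default starting point named in the text


def get_mirror_root(env=None) -> str:
    """Return MIRROR_ROOT from the environment, or the default if unset or empty."""
    env = os.environ if env is None else env
    return env.get("MIRROR_ROOT") or DEFAULT_MIRROR_ROOT
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.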
The generated structure looks like this:
PLATFORM > ACCOUNT > PROJECT > REPO
where:
- PLATFORM is bitbucket.org or github.com
- ACCOUNT is the organization name within the platform
- PROJECT is the project name for Bitbucket; it defaults to ‘src’ for GitHub
- REPO is the repository name
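The mapping from a control-file entry to a directory in this layout can be sketched as follows. This is an illustration and not the actual code of `process_repos.py`: the regular expression, the function name `mirror_path`, and the rule that an empty project prefix falls back to 'src' are all assumptions.

```python
import os
import re

# Matches scp-style SSH URLs such as git@bitbucket.org:thinkwrap/faro-infrastructure.git
SSH_URL = re.compile(
    r"^[^@]+@(?P<platform>[^:]+):(?P<account>[^/]+)/(?P<repo>.+?)(?:\.git)?$"
)


def mirror_path(root: str, project: str, ssh_url: str) -> str:
    """Build ROOT/PLATFORM/ACCOUNT/PROJECT/REPO for one repository."""
    match = SSH_URL.match(ssh_url)
    if match is None:
        raise ValueError(f"not an scp-style SSH URL: {ssh_url!r}")
    return os.path.join(
        root,
        match["platform"],
        match["account"],
        project or "src",  # assumption: an empty prefix means the GitHub default
        match["repo"],     # the trailing .git is stripped by the pattern
    )


path = mirror_path(
    "../MIRROR", "FARO", "git@bitbucket.org:thinkwrap/faro-infrastructure.git"
)
```

With the sample entry above, `path` ends in bitbucket.org/thinkwrap/FARO/faro-infrastructure (using the platform's path separator).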
NOTE: the names of some projects have been edited and appear as placeholders in angle brackets.
.
├── bitbucket.org
│   ├── pivotree_h3
│   │   ├── BrownBagExamples
│   │   │   └── apigateway
│   │   ├── DATA
│   │   │   └── sample-data-aws
│   │   ├── DIVE
│   │   │   ├── customer-mgmt-api
│   │   │   ├── data-science-practice
...
│   │   │   ├── pvt-admin-mgmt-api
│   │   │   └── sagemaker-microservices
│   │   ├── DIVE20
│   │   │   ├── dive-data-manager
...
│   │   │   └── tr-sgm-duplicates-svc
│   │   ├── DIVECX
│   │   │   ├── dive-cx-account
│   │   │   ├── dive-cx-account-fork
...
│   │   │   └── pvt-ssm-manager
│   │   └── starter
│   │       ├── pvt-node-module-starter
│   │       ├── pvt-node-starter
│   │       ├── pvt-pipeline-example
│   │       └── pvt-sls-starter
│   └── thinkwrap
│       ├── <PRJ>-CLOUD
│       │   └── <prj>-cloud
│       ├── Admin
│       │   └── tw-bitbucket-admin
│       ├── <PROJECT_1>
│       │   ├── <prj1>-infra
│       │   └── <prj1>-robot
│       ├── BOPIS
│       │   ├── bopis_automation_platform
│       │   └── pvt-bopis
│       ├── <PRJ2>
│       │   └── <prj2>-vtex-poc
│       ├── <PRJ3>
...
│       │   └── <prj4>-testautomation
│       ├── ZZZ-Archived-Repos
│       │   └── oneclick-starter
│       └── microbase
│           ├── micro-alexa-search
│           ├── micro-alexa-search-skill
│           ├── micro-backoffice
│           ├── micro-cart-service
│           ├── micro-catalog-service
│           ├── micro-customer-service
│           ├── micro-docker-nginx
│           ├── micro-docker-service
│           ├── micro-oauth-service
│           ├── micro-payment-service
│           ├── micro-promotion-service
│           ├── micro-recommendation-service
│           ├── micro-stock-service
│           ├── micro-tax-service
│           ├── microbase
│           └── www.microbase.io
└── github.com
    ├── pivotree-tech-blog
    │   └── src
    │       ├── hugo101
    │       ├── pivotree-tech-blog-source-poc
    │       ├── pivotree-tech-blog.github.io
    │       └── story
    ├── pvtrlabs
    │   └── src
    │       ├── action-skip-ci
    │       ├── at-aws-onelogin-multiapp
    │       ├── at-cloudhealth
    │       ├── at-opsgenie-integrations
....
    │       ├── terraform-tfe-workspace
    │       ├── terraform-tfe-workspace-full
    │       ├── test-limited-tfc-access
    │       └── test-repo
    └── thinkwrap
        └── src
            ├── ML-Live-Beginner
            ├── ML-Live-Intermediate
            ├── Tiny-Python-3.6-Notebook
            ├── aaac-starter-template
            ├── aws-cost-and-usage-report
            ├── aws-ml-guide
            ├── blast-radius
            ├── docker-aws-cli
            ├── docker-kubernetes-cli
            ├── docker-kubernetes-cli-helm
            ├── docker-node-base-pipelines
            ├── docker-node-sls-pipelines
            ├── docker-node-sls-pipelines-aws
            ├── docker-python-base-pipelines
            ├── docker-sls-cli
            ├── docker-terraform-cli
            ├── docker-terraform-cli-helm
            ├── ecs-deep-dive-2018
            ├── kubernetes-in-action
            ├── programming-with-data
            ├── sgviz
            ├── ssm-parameter-store
            ├── terraform-ecs-fargate
            └── universal-recommender
549 directories, 0 files
Updating
If the Python script encounters an existing repository, it updates it by fetching from the remote. It does NOT merge, so local branches and the working tree are left untouched.
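The clone-or-fetch decision can be sketched roughly as below. This is an assumption about how such logic might look, not the script's actual implementation: the function names are hypothetical, and detecting an existing repository by the presence of a `.git` directory only works for regular (non-bare) clones.

```python
import os
import subprocess
from typing import List


def git_command_for(ssh_url: str, dest: str) -> List[str]:
    """Choose 'git fetch' for an existing clone, 'git clone' otherwise."""
    if os.path.isdir(os.path.join(dest, ".git")):
        # Existing clone: fetch all remotes, never merge into local branches
        return ["git", "-C", dest, "fetch", "--all"]
    return ["git", "clone", ssh_url, dest]


def sync_repo(ssh_url: str, dest: str) -> int:
    """Run the chosen git command and return its exit code."""
    # Make sure the PLATFORM/ACCOUNT/PROJECT parent directories exist
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    return subprocess.run(git_command_for(ssh_url, dest), check=False).returncode
```

Separating the command selection from its execution makes the decision testable without network access. After a fetch, the new commits are visible on the remote-tracking branches (for example origin/main), and merging them remains a manual step.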