
HelixFold3


General Usage

HelixFold3 mode can be run using the command below:

nextflow run nf-core/proteinfold \
    --input samplesheet.csv \
    --outdir <OUTDIR> \
    --mode helixfold3 \
    --helixfold3_db <null (default) | PATH> \
    --use_gpu \
    -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>

File Structure

The directory passed to --helixfold3_db must have the following structure:

Directory structure
<helixfold3_db>/
├── bfd
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
│   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
├── maxit-v11.200-prod-src
│   ├── annotation-v1.0
│   └── ...
├── mgnify
│   └── mgy_clusters.fa
├── params
│   ├── ccd_preprocessed_etkdg.pkl.gz
│   └── HelixFold3-240814.pdparams
├── pdb_mmcif
│   ├── mmcif_files
│   └── obsolete.dat
├── pdb_seqres
│   └── pdb_seqres.txt
├── rfam
│   └── Rfam-14.9_rep_seq.fasta
├── small_bfd
│   └── bfd-first_non_consensus_sequences.fasta
├── uniprot
│   └── uniprot.fasta
├── uniref30
│   ├── UniRef30_2023_02_a3m.ffdata
│   ├── UniRef30_2023_02_a3m.ffindex
│   ├── UniRef30_2023_02_cs219.ffdata
│   ├── UniRef30_2023_02_cs219.ffindex
│   ├── UniRef30_2023_02_hhm.ffdata
│   ├── UniRef30_2023_02_hhm.ffindex
│   └── UniRef30_2023_02.md5sums
└── uniref90
    └── uniref90.fasta
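Before pointing --helixfold3_db at a local copy, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of nf-core/proteinfold: the function name check_helixfold3_db is hypothetical, and it only checks that the listed files and directories exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a pipeline feature): check that a HelixFold3
# database root contains the entries from the directory tree above.
# Returns 0 if everything is present, 1 otherwise (missing paths go to stderr).
check_helixfold3_db() {
    local root="$1"
    local missing=0
    local rel
    # Single files and directories, relative to the database root.
    for rel in \
        maxit-v11.200-prod-src \
        mgnify/mgy_clusters.fa \
        params/ccd_preprocessed_etkdg.pkl.gz \
        params/HelixFold3-240814.pdparams \
        pdb_mmcif/mmcif_files \
        pdb_mmcif/obsolete.dat \
        pdb_seqres/pdb_seqres.txt \
        rfam/Rfam-14.9_rep_seq.fasta \
        small_bfd/bfd-first_non_consensus_sequences.fasta \
        uniprot/uniprot.fasta \
        uniref90/uniref90.fasta
    do
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $root/$rel" >&2
            missing=1
        fi
    done
    # HH-suite style databases: each needs a3m/cs219/hhm .ffdata and .ffindex files.
    local prefix kind ext
    for prefix in \
        bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
        uniref30/UniRef30_2023_02
    do
        for kind in a3m cs219 hhm; do
            for ext in ffdata ffindex; do
                if [ ! -e "$root/${prefix}_${kind}.${ext}" ]; then
                    echo "missing: $root/${prefix}_${kind}.${ext}" >&2
                    missing=1
                fi
            done
        done
    done
    return "$missing"
}

# Example: check_helixfold3_db /data/helixfold3_db && echo "layout looks complete"
```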

If individual components are available at different locations in the filesystem, they can be set using the following flags:

--helixfold3_init_models_path </PATH/TO/params/HelixFold3-240814.pdparams>
--helixfold3_ccd_preprocessed_path </PATH/TO/params/ccd_preprocessed_etkdg.pkl.gz>
--helixfold3_rfam_path </PATH/TO/rfam/Rfam-14.9_rep_seq.fasta>
--helixfold3_maxit_src_path </PATH/TO/maxit-v11.200-prod-src>
--helixfold3_bfd_path </PATH/TO/bfd/*>
--helixfold3_small_bfd_path </PATH/TO/small_bfd/*>
--helixfold3_mgnify_path </PATH/TO/mgnify/*>
--helixfold3_pdb_mmcif_path </PATH/TO/pdb_mmcif/mmcif_files>
--helixfold3_obsolete_path </PATH/TO/pdb_mmcif/obsolete.dat>
--helixfold3_uniclust30_path </PATH/TO/uniref30/*>
--helixfold3_uniref90_path </PATH/TO/uniref90/*>
--helixfold3_pdb_seqres_path </PATH/TO/pdb_seqres/*>
--helixfold3_uniprot_path </PATH/TO/uniprot/*>
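A few of these flags are sketched below in a bash array. SHARED_DB and HF3_PARAMS are hypothetical example locations, and only some flags are shown; the remaining ones from the list would follow the same pattern. The detail worth noting is quoting: a pattern such as "$SHARED_DB/bfd/*" is quoted so that the shell passes it to Nextflow as a single literal argument instead of expanding it into many separate file names. The sketch prints the command (a dry run) rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical locations: adjust to where each component lives on your system.
SHARED_DB=/shared/databases
HF3_PARAMS=/home/user/helixfold3/params

# Quoted globs keep the '*' literal, so Nextflow (not the shell) resolves them.
hf3_component_args=(
    --helixfold3_init_models_path "$HF3_PARAMS/HelixFold3-240814.pdparams"
    --helixfold3_ccd_preprocessed_path "$HF3_PARAMS/ccd_preprocessed_etkdg.pkl.gz"
    --helixfold3_bfd_path "$SHARED_DB/bfd/*"
    --helixfold3_uniclust30_path "$SHARED_DB/uniref30/*"
    --helixfold3_uniref90_path "$SHARED_DB/uniref90/*"
)

# Dry run: print the command line that would be launched.
printf '%s ' nextflow run nf-core/proteinfold --mode helixfold3 "${hf3_component_args[@]}"
echo
```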

If --helixfold3_db is not set, all required data files are downloaded during workflow execution.
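One way to reuse a local database when it exists, and otherwise let the workflow download the data, is to add --helixfold3_db only when the directory is present. This is a minimal sketch: helixfold3_db_args and the HF3_DB_DIR variable in the usage comment are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "--helixfold3_db <dir>" (one argument per line)
# only if the directory exists; print nothing otherwise, so the pipeline
# falls back to downloading the databases.
helixfold3_db_args() {
    local db_dir="$1"
    if [ -n "$db_dir" ] && [ -d "$db_dir" ]; then
        printf '%s\n' --helixfold3_db "$db_dir"
    fi
}

# Example usage (bash 4+ for mapfile):
#   mapfile -t db_args < <(helixfold3_db_args "${HF3_DB_DIR:-}")
#   nextflow run nf-core/proteinfold --input samplesheet.csv --outdir results \
#       --mode helixfold3 "${db_args[@]}" --use_gpu -profile docker
```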

Warning

The HelixFold3 reference sequence databases require ~2TB of disk space.
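Before letting the workflow download the databases, a quick free-space check on the target filesystem can avoid a failed run. The sketch below uses POSIX `df -Pk`, whose second output line has the available space in 1024-byte blocks in column 4. The helper name has_free_space is hypothetical, and the default threshold assumes "2 TB" means 2 × 10^12 bytes (1953125000 KiB).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem containing DIR has at least
# NEED_KIB kibibytes available (default ~2 TB = 2 * 10^12 bytes = 1953125000 KiB).
has_free_space() {
    local dir="$1"
    local need_kib="${2:-1953125000}"
    local avail_kib
    # df -P guarantees one line per filesystem; -k reports 1024-byte blocks.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$need_kib" ]
}

# Example: has_free_space /scratch || echo "less than ~2 TB free on /scratch" >&2
```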

JSON format

HelixFold3 can model general biomolecular structures, not only proteins. With FASTA input, however, only protein entities are currently supported. Non-protein entities (for example RNA or small molecules) must be described in an input JSON file that follows the HelixFold3 specification.

To run a HelixFold3 JSON file with proteinfold in helixfold3 mode, give its path in the fasta column of the input samplesheet in place of the usual FASTA file:

id,fasta
T1024,T1024.json
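As an illustration, the sketch below writes a JSON input for a protein plus one ligand and a samplesheet that points to it. The keys used ("entities", "type", "sequence", "ccd", "count") reflect our understanding of the HelixFold3 input format and are an assumption; check them against the HelixFold3 documentation. The sample id example_complex, the sequence, and the choice of ATP as ligand are purely illustrative.

```shell
#!/usr/bin/env bash
# Assumed HelixFold3 JSON layout: verify key names against the HelixFold3 docs.
# Illustrative content: a short protein chain and one ATP ligand (CCD code).
cat > example_complex.json <<'EOF'
{
    "entities": [
        {
            "type": "protein",
            "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
            "count": 1
        },
        {
            "type": "ligand",
            "ccd": "ATP",
            "count": 1
        }
    ]
}
EOF

# The JSON file goes in the "fasta" column, exactly where a FASTA file would.
cat > samplesheet.csv <<'EOF'
id,fasta
example_complex,example_complex.json
EOF
```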
Note

HelixFold3 JSON input cannot be used when running multiple modes simultaneously.

Additional Parameters

See the HelixFold3 documentation for a full description of additional arguments. The arguments supported by the proteinfold workflow are described briefly below:

| Parameter | Default | Description |
| --- | --- | --- |
| --helixfold3_max_template_date | 2038-01-19 | Cutoff date for PDB structures used as structural templates. Templates give the model additional context, and molecules whose structures are already solved in the PDB can be predicted trivially from them. When benchmarking model performance, restrict templates to those deposited before a fixed date so that solved structures do not bias predictions. The default lies in the future and so effectively imposes no cutoff. |
| --helixfold3_precision | bf16 | Numerical precision used during neural network inference (bf16 or fp32). bf16 is supported on GPU accelerators such as the A100, H100 and newer; other GPUs require fp32. |
| --helixfold3_infer_times | 4 | Number of independent seeds used to generate structure predictions with the HelixFold3 model. |
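Choosing --helixfold3_precision can be automated from the GPU's CUDA compute capability. This sketch assumes bf16 inference works on compute capability 8.0 or newer (the A100 is 8.0 and the H100 is 9.0) and falls back to fp32 otherwise, including when the value cannot be parsed. hf3_precision_for_cc is a hypothetical helper, not a pipeline feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a compute capability such as "8.0" to a precision.
# Assumption: compute capability >= 8.0 supports bf16; anything else uses fp32.
hf3_precision_for_cc() {
    local cc="$1"
    local major="${cc%%.*}"
    # Non-numeric or empty input: be conservative and use fp32.
    case "$major" in
        ''|*[!0-9]*) echo fp32; return ;;
    esac
    if [ "$major" -ge 8 ]; then
        echo bf16
    else
        echo fp32
    fi
}

# On recent NVIDIA drivers the compute capability can be queried with:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# e.g. precision=$(hf3_precision_for_cc "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)")
```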

You can override any of these parameters via the command line or a params file.
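A params file for Nextflow's -params-file option could look like the sketch below. Keys are the parameter names without the leading "--". The template date is a hypothetical benchmarking cutoff, quoted so that YAML reads it as a plain string, and fp32 is shown only as the setting for GPUs without bf16 support; the other values are the defaults from the table.

```shell
#!/usr/bin/env bash
# Write HelixFold3 options to a YAML params file (values are examples).
cat > params.yaml <<'EOF'
mode: helixfold3
helixfold3_max_template_date: "2021-05-01"
helixfold3_precision: fp32
helixfold3_infer_times: 4
EOF

# Then launch with, e.g.:
#   nextflow run nf-core/proteinfold --input samplesheet.csv --outdir results \
#       -params-file params.yaml --use_gpu -profile docker
```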