How to use
Once the pyCompressor package is installed, running a compression is straightforward. It only takes an input runcard in which all the input parameters are defined. The runcard is subdivided into two distinct parts: the compression parameters and the GANs parameters.
pycompressor runcards/runcard.yml [--threads NUMB_THREADS]
The compression per se requires the following keys:
compressed
- int : number of replicas in the compressed set
minimizer
- str : name of the minimizer (genetic or cma)
est_dic
- dict : dictionary of estimators
gans
- dict : dictionary containing input parameters for GANs
One of the keys of the gans entry is runcard, which gets passed to the ganpdfs code.
For details on how to set the parameters for the GAN, refer to the ganpdfs documentation.
An example of an input card is shown below:
###################################################
# PDF Set #
###################################################
pdfsetting:
  pdf: NNPDF40_nnlo_as_0118_1000
  existing_enhanced: False
###################################################
# Size of compressed PDF replicas #
###################################################
compressed: 500
###################################################
# Choice of Minimizer #
# Options: #
# - genetic #
# - cma #
###################################################
minimizer: genetic
###################################################
# Statistical Estimators #
# Extra-options for Moment: #
# - moment5th #
# - moment6th #
###################################################
est_dic:
  corr_estimators:
    - correlation
  stat_estimators:
    - kolmogorov_smirnov
  moment_estimators:
    - mean
    - stdev
    - skewness
    - kurtosis
###################################################
# Enhance statistics of Prior #
###################################################
gans:
  enhance: False
  runcard: ganpdfs
  total_replicas: 3000
Running GANs within pyCompressor
Although it is advised to run the ganpdfs code independently, it is possible to call it
from within the pyCompressor code by setting enhance to True in the runcard. In this
scenario, the code will first enhance the statistics of the prior using GANs.
Once the generation of the extra replicas is finished, the output grids are evolved using
evolven3fit.
Then, the pyCompressor.postgans module (in a similar fashion as postfit) creates
symbolic links of both the original and the generated PDF sets into the LHAPDF data directory.
The new enhanced Monte Carlo set of PDF replicas is then used as input to the compressor.
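As a minimal sketch, enabling this workflow amounts to switching the gans block of the example runcard above to something like:
gans:
  enhance: True           # call ganpdfs from within pyCompressor
  runcard: ganpdfs        # runcard passed to the ganpdfs code
  total_replicas: 3000    # as in the example above (presumably the size of the enhanced set)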
Once the compression is finished, a folder is created in the main directory with the following
structure:
<PRIOR_PDF_NAME>_enhanced
├── filter.yml
├── input-runcard.json
├── losses_info.json
├── nnfit
│ ├── <PDF_NAME>_enhanced.info
│ ├── replica_<REPLICA_INDEX>
│ │ ├── <PDF_NAME>_enhanced.dat
│ │ └── <PDF_NAME>.exportgrid
│ └── ...
└── compress_<PRIOR_PDF_NAME>_enhanced_<NB_COMPRESSED_REPLICAS>_output.dat
where:
losses_info.json stores the losses of the generator and the critic/discriminator for the GANs model.
filter.yml contains the information on the theory ID used to reproduce the prior replicas.
input-runcard.json is a copy of the input parameters that were fed to the GANs.
nnfit has more or less the same folder structure as the output from n3fit. It contains a replica_<REPLICA_INDEX> folder with a .exportgrid file used by evolven3fit for the evolution. That is also where the evolved grid in the .dat format is stored.
compress_<PDF_NAME>_enhanced_<NB_COMPRESSED_REPLICAS>_output.dat contains the index of the reduced replicas along with the final ERF value.
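As a quick, hypothetical way of inspecting these outputs (folder and file names taken from the example runcard above; the exact contents depend on the run), one could do:
cd NNPDF40_nnlo_as_0118_1000_enhanced
python -m json.tool losses_info.json | head   # generator and critic/discriminator losses of the GAN
python -m json.tool input-runcard.json        # copy of the parameters fed to the GAN
ls nnfit/replica_*                            # per-replica .exportgrid and evolved .dat files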
If enhance is instead set to False, the folder will simply be:
<PRIOR_PDF_NAME>_enhanced
└── compress_<PRIOR_PDF_NAME>_enhanced_<NB_COMPRESSED_REPLICAS>_output.dat
Adiabatic minimization
Since compressing from an enhanced set could be difficult due to the limitations of the minimization
algorithm, it is possible to perform an adiabatic minimization by setting existing_enhanced to
True in the runcard. In this case, the minimization is performed in two steps: (1) a standard
compression of the prior, and (2) a compression using the enhanced set, taking as a starting point
the space in which the best set of replicas from the standard compression was generated.
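As a minimal sketch of such a setup (assuming the enhanced set has already been generated with ganpdfs and made visible to LHAPDF), the relevant runcard entries would look like:
pdfsetting:
  pdf: NNPDF40_nnlo_as_0118_1000
  existing_enhanced: True   # enhanced set already available, triggers the adiabatic minimization
compressed: 500
minimizer: genetic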
PDF grid and Validation plot
To generate the reduced Monte Carlo set of PDF replicas, simply run:
get-grid -i <PRIOR_PDF_NAME>/compressed_<PDF_NAME>_<NB_COMPRESSED>_output.dat
Note that if the compression is done from an enhanced set, the output folder name will be appended with _enhanced.
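For instance, following the naming scheme shown above, a compression of NNPDF40_nnlo_as_0118_1000 down to 500 replicas (no enhancement) would presumably be turned into a grid with something like:
get-grid -i NNPDF40_nnlo_as_0118_1000/compressed_NNPDF40_nnlo_as_0118_1000_500_output.dat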
Finally, to check that the reduced Monte Carlo set indeed faithfully reproduces the statistics of the
prior, ERF plots for each of the estimators can be generated and compared to a random selection. To generate
the ERF validation plots, enter the erfs_output folder and run the following:
validate --random erf_randomized.dat --reduced erf_reduced.dat
Controlling the parallelization
The backend of pycompressor is the JIT compiler numba (https://numba.pydata.org) and it is numba that controls the parallelization of the calculations within the code. The number of cores to be used can be controlled with the appropriate settings of the following environment variables:
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
export NUMBA_NUM_THREADS=4
An interface to control the number of numba threads is also provided via the command-line argument --threads.
Note that in no case can --threads be greater than the environment variable NUMBA_NUM_THREADS (if given).
pycomp runcards/runcard.yml --threads 4