
Data module

This section describes the data module in some detail. The module reads the data in a specified format and reads the input parameters as specified in the Input file template.

Module author: Matias Carrasco Kind

data.bootstrap_index(N, SS)

Returns a bootstrap sample of N indices drawn from an array of indices

  • N (int) – size of bootstrap sample
  • SS (int) – extract indices from 0 to SS

Returns: array of bootstrap indices

Return type:

int array
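
As an illustration of the bootstrap indices described above, here is a minimal sketch using NumPy. The function name `bootstrap_index_sketch` and the `seed` argument are hypothetical and not part of the module, and treating SS as an exclusive upper bound is an assumption.

```python
import numpy as np


def bootstrap_index_sketch(N, SS, seed=None):
    """Hypothetical stand-in: draw N indices with replacement from 0..SS-1."""
    rng = np.random.default_rng(seed)
    # Assumption: SS is treated as an exclusive upper bound.
    return rng.integers(0, SS, size=N)


# Draw 5 indices that point into a catalog of 100 rows.
idx = bootstrap_index_sketch(5, 100, seed=0)
```

Because the draw is with replacement, the same index may appear more than once, which is what makes the sample a bootstrap sample.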

class data.catalog(Pars, cat_type='train', L1=0, L2=-1, rank=0)

Creates a catalog instance for training or testing

  • Pars (class) – Class of parameters read from the input files
  • cat_type (str) – ‘train’ or ‘test’ file (names are taken from the Pars class)
  • L1 (int) – keep only entries between L1 and L2
  • L2 (int) – keep only entries between L1 and L2

get_XY(curr_at='all', bootstrap='no')

Creates X and Y based on the catalog, using random realizations or bootstrapping. After this call, both X and Y are loaded and ready to be used

  • curr_at (dict) – dictionary of attributes to be used (like a subsample of them), ‘all’ by default
  • bootstrap (str) – use a bootstrap sample? (‘yes’/‘no’)

Saves X, Y oob (and no-oob) data if required and original catalog


Is X already loaded in memory?


Is Y already loaded in memory?


Loads the random catalog with the realizations

make_random(outfileran='', ntimes=-1)

Makes the random realizations

  • outfileran (str) – output file (not needed)
  • ntimes (int) – taken from class Pars unless otherwise indicated


Creates oob data and separates it from the no-oob data for further tests

  • frac (float) – Fraction of the data to be separated, taken from class Pars (default is 1/3)
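
The oob (out-of-bag) split described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_oob_sketch`, the `seed` argument, and the choice of a random row permutation are hypothetical and are not taken from the module.

```python
import numpy as np


def split_oob_sketch(data, frac=1.0 / 3.0, seed=None):
    """Hypothetical sketch: set aside a random fraction of rows as oob data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    n_oob = int(round(frac * n))
    # Assumption: rows are picked with a random permutation, without replacement.
    perm = rng.permutation(n)
    oob_rows = np.sort(perm[:n_oob])
    no_oob_rows = np.sort(perm[n_oob:])
    return data[oob_rows], data[no_oob_rows]


# A toy catalog with 15 rows and 2 columns.
catalog = np.arange(30, dtype=float).reshape(15, 2)
oob, no_oob = split_oob_sketch(catalog, seed=1)
```

Every row ends up in exactly one of the two returned arrays, so the oob rows can be held back for tests while the rest are used for training.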


Samples from the list of attributes

Parameters: nsample (int) – size of subsample
Returns: dictionary with subsample attributes and their locations

data.create_random_realizations(AT, F, N, keyatt)

Creates random realizations using the errors in the magnitudes and saves a temporary file in the training data directory. The realizations are drawn from a normal distribution

  • AT (dict) – dictionary with column names and column indices
  • F (float) – Training data
  • N (int) – Number of realizations
  • keyatt (str) – Attribute name to be predicted or classified

Returns an array with random realizations
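
To make the idea concrete, here is a minimal sketch of drawing normally distributed realizations from the error columns. It is not the module's implementation: the function name `random_realizations_sketch`, the `seed` argument, the returned shape (N, rows, columns), and leaving the `keyatt` column untouched are all assumptions, and no temporary file is written.

```python
import numpy as np


def random_realizations_sketch(AT, F, N, keyatt, seed=None):
    """Hypothetical sketch: perturb each attribute by its error column."""
    rng = np.random.default_rng(seed)
    F = np.asarray(F, dtype=float)
    n_rows = F.shape[0]
    # Assumption: the output stacks N perturbed copies of the training data.
    out = np.repeat(F[np.newaxis, :, :], N, axis=0)
    for name, loc in AT.items():
        if name == keyatt:
            continue  # assumption: the attribute to predict is not perturbed
        sigma = np.abs(F[:, loc['eind']])
        out[:, :, loc['ind']] = rng.normal(F[:, loc['ind']], sigma,
                                           size=(N, n_rows))
    return out


# Columns: mag_u, z, emag_u. The second row has zero error.
AT = {'mag_u': {'ind': 0, 'eind': 2}, 'z': {'ind': 1, 'eind': -1}}
F = [[20.0, 0.5, 0.1], [21.0, 0.7, 0.0]]
real = random_realizations_sketch(AT, F, 4, 'z', seed=0)
```

A row whose error is zero is reproduced unchanged in every realization, while rows with larger errors scatter more widely around their measured values.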

data.make_AT(cols, attributes, keyatt)

Creates dictionary used on all routines


Make sure all columns have different names, and that each error column has the same name as its attribute column with an ‘e’ in front, e.g. ‘mag_u’ and ‘emag_u’

  • cols (str) – str array with column names from file
  • attributes (str) – attributes to be used from those columns
  • keyatt (str) – Attribute to be predicted or classified

Returns: dictionary in which each key corresponds to an attribute and each value is itself a dictionary, where ‘ind’ is the column index and ‘eind’ is the error column index for the same attribute, e.g. A={‘u’:{‘ind’:1, ‘eind’:6}}

Return type:

dict

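The naming convention for this dictionary can be illustrated with a short sketch. The function name `make_AT_sketch` is hypothetical, and using -1 as the ‘eind’ value when no error column exists is an assumption, not documented behavior.

```python
def make_AT_sketch(cols, attributes, keyatt):
    """Hypothetical sketch: map attribute names to column and error indices."""
    cols = list(cols)
    if len(set(cols)) != len(cols):
        raise ValueError("column names must be unique")
    AT = {}
    for name in list(attributes) + [keyatt]:
        ind = cols.index(name)
        err_name = 'e' + name  # error columns carry an 'e' prefix
        # Assumption: -1 marks an attribute without an error column.
        eind = cols.index(err_name) if err_name in cols else -1
        AT[name] = {'ind': ind, 'eind': eind}
    return AT


cols = ['zs', 'mag_u', 'mag_g', 'emag_u', 'emag_g']
AT = make_AT_sketch(cols, ['mag_u', 'mag_g'], 'zs')
```

In this example, `AT['mag_u']` holds the column index of ‘mag_u’ under ‘ind’ and the index of ‘emag_u’ under ‘eind’.
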
data.read_catalog(filename, myrank=0, check='no', get_ng='no', L_1=0, L_2=-1, A_T='')

Reads the catalog, either for training or testing. Currently accepts ASCII tables and NumPy tables

  • filename (str) – Filename of the catalog
  • myrank (int) – current processor id, for parallel reading (not implemented)
  • check (str) – To check the code, only uses 200 lines of the catalog
  • get_ng (str) – Just get the total number of galaxies in the catalog
  • L_1 (int) – if passed get catalog between L_1 and L_2
  • L_2 (int) – if passed get catalog between L_1 and L_2

Returns: the whole catalog

Return type:

float array
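
As a rough illustration of reading an ASCII table and keeping only a range of entries, consider the sketch below. The function name `read_ascii_sketch` is hypothetical, and it assumes that L_2=-1 means "up to the last entry" and that the range is half-open; neither detail is confirmed by the description above.

```python
import io

import numpy as np


def read_ascii_sketch(source, L_1=0, L_2=-1):
    """Hypothetical sketch: load a whitespace-separated table as floats."""
    table = np.atleast_2d(np.loadtxt(source))
    # Assumption: L_2 == -1 keeps everything from L_1 to the end.
    if L_2 == -1:
        return table[L_1:]
    return table[L_1:L_2]


text = "1 2\n3 4\n5 6\n"
full = read_ascii_sketch(io.StringIO(text))
part = read_ascii_sketch(io.StringIO(text), L_1=1, L_2=2)
```

Here an in-memory text buffer stands in for a catalog file; `np.loadtxt` also accepts a filename.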
