sparc.docparser module

A module to parse the LaTeX documents provided by SPARC and convert them into its Python API.

Created on Wed Mar 1 15:32:31 EST 2023

Tian Tian (alchem0x2a@gmail.com)

class sparc.docparser.SparcDocParser(directory='.', main_file='*Manual.tex', intro_file='Introduction.tex', params_from_intro=True, parse_version=True)[source]

Bases: object

Parses LaTeX documentation of SPARC-X and converts it into a Python API.

This class extracts parameter information from LaTeX source files, organizing it into a structured format that can be easily used in Python. It supports parsing of version details, parameter types, units, and other relevant information.

version

Parsed SPARC version, based on the documentation.

Type:

str

parameter_categories

Categories of parameters extracted.

Type:

list

parameters

Extracted parameters with detailed information.

Type:

dict

other_parameters

Additional parameters not categorized.

Type:

dict

find_main_file(main_file_pattern)[source]

Finds the main LaTeX file based on a pattern.

get_include_files()[source]

Retrieves a list of included LaTeX files.

parse_version(parse)[source]

Parses and sets the SPARC version.

parse_parameters()[source]

Extracts parameters from LaTeX files.

postprocess()[source]

Applies hard-coded post-processing to some parameters.

to_dict()[source]

Converts parsed information into a dictionary.

json_from_directory(directory, include_subdirs, **kwargs)[source]

Class method to create JSON from a directory.

json_from_repo(url, version, include_subdirs, **kwargs)[source]

Class method to create JSON from a repository.

find_main_file(main_file_pattern)[source]

Finds the main LaTeX file that matches the given pattern, e.g. Manual.tex or Manual_cyclix.te

Parameters:

main_file_pattern (str) – Pattern to match the main LaTeX file name.

Returns:

Path to the main LaTeX file.

Return type:

Path

Raises:

FileNotFoundError – If no file, or more than one file, matches the pattern.
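The lookup described above can be sketched with pathlib. This is only an illustration, not the library's implementation: find_main_file_sketch is a hypothetical name, and glob-style matching inside a single directory is an assumption based on the default pattern '*Manual.tex'.

```python
from pathlib import Path
import tempfile


def find_main_file_sketch(directory, main_file_pattern="*Manual.tex"):
    """Return the single file in `directory` that matches the glob pattern."""
    candidates = sorted(Path(directory).glob(main_file_pattern))
    if len(candidates) != 1:
        # Zero matches and multiple matches are both treated as errors
        raise FileNotFoundError(
            f"Expected exactly one match for {main_file_pattern!r}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


# Demonstration in a throwaway directory holding one matching file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Manual.tex").write_text("\\documentclass{article}")
    found_name = find_main_file_sketch(tmp).name
```

Raising the same exception for both failure modes mirrors the documented behaviour, where an ambiguous match is reported as FileNotFoundError as well.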

get_include_files()[source]

Retrieves a list of LaTeX files included in the main LaTeX document, e.g. Manual.tex.

Returns:

A list of paths to the included LaTeX files.

Return type:

list
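One plausible way to collect included files is a regular expression over the main document. The sketch below assumes that inclusion is expressed with \input{...} or \include{...} and that a missing .tex extension should be appended; get_include_files_sketch and INCLUDE_PATTERN are hypothetical names, and commented-out includes are not handled.

```python
import re
from pathlib import Path

# Matches \input{name} and \include{name}; the captured group is the file name
INCLUDE_PATTERN = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def get_include_files_sketch(main_tex, root="."):
    """List the paths referenced by \\input / \\include in LaTeX source text."""
    paths = []
    for name in INCLUDE_PATTERN.findall(main_tex):
        name = name.strip()
        # LaTeX lets authors omit the .tex extension
        if not name.endswith(".tex"):
            name += ".tex"
        paths.append(Path(root) / name)
    return paths


main_tex = r"""
\begin{document}
\input{Introduction}
\include{System.tex}
\end{document}
"""
include_files = get_include_files_sketch(main_tex)
```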

classmethod json_from_directory(directory='.', include_subdirs=True, **kwargs)[source]

Recursively add parameters from all Manual files.

Parameters:
  • directory (str or PosixPath) – The directory to the LaTeX files, e.g. <sparc-root>/doc/.LaTeX

  • include_subdirs (bool) – If true, also parse the manual files in submodules, e.g. cyclix, highT

Returns:

Formatted json-string of the API

Return type:

str

classmethod json_from_repo(url='https://github.com/SPARC-X/SPARC.git', version='master', include_subdirs=True, **kwargs)[source]

Download the source code from git and use json_from_directory to parse it.

Parameters:
  • url (str) – URL for the repository of SPARC, default is “https://github.com/SPARC-X/SPARC.git”

  • version (str) – Git version or commit hash of the SPARC repo

  • include_subdirs (bool) – If true, also parse the manual files in submodules, e.g. cyclix, highT

Returns:

Formatted json-string of the API

Return type:

str

parse_parameters()[source]

Parses the parameters from the included LaTeX files and stores the results on the instance.

Sets:
  • parameters (dict) – All parsed parameters

  • parameter_categories (list) – List of categories

  • other_parameters (dict) – Any parameters that are not included in the categories

parse_version(parse=True)[source]

Parses and sets the SPARC version based on the C-source file, if possible. The date for the SPARC code is parsed from initialization.c in the “YYYY.MM.DD” format.

Parameters:

parse (bool) – Whether to parse the version from the documentation.

Sets:

self.version (str) – The parsed version in ‘YYYY.MM.DD’ format, or None if either parse=False or the C-source code is missing.
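The date normalization can be illustrated as follows. This is a sketch under stated assumptions: the banner text in the sample string is hypothetical (the actual contents of initialization.c are not shown in this reference), and parse_version_sketch is not the library's method.

```python
import re

# Month lookup keyed by the first three letters, so "Sept" and "September" both work
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_version_sketch(c_source, parse=True):
    """Return the date in 'YYYY.MM.DD' form, or None if it cannot be found."""
    if not parse:
        return None
    match = re.search(r"version\s+([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})", c_source)
    if match is None:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.get(month_name[:3].lower())
    if month is None:
        return None
    return f"{int(year):04d}.{month:02d}.{int(day):02d}"


# Hypothetical banner line; the real contents of initialization.c may differ
sample = 'fprintf(fp, "SPARC (version Mar 01, 2023)\\n");'
version = parse_version_sketch(sample)
```

The explicit month table avoids locale-dependent parsing, which is a design choice of this sketch rather than something the source states.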

postprocess()[source]

Uses the hard-coded dict postprocess_items to fix known issues in some parameters.

to_dict()[source]

Outputs a JSON-compatible dict from the current document parser.

Returns:

All API schemes in dict

Return type:

dict

sparc.docparser.contain_only_bool(text)[source]

Checks whether a string contains only the characters 0 and 1, or spaces.
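A minimal sketch of such a check is given below. It is not the library's implementation; contain_only_bool_sketch is a hypothetical name, and treating empty or blank strings as False is an assumption.

```python
def contain_only_bool_sketch(text):
    """Return True if `text` has at least one 0/1 and nothing but 0, 1, whitespace."""
    has_digit = any(ch in "01" for ch in text)
    only_allowed = all(ch in "01" or ch.isspace() for ch in text)
    return has_digit and only_allowed


flags_ok = contain_only_bool_sketch("0 1 1")      # only 0, 1 and spaces
flags_bad = contain_only_bool_sketch("0 2 1")     # contains a 2
flags_float = contain_only_bool_sketch("0.5")     # contains '.' and '5'
```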

sparc.docparser.convert_comment(text)[source]

Used to remove TeX-specific commands in description and remarks as much as possible

Parameters:

text (str) – Raw LaTeX code for the comment section in manual

Returns:

Sanitized plain text

Return type:

str

sparc.docparser.convert_tex_default(text, desired_type=None)[source]

Convert default values as much as possible. If a desired type is given, the default values are converted to the closest matching format.

Currently supported conversions:

  1. Remove all surrounding text modifiers (texttt)

  2. Remove all symbol wrappers $

  3. Convert value to single or array

Parameters:
  • text (str) – Raw text string for value

  • desired_type (str or None) – Data type to be converted to. If None, preserve the string format

Returns:

Value converted from raw text

Return type:

converted
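The first two conversions can be sketched with a regular expression and a string replacement. The helper name strip_tex_wrappers_sketch is hypothetical, only non-nested \texttt{...} wrappers are handled, and the final type conversion is left out; the real function may treat more cases.

```python
import re


def strip_tex_wrappers_sketch(text):
    """Remove \\texttt{...} wrappers and math-mode dollar signs."""
    # Step 1: replace \texttt{value} with value (non-nested braces only)
    text = re.sub(r"\\texttt\{([^{}]*)\}", r"\1", text)
    # Step 2: drop the $ delimiters around inline math
    text = text.replace("$", "")
    return text.strip()


cleaned = strip_tex_wrappers_sketch(r"\texttt{1 1 1}")
cleaned_math = strip_tex_wrappers_sketch(r"$0.5$")
```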

sparc.docparser.convert_tex_example(text)[source]

Convert TeX codes of examples as much as possible. The examples follow the format SYMBOL: values (may contain new lines).

Parameters:

text (str) – Single or multiline LaTeX contents

Returns:

Sanitized literal text

Return type:

str

sparc.docparser.convert_tex_parameter(text)[source]

Convert a TeX string to a non-escaped name (for parameters only).

Parameters:

text (str) – Parameter name in LaTeX format

Returns:

Text with sanitized parameter

Return type:

str
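As an illustration, un-escaping a parameter name might look like the sketch below. It assumes the only escape to undo is the LaTeX underscore (\_); convert_tex_parameter_sketch and the parameter name MY_PARAM_NAME are hypothetical.

```python
def convert_tex_parameter_sketch(text):
    """Turn a LaTeX-escaped name such as MY\\_PARAM into MY_PARAM."""
    # Trim surrounding whitespace, then undo the underscore escape
    return text.strip().replace("\\_", "_")


name = convert_tex_parameter_sketch(r" MY\_PARAM\_NAME ")
```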

sparc.docparser.is_array(text)[source]

Tries to convert a string into a NumPy array and checks whether its length is larger than 1. It is only used for float / int values.
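The same idea can be sketched in plain Python. Note the substitution: this version uses str.split and float instead of NumPy so that it runs without extra dependencies, and is_array_sketch is a hypothetical name. Returning False for non-numeric text is an assumption.

```python
def is_array_sketch(text):
    """Return True if `text` holds more than one whitespace-separated number."""
    tokens = text.split()
    try:
        values = [float(token) for token in tokens]
    except ValueError:
        # Non-numeric content is not treated as a numeric array
        return False
    return len(values) > 1


vector_like = is_array_sketch("1 2 3")
scalar_like = is_array_sketch("0.5")
```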

sparc.docparser.sanitize_default(param_dict)[source]

Sanitize the default field:

  1. Create an extra field default_remark that copies the original default

  2. Use convert_tex_default to convert values as much as possible

This function should be called after sanitize_type

sparc.docparser.sanitize_description(param_dict)[source]

Sanitize the description and remark field

Parameters:

param_dict (dict) – Raw dict for one parameter entry

Returns:

Sanitized parameter dict with comment, remark and description converted to human-readable formats

Return type:

dict

sparc.docparser.sanitize_type(param_dict)[source]

Sanitize the param dict so that the types are more consistent.

For example, if the type is Double or Integer but the parameter is a vector, the type becomes a double vector or an integer vector.

sparc.docparser.text2value(text, desired_type)[source]

Convert raw text to a desired type

Parameters:
  • text (str) – Text contents for the value

  • desired_type (str) – Target data type from ‘string’, ‘integer’, ‘integer array’, ‘double’, ‘double array’, ‘bool’, ‘bool array’

Returns:

Value converted to the desired type

Return type:

converted
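A conversion over the seven listed type names could be sketched as below. This is an assumption-laden illustration: text2value_sketch is a hypothetical name, arrays are returned as plain Python lists rather than NumPy arrays, values are assumed to be whitespace-separated, and booleans are assumed to be written as 0 or 1.

```python
def text2value_sketch(text, desired_type):
    """Convert whitespace-separated text to the requested type name."""
    desired_type = desired_type.lower()
    if desired_type == "string":
        return text.strip()

    # Element converters for the three scalar kinds
    converters = {
        "integer": int,
        "double": float,
        "bool": lambda token: bool(int(token)),
    }
    base, _, suffix = desired_type.partition(" ")
    if base not in converters or suffix not in ("", "array"):
        raise ValueError(f"Unsupported type: {desired_type!r}")

    values = [converters[base](token) for token in text.split()]
    if suffix == "array":
        return values
    if len(values) != 1:
        raise ValueError(f"Expected a single value, got {len(values)}")
    return values[0]


single_int = text2value_sketch("3", "integer")
bool_list = text2value_sketch("1 0 1", "bool array")
```

Splitting the type name into a base kind and an optional "array" suffix keeps the converter table small; the actual function may organize this differently.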