Container

csv

rxsci.container.csv.create_line_parser(dtype=None, none_values=[], separator=',', escapechar='\\', ignore_error=False, schema_name='x')

Creates a parser for csv lines.

Parameters:

dtype – [Optional] A list of (name, type) tuples, or a typing.NamedTuple class. When set to None, then the csv header is used to create a schema where all columns are parsed as strings.
none_values – [Optional] Values to consider as None values
separator – [Optional] Token used to separate columns
ignore_error – [Optional] When set to True, lines that do not match the expected number of columns are skipped. When set to False, any such line raises an error and stops the parsing.

Returns:

A parsing function that parses text lines as specified by the parameters.
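
To make the roles of dtype, none_values and separator concrete, here is a small standalone sketch. It is not rxsci's implementation: make_line_parser is a hypothetical helper, it splits naively on the separator without handling escapechar or quoting, and it always raises on a column-count mismatch (the ignore_error switch is left out).

```python
from collections import namedtuple


def make_line_parser(dtype, none_values=(), separator=","):
    """Return a function that parses one text line into a namedtuple.

    Hypothetical helper for illustration only; dtype is a list of
    (name, type) tuples, as described in the parameters above.
    """
    names = [name for name, _ in dtype]
    converters = [type_ for _, type_ in dtype]
    Row = namedtuple("x", names)

    def parse(line):
        parts = line.rstrip("\n").split(separator)
        if len(parts) != len(converters):
            raise ValueError(
                f"expected {len(converters)} columns, found {len(parts)}"
            )
        # Values listed in none_values become None; others are converted.
        values = [
            None if part in none_values else convert(part)
            for part, convert in zip(parts, converters)
        ]
        return Row(*values)

    return parse


parse = make_line_parser(
    [("name", str), ("age", int), ("score", float)],
    none_values=["", "NA"],
)
row = parse("alice,42,NA")
print(row)  # x(name='alice', age=42, score=None)
```

The returned function keeps the schema in a closure, so the column names and converters are resolved once rather than on every line.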

rxsci.container.csv.dump(header=True, separator=',', escapechar='\\', newline='\n')

Dumps an observable to csv.

The source must be an Observable.

Parameters:

header – [Optional] indicates whether a header line must be added.
separator – [Optional] Token used to separate columns.
newline – [Optional] Character(s) used for end of line.

Returns:

An observable of string items, where each item is a csv line.
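
The effect of header, separator and newline can be sketched with a plain generator. This is an illustration under assumptions, not the library's code: dump_lines is a hypothetical name, items are assumed to be namedtuples, each emitted line is assumed to end with newline, and escaping via escapechar is not modelled.

```python
from collections import namedtuple


def dump_lines(items, header=True, separator=",", newline="\n"):
    """Yield one csv line per namedtuple item (hypothetical helper)."""
    first = True
    for item in items:
        if first and header:
            # The header is derived from the field names of the first item.
            yield separator.join(item._fields) + newline
        first = False
        yield separator.join(str(value) for value in item) + newline


Point = namedtuple("Point", ["x", "y"])
lines = list(dump_lines([Point(1, 2), Point(3, 4)]))
print(lines)  # ['x,y\n', '1,2\n', '3,4\n']
```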

rxsci.container.csv.dump_to_file(filename, header=True, separator=', ', escapechar='\\', newline='\n', encoding=None, open_obj=<built-in function open>)

Dumps each item to a csv file.

The source must be an Observable.

Parameters:

filename – Path of the file to write or a file object
header – [Optional] indicates whether a header line must be added.
separator – [Optional] Token used to separate columns.
newline – [Optional] Character(s) used for end of line.
encoding – [Optional] Encoding used to write the text content
open_obj – [Optional] A custom function used to open the provided file.

Returns:

An empty observable that completes when the source observable completes, or terminates with an error if writing the csv file fails.

rxsci.container.csv.load(parse_line=<function create_line_parser.<locals>._parse>, skip=0)

Loads a csv observable.

The source must be an Observable that emits one csv row per item.

Parameters:

parse_line – A line parser, e.g. created with create_line_parser
skip – number of items to skip before parsing (excluding the header)

Returns:

An observable of namedtuple items, where each key is a csv column

rxsci.container.csv.load_from_file(filename, parse_line=<function create_line_parser.<locals>._parse>, skip=0, encoding=None, open_obj=<built-in function open>)

Loads a csv file.

This factory loads the provided file and returns its content as an observable emitting one item per line.

Parameters:

filename – Path of the file to read or a file object
parse_line – A line parser, e.g. created with create_line_parser
skip – [Optional] Number of lines to skip before parsing (excluding the header)
encoding – [Optional] Encoding used to parse the text content
open_obj – [Optional] A custom function used to open the provided file.

Returns:

An observable of namedtuple items, where each key is a csv column

json

rxsci.container.json.dump(newline='\n')

Dumps an observable to JSON.

If the source observable emits several items, they are framed as JSON Lines. The source must be an Observable.

Parameters:

newline – [Optional] Character(s) used for end of line.

Returns:

An observable of string items, where each item is a JSON string.
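
JSON Lines framing means that each item is serialized as one JSON document followed by the end-of-line marker. A minimal standalone sketch, using only the standard json module (dump_json_lines is a hypothetical name, and appending newline to every item is an assumption about the framing):

```python
import json


def dump_json_lines(items, newline="\n"):
    """Yield one JSON string per item, terminated by newline (hypothetical helper)."""
    for item in items:
        yield json.dumps(item) + newline


lines = list(dump_json_lines([{"id": 1}, {"id": 2}]))
print(lines)  # ['{"id": 1}\n', '{"id": 2}\n']
```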

rxsci.container.json.dump_to_file(filename, newline='\n', encoding='utf-8', compression=None, open_obj=<built-in function open>)

Dumps each item to a JSON file.

The source must be an Observable.

The open_obj function must return a file-like object. Its prototype is: open_obj(filename: str, mode: str, encoding: str) -> file-like object

Parameters:

filename – Path of the file to write or a file object
newline – [Optional] Character(s) used for end of line.
encoding – [Optional] Encoding used to write the text content
open_obj – A function to open the destination file.

Returns:

An empty observable that completes when the source observable completes, or terminates with an error if writing the JSON file fails.

rxsci.container.json.load(skip=0, ignore_error=False)

Loads a JSON observable.

The source must be an Observable that emits one JSON string per item.

Parameters:

skip – number of items to skip before parsing
ignore_error – Ignore errors while parsing JSON

Returns:

An observable of dicts corresponding to the source json content.
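
The interplay of skip and ignore_error can be sketched with a generator over an iterable of strings. This is only an illustration: load_json_items is a hypothetical name, and the sketch assumes that ignoring an error means dropping the offending item.

```python
import json


def load_json_items(lines, skip=0, ignore_error=False):
    """Parse one JSON string per item (hypothetical helper)."""
    for index, line in enumerate(lines):
        if index < skip:
            continue  # skipped items are never parsed
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            if not ignore_error:
                raise


source = ['{"id": 1}', "not json", '{"id": 2}']
items = list(load_json_items(source, ignore_error=True))
print(items)  # [{'id': 1}, {'id': 2}]
```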

rxsci.container.json.load_from_file(filename, lines=True, skip=0, ignore_error=False, encoding='utf-8', compression=None, open_obj=<built-in function open>)

Loads a json file.

This factory loads the provided file. The format of the returned observable depends on the lines parameter.

The open_obj function must return a file-like object. Its prototype is: open_obj(filename: str, mode: str, encoding: str) -> file-like object

Parameters:

filename – Path of the file to read or a file object
lines – Parse the file as JSON Lines when set to True, as a single JSON object otherwise.
skip – [Optional] Number of lines to skip before parsing
ignore_error – Ignore errors while parsing JSON
encoding – [Optional] Encoding used to parse the text content
compression – [Optional] ‘gzip’ or ‘zstd’
open_obj – A function to open the source file.

Returns:

An observable of objects.
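
As an illustration of the open_obj prototype, the sketch below defines a function with the documented (filename, mode, encoding) signature that serves content from memory instead of the file system. The names FILES and open_in_memory are hypothetical, and which mode value the library passes is an assumption, so the function handles both text and binary modes.

```python
import io
import json

# Hypothetical in-memory "file system" used only for this example.
FILES = {"events.json": '{"id": 1}\n{"id": 2}\n'}


def open_in_memory(filename, mode, encoding):
    """Return a file-like object, following the documented open_obj prototype."""
    content = FILES[filename]
    if "b" in mode:
        # Binary mode: hand back bytes in the requested encoding.
        return io.BytesIO(content.encode(encoding or "utf-8"))
    return io.StringIO(content)


# Direct use of the function, outside of any observable pipeline.
with open_in_memory("events.json", "r", "utf-8") as f:
    records = [json.loads(line) for line in f]
print(records)  # [{'id': 1}, {'id': 2}]
```

A function of this shape is handy in tests, since the loader can be exercised without touching the disk.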

parquet

rxsci.container.parquet.dump_to_file(filename, schema, batch_size=1024, row_group_size=None, compression='snappy', encryption_properties=None, open_obj=<built-in function open>)

Dumps each item to a parquet file.

The source must be an Observable.

Parameters:

filename (str) – Path of the file to write or a file object
schema (Schema) – The schema of the data to write.
batch_size (int) – Size of internal batches when writing in the parquet file
row_group_size (Optional[int]) – [Optional] Size of a row group. See pyarrow.parquet.ParquetWriter for more information.
compression (str) – [Optional] compression codec: none, snappy, gzip, brotli, lz4, zstd.
encryption_properties (Optional[FileEncryptionProperties]) – [Optional] encryption configuration.

Returns:

An empty observable that completes when the source observable completes, or terminates with an error if writing the parquet file fails.

rxsci.container.parquet.load_from_file(filename, batch_size=1024, decryption_properties=None, open_obj=<built-in function open>)

Loads a parquet file.

This factory loads the provided parquet file.

The open_obj function must return a file-like object. Its prototype is: open_obj(filename: str, mode: str, encoding: str) -> file-like object

Parameters:

filename (str) – Path of the file to read or a file object
batch_size (int) – Size of internal batches when reading the parquet file
decryption_properties (Optional[FileDecryptionProperties]) – [Optional] decryption configuration.
open_obj – A function to open the source file.

Returns:

An observable of objects.