Container
csv
- rxsci.container.csv.create_line_parser(dtype=None, none_values=[], separator=',', escapechar='\\', ignore_error=False, schema_name='x')
creates a parser for csv lines
- Parameters:
dtype – [Optional] A list of (name, type) tuples, or a typing.NamedTuple class. When set to None, then the csv header is used to create a schema where all columns are parsed as strings.
none_values – [Optional] Values to consider as None values
separator – [Optional] Token used to separate columns
ignore_error – [Optional] When set to False, any line that does not match the expected number of columns raises an error and stops the parsing. When set to True, error lines are skipped.
- Returns:
A parsing function that parses text lines as specified in the parameters.
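The parameters above can be illustrated with a plain-Python stand-in. This is a minimal sketch, not rxsci's implementation: make_line_parser and the CONVERTERS table are hypothetical names, the type names are assumptions, escapechar handling is left out, and it assumes that ignore_error=True means a malformed line is skipped rather than raising.

```python
from collections import namedtuple

# Hypothetical converters keyed by type name. The names are assumptions
# made for this sketch, not the list of types accepted by rxsci.
CONVERTERS = {"int": int, "float": float, "str": str}


def make_line_parser(dtype, none_values=(), separator=",", ignore_error=False):
    """Return a function that turns one csv line into a namedtuple."""
    Row = namedtuple("Row", [name for name, _ in dtype])
    casts = [CONVERTERS[type_name] for _, type_name in dtype]

    def parse(line):
        fields = line.rstrip("\r\n").split(separator)
        if len(fields) != len(casts):
            if ignore_error:
                return None  # assumption: a malformed line is skipped
            raise ValueError(
                f"expected {len(casts)} columns, got {len(fields)}"
            )
        # Fields listed in none_values become None, others are converted.
        values = [
            None if field in none_values else cast(field)
            for field, cast in zip(fields, casts)
        ]
        return Row(*values)

    return parse


schema = [("id", "int"), ("score", "float")]
parse = make_line_parser(schema, none_values=["NA"])
row = parse("7,NA")

lenient = make_line_parser(schema, ignore_error=True)
skipped = lenient("1,2,3")
```

Here row carries the typed fields of the line, with the NA token mapped to None, and skipped is None because the lenient parser drops the three-column line.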
- rxsci.container.csv.dump(header=True, separator=',', escapechar='\\', newline='\n')
dumps an observable to csv.
The source must be an Observable.
- Parameters:
header – [Optional] indicates whether a header line must be added.
separator – [Optional] Token used to separate columns.
newline – [Optional] Character(s) used for end of line.
- Returns:
An observable of string items, where each item is a csv line.
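The effect of header, separator and newline can be sketched with a plain generator over namedtuple items. This is an illustration only: dump_lines is a hypothetical helper, and the choices to derive the header from the field names, to render None as an empty field, and to end every line with newline are assumptions of the sketch rather than documented rxsci behavior.

```python
from collections import namedtuple


def dump_lines(items, header=True, separator=",", newline="\n"):
    """Yield one csv line per namedtuple item, with an optional header."""
    first = True
    for item in items:
        if first and header:
            # Assumption: the header is built from the item field names.
            yield separator.join(item._fields) + newline
        first = False
        yield separator.join(
            "" if value is None else str(value) for value in item
        ) + newline


Point = namedtuple("Point", ["x", "y"])
lines = list(dump_lines([Point(1, 2), Point(3, None)], separator=";"))
```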
- rxsci.container.csv.dump_to_file(filename, header=True, separator=', ', escapechar='\\', newline='\n', encoding=None, open_obj=<built-in function open>)
dumps each item to a csv file.
The source must be an Observable.
- Parameters:
filename – Path of the file to write or a file object
header – [Optional] indicates whether a header line must be added.
separator – [Optional] Token used to separate columns.
newline – [Optional] Character(s) used for end of line.
encoding – [Optional] Encoding used for the text content.
open_obj – [Optional] A custom function used to open the provided file.
- Returns:
An empty observable that completes on success when the source observable completes or completes on error if there is an error while writing the csv file.
- rxsci.container.csv.load(parse_line=<function create_line_parser.<locals>._parse>, skip=0)
Loads a csv observable.
The source observable must emit one csv row per item. The source must be an Observable.
- Parameters:
parse_line – A line parser, e.g. created with create_line_parser
skip – number of items to skip before parsing (excluding the header)
- Returns:
An observable of namedtuple items, where each key is a csv column
- rxsci.container.csv.load_from_file(filename, parse_line=<function create_line_parser.<locals>._parse>, skip=0, encoding=None, open_obj=<built-in function open>)
Loads a csv file.
This factory loads the provided file and returns its content as an observable emitting one item per line.
- Parameters:
filename – Path of the file to read or a file object
parse_line – A line parser, e.g. created with create_line_parser
skip – [Optional] Number of lines to skip before parsing (excluding the header)
encoding – [Optional] Encoding used to parse the text content.
open_obj – [Optional] A custom function used to open the provided file.
- Returns:
An observable of namedtuple items, where each key is a csv column
json
- rxsci.container.json.dump(newline='\n')
dumps an observable to JSON.
If the source observable emits several items, then they are framed as JSON Lines. The source must be an Observable.
- Parameters:
newline – [Optional] Character(s) used for end of line.
- Returns:
An observable of string items, where each item is a JSON string.
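JSON Lines framing means one JSON document per item, each terminated by the newline sequence. The following sketch shows the idea with the standard json module. dump_json_lines is a hypothetical helper, and whether rxsci appends newline to every item or only between items is not stated here, so the trailing terminator is an assumption.

```python
import json


def dump_json_lines(items, newline="\n"):
    """Yield one JSON string per item, each terminated by `newline`."""
    for item in items:
        yield json.dumps(item) + newline


framed = "".join(dump_json_lines([{"a": 1}, {"a": 2}]))
```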
- rxsci.container.json.dump_to_file(filename, newline='\n', encoding='utf-8', compression=None, open_obj=<built-in function open>)
dumps each item to a JSON file.
The source must be an Observable.
- The open_obj function must return a file-like object. Its prototype is:
open_obj(filename: str, mode: str, encoding: str) -> file-like object
- Parameters:
filename – Path of the file to write or a file object
newline – [Optional] Character(s) used for end of line.
encoding – [Optional] Encoding used for the text content.
open_obj – A function to open the destination file.
- Returns:
An empty observable that completes on success when the source observable completes or completes on error if there is an error while writing the JSON file.
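A custom opener only has to follow the open_obj prototype given above, that is, accept filename, mode and encoding and return a file-like object. As a sketch, the hypothetical open_gzip_text below wraps gzip.open from the standard library. Which mode strings rxsci actually passes is an assumption, so the function accepts both text and binary modes.

```python
import gzip
import os
import tempfile


def open_gzip_text(filename, mode, encoding):
    """Open a gzip file following the (filename, mode, encoding) prototype."""
    if "b" in mode:
        # gzip.open rejects an encoding argument in binary mode.
        return gzip.open(filename, mode)
    # gzip.open defaults to binary, so request text mode explicitly.
    text_mode = mode if "t" in mode else mode + "t"
    return gzip.open(filename, text_mode, encoding=encoding)


# Round trip through a temporary file to show the opener in use.
path = os.path.join(tempfile.mkdtemp(), "items.json.gz")
with open_gzip_text(path, "w", "utf-8") as f:
    f.write('{"a": 1}\n')
with open_gzip_text(path, "r", "utf-8") as f:
    content = f.read()
```

Such a function could then be passed as the open_obj argument in place of the built-in open.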
- rxsci.container.json.load(skip=0, ignore_error=False)
Loads a json observable.
The source observable must emit one JSON string per item. The source must be an Observable.
- Parameters:
skip – number of items to skip before parsing
ignore_error – Ignore errors while parsing JSON
- Returns:
An observable of dicts corresponding to the source json content.
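The interaction of skip and ignore_error can be sketched on a plain list of strings. load_json_items is a hypothetical stand-in, not rxsci's operator. It assumes that skipped items are dropped before any parsing and that, with ignore_error=True, an item that fails to parse is silently discarded.

```python
import json


def load_json_items(lines, skip=0, ignore_error=False):
    """Yield parsed JSON items, dropping the first `skip` lines."""
    for index, line in enumerate(lines):
        if index < skip:
            continue  # skipped lines are never parsed
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            if not ignore_error:
                raise
            # assumption: invalid items are dropped when errors are ignored


source = ['{"skipped": true}', '{"a": 1}', 'not json', '{"a": 2}']
items = list(load_json_items(source, skip=1, ignore_error=True))
```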
- rxsci.container.json.load_from_file(filename, lines=True, skip=0, ignore_error=False, encoding='utf-8', compression=None, open_obj=<built-in function open>)
Loads a json file.
This factory loads the provided file. The format of the returned observable depends on the lines parameter.
- The open_obj function must return a file-like object. Its prototype is:
open_obj(filename: str, mode: str, encoding: str) -> file-like object
- Parameters:
filename – Path of the file to read or a file object
lines – Parse the file as JSON Lines when set to True, as a single JSON object otherwise.
skip – [Optional] Number of lines to skip before parsing
ignore_error – Ignore errors while parsing JSON
encoding – [Optional] Encoding used to parse the text content.
compression – [Optional] ‘gzip’ or ‘zstd’
open_obj – A function to open the source file.
- Returns:
An observable of objects.
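The difference made by the lines parameter can be sketched on in-memory text. parse_json_text is a hypothetical helper. It assumes that in JSON Lines mode every non-empty line becomes one item, and that otherwise the whole text is parsed as a single document and emitted as one item.

```python
import json


def parse_json_text(text, lines=True):
    """Return the items that a loader would emit for `text`."""
    if lines:
        # JSON Lines: one document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Single document: the whole text is one item (assumption).
    return [json.loads(text)]


as_lines = parse_json_text('{"a": 1}\n{"a": 2}\n', lines=True)
as_document = parse_json_text('[{"a": 1}, {"a": 2}]', lines=False)
```

Under these assumptions as_lines holds two dicts, while as_document holds a single item that is itself a list of two dicts.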
parquet
- rxsci.container.parquet.dump_to_file(filename, schema, batch_size=1024, row_group_size=None, compression='snappy', encryption_properties=None, open_obj=<built-in function open>)
dumps each item to a parquet file.
The source must be an Observable.
- Parameters:
filename (str) – Path of the file to write or a file object.
schema (Schema) – The schema of the data to write.
batch_size (int) – Size of internal batches when writing to the parquet file.
row_group_size (Optional[int]) – [Optional] Size of a row group. See pyarrow.parquet.ParquetWriter for more information.
compression (str) – [Optional] Compression codec: none, snappy, gzip, brotli, lz4, zstd.
encryption_properties (Optional[FileEncryptionProperties]) – [Optional] Encryption configuration.
- Returns:
An empty observable that completes on success when the source observable completes or completes on error if there is an error while writing the parquet.
- rxsci.container.parquet.load_from_file(filename, batch_size=1024, decryption_properties=None, open_obj=<built-in function open>)
loads a parquet file.
This factory loads the provided parquet file.
The open_obj function must return a file-like object. Its prototype is: open_obj(filename: str, mode: str, encoding: str) -> file-like object
- Parameters:
filename (str) – Path of the file to read or a file object.
batch_size (int) – Size of internal batches when reading the parquet file.
decryption_properties (Optional[FileDecryptionProperties]) – [Optional] Decryption configuration.
open_obj – A function to open the source file.
- Returns:
An observable of objects.