io
analysis_helpers.io
load_data(filename)
Load data from a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | The name of the file to load. | required |
Returns:
| Type | Description |
|---|---|
| list | The data from the file. |
Source code in src/analysis_helpers/io.py
iter_file_dfs(paths, branches, tree_name, chunk_size=100000, progress=True, n_workers=1)
Yield one DataFrame per ROOT file.
The function opens each file with uproot, iterates a tree in chunks, and concatenates chunks from the same file into a single DataFrame. Files that cannot be opened/read are skipped.
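The per-file behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `iter_file_frames`, `read_chunks`, and `fake_reader` are hypothetical names, and `read_chunks` stands in for whatever chunked uproot read the real function performs. The sketch also assumes the string in each yielded pair is the file path.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

import pandas as pd


def iter_file_frames(
    paths: Iterable[str],
    read_chunks: Callable[[str], Iterable[pd.DataFrame]],
) -> Iterator[tuple[str, pd.DataFrame]]:
    """Yield (path, DataFrame) for each readable file with at least one chunk."""
    for path in paths:
        try:
            # Collect every chunk of this file before yielding anything.
            chunks = list(read_chunks(path))
        except Exception:
            # Unreadable file: skip it and continue with the next one.
            continue
        if not chunks:
            # A file that produced no chunks yields nothing.
            continue
        # One concatenation per file (ignore_index is a choice made here).
        yield path, pd.concat(chunks, ignore_index=True)


def fake_reader(path: str) -> list[pd.DataFrame]:
    """Hypothetical stand-in for a chunked ROOT reader."""
    if path == "broken.root":
        raise OSError("cannot open file")
    if path == "empty.root":
        return []
    return [pd.DataFrame({"pt": [1.0, 2.0]}), pd.DataFrame({"pt": [3.0]})]


results = list(
    iter_file_frames(["a.root", "broken.root", "empty.root"], fake_reader)
)
```

Only `a.root` produces a pair here, because the broken file is skipped and the empty one has no chunks to concatenate.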
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paths | Iterable[str] | Iterable of ROOT file paths. | required |
| branches | | Branch names to read from the tree. | required |
| tree_name | | Name of the TTree inside each file. | required |
| chunk_size | | Number of entries per chunk while iterating. | 100000 |
| progress | | If True, wrap file iteration with a tqdm progress bar. | True |
| n_workers | | Number of worker threads used to load files. | 1 |
Yields:
| Type | Description |
|---|---|
| tuple[str, pandas.DataFrame] | A pair for each readable file containing at least one chunk. |
load_df_incremental(paths, branches, tree_name, chunk_size=100000, progress=True, n_workers=1)
Load and concatenate data from multiple ROOT files.
This function consumes iter_file_dfs and performs one final concatenation across files.
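The final concatenation step might look like the sketch below. It is an illustration under stated assumptions rather than the actual implementation: `concat_file_frames` is a hypothetical helper, the per-file pairs are assumed to be `(path, DataFrame)` tuples, and the empty-result fallback assumes the requested branch names serve as the column labels.

```python
from __future__ import annotations

from typing import Iterable, Sequence

import pandas as pd


def concat_file_frames(
    per_file: Iterable[tuple[str, pd.DataFrame]],
    columns: Sequence[str],
) -> pd.DataFrame:
    """Concatenate per-file DataFrames into a single DataFrame."""
    # Keep only the frames; the path is not needed for concatenation.
    frames = [df for _path, df in per_file]
    if not frames:
        # Assumption: an empty result still exposes the requested columns.
        return pd.DataFrame(columns=list(columns))
    # One concatenation across all files (ignore_index is a choice made here).
    return pd.concat(frames, ignore_index=True)


pairs = [
    ("a.root", pd.DataFrame({"pt": [1.0]})),
    ("b.root", pd.DataFrame({"pt": [2.0, 3.0]})),
]
combined = concat_file_frames(pairs, ["pt"])
empty = concat_file_frames([], ["pt", "eta"])
```

Concatenating once at the end, instead of appending file by file, avoids repeatedly copying the growing result.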
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paths | | Iterable of ROOT file paths. | required |
| branches | | Branch names to read from each tree. | required |
| tree_name | | Name of the TTree inside each file. | required |
| chunk_size | | Number of entries per chunk while iterating. | 100000 |
| progress | | If True, show a file-level progress bar. | True |
| n_workers | | Number of worker threads used to load files. | 1 |
Returns:
| Type | Description |
|---|---|
| pandas.DataFrame | Concatenated dataframe for all readable files. If no data is read, returns an empty dataframe with columns. |
cache_is_valid(paths, cache_path)
Check whether a cache file is up to date with respect to inputs.
The cache is considered valid when it exists and no readable input file has a modification time newer than the cache file.
Missing input files are ignored, while other stat/mtime failures force the cache to be considered invalid.
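The validity rule described above can be sketched with the standard library alone. This is a minimal re-statement of the documented behaviour, not the library's source: `cache_is_fresh` is a hypothetical name, and the file names in the demonstration are made up.

```python
import os
import tempfile
from typing import Iterable


def cache_is_fresh(paths: Iterable[str], cache_path: str) -> bool:
    """Return True when the cache exists and no input is newer than it."""
    try:
        cache_mtime = os.path.getmtime(cache_path)
    except OSError:
        # No cache file (or it cannot be stat'ed): it must be rebuilt.
        return False
    for path in paths:
        try:
            input_mtime = os.path.getmtime(path)
        except FileNotFoundError:
            # Missing inputs are ignored.
            continue
        except OSError:
            # Any other stat failure invalidates the cache.
            return False
        if input_mtime > cache_mtime:
            # An input changed after the cache was written.
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "input.root")
    cache = os.path.join(tmp, "cache.bin")
    for p in (data, cache):
        open(p, "w").close()
    # Set modification times explicitly so the outcome is deterministic.
    os.utime(data, (1000, 1000))
    os.utime(cache, (2000, 2000))
    fresh = cache_is_fresh([data, os.path.join(tmp, "missing.root")], cache)
    os.utime(data, (3000, 3000))
    stale = cache_is_fresh([data], cache)
    no_cache = cache_is_fresh([data], os.path.join(tmp, "absent.bin"))
```

In this demonstration `fresh` is True because the only existing input is older than the cache, while `stale` and `no_cache` are False.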
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paths | | Iterable of input file paths. | required |
| cache_path | | Path to the cache file. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the cache can be reused, False if it should be rebuilt. |