Skip to content

LightCurve

pyvartools.LightCurve is the primary data container for a single photometric time series. Internally it stores data in a pandas DataFrame. The three standard columns — t (time), mag (magnitude), and err (magnitude uncertainty) — are treated specially when present, but all three are optional; any combination of columns is accepted, and pyvartools automatically informs VARTOOLS of the column layout when the light curve is processed. An optional name attribute provides a string label that VARTOOLS uses as the light curve identifier in its output table.


Construction methods

LightCurve.from_file(path, format=None, t_col=<unset>, mag_col=<unset>, err_col=<unset>, hdu=1, name='')

Load a light curve from disk. The file format is auto-detected from the file extension:

  • Files whose names end in .fits, .fit, or .fts (case-insensitive) are read as FITS binary tables using astropy.io.fits. Every column in the table HDU is loaded into the resulting LightCurve under its original FITS column name. The user must explicitly tell from_file which (if any) columns correspond to t, mag, and err via t_col=, mag_col=, err_col=:

    • Pass a string FITS column name to map that column to t/mag/err.
    • Pass None to indicate the LC has no such column (vartools defaults t=NR, mag=0, err=1 at run time).
    • Leaving any of the three unset raises ValueError listing the available FITS columns so you can decide.
  • ASCII files (anything not matching the FITS extensions) are whitespace-delimited. If the file has three or more columns the first three are named t, mag, err and any further columns are named col4, col5, …. If the file has fewer than three columns the columns are named col1, col2, … (since the semantic meaning cannot be inferred). Compressed ASCII files (.gz, .Z, .bz2) are detected by extension and read transparently. t_col, mag_col, err_col are FITS-specific and have no effect for ASCII inputs.

All columns present in the resulting DataFrame are accessible by name to any command when a Pipeline run method is called.

name defaults to the stem of path (filename without directory or extension) if not supplied.

Column layout for disk-based pipeline runs

from_file always reads columns 1–3 as t, mag, err for ASCII files. If your ASCII file has a different layout (e.g. time in column 3), load the file manually or use from_arrays, then pass the correct mapping to Pipeline.run_file() via its columns parameter rather than here.

# ASCII, columns 1-2-3
lc = vt.LightCurve.from_file("EXAMPLES/2")

# Gzipped ASCII — handled transparently.
lc_gz = vt.LightCurve.from_file("EXAMPLES/1.gz")

# FITS: explicit column mapping is required.  All FITS columns also land
# in the LC's DataFrame under their original names.
lc = vt.LightCurve.from_file(
    "EXAMPLES/2.fits",
    t_col="time", mag_col="mag", err_col="err",
)
# lc._df.columns == ['t', 'mag', 'err', 'time', 'mag', 'err', ...]

# FITS where no t-equivalent column exists — pass None, and vartools
# defaults t=NR at run time.
# lc = vt.LightCurve.from_file(
#     "no_time.fits", t_col=None, mag_col="flux", err_col="flux_err",
# )

LightCurve.from_arrays(t, mag, err, aux=None, name='')

Construct a LightCurve directly from NumPy arrays (or anything that converts to a 1-D NumPy array). All three arrays must have the same length.

t, mag, and err are all optional — pass None (or omit them) when VARTOOLS will generate or ignore those vectors internally. aux is an optional dict for any additional named columns. All columns present in the resulting DataFrame are accessible by name to any command when a Pipeline run method is called.

import numpy as np

t       = np.linspace(0, 30, 300)
mag     = 10.0 + 0.1 * np.sin(2 * np.pi * t / 2.3)
err     = np.full(300, 0.01)
airmass = 1.0 + 0.5 * np.abs(np.sin(np.pi * t / 15.0))

# Standard three columns plus an extra
lc = vt.LightCurve.from_arrays(t, mag, err, aux={"airmass": airmass}, name="my_star")

# Only t and mag — no uncertainty column
lc2 = vt.LightCurve.from_arrays(t=t, mag=mag)

# Only auxiliary data — no standard columns at all
phase_arr = (t % 2.3) / 2.3          # phase-fold at 2.3 d period
flux_arr  = 1.0 - 0.01 * np.sin(2 * np.pi * phase_arr)
lc3 = vt.LightCurve.from_arrays(aux={"phase": phase_arr, "flux": flux_arr})

LightCurve.from_files(paths, name='', lcnum_col='lcnum', sort=True, **read_kwargs)

Read several light-curve files and combine them into a single LightCurve. Each file is loaded via from_file, the resulting frames are concatenated, and an integer lcnum_col is filled in (0 for the first file, 1 for the second, …). The combined frame is time-sorted by default.

from_files is appropriate when a single combined LC needs to be fed to Pipeline.run(lc) or to any non-batch entry point. (For combining files inside the pipeline run itself, see Pipeline.run_combinelc() / run_combinelcs().)

lc = vt.LightCurve.from_files(["EXAMPLES/2", "EXAMPLES/3"])
print(list(lc._df.columns))        # ['t', 'mag', 'err', 'lcnum']
print(lc._df["lcnum"].max())       # 1

# Preserve file order instead of time-sorting:
lc2 = vt.LightCurve.from_files(["EXAMPLES/2", "EXAMPLES/3"], sort=False)

# Rename the source-file column:
lc3 = vt.LightCurve.from_files(["EXAMPLES/2", "EXAMPLES/3"], lcnum_col="segment")

For FITS inputs, forward the required t_col= / mag_col= / err_col= keyword arguments to from_file (FITS has no defaults — see the from_file description for the rationale):

# lc = vt.LightCurve.from_files(
#     ["sectorA.fits", "sectorB.fits"],
#     t_col="BJD_TDB", mag_col="MAG", err_col="ERR_MAG",
# )

LightCurve.from_dataframe(df, name='')

Wrap an existing pd.DataFrame. Any DataFrame is accepted — columns named t, mag, and err are treated as the standard VARTOOLS vectors when present; all other columns are preserved as auxiliary variables. None of the three standard columns are required.

import pandas as pd

df = pd.read_csv("EXAMPLES/example.csv")   # columns: t, mag, err
lc = vt.LightCurve.from_dataframe(df, name="example")

LightCurve.from_timeseries(ts, mag_col='mag', err_col='err', name='')

Build a LightCurve from an astropy.timeseries.TimeSeries object. The time column is extracted from ts.time and converted to numeric days (BJD or the native scale of the Time object). mag_col and err_col select the magnitude and uncertainty columns by name.

import numpy as np
from astropy.timeseries import TimeSeries
from astropy.time import Time
import astropy.units as u

# Build a TimeSeries from arrays (e.g. after reading a mission file with astropy)
times = Time(2459000 + np.linspace(0, 30, 300), format="jd", scale="tdb")
ts = TimeSeries(time=times)
ts["sap_flux"]     = (10.5 + 0.08 * np.sin(2 * np.pi * np.linspace(0, 30, 300) / 2.3)) * u.mag
ts["sap_flux_err"] = np.full(300, 0.005) * u.mag

lc = vt.LightCurve.from_timeseries(ts, mag_col="sap_flux", err_col="sap_flux_err")

Properties

Property Type Description
.t np.ndarray or None Time values (days), or None if the column is absent.
.mag np.ndarray or None Magnitude (or flux) values, or None if absent.
.err np.ndarray or None Per-point magnitude uncertainties, or None if absent.
.name str String label for this light curve.
.scalars dict[str, float] Per-star scalar variables (see below).
.cols list[str] All column names, including auxiliary columns (see below).
.shape tuple[int, int] (n_observations, n_columns) — mirrors DataFrame.shape.
.fitsheader astropy.io.fits.Header or None Preserved FITS metadata (see below).

All three array properties return None when the corresponding column is not present in the underlying DataFrame. To modify data, create a new LightCurve from the modified arrays.

Auxiliary columns

Extra columns beyond the standard t / mag / err (e.g. airmass, pixel coordinates, quality flags, per-point reference mags) are stored alongside the standard three in the same backing DataFrame. Access them by name with lc[col]:

import numpy as np
import pyvartools as vt

t = np.arange(100, dtype=float)
mag = 10.0 + 0.01 * np.sin(t / 5)
err = np.full(100, 0.01)
airmass = 1.5 + 0.001 * t
lc = vt.LightCurve.from_arrays(
    t, mag, err,
    aux={"airmass": airmass},
    name="star1",
)

# List all columns
print(lc.cols)                  # ['t', 'mag', 'err', 'airmass']

# Test for membership
print("airmass" in lc)          # True
print("missing" in lc)          # False

# Read a column — returns a numpy array (no copy when the dtype is
# numpy-compatible).  `lc['t']` is equivalent to `lc.t` for the
# three standard columns.
print(lc["airmass"][:3])        # [1.5   1.501 1.502]

Indexing with a missing name raises KeyError; indexing with a non-string raises TypeError. For row-level access, use lc.to_dataframe() (which returns a copy).

When the LightCurve is passed to a Pipeline, pyvartools automatically builds a -inputlcformat flag listing every column name, so any vartools command (for example expr, changevariable, linfit) can reference the auxiliary column by its name:

# The 'airmass' column is visible to -expr because it's in the DataFrame.
result = lc.expr("detrended = mag - 0.05*airmass")
print("detrended" in result.lc.to_dataframe().columns)   # True

.scalars

The canonical per-star scalar store. A dictionary of scalar variables associated with this light curve, populated automatically by pyvartools when a run captures its output LC (capture_lc=True) and used as input to the next segment in a chain.

For single-LC runs, read the scalars directly as result.lc.scalars; for batches, BatchResult.lcscalars collates every LC's dict into a DataFrame. The values live here, not in the Result, so to change what flows into subsequent chain segments mutate result.lc.scalars directly (or build a new LightCurve with a different scalars= kwarg).

Its primary role is to carry state across chained pyvartools commands: when a Result.LS(...).expr(...) chain runs, the prior segment's output columns (and any user-defined scalars) are attached to the next segment's input LightCurve via this dict, so downstream analytic expressions can reference them by name.

Keys are the raw vartools variable names (for OUTCOLUMN values carried forward this is the name with the _N suffix, e.g. "LS_Period_1_0"; for user-defined scalars it is the bare name, e.g. "myvar"). Values are Python float / int scalars.

# Typical flow — users rarely touch .scalars directly; it's populated
# by the chain machinery and harvested from -printallscalars output.
r1 = lc.LS(0.5, 10.0, 0.1)              # r1.lc.scalars is empty on first run
r2 = r1.expr("doubled=2*LS_Period_1_0", vartype="scalar")
# Inside the .expr() call, pyvartools attaches r1's output values to the
# input LightCurve's .scalars and injects them into the new vartools run
# via `-expr const`.  See the Chaining API docs for details.

r2.lc.scalars["doubled"]       # == 2 * r1.vars["LS_Period_1_0"]

You can also construct a LightCurve with an explicit scalars dict — useful for manual chaining or for seeding values in tests:

lc = vt.LightCurve.from_arrays(
    t=t, mag=mag, err=err,
    scalars={"P0": 1.234, "offset": 0.05},
)
# Subsequent commands referencing "P0" or "offset" in expressions will
# resolve to these values.

.fitsheader

The FITS header metadata carried by this light curve, or None if it did not come from a FITS file. Stored as an astropy.io.fits.Header instance — a proper FITS header, not a plain dict, so it supports COMMENT / HISTORY cards, keyword comments, and correct card ordering.

When LightCurve.from_file() reads a FITS file, the primary HDU's header is merged with the data-extension HDU's header (extension wins on conflict), structural keywords are filtered out, and the result is stored here. When .to_file() writes the light curve back to FITS, these preserved keywords are re-emitted onto the output's primary HDU. The data extension gets freshly-derived structural keywords from the current DataFrame — so the column layout matches what the LightCurve actually contains.

Keywords that are stripped on read and write (they must be redetermined from the current data):

  • Global structure: SIMPLE, BITPIX, EXTEND, XTENSION, PCOUNT, GCOUNT, TFIELDS, NAXIS, NAXISn, EXTNAME, EXTVER, END.
  • Per-column: TTYPEn, TFORMn, TDISPn, TSCALn, TZEROn, TNULLn, TBCOLn, TUNITn, TCTYPn, TCRPXn, TCRVLn, TCDLTn, TCUNIn, TCROTn, TDIMn.

Everything else round-trips faithfully, including observational metadata (TELESCOP, OBJECT, DATE-OBS, …) and COMMENT / HISTORY cards.

import os, tempfile
import pyvartools as vt
from astropy.io import fits

# EXAMPLES/10.fits has columns named "time", "mag", "err".
lc = vt.LightCurve.from_file("EXAMPLES/10.fits",
                             t_col="time", mag_col="mag", err_col="err")

# Edit / annotate like a regular astropy Header
lc.fitsheader["TELESCOP"] = "HATPI"
lc.fitsheader["OBJECT"]   = "V1234 Cyg"
lc.fitsheader.add_history("Filtered and detrended with pyvartools")

# Write it out — observational keys go on the primary HDU,
# column structure is regenerated on the extension HDU.
with tempfile.TemporaryDirectory() as td:
    out = os.path.join(td, "annotated.fits")
    lc.to_file(out)
    # Verify the round-trip preserved everything that matters.
    with fits.open(out) as hdul:
        print(hdul[0].header["TELESCOP"])    # 'HATPI'
        print(hdul[0].header["OBJECT"])      # 'V1234 Cyg'
        print(hdul[1].header["TFIELDS"])     # 3 (re-derived, not preserved)

You can also attach a header to a LightCurve built from arrays:

import numpy as np
from astropy.io import fits

hdr = fits.Header()
hdr["TELESCOP"] = "HATPI"
hdr["OBJECT"]   = "Synthetic"

t = np.linspace(0, 30, 300)
lc = vt.LightCurve.from_arrays(
    t, 10.0 + 0.05 * np.sin(t), np.full(300, 0.02),
    fitsheader=hdr,
)

Passing a plain dict is accepted too — it is converted to a Header internally.


Conversion methods

.to_dataframe()pd.DataFrame

Return the full underlying DataFrame, including any auxiliary columns. The index is a default integer index; the three core columns are always named t, mag, and err.

df = lc.to_dataframe()
print(df.head())

.to_arrays()(t, mag, err)

Return the three core arrays as a tuple of NumPy arrays. Equivalent to (lc.t, lc.mag, lc.err) but convenient for unpacking.

t, mag, err = lc.to_arrays()

.to_timeseries()astropy.timeseries.TimeSeries

Convert to an astropy TimeSeries. The t column is converted to an astropy.time.Time object (scale 'tdb', format 'jd'). Magnitude and uncertainty are stored as columns named mag and err. Auxiliary columns are included as additional columns.

import tempfile

ts = lc.to_timeseries()
with tempfile.NamedTemporaryFile(suffix=".ecsv", delete=False) as f:
    ts.write(f.name, format="ascii.ecsv", overwrite=True)

.to_file(path, format=None) — serialise to disk

Write the light curve to an ASCII or FITS file. format is auto-detected from the file extension when omitted (.fits/.fit/.fts → FITS; everything else → ASCII). ASCII output is whitespace-separated with 10 decimal places of precision and no header row, matching the vartools default. FITS output requires astropy.

import tempfile, pyvartools as vt

lc = vt.LightCurve.from_file("EXAMPLES/2")

with tempfile.TemporaryDirectory() as td:
    lc.to_file(f"{td}/out.txt")             # ASCII (auto by extension)
    lc.to_file(f"{td}/out.dat", format="ascii")  # ASCII (explicit)
    lc.to_file(f"{td}/out.fits")            # FITS (auto by extension)

Utility methods

.plot(ax=None, **kwargs)matplotlib.axes.Axes

Quick-look plot of the light curve. Uses ax.errorbar when an err column is present, otherwise falls back to ax.plot. The y-axis is inverted automatically to match astronomical magnitude convention; the light curve's name is used as the plot title when set. Any keyword arguments are forwarded to the underlying errorbar / plot call, so defaults like fmt, markersize, color, etc. can be overridden.

Raises ValueError if the light curve has no t or mag column. Requires matplotlib.

import matplotlib
matplotlib.use("Agg")   # headless backend — skip for interactive use
import matplotlib.pyplot as plt
import pyvartools as vt

lc = vt.LightCurve.from_file("EXAMPLES/2")

# Default quick-look with errorbars
ax = lc.plot()

# Override styling (e.g. for a larger, coloured-blue figure on existing axes)
fig, ax = plt.subplots(figsize=(6, 3))
lc.plot(ax=ax, color="C0", markersize=2)

Special methods

Operation Result
len(lc) Number of data points.
lc[col] Column as a numpy array (KeyError if missing, TypeError for non-string keys).
col in lc True if col is a column name.
repr(lc) Summary string — e.g. LightCurve(name='my_star', n=300, cols=['t', 'mag', 'err']).

Full example

import numpy as np
import pyvartools as vt

# From file (auto-detects ASCII)
lc = vt.LightCurve.from_file("EXAMPLES/2")

# From FITS — t_col/mag_col/err_col are required (no defaults).  Pass the
# matching FITS column names, or None for any LC that doesn't have that
# column.
lc = vt.LightCurve.from_file("EXAMPLES/example.fits",
                              t_col="BJD", mag_col="Mag", err_col="Err")

# From arrays
t = np.linspace(0, 30, 300)
mag = 10.0 + 0.1*np.sin(2*np.pi*t/2.3)
err = np.full(300, 0.01)
lc = vt.LightCurve.from_arrays(t, mag, err, name="my_star")

print(lc)       # LightCurve(name='my_star', n=300, cols=['t', 'mag', 'err'])
print(len(lc))  # 300

# Round-trip through a DataFrame
df = lc.to_dataframe()
lc2 = vt.LightCurve.from_dataframe(df, name="my_star")