DNA-Spot/Trace Data core table

Requirement level: required

Summary

This is the mandatory core table of the 4DN FISH-omics Format for Chromatin Tracing (FOF-CT). It is used to record and exchange the primary results of Chromatin Tracing experiments. The table is organized around individual DNA bright Spots that a 3D polymeric tracing algorithm links together spatially into three-dimensional (3D) polymeric Traces. As a result, all Spots that share the same Trace_ID belong, by definition, to the same Trace.
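As a minimal sketch of what the Trace_ID convention implies, the Python snippet below groups Spots into Traces. It assumes the table rows have already been parsed into dicts keyed by column name; the function name group_spots_by_trace is hypothetical and not part of the format.

```python
from collections import defaultdict


def group_spots_by_trace(rows):
    """Map each Trace_ID to the Spot_IDs that belong to that Trace.

    `rows` is assumed to be an iterable of dicts holding (at least) the
    "Spot_ID" and "Trace_ID" values of one table row each.
    """
    traces = defaultdict(list)
    for row in rows:
        # Spots sharing a Trace_ID belong to the same Trace by definition.
        traces[row["Trace_ID"]].append(row["Spot_ID"])
    return dict(traces)


# Spot/Trace pairs taken from the Example later in this section.
example_rows = [
    {"Spot_ID": "1", "Trace_ID": "1"},
    {"Spot_ID": "2", "Trace_ID": "1"},
    {"Spot_ID": "3", "Trace_ID": "1"},
    {"Spot_ID": "4", "Trace_ID": "2"},
    {"Spot_ID": "5", "Trace_ID": "2"},
]
print(group_spots_by_trace(example_rows))
# {'1': ['1', '2', '3'], '2': ['4', '5']}
```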

Each row reports the X, Y, Z localization and the Trace assignment (i.e., the Trace_ID) of one FISH-omics bright Spot, and corresponds to a specific genomic DNA target sequence identified by its chromosome (Chrom) and by its start (Chrom_Start) and end (Chrom_End) coordinates on that chromosome. The X, Y, Z coordinates reported in this table are assumed to result from post-processing and quality control procedures and therefore correspond to the final localization of the DNA target under study.

At a minimum, the table must have the following 8 required columns, in this order: Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End. In addition, if sub-cellular structures, cells, or extracellular structures (e.g., Tissue) are identified as part of the experiment, this table must also include the ID of the Sub-Cellular, Cell, or Extracellular Structure Region of Interest (ROI) that each Spot/Trace is associated with.
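The column-order rule can be checked mechanically. The following sketch is illustrative only: check_core_columns and its identified_rois argument are hypothetical, and whether an ROI column is required has to come from knowledge of the experiment, not from the table itself.

```python
REQUIRED_COLUMNS = ["Spot_ID", "Trace_ID", "X", "Y", "Z",
                    "Chrom", "Chrom_Start", "Chrom_End"]
ROI_COLUMNS = {"Sub_Cell_ROI_ID", "Cell_ID", "Extra_Cell_ROI_ID"}


def check_core_columns(columns, identified_rois=()):
    """Return a list of problems found in the column list of a core table.

    columns: column names in file order.
    identified_rois: ROI column names that must be present because the
        corresponding structures were identified in the experiment
        (assumed to be known by the caller from the experiment metadata).
    """
    columns = list(columns)
    problems = []
    n = len(REQUIRED_COLUMNS)
    if columns[:n] != REQUIRED_COLUMNS:
        problems.append("the first 8 columns must be: " + ", ".join(REQUIRED_COLUMNS))
    for name in identified_rois:
        if name not in ROI_COLUMNS:
            raise ValueError(f"unknown ROI column: {name}")
        # ROI columns are expected after the 8 required columns.
        if name not in columns[n:]:
            problems.append(f"missing conditional column: {name}")
    return problems


ok = ["Spot_ID", "Trace_ID", "X", "Y", "Z",
      "Chrom", "Chrom_Start", "Chrom_End", "Cell_ID"]
print(check_core_columns(ok, ["Cell_ID"]))  # []
```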

All other Spot properties must be kept in the two additional tables, the Spot Quality table and the Spot Biological Data table, indexed by Spot_ID as described in the instructions for those tables. In addition, when the final localization of a DNA target results from combining multiple detection events (e.g., localization events from different focal planes or time points), the underlying raw data can be recorded in the Spot Demultiplexing table as described in the instructions for that table.

Finally, Spot_ID identifiers are unique across the entire dataset, so that each Spot can be identified unambiguously in the Spot Quality table, the Spot Biological Data table, and the Spot Demultiplexing table.

NOTE: RNA Spots also have a Spot_ID (in the RNA-Spot Data table). When assigning an identifier to each Spot, therefore, make sure that it is unique not only within the DNA-Spot/Trace Data core table but also within the RNA-Spot Data table, if present.
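A simple way to verify this uniqueness requirement is to count Spot_IDs across both tables. This is a sketch that assumes the Spot_ID columns of the two tables are already available as lists of strings; duplicate_spot_ids is a hypothetical helper.

```python
from collections import Counter


def duplicate_spot_ids(dna_spot_ids, rna_spot_ids=()):
    """Return the Spot_IDs that occur more than once.

    IDs are counted across the DNA-Spot/Trace Data core table and, if
    present, the RNA-Spot Data table, since both share one Spot_ID space.
    """
    counts = Counter(dna_spot_ids)
    counts.update(rna_spot_ids)
    return sorted(spot_id for spot_id, n in counts.items() if n > 1)


print(duplicate_spot_ids(["1", "2", "3"], ["3", "4"]))  # ['3']
```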

Example

##FOF-CT_version=v0.1
##Table_namespace=4dn_FOF-CT_core
##genome_assembly=GRCh38
##XYZ_unit=micron
#Software_Title: ChrTracer3
#Software_Type: SpotLoc+Tracing
#Software_Authors: Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN
#Software_Description: ChrTracer3 software was developed for analysis of raw DNA labeled images. As an input, it takes an .xlsx table containing information and folder names of the DNA experiment. As an output, it returns tab-delimited .txt files with drift-corrected x, y, z positions for all labeled barcodes. These can be used directly to calculate the nm-scale distances between all pairs of labeled loci. The current version of the software as of this writing is ChrTracer3.
#Software_Repository: https://github.com/BoettigerLab/ORCA-public
#Software_PreferredCitationID: https://doi.org/10.1038/s41596-020-00478-x
#lab_name: Nobel
#experimenter_name: John Doe
#experimenter_contact: john.doe@email.com
#additional_tables: 4dn_FOF-CT_quality, 4dn_FOF-CT_rna, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell
##columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)
1, 1, 14.43, 41.43, 1.23, chr1, 0001, 1000, 1
2, 1, 14.83, 41.83, 1.83, chr1, 1001, 2000, 1
3, 1, 15.83, 42.83, 1.33, chr1, 2001, 3000, 1
4, 2, 20.43, 50.43, 1.23, chr1, 0002, 2000, 1
5, 2, 21.83, 60.83, 1.83, chr1, 1002, 3000, 1
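The example above can be read with a short parser. The sketch below is not an official reader: it assumes that data values are comma-separated, that ## lines hold key=value pairs, that # lines hold "key: value" pairs, and that the ##columns= line precedes the data rows. parse_fof_ct is a hypothetical name.

```python
def parse_fof_ct(text):
    """Split a FOF-CT table into header fields, column names and data rows.

    Header values are kept as lists because some fields (e.g. the
    #Software_* fields) may be repeated when several algorithms were used.
    """
    header, columns, rows = {}, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # "##key=value" lines, including "##columns=(...)".
            key, _, value = line[2:].partition("=")
            if key.strip() == "columns":
                columns = [c.strip() for c in value.strip().strip("()").split(",")]
            else:
                header.setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            # "#key: value" lines; split on the first colon only so URLs survive.
            key, _, value = line[1:].partition(":")
            header.setdefault(key.strip(), []).append(value.strip())
        else:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(columns):
                raise ValueError(f"expected {len(columns)} values, got {len(values)}: {line}")
            rows.append(dict(zip(columns, values)))
    return header, columns, rows


example = """\
##FOF-CT_version=v0.1
##Table_namespace=4dn_FOF-CT_core
##genome_assembly=GRCh38
##XYZ_unit=micron
#Software_Title: ChrTracer3
#Software_Repository: https://github.com/BoettigerLab/ORCA-public
##columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)
1, 1, 14.43, 41.43, 1.23, chr1, 0001, 1000, 1
2, 1, 14.83, 41.83, 1.83, chr1, 1001, 2000, 1
"""
header, columns, rows = parse_fof_ct(example)
print(header["Table_namespace"], len(columns), len(rows))  # ['4dn_FOF-CT_core'] 9 2
```

Numeric values are returned as strings; converting X, Y, Z and the chromosome coordinates to numbers is left to the caller.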

File Header

  • The first line in the header is always “##FOF-CT_version=vX.X”

  • The second line in the header is always “##Table_namespace=4dn_FOF-CT_core”

The header MUST contain a mandatory set of fields that describe the algorithm(s) used to identify and localize bright Spots and to connect them into Traces. If more than one algorithm was used, repeat the same set of fields for each algorithm.

The columns for this table are mandatory and do not need to be described in the header.
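Because the #Software_* fields are repeated once per algorithm, a reader has to split them into per-software records. The sketch below assumes, as a convention not stated by the format, that each repeated block starts with #Software_Title; group_software_blocks and the second tool name MyTracer are hypothetical.

```python
def group_software_blocks(header_lines):
    """Collect '#Software_*' header lines into one dict per software.

    Assumption (not stated in the format description): each software
    block starts with a '#Software_Title:' line.
    """
    blocks = []
    for line in header_lines:
        if not line.startswith("#Software_"):
            continue  # skip '##' fields and non-software '#' fields
        key, _, value = line[1:].partition(":")
        key = key.strip()
        if key == "Software_Title":
            blocks.append({})
        elif not blocks:
            raise ValueError(f"{key} appears before any #Software_Title line")
        blocks[-1][key] = value.strip()
    return blocks


lines = [
    "##FOF-CT_version=v0.1",
    "#Software_Title: ChrTracer3",
    "#Software_Type: SpotLoc+Tracing",
    "#Software_Title: MyTracer",  # hypothetical second tool
    "#Software_Type: Tracing",
    "#lab_name: Nobel",
]
for block in group_software_blocks(lines):
    print(block)
```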

Each header field is listed below by name, followed by its description and, where available, an example value and any conditional requirement.

##FOF-CT_version=

Version of the FOF-CT format used for this table.

Example: v0.1

##Table_namespace=

Identifier for this type of table. The value must be exactly as in the example.

Example: 4dn_FOF-CT_core

#lab_name:

Name of the lab where the experiment was performed.

Example: Nobel

#experimenter_name:

Name of the person who performed the experiment.

Example: John Doe

#experimenter_contact:

Email address of the person who performed the experiment.

Example: john.doe@email.com

#description:

A free-text description of the experiment and of the data recorded in this table. It should make clear the process used to produce the data and contain sufficient detail to ensure interpretation and reproducibility.

#Software_Title:

The name of the Software used to localize individual FISH-omics bright Spots and/or to build three-dimensional (3D) polymeric chromatin Traces.

Example: ChrTracer3

#Software_Type:

The type of this Software. Allowed values: SpotLoc, Tracing, SpotLoc+Tracing, Segmentation, QC, Other.

Example: SpotLoc+Tracing
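The allowed values for #Software_Type form a closed list, so they can be checked with a set lookup. A minimal sketch; check_software_type is a hypothetical helper, and the comparison is assumed to be case-sensitive.

```python
ALLOWED_SOFTWARE_TYPES = {
    "SpotLoc", "Tracing", "SpotLoc+Tracing", "Segmentation", "QC", "Other",
}


def check_software_type(value):
    """Return the value without surrounding whitespace, or raise if not allowed."""
    value = value.strip()
    if value not in ALLOWED_SOFTWARE_TYPES:
        raise ValueError(f"unsupported #Software_Type: {value!r}")
    return value


print(check_software_type("SpotLoc+Tracing"))  # SpotLoc+Tracing
```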

#Software_Authors:

The name(s) of the individual Author(s) of this Software. If there is more than one Author, separate the names with semicolons, writing each name as family name, comma, given name(s) or initials, e.g., Doe, John; Smith, Jane.

Example: Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN
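The author list uses two levels of separators: semicolons between authors and a comma between family name and given names or initials. A sketch of splitting it, using the hypothetical helper parse_authors:

```python
def parse_authors(value):
    """Split a '#Software_Authors' value into (family_name, given) pairs.

    Authors are separated by semicolons; within each author, the family
    name and the given name(s)/initials are separated by the first comma.
    """
    authors = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        family, _, given = entry.partition(",")
        authors.append((family.strip(), given.strip()))
    return authors


print(parse_authors("Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN"))
# [('Mateo', 'LJ'), ('Sinnott-Armstrong', 'N'), ('Boettiger', 'AN')]
```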

#Software_Description:

A free-text description of this Software. It should give a detailed understanding of the algorithm and of the analysis parameters used, to ensure interpretation and reproducibility.

Example: ChrTracer3 software was developed for analysis of raw DNA labeled images. As an input, it takes an .xlsx table containing information and folder names of the DNA experiment. As an output, it returns tab-delimited .txt files with drift-corrected x, y, z positions for all labeled barcodes. These can be used directly to calculate the nm-scale distances between all pairs of labeled loci. The current version of the software as of this writing is ChrTracer3.

#Software_Repository:

The URL of any repository or archive from which the Software executable release can be obtained.

Example: https://github.com/BoettigerLab/ORCA-public

#Software_PreferredCitationID:

The unique identifier of the preferred/primary publication describing this Software, for example a Digital Object Identifier (DOI), a PubMed Central Identifier (PMCID), or an arXiv.org ID.

Example: https://doi.org/10.1038/s41596-020-00478-x

#additional_tables:

List of the additional tables being submitted. Note: separate table names with commas.

Example: 4dn_FOF-CT_rna, 4dn_FOF-CT_quality, 4dn_FOF-CT_bio, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell

##genome_assembly=

Genome build. Note: the 4DN Data Portal only accepts GRCh38 for human and GRCm38 for mouse.

Example: GRCh38

##XYZ_unit=

The unit used to represent the X, Y, Z location of bright Spots in this table. Note: use micron (instead of µm) to avoid problems with special characters such as the Greek letter µ. Other allowed values include nm and mm.

Example: micron
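Readers that combine tables recorded in different units need to normalize coordinates first. The sketch below converts to nanometres for the three unit names mentioned above only; the conversion table and to_nanometres are illustrative assumptions, not part of the specification.

```python
# Conversion factors to nanometres for the unit names mentioned in the text.
# Other unit names that the format may allow are not handled here.
TO_NM = {"nm": 1.0, "micron": 1_000.0, "mm": 1_000_000.0}


def to_nanometres(x, y, z, xyz_unit):
    """Convert one Spot's X, Y, Z from the table's ##XYZ_unit to nanometres."""
    try:
        factor = TO_NM[xyz_unit.strip()]
    except KeyError:
        raise ValueError(f"unsupported ##XYZ_unit: {xyz_unit!r}") from None
    return (x * factor, y * factor, z * factor)


# Spot 1 from the example table, recorded in micron.
print(to_nanometres(14.43, 41.43, 1.23, "micron"))
```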

##columns=

List of the data column headers used in the table. Note: enclose the list of column headers in parentheses and separate header names with commas.

Example: (Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End)
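A minimal sketch of reading the ##columns= field, assuming the parenthesized, comma-separated layout shown in the example; parse_columns_field is a hypothetical name.

```python
def parse_columns_field(line):
    """Parse a '##columns=(A, B, C)' header line into ['A', 'B', 'C']."""
    prefix = "##columns="
    if not line.startswith(prefix):
        raise ValueError("not a ##columns= line")
    value = line[len(prefix):].strip()
    if not (value.startswith("(") and value.endswith(")")):
        raise ValueError("the column list must be enclosed in parentheses")
    return [name.strip() for name in value[1:-1].split(",")]


print(parse_columns_field(
    "##columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)"))
```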

Data Columns

As with all other Spot Data tables in this format, each row corresponds to data associated with an individual Spot.

The first columns are always: Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End. In addition, if sub-cellular structures, cells, or extracellular structures are identified as part of the experiment, the following columns must be Sub_Cell_ROI_ID, Cell_ID, or Extra_Cell_ROI_ID, respectively.

The order of the rows is at the user’s discretion.

Each column is listed below by name, followed by its description and, where available, an example value and any conditional requirement.

Spot_ID

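A unique identifier for this bright Spot. Spot_ID values must be unique across the entire dataset, including the RNA-Spot Data table if present.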

Trace_ID

If multiple DNA Spots are connected to form 3D polymer traces of chromatin fibers (as in ORCA; https://doi.org/10.1038/s41596-020-00478-x), this field reports a unique identifier for the DNA Trace the Spot belongs to. Note: this is used to connect Spots that are part of the same polymeric Trace. It is also used to connect data in this table with any Trace-specific measurements, such as nascent RNA expression, recorded in the corresponding Trace Data table.

Example: 1
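Once Spots are grouped by Trace_ID, the distances between all pairs of loci mentioned in the ChrTracer3 description can be computed per Trace. The sketch below uses Euclidean distance via math.dist (Python 3.8+) on Trace 1 from the example table; pairwise_distances is hypothetical, and all coordinates are assumed to be in the same ##XYZ_unit (here they stay in micron).

```python
import math
from itertools import combinations


def pairwise_distances(spots):
    """Euclidean distance between every pair of Spots in one Trace.

    `spots` maps Spot_ID -> (X, Y, Z); all coordinates are assumed to be in
    the same ##XYZ_unit. Returns {(spot_a, spot_b): distance}.
    """
    return {
        (a, b): math.dist(spots[a], spots[b])
        for a, b in combinations(sorted(spots), 2)
    }


# Trace 1 from the example table (coordinates in micron).
trace_1 = {
    "1": (14.43, 41.43, 1.23),
    "2": (14.83, 41.83, 1.83),
    "3": (15.83, 42.83, 1.33),
}
for pair, d in pairwise_distances(trace_1).items():
    print(pair, round(d, 3))
```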

X

The sub-pixel X coordinate of this bright Spot, in the unit given by ##XYZ_unit. NOTE: the reported X position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction).

Y

The sub-pixel Y coordinate of this bright Spot, in the unit given by ##XYZ_unit. NOTE: the reported Y position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction).

Z

The sub-pixel Z coordinate of this bright Spot, in the unit given by ##XYZ_unit. NOTE: the reported Z position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction).

Chrom

Chromosome name. The BED (Browser Extensible Data) terminology is used here because BED is the de facto bioinformatics exchange format for genomic data.

Examples: chr3, chrY, chr2_random

Chrom_Start

Start coordinate on the chromosome of the sequence associated with this bright Spot (the first base on the chromosome is numbered 0). The BED (Browser Extensible Data) terminology is used here because BED is the de facto bioinformatics exchange format for genomic data.

Example: 0

Chrom_End

End coordinate on the chromosome of the sequence associated with this bright Spot. Unlike Chrom_Start, this position is non-inclusive. The BED (Browser Extensible Data) terminology is used here because BED is the de facto bioinformatics exchange format for genomic data.

Example: 1000
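Because Chrom_Start is 0-based and Chrom_End is non-inclusive, the length of the target sequence is simply Chrom_End minus Chrom_Start, and the equivalent 1-based, fully inclusive interval starts one base later and ends at the same number. A minimal sketch with the hypothetical helper bed_interval_info:

```python
def bed_interval_info(chrom_start, chrom_end):
    """Describe a Chrom_Start/Chrom_End pair that follows BED conventions.

    Chrom_Start is 0-based and Chrom_End is non-inclusive, so the interval
    covers the 0-based bases chrom_start .. chrom_end - 1.
    """
    if chrom_end <= chrom_start:
        raise ValueError("Chrom_End must be greater than Chrom_Start")
    length = chrom_end - chrom_start
    # The same bases expressed as a 1-based, fully inclusive interval.
    one_based = (chrom_start + 1, chrom_end)
    return length, one_based


print(bed_interval_info(0, 1000))  # (1000, (1, 1000))
```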

Sub_Cell_ROI_ID

If known, this field reports the unique identifier of a Region of Interest (ROI) representing the boundaries of the sub-cellular structure that a given Spot/Trace is associated with. Note: this is used to connect individual Spots/Traces that are part of the same ROI. It is also used to connect data in this table with any ROI-specific measurements, such as boundaries, intensities, or volume, recorded in the corresponding Sub-Cell ROI Data table.

Example: 1

Conditional requirement: this column is mandatory if data in this table can be associated with a Sub_Cell_ROI identified as part of this experiment.

Cell_ID

If known, this field reports the unique identifier of the Cell that a given Spot/Trace is associated with. Note: this is used to connect individual Spots/Traces that are part of the same Cell. It is also used to connect data in this table with any Cell-specific measurements, such as boundaries, intensities, and volume, recorded in the corresponding Cell Data table.

Example: 1

Conditional requirement: this column is mandatory if data in this table can be associated with a Cell identified as part of this experiment.

Extra_Cell_ROI_ID

If known, this field reports the unique identifier of a Region of Interest (ROI) representing the boundaries of an extracellular structure (e.g., Tissue) that a given Spot/Trace is associated with. Note: this is used to connect individual Spots/Traces that are part of the same ROI. It is also used to connect data in this table with any ROI-specific measurements, such as boundaries, intensities, and volume, recorded in the corresponding Extra-Cell ROI Data table.

Example: 1

Conditional requirement: this column is mandatory if data in this table can be associated with an extracellular structure ROI (e.g., Tissue) identified as part of this experiment.