DNA-Spot/Trace Data core table

Requirement level: required

Namespace: 4dn_FOF-CT_core

Summary

This is the mandatory core table of the 4DN FISH-omics Format for Chromatin Tracing. This table is used to record and exchange the primary results of Chromatin Tracing experiments, both when the genome under study is unmodified and when it contains INSERTIONS or DELETIONS.

The core table is organized around individual DNA bright Spots that are generally linked together spatially in a three-dimensional (3D) polymeric Trace using a 3D polymeric tracing algorithm. As a result, all Spots that share the same Trace_ID belong, by definition, to the same Trace.

In this table, each row reports the X, Y, Z localization, and the Trace assignment (i.e., Trace_ID) of a FISH-omics bright Spot and corresponds to a specific genomic DNA target sequence identified by chromosome ID (Chrom), and by start (Chrom_Start) and end (Chrom_End) chromosome coordinates. In this table the reported X, Y, Z coordinates are assumed to result from post-processing and quality control procedures and therefore correspond to the final localization of the DNA target under study.
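As an illustration of the row layout described above, the following minimal sketch groups Spot rows into Traces by their shared Trace_ID. The sample rows and the function name `group_by_trace` are hypothetical and only serve the example; they are not part of the format.

```python
from collections import defaultdict

# Hypothetical rows in the order:
# (Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End)
rows = [
    (1, 1, 14.43, 41.43, 1.23, "chr1", 0, 1000),
    (2, 1, 14.83, 41.83, 1.83, "chr1", 1000, 2000),
    (3, 2, 20.43, 50.43, 1.23, "chr1", 0, 1000),
]


def group_by_trace(rows):
    """Map each Trace_ID to the list of Spot rows that carry it."""
    traces = defaultdict(list)
    for row in rows:
        trace_id = row[1]  # Trace_ID is the second column
        traces[trace_id].append(row)
    return dict(traces)


traces = group_by_trace(rows)
for trace_id, spots in traces.items():
    print(trace_id, [spot[0] for spot in spots])
```

Grouping on Trace_ID alone is enough here because, by definition, all Spots with the same Trace_ID belong to the same Trace.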

Tip

The 4DN Data Portal only accepts GRCh38 for human and GRCm38 for mouse. For other species, see https://data.4dnucleome.org/search/?type=Organism. In addition, if the genome under study contains INSERTIONS or DELETIONS, also follow the Instructions for when the genome under study is modified.

At a minimum, the table must have 8 columns in the following order: Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End. These are required. Additionally, if sub-cellular structures, cells, or extracellular structures (e.g., Tissue) are identified as part of this experiment, this table must also include the ID of the Sub-Cellular, Cell, or Extracellular Structure Region of Interest (ROI) each Spot/Trace is associated with.

All other Spot properties must be kept in the two additional tables, the Spot Quality table and the Spot Biological Data table, indexed by Spot_ID and as described in the instructions for those tables. Additionally, when the final localization of a DNA target results from combining multiple detection events (e.g., by combining localization events from different focal planes or times), the underlying raw data can be recorded in the corresponding Spot Demultiplexing table, as described in the instructions for that table.
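The column-order requirement above can be checked mechanically. The sketch below is one possible check, not an official validator; the function name `has_required_prefix` is an assumption made for this example.

```python
# The eight required columns, in the order stated above.
REQUIRED = [
    "Spot_ID", "Trace_ID", "X", "Y", "Z",
    "Chrom", "Chrom_Start", "Chrom_End",
]


def has_required_prefix(columns):
    """Return True if the first eight columns are the required ones, in order."""
    return list(columns[:len(REQUIRED)]) == REQUIRED


print(has_required_prefix(REQUIRED + ["Cell_ID"]))
print(has_required_prefix(["Trace_ID", "Spot_ID", "X", "Y", "Z",
                           "Chrom", "Chrom_Start", "Chrom_End"]))
```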

Tip

Spot_ID identifiers are unique across the entire dataset, so a Spot can be identified unambiguously in the Spot Quality table, Spot Biological Data table and Spot Demultiplexing table.
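Because Spot_ID acts as the key shared by all Spot tables, a reader can verify its uniqueness and use it to combine rows across tables. The following is a minimal sketch under that assumption; the helper names and the `Example_Metric` column are made-up placeholders, not columns defined by the format.

```python
from collections import Counter


def duplicated_spot_ids(spot_ids):
    """Return the Spot_ID values that occur more than once."""
    counts = Counter(spot_ids)
    return sorted(spot_id for spot_id, n in counts.items() if n > 1)


def join_on_spot_id(core_rows, other_rows):
    """Merge rows (dicts keyed by column name) that share a Spot_ID."""
    other_by_id = {row["Spot_ID"]: row for row in other_rows}
    return [{**core, **other_by_id.get(core["Spot_ID"], {})}
            for core in core_rows]


core = [{"Spot_ID": 1, "Trace_ID": 1}, {"Spot_ID": 2, "Trace_ID": 1}]
# "Example_Metric" is a placeholder column used only for illustration.
extra = [{"Spot_ID": 2, "Example_Metric": 0.9}]

print(duplicated_spot_ids([1, 2, 2, 3]))
print(join_on_spot_id(core, extra))
```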

Warning

All MANDATORY header fields and column names are indicated in bold. All conditionally required header fields and column names are indicated in italics.

Instructions for when the genome under study is modified

Instructions for reporting the location of DNA Spots and Traces in case the genome under study contains insertions or deletions:

  1. Add the custom-build prefix to the genome build name and introduce a descriptive name detailing the nature of the genome modification.

  2. Insert the following additional fields in the File Header:

    • ##Modification to indicate the nature and location of the modification.

    • ##VCF_File_Name to indicate the name of the mandatory Variant Call Format (VCF) file to be included with the FOF-CT dataset to report the nature and location of the genome modification.

    • ##VCF_Version to indicate the VCF version used for the VCF file describing the nature and location of the genome modification.

  3. Attach a separate VCF file with your FOF-CT dataset to describe the nature and location of the genome modification.

  4. In the Chrom column insert the name of the inserted or deleted DNA fragment.

  5. In the Chrom_Start and Chrom_End columns insert the Start and End coordinates of the target chromosome segment with respect to the INSERTION or DELETION.
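The header-related steps above can be sketched as a small helper that emits the four affected header lines. This is only an illustration: the function name and its parameters are hypothetical, the field names follow the File Header table on this page, and the `custom-build:` prefix format is taken from the example with INSERTION/DELETION at the end of this page.

```python
def modified_genome_header(build, name, modification, vcf_file, vcf_version):
    """Return the header lines that describe a modified genome.

    build         reference build the modification is based on (e.g. GRCm38)
    name          descriptive name of the modification
    modification  nature and location of the modification
    vcf_file      name of the VCF file attached to the dataset
    vcf_version   VCF version used by that file
    """
    assembly = f"custom-build:{build}+{name}"
    return [
        f"##Genome_Assembly={assembly}",
        f"##Modification={modification}",
        f"##VCF_File_Name={vcf_file}",
        f"##VCF_Version={vcf_version}",
    ]


lines = modified_genome_header(
    build="GRCm38",
    name="pJT039(insertion)",
    modification="pJT039:chr3(insertion 2001-3000)",
    vcf_file="pJT039:chr3.vcf",
    vcf_version="v4.2",
)
print("\n".join(lines))
```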

File Header

  • For full instructions see File Header

  • The first line in the header is always ##FOF-CT_Version=vX.X.

  • The second line in the header is always ##Table_Namespace=4dn_FOF-CT_core.
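The two fixed header lines lend themselves to a quick sanity check. The sketch below assumes that the version placeholder vX.X stands for two dot-separated numbers and that the namespace is the one given for this table (4dn_FOF-CT_core); the function name is hypothetical.

```python
import re


def check_first_two_lines(lines, namespace="4dn_FOF-CT_core"):
    """Return a list of problems found in the first two header lines."""
    if len(lines) < 2:
        return ["header has fewer than two lines"]
    problems = []
    # Assumption: vX.X means 'v', digits, a dot, digits (e.g. v1.0).
    if not re.fullmatch(r"##FOF-CT_Version=v\d+\.\d+", lines[0].strip()):
        problems.append("first line is not ##FOF-CT_Version=vX.X")
    if lines[1].strip() != f"##Table_Namespace={namespace}":
        problems.append(f"second line is not ##Table_Namespace={namespace}")
    return problems


print(check_first_two_lines([
    "##FOF-CT_Version=v1.0",
    "##Table_Namespace=4dn_FOF-CT_core",
]))
```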

Tip

The header MUST contain a mandatory set of fields that describe any Software tool that was used to produce/process data in this table. If more than one software tool was used, please repeat a set of Software-fields for describing each of them.

All columns for this table are mandatory and do not need to be described in the header.

Name

Description

Example

Conditional requirement conditions

##FOF-CT_Version=

Version of the FOF format used in this case.

v1.0

##Table_Namespace=

Identifier for this type of table. Value must be as in the example.

4dn_FOF-CT_core

##Genome_Assembly=

Genome build. Notes: (1) the 4DN Data Portal only accepts GRCh38 for human and GRCm38 for mouse. For other species see https://data.4dnucleome.org/search/?type=Organism; (2) in case the genome under study contains an INSERTION or a DELETION, indicate this by adding the mandatory custom-build prefix to the build name and using a descriptive name indicating the nature of the genome modification (e.g., custom-build:GRCm38+pJT039(insertion)).

GRCh38

Conditional requirement: if the genome under study contains an INSERTION or a DELETION, this field MUST use the custom-build prefix and contain a descriptive name indicating the nature of the genome modification (e.g., custom-build:GRCm38+pJT039(insertion)).

##Modification=

In case the genome under study contains an INSERTION or a DELETION, this field is used to provide a description of the nature and genomic position of the DNA insertion or deletion.

pJT039:chr3(insertion 0001-2500)

Conditional requirement: this MUST be reported if the genome under study contains an INSERTION or a DELETION.

##VCF_File_Name=

In case the genome under study contains an INSERTION or a DELETION, this field is used to provide the name of the Variant Call Format (VCF) file that MUST be included with the dataset to report the nature and location of the genome INSERTION or DELETION.

pJT039:chr3.vcf

Conditional requirement: this MUST be reported if the genome under study contains an INSERTION or a DELETION.

##VCF_Version=

In case the genome under study contains an INSERTION or a DELETION, this field is used to provide the VCF version used for the VCF file describing the nature and location of the genome INSERTION or DELETION.

v4.2

Conditional requirement: this MUST be reported if the genome under study contains an INSERTION or a DELETION.

##XYZ_Unit=

If relevant, the unit used to represent XYZ locations or distances in this table. Note: use micron to avoid problems with special Greek symbols. Other allowed values should be drawn from SI units of length. Examples: ‘nm’, ‘micron’, ‘mm’, etc.

micron

#Lab_Name:

Name of the lab where the experiment was performed.

Nobel

#Experimenter_Name:

Name of the person performing the experiment.

John Doe

#Experimenter_Contact:

Email address of the person performing the experiment.

john.doe@email.com

#Description:

A free-text description of the experiment and of the data recorded in this table. This description should provide a clear understanding of the process used to produce the data and contain sufficient detail to ensure interpretation and reproducibility.

#Software_Title:

The name of the Software tool that was used to produce the results reported in this table. If more than one software tool was used, please repeat a set of Software-fields for describing each of them.

ChrTracer3

#Software_Type:

The type of this Software used to produce results recorded in this table. Allowed values: SpotLoc, Tracing, SpotLoc+Tracing, Segmentation, QC, Other

SpotLoc+Tracing

#Software_Authors:

The Name(s) of the individual Author(s) of this Software. In case there is more than one Author, individual names should be listed as follows: Doe, John; Smith, Jane; etc.

Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN

#Software_Description:

A free-text description of this Software. This description should provide a detailed understanding of the algorithm and of the analysis parameters that were used, in order to guarantee interpretation and reproducibility.

ChrTracer3 software was developed for analysis of raw DNA labeled images. As an input, it takes an .xlsx table containing information and folder names of the DNA experiment. As an output, it returns tab delimited .txt files with drift-corrected x, y, z positions for all labeled barcodes. These can be used directly to calculate the nm scale distances between all pairs of labeled loci. The current version of the software as of this writing is ChrTracer3.

#Software_Repository:

The URL of any repository or archive where the Software executable release can be obtained.

https://github.com/BoettigerLab/ORCA-public

#Software_PreferredCitationID:

The Unique Identifier for the preferred/primary publication describing this Software. Examples include Digital Object Identifier (DOI), PubMed Central Identifier (PMCID), ArXiv.org ID, etc.

https://doi.org/10.1038/s41596-020-00478-x

#Additional_Tables:

List of the additional tables being submitted. Note: use a comma to separate each table name from the next.

4dn_FOF-CT_rna, 4dn_FOF-CT_quality, 4dn_FOF-CT_bio, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell

##Columns=

List of the data column headers used in the table. Note: enclose the column headers in parentheses and use a comma to separate each header name from the next.

(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID, Sub_Cell_ROI_ID, Extra_Cell_ROI_ID)
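The header fields listed above come in two shapes: `##Name=value` and `#Name: value`. The following minimal sketch parses both into a dictionary and splits the ##Columns value into individual column names. The function names are hypothetical, and the sketch keeps only the last value for a repeated field, so a file that repeats the Software fields for several tools would need a list-valued variant.

```python
def parse_header(lines):
    """Parse '##Name=value' and '#Name: value' header lines into a dict."""
    fields = {}
    for line in lines:
        line = line.strip()
        if line.startswith("##") and "=" in line:
            key, value = line[2:].split("=", 1)
        elif line.startswith("#") and ":" in line:
            # Split on the first colon only, so URLs in the value survive.
            key, value = line[1:].split(":", 1)
        else:
            continue
        fields[key.strip()] = value.strip()
    return fields


def parse_columns(value):
    """Split a '(A, B, C)' column list into ['A', 'B', 'C']."""
    return [name.strip() for name in value.strip().strip("()").split(",")]


fields = parse_header([
    "##FOF-CT_Version=v1.0",
    "##XYZ_Unit=micron",
    "#Lab_Name: Nobel",
    "#Software_Repository: https://github.com/BoettigerLab/ORCA-public",
    "##Columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End)",
])
columns = parse_columns(fields["Columns"])
print(fields["XYZ_Unit"], columns)
```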

Data Columns

As with all other Spot Data tables, each row corresponds to data associated with an individual Spot.

The first columns are always: Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End. Additionally, if sub-cellular structures, cells or extracellular structures are identified as part of this experiment, the subsequent columns must be Sub_Cell_ROI_ID, Cell_ID or Extra_Cell_ROI_ID, respectively.

The order of the rows is at the user’s discretion.

core_columns

Name

Description

Example

Conditional requirement conditions

Spot_ID

A unique identifier for this bright DNA Spot.

1

Trace_ID

In ball-and-stick Chromatin Tracing experiments, tracing algorithms are used to interlink individual DNA Spots in 3D polymeric traces delineating the conformation of chromatin fibers (such as in ORCA). As such, this field reports a unique identifier for the Trace each Spot belongs to. Note: The purpose of this field is to connect Spots that are part of the same polymeric Trace. It is also used to connect individual Traces recorded in this table with global Trace properties reported elsewhere in the dataset (i.e., Trace Data table and RNA Spot Data table).

1

X

The sub-pixel X coordinate of this bright Spot. NOTE: the reported X position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction, etc.).

14.43

Y

The sub-pixel Y coordinate of this bright Spot. NOTE: the reported Y position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction, etc.).

41.43

Z

The sub-pixel Z coordinate of this bright Spot. NOTE: the reported Z position is understood to be the one resulting from any post-processing correction procedures that were performed (e.g., drift correction, chromatic correction, etc.).

1.23

Chrom

Chromosome name. Because BED (Browser Extensible Data) is the de facto bioinformatics exchange format for genomic data, the BED terminology was used here.

chr3; chrY; chr2_random

Chrom_Start

Start coordinate on the Chromosome for the sequence associated with this bright Spot (the first base on the chromosome is numbered 0). Because BED (Browser Extensible Data) is the de facto bioinformatics exchange format for genomic data, the BED terminology was used here.

0

Chrom_End

Stop coordinate on the Chromosome for the sequence associated with this bright Spot. This position is non-inclusive, unlike Chrom_Start. Because BED (Browser Extensible Data) is the de facto bioinformatics exchange format for genomic data, the BED terminology was used here.

1000

Sub_Cell_ROI_ID

If known, this field reports the unique identifier for a Region of Interest (ROI) that represents the boundaries of a sub-cellular structure a given Spot/Trace is associated with. Note: this is used to connect individual Spot/Traces that are part of the same ROI. It is also used to connect data in this table with any ROI specific measurements such as boundaries, intensities or volume, recorded in the corresponding Sub-Cell ROI Data table.

1

Conditional requirement: this column is mandatory if Sub-cellular structures (e.g., Nucleus, Nucleolus, etc.) were identified as part of this experiment and were reported in a dedicated Sub-Cell ROI Data table, and if data in this table can be associated with individual Sub_Cell_ROIs.

Cell_ID

If known, this field reports the unique identifier for the Cell a given Spot/Trace is associated with. Note: this is used to connect individual Spot/Traces that are part of the same Cell. It is also used to connect data in this table with any Cell specific measurements such as boundaries, intensities and volume, recorded in the corresponding Cell Data table.

1

Conditional requirement: This column is mandatory if Cells were identified as part of this experiment and were reported in a dedicated Cell Data table, and if data in this table can be associated with individual Cells.

Extra_Cell_ROI_ID

If known, this field reports the unique identifier for a Region of Interest (ROI) that represents the boundaries of an extracellular structure (e.g., Tissue) a given Spot/Trace is associated with. Note: this is used to connect individual Spot/Traces that are part of the same ROI. It is also used to connect data in this table with any ROI specific measurements such as boundaries, intensities and volume, recorded in the corresponding Extra-Cell ROI Data table.

1

Conditional requirement: this column is mandatory if Extracellular structures (e.g., Tissue, etc.) were identified as part of this experiment and were reported in a dedicated Extra-Cell ROI Data table, and if data in this table can be associated with individual Extra_Cell_ROIs.
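The Chrom_Start and Chrom_End columns described above follow the BED convention: the start is 0-based and inclusive, the end is non-inclusive. Under that convention the length of a target sequence is simply the difference between the two. The sketch below illustrates this; the function name is hypothetical.

```python
def target_length(chrom_start, chrom_end):
    """Length in bases of a BED-style interval [chrom_start, chrom_end)."""
    if chrom_start < 0:
        raise ValueError("Chrom_Start must be 0 or greater")
    if chrom_end <= chrom_start:
        raise ValueError("Chrom_End must be greater than Chrom_Start")
    # The end is non-inclusive, so no '+ 1' is needed.
    return chrom_end - chrom_start


# The first 1000 bases of a chromosome are written as start 0, end 1000.
print(target_length(0, 1000))
```

One consequence of this convention is that two adjacent targets share a boundary value: a segment ending at 1000 is followed directly by a segment starting at 1000.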

Example without INSERTION/DELETION

##FOF-CT_Version=v1.0
##Table_Namespace=4dn_FOF-CT_core
##Genome_Assembly=GRCh38
##XYZ_Unit=micron
#Lab_Name: Nobel
#Experimenter_Name: John Doe
#Experimenter_Contact: john.doe@email.com
#Description: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sagittis est mollis, pulvinar tortor mattis, dignissim nisi. Nunc tincidunt volutpat lacus vitae bibendum.
#Software_Title: ChrTracer3
#Software_Type: SpotLoc+Tracing
#Software_Authors: Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN
#Software_Description: ChrTracer3 software was developed for analysis of raw DNA labeled images. As an input, it takes an .xlsx table containing information and folder names of the DNA experiment. As an output, it returns tab delimited .txt files with drift-corrected x, y, z positions for all labeled barcodes. These can be used directly to calculate the nm scale distances between all pairs of labeled loci. The current version of the software as of this writing is ChrTracer3.
#Software_Repository: https://github.com/BoettigerLab/ORCA-public
#Software_PreferredCitationID: https://doi.org/10.1038/s41596-020-00478-x
#Additional_Tables: 4dn_FOF-CT_quality, 4dn_FOF-CT_rna, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell
##Columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)
1, 1, 14.43, 41.43, 1.23, chr1, 0001, 1000, 1
2, 1, 14.83, 41.83, 1.83, chr1, 1001, 2000, 1
3, 1, 15.83, 42.83, 1.33, chr1, 2001, 3000, 1
4, 2, 20.43, 50.43, 1.23, chr1, 0002, 2000, 1
5, 2, 21.83, 60.83, 1.83, chr1, 1002, 3000, 1
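A file laid out like the example above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: it assumes that header lines start with `#`, that the ##Columns line precedes the data rows, and that data values are separated by commas as in the example. The function name and the shortened sample text are made up for the illustration.

```python
def read_core_table(text):
    """Split a core table into header lines, column names and data rows."""
    header_lines, columns, rows = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            header_lines.append(line)
            if line.lower().startswith("##columns="):
                inner = line.split("=", 1)[1].strip().strip("()")
                columns = [name.strip() for name in inner.split(",")]
            continue
        values = [value.strip() for value in line.split(",")]
        row = dict(zip(columns, values))
        # Coordinates are numeric; the other values are kept as text.
        for axis in ("X", "Y", "Z"):
            row[axis] = float(row[axis])
        rows.append(row)
    return header_lines, columns, rows


# Shortened version of the example above, for illustration only.
EXAMPLE = """\
##FOF-CT_Version=v1.0
##Table_Namespace=4dn_FOF-CT_core
##Genome_Assembly=GRCh38
##XYZ_Unit=micron
##Columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)
1, 1, 14.43, 41.43, 1.23, chr1, 0001, 1000, 1
2, 1, 14.83, 41.83, 1.83, chr1, 1001, 2000, 1
"""

header_lines, columns, rows = read_core_table(EXAMPLE)
print(columns)
print(rows[0])
```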

Example with INSERTION/DELETION

Warning

In case your reference genome has insertions or deletions, please remember to follow the Instructions for when the genome under study is modified.

##FOF-CT_Version=v1.0
##Table_Namespace=4dn_FOF-CT_core
##Genome_Assembly=custom-build:GRCm38+pJT039(insertion)
##Modification=pJT039:chr3(insertion 2001-3000)
##VCF_File_Name=pJT039:chr3.vcf
##VCF_Version=v4.2
##XYZ_Unit=micron
#Lab_Name: Nobel
#Experimenter_Name: John Doe
#Experimenter_Contact: john.doe@email.com
#Software_Title: ChrTracer3
#Software_Type: SpotLoc+Tracing
#Software_Authors: Mateo, LJ; Sinnott-Armstrong, N; Boettiger, AN
#Software_Description: ChrTracer3 software was developed for analysis of raw DNA labeled images. As an input, it takes an .xlsx table containing information and folder names of the DNA experiment. As an output, it returns tab delimited .txt files with drift-corrected x, y, z positions for all labeled barcodes. These can be used directly to calculate the nm scale distances between all pairs of labeled loci. The current version of the software as of this writing is ChrTracer3.
#Software_Repository: https://github.com/BoettigerLab/ORCA-public
#Software_PreferredCitationID: https://doi.org/10.1038/s41596-020-00478-x
#Additional_Tables: 4dn_FOF-CT_quality, 4dn_FOF-CT_rna, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell
##Columns=(Spot_ID, Trace_ID, X, Y, Z, Chrom, Chrom_Start, Chrom_End, Cell_ID)
1, 1, 14.43, 41.43, 1.23, chr3, 0001, 1000, 1
2, 1, 14.83, 41.83, 1.83, chr3, 1001, 2000, 1
3, 1, 15.83, 42.83, 1.33, pJT039, 2, 2500, 1
4, 2, 20.43, 50.43, 1.23, chr4, 0002, 2000, 1
5, 2, 21.83, 60.83, 1.83, chr4, 1002, 3000, 1
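For a header like the one in the example above, the conditional fields can be checked with a short routine. The sketch below assumes that a modified genome is recognizable by a Genome_Assembly value starting with `custom-build`, and it compares field names case-insensitively as a deliberate leniency of this example. The function name is hypothetical.

```python
# Fields that MUST be present when the genome under study is modified.
CONDITIONAL = ("modification", "vcf_file_name", "vcf_version")


def missing_modification_fields(header_lines):
    """Return the conditional fields missing from a modified-genome header."""
    fields = {}
    for line in header_lines:
        line = line.strip()
        if line.startswith("##") and "=" in line:
            key, value = line[2:].split("=", 1)
            fields[key.strip().lower()] = value.strip()
    assembly = fields.get("genome_assembly", "")
    if not assembly.startswith("custom-build"):
        return []  # unmodified genome: nothing extra is required
    return [name for name in CONDITIONAL if not fields.get(name)]


header = [
    "##Genome_Assembly=custom-build:GRCm38+pJT039(insertion)",
    "##Modification=pJT039:chr3(insertion 2001-3000)",
    "##VCF_File_Name=pJT039:chr3.vcf",
]
print(missing_modification_fields(header))
```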