RNA Spot Data table

Requirement level: optional

Recommended: Yes

Namespace: 4dn_FOF-CT_rna

Summary

This table is optionally used to store and share RNA Spot data that was collected as part of this experiment.

Each row represents a detected RNA bright Spot and corresponds to the location of a specific RNA transcript.

At a minimum, one needs to know the RNA_Spot_ID, the X, Y, Z coordinates of each spot, the Gene_ID, the RNA_name and an additional ID used to link this data with other tables in this format (i.e., Trace_ID, Sub_Cell_ROI_ID, Cell_ID and/or Extra_Cell_ROI_ID).

In addition, if multiple transcripts are associated with the same Gene_ID and the FISH probes can distinguish them, Transcript_ID MUST also be reported. Thus, at a minimum there need to be 7 (or 8, when Transcript_ID applies) data columns. These are required; all other data columns are optional.
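The minimum-column rule described above can be checked mechanically. The following is a minimal sketch and not part of the format specification; the function name `missing_required_columns` and the `transcripts_distinguishable` flag are assumptions introduced only for illustration.

```python
# Hypothetical helper: the names below are illustrative, not defined by FOF-CT.
ALWAYS_REQUIRED = ["RNA_Spot_ID", "X", "Y", "Z", "RNA_name", "Gene_ID"]
LINKING_IDS = ["Trace_ID", "Sub_Cell_ROI_ID", "Cell_ID", "Extra_Cell_ROI_ID"]


def missing_required_columns(columns, transcripts_distinguishable=False):
    """Return the required columns that are absent; an empty list means the minimum is met."""
    present = set(columns)
    problems = [name for name in ALWAYS_REQUIRED if name not in present]
    # Transcript_ID is only required when the probes can tell transcripts apart.
    if transcripts_distinguishable and "Transcript_ID" not in present:
        problems.append("Transcript_ID")
    # At least one linking ID must be present.
    if not present.intersection(LINKING_IDS):
        problems.append("one of " + "/".join(LINKING_IDS))
    return problems
```

Calling the function with the column list declared in a table returns an empty list when the minimum set is present, and otherwise names what is missing.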

In this table the reported X, Y and Z coordinates are assumed to result from post-processing and quality control procedures performed on primary localization events and therefore correspond to what is considered the best-bet location of the RNA molecule under study.

In the case of multiplexed FISH experiments (e.g., MERFISH) in which the final location of an RNA molecule results from combining multiple detection events (e.g., by combining individual Localization events detected in separate planes or images), the underlying raw data can be recorded in the corresponding Spot Demultiplexing table as described in the instructions of that table.

Tip

RNA_Spot_ID identifiers are unique across the entire dataset, which makes it possible to unambiguously identify a Spot in the Spot Quality table, Spot Biological Data table and Spot Demultiplexing table.
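Because other tables rely on RNA_Spot_ID being unique, it can be useful to verify this before submission. The sketch below is one possible check under the assumption that the IDs have already been read into plain Python lists; the function names are hypothetical.

```python
from collections import Counter


def duplicated_spot_ids(spot_ids):
    """Return the RNA_Spot_ID values that occur more than once, in first-seen order."""
    counts = Counter(spot_ids)
    return [spot_id for spot_id, n in counts.items() if n > 1]


def unknown_spot_ids(referencing_ids, rna_spot_ids):
    """Return IDs used by another table (e.g. Spot Quality) that this table does not define."""
    known = set(rna_spot_ids)
    return sorted(set(referencing_ids) - known)
```

Both functions return an empty list when the dataset is consistent.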

Warning

All MANDATORY header fields and column names are indicated in bold. All conditionally required header fields and column names are indicated in italics.

File Header

  • For full instructions see File Header

  • The first line in the header is always ##FOF-CT_Version=vX.X.

  • The second line in the header is always ##Table_Namespace=4dn_FOF-CT_rna.

Tip

The header MUST contain a mandatory set of fields that describe any Software tool that was used to produce/process data in this table. If more than one software tool was used, please repeat a set of Software-fields for describing each of them.

The header MUST include a detailed description of each optional column used.
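The header rules above (fixed first two lines, one set of Software fields per tool) lend themselves to a simple check. This is a minimal sketch that assumes the header has been read into a list of strings and uses the namespace given for this table (4dn_FOF-CT_rna); the function names are hypothetical, and treating each #Software_Title line as the start of a new tool is an assumption, not a rule stated by the format.

```python
EXPECTED_NAMESPACE = "4dn_FOF-CT_rna"


def check_header_start(lines):
    """Return a list of problems found in the first two header lines."""
    if len(lines) < 2:
        return ["header has fewer than two lines"]
    problems = []
    if not lines[0].startswith("##FOF-CT_Version=v"):
        problems.append("line 1 must be ##FOF-CT_Version=vX.X")
    if lines[1].strip() != "##Table_Namespace=" + EXPECTED_NAMESPACE:
        problems.append("line 2 must be ##Table_Namespace=" + EXPECTED_NAMESPACE)
    return problems


def software_blocks(lines):
    """Group '#Software_*' fields into one dict per tool.

    Assumption: a new tool starts at each '#Software_Title' line.
    """
    blocks = []
    for line in lines:
        if not line.startswith("#Software_"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line[1:].partition(":")
        if key == "Software_Title" or not blocks:
            blocks.append({})
        blocks[-1][key] = value.strip()
    return blocks
```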

Name

Description

Example

Conditional requirement conditions

##FOF-CT_Version=

Version of the FOF format used in this case.

v1.0

##Table_Namespace=

Identifier for this type of table. Value must be as in the example.

4dn_FOF-CT_rna

##Genome_Assembly=

Genome build. Notes: (1) the 4DN Data Portal only accepts GRCh38 for human and GRCm38 for mouse. For other species see https://data.4dnucleome.org/search/?type=Organism; (2) in case the genome under study contains an INSERTION or a DELETION, indicate this by adding the mandatory custom-build prefix to the build name and using a descriptive name indicating the nature of the genome modification (e.g., GRCm38+pJT039(insertion)).

GRCh38

Conditional requirement: if the genome under study contains an INSERTION or a DELETION, this field MUST use the custom-build prefix and contain a descriptive name indicating the nature of the genome modification (e.g., GRCm38+pJT039(insertion)).

##Gene_ID_Type=

The field used to report the type of unique ID used to identify the Gene encoding for the targeted RNA transcript.

Ensemble_V38

##Transcript_ID_Type=

The field used to report the type of unique ID used to identify the targeted RNA transcript.

Ensemble_V38

Conditional requirement: this MUST be reported if multiple transcripts are associated with the same Gene_ID and the FISH probes are capable of distinguishing them.

##XYZ_Unit=

If relevant, the unit used to represent XYZ locations or distances in this table. Note: use micron to avoid problems with special Greek symbols. Other allowed values should be drawn from SI units of length. Examples: ‘nm’, ‘micron’, ‘mm’, etc.

micron

#Lab_Name:

Name of the lab where the experiment was performed.

Nobel

#Experimenter_Name:

Name of the person performing the experiment.

John Doe

#Experimenter_Contact:

Email address of the person performing the experiment.

john.doe@email.com

#Description:

A free-text description of the experiment and of the data recorded in this table. This description should provide a clear understanding of the process used to produce the data and contain sufficient detail to ensure interpretation and reproducibility.

#Software_Title:

The name of the Software tool that was used to produce the results reported in this table. If more than one software tool was used, please repeat a set of Software-fields for describing each of them.

AlgorithmXYZ

#Software_Type:

The type of this Software used to produce results recorded in this table. Allowed values: SpotLoc, Tracing, SpotLoc+Tracing, Segmentation, QC, Other

Segmentation

#Software_Authors:

The name(s) of the individual author(s) of this Software. If there is more than one author, individual names should be listed as follows: Doe, John; Smith, Jane; etc.

John Doe

#Software_Description:

A free-text description of this Software. This description should provide a detailed understanding of the algorithm and of the analysis parameters that were used, in order to guarantee interpretation and reproducibility.

A pretty clear description

#Software_Repository:

The URL of any repository or archive where the Software executable release can be obtained.

https://github.com/repo_name_goes_here

#Software_PreferredCitationID:

The unique identifier for the preferred/primary publication describing this Software. Examples include Digital Object Identifier (DOI), PubMed Central Identifier (PMCID), ArXiv.org ID, etc.

https://doi.org/doi_goes_here

#Additional_Tables:

List of the additional tables being submitted. Note: use a comma to separate each table name from the next.

4dn_FOF-CT_core, 4dn_FOF-CT_quality, 4dn_FOF-CT_bio, 4dn_FOF-CT_trace, 4dn_FOF-CT_cell

##Columns=

List of the data column headers used in the table. Note: enclose the list of column headers in parentheses and use a comma to separate each header name from the next.

(RNA_Spot_ID, X, Y, Z, RNA_name, Gene_ID, Transcript_ID, Trace_ID, Sub_Cell_ROI_ID, Cell_ID, Extra_Cell_ROI_ID)
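When a table is written programmatically, the ##Columns= and #Additional_Tables: lines can be generated from lists so that the comma-separated layout shown in the examples is always respected. The helpers below are a hedged sketch; their names are hypothetical, and rejecting commas and parentheses inside names is an assumption made to keep the line unambiguous, not a rule stated by the format.

```python
def format_columns_line(columns):
    """Build the '##Columns=(...)' header line from a list of column names."""
    for name in columns:
        # Assumption: names containing separators would make the line ambiguous.
        if any(ch in name for ch in ",()"):
            raise ValueError("column name contains a reserved character: " + name)
    return "##Columns=(" + ", ".join(columns) + ")"


def format_additional_tables_line(tables):
    """Build the '#Additional_Tables:' header line from a list of table namespaces."""
    return "#Additional_Tables: " + ", ".join(tables)
```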

Data Columns

As with all other RNA Spot Data tables in this format, each row corresponds to data associated with an individual RNA_Spot.

The first columns are always: RNA_Spot_ID, X, Y, Z, RNA_name, Gene_ID, Trace_ID, followed by Transcript_ID if applicable, and by one or more of the following: Sub_Cell_ROI_ID, Cell_ID and/or Extra_Cell_ROI_ID. The order of the other columns is at the user’s discretion, as is the order of the rows.

Name

Description

Example

Conditional requirement conditions

RNA_Spot_ID

A unique identifier for this bright Spot.

1

X

The sub-pixel X coordinate of this bright Spot. NOTE: the reported X position is understood to be the one resulting from any performed post-processing procedures (e.g., drift correction, chromatic correction, etc.).

14.43

Y

The sub-pixel Y coordinate of this bright Spot. NOTE: the reported Y position is understood to be the one resulting from any performed post-processing procedures (e.g., drift correction, chromatic correction, etc.).

14.43

Z

The sub-pixel Z coordinate of this bright Spot. NOTE: the reported Z position is understood to be the one resulting from any performed post-processing procedures (e.g., drift correction, chromatic correction, etc.).

1.23

RNA_name

This is the official name of the Gene the targeted RNA is transcribed from.

ACTB

Gene_ID

This is the official ID for the Gene encoding for the targeted RNA transcript.

ENSG00000075624

Transcript_ID

This is the official ID for the targeted RNA transcript. This field is required in case the same Gene has multiple different Transcripts and the FISH probe used in this case is capable of distinguishing between them.

ENST00000646664.1

Conditional requirement: this MUST be reported if multiple transcripts are associated with the same Gene_ID and the FISH probes are capable of distinguishing them.

Trace_ID

This field reports the unique identifier for a DNA Trace identified as part of this experiment. Note: the purpose of this field is to associate the location of individual RNA Spots (i.e., nascent RNA transcripts) recorded in this table with the corresponding Trace recorded in the DNA-Spot/Trace Data core table, and with global Trace properties recorded in the Trace Data table.

1

Sub_Cell_ROI_ID

1

Conditional requirement: this column is mandatory if Sub-cellular structures (e.g., Nucleus, Nucleolus etc.) were identified as part of this experiment and were reported in a dedicated Sub-Cell ROI Data table, and if data in this table can be associated with individual Sub_Cell_ROIs.

Cell_ID

If known, this field reports the unique identifier for the Cell a given Spot is associated with. Note: this is used to connect individual Spots that are part of the same Cell. It is also used to connect data in this table with any Cell-specific measurements, such as boundaries, intensities and volume, recorded in the corresponding Cell Data table.

1

Conditional requirement: This column is mandatory if Cells were identified as part of this experiment and were reported in a dedicated Cell Data table, and if data in this table can be associated with individual Cells.

Extra_Cell_ROI_ID

If known, this field reports the unique identifier for a Region of Interest (ROI) that represents the boundaries of an extracellular structure (e.g., Tissue) a given Spot is associated with. Note: this is used to connect individual Spots that are part of the same ROI. It is also used to connect data in this table with any ROI-specific measurements, such as boundaries, intensities and volume, recorded in the corresponding Extra-Cell ROI Data table.

1

Conditional requirement: this column is mandatory if Extracellular structures (e.g., Tissue, etc.) were identified as part of this experiment and were reported in a dedicated Extra-Cell ROI Data table, and if data in this table can be associated with individual Extra_Cell_ROIs.
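The conditional requirements for the three ROI and Cell linking columns share the same shape: a column becomes mandatory when the structures were reported in a dedicated table and the rows of this table can be associated with them. The sketch below expresses that rule; the function names and the two caller-supplied sets are assumptions made for illustration, since only the submitter knows whether each condition holds.

```python
# Illustrative only: the two condition sets are supplied by the caller.
CONDITIONAL_LINK_COLUMNS = ("Sub_Cell_ROI_ID", "Cell_ID", "Extra_Cell_ROI_ID")


def mandatory_link_columns(reported, associable):
    """Return the linking columns that become mandatory.

    reported:   columns whose structures were identified and reported in a
                dedicated table (e.g. "Cell_ID" when a Cell Data table exists).
    associable: columns for which rows in this table can be associated with
                individual structures.
    """
    return [c for c in CONDITIONAL_LINK_COLUMNS if c in reported and c in associable]


def missing_link_columns(columns, reported, associable):
    """Return mandatory linking columns that are absent from the declared columns."""
    declared = set(columns)
    return [c for c in mandatory_link_columns(reported, associable) if c not in declared]
```

For instance, a dataset that includes a Cell Data table and can assign Spots to Cells would pass `{"Cell_ID"}` for both sets, and any table lacking a Cell_ID column would then be flagged.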

Example

##FOF-CT_Version=v1.0
##Table_Namespace=4dn_FOF-CT_rna
##Genome_Assembly=GRCh38
##XYZ_Unit=micron
##Gene_ID_Type=Ensemble_V38
##Transcript_ID_Type=Ensemble_V38
#Lab_Name: Nobel
#Experimenter_Name: John Doe
#Experimenter_Contact: john.doe@email.com
#Description: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sagittis est mollis, pulvinar tortor mattis, dignissim nisi. Nunc tincidunt volutpat lacus vitae bibendum.
#Software_Title: Xyz
#Software_Type: SpotLoc
#Software_Authors: Janet Doette
#Software_Description: A pretty clear description.
#Software_Repository: https://github.com/repo_name_goes_here
#Software_PreferredCitationID: https://doi.org/doi_goes_here
#Additional_Tables: 4dn_FOF-CT_core, 4dn_FOF-CT_quality, 4dn_FOF-CT_cell
##Columns=(RNA_Spot_ID, X, Y, Z, RNA_name, Gene_ID, Transcript_ID, Trace_ID)
001, 14.43, 41.43, 1.23, ACTB, ENSG00000075624, ENST00000646664.1, 001
002, 14.83, 41.83, 1.83, GAPDH, ENSG00000111640, ENST00000229239.10, 001
003, 15.83, 42.83, 1.33, MB, ENSG00000198125, ENST00000397326.7, 001
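A file laid out like the example above can be read with a few lines of code. The following is a minimal sketch rather than a reference parser: the function name `read_rna_spot_table` is hypothetical, it assumes comma-separated data rows as in the example, and it keeps only the last value of any repeated header field (so repeated Software fields would need separate handling).

```python
import csv


def read_rna_spot_table(text):
    """Split FOF-CT text into (header dict, list of row dicts)."""
    header, columns, data_lines = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # '##Key=Value' fields, including the column declaration.
            key, _, value = line[2:].partition("=")
            header[key] = value
            if key.lower() == "columns":
                columns = [c.strip() for c in value.strip("()").split(",")]
        elif line.startswith("#"):
            # '#Key: Value' fields; split on the first colon only.
            key, _, value = line[1:].partition(":")
            header[key] = value.strip()
        else:
            data_lines.append(line)
    rows = []
    for values in csv.reader(data_lines, skipinitialspace=True):
        row = dict(zip(columns, values))
        # Coordinates become floats; IDs stay strings to keep leading zeros.
        for axis in ("X", "Y", "Z"):
            row[axis] = float(row[axis])
        rows.append(row)
    return header, rows
```

The keys of each row are whatever names the ##Columns= line declares, so a downstream check can compare them against the required column names.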