Cosmograph Data Kit
Transform your data into Cosmograph-ready formats with our Data Kit utility functions. They handle data preparation, generate configurations, and provide statistical insights - everything you need to visualize your data easily and effectively.
Data Kit functions
Cosmograph Data Kit offers three specialized async functions that share a common configuration pattern but serve different use cases:
prepareCosmographData(config, pointsData, linksData)
This is the recommended way to prepare data for Cosmograph. It converts your input data directly into CosmographData, Cosmograph's internal format, which is ready for immediate use without additional processing.
Output: Returns a Promise that resolves to an object containing:
- Points data as CosmographData
- Links data as CosmographData (if provided)
- Generated Cosmograph configuration
- Summary statistics for points and links data
prepareCosmographDataFiles(config, pointsData, linksData)
When you need to handle the prepared data as files (for storage or network transfer), this function converts your data into binary Blob objects. Like prepareCosmographData(), it returns a Promise with the same structure, but the points and links data come as Blob objects.
Output: Returns a Promise that resolves to an object containing:
- Points data as Blob
- Links data as Blob (if provided)
- Generated Cosmograph configuration
- Summary statistics for points and links data
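Because the prepared data arrives as standard Blob objects, it can be handed to any Web API that accepts a Blob. As a minimal sketch, assuming a hypothetical upload endpoint that reads multipart fields named points and links (the helper, field names, and file names below are illustrative and not part of Data Kit), the blobs could be packed into a FormData body:

```typescript
// Hypothetical helper: packs prepared Blob outputs into a multipart form.
// Field names and file names are assumptions for illustration only.
function buildUploadForm(points: Blob, links: Blob | undefined, extension: string): FormData {
  const form = new FormData()
  form.append('points', points, `points.${extension}`)
  // Links are optional, mirroring the "(if provided)" note above.
  if (links) {
    form.append('links', links, `links.${extension}`)
  }
  return form
}

// Stand-in blob; in practice it would come from prepareCosmographDataFiles().
const form = buildUploadForm(new Blob(['points-bytes']), undefined, 'parquet')
console.log(form.has('points'), form.has('links')) // prints: true false
```

The resulting form could then be sent with the standard fetch API, for example as the body of a POST request.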
downloadCosmographData(config, pointsData, linksData)
Prepares your data and automatically downloads the resulting files:
- Points data file
- Links data file (if provided)
- Configuration JSON file
You can customize output filenames using the outputFilename property in both the links and points configuration objects.
Output: Initiates file downloads and returns a Promise that resolves to an object containing:
- Configuration object for Cosmograph
- Summary statistics for points and links data
Both prepareCosmographDataFiles and downloadCosmographData support specifying the output format (.csv, .arrow, or .parquet) via outputFormat in the config. If not specified, the format defaults to .parquet.
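As a sketch, a configuration that requests CSV output might look like the following. The column names and file names are hypothetical, and the format value is written without the leading dot, following the outputFormat values listed in the configuration table in this document (csv, arrow, parquet).

```typescript
// Allowed values are taken from the outputFormat description in this document.
type OutputFormat = 'csv' | 'arrow' | 'parquet'

const outputFormat: OutputFormat = 'csv'

// Hypothetical column and file names, for illustration only.
const csvExportConfig = {
  points: { pointIdBy: 'id', outputFilename: 'my-points' },
  links: { linkSourceBy: 'source', linkTargetsBy: ['target'], outputFilename: 'my-links' },
  outputFormat,
}
```

Such an object would be passed as the first argument to prepareCosmographDataFiles or downloadCosmographData.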
Function arguments
All three functions accept the following arguments:
Parameter | Type | Description |
---|---|---|
config | CosmographDataPrepConfig | Configuration object defining how to prepare the data |
pointsData | CosmographInputData | Points data in any supported format (Arrow Table, CSV, JSON, Parquet, or URL) |
linksData | CosmographInputData | Optional links data in any supported format |
Outputs
All functions return a Promise that resolves to an object with the following properties:
Property | Description |
---|---|
points * | Prepared points data in the specified format |
links * | Prepared links data (when provided) |
pointsSummary | Statistical information about points, including column names and types, aggregates for each column (count, min, max, approx_unique, avg, std, q25, q50, q75), and the percentage of NULL values |
linksSummary | Statistical information about links (when provided) |
cosmographConfig | Ready-to-use Cosmograph configuration for prepared data generated from your settings |
* Available in prepareCosmographData and prepareCosmographDataFiles only. downloadCosmographData returns only the configuration and statistics while initiating data file downloads.
Data Kit configuration structure
Configure data preparation using the CosmographDataPrepConfig interface, which includes the following properties:
Property | Type | Description |
---|---|---|
points | CosmographDataPrepPointsConfig | Configuration for the points table |
links | CosmographDataPrepLinksConfig | (Optional) Configuration for the links table |
outputFormat * | string | (Optional) Output format for prepared data: csv, arrow, or parquet. Defaults to parquet |
* outputFormat has no effect when using prepareCosmographData because it prepares data into the CosmographData format.
Points configuration
To prepare your points data for Cosmograph, you need to specify the required and optional properties in the points configuration object.
Required properties
You must provide either:
pointIdBy: The column name that uniquely identifies each point in your dataset.
If your dataset doesn't have a candidate for the pointIdBy column and you're not using links, you should provide pointIdBy: undefined. This will automatically generate columns with enumerated point ids and indexes for your data based on the item count. See this example.
OR
linkSourceBy and linkTargetsBy: If you want to generate points from your links data, specify the column names containing the source and target identifiers of each link. This option only works if you also provide links data.
Optional properties
If you use a separate data source for points generation (not link-based), you can also include the following optional properties to enhance your graph:
pointColorBy: The column containing the color for each point.
The pointColorBy property itself accepts only color values, either as a string or in RGBA [r, g, b, a] format. To create custom color mappings, you can pair it with pointColorByFn (which must be provided in the Cosmograph config). It lets you generate colors dynamically from your data, regardless of the data type in the pointColorBy column. This function takes the pointColorBy value (of any type) and the point index, and should return a color as a string or an [r, g, b, a] array.
pointSizeBy: The column containing the values that determine point sizes.
pointSizeBy works exactly like pointColorBy, but accepts numeric values. If you need custom size mappings regardless of the data type, you can provide pointSizeByFn in the Cosmograph config to transform the pointSizeBy values.
pointLabelBy: The column containing the label for each point. Labels will be automatically displayed on the graph to identify points using values from this column.
Can be paired with pointLabelFn for custom label generation.
pointLabelWeightBy: The column containing the weight for each point label. Higher weights make labels more likely to be shown.
pointLabelWeightBy accepts float values from 0 to 1. Can be paired with pointLabelWeightFn.
- pointXBy: The column containing the x-coordinate for each point. If provided along with pointYBy, Cosmograph will position points based on these coordinates.
- pointYBy: The column containing the y-coordinate for each point. If provided along with pointXBy, Cosmograph will position points based on these coordinates.
- pointIncludeColumns: Array of additional column names to include in the points data. This is useful if you want to include extra data attributes for each point that you can use later in custom behaviors, components like CosmographTimeline, or styles.
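The mapping functions described above (pointColorByFn and pointSizeByFn) can be sketched as follows. This is only an illustration under the signature stated in the text (a value of any type plus the point index); the palette, the size bounds, the square-root scaling, and the channel ranges (0-255 for RGB, 0-1 for alpha) are assumptions, not Cosmograph defaults.

```typescript
type RGBA = [number, number, number, number]

// Hypothetical category palette; channel ranges (0-255 RGB, 0-1 alpha) are an assumption.
const palette = new Map<string, RGBA>([
  ['news', [66, 135, 245, 1]],
  ['sports', [245, 158, 66, 1]],
])
const fallbackColor: RGBA = [160, 160, 160, 1]

// Maps a raw pointColorBy value of any type to a color.
// Unknown categories and non-string values fall back to a neutral gray.
const pointColorByFn = (value: unknown, _index: number): string | RGBA => {
  if (typeof value === 'string') {
    return palette.get(value) ?? fallbackColor
  }
  return fallbackColor
}

const MIN_SIZE = 2
const MAX_SIZE = 20

// Maps a raw pointSizeBy value to a size clamped to [MIN_SIZE, MAX_SIZE].
// Non-numeric, non-finite, or non-positive inputs get the minimum size.
const pointSizeByFn = (value: unknown, _index: number): number => {
  const n = typeof value === 'number' ? value : Number(value)
  if (!Number.isFinite(n) || n <= 0) return MIN_SIZE
  return Math.min(MAX_SIZE, MIN_SIZE + Math.sqrt(n))
}

console.log(pointSizeByFn(16, 1)) // prints: 6
```

As the text notes, such functions belong in the Cosmograph config rather than in the Data Kit points configuration.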
Links configuration
Required properties:
linkSourceBy: The column name that contains the source of the link.
linkTargetsBy: An array of column names that contain the targets of the link (will be merged into one target column).
Additional properties:
linkColorBy: The column name containing the color for each link.
Can be paired with linkColorByFn (which must be provided in the Cosmograph config). It lets you generate colors dynamically from your data, regardless of the data type in the linkColorBy column. This function takes the linkColorBy value (of any type) and the link index, and should return a color as a string or an [r, g, b, a] array.
linkWidthBy: The column name containing the width for each link.
Accepts numeric values; can be paired with linkWidthByFn.
linkArrowBy: The column name containing the booleans indicating whether each link should have an arrow.
Accepts boolean values; can be paired with linkArrowByFn.
linkStrengthBy: The column name containing the strength for each link.
Accepts numeric values; can be paired with linkStrengthByFn.
linkIncludeColumns: An array of additional column names to include in the links data.
CSV-specific properties
For CSV inputs, the additional properties csvParseTimeFormat and csvColumnTypesMap help handle special parsing cases.
These properties only take effect when the source data is in CSV format.
- csvParseTimeFormat: The time format to use when parsing CSV data if automatic time parsing fails.
- csvColumnTypesMap: A mapping of column names to data types for CSV parsing when automatic parsing fails.
Usage example:
const dataConfig = {
  points: {
    pointIdBy: 'id',
    pointLabelBy: 'id',
    pointSizeBy: 'comments',
    pointIncludeColumns: ['date'],
    outputFilename: 'custom-points-filename',
    csvParseTimeFormat: 'YYYY-MM-DD',
    csvColumnTypesMap: {
      id: 'VARCHAR',
      comments: 'FLOAT',
      topic: 'VARCHAR',
      date: 'DATE',
    },
  },
}
Cosmograph Data Kit provides a log for the preparation process. If something goes wrong, you can find the error message in the browser console. It will also warn about columns that are missing from the data source or required columns that are not provided in the configuration.
Configuration examples
Common configuration
const config = {
  points: {
    pointIdBy: 'id', // Required: Unique identifier for each point
    pointColorBy: 'color', // Optional: Color of the points
    pointSizeBy: 'value', // Optional: Size of the points
  },
  links: {
    linkSourceBy: 'source', // Required: Source of the link
    linkTargetsBy: ['target'], // Required: Targets of the link
    linkColorBy: 'color', // Optional: Color of the links
    linkWidthBy: 'value', // Optional: Width of the links
  },
}
Generating points and links from a links-only dataset
You can create a points dataset for Cosmograph even if you only have a single file with transaction data:
const config = {
  points: {
    linkSourceBy: 'source_column', // Column containing the link source
    linkTargetsBy: ['target_column', 'target_column2'], // Columns containing the link targets
  },
  links: {
    linkSourceBy: 'source_column',
    linkTargetsBy: ['target_column', 'target_column2'],
    // ... other link options
  },
};
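Conceptually, generating points from links means collecting every distinct identifier that appears in the source and target columns. The sketch below illustrates that idea on plain objects. It is not Data Kit's actual implementation, and derivePointIds is a hypothetical helper introduced only for this illustration.

```typescript
type Row = Record<string, unknown>

// Hypothetical helper: collects the distinct ids found in the source column
// and in every target column, in order of first appearance.
function derivePointIds(rows: Row[], sourceBy: string, targetsBy: string[]): string[] {
  const ids = new Set<string>()
  for (const row of rows) {
    for (const column of [sourceBy, ...targetsBy]) {
      const value = row[column]
      // Skip missing cells so they do not become points.
      if (value !== null && value !== undefined) {
        ids.add(String(value))
      }
    }
  }
  return Array.from(ids)
}

const transactions: Row[] = [
  { source_column: 'a', target_column: 'b', target_column2: 'c' },
  { source_column: 'b', target_column: 'a', target_column2: null },
]
console.log(derivePointIds(transactions, 'source_column', ['target_column', 'target_column2']))
// resulting ids: a, b, c
```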
Automatically generate point identifiers and indexes
Set the pointIdBy property to undefined to automatically generate columns with enumerated point ids and indexes for your data, based on the item count.
const config = {
  points: {
    pointIdBy: undefined,
  },
};
Functions usage examples
Prepare data with Data Kit functions
This example covers data preparation only. See the next example for preparing data and loading it into Cosmograph.
import { downloadCosmographData, prepareCosmographData, prepareCosmographDataFiles } from '@cosmograph/cosmograph'

// Example data
const pointsData = [
  { id: '1', color: 'red', value: 10 },
  { id: '2', color: 'blue', value: 20 },
]
const linksData = [
  { source: '1', target: '2', color: 'green', value: 5 },
]

// Example configuration
const config = {
  points: {
    pointIdBy: 'id',
    pointColorBy: 'color',
    pointSizeBy: 'value',
    outputFilename: 'custom-points-filename',
  },
  links: {
    linkSourceBy: 'source',
    linkTargetsBy: ['target'],
    linkColorBy: 'color',
    linkWidthBy: 'value',
    outputFilename: 'custom-links-filename',
  },
}

// downloadCosmographData: Prepares data, downloads the files, and names them according to `outputFilename` in the configuration
downloadCosmographData(config, pointsData, linksData)
  .then(({ cosmographConfig, pointsSummary, linksSummary }) => {
    console.log('Cosmograph config:', cosmographConfig)
    console.log('Points data summary:', pointsSummary)
    console.log('Links data summary:', linksSummary)
  })
  .catch((error) => {
    console.error('Error:', error)
  })

// prepareCosmographData: Prepares data to an Arrow table
prepareCosmographData(config, pointsData, linksData)
  .then((result) => {
    if (result) {
      const { points, links, cosmographConfig, pointsSummary, linksSummary } = result
      console.log('Arrow points:', points)
      console.log('Arrow links:', links)
      console.log('Cosmograph config:', cosmographConfig)
      console.log('Points data summary:', pointsSummary)
      console.log('Links data summary:', linksSummary)
    }
  })
  .catch((error) => {
    console.error('Error:', error)
  })

// prepareCosmographDataFiles: Prepares data as blobs
prepareCosmographDataFiles(config, pointsData, linksData)
  .then((result) => {
    if (result) {
      const { points, links, cosmographConfig, pointsSummary, linksSummary } = result
      console.log('Blob points:', points)
      console.log('Blob links:', links)
      console.log('Cosmograph config:', cosmographConfig)
      console.log('Points data summary:', pointsSummary)
      console.log('Links data summary:', linksSummary)
    }
  })
  .catch((error) => {
    console.error('Error:', error)
  })
Prepare data and upload it into Cosmograph
Prepare data with a configuration and upload it into Cosmograph using prepareCosmographData.
import React, { useState } from 'react'
import { CosmographProvider, Cosmograph } from '@cosmograph/react'
import { prepareCosmographData } from '@cosmograph/cosmograph'

const ReactExample = (): JSX.Element => {
  const [config, setConfig] = useState({
    // you can add some initial Cosmograph configuration here like simulation settings
  })
  const [files, setFiles] = useState<{ pointsFile: File | null, linksFile: File | null }>({ pointsFile: null, linksFile: null })

  // Prepares the selected files and merges the result into the Cosmograph config
  const prepareAndSetConfig = async (pointsFile: File | null, linksFile: File | null): Promise<void> => {
    if (pointsFile) {
      const dataPrepConfig = {
        points: {
          pointIdBy: 'id',
          pointColorBy: 'color',
          pointSizeBy: 'value',
        },
        links: {
          linkSourceBy: 'source',
          linkTargetsBy: ['target'],
          linkColorBy: 'color',
          linkWidthBy: 'value',
        },
      }
      const result = await prepareCosmographData(dataPrepConfig, pointsFile, linksFile)
      if (result) {
        const { points, links, cosmographConfig } = result
        setConfig({ points, links, ...cosmographConfig })
      }
    }
  }

  const handleFileChange = (type: 'pointsFile' | 'linksFile') => (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0]
    if (file) {
      // Compute the next state outside the state updater so that the
      // preparation side effect runs once per file selection
      const updatedFiles = { ...files, [type]: file }
      setFiles(updatedFiles)
      prepareAndSetConfig(updatedFiles.pointsFile, updatedFiles.linksFile)
        .catch((error) => {
          console.error('Error:', error)
        })
    }
  }

  return (
    <CosmographProvider>
      <Cosmograph {...config} />
      <input type="file" accept=".csv,.arrow,.parquet,.json" onChange={handleFileChange('pointsFile')} />
      <input type="file" accept=".csv,.arrow,.parquet,.json" onChange={handleFileChange('linksFile')} />
    </CosmographProvider>
  )
}

export default ReactExample