Cosmograph v1 to v2 Migration Guide
Cosmograph now uses the Apache Arrow format for efficient data storage and DuckDB-Wasm for fast data operations. These changes significantly boost performance and reduce the memory footprint, allowing Cosmograph to render tens of millions of points.
However, this update requires a specific configuration for points and links data. Below you will find a description of the basic configuration options that Cosmograph now requires, along with examples.
Data & configuration
As in v1, Cosmograph needs at least points data to render. If links data is provided and valid, the links are rendered as well.
In this section, we'll focus on the minimal required configuration options for both points and links. For a comprehensive list of all available properties, refer to the CosmographConfig documentation.
Input data formats
Cosmograph now accepts the following formats for the points and links input data:
File: A file object representing a CSV (.csv, .tsv), JSON (.json), Apache Parquet (.parquet, .pq), or Apache Arrow (.arrow) file containing the data.
string: A URL pointing to the data, or a string naming a table in the external DuckDB-Wasm instance, if one is provided.
Table: An instance of the Apache Arrow Table class containing the data.
Uint8Array or ArrayBuffer: A typed array or array buffer containing the binary representation of the data in Apache Arrow format.
Record<string, unknown>[]: An array of objects, where each object represents a point (or link) and its properties.
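As a minimal sketch of the last format, a tiny graph expressed as arrays of objects could look like this. The column names (id, idx, source, sourceidx, target, targetidx) are illustrative assumptions chosen to match the React example later in this guide. Cosmograph reads whichever columns the configuration names.

```typescript
// Hypothetical sample data in the array-of-objects input format.
// Column names are illustrative; the config must reference whatever names you use.
type Row = Record<string, unknown>

const points: Row[] = [
  { id: 'a', idx: 0 },
  { id: 'b', idx: 1 },
  { id: 'c', idx: 2 },
]

const links: Row[] = [
  // Each link references its endpoints both by id and by ordinal index.
  { source: 'a', sourceidx: 0, target: 'b', targetidx: 1 },
  { source: 'b', sourceidx: 1, target: 'c', targetidx: 2 },
]
```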
Points configuration
The minimal configuration required to render points data is:
points: The points data.
pointIdBy: Unique identifier column for each point.
pointIndexBy: Ordinal index column of each point, from 0 to n - 1, where n is the number of unique points. This index is used for efficient lookup and referencing.
You can find the full list of points properties here.
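If your points data has no index column yet, you can derive one from the row order. Below is a minimal sketch of a hypothetical helper, addPointIndex, which is not a Cosmograph or Data Kit function. It assigns indices 0 to n - 1 in input order and rejects duplicate ids. The default column names id and idx are assumptions.

```typescript
type Row = Record<string, unknown>

// Hypothetical helper, not part of Cosmograph: returns copies of the points
// with an ordinal index 0..n-1 (in input order) stored in `indexColumn`.
// Throws if an id appears twice, since each index must map to exactly one point.
function addPointIndex(points: Row[], idColumn = 'id', indexColumn = 'idx'): Row[] {
  const seen = new Set<unknown>()
  return points.map((point, i) => {
    const id = point[idColumn]
    if (seen.has(id)) throw new Error(`Duplicate point id: ${String(id)}`)
    seen.add(id)
    return { ...point, [indexColumn]: i }
  })
}

const indexed = addPointIndex([{ id: 'a' }, { id: 'b' }, { id: 'c' }])
// indexed → [{ id: 'a', idx: 0 }, { id: 'b', idx: 1 }, { id: 'c', idx: 2 }]
```

Whatever order you use, the link index columns must then refer to these same values.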
Links configuration
The minimal configuration required for links data is:
links: The links data.
linkSourceBy: Column containing the pointIdBy value (the unique identifier) of the link's source point.
linkSourceIndexBy: Column containing the index of the link's source point. This corresponds to the pointIndexBy value of the point identified by linkSourceBy.
linkTargetBy: Column containing the pointIdBy value (the unique identifier) of the link's target point.
linkTargetIndexBy: Column containing the index of the link's target point. This corresponds to the pointIndexBy value of the point identified by linkTargetBy.
You can find the full list of link properties here.
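The link index columns can be filled in by looking up each endpoint's id among points that are already indexed. The helper below, addLinkIndices, is a hypothetical sketch rather than a Cosmograph API. It assumes the column names id, idx, source, target, sourceidx, and targetidx.

```typescript
type Row = Record<string, unknown>

// Hypothetical helper, not part of Cosmograph: adds sourceidx/targetidx to each
// link by resolving its source/target ids against already indexed points.
function addLinkIndices(points: Row[], links: Row[]): Row[] {
  // Build an id → index map once so each lookup is a single Map access.
  const indexById = new Map<unknown, number>()
  for (const p of points) {
    const idx = p.idx
    if (typeof idx !== 'number') throw new Error(`Point ${String(p.id)} has no numeric idx`)
    indexById.set(p.id, idx)
  }

  return links.map((link) => {
    const sourceidx = indexById.get(link.source)
    const targetidx = indexById.get(link.target)
    if (sourceidx === undefined || targetidx === undefined) {
      throw new Error(`Link references unknown point: ${String(link.source)} -> ${String(link.target)}`)
    }
    return { ...link, sourceidx, targetidx }
  })
}

const points = [{ id: 'a', idx: 0 }, { id: 'b', idx: 1 }]
const links = addLinkIndices(points, [{ source: 'b', target: 'a' }])
// links → [{ source: 'b', target: 'a', sourceidx: 1, targetidx: 0 }]
```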
Limitations
Note that if the required indices (pointIndexBy, linkSourceIndexBy, and linkTargetIndexBy) are not provided, Cosmograph won't be able to render your data. Additionally, if your links have multiple targets, you'll need to adjust your data to fit the new format.
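Because missing or inconsistent indices prevent rendering, it can help to check your data before passing it to Cosmograph. The sketch below is a hypothetical pre-flight check, not part of Cosmograph or the Data Kit, and Cosmograph's own validation may differ. It assumes in-memory rows with the column names id, idx, source, sourceidx, target, and targetidx.

```typescript
type Row = Record<string, unknown>

// Hypothetical pre-flight check, not part of Cosmograph. Column names are assumptions.
// Returns a list of problems; an empty list means the indices look consistent.
function checkIndices(points: Row[], links: Row[]): string[] {
  const problems: string[] = []
  const indexById = new Map<unknown, unknown>()
  const used = new Set<unknown>()

  points.forEach((p, row) => {
    const idx = p.idx
    // Each point index must be an integer in 0..n-1 and appear only once.
    if (typeof idx !== 'number' || !Number.isInteger(idx) || idx < 0 || idx >= points.length) {
      problems.push(`point row ${row}: idx ${String(idx)} is not in 0..${points.length - 1}`)
    } else if (used.has(idx)) {
      problems.push(`point row ${row}: idx ${idx} is duplicated`)
    }
    used.add(idx)
    indexById.set(p.id, idx)
  })

  links.forEach((link, row) => {
    // Each endpoint id must exist and its index column must match the point's idx.
    for (const end of ['source', 'target']) {
      const id = link[end]
      if (!indexById.has(id)) {
        problems.push(`link row ${row}: unknown ${end} ${String(id)}`)
      } else if (indexById.get(id) !== link[`${end}idx`]) {
        problems.push(`link row ${row}: ${end}idx does not match ${end}`)
      }
    }
  })
  return problems
}

const problems = checkIndices(
  [{ id: 'a', idx: 0 }, { id: 'b', idx: 1 }],
  [{ source: 'a', sourceidx: 0, target: 'b', targetidx: 1 }],
)
// problems → []
```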
The key difference is the introduction of the pointIndexBy, linkSourceIndexBy, and linkTargetIndexBy properties, which optimize how a link references its source and target points. Instead of comparing unique identifiers, Cosmograph now uses point indices for faster lookups and comparisons. This lets Cosmograph handle larger datasets faster and with less memory overhead, making the visualization more responsive. That's why in v2 you'll need to provide indices in the input data for Cosmograph to work properly.
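Here is a conceptual sketch of why integer indices are cheap to work with. It is not Cosmograph's actual internals. If point positions sit in one flat Float32Array, a link that stores a source index and a target index reaches both endpoints by direct array access, with no id strings to hash or compare. The array layout and the names positions, sourceIndex, targetIndex, and linkLength are assumptions for illustration.

```typescript
// Conceptual illustration only; this is not Cosmograph's internal code.
// Point positions live in one flat typed array: x at 2*i, y at 2*i + 1.
const positions = new Float32Array([
  0, 0, // point 0
  3, 4, // point 1
  6, 8, // point 2
])

// Links store integer indices, so no id strings need to be hashed or compared.
const sourceIndex = new Uint32Array([0, 1])
const targetIndex = new Uint32Array([1, 2])

// Length of link k, computed by reading both endpoints directly by index.
function linkLength(k: number): number {
  const s = sourceIndex[k]
  const t = targetIndex[k]
  const dx = positions[2 * t] - positions[2 * s]
  const dy = positions[2 * t + 1] - positions[2 * s + 1]
  return Math.sqrt(dx * dx + dy * dy)
}

// linkLength(0) → 5, linkLength(1) → 5
```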
Another limitation is that in v2 each link can have only a single target, provided through the linkTargetBy property. If your links have multiple targets, you'll need to reshape your links data so that all targets are in that one column, with one link row per source-target pair.
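As an illustration, suppose your links store several targets in an array field (a hypothetical targets column). The sketch below reshapes them into single-target rows, one per source-target pair. explodeTargets is a hypothetical helper, not part of Cosmograph.

```typescript
type Row = Record<string, unknown>

// Hypothetical input shape: one link row listing several targets in an array.
interface MultiTargetLink {
  source: string
  targets: string[]
  [extra: string]: unknown
}

// Hypothetical helper, not part of Cosmograph: expands each multi-target link
// into one single-target row per target, copying any other columns along.
function explodeTargets(links: MultiTargetLink[]): Row[] {
  const rows: Row[] = []
  for (const { targets, ...rest } of links) {
    for (const target of targets) rows.push({ ...rest, target })
  }
  return rows
}

const rows = explodeTargets([{ source: 'a', targets: ['b', 'c'], weight: 2 }])
// rows → [{ source: 'a', weight: 2, target: 'b' }, { source: 'a', weight: 2, target: 'c' }]
```

The resulting rows still need their source and target index columns computed before they can be passed to Cosmograph.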
However, there's good news! We've created a tool that will help you easily handle these data preparation tasks.
Cosmograph Data Kit
Cosmograph Data Kit is a set of helper functions that prepare data for Cosmograph v2. It simplifies the migration process and helps avoid confusion when configuring data for Cosmograph v2.
These functions convert your data into formats that Cosmograph recognizes and generate all necessary indices if your data doesn't have them. Old data formats are still supported, but you'll need to process them through the Cosmograph Data Kit. The functions also generate a ready-to-use Cosmograph configuration tailored to your data.
Learn how to use it here.
Upload data into Cosmograph
Below are ready-to-use examples demonstrating how to upload data into Cosmograph with basic configuration.
Note that the pointColorBy, pointSizeBy, linkColorBy, and linkWidthBy properties are optional. They are included in these examples for demonstration purposes only and are not required in the data or configuration.
```tsx
import React, { useState } from 'react'
import { Cosmograph } from '@cosmograph/react'

const ReactCosmographExample = () => {
  // Points and links files selected by the user (links are optional)
  const [data, setData] = useState<{ points?: File; links?: File }>()

  // Column names must match the columns present in your files
  const [config] = useState({
    pointIdBy: 'id',
    pointIndexBy: 'idx',
    pointColorBy: 'color', // optional
    pointSizeBy: 'value', // optional
    linkSourceBy: 'source',
    linkSourceIndexBy: 'sourceidx',
    linkTargetBy: 'target',
    linkTargetIndexBy: 'targetidx',
    linkColorBy: 'color', // optional
    linkWidthBy: 'value', // optional
  })

  const handlePointsFileChange = (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0]
    if (file) {
      setData((prevData) => ({ ...prevData, points: file }))
    }
  }

  const handleLinksFileChange = (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0]
    if (file) {
      setData((prevData) => ({ ...prevData, links: file }))
    }
  }

  return (
    <div>
      <Cosmograph {...config} points={data?.points} links={data?.links} />
      {/* File inputs for the points and links data */}
      <input type="file" onChange={handlePointsFileChange} />
      <input type="file" onChange={handleLinksFileChange} />
    </div>
  )
}

export default ReactCosmographExample
```
Using a custom DuckDB connection
Cosmograph v2 supports custom DuckDB connections. You can pass points and links as strings naming existing tables that hold the corresponding data. When they are provided as strings, Cosmograph fetches the data from these tables through the custom DuckDB connection.
Note: Ensure that the points and links tables are already in the format Cosmograph expects and contain the required columns, including the index columns.
duckDBConnection: string | { duckdb: AsyncDuckDB; connection?: AsyncDuckDBConnection }
The connection string, or a WasmDuckDBConnection object holding the DuckDB database instance and, optionally, its connection.
Here is an example of how to provide a custom DuckDB connection to Cosmograph:
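```tsx
import React from 'react'
import type { AsyncDuckDB, AsyncDuckDBConnection } from '@duckdb/duckdb-wasm'
import { Cosmograph } from '@cosmograph/react'

// Assumes a DuckDB-Wasm instance that already contains the tables, e.g.
// const db = new duckdb.AsyncDuckDB(logger, worker)
// const connection = await db.connect()
type Props = { db: AsyncDuckDB; connection: AsyncDuckDBConnection }

// points and links are table names; Cosmograph reads them through the connection
const config = {
  points: 'existing_points_table',
  links: 'existing_links_table',
  pointIdBy: 'id',
  pointIndexBy: 'idx',
  linkSourceBy: 'source',
  linkSourceIndexBy: 'sourceidx',
  linkTargetBy: 'target',
  linkTargetIndexBy: 'targetidx',
}

export const DuckDBCosmographExample = ({ db, connection }: Props) => (
  <Cosmograph duckDBConnection={{ duckdb: db, connection }} {...config} />
)
```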