Cosmograph v1 to v2 Migration Guide

Cosmograph now uses Apache Arrow format for efficient data storage and DuckDB-Wasm for rapid data operations. These changes have significantly boosted performance and reduced memory footprint, allowing Cosmograph to easily handle rendering of tens of millions of points.

However, this update requires specific configuration for points and links data. Below you will find descriptions and examples of the basic configuration options Cosmograph now requires.

Data & configuration

As in v1, Cosmograph needs at least points data to render. If valid links data is provided, the links are rendered as well.

In this section, we’ll focus on the minimal required configuration options for both points and links. For a comprehensive list of all available properties, refer to the CosmographConfig documentation.

Input data formats

Cosmograph now accepts the following formats for the points and links input data:

  • File: A file object representing a CSV (.csv, .tsv), JSON (.json), Apache Parquet (.parquet, .pq), or Apache Arrow (.arrow) file containing the data.
  • string: A URL pointing to the data, or the name of a table in an external DuckDB-Wasm instance, if one is provided.
  • Table: An instance of the Apache Arrow Table class containing the data.
  • Uint8Array or ArrayBuffer: A typed array or array buffer containing the binary representation of the data in Apache Arrow format.
  • Record<string, unknown>[]: An array of objects, where each object represents a point (or link) and its properties.
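As a rough illustration of the last format, points and links can be passed as plain arrays of objects. The column names used here (id, idx, source, sourceidx, target, targetidx) are assumptions borrowed from the examples later in this guide; Cosmograph does not mandate them, they only need to match your configuration.

```typescript
// Hypothetical sample data. Each object is one row; each key is one column.
type Row = Record<string, unknown>

const points: Row[] = [
  { id: 'a', idx: 0 },
  { id: 'b', idx: 1 },
  { id: 'c', idx: 2 },
]

// Every link row carries both the IDs and the indices of its endpoints.
const links: Row[] = [
  { source: 'a', sourceidx: 0, target: 'b', targetidx: 1 },
  { source: 'b', sourceidx: 1, target: 'c', targetidx: 2 },
]
```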

Points configuration

The minimal required configuration options for rendering points data are:

  • points: The points data.
  • pointIdBy: Unique identifier column for each point.
  • pointIndexBy: Ordinal index column of each point from 0 to x (unique points count). This index is used for efficient lookup and referencing.

You can find the full list of points properties here.
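If your points have IDs but no index column, one way to add it by hand is to number the rows in array order. This is a minimal sketch, not part of the Cosmograph API; the function name addPointIndex and the default column name idx are hypothetical.

```typescript
type Row = Record<string, unknown>

// Adds a 0-based ordinal index to each point, in array order.
// `indexColumn` is a hypothetical name; use the same name for pointIndexBy.
function addPointIndex(points: Row[], indexColumn: string = 'idx'): Row[] {
  return points.map((point, i) => ({ ...point, [indexColumn]: i }))
}

const indexedPoints = addPointIndex([{ id: 'a' }, { id: 'b' }, { id: 'c' }])
// indexedPoints[2] is { id: 'c', idx: 2 }
```

The resulting array can then be used as points, with pointIdBy set to 'id' and pointIndexBy set to 'idx'.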

The minimal required configuration options for links data are:

  • links: The links data.
  • linkSourceBy: Column that contains the unique identifier (the pointIdBy value) of the link’s source point.
  • linkSourceIndexBy: The index column of the link’s source point. This corresponds to the pointIndexBy value of the point identified by linkSourceBy.
  • linkTargetBy: Column that contains the unique identifier (the pointIdBy value) of the link’s target point.
  • linkTargetIndexBy: The index column of the link’s target point. This corresponds to the pointIndexBy value of the point identified by linkTargetBy.

You can find the full list of link properties here.
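The relationship between the ID columns and the index columns can be sketched as a lookup from point ID to point index. The helper below is an illustration only: the name addLinkIndices and the column names sourceidx and targetidx are assumptions, and it expects the points to already carry an idx column.

```typescript
interface Point { id: string; idx: number }
interface Link { source: string; target: string }
interface IndexedLink extends Link { sourceidx: number; targetidx: number }

// Resolves each link's source and target ID to the matching point index.
function addLinkIndices(points: Point[], links: Link[]): IndexedLink[] {
  // Build the ID -> index map once so each link lookup is cheap.
  const indexById = new Map<string, number>()
  for (const point of points) indexById.set(point.id, point.idx)

  return links.map((link) => {
    const sourceidx = indexById.get(link.source)
    const targetidx = indexById.get(link.target)
    if (sourceidx === undefined || targetidx === undefined) {
      throw new Error(`Link ${link.source} -> ${link.target} references an unknown point`)
    }
    return { ...link, sourceidx, targetidx }
  })
}
```

Throwing on an unknown ID is a design choice for this sketch; dropping such links instead would be equally reasonable, depending on your data.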

Limitations

It’s important to note that if the required indices (pointIndexBy, linkSourceIndexBy, and linkTargetIndexBy) are not provided, Cosmograph won’t be able to render your data. Additionally, if you have multiple targets for each link, you’ll need to adjust your data to fit the new format.

The key difference is the introduction of the pointIndexBy, linkSourceIndexBy, and linkTargetIndexBy properties, which optimize how a link references its source and target points. Instead of comparing unique identifiers, Cosmograph now uses point indices for faster lookups and comparisons. This lets Cosmograph handle larger datasets with better performance and lower memory overhead, which makes the visualization more responsive. That’s why in v2 you need to provide indices in the input data for Cosmograph to work properly.
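Because missing or inconsistent indices prevent rendering, it can help to check the data before passing it in. The following is a sketch of such a check, not something Cosmograph provides. It assumes indices run from 0 to one less than the number of points, and the column names are the same hypothetical ones used in this guide's examples.

```typescript
interface Point { id: string; idx: number }
interface Link { source: string; sourceidx: number; target: string; targetidx: number }

// Returns human-readable problems; an empty array means the indices look consistent.
function findIndexProblems(points: Point[], links: Link[]): string[] {
  const problems: string[] = []
  const idByIndex = new Map<number, string>()

  for (const p of points) {
    if (!Number.isInteger(p.idx) || p.idx < 0 || p.idx >= points.length) {
      problems.push(`point ${p.id}: index ${p.idx} is outside 0..${points.length - 1}`)
    } else if (idByIndex.has(p.idx)) {
      problems.push(`point ${p.id}: index ${p.idx} is already used by ${idByIndex.get(p.idx)}`)
    } else {
      idByIndex.set(p.idx, p.id)
    }
  }

  // Each link index must point at the point whose ID the link names.
  for (const l of links) {
    if (idByIndex.get(l.sourceidx) !== l.source) {
      problems.push(`link ${l.source} -> ${l.target}: sourceidx does not match source`)
    }
    if (idByIndex.get(l.targetidx) !== l.target) {
      problems.push(`link ${l.source} -> ${l.target}: targetidx does not match target`)
    }
  }
  return problems
}
```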

Another limitation is that in v2 you can provide only a single target for each link, using the linkTargetBy property. If your links have multiple targets, you’ll need to restructure your links data so that all targets are in one column.
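One plausible way to do this restructuring by hand is to emit a separate link row for every source and target pair. The sketch below assumes a hypothetical input shape in which each link holds a targets array; that field name is an illustration, not a documented Cosmograph format.

```typescript
// Hypothetical multi-target link shape (assumption for illustration only).
interface MultiTargetLink { source: string; targets: string[] }
interface SingleTargetLink { source: string; target: string }

// Produces one row per (source, target) pair so each link has exactly one target.
function expandTargets(links: MultiTargetLink[]): SingleTargetLink[] {
  const result: SingleTargetLink[] = []
  for (const link of links) {
    for (const target of link.targets) {
      result.push({ source: link.source, target })
    }
  }
  return result
}

const expanded = expandTargets([{ source: 'a', targets: ['b', 'c'] }])
// expanded: [{ source: 'a', target: 'b' }, { source: 'a', target: 'c' }]
```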

However, there’s good news! We’ve created a tool that will help you easily handle these data preparation tasks.

Cosmograph Data Kit

Cosmograph Data Kit is a set of helper functions that prepare data for Cosmograph v2. It simplifies the migration process and helps avoid confusion in data configuration for Cosmograph v2.

These functions convert your data into formats that Cosmograph recognizes and generate all necessary indices if your data doesn’t have them. Old data formats are still supported, but you’ll need to process them through the Cosmograph Data Kit. The functions also generate a ready-to-use configuration for Cosmograph, tailored specifically to your data.

Learn how to use it here.

Upload data into Cosmograph

Below are ready-to-use examples demonstrating how to upload data into Cosmograph with basic configuration.

Note that pointColorBy, pointSizeBy, linkColorBy, and linkWidthBy properties are optional. They are included in these examples for demonstration purposes only and are not required in the data/configuration.


import React, { useState } from 'react'
import { Cosmograph } from '@cosmograph/react'

const ReactCosmographExample = () => {
  // Files selected by the user; both stay undefined until a file is chosen.
  const [data, setData] = useState<{ points?: File; links?: File }>()
  // Column names must match the columns present in the uploaded files.
  const [config] = useState({
    pointIdBy: 'id',
    pointIndexBy: 'idx',
    pointColorBy: 'color',
    pointSizeBy: 'value',
    linkSourceBy: 'source',
    linkSourceIndexBy: 'sourceidx',
    linkTargetBy: 'target',
    linkTargetIndexBy: 'targetidx',
    linkColorBy: 'color',
    linkWidthBy: 'value',
  })

  // Store the selected points file, keeping any previously selected links file.
  const handlePointsFileChange = (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0]
    if (file) {
      setData((prevData) => ({ ...prevData, points: file }))
    }
  }

  // Store the selected links file, keeping any previously selected points file.
  const handleLinksFileChange = (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0]
    if (file) {
      setData((prevData) => ({ ...prevData, links: file }))
    }
  }

  return (
    <div>
      <Cosmograph {...config} points={data?.points} links={data?.links} />
      <input type="file" onChange={handlePointsFileChange} />
      <input type="file" onChange={handleLinksFileChange} />
    </div>
  )
}

Using Custom DuckDB Connection

Cosmograph v2 supports custom DuckDB connections. You can pass points and links as strings representing existing table names with related data. When provided as strings, the custom DuckDB connection fetches relevant data from these specified tables.

Note: Ensure that points and links tables are prepared in the appropriate Cosmograph format and contain required columns.
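As one possible way to prepare such tables, the index columns can be generated in DuckDB itself. The helper below only builds SQL strings and is a sketch under assumptions: the function name, the table names, and the column names (id, source, target, idx, sourceidx, targetidx) are all hypothetical and should be adapted to your schema.

```typescript
// Table names are placeholders supplied by the caller (trusted identifiers only).
interface TableNames { rawPoints: string; rawLinks: string; points: string; links: string }

// Builds two statements: one numbers the points, one resolves link endpoints.
function buildIndexingSql(names: TableNames): string[] {
  return [
    // row_number() starts at 1, so subtract 1 to get a 0-based index.
    `CREATE OR REPLACE TABLE ${names.points} AS
SELECT *, row_number() OVER () - 1 AS idx
FROM ${names.rawPoints}`,
    // Inner joins drop links whose source or target ID has no matching point.
    `CREATE OR REPLACE TABLE ${names.links} AS
SELECT l.*, src.idx AS sourceidx, dst.idx AS targetidx
FROM ${names.rawLinks} AS l
JOIN ${names.points} AS src ON src.id = l.source
JOIN ${names.points} AS dst ON dst.id = l.target`,
  ]
}

const statements = buildIndexingSql({
  rawPoints: 'raw_points',
  rawLinks: 'raw_links',
  points: 'existing_points_table',
  links: 'existing_links_table',
})
// With a DuckDB-Wasm connection, each statement could then be run in order:
// for (const sql of statements) await connection.query(sql)
```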

duckDBConnection
string | { 
  duckdb: AsyncDuckDB; 
  connection?: AsyncDuckDBConnection 
}

A connection string, or a WasmDuckDBConnection object that holds the DuckDB database instance and, optionally, its connection.

Here is an example of how to pass a custom DuckDB connection to Cosmograph:

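import React, { useState } from 'react'
import { Cosmograph } from '@cosmograph/react'
import type { AsyncDuckDB, AsyncDuckDBConnection } from '@duckdb/duckdb-wasm'

// Assuming you have a DuckDB-Wasm instance with data somewhere
// const db = new duckdb.AsyncDuckDB(logger, worker)
// const connection = await db.connect()

// Hypothetical wrapper component; db and connection are passed in as props.
interface Props { db: AsyncDuckDB; connection: AsyncDuckDBConnection }

const DuckDBCosmographExample = ({ db, connection }: Props) => {
  // points and links are names of tables that already exist in the database.
  const [config] = useState({
    points: 'existing_points_table',
    links: 'existing_links_table',
    pointIdBy: 'id',
    pointIndexBy: 'idx',
    linkSourceBy: 'source',
    linkSourceIndexBy: 'sourceidx',
    linkTargetBy: 'target',
    linkTargetIndexBy: 'targetidx',
  })

  return (
    <Cosmograph
      duckDBConnection={{ duckdb: db, connection: connection }}
      {...config}
    />
  )
}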