Australian Startup Dashboard Generator v1

I recently moved to Sydney after finishing my PhD in Canberra. I have been in the university ecosystem for 10 years, both as a student and as an employee at the student union, so I've been looking at companies that aren't related to universities. Anyone who's worked with me knows I love to combine diverse pieces of information into a central ontology (like protein structures with different types of data). I thought I'd try to apply that logic to create dashboards for the companies that I've been looking at. I wanted a central place where I could see the maturity of a company, what their funding looks like and what's going on with their R&D.

Turns out, collecting, organising and displaying this kind of data gets complicated quite quickly. I really just wanted to simulate the experience of going to an org's website and clicking through, searching for key people, publications and so on, but do it through an agent so that it would all be recorded. The data quality is quite variable, but I figured something like Claude would be able to handle it by going through multiple news articles. There are paid APIs and RSS feeds available from aggregators or third-party services, where it's all organised and annotated, but I thought I'd just try to make the best of what's publicly available. I thought I'd run through how it works, what I think needs fixing, and what functionality I want to add. I'd really love some insight from someone who makes dashboards or is interested in these types of outputs. I'll go through the following at a high level:

—How the agents collect and store data

—How the data is displayed

Data collection

Phase 0 – Resolve

The whole program takes a single input: a company URL. It can also take just the name of the company and resolve it to the best-fitting company. The program then sends Claude Haiku 4.5 to the website and gathers the name, founding date, description and, if it's a publicly traded company, its ASX ticker. This seems like overkill, but these values don't change over time (except for a company rename) and only need to be recorded once. This uses about 30K tokens.

Phase 1 – News

This is the most fragile and expensive part of the data collection. I would love to replace this with RSS feeds. A Claude Sonnet 4.6 call gets 20 web searches and produces an unstructured blob of text that includes URLs and numbers. It's crudely cut into "funding", "team" and "R&D" for use by later agents. This costs between 100K and 300K tokens.
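
As an illustration of what a crude cut like this could look like, here is a minimal sketch that splits the blob into paragraphs and assigns each one to any bucket whose keywords it mentions. The keyword patterns and the `split_news` name are my own assumptions for the example, not the actual lists used.

```python
import re

# Hypothetical keyword patterns; the real cut may use different terms.
BUCKET_PATTERNS = {
    "funding": re.compile(r"\b(raised|funding|grants?|series [a-d]|investors?)\b", re.I),
    "team": re.compile(r"\b(appointed|founders?|ceo|cto|joins|hired)\b", re.I),
    "r&d": re.compile(r"\b(trials?|patents?|publications?|prototypes?|stud(y|ies))\b", re.I),
}


def split_news(blob: str) -> dict[str, list[str]]:
    """Assign each paragraph of the blob to every bucket whose pattern it matches."""
    buckets: dict[str, list[str]] = {name: [] for name in BUCKET_PATTERNS}
    for para in re.split(r"\n\s*\n", blob.strip()):
        for name, pattern in BUCKET_PATTERNS.items():
            if pattern.search(para):
                buckets[name].append(para.strip())
    return buckets
```

A paragraph can land in more than one bucket, and paragraphs that match nothing are dropped, which is part of why a cut like this is fragile.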

Phase 2 – Team + Funding

These are simultaneous Haiku calls (<5K tokens each) that extract information about funding rounds/grants and team members into JSON files, as either "Events" or "Team" entries. Details that could not be extracted are set to "null". The program then goes through the "null" entries for key people and funding rounds, and runs a simple search query built from the context around the missing information.
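
To make the "null" follow-up step concrete, here is a rough sketch. The field names (`round`, `amount`, `source_url` and so on) and the `follow_up_queries` helper are assumptions for illustration; the real JSON entries may be shaped differently.

```python
import json

# Hypothetical shape of two extracted funding events; field names are assumptions.
events = json.loads("""
[
  {"type": "funding", "round": "Series A", "amount": 5000000, "currency": "AUD",
   "date": null, "source_url": "https://example.com/news/1"},
  {"type": "funding", "round": "Seed", "amount": null, "currency": null,
   "date": "2021-03-01", "source_url": "https://example.com/news/2"}
]
""")


def follow_up_queries(company: str, events: list[dict]) -> list[str]:
    """Build one search query per missing field, using the known fields as context."""
    queries = []
    for event in events:
        # Use whatever round/date information is already known as search context.
        context = [str(event[k]) for k in ("round", "date") if event.get(k)]
        for field, value in event.items():
            if value is None:
                queries.append(" ".join([company, *context, field]))
    return queries


for query in follow_up_queries("Acme", events):
    print(query)
```

Each query would then be handed to a search call, and the entry is only updated if the answer comes back with a source.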

Phase 3 – Science

This phase requires Team extraction to run first. It first extracts R&D milestones from the news dump using a Haiku call (<5K tokens). It then searches the Semantic Scholar API for team members tagged as "Researchers" by a regex (regular expression). The data comes back nicely organised, so it goes directly into JSON, filtered by the keywords collected in Phase 0 to make sure the papers are related to the org.
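
The keyword filter could look something like the sketch below. It assumes each paper is a dict with `title` and `abstract` fields, as Semantic Scholar can return them; the `filter_papers` name and the `min_hits` threshold are hypothetical.

```python
def is_relevant(paper: dict, keywords: list[str], min_hits: int = 1) -> bool:
    """True if enough org keywords appear in the paper's title or abstract."""
    # Either field may be missing or None in the API response.
    text = f"{paper.get('title') or ''} {paper.get('abstract') or ''}".lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits >= min_hits


def filter_papers(papers: list[dict], keywords: list[str], min_hits: int = 1) -> list[dict]:
    """Keep only the papers that look related to the organisation."""
    return [p for p in papers if is_relevant(p, keywords, min_hits)]


papers = [
    {"title": "Diamond quantum sensors for magnetometry", "abstract": None, "year": 2022},
    {"title": "A survey of medieval poetry", "abstract": "Verse forms.", "year": 2019},
]
kept = filter_papers(papers, ["quantum", "magnetometry"])
```

Raising `min_hits` makes the filter stricter, at the cost of dropping papers with short or missing abstracts.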

Phase 4 – Patents

IP Australia has a really nice API, very functional. Simple call to return the ordered list of patents associated with the canonical name collected in Phase 0.

Phase 5 – Render

Organises the data into an easily processable structure to render the HTML. Accumulates funding rounds, publications and patents into running totals at different time points.

Displaying the data

I wanted to make the timeline of the company easily viewable and auditable. The "events" are categorised into "financial", "people" and "R&D". I still need to figure out a better filter for which "events" are salient enough to be shown here, but at the moment each one is recorded either as a patent/publication (very easy to timestamp) or as the news article the information was gathered from.

I also have a graph below of the running metrics. This functionality is going to be really important for later iterations, as it's intuitive to think about these things over time. When you hover over an event, that time point is marked on the graph with a vertical line.

This is followed by a "Funding & Investors" card, which separates grants from equity funding. The values from this card feed into the graph above. Converting between currencies is a straightforward call to the Frankfurter REST API, which lets me convert each amount at the exchange rate for the date the funding is attributed to. The accuracy is shaky: the funding dates can be blurry, and the funds may never actually have been exchanged, so the total asset estimate is only approximate. This will have implications for future iterations, so recording the reported amount/currency/date in the database is the best option for now.
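
Here is a rough sketch of keeping the reported values intact while deriving an AUD figure alongside them. The endpoint shape in `fetch_rate` is my assumption about the Frankfurter API and should be checked against its documentation; `to_aud`, `amount_aud` and `fx_rate` are hypothetical names.

```python
import json
from urllib.request import urlopen


def fetch_rate(day: str, base: str, target: str = "AUD") -> float:
    """Look up a historical exchange rate for a given ISO date."""
    # Assumed endpoint shape; check the Frankfurter docs for the current host and path.
    url = f"https://api.frankfurter.dev/v1/{day}?base={base}&symbols={target}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["rates"][target]


def to_aud(record: dict, rate_lookup=fetch_rate) -> dict:
    """Return a copy of the record with a derived AUD amount.

    The reported amount, currency and date are left untouched, so the
    conversion can be redone later with a better date or a different rate.
    """
    if record["currency"] == "AUD":
        rate = 1.0
    else:
        rate = rate_lookup(record["date"], record["currency"], "AUD")
    return {**record, "amount_aud": round(record["amount"] * rate, 2), "fx_rate": rate}
```

Passing the rate lookup in as a parameter means the conversion logic can be checked offline with a fixed rate, for example `to_aud(record, rate_lookup=lambda day, base, target: 1.5)`.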

The "Team" card uses a regex to display key people, with their role in the company, their title and a brief pedigree. Given the scientific slant of this dashboard, I've done my best to filter it so that founders, CSOs/CTOs and directors/heads of research teams are displayed. Each org has its own way of describing these titles, so it's a very long regex with many exceptions.
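
A heavily cut-down version of such a regex might look like this. The pattern below is only an illustrative assumption, far shorter than a real one would need to be, and `is_key_person` is a hypothetical helper name.

```python
import re

# Illustrative pattern only; a production version needs many more variants and exceptions.
KEY_ROLE = re.compile(
    r"\b(?:co-?founder|founder|c[st]o"
    r"|chief (?:scientific|science|technology|technical) officer"
    r"|(?:director|head) of (?:research|r&d|science|engineering))\b",
    re.IGNORECASE,
)


def is_key_person(title: str) -> bool:
    """True if the job title looks like a founder or a scientific/technical lead."""
    return bool(KEY_ROLE.search(title))


for title in ["Co-Founder & CEO", "Head of R&D", "Head of Marketing", "Managing Director"]:
    print(title, is_key_person(title))
```

The word boundaries matter: without them, "cto" would match inside "Director" and pull in every director regardless of function.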

The "Research & R&D" card was probably the most difficult to organise the display logic for. When you read the papers yourself and see the roles people have in the company, it's clear whether a paper shows expertise for the organisation or is a credit to the individual researcher. In the dashboard, though, I think this connection is the shakiest part at the moment. The R&D milestones are nice, but they rely on media releases being captured, so they don't mean anything quantitatively as is.

Patents are deceptively difficult. The API takes a search term and produces results, but right now it's difficult to know whether a result matched on the company or on the patent title. Results are currently just keyword matches rather than hard matches on the applicant organisation. This is something to be ironed out in the next version. The API itself works really well, but I think that the integration into the knowledge base is a bit messy.
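
One possible direction for a hard match is to normalise company names and compare them exactly against each patent's applicants. This is a sketch of an idea rather than what the program currently does, and it assumes the applicant names have already been pulled out of each result; the helper names and suffix list are hypothetical.

```python
import re

# Common legal suffixes to ignore when comparing names (a partial, assumed list).
_SUFFIXES = re.compile(r"\b(pty|ltd|limited|inc|incorporated)\b\.?", re.IGNORECASE)


def normalise(name: str) -> str:
    """Lower-case a company name, drop legal suffixes and punctuation, collapse spaces."""
    name = _SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def applicant_matches(canonical_name: str, applicants: list[str]) -> bool:
    """True only if one applicant is the same organisation after normalisation."""
    target = normalise(canonical_name)
    return any(normalise(a) == target for a in applicants)


print(applicant_matches("Acme Quantum", ["ACME QUANTUM PTY. LTD."]))
print(applicant_matches("Acme Quantum", ["Acme Quantum Computing Limited"]))
```

An exact comparison like this will miss renamed companies and subsidiaries, so the keyword match is probably still worth keeping as a fallback that gets flagged for review.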

Still, this is a pretty functional prototype. I just need to work on getting the token cost down, and perhaps on storing the data a bit better. Right now it's all in separate JSON files per startup, but ideally it would be a single SQL database where information can be retrieved consistently with structured queries. Validation and source attribution are arguably among the more important parts of this, so I have to think about how best to implement them. I'm actively working on it, but please leave a comment if you have any ideas.
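
As a starting point for the database idea, here is a minimal SQLite sketch in which every funding event points at the source it came from and stores the reported amount/currency/date as-is. The table and column names are assumptions, not an existing schema.

```python
import sqlite3

# Hypothetical schema: one row per company, per source URL and per funding event.
SCHEMA = """
CREATE TABLE company (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    asx_ticker TEXT
);
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    retrieved_on TEXT
);
CREATE TABLE funding_event (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    source_id INTEGER REFERENCES source(id),
    kind TEXT CHECK (kind IN ('grant', 'equity')),
    reported_amount REAL,
    reported_currency TEXT,
    reported_date TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO company (id, name) VALUES (1, 'Acme Quantum')")
conn.execute("INSERT INTO source (id, url) VALUES (1, 'https://example.com/news/1')")
conn.execute(
    "INSERT INTO funding_event "
    "(company_id, source_id, kind, reported_amount, reported_currency, reported_date) "
    "VALUES (1, 1, 'equity', 5000000, 'AUD', '2022-05-01')"
)

# A simple validation query: how many funding events have no source attached?
unsourced = conn.execute(
    "SELECT COUNT(*) FROM funding_event WHERE source_id IS NULL"
).fetchone()[0]
print(unsourced)
```

Keeping sources in their own table means one article can back several events, and a validation pass becomes a query rather than a walk through JSON files.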
