Show HN: BemiDB – Postgres read replica optimized for analytics

(github.com)

207 points by exAspArk 5 days ago | 125 comments

Hi HN! We're Evgeny and Arjun, and we’re building a better way to do analytics with Postgres.

We love Postgres for its simplicity, power, and rich ecosystem. But when it's time to connect an analytics data stack, engineers still get bogged down with heavyweight and expensive OLAP systems.

Postgres is amazing at OLTP queries, but not at OLAP queries (large data scans and aggregations). Even so, we've heard from countless scaling startups that they try to get by with just a read replica for analytics workloads because they don't want to deal with the data engineering complexity of the alternative. This actually works surprisingly well at first, but it starts to break down as they scale or integrate multiple data sources. Adding lots of indexes to support analytics also slows down their transactional write performance.

When growing out of “just use Postgres”, companies have to understand and wrangle complex ETL pipelines, CDC processes, and data warehouses, adding layers of complexity that defeat the simplicity that drove their choice of Postgres as their data storage in the first place.

We thought there had to be a better way, so we’re building BemiDB. It’s designed to handle complex analytical queries at scale without the usual overhead. It’s a single binary that automatically syncs data from Postgres and is Postgres-compatible, so querying it feels like querying standard Postgres and it works with all existing tools.

Under the hood, we use Apache Iceberg (with Parquet data files) stored in S3. This allows for bottomless inexpensive storage, compressed data in columnar files, and an open format that guarantees compatibility with other data tools.

We embed DuckDB as the query engine to run complex analytical queries in memory. With efficient columnar storage and vectorized execution, we’re aiming for faster results without heavy infrastructure. BemiDB communicates over the Postgres wire protocol so that all querying stays Postgres-compatible.
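For example, any Postgres client should be able to connect and run an ordinary analytical query. A minimal sketch (the table and column names here are just an illustration, not part of BemiDB):

    SELECT date_trunc('month', created_at) AS month,
           count(*)                        AS orders,
           sum(amount)                     AS revenue
    FROM orders
    GROUP BY 1
    ORDER BY 1;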

We want to simplify data stacks for companies that use Postgres by reducing complexity (single binary and S3), using non-proprietary data formats (Iceberg open tables), and removing vendor lock-in (open source). We'd love to hear your feedback! What do you think?

banditelol 5 days ago | next |

Looking at the syncer, it seems like it copies the whole table to CSV every time (?). Code: https://github.com/BemiHQ/BemiDB/blob/6d6689b392ce6192fe521a...

I can't imagine up to what scale you can do this. Is there anything better we can do before using Debezium to sync the data via CDC?

Edit: add code permalink

oulipo 5 days ago | prev | next |

Really cool! I have an IoT use case where I ingest data: I want to keep the last 3 months or so in PostgreSQL, and then store the older data as Parquet files on S3.

I initially planned to store chunks on S3 and run the analytical queries using DuckDB; I'm wondering if your tool would be a good replacement?

For now I don't have that many analytical queries, I'm mostly doing visualization of the data points by querying a range (eg last 2 weeks of data for a device)

Does it then make sense to use columnar storage or am I better off with "regular Postgres"?

Or in my case does your approach provide "best of both worlds" in the sense that I could do some occasional analytical queries on past data stored on S3, and regularly access "last 3 months" data for visualization using the data stored in the regular Postgres?

Onavo 5 days ago | root | parent | next |

Use a ClickHouse FDW or something similar; ClickHouse has excellent integration with Postgres. They also have a great embedded Python version. Their marketing isn't as good as DuckDB's, but in terms of stability and performance they are so much better. DuckDB is very, very buggy and full of sharp edges, but because of their VC-funded developer marketing you don't really hear people talking about it.
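For example, ClickHouse can read Postgres tables directly through its postgresql() table function. A rough sketch (the connection details and table name below are placeholders):

    -- query a Postgres table from ClickHouse without copying it first
    SELECT count(*)
    FROM postgresql('pg-host:5432', 'mydb', 'orders', 'pg_user', 'pg_password');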

exAspArk 5 days ago | root | parent | next |

I agree that DuckDB may sometimes be buggy because it's being actively developed with a large surface area (a universal embeddable query engine that works with different storage layers if I were to simplify).

However, DuckDB (non-profit foundation) != MotherDuck (VC funded). These are two separate organizations with different goals. I see DuckDB as a tool, not as a SaaS or a VC-funded company. My hope is that it'll be adopted by other projects and not associated with just a single for-profit company.

jconline543 5 days ago | root | parent | prev | next |

If you just need the Postgres -> S3 archival pattern you described, I built a simpler focused tool: pg-archiver (https://github.com/johnonline35/pg-archiver)

It:

- Auto-archives old Postgres data to Parquet files on S3

- Keeps recent data (default 90 days) in Postgres for fast viz queries

- Uses year/month partitioning in S3 for basic analytical queries

- Configures with just PG connection string and S3 bucket

Currently batch-only archival (no real-time sync yet). Much lighter than running a full analytical DB if you mainly need timeseries visualization with occasional historical analysis.

Let me know if you try it out!

oulipo 5 days ago | root | parent |

Really cool! I'll take a look!

- can you then easily query it with duckdb / clickhouse / something else? What do you use yourself? do you have some tutorial / toy example to check?

- would it be complicated to have the real-time data also stored somehow on S3, so it would be "transparent" to query historical data that includes today's data?

- what typical "batch" size makes sense? I guess "day batches" might be a bit small and will incur too many "read" operations (if I have a moderate amount of daily data) compared to "week batches", but then the time lag increases?

jconline543 5 days ago | root | parent |

Thanks for your interest! Let me address your questions:

Querying the data: Yes, you can easily query the Parquet files with DuckDB. The files are stored in a year/month partitioned structure (e.g., year=2024/month=03/iot_data_20240315_143022.parquet), which makes it efficient to query specific time ranges. I personally use DuckDB for ad-hoc analysis since it works great with Parquet. Here's a quick example:

    SELECT *
    FROM read_parquet('s3://my-iot-archive/year=*/month=*/iot_data_*.parquet')
    WHERE timestamp BETWEEN '2023-01-01' AND '2023-12-31'

Real-time data on S3: Currently, the tool is batch-focused. Adding real-time sync would require some architectural changes - either using CDC (Change Data Capture) or implementing a dual-write pattern. I kept it simple for now since most IoT visualization use cases I've seen focus on recent data in Postgres. If you need this feature, I'd be happy to take a look at what you want.

For data processing, the tool works like this:

- It identifies records older than 90 days (configurable retention period)

- Processes these records in batches of 100 (also configurable) to manage memory usage

- Creates Parquet files partitioned by year/month in S3

- Deletes the archived records from Postgres

The key is that you always have your recent 90 days in Postgres for fast querying, while maintaining older data in a cost-effective S3 storage that you can still query when needed. You can adjust both the retention period and batch size based on your specific needs.

Let me know if you'd like me to clarify anything or if you have other questions!

jconline543 5 days ago | root | parent |

Also, for querying both recent and historical data together, you wouldn't need to modify this tool at all. You could just add a separate periodic job (e.g. hourly/daily) that copies recent data to S3:

    COPY (SELECT * FROM iot_data WHERE timestamp > current_date - interval '90 days')
    TO 's3://bucket/recent/iot_data.parquet' (FORMAT 'parquet')

Then query everything together in DuckDB:

    SELECT * FROM read_parquet([
        's3://bucket/year=*/month=*/iot_data_*.parquet',  -- archived data
        's3://bucket/recent/iot_data.parquet'             -- recent data
    ])

Much simpler than implementing real-time sync, and you still get a unified view of all your data for analysis (just with a small delay on recent data).

exAspArk 5 days ago | root | parent | prev |

Thank you!

Yes, absolutely!

1) You could use BemiDB to sync your Postgres data (e.g., partition time-series tables) to S3 in Iceberg format. Iceberg is essentially a "table" abstraction on top of columnar Parquet data files with a schema, history, etc.

2) If you don't need strong consistency and are fine with delayed data (the main trade-off), you can use just BemiDB to query and visualize all data directly from S3. From a query perspective, it's like DuckDB that talks the Postgres wire protocol.

Feel free to give it a try! And although it's a new project, we plan to keep building and improving it based on user feedback.

oulipo 5 days ago | root | parent |

Thanks!

- Can you give me more info about the strong consistency and delayed data, so I can better picture it with a few examples?

- Also, is it possible to do the sync of the columnar data in "more-or-less real-time" (eg do a NOTIFY on a new write in my IoT events table and push it into the storage)?

- Would your system also be suited for a kind of "audit-log" data? Eg. if I want to have some kind of audit-table of all the changes in my database, but only want to keep a few weeks worth at hand, and then push the rest on S3, or it doesn't make much sense with that kind of data?

exAspArk 5 days ago | root | parent |

For now, BemiDB supports only full Postgres table data re-sync. We plan to enable real-time data syncing from Postgres into S3 by using logical replication (CDC), which is much more reliable than PG NOTIFY.

We use logical replication and this exact approach in our other project related to auditing and storing Postgres data changes: https://github.com/BemiHQ/bemi. We're thinking about combining these approaches to leverage a scalable and affordable separate storage layer on S3.
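For context, the Postgres side of logical replication is just a publication plus a replication slot, roughly like this (the names are placeholders, and it assumes wal_level is set to logical):

    -- publish all tables (or an explicit list) for CDC consumers
    CREATE PUBLICATION bemidb_pub FOR ALL TABLES;

    -- create a logical replication slot using the built-in pgoutput plugin
    SELECT pg_create_logical_replication_slot('bemidb_slot', 'pgoutput');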

Lmk if that makes sense or if you had any more questions!

oulipo 5 days ago | root | parent |

Really interesting thanks! I guess my use-case would rather require incremental updates

Ideally it would just sync in real-time and buffer new data in the Bemi binary (with some WAL-like storage to make sure data is preserved on binary crash/reload), and when it has enough, push them on S3, etc

Is this the kind of approach you're going to take?

exAspArk 5 days ago | root | parent |

Yes, we want to use the approach like you described! We'll likely wait until enough changes are accumulated by using 2 configurable thresholds: time (like 30s) and size (like 100MB)

dantodor 5 days ago | prev | next |

Question: is it possible to use BemiDB in ... don't know how to spell it, maybe read-only mode? And by that I mean one Bemi instance that is connected to postgres source, and others that use the produced Iceberg tables to answer queries? Poor man's scalability of the query engine :) ... I would also imagine having an instance that is write-only (reads from postgres and produces the Iceberg tables) and one or many query-only engines. Other than that, great work, will definitely start using it!

exAspArk 5 days ago | root | parent |

Great ideas! We'll keep this suggestion related to read/write separation in mind. We started with a simple unified solution, but we'll keep iterating, listening and addressing any feedback :)

dangoodmanUT 5 days ago | prev | next |

This is probably the most streamlined/all-inclusive solution out of all that I've seen, but this has definitely been an extremely saturated space in 2024

dirtbag__dad 5 days ago | root | parent | next |

What are the other viable players? As mentioned in another thread on this post, DuckDB is not “production-ready.” That has been a non-starter for us at work.

exAspArk 5 days ago | root | parent |

That's why our current approach is to build missing or incomplete features ourselves to move fast. For example, DuckDB doesn't read Iceberg tables fully according to the spec, can't perform writes to them, etc.

nitinreddy88 5 days ago | prev | next |

How do updates or continuous inserts get written to the Parquet files? The architecture diagram doesn't show this, and the docs don't cover it either.

1. Most benchmarks (and most companies) assume the data is loaded once and then compare querying/compression across formats, which is far from reality

2. Do you rewrite the Parquet data every time new data comes in? Or is it partitioned by something? There are no examples

3. How do updates/deletes work? Updates might be a niche case, but deletion/data retention/truncation is a must, and I don't see how you support that

exAspArk 5 days ago | root | parent |

Our initial approach is to do full table re-syncs periodically. Our next step is to enable incremental data syncing by supporting insert/update/delete according to the Iceberg spec. In short, it'd produce "diff" Parquet files and "stitch" them using metadata (enabling time travel queries, schema evolution, etc.)
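As an illustration of what that metadata enables, engines that implement the Iceberg spec support time travel queries along these lines (Spark SQL syntax; the table name and values are placeholders):

    -- query the table as it was at a past point in time
    SELECT * FROM db.users TIMESTAMP AS OF '2024-11-01 00:00:00';

    -- or as of a specific snapshot id
    SELECT * FROM db.users VERSION AS OF 1234567890123;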

levkk 5 days ago | prev | next |

Moving data between systems is problematic. Where this product is actually needed (multi-TB databases under load) is where logical replication won't be able to sync your tables in time. Conversely, small databases where this will work don't really need columnar storage optimizations.

woodhull 5 days ago | root | parent | next |

For my use case of something similar on Clickhouse:

We load data from postgres tables that are used to build Clickhouse Dictionaries (a hash table for JOIN-ish operations).

The big tables do not arrive via real-time-ish sync from postgres but are bulk-appended using a separate infrastructure.

exAspArk 5 days ago | root | parent |

Would you be able to share how you implemented "bulk-appended using a separate infrastructure" at a high level?

exAspArk 5 days ago | root | parent | prev |

Fair point. We think BemiDB can currently be useful with small and medium Postgres databases. Running complex analytics queries on Postgres can work, but it usually requires tuning and adding indexes tailored to these queries, which may negatively impact write performance on the OLTP side or may not be possible if the queries are ad hoc.

> (multi-TB databases under load) is where logical replication won't be able to sync your tables in time

I think the ceiling for logical replication (and optimization techniques around it) is quite high. But I wonder what people do when it stops working at scale?

delive 5 days ago | root | parent |

What would you consider to be small or medium? I have a use case for analytics on ~1 billion rows that are about 1TB in postgres. Have you tried on that volume?

exAspArk 5 days ago | root | parent |

We haven't tested this with 1TB Postgres databases yet, assuming that most companies operating at this scale have already built analytics data pipelines :) I'm curious whether you currently move the data from this Postgres instance to somewhere else, or not yet?

delive 5 days ago | root | parent |

Not yet, mostly just kicked the can down the road due to costs. Like you said in another post, careful indexes on Postgres get you quite far, but they're not nearly as flexible as a columnar DB.

I think your project is great. I suspect incremental updates will be a big feature for most uptake (one we would need to try this out at least).

VoxPelli 5 days ago | prev | next |

The AGPL license is a no-go for me.

While it’s technically true that it’s an OSI-approved license, it’s mostly used to scare competing cloud vendors away from hosting the software, which isn’t in the spirit of OSS.

Have you looked into the more modern choices?

Like the Business Source License that MariaDB created and uses or the Functional Source License that Sentry created as an improvement over the Business Source License? https://fsl.software/

Both those licenses have a fair source phase that automatically resolves into an open source phase over time.

Thus one gets the best of both worlds: an honest descriptive license for protecting one’s business model + a normal permissive OSS license that ensures longevity and prevents lock-in.

senorrib 5 days ago | root | parent | next |

You’re seriously calling out a perfectly valid OSS license for not being “in the spirit of OSS”, and pitching licenses that are explicitly NOT OSS?!

AGPL couldn’t be more in the spirit of OSS. The entire free software movement started to defend the _users_ freedom, not individual companies’.

VoxPelli 5 days ago | root | parent |

AGPL is often used by startups to achieve the effect of “fair source” licenses while still being able to claim to be fully OSS.

AGPL is a poor “fair source” license and a controversial OSS license.

AGPL in “fair source” projects is always paired with Contributor License Agreements etc. that ensure the company behind it owns all the rights to the project and can re-license it however it wants without having to abide by the AGPL itself. That is not in the spirit of OSS at all. And if the company goes out of business, the AGPL makes it really hard for anyone else to eventually pick up and continue the “fair source” business model of the project.

Using a proper “fair source” license like the Business Source License or the Functional Source License will make for a better and more honest “fair source” phase of the project and both those licenses will resolve to non-controversial OSS licenses over time.

So:

- Is “fair source” in the spirit of “open source”? No

- Is AGPL an “open source” license? Yes

- Is AGPL often used to OSS-wash “fair source” projects? Yes

- Are all AGPL projects “fair source” projects? No, there exists proper OSS-projects with AGPL licenses

- Is AGPL in startup projects almost always paired with CLAs that make the startup play by different rules than everyone else? Yes

- Is it more in the spirit of “open source” or “fair source” to have the startup play by different rules than everyone else? In the spirit of “fair source”

- Is AGPL a good choice for “fair source” projects? No, it perpetuates the different rules between the startup and the community forever, even after the startup has ceased to exist, whereas proper “fair source” licenses convert to less extreme OSS licenses over time

Edit: The only company that I know that does an AGPL project in an “open source” spirit rather than a “fair source” spirit is Sourcehut: https://sourcehut.org/blog/2022-10-09-ip-assignment-or-lack-...

exAspArk 5 days ago | root | parent | prev | next |

Our philosophy in general is to go to a more open license over time (vs the other direction). So we might consider other more permissive OSI-approved licenses.

Would you be able to share why the AGPL license is a no-go for you? I'm genuinely curious about your use case. In simple words, it'd require a company to open source their BemiDB code only if they modified it and distributed it to other users (modifying it and using it internally is allowed without any restrictions).

VoxPelli 5 days ago | root | parent | prev |

So you have Contributor License Agreements that enable you to relicense to something like GPL or MIT whenever you want to?

Because AGPL itself cannot be relicensed, and it even needed special wording when it was created to become GPL-compatible.

Since you’re a startup I believe that you use AGPL to achieve the “fair source” idea (https://fair.io/) – where you yourself can provide a hosted service without providing all your source code, while hoping others won’t do the same, as they would need to provide all of theirs.

In simplified terms: It’s an anti-AWS defense. Helping you avoid being outcompeted by a big cloud vendor using your project without paying anything for it.

And AGPL is a poor “fair source” license, especially as it was never designed to be a “fair source” license but also since it doesn’t give the impression of “fair source” but rather the impression of “open source”, giving an almost deceptive and dishonest look.

And since AGPL is perpetual, unlike eg BSL and FSL, one is stuck with the “fair source” license forever, rather than having it converted into a mainstream OSS license over time. If your company goes away and the project gets abandoned (which is sadly the most likely outcome for most startups), it can make it hard for someone else to eventually pick up the project and build a company around it using similar “fair source” principles to yours.

If you are doing like what SourceHut is doing (https://sourcehut.org/blog/2022-10-09-ip-assignment-or-lack-...) and going all in on AGPL and playing by its rules yourself as well and treating it as “open source” rather than “fair source”, then well done!

I would still likely want to avoid the legalese complexity of AGPL in my stack though, and try to generally stick to permissive licenses and the occasional GPL-licensed project, like eg Linux. And eg Google has similar guidelines for using OSS code internally.

riiii 5 days ago | root | parent | prev |

Because you want to take their hard work, modify it and not share it back to the community?

I'm not crying that "it's not for you".

VoxPelli 5 days ago | root | parent |

Absolutely not.

I want to keep the legalese in my setups at a manageable level, avoid needless lock-ins (AGPL is one of the least compatible licenses out there; code from an AGPL project can only ever be used in another AGPL project), and have it be clear when I contribute to a proprietary project in disguise (which most startup AGPL projects are, thanks to CLAs) and when I contribute to an actual OSS project.

leighleighleigh 5 days ago | prev | next |

Definitely checking this out today! I use postgres for ~30 GB of machine learning data (object detection) and have a couple workflows which go through the Postgres->Parquet->DuckDB processing route.

A couple questions, if you have time:

1. How do you guys handle multi-dimensional arrays? I've had issues with a few postgres-facing interfaces (libraries or middleware) where they believe everything is a 1D array!

2. I saw you are using pg_duckdb/duckdb under the hood. I've had issues calling plain-SQL functions defined on the postgres server, when duckdb is involved. Does BemiDB support them?

Thanks for sharing, and good luck with it!

exAspArk 5 days ago | root | parent |

Thank you, please give it a try!

Great questions:

1. We currently don't support multi-dimensional arrays, but we plan to add support for such complex data structures.

2. Would you be able to share what type of user-defined functions these are? Do they modify the data or only read it?

leighleighleigh 5 days ago | root | parent |

1. good to hear! 2. The bulk of them are convenience wrappers which resolve UUIDs into other values, so most are read-only with only a single table lookup.

globular-toast 5 days ago | prev | next |

I don't get how this would do away with the need for some kind of ETL. Most apps use highly normalised schemas that are completely unsuitable for analytical users. Not to mention you wouldn't want to couple your app schema to your warehouse schema. Am I missing something? How would this replace traditional data warehousing?

exAspArk 5 days ago | root | parent |

Good point. For more complex scenarios, people would still be able to implement, for example, a Medallion Architecture to progressively improve data quality and structure. Because it is Postgres- and Iceberg-compatible (database and data), it's possible to bring in other, more advanced data tools when data transformation and movement are needed. Currently, we see it as a Postgres read replica for analytics. But it's easy to imagine that in the future it could be used as a standalone OSS database on top of a data lakehouse with an open format in S3.

globular-toast 4 days ago | root | parent |

Cool, I can definitely see this smoothing the path towards a full DW solution, assuming that is ever needed. Could you see it working with something like dbt, say doing transformations in a dedicated pg database then serving the transformed data to users via the read replica?

Out of interest, do you know any good resources covering the current state of data engineering? I find the area quite impenetrable compared to software engineering. Almost like much of it is trade secrets and passed down knowledge and none of it written down.

exAspArk 3 days ago | root | parent |

Our plan is to make BemiDB work with dbt by leveraging the Postgres-compatibility (supported dbt adapters https://docs.getdbt.com/docs/trusted-adapters). So it should be possible to transform data from Postgres or directly from BemiDB, which may actually perform better.
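For illustration, a dbt model is just a SQL file, so something along these lines could run through a Postgres adapter (a sketch; the model, source, and column names are made up):

    -- models/daily_orders.sql
    select
        date_trunc('day', created_at) as day,
        count(*) as orders
    from {{ source('public', 'orders') }}
    group by 1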

You're right, the data engineering world is complex, constantly evolving, and has many different solutions. I'd also like to know about any good resources that people use :)

For us, we mostly talked to many potential users about their data setups and challenges, and had many conversations with friends and experts in this field. I also read a few weekly newsletters and Substacks, and follow people in this space on X (many recently started posting on Bluesky). For deeper research: reading docs and specs, experimenting, watching talks, listening to podcasts, reading subreddits, etc.

gregw2 5 days ago | prev | next |

So it looks like you don't use Postgres extensions, so you can run this on an EC2 instance against an Aurora Postgres instance and dump files to S3 in Iceberg format, right?

And can you then have Glue Catalog auto-crawl them and expose them in Athena? Or are they DuckDB-managed Iceberg tables essentially?

exAspArk 5 days ago | root | parent |

Exactly! You can run it on any server connecting to any Postgres, without installing custom extensions (AWS Aurora supports only a limited number of extensions https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQ...).

The Iceberg tables are created separately from the DuckDB query engine. So you should be able to read these Iceberg tables by using any other Iceberg-compatible tools and services like AWS Athena.
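For example, once the Iceberg table is registered in the Glue Data Catalog (e.g. via a crawler), a plain Athena query along these lines should work (the database and table names are placeholders):

    SELECT device_id, count(*) AS events
    FROM iot_lake.iot_data
    GROUP BY device_id;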

darkbatman 5 days ago | prev | next |

We do something quite similar, but through a Debezium/Kafka CDC pipeline to ClickHouse. This way the primary database is protected. Directly querying files synced from Postgres might get slower eventually, IMO, in comparison.

exAspArk 4 days ago | root | parent |

This is a great DIY setup. We're hoping to compress this stack and simplify it down to a single binary

jakozaur 5 days ago | prev | next |

Cool. Every database or data source (e.g. CRM) should produce Iceberg format for you.

Though I'm a little sceptical of embedding DuckDB. It is easy and better to isolate read/write paths, which has a lot of other benefits.

exAspArk 5 days ago | root | parent |

Iceberg for the win!

We actually separate the read/write paths. BemiDB reads by leveraging DuckDB as a query engine, and it writes to Iceberg completely separately from DuckDB. I'm curious if that's what you imagined.

jakozaur 5 days ago | root | parent |

Ideally, I would love many places writing to the Iceberg catalogue and then using it as a data lake.

In the data lake, I would have a gateway provisioning DuckDB on demand for each user.

partdavid 5 days ago | prev | next |

How does this replicate Postgres data? I glanced at the code and saw that it exports to a CSV file then writes out an Iceberg table for an initial snapshot--does it use Postgres logical replication?

exAspArk 5 days ago | root | parent |

Full table re-syncing is our initial solution. Using Postgres logical replication is next on our roadmap!

oulipo 5 days ago | root | parent |

So if we have a very large table, it means each time I replicate (how often does that happen?) it needs to rewrite the entire table and upload it to S3, right?

gigatexal 5 days ago | prev | next |

> Query Engine: embeds the DuckDB query engine to run analytical queries.

> Storage Layer: uses the Iceberg table format to store data in columnar compressed Parquet files.

Smart. Imma test this out for sure.

canadiantim 5 days ago | prev | next |

How does this compare to ParadeDB? Seems to occupy the same space

exAspArk 5 days ago | root | parent |

We love ParadeDB and their team. Their primary focus is search (Elasticsearch on Postgres), but they also have the pg_analytics Postgres extension (foreign data wrappers and embedded DuckDB).

The biggest difference is a Postgres extension vs a separate OLAP process. We want anyone with just Postgres to be able to run analytics queries without affecting resources in the transactional database, building and installing extensions (which might not be possible with some hosting providers), dealing with dependencies and their versions when upgrading Postgres, manually syncing data from Postgres to S3, etc.

neeleshs 5 days ago | prev | next |

Congratulations! I was looking at pg_analytics from ParadeDB hoping this use case would be solved (the dump from Postgres to Parquet part), but it doesn't do it yet.

How does it handle updates?

exAspArk 5 days ago | root | parent |

Thank you!

The pg_analytics Postgres extension partially supports different file formats. We bet big on the Iceberg open table format, which uses Parquet data files under the hood.

Our initial approach is to do periodic full table resyncing. The next step is to support incremental Iceberg operations like updates. This will involve creating a new "diff" Parquet file and using the Iceberg metadata to point to the new file version that changes some rows. Later this will enable time travel queries, schema evolution, etc.

anentropic 5 days ago | prev | next |

I'm looking for low latency queries over not-very-big data (40-100M rows) in user-facing dashboards

How does the latency of Iceberg-on-S3 compare to say an EBS volume?

exAspArk 5 days ago | root | parent |

I'd say that querying data from S3 is not ideal when low-latency queries are required. Generally, there could be a few roundtrip requests to fetch metadata (JSON, Avro) and data (Parquet) files, which may lead to around 1s or so latency. However, we have caching on our roadmap (it could be just a simple TTL for the fetched data or some more sophisticated caching depending on the synced & queried data)

anentropic 5 days ago | root | parent |

How tied is it to S3?

Could I easily point it to Iceberg tables on another storage target?

exAspArk 4 days ago | root | parent |

Yes!

BemiDB natively supports two storage layers, a local disk and S3 (we assumed most people would choose the latter in production environments to simplify management).

When I query Iceberg tables stored on SSD, it works superfast.

hoerzu 5 days ago | prev | next |

Can you give an example if I have 5 GB (2 million rows)?

How will it be stored differently for columnar access?

exAspArk 5 days ago | root | parent |

We ran some benchmarks (TPC-H, designed for OLAP) with ~10M records https://github.com/BemiHQ/BemiDB#benchmark

The BemiDB storage layer produced ~300MB columnar Parquet files (with ZSTD compression) vs 1.6GB of data in Postgres.

Sesse__ 5 days ago | root | parent |

Does TPC-H SF1 really take _one and a half hours_ for you on regular Postgres? Last time I tried (in the form of DBT-3), it was 22 queries and most of them ran in a couple seconds.

exAspArk 5 days ago | root | parent |

Interesting. I haven't used the DBT-3 kit, does it add any indexes? I manually added these Postgres indexes https://github.com/BemiHQ/BemiDB/blob/main/benchmark/data/cr... to reduce the main bottlenecks on SF0.1 and reduce the total time from 1h23m13s to 1.5s. But SF1 still took more than 1h

Sesse__ 5 days ago | root | parent |

It adds a bunch of indexes, yes. I don't think anyone really runs TPC-H unindexed unless they are using a database that plain doesn't support it; it wouldn't really give much meaningful information.

Edit: I seemingly don't have these benchmarks anymore, and I'm not going to re-run them now, but I found a very (_very_) roughly similar SF10 run clocking in around seven minutes total. So that's the order of magnitude I would be expecting, given ten times as much data.

exAspArk 5 days ago | root | parent |

Got it, thanks for sharing it! We'll try to look into DBT-3 and the indexes it creates to test with SF10

winddude 5 days ago | prev | next |

What's the difference to something like DuckDB?

woodhull 5 days ago | prev | next |

As much as DuckDB is cute I've mostly come to believe that Clickhouse is the perfect thing to pair Postgres with. This is especially true now that they've acquired PeerDB and are integrating it into the Clickpipes cloud product.

DuckDB is neat, and I understand why a company like BemiDB would build their product on top of it, but as a prospective customer embedded databases are a weird choice for serious workloads when there are other good open-source solutions like Clickhouse available.

maxmcd 5 days ago | root | parent | next |

Using duckdb and apache iceberg means that you can run read replicas without any operational burden. Clickhouse is amazing, but they do not allow you to mount dumb read replicas to object storage (yet).

I can imagine this product is a very elegant solution for many types of companies/teams/workloads.

zX41ZdbW 5 days ago | root | parent |

You can mount read replicas on object storage in ClickHouse.

Example:

    CREATE DATABASE test;
    USE test;

    CREATE TABLE hackernews_history UUID '66491946-56e3-4790-a112-d2dc3963e68a'
    (
        `update_time` DateTime DEFAULT now(),
        `id` UInt32,
        `deleted` UInt8,
        `type` Enum8('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
        `by` LowCardinality(String),
        `time` DateTime,
        `text` String,
        `dead` UInt8,
        `parent` UInt32,
        `poll` UInt32,
        `kids` Array(UInt32),
        `url` String,
        `score` Int32,
        `title` String,
        `parts` Array(UInt32),
        `descendants` Int32
    )
    ENGINE = ReplacingMergeTree(update_time)
    ORDER BY id
    SETTINGS disk = disk(readonly = true, type = 's3_plain_rewritable', endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/', use_environment_credentials = false);
And you can try it right now.

Install ClickHouse:

    curl https://clickhouse.com/ | sh
    ./clickhouse local
Run the query above to attach the table.

The table is updated in real time. For example, here is your comment:

    :) SELECT * FROM hackernews_history WHERE text LIKE '%Clickhouse is amazing%' ORDER BY update_time \G

    Row 1:
    ──────
    update_time: 2024-04-06 16:35:28
    id:          39785472
    deleted:     0
    type:        comment
    by:          mightybyte
    time:        2024-03-21 22:59:20
    text:        I&#x27;ll second this.  Clickhouse is amazing.  I was actually using it today to query some CSV files.  I had to refresh my memory on the syntax so if anyone is interested:<p><pre><code>  clickhouse local -q &quot;SELECT foo, sum(bar) FROM file(&#x27;foobar.csv&#x27;, CSV) GROUP BY foo FORMAT Pretty&quot;
    </code></pre>
    Way easier than opening in Excel and creating a pivot table which was my previous workflow.<p>Here&#x27;s a list of the different input and output formats that it supports.<p><a href="https:&#x2F;&#x2F;clickhouse.com&#x2F;docs&#x2F;en&#x2F;interfaces&#x2F;formats" rel="nofollow">https:&#x2F;&#x2F;clickhouse.com&#x2F;docs&#x2F;en&#x2F;interfaces&#x2F;formats</a>
    dead:        0
    parent:      39784942
    poll:        0
    kids:        [39788575]
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    Row 2:
    ──────
    update_time: 2024-04-06 18:07:34
    id:          31334599
    deleted:     0
    type:        comment
    by:          richieartoul
    time:        2022-05-11 00:54:31
    text:        Not really. Clickhouse is amazing, but if you want to run it at massive scale you’ll have to invest a lot into sharding and clustering and all that. Druid is more distributed by default, but doesn’t support as sophisticated of queries as Clickhouse does.<p>Neither Clickhouse nor Druid can hold a candle to what Snowflake can do in terms of query capabilities, as well as the flexibility and richness of their product.<p>That’s just scratching the surface. They’re completely different product categories IMO, although they have a lot of technical &#x2F; architectural overlap depending on how much you squint.<p>Devil is in the details basically.
    dead:        0
    parent:      31334527
    poll:        0
    kids:        [31334736]
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    Row 3:
    ──────
    update_time: 2024-11-07 22:29:09
    id:          42081672
    deleted:     0
    type:        comment
    by:          maxmcd
    time:        2024-11-07 22:13:12
    text:        Using duckdb and apache iceberg means that you can run read replicas without any operational burden. Clickhouse is amazing, but they do not allow you to mount dumb read replicas to object storage (yet).<p>I can imagine this product is a very elegant solution for many types of companies&#x2F;teams&#x2F;workloads.
    dead:        0
    parent:      42080385
    poll:        0
    kids:        []
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    3 rows in set. Elapsed: 3.981 sec. Processed 42.27 million rows, 14.45 GB (10.62 million rows/s., 3.63 GB/s.)
    Peak memory usage: 579.26 MiB.

oulipo 5 days ago | root | parent | next |

When I try your code I get this, any idea?

Query id: daa202a3-874c-4a68-9e3c-974560ba4624

Elapsed: 0.092 sec.

Received exception: Code: 499. DB::Exception: The AWS Access Key Id you provided does not exist in our records. (Code: 23, S3 exception: 'InvalidAccessKeyId'): While processing disk(readonly = true, type = 's3_plain_rewritable', endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/', use_environment_credentials = false). (S3_ERROR)

zX41ZdbW 4 days ago | root | parent |

It can be a wrong AWS profile in your configuration. In that case, you can correct the configuration, or use an empty server, docker container, or even CloudShell.

oulipo 5 days ago | root | parent | prev | next |

Very interesting, can you give more info on how this could be used for instance in my IoT case where I want to keep the last 3 months (say) of data in Postgres, and dump old data in parquet/iceberg on S3, and be able to do analytical queries on the past data? Would that be hard to do?

And how does the real-time update work? Could I make it so that my latest data is incrementally sync'd on S3 (eg "the last 3-months block" is incrementally updated efficiently each time there is new data) ?

Do you have example code / setup for this?

zX41ZdbW 4 days ago | root | parent |

You can store all data in ClickHouse (on S3 or on local storage); there is no need to separate historical and real-time data.

To insert data into ClickHouse, you use the INSERT query to insert data as frequently as you'd like.

Alternatively, you can set up continuous replication from Postgres to ClickHouse, which is available in ClickHouse Cloud.
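A minimal sketch of that (the table layout and names are just an example):

    CREATE TABLE iot_events
    (
        `device_id` UInt32,
        `ts` DateTime,
        `value` Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (device_id, ts);

    -- insert as frequently as you'd like, ideally in batches
    INSERT INTO iot_events VALUES (42, now(), 23.5);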

mdaniel 4 days ago | root | parent | prev | next |

> 3 rows in set. Elapsed: 3.981 sec. Processed 42.27 million rows, 14.45 GB (10.62 million rows/s., 3.63 GB/s.)

Since you were running $(./clickhouse local) does that mean the query downloaded 14.45GB out of S3 to your machine? The 3.981s seems to imply "no," but I struggle to think what meaning that output would otherwise try to convey

zX41ZdbW 4 days ago | root | parent |

This is the amount of processed data after decompression. The amount of compressed data is less. You can press the space bar during the query run to see the detailed metrics.

    ReadBufferFromS3Bytes                     8.95 GB
The amount of compressed data read from S3 is 8.95 GB. Note: that sounds quite large; it's interesting that compression is less than 2x on this dataset. Most likely, it uses just lz4.

I recommend trying it because it works everywhere. You can run ClickHouse on your laptop, on a VM, or even in Cloudshell in AWS.

maxmcd 5 days ago | root | parent | prev | next |

Whoops, I forgot that tables maintained in the db are not the same as remote archives. :|

zX41ZdbW 4 days ago | root | parent |

The setup in my example uses a table updated by one ClickHouse server and observed by an unlimited number of read-only replicas, which could run anywhere.

exAspArk 5 days ago | root | parent | prev | next |

ClickHouse is definitely a popular choice nowadays. I'm curious whether you self-host ClickHouse or use their Cloud? We wanted to make BemiDB as simple to run as possible with a single binary and object storage (vs large machines, big disks, clustering, running Temporal for CDC, etc.)

hipadev23 5 days ago | prev | next |

I don’t totally understand the fascination with storing analytical data on S3. It’s not fast, and if you’re in a write heavy environment it’s definitely not cheap either.

What’s with the avoidance of ClickHouse or DuckDB paired with insanely fast EBS or even physically attached storage? You can still back up to S3, but using S3 for live analytics queries is missing out on so much of the speed.

exAspArk 5 days ago | root | parent | next |

My few cents:

- Compute and storage separation simplifies managing a system making compute "ephemeral"

- Compute resources can be scaled separately without worrying about scaling storage

- Object storage provides much higher durability (99.999999999% on S3) compared to disks

- Open table formats on S3 become a universal interface in the data space, allowing you to bring in many other data tools if necessary

- Costs at scale can actually be lower since there is no data transfer cost within the same region. For example, you can check out WarpStream (Kafka on object storage) case studies that claim 5-10x savings

potamic 5 days ago | root | parent | prev | next |

Would love the authors to pitch in with their use cases, but I think most people simply do not need sub millisecond analytics. This is mostly replacing typical spark pipelines where you're okay with sub second latencies.

S3 is the cheapest, fully managed storage you can get that can scale infinitely. When you're already archiving to S3, doubling it for analytics saves cost and simplifies data management.

hipadev23 4 days ago | root | parent |

Sub-millisecond? Literally nobody in large data analysis is doing that. Can you cite some sources or show me what sort of setups (with any DB tech) that you're able to run meaningful queries on S3 sources with even sub-second latency?

broner 3 days ago | root | parent |

I'm doing this with ClickHouse querying Parquet files on S3 from an EC2 instance in the same region as the S3 bucket (yes, DuckDB is pretty similar). S3 time to first byte within AWS is 50ms, and I get close to saturating a big EC2 instance's 100Gb link doing reads. For OLTP-type queries fetching under 1 MB you'll see ~4 round trips plus transfer time of compressed data, so 150-200 ms latency.
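For reference, that kind of read looks roughly like this with ClickHouse's s3() table function (the bucket and column names are placeholders):

    SELECT device_id, avg(value)
    FROM s3('https://my-bucket.s3.eu-central-1.amazonaws.com/data/*.parquet', 'Parquet')
    GROUP BY device_id;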

hipadev23 3 days ago | root | parent |

Are you using the S3 local cache? Do you have heavy writes? What S3 disk type, if any, are you using (s3, s3_plain, s3_plain_rewritable)? Or are you just using the s3 functions?

ClickHouse is amazing, but I still struggle to get it working efficiently on S3, especially for writes.

broner 3 days ago | root | parent |

My workload is 100% read. Querying zstd parquet on s3 standard. Neither clickhouse nor duckdb has a great s3 driver, which is why smart people like https://www.boilingdata.com/ wrote their own. I compared a handful of queries and found DuckDB makes a lot of round trips and Clickhouse takes the opposite approach and just reads the entire parquet file.

nine_k 5 days ago | root | parent | prev | next |

S3 is a protocol understood by "everyone". If you're on AWS, as many are, it's basically the only natural choice. But a number of cloud providers, and a bunch of self-hostable software offer an S3 interface.

Clickhouse on local NVMe is one possible solution, but then you are married to that solution. An S3 interface is more universal and allows you to mix and match your tools, even though this comes at some expense.

ozgrakkurt 5 days ago | root | parent | prev | next |

It's immense amounts of marketing, plus laziness about implementing actual local storage with replication, etc. It just makes everything easier and is marketed a lot.

cocoflunchy 5 days ago | prev | next |

What I would really love is a dead simple way to:

1) connect to my transactional Postgres db

2) define my materialized views

3) have these views update in realtime

4) query these views with a fast engine

And ideally have the whole thing open source and be able to run it in CI

We tried peerdb + clickhouse but Clickhouse materialized views are not refreshed when joining tables.

Right now we’re back to standard materialized views inside Postgres refreshed once a day but the full refreshes are pretty slow… the operational side is great though, a single db to manage.

paurora 5 days ago | prev |

Very cool!! We have the same vision with pg_mooncake: https://github.com/Mooncake-Labs/pg_mooncake/tree/main

From what I understand, the BemiDB experience is akin to PeerDB + Clickhouse. It's not really a Postgres extension?

Glad open table formats are becoming mainstream, for everyone.

exAspArk 5 days ago | root | parent | next |

Thanks!

We love the pg_mooncake extension (and pg_duckdb used under the hood), although our approaches are slightly different. Long-term, we want to allow anyone to use BemiDB via native Postgres logical replication without installing any extensions (many Postgres hosting providers impose their own restrictions, upgrading versions might be challenging, OLAP queries may affect OLTP performance if run within the same database, etc.)

shayonj 5 days ago | root | parent | prev |

Absolutely stoked for pg_mooncake. I really want to see some of these things happening inside PG and taking advantage of the PG internals + native storage. The only bummer is adoption on the platforms where users currently are, say Aurora. But that's probably a problem for another day :)

P.S. The integration with something like Neon is really cool to see.