py3status v3.29

Almost 5 months after the latest release (thank you COVID) I’m pleased and relieved to have finally packaged and pushed py3status v3.29 to PyPi and Gentoo portage!

This release comes with a lot of interesting contributions from quite a bunch of first-time contributors so I thought that I’d thank them first for a change!

Thank you contributors!

  • Jacotsu
  • lasers
  • Marc Poulhiès
  • Markus Sommer
  • raphaunix
  • Ricardo Pérez
  • vmoyankov
  • Wilmer van der Gaast
  • Yaroslav Dronskii

So what’s new in v3.29?

Two new exciting modules are in!

  • prometheus module: to display your promQL queries on your bar
  • watson module: for the watson time-tracking tool

Then some interesting bug fixes and enhancements are to be noted

  • py3.requests: return empty json on remote server problem fix #1401
  • core modules: remove deprectated function, fix type annotation support (#1942)

Some modules also got improved

  • battery_level module: add power consumption placeholder (#1939) + support more battery paths detection (#1946)
  • do_not_disturb module: change pause default from False to True
  • mpris module: implement broken chromium mpris interface workaround (#1943)
  • sysdata module: add {mem,swap}_free, {mem,swap}_free_unit, {mem,swap}_free_percent + try to use default intel/amd sensors first
  • google_calendar module: fix imports for newer google-python-client-api versions (#1948)

Next version of py3status will certainly drop support for EOL Python 3.5!

Python scylla-driver: how we unleashed the Scylla monster’s performance

At Scylla summit 2019 I had the chance to meet Israel Fruchter and we dreamed of working on adding shard-awareness support to the Python cassandra-driver which would be known as scylla-driver.

A few months later, when Israel reached out to me on the Scylla-Users #pythonistas Slack channel with the first draft PR I was so excited that I jumped in the wagon to help!

The efforts we put into the scylla-driver paid off and significantly improved the performance of the production applications that I had the chance to switch to using it by 15 to 25%!

Before we reached those numbers and even released the scylla-driver to PyPi, EuroPython 2020 RFP was open and I submitted a talk proposal which was luckily accepted by the community.

So I had the chance to deep-dive into Cassandra vs Scylla architecture differences, explain the rationale behind creating the scylla-driver and give Python code details on how we implemented it and the challenges we faced doing so. Check my talk spage for

I also explained that I wrote an RFC on the scylla-dev mailing list which lead the developers of Scylla to implement a new connection-to-shard algorithm that will allow clients connecting to a new listening port to select the actual shard they want a connection to.

This is an expected major optimization from the current mostly random way of connecting to all shards and I’m happy to say that it’s been implemented and is ready to be put to use by all the scylla drivers.

I’ve recently been contacted by PyCon India and other Python related conferences organizers for a talk so I’ve submitted one to PyCon India where I hope I’ll be able to showcase even better numbers thanks to the new algorithm!

After my Europython talk we also had very interesting discussions with Pau Freixes about his work on a fully asynchronous Python driver that wraps the C++ driver to get the best possible performance. First results are absolutely amazing so if you’re interested in this, make sure to give it a try and contribute to the driver!

Stay tuned for more amazing query latencies 😉

py3status v3.28 – goodbye py2.6-3.4

The newest version of py3status starts to enforce the deprecation of Python 2.6 to 3.4 (included) initiated by Thiago Kenji Okada more than a year ago and orchestrated by Hugo van Kemenade via #1904 and #1896.

Thanks to Hugo, I discovered a nice tool by @asottile to update your Python code base to recent syntax sugars called pyupgrade!

Debian buster users might be interested in the installation war story that @TRS-80 kindly described and the final (and documented) solution found.

Changelog since v3.26

  • drop support for EOL Python 2.6-3.4 (#1896), by Hugo van Kemenade
  • i3status: support read_file module (#1909), by @lasers thx to @dohseven
  • clock module: add “locale” config parameter to change time representation (#1910), by inemajo
  • docs: update debian instructions fix #1916
  • mpd_status module: use currentsong command if possible (#1924), by girst
  • networkmanager module: allow using the currently active AP in formats (#1921), by Benoît Dardenne
  • volume_status module: change amixer flag ordering fix #1914 (#1920)

Thank you contributors

  • Thiago Kenji Okada
  • Hugo van Kemenade
  • Benoît Dardenne
  • @dohseven
  • @inemajo
  • @girst
  • @lasers

Scylla Summit 2019

I’ve had the pleasure to attend again and present at the Scylla Summit in San Francisco and the honor to be awarded the Most innovative use case of Scylla.

It was a great event, full of friendly people and passionate conversations. Peter did a great full write-up of it already so I wanted to share some of my notes instead…

This a curated set of topics that I happened to question or discuss in depth so this post is not meant to be taken as a full coverage of the conference.

Scylla Manager version 2

The upcoming version of scylla-manager is dropping its dependency on SSH setup which will be replaced by an agent, most likely shipped as a separate package.

On the features side, I was a bit puzzled by the fact that ScyllaDB is advertising that its manager will provide a repair scheduling window so that you can control when it’s running or not.

Why did it struck me you ask?

Because MongoDB does the same thing within its balancer process and I always thought of this as a patch to a feature that the database should be able to cope with by itself.

And that database-do-it-better-than-you motto is exactly one of the promises of Scylla, the boring database, so smart at handling workload impacts on performance that you shouldn’t have to start playing tricks to mitigate them… I don’t want this time window feature on scylla-manager to be a trojan horse on the demise of that promise!

Kubernetes

They almost got late on this but are working hard to play well with the new toy of every tech around the world. Helm charts are also being worked on!

The community developed scylla operator by Yannis is now being worked on and backed by ScyllaDB. It can deploy, scale up and down a cluster.

Few things to note:

  • it’s using a configmap to store the scylla config
  • no TLS support yet
  • no RBAC support yet
  • kubernetes networking is lighter on the network performance hit that was seen on Docker
  • use placement strategies to dedicate kubernetes nodes to scylla!

Change Data Capture

Oh boy this one was awaited… but it’s now coming soon!

I inquired about it’s performance impact since every operation will be written to a table. Clearly my questioning was a bit alpha since CDC is still being worked on.

I had the chance to discuss ideas with Kamil, Tzach and Dor: one of the thing that one of my colleague Julien asked for was the ability for the CDC to generate an event when a tombstone is written so we could actually know when a specific data expired!

I want to stress a few other things too:

  • default TTL on CDC table is 24H
  • expect I/O impact (logical)
  • TTL tombstones can have a hidden disk space cost and nobody was able to tell me if the CDC table was going to be configured with a lower gc_grace_period than the default 10 days so that’s something we need to keep in mind and check for
  • there was no plan to add user information that would allow us to know who actually did the operation, so that’s something I asked for because it could be used as a cheap and open source way to get auditing!

LightWeight Transactions

Another so long awaited feature is also coming from the amazing work and knowledge of Konstantin. We had a great conversation about the differences between the currently worked on Paxos based LWT implementation and the maybe later Raft one.

So yes, the first LWT implementation will be using Paxos as a consensus algorithm. This will make the LWT feature very consistent while having it slower that what could be achieved using Raft. That’s why ScyllaDB have plans on another implementation that could be faster with less data consistency guarantees.

User Defined Functions / Aggregations

This one is bringing the Lua language inside Scylla!

To be precise, it will be a Lua JIT as its footprint is low and Lua can be cooperative enough but the ScyllaDB people made sure to monitor its violations (when it should yield but does not) and act strongly upon them.

I got into implementation details with Avi, this is what I noted:

  • lua function return type is not checked at creation but at execution, so expect runtime errors if your lua code is bad
  • since lua is lightweight, there’s no need to assign a core to lua execution
  • I found UDA examples, like top-k rows, to be very similar to the Map/Reduce logic
  • UDF will allow simpler token range full table scans thanks to syntax sugar
  • there will be memory limits applied to result sets from UDA, and they will be tunable

Text search

Dejan is the text search guy at ScyllaDB and the one who kindly implemented the LIKE feature we asked for and that will be released in the upcoming 3.2 version.

We discussed ideas and projected use cases to make sure that what’s going to be worked on will be used!

Redis API

I’ve always been frustrated about Redis because while I love the technology I never trusted its clustering and scaling capabilities.

What if you could scale your Redis like Scylla without giving up on performance? That’s what the implementation of the Redis API backed by Scylla will get us!

I’m desperately looking forward to see this happen!

py3status v3.20 – EuroPython 2019 edition

Shame on me to post this so long after it happened… Still, that’s a funny story to tell and a lot of thank you to give so let’s go!

The py3status EuroPython 2019 sprint

I’ve attended all EuroPython conferences since 2013. It’s a great event and I encourage everyone to get there!

The last two days of the conference week are meant for Open Source projects collaboration: this is called sprints.

I don’t know why but this year I decided that I would propose a sprint to welcome anyone willing to work on py3status to come and help…

To be honest I was expecting that nobody would be interested so when I sat down at an empty table on saturday I thought that it would remain empty… but hey, I would have worked on py3status anyway so every option was okay!

Then two students came. They ran Windows and Mac OS and never heard of i3wm or py3status but were curious so I showed them. They could read C so I asked them if they could understand how i3status was reading its horrible configuration file… and they did!

Then Oliver Bestwalter (main maintainer of tox) came and told me he was a long time py3status user… followed by Hubert Bryłkowski and Ólafur Bjarni! Wow..

We joined forces to create a py3status module that allows the use of the great PewPew hardware device created by Radomir Dopieralski (which was given to all attendees) to control i3!

And we did it and had a lot of fun!

Oliver’s major contribution

The module itself is awesome okay… but thanks to Oliver’s experience with tox he proposed and contributed one of the most significant feature py3status has had: the ability to import modules from other pypi packages!

The idea is that you have your module or set of modules. Instead of having to contribute them to py3status you could just publish them to pypi and py3status will automatically be able to detect and load them!

The usage of entry points allow custom and more distributed modules creation for our project!

Read more about this amazing feature on the docs.

All of this happened during EuroPython 2019 and I want to extend once again my gratitude to everyone who participated!

Thank you contributors

Version 3.20 is also the work of cool contributors.
See the changelog.

  • Daniel Peukert
  • Kevin Pulo
  • Maxim Baz
  • Piotr Miller
  • Rodrigo Leite
  • lasers
  • luto

Scylla: four ways to optimize your disk space consumption

We recently had to face free disk space outages on some of our scylla clusters and we learnt some very interesting things while outlining some improvements that could be made to the ScyllaDB guys.

100% disk space usage?

First of all I wanted to give a bit of a heads up about what happened when some of our scylla nodes reached (almost) 100% disk space usage.

Basically they:

  • stopped listening to client requests
  • complained in the logs
  • wouldn’t flush commitlog (expected)
  • abort their compaction work (which actually gave back a few GB of space)
  • stay in a stuck / unable to stop state (unexpected, this has been reported)

After restarting your scylla server, the first and obvious thing you can try to do to get out of this situation is to run the nodetool clearsnapshot command which will remove any data snapshot that could be lying around. That’s a handy command to reclaim space usually.

Reminder: depending on your compaction strategy, it is usually not advised to allow your data to grow over 50% of disk space...

But that’s only a patch so let’s go down the rabbit hole and look at the optimization options we have.


Optimize your schemas

Schema design and the types your choose for your columns have a huge impact on disk space usage! And in our case we indeed overlooked some of the optimizations that we could have done from the start and that did cost us a lot of wasted disk space. Fortunately it was easy and fast to change.

To illustrate this, I’ll take a sample of 100,000 rows of a simple and naive schema associating readings of 50 integers to a user ID:

Note: all those operations were done using Scylla 3.0.3 on Gentoo Linux.

CREATE TABLE IF NOT EXISTS test.not_optimized
(
uid text,
readings list<int>,
PRIMARY KEY(uid)
) WITH compression = {};

Once inserted on disk, this takes about 250MB of disk space:

250M    not_optimized-00cf1500520b11e9ae38000000000004

Now depending on your use case, if those readings at not meant to be updated for example you could use a frozen list instead, which will allow a huge storage optimization:

CREATE TABLE IF NOT EXISTS test.mid_optimized
(
uid text,
readings frozen<list<int>>,
PRIMARY KEY(uid)
) WITH compression = {};

With this frozen list we now consume 54MB of disk space for the same data!

54M     mid_optimized-011bae60520b11e9ae38000000000004

There’s another optimization that we could do since our user ID are UUIDs. Let’s switch to the uuid type instead of text:

CREATE TABLE IF NOT EXISTS test.optimized
(
uid uuid,
readings frozen<list<int>>,
PRIMARY KEY(uid)
) WITH compression = {};

By switching to uuid, we now consume 50MB of disk space: that’s a 80% reduced disk space consumption compared to the naive schema for the same data!

50M     optimized-01f74150520b11e9ae38000000000004

Enable compression

All those examples were not using compression. If your workload latencies allows it, you should probably enable compression on your sstables.

Let’s see its impact on our tables:

ALTER TABLE test.not_optimized WITH compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'};
ALTER TABLE test.mid_optimized WITH compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'};
ALTER TABLE test.optimized WITH compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'};

Then we run a nodetool compact test to force a (re)compaction of all the sstables and we get:

63M     not_optimized-00cf1500520b11e9ae38000000000004
28M mid_optimized-011bae60520b11e9ae38000000000004
24M optimized-01f74150520b11e9ae38000000000004

Compression is really a great gain here allowing another 50% reduced disk space usage reduction on our optimized table!

Switch to the new “mc” sstable format

Since the Scylla 3.0 release you can use the latest “mc” sstable storage format on your scylla clusters. It promises a greater efficiency for usually a way more reduced disk space consumption!

It is not enabled by default, you have to add the enable_sstables_mc_format: true parameter to your scylla.yaml for it to be taken into account.

Since it’s backward compatible, you have nothing else to do as new compactions will start being made using the “mc” storage format and the scylla server will seamlessly read from old sstables as well.

But in our case of immediate disk space outage, we switched to the new format one node at a time, dropped the data from it and ran a nodetool rebuild to reconstruct the whole node using the new sstable format.

Let’s demonstrate its impact on our test tables: we add the option to the scylla.yaml file, restart scylla-server and run nodetool compact test again:

49M     not_optimized-00cf1500520b11e9ae38000000000004
26M mid_optimized-011bae60520b11e9ae38000000000004
22M optimized-01f74150520b11e9ae38000000000004

That’s a pretty cool gain of disk space, even more for the not optimized version of our schema!

So if you’re in great need of disk space or it is hard for you to change your schemas, switching to the new “mc” sstable format is a simple and efficient way to free up some space without effort.

Consider using secondary indexes

While denormalization is the norm (yep.. legitimate pun) in the NoSQL world this does not mean we have to duplicate everything all the time. A good example lies in the internals of secondary indexes if your workload can compromise with its moderate impact on latency.

Secondary indexes on scylla are built on top of Materialized Views that basically stores an up to date pointer from your indexed column to your main table partition key. That means that secondary indexes MVs are not duplicating all the columns (and thus the data) from your main table as you would have to do when denormalizing a table to query by another column: this saves disk space!

This of course comes with a latency drawback because if your workload is interested in the other columns than the partition key of the main table, the coordinator node will actually issue two queries to get all your data:

  1. query the secondary index MV to get the pointer to the partition key of the main table
  2. query the main table with the partition key to get the rest of the columns you asked for

This has been an effective trick to avoid duplicating a table and save disk space for some of our workloads!

(not a tip) Move the commitlog to another disk / partition?

This should only be considered as a sort of emergency procedure or for cost efficiency (cheap disk tiering) on non critical clusters.

While this is possible even if the disk is not formatted using XFS, it not advised to separate the commitlog from data on modern SSD/NVMe disks but… you technically can do it (as we did) on non production clusters.

Switching is simple, you just need to change the commitlog_directory parameter in your scylla.yaml file.

py3status v3.17

I’m glad to announce a new (awaited) release of py3status featuring support for the sway window manager which allows py3status to enter the wayland environment!

Updated configuration and custom modules paths detection

The configuration section of the documentation explains the updated detection of the py3status configuration file (with respect of XDG_CONFIG environment variables):

  • ~/.config/py3status/config
  • ~/.config/i3status/config
  • ~/.config/i3/i3status.conf
  • ~/.i3status.conf
  • ~/.i3/i3status.conf
  • /etc/xdg/i3status/config
  • /etc/i3status.conf

Regarding custom modules paths detection, py3status does as described in the documentation:

  • ~/.config/py3status/modules
  • ~/.config/i3status/py3status
  • ~/.config/i3/py3status
  • ~/.i3/py3status

Highlights

Lots of modules improvements and clean ups, see changelog.

  • we worked on the documentation sections and content which allowed us to fix a bunch of typos
  • our magic @lasers have worked a lot on harmonizing thresholds on modules along with a lot of code clean ups
  • new module: scroll to scroll modules on your bar (#1748)
  • @lasers has worked a lot on a more granular pango support for modules output (still work to do as it breaks some composites)

Thanks contributors

  • Ajeet D’Souza
  • @boucman
  • Cody Hiar
  • @cyriunx
  • @duffydack
  • @lasers
  • Maxim Baz
  • Thiago Kenji Okada
  • Yaroslav Dronskii

Bye bye Google Analytics

A few days ago, I removed Google Analytics from my blog and trashed the associated account.

I’ve been part of the Marketing Tech and Advertising Tech industries for over 15 years. I design and operate data processing platforms (including web navigation trackers) for a living. So I thought that maybe sharing the reasons of why I took this decision might be of interest for some people. I’ll keep it short.

MY convenience is not a enough reason to send YOUR data to Google

The first and obvious question I asked myself is why did I (and so many people) set up this tracker on my web site?

My initial answer was a mix of:

  • convenience : it’s easy to set up, there’s a nice interface, you get a lot of details, you don’t have to ask yourself how it’s done, it just works
  • insight : it sounded somewhat important to know who was visiting what content and somehow know about the interest of people visiting

With also a (hopefully not too much) of:

  • pride: are some blog posts popular? if so which one and let’s try to do more like this!

I don’t think those are good enough reasons to add a tracker that sends YOUR data to Google.

Convenience kills diversity

I’m old enough to have witnessed the rise of internet and its availability to (almost) everyone. The first things I did when I could connect was create and host my own web site, it was great and looked ugly!

But while Internet could have been a catalyst for diversity, it turned out to create an over concentration on services and tools that we think are hard to live without because of their convenience (and a human tendency for mimicry).

When your choices are reduced and the mass adoption defines your standards, it’s easy to let it go and pretend you don’t care that much.

I decided to stop pretending that I don’t care. I don’t want to participate in the concentration of web navigation tracking to Google.

Open Internet is at risk

When diversity is endangered so is Open Internet. This idea that a rich ecosystem can bring their own value and be free to grow by using the data they generate or collect is threatened by the GAFA who are building walled gardens around OUR data.

For instance, Google used the GDPR regulation as an excuse to close down the access to the data collected by their (so convenient) services. If a company (or you) wants to access / query this data (YOUR data) then you can only by using their billed tools.

What should have been only a clear win for us people turned out to also benefit those super blocks and threaten diversity and Open Internet even more.

Adding Google Analytics to your web site helps Google have a larger reach and tracking footprint on the whole web: imagine all those millions of web sites visits added together, that’s where the value is for them. No wonder GA is free.

So in this regard too, I decided to stop contributing to the empowerment of Google.

This blog is Tracking Free

So from now on if you want to share your thoughts of just let me know you enjoyed a post on this blog, take the lead on YOUR data and use the comment box.

The choice is yours!

py3status v3.16

Two py3status versions in less than a month? That’s the holidays effect but not only!

Our community has been busy discussing our way forward to 4.0 (see below) and organization so it was time I wrote a bit about that.

Community

A new collaborator

First of all we have the great pleasure and honor to welcome Maxim Baz @maximbaz as a new collaborator on the project!

His engagement, numerous contributions and insightful reviews to py3status has made him a well known community member, not to mention his IRC support 🙂

Once again, thank you for being there Maxim!

Zen of py3status

As a result of an interesting discussion, we worked on defining better how to contribute to py3status as well as a set of guidelines we agree on to get the project moving on smoothly.

Here is born the zen of py3status which extends the philosophy from the user point of view to the contributor point of view!

This allowed us to handle the numerous open pull requests and get their number down to 5 at the time of writing this post!

Even our dear @lasers don’t have any open PR anymore 🙂

3.15 + 3.16 versions

Our magic @lasers has worked a lot on general modules options as well as adding support for i3-gaps added features such as border coloring and fine tuning.

Also interesting is the work of Thiago Kenji Okada @m45t3r around NixOS packaging of py3status. Thanks a lot for this work and for sharing Thiago!

I also liked the question of Andreas Lundblad @aioobe asking if we could have a feature allowing to display a custom graphical output, such as a small PNG or anything upon clicking on the i3bar, you might be interested in following up the i3 issue he opened.

Make sure to read the amazing changelog for details, a lot of modules have been enhanced!

Highlights

  • You can now set a background, border colors and their urgent counterparts on a global scale or per module
  • CI now checks for black format on modules, so now all the code base obey the black format style!
  • All HTTP requests based modules now have a standard way to define HTTP timeout as well as the same 10 seconds default timeout
  • py3-cmd now allows sending click events with modifiers
  • The py3status -n / –interval command line argument has been removed as it was obsolete. We will ignore it if you have set it up, but better remove it to be clean
  • You can specify your own i3status binary path using the new -u, –i3status command line argument thanks to @Dettorer and @lasers
  • Since Yahoo! decided to retire its public & free weather API, the weather_yahoo module has been removed

New modules

  • new conky module: display conky system monitoring (#1664), by lasers
  • new module emerge_status: display information about running gentoo emerge (#1275), by AnwariasEu
  • new module hueshift: change your screen color temperature (#1142), by lasers
  • new module mega_sync: to check for MEGA service synchronization (#1458), by Maxim Baz
  • new module speedtest: to check your internet bandwidth (#1435), by cyrinux
  • new module usbguard: control usbguard from your bar (#1376), by cyrinux
  • new module velib_metropole: display velib metropole stations and (e)bikes (#1515), by cyrinux

A word on 4.0

Do you wonder what’s gonna be in the 4.0 release?
Do you have ideas that you’d like to share?
Do you have dreams that you’d love to become true?

Then make sure to read and participate in the open RFC on 4.0 version!

Development has not started yet; we really want to hear from you.

Thank you contributors!

There would be no py3status release without our amazing contributors, so thank you guys!

  • AnwariasEu
  • cyrinux
  • Dettorer
  • ecks
  • flyingapfopenguin
  • girst
  • Jack Doan
  • justin j lin
  • Keith Hughitt
  • L0ric0
  • lasers
  • Maxim Baz
  • oceyral
  • Simon Legner
  • sridhars
  • Thiago Kenji Okada
  • Thomas F. Duellmann
  • Till Backhaus