Programming or How to Exit Vi Without Rebooting

yea that page is absolutely infested with trackers and ads and stuff - I’m glad I’m not the only one who couldn’t make sense of it

I thought “Python Web Scraping Cookbook” was helpful when I was trying to scrape our internal Grafana dashboard. It has examples from a couple of different frameworks, and I was able to brute-force a scraper in a weekend with a lot of googling. The downside is that it’s a bit old, so some of the frameworks have changed, but if you like reading it may expand your background on the subject.

thanks. I’ll definitely check that out.

when you mention grafana, at first I thought there had to be a better way to do that than scraping it, but I’m not entirely sure now. we used to run a lot of triggers based on metrics grafana was getting, but I’m not sure if there was a way to export the actual data other than hitting the data sources directly

I’m using an enterprise datadog setup for one of my projects right now and liking it a lot more.
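fwiw, Grafana does ship an HTTP API, so you can often skip scraping the rendered dashboard entirely. One route I’ve seen is the datasource proxy endpoint, which forwards a query straight to the backing datasource. A rough sketch, assuming the dashboard is backed by Prometheus and you have an API token; the host name, datasource id, and function names here are all hypothetical:

```python
import json
import urllib.parse
import urllib.request

def grafana_proxy_query(base_url, datasource_id, promql, token):
    """Build the URL and headers for querying a Prometheus datasource
    through Grafana's datasource proxy endpoint."""
    url = (f"{base_url}/api/datasources/proxy/{datasource_id}/api/v1/query"
           "?" + urllib.parse.urlencode({"query": promql}))
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

def fetch_metric(base_url, datasource_id, promql, token):
    """Run the query and return the Prometheus-style result list."""
    url, headers = grafana_proxy_query(base_url, datasource_id, promql, token)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]["result"]
```

The nice part is you get structured JSON back instead of fighting the dashboard’s rendered HTML.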

The right way to do it is to get god damn access to the mother fucking database grafana pulls from and use that instead of trying to scrape a dashboard. I’m pretty much over it now though…

yea that’s not always possible though, particularly when wherever grafana is hosted is whitelisted to the db, but you’re not - lol godspeed

You are correct. In my case I worked for the company, and they never gave me access to the production database that I was responsible for getting data into… It kept me from saving our developers time and was pretty frustrating. Then we got bought, and I have whole new levels of not having access to worry about…

I’ve been ranting about stack overflow lately and this is exactly a scenario SO is terrible at:

“how can i scrape grafana metrics using python?”

SO neckbeards:

“you shouldnt do that, just hit the source directly”

“yea but I can’t, so how do I do it?”

question closed

What kind of realistic latency am I looking at if I make a normal webserver (express) connect to a redis server somewhere else and run simple operations there? What’s the round-trip time in ms, like less than 30? I want redis to hold application state that gets updated and read a lot, maybe 1-5 times per second. Does it make sense to do this?

this is a tough question - it depends on where you’re hosting it. is it a managed instance (highly recommended; redis is notoriously fussy)? Replicated? How close is it to the server that’s hitting it?

30 milliseconds could be realistic. Redis r/w latency is typically measured on the order of microseconds, but that’s for local, in-memory operations. Over a network hop you’re just gonna be bound by whatever your network latency is.

If it’s a managed AWS instance or something I think it’s pretty realistic. Application state? what kind of state? Redis is not super amazing at consistency (that’s why it’s so fast), so if you need strong guarantees there, idk.

I barely know how to answer your first few questions sadly.

Basically my current board game site is a couple-core box on digital ocean. On it run a redis server (only used for sessions), mongo, and a node/express app behind nginx. In that app, all application state is held in node local memory as json blobs: someone emits a websocket change (clicks on a game event, chats, etc.), that json blob gets modified, and it’s sent back to the subbed clients.

Obviously this doesn’t scale at all. After about 250 users node starts to crap out. If it crashes or restarts all state is gone.

What I want for the next version is application state not tied to express directly, i.e. multiple express instances can connect to something holding state, make async updates, and receive async subbed events. I thought redis hosted somewhere, but is that not what I should do here? It has to be consistently fast, preferably 100ms round trips or less.

you could go into the machine and try a few writes to see how fast they are. I would expect milliseconds to be reasonable, especially if redis is hosted on the same machine. redis can be really memory hungry, though. I’ve only run managed redis instances on kubernetes, which I feel is a different animal; they were always OOMing on their memory limits and restarting, which can cause you to lose writes if it’s running in distributed mode.
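your json-blob-plus-broadcast pattern actually maps onto redis pretty directly: keep the blob in a key, and use pub/sub so every instance hears about updates. A sketch in python rather than node, and all the key names are hypothetical:

```python
import json

def apply_event(state_blob, event):
    """Apply one websocket event to the serialized game-state blob and
    return the new serialization. Pure function, so every instance
    computes the same result from the same inputs."""
    state = json.loads(state_blob)
    if event["type"] == "chat":
        state.setdefault("chat", []).append(event["message"])
    elif event["type"] == "move":
        state.setdefault("board", {})[event["square"]] = event["player"]
    return json.dumps(state)

# With redis-py, each web instance would do roughly this on every event,
# then push the blob out to its own websocket clients:
#   blob = apply_event(r.get(f"game:{game_id}"), event)
#   r.set(f"game:{game_id}", blob)
#   r.publish(f"game:{game_id}:events", blob)  # other instances subscribe
```

One caveat: the read-modify-write above can race if two instances handle events for the same game at once, so in practice you’d want to make it atomic (WATCH/MULTI or a Lua script server-side).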

Why “somewhere else”? Could you not host all the frontend & state-holding instances in more or less the same place?

you can do that, of course

i’m just a backend engineer and my perspective of this stuff is some developer trying to pull 40 billion rows at once for absolutely no reason and then coming to me like “why is my app blowing up?”

it sounds like though that redis will be a huge improvement over what you have

that actually happens quite a lot on the kubernetes clusters I manage. They’re just EC2 machines running containers with a huge layer of abstraction over it. Lots of times I have to manipulate the locality of these things so that they end up on the exact same host machine, which makes it a huge PITA to get things to balance and scale in a sane manner. more and more over the last few years I want to run a monolith on one single giant machine. I guess this is becoming a more common sentiment.

The problem in my mind is: why would I want a box running redis + let’s say 4 express nodes on the same VPS ($)? What if I need another 4? I know these can’t be “serverless” since they need a persistent ws connection, but it seems to me, as not a backend engineer at all, that the right approach is simple express boxes that cap at 100 users or so, then spin up more, all connected to some cloud redis server (and cloud mongo)?

Also getting tired of DO. But it’s been damn easy keeping everything in one place, that’s for sure.

+1 to getting off DO

AWS is the superior product for nearly everything, but it is SO damn convoluted and esoteric. GCP is a bit easier to work with. i don’t touch azure lol

I’ve been contracted to rebuild a company’s EKS infrastructure, all AWS. Their apps aren’t super complicated - just a few DB instances, some lambda stuff, a little cross-account bs but nothing you can’t network your way out of.

basically the only requirement is they want the number of app replicas to scale with load (reasonable) and the cluster itself to scale with load (reasonable). the issue, though, is these are the most unruly apps I have ever seen. They’re java apps, which is fine, except on startup they will demand as much CPU as the host can give them. So first I was like, ok, I’ll just throttle them to a reasonable CPU limit and see what happens.

well, turns out their health/liveness endpoint does so much that if you throttle them they can’t seem to start up at all, and they just slowly pin the CPU on whatever machine they’re on if there are too many of them. kubernetes is too dumb to realize it needs to scale unless a pod has requested more than is available, so I can’t really let them all request a ton of CPU: these are 8-16 CPU machines, the app’s normal workload typically uses less than 1/4 of one vCPU, so it’d be super wasteful, and there are too many of them anyway (like 100 instances).

I ended up having to do some sorcery just to get them all to fire up without a cascading failure across the entire cluster, but now kubernetes wants to schedule them on just 1-2 nodes out of the 5 available. This is a problem because if too many try to scale up at once it’ll fuck up the host. more sorcery done, and it’s a little more evenly balanced, but still kinda dogshit: now the cluster can’t scale down unused nodes because I spread the apps all over the place, which kinda violates the requirement.
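for anyone curious, the knobs I was fighting with look roughly like this in a Deployment spec (a sketch with hypothetical names and numbers, not the actual config): a low steady-state CPU request with a higher limit for the startup burst, a startupProbe so the liveness check doesn’t kill slow-booting JVMs, and a topologySpreadConstraint so replicas don’t pile onto 1-2 nodes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unruly-java-app
spec:
  replicas: 4
  selector:
    matchLabels: {app: unruly-java-app}
  template:
    metadata:
      labels: {app: unruly-java-app}
    spec:
      # spread replicas across hosts instead of stacking them
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels: {app: unruly-java-app}
      containers:
        - name: app
          image: example/unruly-java-app:latest
          resources:
            # small steady-state request, bigger limit for the JVM startup burst
            requests: {cpu: 250m, memory: 512Mi}
            limits: {cpu: "2", memory: 1Gi}
          # give the app up to ~5 min to boot before liveness kicks in
          startupProbe:
            httpGet: {path: /healthz, port: 8080}
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet: {path: /healthz, port: 8080}
            periodSeconds: 15
```

note ScheduleAnyway is the soft version; DoNotSchedule enforces the spread but then you hit exactly the scale-down problem I described.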

more and more I think Kubernetes and the move to microservice architecture was a mistake. I’d rather just have a 64 CPU monster VM, and dump everything on to it, abstraction be damned. I’ll just deploy with bash scripts or something. so frustrating.

So to blow off steam I’m building a discord bot, it’s oddly relaxing to write terrible python code knowing no one is going to see it or care except me.

Well, then you do more work to support remote databases :slight_smile: Yeah, you don’t want serverless. As for “cloud X” (for X = redis/mongo), I would just note that, given your latency concerns, “cloud X” could also just be something you run on the same machine or in the same datacenter, and it should be very fast.

unpopular opinion: microservices good, kube (very) bad

Yea, I’ve seen solutions that just have a bunch of networked EC2s without all that crap on top, and it works totally fine.

The thing that pissed me off about this project is that scheduling is supposed to be what kubernetes is amazing at (the kube-scheduler), and the amount of coaxing I need to do to get it to do very basic things is kind of nutty to me. the best thing about k8s is the tooling built around it to get it to work, not k8s itself.

it is kinda nice, though, taking a helm chart with mostly default values, subbing in a few of your own, and wham, it just works. I hate working with docker.