As if there aren’t enough cases already, this one made a grand entrance into the club of royal screw-ups when it comes to messing with your disk partitions.
I supplied this command, but with fio instead of pbench-fio. I thought the options would be similar to what fio supports, so I ran it with --client=<IP of VM>, which it obviously didn’t support and complained about. The command I then ran over ssh was something like this:
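The exact invocation wasn’t preserved in this export. Below is a hypothetical reconstruction of what such a destructive fio run looks like (the job name, block size, and size are my assumptions); it’s deliberately built as a string and only echoed, since actually running it against /dev/sda clobbers the partition table — which is precisely the blunder described next.

```shell
# Hypothetical reconstruction -- NOT executed here, only printed.
# Pointing --filename at a raw disk bypasses the filesystem and
# overwrites whatever lives there, partition table included.
FIO_CMD="fio --name=randwrite --rw=randwrite --bs=4k --size=1G --direct=1 --filename=/dev/sda"
echo "$FIO_CMD"
```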
But I managed to mess it up, and this ran on the host’s /dev/sda instead of the VM’s. I didn’t realize the blunder at that moment, and later rebooted the machine. Obviously it wouldn’t boot up, so it fell into PXE boot mode. With zero pendrives within my reach, I PXE booted through my office’s network and thought I’d use the liveuser to recover. But that dropped into the dracut emergency shell. I went back home and finally used a live USB to boot the laptop. On checking for partitions with fdisk, I couldn’t see any. Whelp!
The install menu’s disk selection option didn’t show the partitions either.
So I downloaded testdisk [1] and followed its step-by-step instructions. Unfortunately, mine was a case of an encrypted ext4 partition, which isn’t something that version of testdisk supports (for listing files) out of the box. But on a ‘quick search’, testdisk did discover 2 partitions (Bootable and Primary), and on a ‘deeper search’, other partitions with labels P and D (Primary/Deleted [1]). I chose to write the quick-search partition table and then exited the tool.
Later, I ran another tool, PhotoRec [2], by the same author as testdisk, and that helped me recover 4 txt files and 3 LUKS files. I wrote the partitions it found to the table. It would only make sense if I could somehow decrypt and mount/scan the .LUKS files. I used an external hard drive, supplied it as an option to PhotoRec, and saved the recovered files to it.
Note that this was a hurdle I had to cross, since these are encrypted partitions. Generally speaking, if you’ve got N encrypted partitions but the initial sectors containing the LUKS header (the volume info needed for the passphrase) are overwritten, it’s often impossible to recover the data (as far as I understood from forum conversations). This could even be considered an advantage in cases where you’d like to wreck a hard drive on purpose: erasing the first few bytes does the job, since the data in the rest of the sectors is still encrypted.
Getting back to the point, I followed [3] and mapped those LUKS files to the VG pool (although unsuccessfully). The error I spotted (under /var/log/messages) was something along the lines of: vgchange buffer io error on device sda <snip snip> logical block: LUKS I/O error -5 writing to inode.
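For context, the mapping attempt from [3] boils down to something like the following. The file and volume-group names are my assumptions; the commands are shown through a dry-run wrapper because the real thing needs root and the recovered images.

```shell
# Dry-run wrapper: prints each command instead of running it.
# Replace 'echo "+ $*"' with '"$@"' to actually execute (as root).
run() { echo "+ $*"; }

run cryptsetup luksOpen f1.luks recovered0     # prompts for the old passphrase
run vgscan                                     # look for VGs inside the mapping
run vgchange -ay                               # activate whatever turns up
run mount /dev/mapper/vg_home-lv_home /mnt/recovered
```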
Side note: Neither in PhotoRec nor in TestDisk did I attempt to re-write the MBR table.
On the edge of Post-Traumatic Scan Disorder from testdisk and PhotoRec, I almost gave up, recalling suggestions from a few forums that it’s impossible to recover encrypted partitions if the header volume info wasn’t backed up. It all seemed gloomy and dark, and I went to sleep after leaving another PhotoRec scan running in the background.
A couple of hours later, I tried to wake the laptop screen but it wouldn’t (Fedora 24 KDE does that sometimes, it seems). Trudging back to the idea of starting over, I hard rebooted and decided to install the OS from scratch. But to my surprise, while booting up with the live USB, inside the install menu’s disk partitioning scheme option “I’ll configure partitioning myself”, I saw an “Unknown” partitioning dropdown. Now now, this hadn’t shown up earlier, so I went ahead and removed all partitions but the biggest one (which I knew was the home partition). I supplied the passphrase for the old encryption, specified a mount point /home, made the other partitions (all except biosboot [5]) and voila! All I had to do later was reinstall some rpm packages.
So to speak, I’m not sure what gave or how it worked, but it did! Tag-lining this one as a “comedy of errors and a tragedy to remember”.
PS: On recovery, I lost some disk space (~80 MiB), fragmented and squeezed in between those partitions. Also, my RAM shows up as 15.4 GiB instead of 16 GiB. Or maybe that’s just how accurate KDE’s System Monitor is?!
Before running a scale test, make sure to have the following items checked off your list:
An example of this, from a distributed systems application framework, could be:
Size of data transferred over network for a single workload (and for all workloads/functionalities associated with that application)
List of parameters not to be changed: CPU/RAM/Memory/Disk/partitions/Network bandwidth/OS/Installed Packages/Tuning parameters/…
I have been reading Brendan Gregg’s “Systems Performance: Enterprise and the Cloud” and am inclined to finish it soon. All of the points above have probably already been covered in detail in that book.
]]>I couldn’t afford to lose the data (and my shit). So I followed these steps to safely increase the size of an xfs partition that already lived on a logical volume.
Self-help commands and assumptions:
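The original list of commands was lost in this export; the kind of survey one would typically do before touching the table looks roughly like this (dry-run wrapper; the mount point is an assumption):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

run lsblk                  # block device / partition layout
run df -h                  # current filesystem usage
run pvs                    # LVM physical volumes
run vgs                    # volume groups
run lvs                    # logical volumes
run xfs_info /data         # confirm the filesystem really is xfs
```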
Delete and resize the partition. Let’s say we’re working with partition 1 here:
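The fdisk session itself was lost in the export. A hedged sketch of the keystrokes follows; the crucial part is recreating partition 1 with the same start sector, so the data underneath survives. It is stored and printed as text here, not piped into fdisk.

```shell
# Keystroke script for an interactive fdisk session (illustrative only).
FDISK_STEPS='fdisk /dev/sda
p    # print the table; note the start sector of partition 1
d    # delete
1
n    # recreate: primary, number 1, SAME start sector, larger end sector
p
1
<same start sector>
<new larger end sector>
w    # write the new table and exit'
echo "$FDISK_STEPS"
```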
Now, reboot the system. Run partprobe to make sure all went well.
Then, run this set of commands:
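The elided commands presumably walk the growth up the stack: physical volume first, then the logical volume, then the xfs filesystem. A sketch with assumed device and mount names (dry-run wrapper again, since this needs root):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

run pvresize /dev/sda1                        # let the PV claim the bigger partition
run lvextend -l +100%FREE /dev/vg00/lv_data   # grow the LV into the free extents
run xfs_growfs /data                          # xfs grows online, addressed by mount point
```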
An explanation of how the following metrics are placed in the satperf monitoring dashboard is described in the ‘Grafana Dashboard’ section. If you wanna copy these metrics, refer to this file [2] in the references.
The Grafana dashboard is divided into rows, and each row has one or more panels. Each of those panels queries specific metrics. Here’s how the metrics in the template [1] are grouped:
The rows are named as follows:
A sample “CPU ALL” row with all panels:
On clicking edit in one of the panels in that row, we get the metric query frame, like this:
The templating variables $Cloud, $Node, etc. are included at the top of the dashboard, like this:
So in the jinja template [1], we have multiple “title”(s) under the “rows” section, and variables from [2] are substituted into the template, as illustrated in the following piece of that jinja template:
{# Loop over per-process options here #}
{% for metrics in per_process_metrics %}
{
  "collapse": true,
  "editable": true,
  "height": "200px",
  "panels": [
  {% for panel in per_process_panels[item.process_list_name] %}
Additionally, "title": "CPU All" in the dashboard template [1] refers to one row of the dashboard.
Note: This post is a spin off from the main satperf project.
Satperf (link) is a Red Hat Satellite performance benchmarking & automation tool that makes it easy to quickly set up your own environment with Satellite components and start running workloads. It uses Ansible playbooks to manage remote execution and has built-in modules for the various roles that the Satellite Web UI has to offer. These are carried out with hammer commands (read more on Hammer) as an equivalent of the Satellite API.
Briefly, it does the following activities:
Satperf runs modules through Ansible playbooks, briefly described below under the ‘Ansible Playbooks in Satperf’ section.
This is to get the satellite installation + monitoring tools up and running.
Although there’s a README included with the project, let’s briefly go through the steps needed to get an install up and running.
It isn’t packaged right now, so first, clone this satperf repo.
From the project root, run $ source ./setup, then follow the output and act accordingly (install packages / activate the virtualenv).
Ensure that satperf help command succeeds: $ ./satperf.py -h
The above step should get the setup ready for configuration. Now, configure the following files as illustrated:
conf/satperf.conf: Add configurations specific to satperf.
For now, leave the [RHN], [Satellite], and [Pbench] sections untouched.
In this post, we’re only configuring the satellite monitoring module of satperf.
If you already have the satellite setup running, skip below to ‘MONITORING INSTALLATION’.
Run $ ./satperf.py -s. You’d be asked the following; press [Enter], or type ‘n’ to skip if the component is already installed or you don’t want it.
Ignore any other components shown in the menu.
Although we’ve tested this internally and it works, this is currently WIP, so take a look at the logs if something fails, or feel free to open an issue upstream.
Run $ ./satperf.py -m. You’d be asked the following; press [Enter], or type ‘n’ to skip if the component is already installed or you don’t want it.
Although we’ve tested this internally and it works, this is currently WIP, so take a look at the logs if something fails, or feel free to open an issue upstream.
Build the dashboards, viewable through Grafana, by running $ ./satperf.py -i.
WIP status: this step currently fails with an error similar to this one.
2 major categories:
Satellite
Monitoring
For a description of metrics we collect about Satellite installation, refer to this post
Installs collectd on any or all of: satellite / capsule / docker hosts
Installs Graphite time-series database for metric storage of collected metrics on the configured graphite host
Installs Grafana time-series metric visualization framework, on the configured grafana host
We plan to integrate this github project - sat6_healthCheck
From Satellite’s latest release:
Red Hat Satellite 6.2 is now generally available, and includes the following major enhancements:
PXE-less discovery for existing systems
Installing collectd is trivial, and setting up continuous time-series metric collection around it should be just as simple. This post aims to help sysadmins set up collectd and connect it to a Graphite instance, so that all those metrics can later be viewed from a Grafana instance.
Note:
This post is a spin off from the main satperf project. To take a look at how satperf works, refer to this post
Install collectd on your system, and install a Graphite server elsewhere (recommended: a separate machine).
Once that’s installed, take a look at your /etc/collectd.conf and add plugins from the list below, as suitable:
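The plugin list didn’t survive the export; here’s a minimal sketch of what such a /etc/collectd.conf looks like. The plugin names and write_graphite options are standard collectd, while the Graphite hostname is an assumption:

```apache
Interval 10                    # seconds between reads (see note 1 below)

LoadPlugin cpu
LoadPlugin memory
LoadPlugin load
LoadPlugin disk
LoadPlugin interface
LoadPlugin processes
LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "graphite">
    Host "graphite.example.com"   # assumption: your Graphite server
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
  </Node>
</Plugin>
```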
Note: If you’re installing this on Red Hat Satellite, you might wanna make additional changes as per this template in satperf. For others, take a look at the above-mentioned link anyway; it serves as a generic reference for /etc/collectd.conf.
Replace the variable names in satperf’s collectd.conf referenced above as per the following defaults, or change them as suitable:
1) 10 refers to 10 seconds (the collection interval)
2) End results show up on the graphite-web UI under Metrics:
3) For candlepin password:
4) For Satellite Foreman password:
Once this is done, restart collectd and check the logs on the Graphite server to make sure you’re receiving data. Run iptables -F if you’re unable to send collectd metrics.
]]>(…the Rakefile of the Octopress source code).
So here’s a really short post on how to set it up.
Make an account on the GitHub website, and then make a new repository on GitHub called <username>.github.io.
Reference: I have a repo at http://github.com/arcolife/arcolife.github.io
Make sure you have git and ruby installed in your command-line environment. I had the following versions (lower versions, such as ruby 1.9, might work too):
ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-linux]
git version 2.5.5
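The elided commands here are, per the standard Octopress 2 setup, a clone of the upstream repo (dry-run wrapper; drop the echo to run for real):

```shell
run() { echo "+ $*"; }   # dry run; replace 'echo "+ $*"' with '"$@"' to execute

run git clone git://github.com/imathis/octopress.git octopress
run cd octopress
```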
If you now run bundle install && rake install, then IIRC, you’d get a failure. If not, the almighty gods are with you on this one, lad. Else,
make a change to group :development in the Gemfile by adding the line the installer recommends. When I tried to install the requirements, I got an error and a recommendation to add it; refer to this file under my repo, if in doubt.
Be advised that this micro-step may not be necessary in future releases or in your environment.
bundle install && rake install
Run this:
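Judging by the effect described next (your repo cloned into source/, branch flipped to source), the elided command is almost certainly Octopress’s setup task; sketched as a dry run:

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

# Asks for your repository URL, e.g. git@github.com:<username>/<username>.github.io.git
run rake setup_github_pages
```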
Follow the instructions and supply input as asked.
This would clone and download your repo to a folder called source and change your branch from master to source. You should check this with git remote -v.
I got origin URLs pointing at my own repo, and on running git branch, it displayed * source.
If you then run git status, you’d get some output showing local changes, which is fine. Let’s move on to the next step.
Side note: If this fails to change your branch to source (i.e., you don’t have your username in the origin URLs in the output of git remote -v), then you’d have to do this step manually, like this:
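A hedged version of that manual fix (substitute your own username; dry-run wrapper as before):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

run git remote rm origin       # only if a stale origin already exists
run git remote add origin git@github.com:username/username.github.io.git
run git checkout -b source     # or rename: git branch -m master source
```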
This should now display * source.
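The elided command is presumably Octopress’s preview task (dry run):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

run rake generate   # build the static site into public/
run rake preview    # serve it at http://localhost:4000, regenerating on change
```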
This should start a server on http://localhost:4000, and you should be able to view a default, colorful Octopress page.
Mine is with default settings so I haven’t tried this out.
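The elided block is, per the standard Octopress flow, the generate/deploy sequence plus a push of the markdown sources (dry run; the commit message is an example):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

run rake generate                     # build the site into public/
run rake deploy                       # push the generated site to the master branch
run git add .
run git commit -m "add new post"      # commit the markdown sources
run git push origin source            # back the sources up on the source branch
```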
This might take up to 5 minutes sometimes, or may be a matter of a few seconds, depending on how the update job runs and the load on GitHub’s servers. Refresh and check at intervals.
This would generate a new post and display a message with the path of the created file.
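The post-creation command is Octopress’s new_post task; the title here is an example (dry run):

```shell
run() { echo "+ $*"; }   # dry run; drop the echo to execute for real

# Creates something like source/_posts/2016-09-01-my-first-post.markdown
run rake "new_post[My First Post]"
```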
Edit that file and write up a post using markdown format.
Repeat the steps in Step 6 to deploy.
Some of these tips are well covered under Octopress' blogging basics and under octopress' github README config section but just pointing them out for quickstart references:
Open _config.yml and edit the following:
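The elided snippet covers the basic identity settings; these keys exist in Octopress’s stock _config.yml (the values are placeholders):

```yaml
title: Your Blog Title
subtitle: A tagline that shows up in the banner
author: Your Name
```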
Repeat the steps from Step 6 to see the changes live on your username.github.io.
Edit _config.yml and change the following (change your usernames accordingly):
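The elided snippet sets the sidebar integrations; both keys are in Octopress’s stock _config.yml (the values are placeholders):

```yaml
github_user: username
twitter_user: username
```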
Adding asides (see the ‘asides’ section below on adding a twitter feed).
If you wanna insert images, add them under source/images, and you could either:
Refer to their full path like this:
![Sample image caption](https://raw.githubusercontent.com/username/username.github.io/master/images/sample.png)
Replace username with your GitHub username, and images/sample.png with whatever path your images are in under source/.
Or give a relative path like this:
![Image caption](../images/sample.png)
The second option is hackish & works based on your directory structure. There are other ways of doing this as well
asides (panel contents)
Edit _config.yml as follows:
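The elided snippet is the default_asides list from Octopress’s stock _config.yml; reorder or trim it to change the sidebar panels (this particular selection is illustrative):

```yaml
default_asides: [asides/recent_posts.html, asides/github.html]
```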
You could add your custom asides or make changes to the layout via the files under the source/_includes folder.
For example, to add your twitter feed, you’d wanna take a look at this blog post which tells you how to add that and why it was removed as a default.
This is to track the location/count of visitors to your website. Open _config.yml and edit the following:
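The elided line is the tracking-ID setting from Octopress’s stock _config.yml (the ID is a placeholder):

```yaml
google_analytics_tracking_id: UA-XXXXXXXX-X
```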
You’d need to have an account on this google analytics website to get an ID. There’s a limit on number of free IDs allotted to your google account. Check out the T&Cs on their website.
If you wanna enable Disqus-based comments, you need an account on the Disqus admin website, you need to add the URL <username>.github.io under trusted domains at https://<disqus admin username>.disqus.com/admin/settings/advanced/, and you need to configure your community name (which is nothing but a setting for your site) under https://<disqus admin username>.disqus.com/admin/settings/general/
If you’d like to link up static content (fully supporting its own html/css layout), just add it under source/<new_dir>/ and you should be able to access it under username.github.io/<new_dir>
Then, go ahead and edit _config.yml accordingly.
When you’re writing a blog post, toggle true/false or add categories in the front matter of your files under _posts/.
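The elided snippet is the YAML front matter at the top of a post file; layout/title/comments/categories/published are standard Octopress keys (the values are examples):

```yaml
---
layout: post
title: "My First Post"
date: 2016-09-01 12:00
comments: true
categories: [linux, blogging]
published: false
---
```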
A push to GitHub with published: false means the markdown has been safely committed to the source branch of your repo, but the post is not publicly visible. This way, you can keep working on the file and flip it to true when it’s ready.
All that said, I still haven’t added an ‘about’ section to my blog, or successfully moved it under a custom blog/ folder, or changed the default page to something else, or made changes to the default theme. If you’ve done that, please add a comment here on how you did it, or a link if you’ve blogged about it. Would save me some time, eh! :)
That’s it for now, folks!
Share it across, and post a comment if you’d like to add something to this post, want any edits made, found a flaw in the flow, or would simply like to let me know if this helped.
Cheers!
]]>Sarjitsu, or SAR Jitsu, is a one-stop shop for people who’re looking to visualize their system data, based on the System Activity Report (SAR) data generated by sysstat. Screenshots from the app are near the end of this post.
It has been open sourced under Distributed System Analysis efforts. Take a look!
Usage amounts to throwing SA binary files at an instance of sarjitsu’s web app and extracting the following valuable information out of them:
Sometimes issues just can’t be detected from live monitoring, and we might need to use historical data. For live monitoring, tools like collectd already exist; if you’ve got your data stored in the form of an SA binary file, Sarjitsu can visualize it.
Sarjitsu would enable you to:
Debug issues with your machine by presenting time series data collected on a minute’s granularity level. This is entirely configurable through the SAR setup that your system has.
Figure out where the bottleneck lies by presenting all the information under a common y-axis and correlate results
Visualize pbench results (additional feature). It comes with a command-line tool which you can point at the results dir of a pbench run; it will find all the sar.data files and visualize them.
It is built in a manner that can scale up real fast, because of its ease of use and because it is based on Docker containers, around which a whole ecosystem exists for scaling up. If you’re willing to use this in production, send us a holler and we’ll get back to you (refer to the section at the end).
The project is divided into 5 components:
The project is still in alpha, since there have been a lot of drastic changes over the past year. In fact, I’m still waiting for a few to be included in the core of Grafana’s code (PR #4694, which I’ve submitted). Earlier last year, I had released an elasticsearch-graphite shim of sorts that made it possible for data to be sourced from ES into Grafana. Later on, though, native support was included and I had to move on. (Yeah, it was difficult to let go of the python-django project!)
The usage of nested documents within Grafana, sourced from Elasticsearch, depends on the presence of this feature (PR #4694). Of course, if someone is willing to make a Fedora Copr based custom packaging of this project, I’d be really glad and might even invite you to my annual Great Gatsby themed party.
Once again, checkout the project at DSA and please contribute! :)
Service Discovery for individual components.
Of course it’s only as good as you make it.
So more on this later. Feel free to open a feature request on github and maybe even send a PR!
I’ve opened a list of issues and labelled them accordingly on this Github issues list for Sarjitsu. Feel free to start a discussion under a github issue’s comments section, or submit a Pull Request for it. Cheers and happy debugging!
We hangout on #pbench under FreeNode on IRC. Feel free to ping me there or tweet about it (I’m @arcolife).
If you think there is a significant improvement possible to the architecture of this project, or a potential collaboration opportunity with one of your own projects, contact us right now and we could take this further. Refer to this page on more info about contacting me or say hello on #pbench under FreeNode on IRC.
With Love,
- Team DSA
]]>My intro page has now been moved to /intro.
I’ve noted down a list of posts I’ve been waiting to blog about for a long time now. Finally, I’ve got something better than the free version of WordPress to try this out. My older blog, Arcolife’s Digital Sweat, is still active. This one’s a purely work-related blog, or at least I plan to keep it so, for now.
]]>