Compare commits
15 Commits
e7fd10d3a5
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| | 601b70a35d | | |
| | 5e2acc054d | | |
| | 6c87737393 | | |
| | 46268af42d | | |
| | 799cf76b44 | | |
| | 939cc4c5ee | | |
| | 3c28f1aa76 | | |
| | 09fcf0fe6d | | |
| | 90a0f6514d | | |
| | 372376b445 | | |
| | 337fd19dc7 | | |
| | 5793523215 | | |
| | 3474918972 | | |
| | 59997a8831 | | |
| | 67ed4869ea | | |
86
ElasticSearch-Common-Commands.md
Normal file
@@ -0,0 +1,86 @@
# Elasticsearch commands to get you started

Personally, I run Elasticsearch, Kibana, Metricbeat, and Filebeat in a single docker-compose stack managed with Portainer. Kibana is useful for viewing data, although I don't like that it doesn't let you edit data. I connect TubeArchivist using the `elastic` password generated by the compose stack. I also want to use Elasticsearch for other purposes without running a separate instance.

From Kibana I created an API key that can update data in the ta_channels indices. Below is a curl command that generates an API key without Kibana.

## Create API key scoped to specific indices (HTTP)
```
curl -s -u 'elastic:ELASTIC_PASS' \
  -H 'Content-Type: application/json' \
  -X POST 'http://localhost:9200/_security/api_key' \
  -d '{
    "name": "ta_scoped_key",
    "expiration": "30d",
    "role_descriptors": {
      "ta_scoped_role": {
        "cluster": ["monitor"],
        "index": [
          { "names": ["ta_channels_*"], "privileges": ["read","write"] },
          { "names": ["ta_metadata"], "privileges": ["read","write","create_index"] }
        ]
      }
    }
  }'
```

### HTTPS (with CA)
```
curl -s --cacert /path/to/chain.pem -u 'elastic:ELASTIC_PASS' \
  -H 'Content-Type: application/json' \
  -X POST 'https://localhost:9200/_security/api_key' \
  -d '{"name":"ta_scoped_key","expiration":"30d","role_descriptors":{"ta_scoped_role":{"cluster":["monitor"],"index":[{"names":["ta_channels_*"],"privileges":["read","write"]}]}}}'
```

Save the JSON response (it contains `id`, `api_key`, and `encoded`), then build the `ApiKey` header from it.

The response looks like this:
```
{"id":"F0eWBJ0BLX_vEATxQJuu","name":"ta_scoped_key","expiration":1763932732593,"api_key":"39RandomLettersandNumbers","encoded":"60RandomNumbersandLettsasldkfjwithA=="}
```
Use the `encoded` value, not `api_key`, in the header. The `ApiKey` scheme expects the Base64 encoding of `id:api_key`, which is what `encoded` already contains, so the bare `api_key` on its own does not authenticate.
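
As a sketch of how the header value can be produced in a script: the snippet below pulls a field out of a saved response with `sed` and, alternatively, rebuilds the same value from `id` and `api_key`. The function names `json_field` and `build_api_key_auth` and the file name `api_key_response.json` are made up for illustration. The `sed` extraction assumes the compact single-line response shown above, with no escaped quotes in the values; `jq` would be more robust if it is available.

```sh
# Hypothetical helpers; the names are illustrative and not part of Elasticsearch or TubeArchivist.

# Print the string value of a top-level field from a compact, single-line JSON response.
# Assumes the value contains no escaped double quotes.
json_field() {
  # $1 = field name, $2 = JSON text
  printf '%s\n' "$2" | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Rebuild the header credential by hand: Base64 of "id:api_key".
build_api_key_auth() {
  # $1 = id, $2 = api_key
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Usage (assumes the create-key response was saved to api_key_response.json):
#   response=$(cat api_key_response.json)
#   AUTH=$(json_field encoded "$response")
#   curl -s -H "Authorization: ApiKey $AUTH" http://localhost:9200/_security/_authenticate
```

Both routes should give the same string, so `encoded` is simply the convenient one.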

## Test using the API key
Set `AUTH` to the `encoded` value from the response, then run:
`curl -s -H "Authorization: ApiKey $AUTH" http://localhost:9200/_security/_authenticate`


## Creating another user
```
curl -u 'elastic:Yourhardrandompassword' \
  -X POST "http://localhost:9200/_security/user/sickprodigy" \
  -H 'Content-Type: application/json' \
  -d '{"password":"PasswordforUser","roles":["my_readonly_role"],"full_name":"Sick Prodigy","email":"sick@sickgaming.net"}'
```

## Creating another user with full privileges (superuser)
I prefer to have a full-privilege user other than `elastic`, although TubeArchivist apparently connects as `elastic` (the default superuser).
```
curl -u 'elastic:Yourhardrandompassword' -X POST "http://localhost:9200/_security/user/sickprodigy" \
  -H 'Content-Type: application/json' \
  -d '{
    "password": "SomeHardPassword",
    "roles": ["superuser"],
    "full_name": "SickProdigy",
    "email": "sickprodigy@sickgaming.net"
  }'
```

### Query a specific channel in ta_channel:
The channel ID can be found in TubeArchivist: open the channel page and take the part of the URL after `channel/`, for example `UChOve2dsTRMrW8DslLKJ9eg` in "https://tubearchivist.rcs1.top/channel/UChOve2dsTRMrW8DslLKJ9eg". You can experiment with the query and see what comes back, but this usually returns exactly the channel you want.
```
curl -X POST "http://es:9200/ta_channel/_search?pretty" \
  -H "Authorization: ApiKey YourRandomAPIkey123455123123==" \
  -H "Content-Type: application/json" \
  -d '
{
  "query": {
    "query_string": {
      "query": "Channel ID"
    }
  }
}'
```
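
In the example response for this query, `_id` is identical to `channel_id`, so an exact lookup by document ID is an alternative to the free-text `query_string` search. A minimal sketch, assuming that holds for every channel document: the helper below (the name `channel_ids_query` is made up) only builds the request body for an `ids` query, and the curl call in the comment reuses the host and header style from the command above.

```sh
# Hypothetical helper: print an `ids` query body for one channel ID.
# Assumes the ID needs no JSON escaping (no quotes or backslashes).
channel_ids_query() {
  printf '{"query":{"ids":{"values":["%s"]}}}' "$1"
}

# Usage:
#   curl -s -X POST "http://es:9200/ta_channel/_search?pretty" \
#     -H "Authorization: ApiKey $AUTH" \
#     -H "Content-Type: application/json" \
#     -d "$(channel_ids_query UC1ORA3oNGYuQ8yQHrC7MzBg)"
```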

You should get back something like this: [Example-channel-info-elasticsearch.json](Example-channel-info-elasticsearch.json)

I'm using this as a reference for updating channels in TubeArchivist that are missing data. I wish Kibana would let me do it. Maybe it can and I just haven't figured it out yet.
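
For this kind of fix-up, Elasticsearch's update API (`POST /<index>/_update/<id>`) merges a partial `doc` into an existing document. A hedged sketch: the helper name `doc_update_body` is invented here, the field and value are only examples taken from the sample document, and whether TubeArchivist later overwrites a manually edited field on refresh is not something this page establishes. The key used must also have `write` on the target index; a pattern such as `ta_channels_*` does not match an index named `ta_channel`.

```sh
# Hypothetical helper: build a partial-update body for a single field.
# $1 = field name, $2 = raw JSON value (quote strings yourself, e.g. '"text"').
doc_update_body() {
  printf '{"doc":{"%s":%s}}' "$1" "$2"
}

# Usage (channel ID and value are placeholders):
#   curl -s -X POST "http://es:9200/ta_channel/_update/UC1ORA3oNGYuQ8yQHrC7MzBg" \
#     -H "Authorization: ApiKey $AUTH" \
#     -H "Content-Type: application/json" \
#     -d "$(doc_update_body channel_views 12345)"
```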
64
Example-channel-info-elasticsearch.json
Normal file
@@ -0,0 +1,64 @@
{
  "_index" : "ta_channel",
  "_id" : "UC1ORA3oNGYuQ8yQHrC7MzBg",
  "_score" : 6.799427,
  "_source" : {
    "channel_name" : "syncbricks",
    "channel_thumb_url" : "https://yt3.googleusercontent.com/ytc/AIdro_k5ySAgLYNTLWCkt7ZNBCpQLgEdh8nd2HbVf93PIYm8qQ=s900-c-k-c0x00ffffff-no-rj",
    "channel_active" : true,
    "channel_description" : "syncbricks offers training, reviews, tutorials and guides on core IT, Application development, Application Maintenance, Data Centre , Cloud Services, Automation, IoT, Cyber security, Infrastructure Management, Business Intelligence, Business Process Management, Project Management, IT consulting, Open source, brands, products, servers, hosting, gadgets, service providers and much more.\nWe are focused on innovation and integration. \nOn timely basis we will be uploading Training Video, Tutorials, Speeches, Tech and Business Reviews, Product Reviews, Travel and More.",
    "channel_subscribed" : false,
    "channel_subs" : 35600,
    "channel_views" : 0,
    "channel_tabs" : [
      "videos",
      "streams",
      "shorts"
    ],
    "channel_last_refresh" : 1746300537,
    "channel_banner_url" : "https://yt3.googleusercontent.com/oOzViYjjHZb1XKBhEhAkp36062e2NUJiS37InRd6JBhmXeba-n8QXl2YZSMB9HdUvT8_5_59=w2560-fcrop64=1,00005a57ffffa5a8-k-c0xffffffff-no-nd-rj",
    "channel_tvart_url" : "https://yt3.googleusercontent.com/oOzViYjjHZb1XKBhEhAkp36062e2NUJiS37InRd6JBhmXeba-n8QXl2YZSMB9HdUvT8_5_59=s0",
    "channel_id" : "UC1ORA3oNGYuQ8yQHrC7MzBg",
    "channel_tags" : [
      "tech tutorials",
      "self-hosted automation",
      "n8n",
      "docker",
      "proxmox",
      "home lab",
      "network security",
      "open source tools",
      "IT infrastructure",
      "AI automation",
      "sysadmin tips",
      "Linux",
      "server virtualization",
      "pfSense",
      "OPNSense",
      "SyncBricks",
      "automation tools",
      "Proxmox VE",
      "tech setup",
      "n8n",
      "firewall",
      "tech tutorials",
      "n8n",
      "docker",
      "home lab",
      "IT infrastructure",
      "AI automation",
      "server virtualization",
      "pfSense",
      "SyncBricks",
      "Amjid Ali",
      "Proxmox VE",
      "docker compose",
      "n8n",
      "tutorial",
      "linux",
      "open source",
      "ngw",
      "firewall"
    ]
  }
}
@@ -1,8 +1,14 @@
 {
-  "id": "",
-  "channel_id": "Change To Channel ID/username",
-  "uploader": "Change To Channel ID/username",
-  "title": "Example",
-  "upload_date": "",
-  "thumbnail": null
+  "id": "Scripted in Video ID",
+  "channel_id": "Change to Youtube Username",
+  "uploader": "Change to Youtube Username",
+  "uploader_id": "Change To Channel ID",
+  "uploader_url": "https://www.youtube.com/channel/ChangeToChannelID-or-username",
+  "title": "Scripted in Title",
+  "description": null,
+  "upload_date": "ToBeScriptedInYYYYMMDD",
+  "categories": null,
+  "tags": null,
+  "thumbnail": null,
+  "view_count": null
 }
49
README.md
@@ -8,39 +8,45 @@ Small collection of Bash helpers used to prepare offline / archived YouTube vide
|
|||||||
Normalize filenames and create accompanying metadata (.info.json) so TubeArchivist can ingest local archives (especially those from archive.org or other offline sources).
|
Normalize filenames and create accompanying metadata (.info.json) so TubeArchivist can ingest local archives (especially those from archive.org or other offline sources).
|
||||||
|
|
||||||
Example input filename:
|
Example input filename:
|
||||||
`20170311 (5XtCZ1Fa9ag) Terry A Davis Live Stream.mp4`
|
- Example A: `20170311 (5XtCZ1Fa9ag) Terry A Davis Live Stream.mp4`
|
||||||
|
- Example B: `20131003 - 001 - 1okW1RTPZ7Q - TempleOS Hymns #1.mp4`
|
||||||
|
|
||||||
Resulting filename and sidecar JSON:
|
Resulting filename and sidecar JSON:
|
||||||
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
|
- Example A:
|
||||||
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`
|
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
|
||||||
|
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`
|
||||||
|
- Example B:
|
||||||
|
- `20131003 - 001 - TempleOS Hymns #1 [1okW1RTPZ7Q].mp4`
|
||||||
|
- `20131003 - 001 - TempleOS Hymns #1 [1okW1RTPZ7Q].info.json`
|
||||||
---
|
---
|
||||||
|
|
||||||
## How it works / Usage
|
## How it works / Usage
|
||||||
1. Put all the scripts in the directory with your video files (scripts currently do not recurse into subdirectories).
|
1. Put all the scripts in the directory with your video files (scripts currently do not recurse into subdirectories).
|
||||||
2. Run them in order from the directory containing your media:
|
2. Edit 'Example.info.json'
|
||||||
|
- Update these lines (And also any other lines you want copied to each video that won't be scripted in. Null values I didn't have data for yet.)
|
||||||
```sh
|
```
|
||||||
bash convert-()-to-[].bash
|
"channel_id": "Change to Youtube Username",
|
||||||
bash move-[id]-to-end.bash
|
"uploader": "Change to Youtube Username",
|
||||||
bash create-json-alongside.bash
|
"uploader_id": "Change To Channel ID",
|
||||||
bash insert-id-into-json.bash
|
"uploader_url": "https://www.youtube.com/channel/ChangeToChannelID-or-username",
|
||||||
bash insert-title-into-json.bash
|
|
||||||
bash insert-date-into-json.bash
|
|
||||||
```
|
```
|
||||||
|
3. Run the scripts in order from the directory containing your media below:
|
||||||
|
|
||||||
Each script performs a single transformation so you can inspect results between steps.
|
Each script performs a single transformation so you can inspect results between steps.
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Scripts (order and purpose)
|
## Scripts (order and purpose)
|
||||||
1. `convert-()-to-[].bash`
|
1a. `convert-()-to-[].bash`
|
||||||
- Replace parentheses containing an ID with square brackets (e.g. `(ID)` -> `[ID]`) and clean spacing.
|
- Replace parentheses containing an ID with square brackets (e.g. `(ID)` -> `[ID]`) and clean spacing.
|
||||||
|
- If the ID is already at the end of the filename, skip to 3.
|
||||||
|
|
||||||
2. `move-[id]-to-end.bash`
|
1b. `move-find-id-to-end-filename.bash`
|
||||||
|
- Split the filename into parts, find the video ID between the second and third " - " (without brackets), add brackets, and move [id] to the end of the filename, before the extension.
|
||||||
|
- Skip 1a/2a, straight to 3.
|
||||||
|
|
||||||
|
2a. `move-[id]-to-end-filename.bash`
|
||||||
- Ensure the video ID appears at the end of the filename inside square brackets.
|
- Ensure the video ID appears at the end of the filename inside square brackets.
|
||||||
|
|
||||||
3. `create-json-alongside.bash`
|
3. `create-json-alongside-each-file.bash`
|
||||||
- Create an empty `.info.json` file for each video filename (sidecar).
|
- Create an empty `.info.json` file for each video filename (sidecar).
|
||||||
|
|
||||||
4. `insert-id-into-json.bash`
|
4. `insert-id-into-json.bash`
|
||||||
@@ -50,7 +56,7 @@ Each script performs a single transformation so you can inspect results between
|
|||||||
- Insert the cleaned title into the sidecar JSON.
|
- Insert the cleaned title into the sidecar JSON.
|
||||||
|
|
||||||
6. `insert-date-into-json.bash`
|
6. `insert-date-into-json.bash`
|
||||||
- Insert the date (if available) into the sidecar JSON.
|
- Insert the date from filename (if available) into the sidecar JSON.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -59,6 +65,7 @@ Each script performs a single transformation so you can inspect results between
|
|||||||
- Always test on a copy or run a subset first to confirm behavior.
|
- Always test on a copy or run a subset first to confirm behavior.
|
||||||
- If filenames contain unusual characters, run a quick grep for non-ASCII prior to processing.
|
- If filenames contain unusual characters, run a quick grep for non-ASCII prior to processing.
|
||||||
- Modify scripts to add dry-run mode if you want safer previews.
|
- Modify scripts to add dry-run mode if you want safer previews.
|
||||||
|
- ElasticSearch Common Commands for updates: [ElasticSearch Common Commands](ElasticSearch-Common-Commands.md)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -66,8 +73,4 @@ Each script performs a single transformation so you can inspect results between
|
|||||||
Archive used for testing:
|
Archive used for testing:
|
||||||
`https://archive.org/details/TempleOS-TheMissingVideos`
|
`https://archive.org/details/TempleOS-TheMissingVideos`
|
||||||
|
|
||||||
Processed example (after running full pipeline):
|
|
||||||
`20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
|
|
||||||
`20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|||||||
26
move-find-id-to-end-filename.bash
Normal file
@@ -0,0 +1,26 @@
#!/bin/bash
shopt -s nullglob

for f in *.mp4 *.mkv *.mov *.avi; do
  [ -f "$f" ] || continue

  base="${f%.*}"
  ext="${f##*.}"

  # Match: date - number - id - rest
  # Example: 20140720 - 097 - oaHzqMgnI70 - TempleOS - God for Larry Page 7_20 K
  if [[ "$base" =~ ^([^[:space:]]+)[[:space:]]*-[[:space:]]*([^[:space:]]+)[[:space:]]*-[[:space:]]*([A-Za-z0-9_-]+)[[:space:]]*-[[:space:]]*(.*)$ ]]; then
    datepart="${BASH_REMATCH[1]}"
    numpart="${BASH_REMATCH[2]}"
    id="${BASH_REMATCH[3]}"
    rest="${BASH_REMATCH[4]}"

    newname="${datepart} - ${numpart} - ${rest} [${id}].${ext}"

    echo "Would rename: $f → $newname"
    # Comment out the next line to preview without renaming
    mv -i -- "$f" "$newname"
  else
    echo "Skipping (pattern mismatch): $f"
  fi
done
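
The README suggests adding a dry-run mode for safer previews. One possible shape, as a sketch rather than part of this change: keep the same pattern but move the name computation into a function that only prints the result, so it can be checked without touching any files. The function names `plan_rename` and `preview_renames` are hypothetical.

```bash
# Sketch: same pattern as the script above, but the new name is only computed, never applied.

plan_rename() {
  local f=$1
  local base="${f%.*}" ext="${f##*.}"
  # date - number - id - rest (kept in a variable so bash treats it as a regex)
  local re='^([^[:space:]]+)[[:space:]]*-[[:space:]]*([^[:space:]]+)[[:space:]]*-[[:space:]]*([A-Za-z0-9_-]+)[[:space:]]*-[[:space:]]*(.*)$'
  if [[ "$base" =~ $re ]]; then
    printf '%s - %s - %s [%s].%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[3]}" "$ext"
  else
    return 1   # pattern mismatch: print nothing
  fi
}

preview_renames() {
  shopt -s nullglob
  local f new
  for f in *.mp4 *.mkv *.mov *.avi; do
    if new=$(plan_rename "$f"); then
      echo "Would rename: $f -> $new"
    else
      echo "Skipping (pattern mismatch): $f"
    fi
  done
}
```

Calling `preview_renames` in the media directory lists the planned renames; an actual rename would still need the `mv -i --` call from the script above.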